MazeLabs Engine Architecture

From noisy operations data to replayable incident simulations.

MazeLabs turns raw logs, tickets, RCAs, runbooks, and system behavior into compressed evidence, masked AI context, deterministic simulation state, actor-driven war rooms, and measurable readiness scores.

Raw Inputs → Vault → Compressor → Masking → EvidenceTree → Simulation → War Room → Scoring

STEP 1

Ingest real operational context.

MazeLabs starts with the operational data teams already have: runbooks, SOPs, RCAs, incident tickets, CloudWatch logs, application logs, topology notes, alert payloads, and historical troubleshooting records.

Runbooks & SOPs

RCA Documents

CloudWatch Logs

App Error Traces

Incident Tickets

Alert Payloads

ERROR payment-api timeout upstream=orders-db duration=8200ms

INFO request_id=1001 status=200 path=/health

INFO request_id=1002 status=200 path=/health

ERROR payment-api timeout upstream=orders-db duration=8230ms

INFO request_id=1004 status=200 path=/health

WARN replica_lag seconds=23 cluster=orders-prod

ERROR payment-api timeout upstream=orders-db duration=8260ms

INFO request_id=1007 status=200 path=/health

INFO request_id=1008 status=200 path=/health

ERROR payment-api timeout upstream=orders-db duration=8290ms

WARN replica_lag seconds=28 cluster=orders-prod

INFO request_id=1011 status=200 path=/health

ERROR payment-api timeout upstream=orders-db duration=8320ms

INFO request_id=1013 status=200 path=/health

INFO request_id=1014 status=200 path=/health

ERROR payment-api timeout upstream=orders-db duration=8350ms

INFO request_id=1016 status=200 path=/health

INFO request_id=1017 status=200 path=/health

ERROR payment-api timeout upstream=orders-db duration=8380ms

INFO request_id=1019 status=200 path=/health

SECURE VAULT

EVIDENCE EXTRACT

STEP 2

Raw data stays in the vault.

Raw operational documents and logs are stored inside a controlled MazeLabs vault. The vault keeps source material, metadata, provenance, and extracted evidence separate.

The AI layer does not need unrestricted access to raw sensitive documents.

Tracks source provenance
Separates raw data from AI-consumable evidence
Keeps sensitive operational context controlled
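As an illustration of that separation, here is a minimal sketch of a vault that keeps raw documents private and exposes only evidence records that point back to their source. The class names (`RawDocument`, `Evidence`, `Vault`), the method names, and the SHA-256 fingerprint are assumptions made for this example, not the MazeLabs API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RawDocument:
    """Source material that stays inside the vault."""
    doc_id: str
    kind: str      # e.g. "runbook", "rca", "cloudwatch_log" (hypothetical labels)
    content: str


@dataclass(frozen=True)
class Evidence:
    """AI-consumable record that references its source without copying it."""
    summary: str
    source_id: str       # provenance: which raw document this came from
    source_sha256: str   # fingerprint of the raw content at extraction time


@dataclass
class Vault:
    _raw: dict[str, RawDocument] = field(default_factory=dict)
    _evidence: list[Evidence] = field(default_factory=list)

    def store_raw(self, doc: RawDocument) -> None:
        self._raw[doc.doc_id] = doc

    def extract(self, doc_id: str, summary: str) -> Evidence:
        """Record a piece of evidence linked to a stored raw document."""
        doc = self._raw[doc_id]
        digest = hashlib.sha256(doc.content.encode("utf-8")).hexdigest()
        evidence = Evidence(summary, doc.doc_id, digest)
        self._evidence.append(evidence)
        return evidence

    def ai_view(self) -> list[Evidence]:
        """Only evidence is returned here; raw content is never exposed."""
        return list(self._evidence)
```

In this sketch the AI layer would only ever call `ai_view()`, while the `source_id` and fingerprint let a reviewer trace each evidence item back to the exact raw document it came from.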

STEP 3

Compress telemetry into operational evidence.

Generic LLMs struggle with high-volume logs because such logs are repetitive, noisy, and full of sensitive identifiers. MazeLabs uses a Log Compressor to turn raw telemetry into structured incident signals.

Raw logs are not the product. Compressed operational evidence is the product.

BEFORE: 10,000 lines

[ERROR] 10:04:00 db_conn timeout db-01a req_id=8821 ip=10.4.2.0

[ERROR] 10:04:01 db_conn timeout db-01a req_id=8822 ip=10.4.2.1

[ERROR] 10:04:02 db_conn timeout db-01a req_id=8823 ip=10.4.2.2

[ERROR] 10:04:03 db_conn timeout db-01a req_id=8824 ip=10.4.2.3

[ERROR] 10:04:04 db_conn timeout db-01a req_id=8825 ip=10.4.2.4

[ERROR] 10:04:05 db_conn timeout db-01a req_id=8826 ip=10.4.2.5

[ERROR] 10:04:06 db_conn timeout db-01a req_id=8827 ip=10.4.2.6

[ERROR] 10:04:07 db_conn timeout db-01a req_id=8828 ip=10.4.2.7

[ERROR] 10:04:08 db_conn timeout db-01a req_id=8829 ip=10.4.2.8

[ERROR] 10:04:09 db_conn timeout db-01a req_id=8830 ip=10.4.2.9

[ERROR] 10:04:10 db_conn timeout db-01a req_id=8831 ip=10.4.2.10

[ERROR] 10:04:11 db_conn timeout db-01a req_id=8832 ip=10.4.2.11

[ERROR] 10:04:12 db_conn timeout db-01a req_id=8833 ip=10.4.2.12

[ERROR] 10:04:13 db_conn timeout db-01a req_id=8834 ip=10.4.2.13

[ERROR] 10:04:14 db_conn timeout db-01a req_id=8835 ip=10.4.2.14

AFTER: EVIDENCE PACK (8 signals)

CLUSTER: TIMEOUT_SPIKE

Database Connection Timeouts

Entity: db-01a

Window: 10:04:01 - 10:04:15

Events: 14,000+

Confidence: 0.95

CLUSTER: CASCADING_IMPACT

API 500 Responses
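The before/after contrast above can be approximated with template-based grouping: strip the fields that vary per request, group lines that share the resulting template, and keep a count and time window for each group. This is a minimal sketch under assumptions; the `Signal` fields, the `compress` function, and the choice of `req_id` and `ip` as the variable fields are illustrative and do not describe the actual Log Compressor.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Assumed line shape: "[LEVEL] HH:MM:SS message key=value ..."
LINE = re.compile(r"\[(?P<level>\w+)\] (?P<ts>\d{2}:\d{2}:\d{2}) (?P<msg>.*)")
# Fields assumed to change on every request and carry no clustering value.
VARIABLE_FIELDS = re.compile(r"\b(req_id|ip)=\S+")


@dataclass
class Signal:
    template: str
    level: str
    count: int
    first_seen: str
    last_seen: str


def compress(lines):
    """Group log lines by template and summarize each group as one signal."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # lines that do not fit the assumed shape are skipped
        template = VARIABLE_FIELDS.sub(lambda f: f"{f.group(1)}=*", m["msg"])
        groups[(m["level"], template)].append(m["ts"])
    signals = [
        # HH:MM:SS strings sort chronologically within a single day.
        Signal(template, level, len(stamps), min(stamps), max(stamps))
        for (level, template), stamps in groups.items()
    ]
    # Rank the most frequent signals first.
    return sorted(signals, key=lambda s: s.count, reverse=True)
```

Fed the timeout lines shown under BEFORE, a function like this would return a single signal with the template `db_conn timeout db-01a req_id=* ip=*`, the line count, and the first and last timestamps. Entity extraction and confidence scoring are left out of the sketch.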

STEP 4

Mask sensitive context before AI reasoning.

Before evidence is passed to AI agents or model providers, MazeLabs applies masking and redaction policies. The AI does not need the raw secret to understand that a timeout occurred.

RAW LOG (VAULT ONLY)

customer_id=918273 user_email=alice@company.com host=prod-db-17 ip=10.2.4.8 token=sk_live_xxx

MASKED EVIDENCE (AI VISIBLE)

customer_id=[CUSTOMER_ID] user_email=[EMAIL] host=[HOST] ip=[PRIVATE_IP] token=[SECRET]
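A simple way to picture this step is an ordered list of regex rules applied to each line before it leaves the vault. The sketch below is illustrative only: the rule set, the `mask` function, and the patterns themselves are assumptions chosen to reproduce the example above, and a production policy would need far broader coverage.

```python
import re

# Each rule is (pattern, replacement). Placeholders match the example above.
MASK_RULES = [
    (re.compile(r"\bcustomer_id=\S+"), "customer_id=[CUSTOMER_ID]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\bhost=\S+"), "host=[HOST]"),
    # RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
    (
        re.compile(
            r"\b(?:10\.\d{1,3}|192\.168|172\.(?:1[6-9]|2\d|3[01]))"
            r"\.\d{1,3}\.\d{1,3}\b"
        ),
        "[PRIVATE_IP]",
    ),
    (re.compile(r"\btoken=\S+"), "token=[SECRET]"),
]


def mask(line: str) -> str:
    """Apply every masking rule in order and return the redacted line."""
    for pattern, replacement in MASK_RULES:
        line = pattern.sub(replacement, line)
    return line


raw = (
    "customer_id=918273 user_email=alice@company.com "
    "host=prod-db-17 ip=10.2.4.8 token=sk_live_xxx"
)
print(mask(raw))
```

The field names stay visible, so an agent can still see that a host, an IP, and a token were involved without ever seeing their values.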

STEP 5 & 6

Build an EvidenceTree from messy inputs.

MazeLabs combines compressed telemetry, incident tickets, RCAs, and runbooks into an EvidenceTree. This graph organizes the incident into symptoms, signals, checks, hypotheses, and valid actions.

This makes simulations replayable and auditable. Instead of relying on uncontrolled LLM memory, the runtime uses structured evidence packs and known state transitions.
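Replayability follows from keeping state transitions in data rather than in model memory. The sketch below assumes a hypothetical transition table keyed by `(state, action)`; the state and action names are invented for illustration and are not taken from the MazeLabs runtime.

```python
# Hypothetical transition table: (current state, action) -> next state.
TRANSITIONS = {
    ("degraded", "check_replica_lag"): "lag_confirmed",
    ("degraded", "failover"): "unsafe_failover",
    ("lag_confirmed", "failover"): "recovered",
}


def replay(actions, start="degraded"):
    """Return the full state history produced by a sequence of actions."""
    state = start
    history = [state]
    for action in actions:
        # Unknown (state, action) pairs leave the state unchanged.
        state = TRANSITIONS.get((state, action), state)
        history.append(state)
    return history
```

Because the table is fixed, replaying the same action list always yields the same history, which is what lets a drill be audited or re-run after the fact.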

1 Ticket → 20 Scenario Variants

Incident: API Latency

Symptom: 5xx Spike

Signal: Timeout

Hypothesis: DB Saturation

Action: Failover
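The chain above can be represented as a small tree of typed nodes, each carrying links to the sources that support it. This is a sketch under assumptions: `EvidenceNode`, its fields, and the `paths` helper are hypothetical names, the source ID is made up, and a real EvidenceTree may be a richer graph than a strict tree.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceNode:
    kind: str                 # "Incident", "Symptom", "Signal", "Hypothesis", "Action"
    label: str
    sources: list[str] = field(default_factory=list)   # IDs of supporting source documents
    children: list["EvidenceNode"] = field(default_factory=list)

    def add(self, child: "EvidenceNode") -> "EvidenceNode":
        """Attach a child node and return it so chains can be built step by step."""
        self.children.append(child)
        return child


def paths(node, prefix=()):
    """Yield every root-to-leaf chain as a tuple of 'Kind: Label' strings."""
    here = prefix + (f"{node.kind}: {node.label}",)
    if not node.children:
        yield here
    for child in node.children:
        yield from paths(child, here)


root = EvidenceNode("Incident", "API Latency", sources=["ticket-1"])  # hypothetical source ID
symptom = root.add(EvidenceNode("Symptom", "5xx Spike"))
signal = symptom.add(EvidenceNode("Signal", "Timeout"))
hypothesis = signal.add(EvidenceNode("Hypothesis", "DB Saturation"))
hypothesis.add(EvidenceNode("Action", "Failover"))

for chain in paths(root):
    print(" > ".join(chain))
```

Adding a second hypothesis or action under the same signal would produce an additional path, which is one plausible way a single ticket could branch into several scenario variants.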

STEP 7 & 8

Realistic actor behavior driven by state.

Simulations include actors such as SREs, DBREs, and stakeholders. Their responses are not free-form model output; each actor's behavior is bounded by scenario state, hidden evidence, and time pressure.
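One way to keep actors bounded is to make each one a pure function of the scenario state. The sketch below models the Adversarial Reviewer and Customer Stakeholder from this step. The `ScenarioState` fields, the 15-minute threshold, and the trigger conditions are assumptions for illustration, not the actual actor logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioState:
    elapsed_minutes: int
    replica_lag_checked: bool
    last_action: str
    recovered: bool = False


STAKEHOLDER_THRESHOLD_MINUTES = 15  # assumed threshold, not a MazeLabs default


def adversarial_reviewer(state):
    """Object when a failover is attempted before the database signal is validated."""
    if state.last_action == "failover" and not state.replica_lag_checked:
        return (
            "You are treating symptoms without validating the database signal. "
            "Check replica lag first."
        )
    return None


def customer_stakeholder(state):
    """Escalate once the incident has run past the threshold without recovery."""
    if not state.recovered and state.elapsed_minutes > STAKEHOLDER_THRESHOLD_MINUTES:
        return (
            "Customer checkouts are still failing. What is the recovery ETA? "
            "We need an update."
        )
    return None


ACTORS = {
    "Adversarial Reviewer": adversarial_reviewer,
    "Customer Stakeholder": customer_stakeholder,
}


def actor_turns(state):
    """Return (actor, message) pairs for every actor whose rule fires."""
    return [
        (name, message)
        for name, rule in ACTORS.items()
        if (message := rule(state)) is not None
    ]
```

In this framing an LLM could still phrase the message, but whether an actor speaks at all is decided by state, so the same state always produces the same set of interventions.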

Adversarial Reviewer

Challenges weak assumptions and unsafe actions during the incident.

"You are treating symptoms without validating the database signal. Check replica lag first."

Customer Stakeholder

Raises business-impact urgency once elapsed time exceeds set thresholds.

"Customer checkouts are still failing. What is the recovery ETA? We need an update."

STEP 9, 10 & 11

P1 War Room, Scoring & Debrief.

The War Room is an operational decision surface, not just a chat screen. Every action is evaluated, scored, and mapped to a replayable timeline for post-incident learning.

Agent-Driven P1 War Room

Presents evidence, transcript, stakeholder pressure, and action ledgers in one surface.

Readiness Scoring

Investigation: +12
Ignored Signal: -8
Unsafe Failover: -15
Valid Recovery: +10
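To show how these point values could roll up into a replayable timeline, here is a minimal sketch. The point values come from the list above; the event keys, the `(minute, event)` ledger shape, and the `score` function are assumptions for this example.

```python
# Point values from the readiness scoring list; keys are hypothetical event names.
SCORE_RULES = {
    "investigation": 12,
    "ignored_signal": -8,
    "unsafe_failover": -15,
    "valid_recovery": 10,
}


def score(ledger):
    """Return the total score and a timeline of (minute, event, delta, running_total)."""
    total = 0
    timeline = []
    for minute, event in ledger:
        delta = SCORE_RULES[event]
        total += delta
        timeline.append((minute, event, delta, total))
    return total, timeline


# Hypothetical action ledger from one drill.
ledger = [
    (2, "investigation"),
    (6, "ignored_signal"),
    (9, "unsafe_failover"),
    (14, "valid_recovery"),
]
total, timeline = score(ledger)
print(total)  # 12 - 8 - 15 + 10 = -1
```

Keeping the running total per entry means a debrief can point to the exact minute where the score dropped, rather than reporting only a final number.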

Debrief & Replay

Generates a structured debrief showing which evidence was missed, which paths were correct, and where the team has capability gaps.

Why this architecture matters.

Most teams already have the knowledge needed to train better incident responders, but it is locked in raw logs, tickets, and documents that generic LLMs cannot safely reason over. MazeLabs creates a controlled middle layer that compresses, masks, and structures that evidence before simulation.

This end-to-end engine turns existing operational history into reusable readiness infrastructure.

1. Ingest

Runbooks, RCAs, tickets, CloudWatch logs, application logs, topology notes

2. Vault

Store raw operational context under customer control

3. Compress

Deduplicate logs, cluster errors, extract entities, build timelines, rank signals

4. Mask

Redact secrets, IPs, hostnames, customer IDs, emails, tokens, internal URLs

5. Evidence

Build source-linked EvidenceTrees and bounded EvidencePacks

6. Simulate

Create scenario phases, hidden evidence, valid checks, wrong paths, recovery criteria

7. Actors

Drive incident commander, SRE, DBRE, support, customer, coach, and reviewer behavior from scenario state

8. Bridge

Run P1/P0 incident drills with transcript, evidence, action ledger, stakeholder pressure, and timeline

9. Score

Measure evidence use, decisions, communication, escalation, recovery, and validation

10. Debrief

Replay missed signals, wrong assumptions, recovery quality, and follow-up labs

Turn your incident history into a simulation engine.

MazeLabs converts the logs, tickets, runbooks, and RCAs you already have into private, evidence-driven war-room simulations.
