Overview
Lectr is built around one hard constraint: the request path is sacred. Nothing in the observability or control layer may degrade latency, reliability, or correctness for requests passing through. This constraint produces a clear architectural boundary between two paths:
- Hot path — the synchronous request/response cycle. No database. No blocking. Sub-millisecond overhead.
- Cold path — async workers that process event data after requests complete. Observability, anomaly detection, recommendations. Never touches active requests.
Components
Proxy (Go)
The core of Lectr. A Go HTTP server that handles all incoming requests to proxy.lectr.ai/v1.
Responsibilities:
- Org authentication via X-Lectr-Key
- Provider detection from model name and X-Lectr-Provider header
- Rule evaluation against the in-memory rule cache
- Auth header normalisation per provider
- Request forwarding to provider upstream
- Streaming response passthrough via http.Flusher
- Event emission to the async channel
In-memory caches
Several fast lookup structures live in memory and are loaded at startup:
- Org cache — validates X-Lectr-Key values against hashed keys. Reloaded on key rotation.
- Rule cache — holds all enabled routing rules grouped by org ID, sorted by priority. Reloaded immediately whenever rules are created, updated, deleted, or toggled via the dashboard. A read lock is acquired per request — no contention under normal load.
- Pricing cache — model pricing data per provider, used for cost estimation. Loaded at startup, refreshed every 5 minutes.
- Provider registry — static map of model names to provider IDs and upstream URLs. Updated on deploy when new models are added.
- Azure config cache — per-org Azure endpoint configuration. Loaded at startup, reloaded when org provider config changes.
None of these caches make database calls on the request path. Cache misses
result in passthrough or default behaviour — never errors.
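The rule cache's locking discipline can be sketched as follows, assuming `sync.RWMutex` and a simplified `Rule` type (both illustrative): the hot path takes only the read lock, while reloads rebuild the snapshot off to the side and swap it in under the write lock.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Rule is a simplified stand-in for Lectr's routing rules.
type Rule struct {
	OrgID    string
	Priority int
	Enabled  bool
}

// RuleCache holds enabled rules grouped by org, sorted by priority.
type RuleCache struct {
	mu    sync.RWMutex
	byOrg map[string][]Rule
}

// Reload rebuilds the snapshot (e.g. after a dashboard edit) and
// swaps it in under the write lock. The expensive work happens
// before the lock is taken.
func (c *RuleCache) Reload(rules []Rule) {
	grouped := make(map[string][]Rule)
	for _, r := range rules {
		if !r.Enabled {
			continue // disabled rules never enter the cache
		}
		grouped[r.OrgID] = append(grouped[r.OrgID], r)
	}
	for org := range grouped {
		rs := grouped[org]
		sort.Slice(rs, func(i, j int) bool { return rs[i].Priority < rs[j].Priority })
	}
	c.mu.Lock()
	c.byOrg = grouped
	c.mu.Unlock()
}

// RulesFor is the hot-path lookup: read lock only, no DB call.
// A miss returns nil, which callers treat as "no rules", i.e.
// passthrough rather than an error.
func (c *RuleCache) RulesFor(orgID string) []Rule {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.byOrg[orgID]
}

func main() {
	cache := &RuleCache{}
	cache.Reload([]Rule{
		{OrgID: "org_1", Priority: 2, Enabled: true},
		{OrgID: "org_1", Priority: 1, Enabled: true},
		{OrgID: "org_1", Priority: 3, Enabled: false},
	})
	fmt.Println(len(cache.RulesFor("org_1")))   // enabled rules only
	fmt.Println(cache.RulesFor("org_missing")) // nil: passthrough
}
```

The snapshot swap keeps write-lock hold time to a single pointer assignment, which is why per-request read locks see no contention under normal load.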
Event channel
A buffered in-memory channel that decouples request handling from observability. After a request completes, the proxy emits an event onto the channel and returns. No waiting. No blocking. The event carries all the metadata captured during the request — model, latency, tokens, cost, tags, rule applied. If the channel is full (worker is behind), the event is dropped. The proxy continues unaffected. The dashboard shows a degradation banner when the drop rate is non-zero. The channel is the firewall between the hot path and everything else.

Event worker
A background goroutine that reads from the event channel and writes to Postgres in batches. Batch writes reduce DB pressure — events are flushed every 5 seconds or 100 events, whichever comes first. For streaming requests, the event worker also runs the tokeniser on the accumulated response buffer to produce a measured token count before writing. If Postgres is unavailable, the worker logs the failure and continues. Events are dropped — the proxy is unaffected.

Postgres
The persistent store for all event data. Stores request_logs — one row per request — with all captured metadata.
Key tables:
- request_logs — raw event data, one row per request
- dashboard_summaries — pre-computed stats per org per period
- routing_rules — org routing rules
- anomalies — detected anomalies
- recommendations — model recommendations
- model_pricing — versioned pricing data per provider per model
- model_downgrades — downgrade paths per provider per model
- org_invites — pending team invitations
- organizations — org settings including P95 threshold
Dashboard aggregator
A background worker that runs every 60 seconds. Reads from request_logs and
pre-computes summary statistics per org per period (last_24h, last_7d,
last_30d) into dashboard_summaries.
The dashboard API reads from dashboard_summaries — a single row lookup per
org per period. It never runs aggregation queries against request_logs directly.
This keeps dashboard load times fast and consistent regardless of how much traffic
has flowed through the proxy.
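The kind of per-period aggregation involved can be sketched in plain Go rather than SQL. The stats chosen here (request count, total cost, nearest-rank p95 latency) are illustrative assumptions about what a dashboard_summaries row would hold, not Lectr's actual schema.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// summarize computes, for one org and one period, figures of the
// kind the aggregator would upsert into dashboard_summaries:
// request count, total cost, and p95 latency. The percentile uses
// the nearest-rank method, which is an assumption; the real
// method isn't specified in the document.
func summarize(latenciesMs, costs []float64) (count int, totalCost, p95 float64) {
	count = len(latenciesMs)
	for _, c := range costs {
		totalCost += c
	}
	if count == 0 {
		return // no traffic: all-zero summary, not an error
	}
	sorted := append([]float64(nil), latenciesMs...) // copy before sorting
	sort.Float64s(sorted)
	rank := int(math.Ceil(0.95*float64(count))) - 1
	p95 = sorted[rank]
	return
}

func main() {
	lat := make([]float64, 100)
	for i := range lat {
		lat[i] = float64(i + 1) // 1..100 ms
	}
	n, cost, p := summarize(lat, []float64{1.5, 2.5})
	fmt.Println(n, cost, p) // 100 4 95
}
```

Because this runs every 60 seconds in the background, the dashboard API only ever does a single-row lookup, never a scan over request_logs.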
Anomaly detector
A background worker that runs every 5 minutes. For each org:
- Queries request_logs for the 7-day rolling window
- Computes mean and standard deviation per metric per feature
- Compares current values against baseline
- Records anomalies when deviation exceeds 2 standard deviations
- Sends email alerts for newly detected anomalies
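The 2-standard-deviation check itself is simple to state in code. The function names are illustrative, and details such as population vs. sample deviation are assumptions; only the threshold comes from the description above.

```go
package main

import (
	"fmt"
	"math"
)

// meanStd computes the baseline mean and (population) standard
// deviation over the rolling window.
func meanStd(xs []float64) (mean, std float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var ss float64
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(ss / float64(len(xs)))
}

// isAnomalous applies the documented rule: flag when the current
// value deviates from the baseline by more than 2 standard deviations.
func isAnomalous(window []float64, current float64) bool {
	mean, std := meanStd(window)
	if std == 0 {
		// Flat baseline: treat any change as a deviation.
		return current != mean
	}
	return math.Abs(current-mean) > 2*std
}

func main() {
	latency := []float64{100, 102, 98, 101, 99} // 7-day baseline, ms
	fmt.Println(isAnomalous(latency, 101))      // false: within 2 std devs
	fmt.Println(isAnomalous(latency, 130))      // true: far outside
}
```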
Recommendation engine
A background worker that runs daily. For each org, for each feature with 500+ requests in the past 30 days:
- Queries aggregated signals — prompt length, completion length, error rate, latency, cost, dominant task type
- Checks task type first (high-confidence signal)
- Falls back to heuristic signals if no task type present
- Generates recommendations with confidence levels (high, medium)
- Determines downgrade aggressiveness from task type and confidence
- Looks up the downgrade target from the model_downgrades cache
- Upserts recommendations — one per feature per current model
Dashboard (Next.js / React + TypeScript)
A read-only web application at app.lectr.ai. Authenticated via Auth0.
All data comes from the dashboard API (api.lectr.ai).
The dashboard never writes to request_logs directly. It reads from:
- dashboard_summaries for stat cards and overview metrics
- request_logs for the requests table and recent failures (paginated)
- anomalies for the anomaly feed
- recommendations for model recommendation cards
- routing_rules for the rules management page
Management API
REST endpoints at api.lectr.ai for dashboard data and org management.
Protected by Auth0 session tokens for dashboard users. Protected by
LECTR_ADMIN_SECRET for operator-level org management.
Startup sequence
Shutdown sequence
Scaling considerations
Single instance (current): Rate limiting is per-instance only. A single proxy instance handles all traffic for all orgs. This is the correct default for the current scale.

Horizontal scaling (future): The hot path is stateless except for the in-memory caches. Adding instances requires:
- Distributed rate limiting (Redis or similar)
- Cache invalidation propagation across instances (rule changes must reach all instances)

Sticky routing is not required — any instance can handle any request.
request_logs grows without bound. Partitioning by created_at and a TTL/archival
strategy are the correct long-term approaches. The dashboard aggregator’s
pre-computed summaries mean the dashboard remains fast even as raw event data grows.
Next
Request Lifecycle
Step-by-step walkthrough of what happens to each request.
Security & Trust
What Lectr sees, stores, and never touches.