Overview
Lectr is built around one hard constraint: the request path is sacred. Nothing in the observability or control layer may degrade latency, reliability, or correctness for requests passing through. This constraint produces a clear architectural boundary between two paths:
- Hot path — the synchronous request/response cycle. No database. No blocking. Sub-millisecond overhead.
- Cold path — async workers that process event data after requests complete. Observability, anomaly detection, recommendations. Never touches active requests.
Components
Proxy (Go)
The core of Lectr. A Go HTTP server that handles all incoming requests to proxy.lectr.ai/v1.
Responsibilities:
- Org authentication via X-Lectr-Key
- Provider detection from model name and X-Lectr-Provider header
- Rule evaluation against the in-memory rule cache
- Auth header normalisation per provider
- Request forwarding to provider upstream
- Streaming response passthrough via http.Flusher
- Event emission to the async channel
In-memory caches
Several fast lookup structures live in memory and are loaded at startup:
- Org cache — validates X-Lectr-Key values against hashed keys. Reloaded on key rotation.
- Rule cache — holds all enabled routing rules grouped by org ID, sorted by priority. Reloaded immediately whenever rules are created, updated, deleted, or toggled via the dashboard. A read lock is acquired per request — no contention under normal load.
- Pricing cache — model pricing data per provider, used for cost estimation. Loaded at startup, refreshed every 5 minutes.
- Provider registry — static map of model names to provider IDs and upstream URLs. Updated on deploy when new models are added.
- Azure config cache — per-org Azure endpoint configuration. Loaded at startup, reloaded when org provider config changes.
None of these caches make database calls on the request path. Cache misses
result in passthrough or default behaviour — never errors.
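The rule cache's locking discipline can be sketched as follows, assuming `sync.RWMutex` and a simplified `Rule` type (both illustrative): the hot path takes only the read lock, while reloads rebuild the snapshot off to the side and swap it in under the write lock.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Rule is a simplified stand-in for Lectr's routing rules.
type Rule struct {
	OrgID    string
	Priority int
	Enabled  bool
}

// RuleCache holds enabled rules grouped by org, sorted by priority.
type RuleCache struct {
	mu    sync.RWMutex
	byOrg map[string][]Rule
}

// Reload rebuilds the snapshot (e.g. after a dashboard edit) and
// swaps it in under the write lock. The expensive work happens
// before the lock is taken.
func (c *RuleCache) Reload(rules []Rule) {
	grouped := make(map[string][]Rule)
	for _, r := range rules {
		if !r.Enabled {
			continue // disabled rules never enter the cache
		}
		grouped[r.OrgID] = append(grouped[r.OrgID], r)
	}
	for org := range grouped {
		rs := grouped[org]
		sort.Slice(rs, func(i, j int) bool { return rs[i].Priority < rs[j].Priority })
	}
	c.mu.Lock()
	c.byOrg = grouped
	c.mu.Unlock()
}

// RulesFor is the hot-path lookup: read lock only, no DB call.
// A miss returns nil, which callers treat as "no rules", i.e.
// passthrough rather than an error.
func (c *RuleCache) RulesFor(orgID string) []Rule {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.byOrg[orgID]
}

func main() {
	cache := &RuleCache{}
	cache.Reload([]Rule{
		{OrgID: "org_1", Priority: 2, Enabled: true},
		{OrgID: "org_1", Priority: 1, Enabled: true},
		{OrgID: "org_1", Priority: 3, Enabled: false},
	})
	fmt.Println(len(cache.RulesFor("org_1")))   // enabled rules only
	fmt.Println(cache.RulesFor("org_missing")) // nil: passthrough
}
```

The snapshot swap keeps write-lock hold time to a single pointer assignment, which is why per-request read locks see no contention under normal load.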
Event channel
A buffered in-memory channel that decouples request handling from observability. After a request completes, the proxy emits an event onto the channel and returns. No waiting. No blocking. The event carries all the metadata captured during the request — model, latency, tokens, cost, tags, rule applied. If the channel is full (worker is behind), the event is dropped. The proxy continues unaffected. The dashboard shows a degradation banner when the drop rate is non-zero. The channel is the firewall between the hot path and everything else.

Event worker
A background goroutine that reads from the event channel and writes to Postgres in batches. Batch writes reduce DB pressure — events are flushed every 5 seconds or 100 events, whichever comes first. For streaming requests, the event worker also runs the tokeniser on the accumulated response buffer to produce a measured token count before writing. If Postgres is unavailable, the worker logs the failure and continues. Events are dropped — the proxy is unaffected.

Postgres
The persistent store for all event data. Stores request_logs — one row per request — with all captured metadata.
Key tables:
- request_logs — raw event data, one row per request
- dashboard_summaries — pre-computed stats per org per period
- routing_rules — org routing rules
- anomalies — detected anomalies
- recommendations — model recommendations
- model_pricing — versioned pricing data per provider per model
- model_downgrades — downgrade paths per provider per model
- org_invites — pending team invitations
- organizations — org settings including P95 threshold
Dashboard aggregator
A background worker that runs every 60 seconds. Reads from request_logs and
pre-computes summary statistics per org per period (last_24h, last_7d,
last_30d) into dashboard_summaries.
The dashboard API reads from dashboard_summaries — a single row lookup per
org per period. It never runs aggregation queries against request_logs directly.
This keeps dashboard load times fast and consistent regardless of how much traffic
has flowed through the proxy.
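The kind of per-period aggregation involved can be sketched in plain Go rather than SQL. The stats chosen here (request count, total cost, nearest-rank p95 latency) are illustrative assumptions about what a dashboard_summaries row would hold, not Lectr's actual schema.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// summarize computes, for one org and one period, figures of the
// kind the aggregator would upsert into dashboard_summaries:
// request count, total cost, and p95 latency. The percentile uses
// the nearest-rank method, which is an assumption; the real
// method isn't specified in the document.
func summarize(latenciesMs, costs []float64) (count int, totalCost, p95 float64) {
	count = len(latenciesMs)
	for _, c := range costs {
		totalCost += c
	}
	if count == 0 {
		return // no traffic: all-zero summary, not an error
	}
	sorted := append([]float64(nil), latenciesMs...) // copy before sorting
	sort.Float64s(sorted)
	rank := int(math.Ceil(0.95*float64(count))) - 1
	p95 = sorted[rank]
	return
}

func main() {
	lat := make([]float64, 100)
	for i := range lat {
		lat[i] = float64(i + 1) // 1..100 ms
	}
	n, cost, p := summarize(lat, []float64{1.5, 2.5})
	fmt.Println(n, cost, p) // 100 4 95
}
```

Because this runs every 60 seconds in the background, the dashboard API only ever does a single-row lookup, never a scan over request_logs.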
Anomaly detector
A background worker that runs every 5 minutes. For each org:
- Queries request_logs for the 7-day rolling window
- Computes mean and standard deviation per metric per feature
- Compares current values against baseline
- Records anomalies when deviation exceeds 2 standard deviations
- Sends email alerts for newly detected anomalies
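The 2-standard-deviation check itself is simple to state in code. The function names are illustrative, and details such as population vs. sample deviation are assumptions; only the threshold comes from the description above.

```go
package main

import (
	"fmt"
	"math"
)

// meanStd computes the baseline mean and (population) standard
// deviation over the rolling window.
func meanStd(xs []float64) (mean, std float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var ss float64
	for _, x := range xs {
		ss += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(ss / float64(len(xs)))
}

// isAnomalous applies the documented rule: flag when the current
// value deviates from the baseline by more than 2 standard deviations.
func isAnomalous(window []float64, current float64) bool {
	mean, std := meanStd(window)
	if std == 0 {
		// Flat baseline: treat any change as a deviation.
		return current != mean
	}
	return math.Abs(current-mean) > 2*std
}

func main() {
	latency := []float64{100, 102, 98, 101, 99} // 7-day baseline, ms
	fmt.Println(isAnomalous(latency, 101))      // false: within 2 std devs
	fmt.Println(isAnomalous(latency, 130))      // true: far outside
}
```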
Recommendation engine
A background worker that runs daily. For each org, for each feature with 500+ requests in the past 30 days:
- Queries aggregated signals — prompt length, completion length, error rate, latency, cost, dominant task type
- Checks task type first (high-confidence signal)
- Falls back to heuristic signals if no task type present
- Generates recommendations with confidence levels (high, medium)
- Determines downgrade aggressiveness from task type and confidence
- Looks up the downgrade target from the model_downgrades cache
- Upserts recommendations — one per feature per current model
Dashboard (Next.js / React + TypeScript)
A read-only web application at app.lectr.ai. Authenticated via Auth0.
All data comes from the dashboard API (api.lectr.ai).
The dashboard never writes to request_logs directly. It reads from:
- dashboard_summaries for stat cards and overview metrics
- request_logs for the requests table and recent failures (paginated)
- anomalies for the anomaly feed
- recommendations for model recommendation cards
- routing_rules for the rules management page
Management API
REST endpoints at api.lectr.ai for dashboard data and org management.
Protected by Auth0 session tokens for dashboard users. Protected by
LECTR_ADMIN_SECRET for operator-level org management.
Startup sequence
Shutdown sequence
Scaling considerations
Single instance (current): Rate limiting is per-instance only. A single proxy instance handles all traffic for all orgs. This is the correct default for the current scale.

Horizontal scaling (future): The hot path is stateless except for the in-memory caches. Adding instances requires:
- Distributed rate limiting (Redis or similar)
- Cache invalidation propagation across instances (rule changes must reach all instances)

Sticky routing is not required — any instance can handle any request.
request_logs grows without bound. Partitioning by created_at and a TTL/archival
strategy are the correct long-term approaches. The dashboard aggregator’s
pre-computed summaries mean the dashboard remains fast even as raw event data grows.
Next
Request Lifecycle
Step-by-step walkthrough of what happens to each request.
Security & Trust
What Lectr sees, stores, and never touches.