# Architecture

> The system design — hot path, cold path, caches, and workers.

## Overview

Lectr is built around one hard constraint: **the request path is sacred.**
Nothing in the observability or control layer may degrade latency, reliability,
or correctness for requests passing through.

This constraint produces a clear architectural boundary between two paths:

* **Hot path** — the synchronous request/response cycle. No database. No blocking. Sub-millisecond overhead.
* **Cold path** — async workers that process event data after requests complete. Observability, anomaly detection, recommendations. Never touches active requests.

```mermaid
flowchart TB
    subgraph hot["Hot path — synchronous"]
        direction LR
        Client([Client]) --> Proxy["Proxy (Go)"]
        Proxy --> Provider([Provider])
    end

    Proxy -->|"event (async)"| Channel[Event channel]

    subgraph cold["Cold path — async"]
        direction TB
        Channel --> Worker[Event worker]
        Worker --> DB[(Postgres)]
        DB --> Pipeline["Aggregator · Detector · Engine"]
    end
```

The proxy handles **Auth · Detect · Route · Log** on the hot path only — no database calls.

***

## Components

### Proxy (Go)

The core of Lectr. A Go HTTP server that handles all incoming requests to
`proxy.lectr.ai/v1`.

Responsibilities:

* Org authentication via `X-Lectr-Key`
* Provider detection from model name and `X-Lectr-Provider` header
* Rule evaluation against the in-memory rule cache
* Auth header normalisation per provider
* Request forwarding to provider upstream
* Streaming response passthrough via `http.Flusher`
* Event emission to the async channel

The proxy never makes a database call during request handling. All lookups
(org validation, rule matching, provider routing) are served from in-memory
structures loaded at startup.

***

### In-memory caches

Several fast lookup structures live in memory and are loaded at startup:

**Org cache** — validates `X-Lectr-Key` values against hashed keys. Reloaded
on key rotation.

**Rule cache** — holds all enabled routing rules grouped by org ID, sorted by
priority. Reloaded immediately whenever rules are created, updated, deleted,
or toggled via the dashboard. Each request takes only a read lock, so requests
do not block one another; the lock is contended only while a reload is in progress.

**Pricing cache** — model pricing data per provider, used for cost estimation.
Loaded at startup, refreshed every 5 minutes.

**Provider registry** — static map of model names to provider IDs and upstream
URLs. Updated on deploy when new models are added.

**Azure config cache** — per-org Azure endpoint configuration. Loaded at startup,
reloaded when org provider config changes.

None of these caches make database calls on the request path. Cache misses
result in passthrough or default behaviour — never errors.
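
The rule cache behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the `Rule` fields, the use of `sync.RWMutex`, and the convention that a lower priority number is evaluated first are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Rule is a hypothetical shape for a routing rule; the real fields are not
// described in this document.
type Rule struct {
	ID       string
	Priority int // assumption: lower number is evaluated first
	Enabled  bool
}

// RuleCache holds enabled rules grouped by org ID, sorted by priority.
type RuleCache struct {
	mu    sync.RWMutex
	byOrg map[string][]Rule
}

func NewRuleCache() *RuleCache {
	return &RuleCache{byOrg: make(map[string][]Rule)}
}

// Reload builds a fresh map off to the side and swaps it in under the write
// lock, so readers are blocked only for the duration of the swap.
func (c *RuleCache) Reload(rules map[string][]Rule) {
	next := make(map[string][]Rule, len(rules))
	for org, rs := range rules {
		enabled := make([]Rule, 0, len(rs))
		for _, r := range rs {
			if r.Enabled {
				enabled = append(enabled, r)
			}
		}
		sort.SliceStable(enabled, func(i, j int) bool {
			return enabled[i].Priority < enabled[j].Priority
		})
		next[org] = enabled
	}
	c.mu.Lock()
	c.byOrg = next
	c.mu.Unlock()
}

// RulesFor returns the org's rules. A miss returns nil, which the caller
// treats as "no rules": the request passes through unchanged.
func (c *RuleCache) RulesFor(orgID string) []Rule {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.byOrg[orgID]
}

func main() {
	cache := NewRuleCache()
	cache.Reload(map[string][]Rule{
		"org_1": {
			{ID: "b", Priority: 20, Enabled: true},
			{ID: "a", Priority: 10, Enabled: true},
			{ID: "c", Priority: 5, Enabled: false},
		},
	})
	fmt.Println(cache.RulesFor("org_1"))
	fmt.Println(cache.RulesFor("org_unknown") == nil)
}
```

Building the new map before taking the write lock keeps the critical section tiny, which is one way to get the "no contention under normal load" property.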

***

### Event channel

A buffered in-memory channel that decouples request handling from observability.

After a request completes, the proxy emits an event onto the channel and returns.
No waiting. No blocking. The event carries all the metadata captured during the
request — model, latency, tokens, cost, tags, rule applied.

If the channel is full (worker is behind), the event is dropped. The proxy
continues unaffected. The dashboard shows a degradation banner when drop rate
is non-zero.

The channel is the firewall between the hot path and everything else.
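
A minimal sketch of the drop-on-full behaviour, using a buffered Go channel with a non-blocking send. The `Event` fields, the `Emitter` type, and the drop counter are assumptions for illustration; the document does not specify how the drop rate is tracked.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Event is a hypothetical subset of the metadata carried by a real event.
type Event struct {
	Model     string
	LatencyMS int64
	Tokens    int
}

// Emitter wraps a buffered channel and counts events it had to drop.
type Emitter struct {
	ch      chan Event
	dropped atomic.Int64
}

func NewEmitter(capacity int) *Emitter {
	return &Emitter{ch: make(chan Event, capacity)}
}

// Emit never blocks: if the buffer is full, the event is dropped and counted.
// It reports whether the event was queued.
func (e *Emitter) Emit(ev Event) bool {
	select {
	case e.ch <- ev:
		return true
	default:
		e.dropped.Add(1)
		return false
	}
}

// Dropped returns how many events have been discarded so far.
func (e *Emitter) Dropped() int64 { return e.dropped.Load() }

func main() {
	em := NewEmitter(2) // tiny capacity so the third event overflows
	for i := 0; i < 3; i++ {
		em.Emit(Event{Model: "example-model", Tokens: i})
	}
	fmt.Println("queued:", len(em.ch), "dropped:", em.Dropped())
}
```

The `select` with a `default` branch is what keeps the request path from ever waiting on the worker.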

***

### Event worker

A background goroutine that reads from the event channel and writes to Postgres
in batches. Batching reduces DB pressure: a batch is flushed every 5 seconds or
as soon as it holds 100 events, whichever comes first.

For streaming requests, the event worker also runs the tokeniser on the
accumulated response buffer to produce a measured token count before writing.

If Postgres is unavailable, the worker logs the failure and continues. Events
are dropped — the proxy is unaffected.

***

### Postgres

The persistent store for event data, the results derived from it, and org
configuration. The central table is `request_logs`, with one row per request
holding all captured metadata.

Key tables:

* `request_logs` — raw event data, one row per request
* `dashboard_summaries` — pre-computed stats per org per period
* `routing_rules` — org routing rules
* `anomalies` — detected anomalies
* `recommendations` — model recommendations
* `model_pricing` — versioned pricing data per provider per model
* `model_downgrades` — downgrade paths per provider per model
* `org_invites` — pending team invitations
* `organizations` — org settings including P95 threshold

***

### Dashboard aggregator

A background worker that runs every 60 seconds. Reads from `request_logs` and
pre-computes summary statistics per org per period (`last_24h`, `last_7d`,
`last_30d`) into `dashboard_summaries`.

The dashboard API reads from `dashboard_summaries` — a single row lookup per
org per period. It never runs aggregation queries against `request_logs` directly.
This keeps dashboard load times fast and consistent regardless of how much traffic
has flowed through the proxy.

***

### Anomaly detector

A background worker that runs every 5 minutes. For each org:

1. Queries `request_logs` for the 7-day rolling window
2. Computes mean and standard deviation per metric per feature
3. Compares current values against baseline
4. Records anomalies when deviation exceeds 2 standard deviations
5. Sends email alerts for newly detected anomalies

Detects: cost spikes, latency degradation, error rate spikes, traffic anomalies,
model drift.

Orgs with less than 48 hours of data are skipped — no baseline means no detection.
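
Steps 2 to 4 reduce to a threshold check against the baseline's mean and standard deviation. The sketch below is an illustration only: it assumes a population standard deviation and treats a zero-variance baseline as never anomalous, neither of which is specified above. The function names and sample values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// meanStd returns the mean and population standard deviation of xs.
func meanStd(xs []float64) (mean, std float64) {
	if len(xs) == 0 {
		return 0, 0
	}
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var sumSquares float64
	for _, x := range xs {
		d := x - mean
		sumSquares += d * d
	}
	return mean, math.Sqrt(sumSquares / float64(len(xs)))
}

// isAnomaly reports whether current deviates from the baseline mean by more
// than threshold standard deviations. A baseline with zero variance never
// triggers (an assumption made for this sketch).
func isAnomaly(baseline []float64, current, threshold float64) bool {
	mean, std := meanStd(baseline)
	if std == 0 {
		return false
	}
	return math.Abs(current-mean) > threshold*std
}

func main() {
	// Hypothetical hourly cost samples from the rolling window.
	baseline := []float64{10, 12, 11, 9, 10, 11, 12, 10}
	for _, current := range []float64{11, 25} {
		fmt.Printf("current=%.0f anomaly=%v\n", current, isAnomaly(baseline, current, 2))
	}
}
```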

***

### Recommendation engine

A background worker that runs daily. For each org, for each feature with 500+
requests in the past 30 days:

1. Queries aggregated signals — prompt length, completion length, error rate,
   latency, cost, dominant task type
2. Checks task type first (high-confidence signal)
3. Falls back to heuristic signals if no task type present
4. Generates recommendations with confidence levels (`high`, `medium`)
5. Determines downgrade aggressiveness from task type and confidence
6. Looks up downgrade target from the `model_downgrades` cache
7. Upserts recommendations — one per feature per current model

Recommendations stay within provider. Cross-provider quality comparisons require
data the engine does not have.

***

### Dashboard (Next.js / React + TypeScript)

A web application at `app.lectr.ai`. Authenticated via Auth0.
All data comes from the dashboard API (`api.lectr.ai`).

The dashboard is read-only for event data: it never writes to `request_logs`.
Routing rule changes made in the dashboard go through the API. It reads from:

* `dashboard_summaries` for stat cards and overview metrics
* `request_logs` for the requests table and recent failures (paginated)
* `anomalies` for the anomaly feed
* `recommendations` for model recommendation cards
* `routing_rules` for the rules management page

***

### Management API

REST endpoints at `api.lectr.ai` for dashboard data and org management.
Dashboard users authenticate with Auth0 session tokens; operator-level org
management is protected by `LECTR_ADMIN_SECRET`.

***

## Startup sequence

```
1. Config loaded from environment
2. Database connection established, migrations run
3. In-memory caches loaded:
   - Org cache
   - Rule cache (all enabled rules)
   - Pricing cache
   - Downgrade cache
   - Azure config cache
4. Event pipeline started (channel + worker goroutine)
5. Dashboard aggregator started (60s ticker)
6. Anomaly detector started (5m ticker)
7. Recommendation engine started (24h ticker)
8. HTTP server starts accepting requests
```

Caches are loaded before the HTTP server starts. No request is served before
the caches are ready.

***

## Shutdown sequence

```
1. HTTP server stops accepting new requests
2. In-flight requests complete
3. Recommendation engine stopped
4. Anomaly detector stopped
5. Dashboard aggregator stopped
6. Event pipeline drained and stopped
7. Database connection closed
```

Workers stop in reverse startup order. The event pipeline is the last worker to
stop, so it can capture events from any requests that completed during shutdown.
The database connection closes only after the pipeline has drained.

***

## Scaling considerations

**Single instance (current):**

Rate limiting is per-instance only. A single proxy instance handles all traffic
for all orgs. This is the correct default for the current scale.

**Horizontal scaling (future):**

The hot path is stateless except for in-memory caches. Adding instances requires:

* Distributed rate limiting (Redis or similar)
* Cache invalidation propagation across instances (rule changes must reach all instances)

Sticky routing is not required: any instance can handle any request.

**Database scaling:**

`request_logs` grows unboundedly. Partitioning by `created_at` and a TTL/archival
strategy are the correct long-term approaches. The dashboard aggregator's
pre-computed summaries mean the dashboard remains fast even as raw event data grows.

***

## Next

<CardGroup cols={2}>
  <Card title="Request Lifecycle" icon="arrows-spin" href="/concepts/request-lifecycle">
    Step-by-step walkthrough of what happens to each request.
  </Card>

  <Card title="Security & Trust" icon="shield" href="/security">
    What Lectr sees, stores, and never touches.
  </Card>
</CardGroup>
