Overview

Every request to Lectr follows the same path. Understanding it helps you reason about latency, debug unexpected behaviour, and understand exactly what Lectr does and does not touch.
Client → Lectr → Provider → Lectr → Client
There are two distinct paths through Lectr — the hot path and the cold path.
  • Hot path — everything that happens while the request is in flight. Latency-critical. No database calls.
  • Cold path — everything that happens after the response is returned. Observability, anomaly detection, recommendations. Never affects the request.

Hot path — step by step

1. Request arrives

Your application sends a request to https://proxy.lectr.ai/v1/chat/completions. Lectr receives it over HTTPS and reads the request body and headers. Headers read at this stage:
  • X-Lectr-Key — org authentication
  • Authorization — provider API key
  • X-Lectr-Feature — feature tag (optional)
  • X-Lectr-Task — task type (optional)
  • X-Lectr-Provider — provider override (optional)
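Assembled as a header map, a request's Lectr-specific headers might look like the sketch below. The lectr_headers helper and the key values are illustrative, not part of any Lectr SDK:

```python
def lectr_headers(lectr_key, provider_key, feature=None, task=None, provider=None):
    """Build the header set Lectr reads on arrival; optional headers are
    omitted entirely when unset."""
    headers = {
        "X-Lectr-Key": lectr_key,                   # org authentication
        "Authorization": f"Bearer {provider_key}",  # provider API key
    }
    if feature:
        headers["X-Lectr-Feature"] = feature        # feature tag
    if task:
        headers["X-Lectr-Task"] = task              # task type
    if provider:
        headers["X-Lectr-Provider"] = provider      # provider override
    return headers
```

Pass the result as the header set on any HTTP client call to https://proxy.lectr.ai/v1/chat/completions.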

2. Org authentication

The X-Lectr-Key header is validated against the stored key hash. If missing or invalid, the request is rejected immediately with 401. The provider never sees it.
X-Lectr-Key present and valid → continue
X-Lectr-Key missing or invalid → 401, stop

3. Provider detection

Lectr reads the model field from the request body and maps it to a provider.
model: "gpt-4o"                     → openai
model: "claude-3-5-sonnet-20241022" → anthropic
model: "llama-3.1-70b-versatile"    → groq
model: "gemini-1.5-pro"             → gemini
If X-Lectr-Provider is present, it overrides detection entirely. If detection fails, Lectr defaults to openai and logs provider_unknown on the event.
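The detection step can be sketched as a prefix lookup over the model name. The PREFIXES map and detect_provider helper below are assumptions drawn from the examples above, not Lectr's actual implementation:

```python
# Hypothetical prefix map mirroring the examples above.
PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "llama-": "groq",
    "gemini-": "gemini",
}

def detect_provider(model, override=None):
    """Return (provider, unknown_flag). An explicit override wins outright;
    an unmatched model falls back to openai with provider_unknown set."""
    if override:  # X-Lectr-Provider header
        return override, False
    for prefix, provider in PREFIXES.items():
        if model.startswith(prefix):
            return provider, False
    return "openai", True  # default; event is logged with provider_unknown
```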

4. Rule evaluation

Lectr checks the in-memory rule cache for the org. Rules are evaluated in priority order — the first matching enabled rule wins.
Check rule cache (in-memory, no DB)

Rule matches → rewrite model + provider in request
No match     → continue with original model + provider
If a rule matches:
  • The model field in the request body is rewritten to the rule’s target model
  • The provider is switched to the rule’s target provider
  • The original model is recorded as model_requested for the event log
This entire step adds less than 1ms of overhead.
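The first-match semantics can be sketched as follows, assuming a hypothetical rule shape (id, priority, enabled, match_model, and target fields are illustrative names):

```python
def apply_rules(rules, model, provider):
    """First matching enabled rule wins; rules are scanned in priority order."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["enabled"] and rule["match_model"] == model:
            return {
                "model": rule["target_model"],        # rewritten in request body
                "provider": rule["target_provider"],  # switched target provider
                "model_requested": model,             # original, for the event log
                "rule_id": rule["id"],
            }
    # No match: continue with the original model and provider.
    return {"model": model, "provider": provider,
            "model_requested": model, "rule_id": None}
```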

5. Auth header normalisation

Different providers expect API keys in different header formats. Lectr normalises the outgoing auth header for the target provider so your client always sends the same format regardless of provider.
Provider    Outgoing header
OpenAI      Authorization: Bearer sk-...
Anthropic   x-api-key: sk-ant-...
Groq        Authorization: Bearer gsk_...
Gemini      Authorization: Bearer ...
Azure       api-key: ...
Your provider API key is never stored. It exists in memory from step 1 through this step and is discarded after forwarding.
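The normalisation table reduces to a small dispatch. normalise_auth is a hypothetical helper reproducing the table above, not Lectr's code:

```python
def normalise_auth(provider, api_key):
    """Return the (header, value) pair the target provider expects."""
    if provider == "anthropic":
        return "x-api-key", api_key
    if provider == "azure":
        return "api-key", api_key
    # openai, groq and gemini all take a standard bearer token
    return "Authorization", f"Bearer {api_key}"
```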

6. Upstream resolution

Lectr resolves the upstream URL for the target provider. For most providers this is a fixed URL from the provider registry. For Azure, it reads the per-org endpoint configuration from a fast in-memory cache.

7. Forward to provider

Lectr forwards the request to the provider upstream. For streaming requests, Lectr begins flushing response chunks to your client immediately as they arrive — no buffering. http.Flusher is called on every chunk to keep time to first byte (TTFB) low. For non-streaming requests, Lectr waits for the full response and forwards it whole.

8. Response forwarded to client

The response from your provider — headers, body, status code — is returned to your client unchanged. Lectr does not modify response content. For streaming responses, Lectr accumulates the chunks in a buffer for token counting (cold path) while simultaneously flushing them to your client. Your client sees the stream with no additional latency.
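The flush-while-buffering behaviour can be sketched as a tee over the chunk stream. relay_stream and flush are illustrative names, not Lectr internals:

```python
def relay_stream(upstream_chunks, flush):
    """Flush each chunk to the client the moment it arrives, while keeping
    a copy for cold-path token counting after the stream ends."""
    buffer = []
    for chunk in upstream_chunks:
        flush(chunk)          # client sees the chunk immediately
        buffer.append(chunk)  # retained copy adds no client-visible latency
    return b"".join(buffer)   # handed to the cold path once complete

sent = []
full = relay_stream([b"Hel", b"lo"], sent.append)
# sent == [b"Hel", b"lo"]; full == b"Hello"
```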

9. Hot path complete

The request lifecycle ends here from your application’s perspective. The response has been delivered. What happens next is the cold path — entirely invisible to your application.

Cold path — step by step

The cold path runs after the response is delivered. It has zero impact on request latency or reliability.

10. Event emission

After the response completes, Lectr emits an event to an in-memory buffered channel. The event contains:
  • Timestamp (captured at request completion, not DB write time)
  • Provider, model requested, model actual
  • Latency (total and TTFB)
  • Status code and error category
  • Streaming flag
  • Token counts and source
  • Cost estimate
  • Feature tag, task type
  • Rule ID (if a routing rule was applied)
  • Org ID
The channel is the firewall between the hot path and everything else. If the cold path is overloaded, events are dropped — the hot path continues unaffected. The dashboard shows a degradation banner when this happens.
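The drop-on-overload semantics map onto a bounded queue with a non-blocking put. This Python sketch stands in for the in-memory channel; the capacity and names are illustrative:

```python
import queue

# Stand-in for the in-memory buffered channel (capacity is illustrative).
events = queue.Queue(maxsize=2)

def emit(event):
    """Non-blocking emit: if the cold path is backed up, drop the event
    rather than stall the hot path."""
    try:
        events.put_nowait(event)
        return True
    except queue.Full:
        return False  # dropped; the dashboard surfaces a degradation banner
```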

11. Token counting (streaming only)

For streaming requests, the accumulated response buffer is tokenised to produce a token count. This runs after the stream completes, after the client has already received the full response. Token source labels:
  • provider — exact count from provider (non-streaming)
  • tokeniser — counted by Lectr’s tokeniser (streaming)
  • calibrated — tokeniser count adjusted by historical calibration

12. Event worker — batch DB write

A background worker reads from the event channel in batches and writes to request_logs in Postgres. Batch writes reduce DB pressure and keep write latency low. If the database is unavailable, events are dropped. The hot path is unaffected.
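A minimal sketch of the batch reader, assuming the channel is modelled as a queue.Queue; drain_batch and the batch size are illustrative:

```python
import queue

def drain_batch(events, max_batch=100):
    """Read up to max_batch pending events without blocking; one batch
    becomes one batched write to request_logs."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break
    return batch
```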

13. Downstream cold-path workers

Several background workers operate on the stored event data:
  • Dashboard aggregator — runs every 60 seconds, pre-computing summary stats per org per period into dashboard_summaries. The dashboard API reads from these summaries — it never aggregates raw request_logs on demand.
  • Anomaly detector — runs every 5 minutes, comparing current metrics against 7-day rolling baselines, recording anomalies, and sending email alerts.
  • Recommendation engine — runs daily, analysing signals per feature per model and generating model recommendations with confidence levels.
None of these workers touch the hot path; they operate entirely on stored data.

Latency impact

Lectr adds overhead in two places:
Step                  Overhead
Org authentication    < 1ms (in-memory hash check)
Provider detection    < 1ms (map lookup)
Rule evaluation       < 1ms (in-memory cache scan)
Auth normalisation    < 1ms (string operations)
Network (proxy hop)   Typically 1–5ms depending on geography
The dominant factor is the network hop — the additional round trip through Lectr’s infrastructure. For most use cases this is negligible compared to provider response times (typically 200ms–3s+).

Failure behaviour

Lectr is designed to degrade gracefully. If something goes wrong, the request passes through — it never fails because of Lectr.
Failure                     Behaviour
Rule cache miss             Passthrough unchanged
Rule evaluation error       Passthrough unchanged
Provider detection failure  Default to OpenAI
Event channel full          Event dropped, request unaffected
DB unavailable              Events dropped, proxy unaffected
Anomaly worker down         No alerts, proxy unaffected
Auth0 down                  Dashboard inaccessible, proxy unaffected
The proxy and the dashboard are independent systems. Dashboard failures never affect request routing or forwarding.

Next

Architecture

The system design — components, data flow, and infrastructure.

Security & Trust

Exactly what Lectr sees, stores, and never touches.