Monitor the latency of individual calls in the Call History section of the dashboard.
Understanding Latency Metrics
End-to-end latency measures the total time from when the user stops speaking until the agent begins responding, including processing time, network delays, and model inference.

| Metric | Description |
|---|---|
| P90 | 90% of calls have latency below this value |
| Median (P50) | Half of all calls have latency below this value |
| Min | Fastest response time achieved |
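
To make the three metrics concrete, here is a minimal Python sketch that computes them from a list of per-call latencies. The sample values are made up, the millisecond unit is an assumption, and the nearest-rank percentile method is only one common convention; the dashboard's exact calculation is not stated here and may differ.

```python
import math
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so very small pct still returns a sample.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up end-to-end latencies, one per call (assumed to be milliseconds).
latencies_ms = [620, 700, 750, 810, 880, 900, 950, 1100, 1300, 2400]

summary = {
    "p90": percentile(latencies_ms, 90),  # 90% of calls are at or below this
    "p50": median(latencies_ms),          # half of the calls are below this
    "min": min(latencies_ms),             # fastest response observed
}
print(summary)
```

Note how a single slow call (2400) leaves the median almost untouched, which is why P90 is usually the more useful number for spotting tail latency.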
Retrieve Latency via API
Use the Get Call API to retrieve a detailed latency breakdown after a call ends.

Latency Breakdown Fields
| Field | Description |
|---|---|
| e2e | End-to-end: user stops talking → agent starts talking (excludes network trip to frontend) |
| asr | Transcription latency |
| llm | LLM latency: start of LLM call → first speakable chunk. Includes websocket roundtrip for custom LLMs. |
| llm_websocket_network_rtt | Websocket roundtrip between your server and UponAI. Custom LLM only. |
| tts | Text-to-speech: trigger → first audio byte |
| knowledge_base | Knowledge base retrieval latency. Only when agent uses knowledge base. |
| s2s | Speech-to-speech: request → first byte. Only for S2S models. |

Each field reports the following statistics: p50, p90, p95, p99, min, max, num, values.
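
The sketch below shows one way to read these fields from an already parsed Get Call response. It is a sketch under stated assumptions: the `latency` wrapper key, the `call_id`, the helper names `summarize_latency` and `slowest_component`, and every number in `sample_call` are hypothetical. Only the field names and statistic names come from the table above. Check the actual response shape before relying on it.

```python
# Hypothetical fragment of a Get Call response. The "latency" wrapper key,
# the call_id, and every number here are assumptions for illustration only.
sample_call = {
    "call_id": "example-call",
    "latency": {
        "e2e": {"p50": 900, "p90": 1440, "p95": 1545, "p99": 1629,
                "min": 700, "max": 1650, "num": 4,
                "values": [700, 850, 950, 1650]},
        "asr": {"p50": 150, "p90": 190, "min": 120, "max": 200, "num": 4},
        "llm": {"p50": 480, "p90": 900, "min": 350, "max": 1000, "num": 4},
        "tts": {"p50": 200, "p90": 260, "min": 180, "max": 280, "num": 4},
    },
}

# Pipeline stages from the table; e2e is the total, so it is not a stage.
COMPONENTS = ("asr", "llm", "tts", "knowledge_base", "s2s")


def summarize_latency(call, stat="p50"):
    """Return {field: value} for every latency field that reports `stat`.

    Fields that are absent (for example knowledge_base when no knowledge
    base is used) or that lack the statistic are skipped, not treated as
    errors.
    """
    latency = call.get("latency") or {}
    return {
        name: stats[stat]
        for name, stats in latency.items()
        if isinstance(stats, dict) and stat in stats
    }


def slowest_component(call, stat="p50"):
    """Name the pipeline stage with the largest `stat`, or None if no
    stage reports it."""
    summary = summarize_latency(call, stat)
    parts = {k: v for k, v in summary.items() if k in COMPONENTS}
    return max(parts, key=parts.get) if parts else None


print(summarize_latency(sample_call))
print(slowest_component(sample_call))
```

Comparing the per-stage values against `e2e` is a quick way to see which stage to tune first. Because several fields are optional, the helpers tolerate missing keys instead of raising.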