
This page explains every metric and term in AI QA so you can interpret your call analysis results. When a metric fails, see Address Metric Issues for step-by-step guidance on how to fix it.

Performance Metrics

Latency

| Metric | Description |
|---|---|
| Average Latency | End-to-end delay between the end of the user's speech and the start of the agent's spoken response, measured in seconds. Lower is better. |
| Latency P50 | 50th percentile (median) latency: half of all responses are faster and half are slower. |
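To illustrate how the two latency figures relate, the sketch below computes both from a list of per-response latencies using Python's `statistics` module. It assumes you already have one latency value in seconds per agent response. How the platform collects those samples is not described here, and `latency_summary` is a hypothetical helper.

```python
from statistics import mean, median


def latency_summary(latencies_s: list[float]) -> dict[str, float]:
    """Return Average Latency and Latency P50 (median), both in seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    return {
        "average_latency_s": mean(latencies_s),
        "latency_p50_s": median(latencies_s),
    }


# Four quick responses and one slow one (values in seconds).
summary = latency_summary([0.8, 0.9, 1.0, 1.1, 4.2])
print(summary)
```

Here the average is about 1.6 s, but the median stays at 1.0 s. A single slow response moves the average much more than the median, so reading the two together helps separate occasional spikes from consistently slow responses.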

Sentiment Analysis

| Metric | Description |
|---|---|
| User Sentiment | Emotional state of the caller, inferred from speech content, tone, and pitch: positive, negative, or neutral. |
| User Positive Sentiment Rate | Percentage of user interactions with positive sentiment. |
| User Negative Sentiment Rate | Percentage of user interactions with negative sentiment. |
| Negative Sentiment Rate | Overall rate of negative sentiment in the conversation. |
| Agent Sentiment | Emotional tone expressed by the AI agent during speech. |
| Agent Positive Sentiment Rate | Percentage of agent responses with positive sentiment. |
| Agent Natural Tonality Rate | How natural and human-like the agent's tone sounds. |
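The sentiment rates are percentages over classified utterances. The following is a minimal sketch that assumes each user or agent utterance has already been labeled `positive`, `negative`, or `neutral`. The classification itself, based on content, tone, and pitch, is out of scope here, and `sentiment_rates` is a hypothetical helper.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_rates(labels: list[str]) -> dict[str, float]:
    """Percentage of utterances carrying each sentiment label."""
    if not labels:
        return {}  # no data to score, comparable to an N/A result
    counts = Counter(labels)
    return {label: 100.0 * counts[label] / len(labels) for label in LABELS}


user_turns = ["neutral", "positive", "negative", "positive"]
rates = sentiment_rates(user_turns)
print(rates["positive"], rates["negative"])  # 50.0 25.0
```

Running the same function over agent utterances would give the agent-side rates.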

Transcription Metrics

| Metric | Description |
|---|---|
| WER (Word Error Rate) | Percentage of words transcribed incorrectly, calculated as (Substitutions + Insertions + Deletions) / Total Words × 100%, where Total Words is the number of words in the reference (correct) transcript. Lower is better. Because insertions are counted, WER can exceed 100%. |
| Mistranscribed Entities | Count of specific entities (names, dates, numbers) transcribed incorrectly. Only critical factual errors that change the meaning are counted. |
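The WER formula counts the minimum number of word substitutions, insertions, and deletions needed to turn the reference transcript into the system's transcript. That minimum is a word-level Levenshtein (edit) distance, and the sketch below computes it with dynamic programming. It is an illustration, not the platform's scorer. It splits on whitespace and does not normalize case or punctuation, which real WER tools usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (S + I + D) / N * 100, where N = words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # substitution (or match)
            )
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("a" -> "the") and one insertion ("please") over 5 reference words.
print(word_error_rate("book a table for two", "book the table for two please"))  # 40.0
```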

Call Quality Metrics

| Metric | Description |
|---|---|
| Interruptions | Number of times the user interrupted the agent. A high count may indicate that the agent speaks for too long or doesn't respond appropriately. |
| Avg. Interruptions | Average number of interruptions per call across the cohort. |
| Agent Naturalness | How human-like the agent sounded, based on pronunciation, intonation, pacing, turn-taking, and the absence of robotic patterns. Higher is more natural. |
| Natural Tonality Rate | Percentage of agent speech that sounds natural in tone and delivery. |
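This page doesn't say exactly how an interruption is detected. One plausible rule, assumed here for illustration only, counts a user turn that starts while an agent turn is still playing. The `Segment` record and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "agent" or "user"
    start_s: float
    end_s: float


def count_interruptions(segments: list[Segment]) -> int:
    """Count user turns that begin while an agent turn is still in progress."""
    agent_turns = [s for s in segments if s.speaker == "agent"]
    return sum(
        1
        for s in segments
        if s.speaker == "user"
        and any(a.start_s < s.start_s < a.end_s for a in agent_turns)
    )


def avg_interruptions(calls: list[list[Segment]]) -> float:
    """Average interruptions per call across a cohort."""
    if not calls:
        raise ValueError("cohort is empty")
    return sum(count_interruptions(c) for c in calls) / len(calls)


call_a = [Segment("agent", 0.0, 6.0), Segment("user", 4.5, 5.5),
          Segment("agent", 6.2, 8.0), Segment("user", 8.5, 9.0)]
call_b = [Segment("agent", 0.0, 3.0), Segment("user", 3.4, 4.0)]
print(avg_interruptions([call_a, call_b]))  # 0.5
```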

AI Accuracy Metrics

| Metric | Description |
|---|---|
| LLM Hallucination Rate | Rate at which the LLM generated incorrect or fabricated information not supported by the context or knowledge base. |
| Agent Hallucination | How often the agent hallucinated. High values mean the agent may be giving users incorrect information. |
| KB Recall | How effectively the agent retrieved and used relevant knowledge base information. Higher is better. |

Tool and Function Metrics

| Metric | Description |
|---|---|
| Tool Call Accuracy | Rate at which the agent invoked tools correctly. Higher means the agent uses the right tools at the right times. |
| Tool Call Inaccuracy | Rate at which the agent invoked incorrect tools (the complement of Tool Call Accuracy, that is, 100% minus accuracy). |
| Custom Tool Success Rate | Percentage of custom tool calls that completed successfully. |
| Avg Custom Tool Latency | Average time for custom tools to execute and return results. |
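The following sketch shows how the tool metrics could be derived from per-call records. The `ToolCall` fields, including `expected_name` (the tool an evaluator judged correct), are hypothetical. This version only scores tool calls the agent actually made. It does not count cases where the agent should have called a tool and didn't.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str           # tool the agent invoked
    expected_name: str  # tool an evaluator judged correct (hypothetical field)
    is_custom: bool
    succeeded: bool
    latency_s: float


def tool_metrics(calls: list[ToolCall]) -> dict[str, float]:
    if not calls:
        return {}
    correct = sum(c.name == c.expected_name for c in calls)
    accuracy = 100.0 * correct / len(calls)
    metrics = {
        "tool_call_accuracy": accuracy,
        "tool_call_inaccuracy": 100.0 - accuracy,  # complement of accuracy
    }
    custom = [c for c in calls if c.is_custom]
    if custom:
        metrics["custom_tool_success_rate"] = (
            100.0 * sum(c.succeeded for c in custom) / len(custom)
        )
        metrics["avg_custom_tool_latency_s"] = (
            sum(c.latency_s for c in custom) / len(custom)
        )
    return metrics


calls = [
    ToolCall("check_availability", "check_availability", True, True, 0.4),
    ToolCall("book_slot", "book_slot", True, False, 1.2),
    ToolCall("end_call", "transfer_call", False, True, 0.1),
    ToolCall("book_slot", "book_slot", True, True, 0.8),
]
m = tool_metrics(calls)
print(m["tool_call_accuracy"], m["tool_call_inaccuracy"])  # 75.0 25.0
```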

Conversation Flow Metrics

| Metric | Description |
|---|---|
| Transition Accuracy | Accuracy of transitions between conversation nodes. Higher means the agent follows the intended flow. |
| Node Transition Inaccuracy | Rate of incorrect node transitions in conversation flows. |
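You can read Transition Accuracy and Node Transition Inaccuracy by comparing, at each step, the node the agent moved to with the node the flow intended. The sketch assumes the observed and expected paths are already aligned step by step. The node names and `transition_rates` are made up for illustration.

```python
def transition_rates(observed: list[str], expected: list[str]) -> tuple[float, float]:
    """Return (Transition Accuracy %, Node Transition Inaccuracy %)."""
    if len(observed) != len(expected):
        raise ValueError("paths must be aligned step by step")
    if not observed:
        raise ValueError("no transitions to score")
    correct = sum(o == e for o, e in zip(observed, expected))
    accuracy = 100.0 * correct / len(observed)
    return accuracy, 100.0 - accuracy


# Node the agent moved to at each step vs. the node the flow intended.
observed = ["collect_date", "confirm_booking", "collect_name", "goodbye"]
expected = ["collect_date", "collect_party_size", "collect_name", "goodbye"]
print(transition_rates(observed, expected))  # (75.0, 25.0)
```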

Call Resolution Metrics

| Metric | Description |
|---|---|
| Call Resolution Rate | Percentage of calls successfully resolved according to your defined criteria. |
| Average Score | Overall quality score calculated from your resolution criteria using weighted scoring. |
| Calls Analyzed | Total number of calls analyzed in the cohort. |

Transfer Metrics

| Metric | Description |
|---|---|
| Transfer Success Rate | Percentage of calls successfully transferred. |
| Transfer Wait Time | Average time users wait before a transfer completes. |
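This page does not specify whether Transfer Success Rate is computed over all calls or only over calls that attempted a transfer. The sketch below assumes the latter, which is consistent with transfer metrics showing N/A on non-transfer calls. `CallRecord` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    transfer_attempted: bool
    transfer_succeeded: bool = False
    transfer_wait_s: Optional[float] = None  # seconds until the transfer completed


def transfer_metrics(calls: list[CallRecord]) -> Optional[dict]:
    attempts = [c for c in calls if c.transfer_attempted]
    if not attempts:
        return None  # no transfer calls: the metric doesn't apply (N/A)
    successes = [c for c in attempts if c.transfer_succeeded]
    waits = [c.transfer_wait_s for c in successes if c.transfer_wait_s is not None]
    return {
        "transfer_success_rate": 100.0 * len(successes) / len(attempts),
        "transfer_wait_time_s": sum(waits) / len(waits) if waits else None,
    }


calls = [
    CallRecord(transfer_attempted=False),
    CallRecord(True, True, 12.0),
    CallRecord(True, False),
    CallRecord(True, True, 18.0),
]
print(transfer_metrics(calls))
print(transfer_metrics([CallRecord(transfer_attempted=False)]))  # None
```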

Statistical Terms

| Term | Description |
|---|---|
| P50 | 50th percentile (median): half of the values are above it and half are below. |
| Cohort | A filtered set of calls that share common characteristics and are analyzed with the same resolution criteria. |
| Sampling Percentage | Percentage of matching calls included in the analysis. |
| Weekly Max | Maximum number of calls analyzed per week, regardless of the sampling percentage. |
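Sampling Percentage and Weekly Max work together: first a filter selects calls, then a cap limits the total. The sketch below shows one way to implement that using hash-based sampling, a common technique for stable, repeatable selection. The platform's actual selection method isn't documented here, and `select_for_analysis` is hypothetical.

```python
import hashlib


def select_for_analysis(call_ids: list[str], sampling_pct: int, weekly_max: int) -> list[str]:
    """Select one week's calls: keep about sampling_pct% of them, capped at weekly_max."""
    selected = []
    for call_id in call_ids:
        if len(selected) >= weekly_max:
            break  # the weekly cap wins over the sampling percentage
        # Map each ID to a stable bucket 0-99 so re-running picks the same calls.
        digest = hashlib.sha256(call_id.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        if bucket < sampling_pct:
            selected.append(call_id)
    return selected


week = [f"call-{i}" for i in range(1000)]
print(len(select_for_analysis(week, sampling_pct=10, weekly_max=50)))
```

About 100 of the 1,000 calls fall into the 10% sample, so the 50-call weekly max is what limits the result in this example.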

Resolution Criteria Terms

| Term | Description |
|---|---|
| AI Evaluated Condition | Qualitative criterion evaluated by AI from the transcript (e.g., “Call resolved”). |
| Performance Metric | Quantitative threshold a call must meet (e.g., latency < 2s). |
| Weighted Scoring | Assigns different weights to criteria so that some conditions count more than others. |
| Calibration | Manual override of the automatic metric evaluations for a specific call. |

Some metrics show “N/A” when there isn’t sufficient data or when the metric doesn’t apply to a call type (e.g., transfer metrics on non-transfer calls).
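The resolution criteria terms fit together roughly as follows. Each criterion is either an AI-evaluated condition or a performance-metric threshold, and each carries a weight. Calibration overrides a criterion's automatic result for one call. The sketch normalizes the weighted result to a score from 0 to 100. That scale, the `Criterion` type, the stored AI verdict, and `weighted_score` are assumptions for illustration, not the platform's documented formula.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Criterion:
    name: str
    weight: float
    check: Callable[[dict], bool]  # automatic evaluation for one call


def weighted_score(call: dict, criteria: list[Criterion],
                   overrides: Optional[dict[str, bool]] = None) -> float:
    """Weighted share of passed criteria, scaled to 0-100.

    `overrides` stands in for calibration: a manual pass/fail that
    replaces the automatic result for the named criterion on this call.
    """
    overrides = overrides or {}
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = sum(
        c.weight for c in criteria
        if (overrides[c.name] if c.name in overrides else c.check(call))
    )
    return 100.0 * earned / total


criteria = [
    # AI Evaluated Condition: the verdict is assumed to be stored on the call.
    Criterion("call_resolved", 3.0, lambda call: call["ai_resolved"]),
    # Performance Metric: a quantitative threshold.
    Criterion("latency_under_2s", 1.0, lambda call: call["avg_latency_s"] < 2.0),
]
call = {"ai_resolved": True, "avg_latency_s": 2.4}
print(weighted_score(call, criteria))  # 75.0
print(weighted_score(call, criteria, overrides={"latency_under_2s": True}))  # 100.0
```

Averaging such per-call scores across a cohort would give one plausible reading of Average Score.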