How Audio Is Represented Digitally
Sound waves are captured by a microphone, which converts acoustic energy into electrical analog signals. These are fed into an ADC (Analog-to-Digital Converter) where two critical processes occur: sampling and quantization.
Sampling
Definition
Sampling measures the amplitude of an analog signal at regular intervals. The number of measurements per second is the sample rate, expressed in hertz (Hz). For example, a sample rate of 44.1 kHz means the signal is sampled 44,100 times per second.
Purpose
Sampling creates a series of discrete data points that approximate the continuous analog waveform.
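As a rough illustration of sampling, the sketch below evaluates a sine wave at regular intervals and stores the results as discrete data points. The function name `sampleSine` and its parameters are hypothetical, chosen only for this example; a real ADC measures a physical signal rather than a formula.

```typescript
// Hypothetical helper: "samples" an ideal sine wave at a fixed sample rate.
function sampleSine(
  frequencyHz: number,
  sampleRateHz: number,
  durationSec: number,
): Float32Array {
  const sampleCount = Math.round(sampleRateHz * durationSec);
  const samples = new Float32Array(sampleCount);
  for (let n = 0; n < sampleCount; n++) {
    const t = n / sampleRateHz; // time of the n-th sample, in seconds
    samples[n] = Math.sin(2 * Math.PI * frequencyHz * t);
  }
  return samples;
}

// 10 ms of a 440 Hz tone sampled at 44.1 kHz gives 441 data points.
const tone = sampleSine(440, 44100, 0.01);
console.log(tone.length); // 441
```

A higher sample rate places the data points closer together in time, so the series follows the original waveform more closely.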
Quantization
Definition
Quantization maps each sampled amplitude to the nearest of a finite set of numeric levels, so that every sample can be stored as a digital value.
Purpose
The range of amplitude values is divided into discrete steps, each assigned a digital value. Bit depth determines the number of possible levels — a 16-bit system can represent 65,536 (2^16) different levels.
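The relationship between bit depth and the number of levels can be sketched as follows. This is a minimal example, assuming samples are normalized to the range -1.0 to 1.0 and mapped to signed integers with symmetric scaling; `quantizationLevels` and `quantize` are hypothetical names, and real converters may round or scale differently.

```typescript
// Number of distinct levels available at a given bit depth.
function quantizationLevels(bitDepth: number): number {
  return 2 ** bitDepth;
}

// Hypothetical quantizer: maps a sample in [-1, 1] to a signed integer level.
function quantize(sample: number, bitDepth: number): number {
  const maxLevel = 2 ** (bitDepth - 1) - 1; // 32767 for 16-bit audio
  const clamped = Math.max(-1, Math.min(1, sample)); // guard against overshoot
  return Math.round(clamped * maxLevel);
}

console.log(quantizationLevels(16)); // 65536
console.log(quantize(1, 16)); // 32767
console.log(quantize(-1, 16)); // -32767
```

Each additional bit doubles the number of levels, which halves the size of each step.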
Terminology
| Term | Definition |
|---|---|
| Sample Rate | Number of audio samples per second, measured in Hz |
| Channel Count | Number of separate audio channels (mono = 1, stereo = 2) |
| Bit Depth | Number of bits used to represent each audio sample |
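Taken together, the three terms above determine how much data uncompressed PCM audio produces. A small sketch of the arithmetic, using a hypothetical `PcmFormat` type and `pcmBytesPerSecond` helper:

```typescript
// Hypothetical description of an uncompressed PCM stream.
interface PcmFormat {
  sampleRateHz: number;
  channelCount: number;
  bitDepth: number;
}

// Bytes per second = samples per second x channels x bytes per sample.
function pcmBytesPerSecond(format: PcmFormat): number {
  return format.sampleRateHz * format.channelCount * (format.bitDepth / 8);
}

const cdQuality: PcmFormat = { sampleRateHz: 44100, channelCount: 2, bitDepth: 16 };
const narrowbandMono: PcmFormat = { sampleRateHz: 8000, channelCount: 1, bitDepth: 16 };

console.log(pcmBytesPerSecond(cdQuality)); // 176400
console.log(pcmBytesPerSecond(narrowbandMono)); // 16000
```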
Audio Encoding
Audio encoding converts audio data into a format suitable for storage, transmission, and playback, often with compression.

| Format | Description |
|---|---|
| PCM | Pulse Code Modulation: the most straightforward digital audio encoding. Standard for computers, CDs, and digital telephony. |
| MP3 | Lossy compressed format based on perceptual audio coding |
| AAC | Advanced Audio Coding — higher quality than MP3 at similar bitrates |
| Opus | Modern codec optimized for low-latency voice |
| μ-law | Companded PCM used in telephony (G.711) |
Audio encoding is not the same as audio format. An audio format (e.g., WAV) includes the encoding plus metadata, file headers, and container structure.
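To make the distinction concrete, the sketch below wraps raw PCM bytes (the encoding) in a canonical 44-byte WAV header (the container). It assumes integer PCM with no extra chunks; `wrapPcmInWav` is a hypothetical helper and is not part of any UponAI SDK.

```typescript
// Hypothetical helper: prepends a minimal RIFF/WAVE header to raw integer PCM.
function wrapPcmInWav(
  pcm: Uint8Array,
  sampleRateHz: number,
  channelCount: number,
  bitDepth: number,
): Uint8Array {
  const headerSize = 44;
  const blockAlign = channelCount * (bitDepth / 8); // bytes per sample frame
  const byteRate = sampleRateHz * blockAlign; // bytes per second
  const buffer = new ArrayBuffer(headerSize + pcm.length);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // format tag 1 = integer PCM
  view.setUint16(22, channelCount, true);
  view.setUint32(24, sampleRateHz, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitDepth, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the audio payload

  const wav = new Uint8Array(buffer);
  wav.set(pcm, headerSize); // the PCM bytes are copied in unchanged
  return wav;
}
```

The audio samples are identical before and after wrapping. Only the header, which tells a player how to interpret the bytes, is new.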
PCM Audio Representation
When audio is played, it’s typically decoded into PCM. There are two common representations:

| Type | Description |
|---|---|
| Float32Array | 32-bit floating-point format. Used when capturing mic streams and setting up playback in web environments. |
| Uint8Array | Array of 8-bit unsigned integers. Lower-level representation that exposes the raw bytes of the audio data. For 16-bit mono PCM, each sample occupies 2 consecutive bytes. |
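Converting between the two representations might look like the sketch below. It assumes float samples normalized to the range -1.0 to 1.0 and little-endian 16-bit output, which is a common convention but not one confirmed by this page. `floatTo16BitPcm` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: converts normalized float samples to 16-bit PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2); // 2 bytes per 16-bit sample
  const view = new DataView(bytes.buffer);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768 and positive values to 32767,
    // which together cover the full signed 16-bit range.
    const value = Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian byte order
  });
  return bytes;
}

const pcmBytes = floatTo16BitPcm(new Float32Array([0, 1, -1]));
console.log(pcmBytes.length); // 6
console.log(Array.from(pcmBytes)); // [0, 0, 255, 127, 0, 128]
```

Three float samples become six bytes, matching the 2 bytes per sample noted in the table.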
Audio in UponAI
| Call Type | Audio Handling |
|---|---|
| Phone Calls | Different telephony providers use different audio codecs. UponAI’s telephony integrations handle encoding and decoding internally — no action needed. |
| Web Calls | The frontend web JS SDK abstracts all audio complexity. User audio is captured in PCM format and sent to the backend automatically. |