

Protocol

VoRS uses the Opus codec with 20 ms frames of 48 kHz, mono, 16-bit signed little-endian audio. It relies on libopus’s native Packet Loss Concealment (PLC) as long as the number of lost frames does not exceed 32. DTX (discontinuous transmission) is also enabled.
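The audio parameters above determine the size of one raw frame. A short sketch of that arithmetic follows; the constant names are illustrative and not taken from the VoRS source.

```python
# Frame-size arithmetic for 20 ms frames of 48 kHz mono 16-bit PCM.
# All names here are illustrative, not VoRS identifiers.
SAMPLE_RATE_HZ = 48_000
FRAME_MS = 20
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # 16-bit samples
MAX_PLC_FRAMES = 32   # PLC is used only up to this many lost frames

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
pcm_bytes_per_frame = samples_per_frame * CHANNELS * BYTES_PER_SAMPLE
frames_per_second = 1000 // FRAME_MS
max_plc_gap_ms = MAX_PLC_FRAMES * FRAME_MS

print(samples_per_frame)    # samples handed to the encoder per frame
print(pcm_bytes_per_frame)  # raw PCM bytes per frame
print(frames_per_second)    # frames produced per second
print(max_plc_gap_ms)       # longest gap, in ms, that PLC is asked to cover
```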

Each frame carries a single-byte stream identifier (the unique identifier of the participant), a 24-bit big-endian packet counter and a 24-bit big-endian audio frame counter. Reordered packets are dropped. A 24-bit counter is long enough for very long talk sessions: at one frame per 20 ms, the audio frame counter wraps only after about 93 hours. The audio frame counter is incremented every time 20 ms of data is read from the microphone. When a peer is muted, no packets are sent, but audio frames are still counted. That makes it possible to distinguish jitter and delays from the absence of audio transmission.
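The header layout can be sketched as follows. This is a minimal illustration that assumes the three fields appear on the wire in the order they are listed above; the function and class names are hypothetical, and the reorder filter ignores counter wraparound.

```python
HEADER_LEN = 7          # 1 byte SID + 3 bytes packet ctr + 3 bytes frame ctr
MAX_24BIT = (1 << 24) - 1


def pack_header(sid: int, pkt_ctr: int, frame_ctr: int) -> bytes:
    """Serialize the header; field order is an assumption of this sketch."""
    if not 0 <= sid <= 0xFF:
        raise ValueError("stream identifier must fit in one byte")
    if not (0 <= pkt_ctr <= MAX_24BIT and 0 <= frame_ctr <= MAX_24BIT):
        raise ValueError("counters must fit in 24 bits")
    return bytes([sid]) + pkt_ctr.to_bytes(3, "big") + frame_ctr.to_bytes(3, "big")


def unpack_header(packet: bytes) -> tuple[int, int, int]:
    """Return (sid, packet counter, audio frame counter)."""
    if len(packet) < HEADER_LEN:
        raise ValueError("packet shorter than header")
    sid = packet[0]
    pkt_ctr = int.from_bytes(packet[1:4], "big")
    frame_ctr = int.from_bytes(packet[4:7], "big")
    return sid, pkt_ctr, frame_ctr


class ReorderFilter:
    """Drop any packet whose counter is not newer than the last accepted one."""

    def __init__(self) -> None:
        self.last: dict[int, int] = {}  # sid -> last accepted packet counter

    def accept(self, sid: int, pkt_ctr: int) -> bool:
        last = self.last.get(sid)
        if last is not None and pkt_ctr <= last:
            return False  # duplicate or reordered packet
        self.last[sid] = pkt_ctr
        return True
```

A receiver would call `unpack_header` on each datagram and pass the result through `ReorderFilter.accept` before decoding the audio.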

Each packet is encrypted with ChaCha20 and authenticated with SipHash-2-4. The keys for both are derived with BLAKE2s-XOF, which is fed with the binding value of the completed handshake, and are then shared with the other participants. The stream identifier together with the packet counter serves as the nonce.
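The nonce construction can be illustrated as below. The text only states that the stream identifier and packet counter form the nonce; how those four bytes are placed inside the cipher's nonce field is not specified here, so the left zero-padding and the default 12-byte length in this sketch are assumptions, and `make_nonce` is a hypothetical name.

```python
def make_nonce(sid: int, pkt_ctr: int, nonce_len: int = 12) -> bytes:
    """Build a per-packet nonce from the stream identifier and packet counter.

    Assumption: the 4 significant bytes are right-aligned and the rest of
    the nonce is zero. The real layout may differ.
    """
    if not 0 <= sid <= 0xFF:
        raise ValueError("stream identifier must fit in one byte")
    core = bytes([sid]) + pkt_ctr.to_bytes(3, "big")
    return core.rjust(nonce_len, b"\x00")
```

Because the stream identifier is part of the nonce, two participants sending packets with the same counter value still produce distinct nonces.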

VoRS is tuned for 24 kbit/s of bandwidth. Keep in mind that every packet additionally carries an 8-byte MAC tag and 7 bytes of VoRS, 8 bytes of UDP and 40 bytes of IPv6 headers.
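To get a feel for what the overhead means, here is a rough estimate. It assumes that the 24 kbit/s figure is the constant codec bitrate, that each packet carries exactly one 20 ms frame, and that DTX is not suppressing any packets; link-layer overhead is ignored.

```python
# Rough on-wire bitrate estimate under the assumptions stated above.
CODEC_BITRATE_BPS = 24_000
FRAME_MS = 20

MAC_TAG = 8       # bytes
VORS_HEADER = 7   # bytes
UDP_HEADER = 8    # bytes
IPV6_HEADER = 40  # bytes

payload_bytes = CODEC_BITRATE_BPS * FRAME_MS // 1000 // 8
overhead_bytes = MAC_TAG + VORS_HEADER + UDP_HEADER + IPV6_HEADER
packet_bytes = payload_bytes + overhead_bytes
packets_per_second = 1000 // FRAME_MS
on_wire_bps = packet_bytes * 8 * packets_per_second

print(payload_bytes, overhead_bytes, packet_bytes, on_wire_bps)
```

Under these assumptions the per-packet overhead is slightly larger than the audio payload itself, so the IP-level bitrate is roughly twice the codec bitrate.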

Each client performs a handshake with the server over a TCP connection using the Noise-NK protocol pattern with the Curve25519, ChaCha20-Poly1305 and BLAKE2s algorithms.

S <- C : e, es, NS(NS("USERNAME") || NS("ROOM") || NS("PASSWORD"))
S -> C : e, ee, NS(NS("COOKIE") || NS(COOKIE))
S <- C : UDP(COOKIE)
S -> C : NS(NS("SID") || NS(X))

S <- C : NS(NS("PING"))
S -> C : NS(NS("PONG"))
S <> C : ...

S -> C : NS(NS("ADD") || NS(SID) || NS(USERNAME) || NS(KEY))
S -> C : ...

S -> C : NS(NS("DEL") || NS(SID))
S -> C : ...
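The nesting in the messages above can be sketched in code. This example assumes that NS(...) denotes a netstring (the `<length>:<data>,` encoding), which the text does not state explicitly. It shows only the plaintext framing; Noise encryption is left out, and the field values used are hypothetical.

```python
def ns(data: bytes) -> bytes:
    """Encode one netstring: decimal length, colon, data, comma."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def message(*fields: bytes) -> bytes:
    """Encode NS(NS(f1) || NS(f2) || ...)."""
    return ns(b"".join(ns(f) for f in fields))


def parse_ns(buf: bytes) -> tuple[bytes, bytes]:
    """Decode one netstring from the front of buf; return (data, remainder)."""
    length_str, sep, rest = buf.partition(b":")
    if not sep or not length_str.isdigit():
        raise ValueError("malformed netstring length")
    n = int(length_str)
    if len(rest) < n + 1 or rest[n:n + 1] != b",":
        raise ValueError("truncated netstring or missing trailing comma")
    return rest[:n], rest[n + 1:]


def parse_message(buf: bytes) -> list[bytes]:
    """Decode an outer netstring and split its body into inner fields."""
    body, _ = parse_ns(buf)
    fields = []
    while body:
        field, body = parse_ns(body)
        fields.append(field)
    return fields


print(message(b"PING"))  # b'7:4:PING,,'
```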

Every second the client sends a UDP packet containing its single-byte stream identifier, even when it is muted. That may help to punch holes in stateful firewalls.
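A minimal sketch of such a keepalive loop follows. The `send` callable is injected so the loop stays independent of the socket setup; `keepalive_loop` and its parameters are hypothetical names, not part of VoRS.

```python
import time
from typing import Callable


def keepalive_loop(
    send: Callable[[bytes], object],
    sid: int,
    count: int,
    interval: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Send the one-byte stream identifier `count` times, `interval` s apart."""
    payload = bytes([sid])
    for _ in range(count):
        send(payload)
        sleep(interval)


# Possible use with a real UDP socket (the address is a placeholder):
#   import socket
#   sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
#   keepalive_loop(lambda p: sock.sendto(p, ("::1", 12345)), sid=7, count=10)
```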

Clients are notified about newly appearing peers with ADD commands, which carry their SIDs, usernames and keys. DEL notifies about peers that have left. MUTED and UNMUTED notify about a peer toggling its mute state. CHAT broadcasts a message to the room.
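On the client side, these notifications amount to maintaining a table of peers. The sketch below assumes the commands arrive already decoded. The ADD and DEL arguments follow the message layouts shown earlier, while treating the SID as the only argument of MUTED and UNMUTED is an assumption of this sketch. CHAT is omitted because its fields are not described here. `Peer` and `Room` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    sid: int
    username: str
    key: bytes
    muted: bool = False


class Room:
    """Track peers announced by the server."""

    def __init__(self) -> None:
        self.peers: dict[int, Peer] = {}

    def handle(self, cmd: str, *args) -> None:
        if cmd == "ADD":
            sid, username, key = args
            self.peers[sid] = Peer(sid, username, key)
        elif cmd == "DEL":
            (sid,) = args
            self.peers.pop(sid, None)
        elif cmd in ("MUTED", "UNMUTED"):
            (sid,) = args  # assumption: the SID is the only argument
            if sid in self.peers:
                self.peers[sid].muted = cmd == "MUTED"
        else:
            raise ValueError(f"unhandled command: {cmd}")
```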

