VoRS uses the Opus codec with 20 ms frames of 48 kHz, mono, 16-bit
signed little-endian sound. It uses libopus's native Packet Loss
Concealment (PLC) feature when the number of lost frames does not exceed
32. DTX (discontinuous transmission) is also enabled.
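The frame size follows from these parameters: 48000 × 0.02 = 960 samples, or 1920 bytes of mono 16-bit PCM per frame. The Go sketch below spells that out together with the 32-frame PLC limit. The constant names and the `concealmentFrames` helper are hypothetical, and what VoRS does with longer gaps is not described here, so the sketch simply conceals nothing for them.

```go
package main

import "fmt"

// Parameters stated in the text; the names are hypothetical.
const (
	sampleRate     = 48000 // Hz
	channels       = 1     // mono
	bytesPerSample = 2     // 16-bit signed little-endian
	frameMillis    = 20    // Opus frame duration
	maxPLCFrames   = 32    // PLC is used up to this many lost frames
)

// frameSamples is the number of samples in one 20 ms frame: 960.
const frameSamples = sampleRate * frameMillis / 1000

// frameBytes is the size of one raw PCM frame read from the microphone: 1920.
const frameBytes = frameSamples * channels * bytesPerSample

// concealmentFrames returns how many frames to ask libopus to conceal for a
// gap of `lost` frames. Longer gaps are not concealed in this sketch; the
// real behaviour for them is not described in the text.
func concealmentFrames(lost int) int {
	if lost <= 0 || lost > maxPLCFrames {
		return 0
	}
	return lost
}

func main() {
	fmt.Println(frameSamples, frameBytes)
	fmt.Println(concealmentFrames(5), concealmentFrames(40))
}
```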

Each packet carries a single-byte stream identifier (a unique identifier
of the participant), a 24-bit big-endian packet counter and a 24-bit
big-endian audio frame counter. Reordered packets are dropped. A 24-bit
counter is long enough for very long talk sessions: at 50 frames per
second it wraps only after about 93 hours. The audio frame counter is
incremented every time 20 ms of data is read from the microphone. When a
peer is muted, no packets are sent, but audio frames are still counted.
That makes it possible to distinguish jitter and delays from a lack of
audio transmission.
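A minimal Go sketch of the 7-byte header and of how the two counters can be told apart. It assumes the fields are laid out in the order listed (SID, packet counter, frame counter) and that each packet carries exactly one frame; the `Header` and `Stream` types are hypothetical, and counter wraparound is ignored.

```go
package main

import (
	"errors"
	"fmt"
)

// headerLen: 1-byte SID + 24-bit packet counter + 24-bit frame counter.
const headerLen = 1 + 3 + 3

// Header holds the per-packet fields; the wire order is an assumption.
type Header struct {
	SID    byte
	PktCtr uint32 // only the low 24 bits are transmitted
	FrmCtr uint32 // only the low 24 bits are transmitted
}

// put24 writes the low 24 bits of v in big-endian order.
func put24(b []byte, v uint32) {
	b[0], b[1], b[2] = byte(v>>16), byte(v>>8), byte(v)
}

// get24 reads a 24-bit big-endian value.
func get24(b []byte) uint32 {
	return uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2])
}

func (h Header) Marshal() []byte {
	b := make([]byte, headerLen)
	b[0] = h.SID
	put24(b[1:4], h.PktCtr)
	put24(b[4:7], h.FrmCtr)
	return b
}

func ParseHeader(b []byte) (Header, error) {
	if len(b) < headerLen {
		return Header{}, errors.New("packet too short")
	}
	return Header{SID: b[0], PktCtr: get24(b[1:4]), FrmCtr: get24(b[4:7])}, nil
}

// Stream remembers the last accepted header of one peer.
type Stream struct {
	seen bool
	last Header
}

// Accept drops reordered or duplicated packets and reports how many packets
// were lost and how many frames were not sent (peer muted) since the
// previously accepted packet.
func (s *Stream) Accept(h Header) (ok bool, lostPkts, silentFrames uint32) {
	if !s.seen {
		s.seen, s.last = true, h
		return true, 0, 0
	}
	if h.PktCtr <= s.last.PktCtr {
		return false, 0, 0 // reordered: drop
	}
	pktGap := h.PktCtr - s.last.PktCtr
	frmGap := h.FrmCtr - s.last.FrmCtr
	lostPkts = pktGap - 1 // lost packets still carried frames
	if frmGap > pktGap {
		silentFrames = frmGap - pktGap // frames counted but never sent
	}
	s.last = h
	return true, lostPkts, silentFrames
}

func main() {
	var s Stream
	for _, h := range []Header{{7, 1, 1}, {7, 2, 2}, {7, 4, 9}, {7, 3, 3}} {
		fmt.Println(s.Accept(h))
	}
}
```

Here packet 4 reveals one lost packet and five frames during which the peer was muted, and the late packet 3 is then dropped.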

Each packet is encrypted with ChaCha20 and authenticated with SipHash-2-4.
Both keys are derived with BLAKE2b-XOF, which is fed with the completed
handshake's binding value, and are then shared with the other
participants. The stream identifier together with the packet counter is
used as the nonce.

VoRS is tuned for 24 kbps bandwidth. But remember that each packet also
carries an 8 B MAC tag, a 7 B VoRS header, an 8 B UDP header and a 40 B
IPv6 header: 63 B of overhead in total.
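At 50 packets per second that overhead adds up. Assuming a constant 24 kbps Opus bitrate (with DTX and variable bitrate the real payload varies), each packet carries about 60 B of audio plus 63 B of headers and tag. The hypothetical `perPacket` helper below does the arithmetic:

```go
package main

import "fmt"

// Sizes from the text, in bytes.
const (
	frameMillis = 20
	macTag      = 8  // SipHash-2-4 tag
	vorsHeader  = 7  // SID + 24-bit packet counter + 24-bit frame counter
	udpHeader   = 8
	ipv6Header  = 40
)

// perPacket returns the Opus payload per packet, the fixed per-packet
// overhead, and the resulting on-the-wire bit rate for a constant codec
// bitrate (an assumption: with DTX and VBR the real payload varies).
func perPacket(bitrate int) (payload, overhead, wireBps int) {
	pps := 1000 / frameMillis // 50 packets per second
	payload = bitrate / 8 / pps
	overhead = macTag + vorsHeader + udpHeader + ipv6Header
	wireBps = (payload + overhead) * pps * 8
	return payload, overhead, wireBps
}

func main() {
	payload, overhead, wire := perPacket(24000)
	fmt.Println(payload, overhead, wire) // 60 63 49200
}
```

So at 24 kbps roughly half of the bits on the wire are headers and tag, before any link-layer framing is counted.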

Each client handshakes with the server over a TCP connection using the
Noise-NKhfs protocol pattern with the Curve25519, Kyber-1024,
ChaCha20-Poly1305 and BLAKE2b algorithms.
=> Noise protocol framework
=> KEM-based hybrid forward secrecy

* The client sends "VoRS v4" to the socket. It is just a magic number.

* All subsequent messages are Netstring-encoded strings. Most of them
  contain a netstring-encoded sequence of netstrings when multiple values
  are expected:
    NS(NS(arg0) || NS(arg1) || ...)
  => Netstring

* The client sends the initial Noise handshake message with its username,
  room name and an optional BLAKE2b-256 hash of the room's password (or
  an empty string) as the payload: [USERNAME, ROOM, hash(PASSWD)].

* The server answers with the final Noise handshake message carrying
  either ["COOKIE", COOKIE] or an ["ERR", MSG] failure message. It may
  reject a client if there are too many peers, if its name is already
  taken, or if it provided an invalid room password.

* The client sends the 128-bit cookie to the server over UDP every
  second. If the UDP packets are lost, no connection is possible, and
  after a timeout the server drops the TCP connection. The cookie serves
  as:

  * confirmation of a successful handshake on the client side;
  * UDP hole punching through a stateful firewall or NAT;
  * proof that the client's UDP traffic is able to reach the server;
  * discovery of the client's UDP address (after passing through NAT,
    its port may differ from the one known to the client).

* The server replies with ["SID", SID], where SID is the single-byte
  stream number the client must use.

* ["PING"] and ["PONG"] messages are then sent every ten seconds as a heartbeat.

    S <- C : e, es, e1, NS(NS(USERNAME) || NS(ROOM) || NS(hash(PASSWD)))
    S -> C : e, ee, ekem1, NS(NS("COOKIE") || NS(COOKIE))
    S <- C : UDP(COOKIE)
    S -> C : NS(NS("SID") || NS(SID))

    S <- C : NS(NS("PING"))
    S -> C : NS(NS("PONG"))
    S <> C : ...
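Every TCP message after the magic string uses the nested layout NS(NS(arg0) || NS(arg1) || ...), where a netstring is the decimal length, a colon, the data and a trailing comma. A minimal Go sketch of encoding and decoding that layout follows; the `NS`, `Pack` and `Unpack` names are hypothetical, and the sketch makes no assumption about how binary values such as SID or COOKIE are represented inside the netstrings.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// NS encodes data as a netstring: "<decimal length>:<data>,".
func NS(data []byte) []byte {
	out := strconv.AppendInt(nil, int64(len(data)), 10)
	out = append(out, ':')
	out = append(out, data...)
	return append(out, ',')
}

// Pack builds NS(NS(arg0) || NS(arg1) || ...).
func Pack(args ...[]byte) []byte {
	var inner []byte
	for _, a := range args {
		inner = append(inner, NS(a)...)
	}
	return NS(inner)
}

// parseNS decodes one netstring from the front of b and returns its payload
// and the remaining bytes.
func parseNS(b []byte) (data, rest []byte, err error) {
	colon := bytes.IndexByte(b, ':')
	if colon <= 0 {
		return nil, nil, errors.New("netstring: missing length")
	}
	n, err := strconv.Atoi(string(b[:colon]))
	// The payload and the trailing comma must fit into b.
	if err != nil || n < 0 || n > len(b)-colon-2 {
		return nil, nil, errors.New("netstring: bad length")
	}
	end := colon + 1 + n
	if b[end] != ',' {
		return nil, nil, errors.New("netstring: missing comma")
	}
	return b[colon+1 : end], b[end+1:], nil
}

// Unpack reverses Pack: it strips the outer netstring and splits the inner sequence.
func Unpack(msg []byte) ([][]byte, error) {
	inner, rest, err := parseNS(msg)
	if err != nil {
		return nil, err
	}
	if len(rest) != 0 {
		return nil, errors.New("netstring: trailing data")
	}
	var args [][]byte
	for len(inner) > 0 {
		var a []byte
		if a, inner, err = parseNS(inner); err != nil {
			return nil, err
		}
		args = append(args, a)
	}
	return args, nil
}

func main() {
	// Initial handshake payload for a room without a password: the
	// password hash field is an empty string. The names are examples.
	hello := Pack([]byte("alice"), []byte("lobby"), []byte(""))
	fmt.Println(string(hello))                // 19:5:alice,5:lobby,0:,,
	fmt.Println(string(Pack([]byte("PING")))) // 7:4:PING,,
	args, err := Unpack(hello)
	fmt.Println(len(args), err) // 3 <nil>
}
```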

Every second the client sends a UDP packet with its single-byte stream
identifier, even if it is muted. That may help with punching holes in
stateful firewalls.

Clients are notified about newly appearing peers with "ADD" commands,
which carry their SIDs, usernames and keys. "DEL" notifies about
departing peers.

    S -> C : NS(NS("ADD") || NS(SID) || NS(USERNAME) || NS(KEY))
    S -> C : ...

    S -> C : NS(NS("DEL") || NS(SID))
    S -> C : ...
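A client can keep its view of the room as a table keyed by SID. The Go sketch below applies already decoded ADD and DEL commands to such a table; `Peers`, `Apply`, and the assumption that NS(SID) holds a single raw byte are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Peer is what a client learns about another participant from "ADD".
type Peer struct {
	Name string
	Key  []byte // that peer's packet keys, as received
}

// Peers maps stream identifiers (SIDs) to participants in the room.
type Peers map[byte]*Peer

// sidOf interprets the NS(SID) field as a single raw byte (an assumption).
func sidOf(b []byte) (byte, error) {
	if len(b) != 1 {
		return 0, errors.New("SID must be exactly one byte")
	}
	return b[0], nil
}

// Apply updates the table from an already netstring-decoded command.
func (p Peers) Apply(args [][]byte) error {
	if len(args) == 0 {
		return errors.New("empty command")
	}
	switch string(args[0]) {
	case "ADD":
		if len(args) != 4 {
			return errors.New("ADD expects SID, USERNAME and KEY")
		}
		sid, err := sidOf(args[1])
		if err != nil {
			return err
		}
		// Copy the key so the table does not alias the receive buffer.
		p[sid] = &Peer{Name: string(args[2]), Key: append([]byte(nil), args[3]...)}
	case "DEL":
		if len(args) != 2 {
			return errors.New("DEL expects SID")
		}
		sid, err := sidOf(args[1])
		if err != nil {
			return err
		}
		delete(p, sid)
	default:
		return fmt.Errorf("unexpected command %q", args[0])
	}
	return nil
}

func main() {
	peers := Peers{}
	_ = peers.Apply([][]byte{[]byte("ADD"), {4}, []byte("bob"), []byte("key-bytes")})
	fmt.Println(len(peers), peers[4].Name)
	_ = peers.Apply([][]byte{[]byte("DEL"), {4}})
	fmt.Println(len(peers))
}
```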

"MUTED" and "UNMUTED" notify about a peer toggling its mute state:

    S <- C : NS(NS("[UN]MUTED"))
    S -> C*: NS(NS("[UN]MUTED") || NS(SID))

"CHAT" broadcasts the message in the room:

    S <- C : NS(NS("CHAT") || NS(MSG))
    S -> C*: NS(NS("CHAT") || NS(SID) || NS(MSG))
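On the server side, the broadcast MUTED, UNMUTED and CHAT messages differ from the client's request only by the sender's SID, inserted right after the command name. A hypothetical `relay` helper in Go, working on already decoded fields and again assuming that SID is carried as one raw byte, could look like this; packing the result back into netstrings and choosing the recipients are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// relay turns a command received from the client with stream identifier
// sid into the fields the server broadcasts to the room.
func relay(sid byte, args [][]byte) ([][]byte, error) {
	if len(args) == 0 {
		return nil, errors.New("empty command")
	}
	switch cmd := string(args[0]); cmd {
	case "MUTED", "UNMUTED":
		if len(args) != 1 {
			return nil, fmt.Errorf("%s takes no arguments", cmd)
		}
		return [][]byte{args[0], {sid}}, nil
	case "CHAT":
		if len(args) != 2 {
			return nil, errors.New("CHAT expects MSG")
		}
		return [][]byte{args[0], {sid}, args[1]}, nil
	}
	return nil, fmt.Errorf("unknown command %q", args[0])
}

func main() {
	out, err := relay(3, [][]byte{[]byte("CHAT"), []byte("hi all")})
	fmt.Printf("%q %v\n", out, err)
}
```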