Protocol

VoRS uses the Opus codec with 20ms frames of 48kHz, mono, 16-bit signed
little-endian sound. It uses libopus's native Packet Loss Concealment
(PLC) feature as long as the number of lost frames does not exceed 32.
DTX (discontinuous transmission) is also enabled.

Each frame has a single-byte stream identifier (the unique identifier of
the participant), a 24-bit big-endian packet counter and a 24-bit
big-endian audio frame counter. Reordered packets are dropped. A 24-bit
counter is long enough for very long talk sessions: at one increment
per 20ms it wraps only after about 93 hours. The audio frame counter is
incremented every time 20ms of data is read from the microphone. When a
peer is muted, no packets are sent, but audio frames are still counted.
That makes it possible to distinguish jitter and delays from a lack of
audio transmission.
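
The header layout and the reordering rule above can be sketched as
follows. This is only an illustration: the field order (stream
identifier, packet counter, audio frame counter) is assumed from the
order in which the fields are listed, and the names pack_header,
unpack_header and ReorderFilter are hypothetical, not taken from the
VoRS sources.

```python
from typing import Tuple

HEADER_LEN = 7  # 1B stream id + 3B packet counter + 3B audio frame counter
MAX_24BIT = (1 << 24) - 1


def pack_header(sid: int, pkt_ctr: int, frame_ctr: int) -> bytes:
    """Serialize the header; both counters are 24-bit big-endian."""
    if not 0 <= sid <= 0xFF:
        raise ValueError("stream identifier must fit in one byte")
    if not (0 <= pkt_ctr <= MAX_24BIT and 0 <= frame_ctr <= MAX_24BIT):
        raise ValueError("counters must fit in 24 bits")
    return bytes([sid]) + pkt_ctr.to_bytes(3, "big") + frame_ctr.to_bytes(3, "big")


def unpack_header(pkt: bytes) -> Tuple[int, int, int]:
    """Return (sid, packet counter, audio frame counter) of a packet."""
    if len(pkt) < HEADER_LEN:
        raise ValueError("packet is shorter than the header")
    return (
        pkt[0],
        int.from_bytes(pkt[1:4], "big"),
        int.from_bytes(pkt[4:7], "big"),
    )


class ReorderFilter:
    """Accepts only packets newer than the last accepted one (per stream)."""

    def __init__(self) -> None:
        self.last = -1

    def accept(self, pkt_ctr: int) -> bool:
        if pkt_ctr <= self.last:
            return False  # reordered or duplicated packet: drop it
        self.last = pkt_ctr
        return True
```

A receiver would keep one such filter per stream identifier and feed it
the packet counter of every incoming packet.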

Each packet is encrypted with ChaCha20 and authenticated with SipHash24.
The keys for both are derived with HKDF from the handshake's state and
then shared with the other participants. The stream identifier together
with the packet counter is used as the nonce.

It is tuned for a 24Kbps bitrate. But keep in mind that every packet
additionally carries 8B of MAC tag, 7B of VoRS header, 8B of UDP header
and 40B of IPv6 header.
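
To get a feeling for what those headers cost, here is a rough
calculation. It assumes that 24Kbps means 24000 bit/s of constant
bitrate Opus data, that every packet carries exactly one 20ms frame and
that DTX never kicks in; the function name wire_bitrate is made up for
this example.

```python
FRAME_MS = 20        # one Opus frame per packet
CODEC_BPS = 24_000   # assumed meaning of "24Kbps"

# Per-packet overhead in bytes, as listed above.
OVERHEAD = {"MAC tag": 8, "VoRS header": 7, "UDP header": 8, "IPv6 header": 40}


def wire_bitrate(codec_bps: int = CODEC_BPS, frame_ms: int = FRAME_MS) -> dict:
    """Estimate the on-the-wire bitrate of a single unmuted stream."""
    packets_per_sec = 1000 // frame_ms
    payload_bytes = codec_bps // 8 // packets_per_sec
    overhead_bytes = sum(OVERHEAD.values())
    total_bps = (payload_bytes + overhead_bytes) * 8 * packets_per_sec
    return {
        "packets_per_sec": packets_per_sec,
        "payload_bytes": payload_bytes,
        "overhead_bytes": overhead_bytes,
        "total_bps": total_bps,
    }


if __name__ == "__main__":
    for name, value in wire_bitrate().items():
        print(name, value)
```

Under those assumptions each packet has 60 bytes of audio and 63 bytes
of overhead, so a continuously talking peer takes about 49.2 kbit/s on
the wire, roughly twice the codec bitrate.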

Each client handshakes with the server over a TCP connection using a
protocol inspired by PQConnect, Noise and Chempat. It consists of a
hybrid key exchange that uses the server's static Classic McEliece
6960-119 public key, static X25519, ephemeral X25519 and ephemeral
Streamlined NTRU Prime 761 ones, with HKDF as the KDF and SHAKE as the
hash function.

=> PQConnect
=> Noise protocol framework
=> Chempat
=> Classic McEliece
=> Streamlined NTRU Prime
=> X25519
=> HKDF
=> SHAKE

* All messages are Netstring-encoded strings. When multiple values are
  expected, the message is a netstring-encoded sequence of netstrings:
    NS(NS(arg0) || NS(arg1) || ...)
  => Netstring

* The client sends NS("VoRS v5") to the socket. It is just a magic number.

* Then it performs [PQHS].

* The client sends the initial handshake message. Its prefinish payload
  contains the username, the room name and an optional SHAKE256 hash of
  the room's password (or an empty string):
  [USERNAME, ROOM, hash(PASSWD)].

* The server answers with the final Noise handshake message carrying
  either ["COOKIE", COOKIE] or an ["ERR", MSG] failure message. It may
  reject a client if there are too many peers, if the username is
  already taken or if the room's password is invalid.

* The client sends the 128-bit cookie to the server over UDP every
  second. If those UDP packets are lost, no connection is possible and
  the server drops the TCP connection after a timeout. The cookie serves
  several purposes:

  * it confirms that the handshake succeeded on the client side;
  * it punches a hole in a stateful firewall or NAT;
  * it proves that the client's UDP traffic is able to reach the server;
  * it tells the server the client's UDP address (after passing NAT,
    the port may differ from the one known to the client).

* The server replies with ["SID", SID], where SID is the single-byte
  stream number the client must use.

TODO

* ["PING"] and ["PONG"] messages are then sent every ten seconds as a heartbeat.

    S <- C : hello
    S -> C : hello
    S <- C : finish, NS(NS(USERNAME) || NS(ROOM) || NS(hash(PASSWD)))
    S -> C : NS(NS("COOKIE") || NS(COOKIE))
    S <- C : UDP(COOKIE)
    S -> C : NS(NS("SID") || NS(SID))

    S <- C : NS(NS("PING"))
    S -> C : NS(NS("PONG"))
    S <> C : ...
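
The NS(NS(arg0) || NS(arg1) || ...) notation used above can be
reproduced with a few lines of code. This is a minimal sketch of the
standard netstring encoding (decimal length, colon, data, comma); the
helper names ns and message are hypothetical.

```python
def ns(data: bytes) -> bytes:
    """Encode one netstring: b"<decimal length>:<data>,"."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def message(*args: bytes) -> bytes:
    """Encode NS(NS(arg0) || NS(arg1) || ...)."""
    return ns(b"".join(ns(arg) for arg in args))


if __name__ == "__main__":
    print(ns(b"VoRS v5"))            # the magic sent first
    print(message(b"PING"))          # heartbeat request
    print(message(b"SID", b"\x05"))  # server assigning stream number 5
```

For example, ns(b"VoRS v5") gives b"7:VoRS v5," and message(b"PING")
gives b"7:4:PING,,".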

Every second the client sends a UDP packet with its single-byte stream
identifier, even if it is muted. That may help with punching holes in
stateful firewalls.

Clients are notified about newly appeared peers with "ADD" commands,
which carry their SIDs, usernames and keys. "DEL" notifies about peers
that have left.

    S -> C : NS(NS("ADD") || NS(SID) || NS(USERNAME) || NS(KEY))
    S -> C : ...

    S -> C : NS(NS("DEL") || NS(SID))
    S -> C : ...
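
On the receiving side such a command has to be split back into its
arguments. The following is a sketch of a strict netstring decoder, not
the actual VoRS parser; read_ns and parse_message are hypothetical
names, and the 4-byte key in the usage example is a placeholder, since
the real key length is not specified here.

```python
from typing import List, Tuple


def read_ns(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Decode one netstring at pos; return (data, position after it)."""
    colon = buf.find(b":", pos)
    if colon < 0:
        raise ValueError("missing ':' after the length")
    digits = buf[pos:colon]
    if not digits.isdigit():
        raise ValueError("length is not a decimal number")
    start = colon + 1
    end = start + int(digits)
    if buf[end:end + 1] != b",":
        raise ValueError("missing ',' terminator or truncated data")
    return buf[start:end], end + 1


def parse_message(raw: bytes) -> List[bytes]:
    """Decode NS(NS(arg0) || NS(arg1) || ...) into [arg0, arg1, ...]."""
    body, end = read_ns(raw)
    if end != len(raw):
        raise ValueError("trailing data after the message")
    args = []
    pos = 0
    while pos < len(body):
        arg, pos = read_ns(body, pos)
        args.append(arg)
    return args


if __name__ == "__main__":
    cmd, *rest = parse_message(b"25:3:ADD,1:\x07,5:alice,4:KKKK,,")
    print(cmd, rest)
```

A client would then dispatch on the first element (b"ADD", b"DEL" and
so on) and interpret the remaining ones according to the command.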

"MUTED" and "UNMUTED" notify about a peer toggling its mute state:

    S <- C : NS(NS("[UN]MUTED"))
    S -> C*: NS(NS("[UN]MUTED") || NS(SID))

"CHAT" broadcasts the message in the room:

    S <- C : NS(NS("CHAT") || NS(MSG))
    S -> C*: NS(NS("CHAT") || NS(SID) || NS(MSG))