The big idea
The server sits at home, right next to the radio and Thetis. The client (on your laptop, PC or Android phone) can be anywhere — in the living room, at the office, on the other side of the world. Between those two runs a single thin network connection. Everything has to go over it: the audio from the receiver, the spectrum and waterfall display, the signal strength, and every control you press.
The big challenge is delay (latency). When you press the transmit key or turn the tuning knob, it has to happen now, not half a second later. That's why the whole of ThetisLink is built around one principle: first keep the delay as low as possible, and only then save bandwidth.
This document covers the second half of the chain: the TL2 server → the network → the client. How the audio is produced (the demodulation, the FFT channelizer) is covered in the sister document “How a VRX works”.
1. Two kinds of traffic
Everything that goes over the line falls into two groups, and they have very different requirements:
- Continuous streams: the audio, the spectrum, the S-meter. These arrive dozens of times per second, without end. What counts here is arriving on time and smoothly. A packet that arrives too late is worthless: you no longer want to hear audio from 100 ms ago; you'd rather move on.
- One-off commands: setting a frequency, choosing a mode, moving the volume slider, pressing the transmit key. These only arrive when you do something. What counts here is an immediate response.
Both groups travel over the same connection, but how they travel determines whether the system feels fast and smooth. That brings us to the most important choice.
2. Why UDP and not TCP
There are two basic ways to send data over the internet: TCP and UDP. A comparison with the postal service helps:
- TCP is like registered mail: every packet is acknowledged, and if one gets lost, it is sent again and everything behind it waits until it has arrived. Guaranteed complete — but a single hiccup brings the whole stream to a standstill.
- UDP is like ordinary postcards: you drop them in the mailbox and they (usually) arrive, but there is no acknowledgement and no resending. If one gets lost, it's simply gone — the rest travels on unhindered.
For live audio, that second one is exactly what you want. A lost 20 ms packet is, at most, a tiny click you might hear; waiting for it would be far worse. That's why ThetisLink uses UDP — for all traffic, audio as well as commands.
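The head-of-line effect can be made concrete with a small simulation. This is a toy model, not ThetisLink code: six 20 ms packets arrive every 20 ms, packet 2 is lost, and under the TCP-like rule its resend (assumed to arrive 200 ms later) holds back everything behind it, while under the UDP rule it is simply skipped.

```python
def delivery_times(arrivals, lost, retransmit_delay_ms, in_order):
    """Time (ms) at which each packet becomes usable by the application.

    arrivals: arrival time of packets 0..n-1 on the first attempt
    lost: indices of packets lost on the first attempt
    in_order=True models TCP (resend plus in-order delivery),
    in_order=False models UDP (a lost packet is simply gone -> None).
    """
    usable = []
    latest = 0.0
    for i, t in enumerate(arrivals):
        if i in lost:
            if not in_order:
                usable.append(None)          # UDP: skip it and keep going
                continue
            t += retransmit_delay_ms         # TCP: arrives late after the resend
        if in_order:
            latest = max(latest, t)          # cannot be handed over before its predecessors
            usable.append(latest)
        else:
            usable.append(t)
    return usable


arrivals = [20.0 * i for i in range(6)]      # one 20 ms audio packet every 20 ms
tcp = delivery_times(arrivals, {2}, 200.0, in_order=True)
udp = delivery_times(arrivals, {2}, 200.0, in_order=False)
print("TCP:", tcp)   # [0.0, 20.0, 240.0, 240.0, 240.0, 240.0]
print("UDP:", udp)   # [0.0, 20.0, None, 60.0, 80.0, 100.0]
```

Under the TCP-like rule, one loss delays every following packet by up to 180 ms in this example; under the UDP rule the cost is a single 20 ms gap.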
3. Finding the server (auto-discovery)
Before anything can travel, the client needs to know where the server is. That can happen in two ways:
- Automatically (mDNS). The server calls out on the local network: “here I am, I'm a ThetisLink server.” The client listens and shows it in a list, so you don't need to know any IP address. It is the same mechanism your phone uses to find a printer or a Chromecast.
- Manually. If the client is on a different network (over a VPN, or across the internet), that announcement does not reach it; there you simply type in the IP address and port yourself.
mDNS (service type _thetislink._udp): the server puts its version and a friendly name in the announcement. Outside the local network: manual address and port 4580.
4. One packet, one envelope
Because everything travels over the same socket, the receiver must be able to see immediately what each packet is. That's why every packet starts with a short, fixed 4-byte header, like the address on an envelope.
The type is the core: from it the receiver sees whether this is a piece of RX1 audio, a spectrum row, a new frequency, an S-meter reading or a press of the transmit key. That way a single socket can carry dozens of different kinds of messages mixed together without them getting confused.
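A minimal sketch of that idea in Python. The header layout here (magic byte, protocol version, packet type, flags) and every numeric value are hypothetical stand-ins, since the four fields are not spelled out in this document; the point is only that one small fixed header lets a single socket dispatch many kinds of message.

```python
import struct

# HYPOTHETICAL 4-byte header: magic | protocol version | packet type | flags
HEADER = struct.Struct("!BBBB")
MAGIC = 0xA5        # hypothetical marker byte
VERSION = 1         # hypothetical protocol version

# HYPOTHETICAL packet-type numbers
TYPE_AUDIO_RX1 = 0x01
TYPE_SPECTRUM_RX1 = 0x10
TYPE_FREQUENCY = 0x20
TYPE_SMETER = 0x30


def pack(ptype: int, payload: bytes, flags: int = 0) -> bytes:
    """Put the fixed header in front of a payload."""
    return HEADER.pack(MAGIC, VERSION, ptype, flags) + payload


def unpack(packet: bytes):
    """Split a packet into (version, type, flags, payload)."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than the header")
    magic, version, ptype, flags = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("unexpected magic byte")
    return version, ptype, flags, packet[HEADER.size:]


# The type byte alone decides where a packet goes.
HANDLERS = {
    TYPE_AUDIO_RX1: lambda p: ("rx1 audio", len(p)),
    TYPE_SPECTRUM_RX1: lambda p: ("rx1 spectrum", len(p)),
    TYPE_FREQUENCY: lambda p: ("frequency", int.from_bytes(p, "big")),
    TYPE_SMETER: lambda p: ("s-meter", len(p)),
}


def dispatch(packet: bytes):
    _version, ptype, _flags, payload = unpack(packet)
    handler = HANDLERS.get(ptype)
    return handler(payload) if handler else ("unknown", ptype)


print(dispatch(pack(TYPE_FREQUENCY, (7_074_000).to_bytes(8, "big"))))  # ('frequency', 7074000)
```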
5. Audio: Opus over UDP
The demodulated audio (from the VRX chain, or directly from Thetis or a Yaesu) is a stream of audio samples. Sending those uncompressed would use far more bandwidth than necessary, so they are first compressed with the Opus codec: a modern, highly efficient audio codec designed for low latency.
The audio is cut into chunks of 20 ms; each chunk is encoded separately as one Opus frame and put into one UDP packet. Besides the Opus bytes themselves, that packet contains two important numbers:
- a sequence number — packet 1, 2, 3, … so the client knows the correct order;
- a timestamp — when this chunk is supposed to play.
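The two numbers can be sketched as a small prefix in front of each Opus frame. The field widths (a 16-bit sequence number that wraps around, a 32-bit timestamp in milliseconds) are assumptions for illustration, and the Opus encoder itself is not shown: the payload bytes below are placeholders.

```python
import struct

FRAME_MS = 20
# HYPOTHETICAL prefix: sequence number (u16) + play-out timestamp in ms (u32), big-endian.
AUDIO_PREFIX = struct.Struct("!HI")


def samples_per_frame(sample_rate_hz: int) -> int:
    """Number of audio samples in one 20 ms chunk."""
    return sample_rate_hz * FRAME_MS // 1000


class AudioPacketizer:
    """Prefixes each encoded 20 ms frame with a sequence number and timestamp."""

    def __init__(self) -> None:
        self.seq = 0
        self.timestamp_ms = 0

    def next_packet(self, opus_frame: bytes) -> bytes:
        packet = AUDIO_PREFIX.pack(self.seq, self.timestamp_ms) + opus_frame
        self.seq = (self.seq + 1) & 0xFFFF                      # wraps after 65535
        self.timestamp_ms = (self.timestamp_ms + FRAME_MS) & 0xFFFFFFFF
        return packet


packetizer = AudioPacketizer()
first = packetizer.next_packet(b"\x01\x02")    # placeholder bytes, not real Opus data
second = packetizer.next_packet(b"\x03\x04")
print(AUDIO_PREFIX.unpack_from(second))        # (1, 20)
```

At 8 kHz one chunk holds 160 samples, at 16 kHz 320 samples.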
Narrow or wide — the same choice as in the VRX chain
Just as with the iFFT choice (see the VRX document), the audio can be narrowband or wideband:
- Narrowband: 8 kHz audio, ~12.8 kbit/s — fine for speech, very little data.
- Wideband: 16 kHz audio, ~20–24 kbit/s (depending on the stream) — distinctly clearer, roughly double the data.
The client chooses between the two with a single switch; the server encodes accordingly. (On the TX side, when you transmit yourself, wideband is always used for the best microphone quality.)
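The quoted bitrates translate into very small packets. A short calculation, using only the figures from the text plus the standard 28 bytes of IPv4 and UDP headers per packet, shows how little each 20 ms frame carries and how much of the line the packet headers themselves take.

```python
FRAME_MS = 20
IPV4_UDP_HEADER_BYTES = 20 + 8   # IPv4 header without options + UDP header


def packets_per_second(frame_ms: int = FRAME_MS) -> float:
    return 1000 / frame_ms


def opus_bytes_per_frame(bitrate_bps: float, frame_ms: int = FRAME_MS) -> float:
    """Average Opus payload in one packet at a given codec bitrate."""
    return bitrate_bps * frame_ms / 1000 / 8


def header_overhead_bps(frame_ms: int = FRAME_MS) -> float:
    """Bits per second spent on IPv4 + UDP headers alone."""
    return IPV4_UDP_HEADER_BYTES * 8 * packets_per_second(frame_ms)


print(opus_bytes_per_frame(12_800))   # narrowband: 32.0 bytes per packet
print(opus_bytes_per_frame(24_000))   # wideband, upper end: 60.0 bytes per packet
print(header_overhead_bps())          # 11200.0 bit/s of IPv4/UDP headers at 50 packets/s
```

At narrowband, the IP and UDP headers alone cost almost as much as the audio (about 11.2 kbit/s against 12.8 kbit/s), before ThetisLink's own header and sequence/timestamp fields are counted. Shorter frames would lower latency but multiply this overhead.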
6. Many audio streams over one line
ThetisLink can play more than one thing at a time: the main receiver (RX1), the second receiver (RX2/VFO-B), the virtual receivers VRX1 and VRX2, and the audio of a connected Yaesu radio. All those streams travel through the same socket. How do they stay apart?
Simply: each kind of stream has its own packet type (and its own sequence-number counter). From the type the client immediately sees which audio channel a packet belongs to and routes it to the right playback buffer.
In addition, there is a “bundled” variant that puts several channels into one packet with the same sequence number and timestamp — handy when channels must run exactly in step (for example a stereo left/right image).
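Routing by type with a separate counter per stream can be sketched as below. The type numbers, the 16-bit sequence width and the loss counting are assumptions made for the example; the bundled multi-channel variant is not modelled.

```python
from collections import defaultdict

# HYPOTHETICAL packet-type numbers for the audio streams.
STREAM_BY_TYPE = {0x01: "RX1", 0x02: "RX2", 0x03: "VRX1", 0x04: "VRX2", 0x05: "Yaesu"}


class AudioDemux:
    """Sends each audio packet to its own stream buffer and tracks gaps per stream."""

    def __init__(self) -> None:
        self.buffers = defaultdict(list)   # stream name -> [(seq, payload), ...]
        self.last_seq = {}                 # stream name -> newest sequence seen
        self.lost = defaultdict(int)       # stream name -> packets skipped so far

    def on_packet(self, ptype: int, seq: int, payload: bytes) -> bool:
        stream = STREAM_BY_TYPE.get(ptype)
        if stream is None:
            return False                   # not an audio packet
        prev = self.last_seq.get(stream)
        if prev is None:
            self.last_seq[stream] = seq
        else:
            gap = (seq - prev) & 0xFFFF    # distance with 16-bit wrap-around
            if 0 < gap < 0x8000:           # newer than anything seen so far
                self.lost[stream] += gap - 1
                self.last_seq[stream] = seq
        self.buffers[stream].append((seq, payload))
        return True
```

Because each stream has its own counter, a gap in RX1 says nothing about RX2, and the client can tell exactly which channel lost a packet.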
7. The jitter buffer
Packets don't all travel equally fast over the network. One takes 8 ms, the next 21 ms, and sometimes one arrives before its predecessor. That varying delay is called jitter. If you played each packet the moment it came in, the audio would sound jerky and jumbled.
The solution is a small jitter buffer on the client side: a little waiting room for a handful of packets. Incoming packets are placed there in the right spot by their sequence number; playback runs through it at a fixed, steady tick. That way a bumpy arrival becomes a smooth playback.
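A minimal reordering buffer in that spirit: packets are filed by sequence number, and a fixed 20 ms playback tick takes out one slot at a time. The start-up depth, the dropping of late packets and the None-for-gap convention are illustrative choices, and sequence wrap-around is ignored to keep it short.

```python
class JitterBuffer:
    """Reorders packets by sequence number; releases one per 20 ms playback tick."""

    def __init__(self, depth: int) -> None:
        self.depth = max(depth, 1)   # packets to collect before playback starts
        self.slots = {}              # sequence number -> payload
        self.next_seq = None         # sequence number due at the next tick
        self.started = False

    def push(self, seq: int, payload: bytes) -> None:
        if self.started and seq < self.next_seq:
            return                   # too late: its playback moment has passed
        self.slots[seq] = payload

    def tick(self):
        """Called by the playback clock every 20 ms; None means 'gap, conceal it'."""
        if not self.started:
            if len(self.slots) < self.depth:
                return None          # still filling up
            self.started = True
            self.next_seq = min(self.slots)
        payload = self.slots.pop(self.next_seq, None)
        self.next_seq += 1
        return payload


jb = JitterBuffer(depth=3)
for seq, data in [(1, b"B"), (0, b"A"), (2, b"C")]:   # arrival order is scrambled
    jb.push(seq, data)
print([jb.tick() for _ in range(4)])   # [b'A', b'B', b'C', None]
```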
This is the latency trade-off from the beginning again: a deeper buffer is smoother but slower; a shallow buffer is faster but more sensitive to dropouts. ThetisLink therefore keeps the buffer as small as possible and adjusts its depth automatically: if it measures little jitter, the buffer shrinks (less delay); if the network becomes erratic, it grows just enough to prevent dropouts.
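How the depth is adapted is not spelled out here. One common technique, swapped in purely for illustration, is the interarrival-jitter estimator from RFC 3550 (the RTP specification), combined with a simple policy that keeps a few times the measured jitter in the buffer. The safety factor and the limits are hypothetical.

```python
import math

FRAME_MS = 20


class JitterEstimator:
    """Interarrival jitter estimate as defined in RFC 3550, section 6.4.1."""

    def __init__(self) -> None:
        self.jitter_ms = 0.0
        self.prev = None   # (send timestamp, arrival time) of the previous packet

    def update(self, send_ts_ms: float, arrival_ms: float) -> float:
        if self.prev is not None:
            prev_send, prev_arrival = self.prev
            # Difference in transit time between this packet and the previous one.
            d = (arrival_ms - prev_arrival) - (send_ts_ms - prev_send)
            self.jitter_ms += (abs(d) - self.jitter_ms) / 16
        self.prev = (send_ts_ms, arrival_ms)
        return self.jitter_ms


def target_depth(jitter_ms: float, safety_factor: float = 3.0,
                 min_packets: int = 1, max_packets: int = 10) -> int:
    """HYPOTHETICAL policy: cover a few times the measured jitter, in whole 20 ms frames."""
    packets = math.ceil(safety_factor * jitter_ms / FRAME_MS)
    return max(min_packets, min(max_packets, packets))
```

The 1/16 gain makes the estimate react smoothly rather than jumping on every late packet, which fits the goal of growing the buffer only as much as the network actually requires.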
8. The spectrum across the line
The spectrum and waterfall display is a row of bins (see the VRX document): per bin, a value that says how much energy is there. Such a row can be large: thousands to tens of thousands of bins. Sending every row in full, at full precision, would use a lot of data, so here the second priority comes into play: bandwidth.
Two knobs keep that in check:
- How many bytes per bin. The current server sends 1 byte per bin (256 levels) — plenty for a smooth picture. The protocol can also describe 2 bytes per bin for more dynamic range, but that is reserved for the future.
- How many times per second. The picture is refreshed by default about 15 times per second; that is adjustable (calmer = less data, faster = smoother).
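The effect of both knobs is simple multiplication. The bin count below is only an example value within the range the text mentions; packet headers and the centre-frequency, span and reference fields are left out.

```python
def spectrum_bitrate_bps(bins: int, bytes_per_bin: int, rows_per_second: float) -> float:
    """Raw bin data rate of the spectrum/waterfall stream, in bits per second."""
    return bins * bytes_per_bin * rows_per_second * 8


print(spectrum_bitrate_bps(4096, 1, 15))   # 491520 bit/s, about 0.5 Mbit/s
print(spectrum_bitrate_bps(4096, 2, 15))   # twice as much with 2 bytes per bin
print(spectrum_bitrate_bps(4096, 1, 5))    # a calmer 5 rows/s costs a third
```

Even at 1 byte per bin, a spectrum of this size outweighs a narrowband audio stream (~12.8 kbit/s) by more than an order of magnitude.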
Each spectrum packet contains, besides the bins, also the center frequency, the width (span) and a reference level, so the client can draw the picture at the right place and scale. And just as with audio, every source — RX1, RX2, VRX1, VRX2 — has its own packet type.
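One plausible way to squeeze a bin into 1 byte is to send it relative to the packet's reference level. The mapping below (255 = reference level, 0.5 dB per step, so 127.5 dB of range) is an assumption for illustration; the actual scale is not stated in this document.

```python
DB_PER_STEP = 0.5   # HYPOTHETICAL resolution: 256 levels x 0.5 dB = 127.5 dB of range


def encode_bin(level_db: float, ref_db: float) -> int:
    """Quantize one bin to a byte: 255 = reference level, 0 = 127.5 dB below it."""
    steps = round((level_db - ref_db) / DB_PER_STEP) + 255
    return max(0, min(255, steps))   # clip anything outside the drawable range


def decode_bin(value: int, ref_db: float) -> float:
    """Client side: turn the byte back into a level on the same scale."""
    return ref_db - (255 - value) * DB_PER_STEP


def encode_row(levels_db, ref_db: float) -> bytes:
    return bytes(encode_bin(x, ref_db) for x in levels_db)


row = encode_row([-20.0, -30.0, -200.0], ref_db=-20.0)
print(list(row))   # [255, 235, 0]
```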
9. The S-meter
The signal strength comes in along three different routes, depending on the source:
- Main receiver (RX1/RX2): the server sends a ready-made value — the power in dBm — in a compact packet of a few bytes. Depending on the chosen meter source, that is an average or peak from Thetis, or a value derived from the spectrum bins.
- VRX channels: for these the server sends no separate S-meter. The client computes the strength itself from the received spectrum: it sums the power of the bins within the passband and converts that, with a fixed calibration correction, to dBm (see the VRX document, chapter on the S-meter).
- Yaesu radio: the meter value is part of the radio's status messages and comes directly from the radio via CAT (on its own scale, not in dBm).
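The VRX calculation described above can be sketched like this. Bin levels are treated as power in dB, the bin layout (middle bin at the tuned frequency) is assumed, and CAL_DB is a hypothetical placeholder for the fixed calibration correction, whose real value is given in the VRX document. The essential step is summing in linear power, not in dB.

```python
import math

CAL_DB = 0.0   # HYPOTHETICAL placeholder for the fixed calibration correction


def passband_dbm(bin_levels_db, bin_hz, low_hz, high_hz, cal_db=CAL_DB):
    """Total power of the bins inside the passband, converted to dBm.

    Assumes the middle bin (index len // 2) sits at the tuned frequency, so a
    bin's offset is (i - len // 2) * bin_hz. low_hz/high_hz are passband edges
    relative to the tuned frequency (negative for LSB).
    """
    centre = len(bin_levels_db) // 2
    total_linear = 0.0
    for i, level_db in enumerate(bin_levels_db):
        offset_hz = (i - centre) * bin_hz
        if low_hz <= offset_hz <= high_hz:
            total_linear += 10 ** (level_db / 10)   # dB -> linear power before summing
    if total_linear == 0.0:
        return float("-inf")                       # no bins in the passband
    return 10 * math.log10(total_linear) + cal_db


levels = [-120.0] * 8
levels[4] = levels[5] = -80.0                      # a signal spread over two bins
print(round(passband_dbm(levels, 500, 0, 500), 2))   # -76.99: two equal bins add 3 dB
```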
The scale (the familiar S-units: S9 = −73 dBm, each S-unit 6 dB) is always drawn in the client. The server delivers the raw number; the client makes the needle out of it.
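The client-side scale follows directly from the two numbers above (S9 = −73 dBm, 6 dB per S-unit). Rounding, the floor at S0 and the S9+dB notation are display choices made for this sketch.

```python
import math

S9_DBM = -73.0
DB_PER_S_UNIT = 6.0


def s_units(dbm: float) -> float:
    """Continuous needle position in S-units (S9 at -73 dBm, 6 dB per unit)."""
    return 9 + (dbm - S9_DBM) / DB_PER_S_UNIT


def s_meter_label(dbm: float) -> str:
    """Text label: S0..S9 below S9, then S9+<dB> above it."""
    if dbm >= S9_DBM:
        over_db = round(dbm - S9_DBM)
        return "S9" if over_db == 0 else f"S9+{over_db}"
    return f"S{max(0, math.floor(s_units(dbm)))}"


for level in (-121.0, -79.0, -73.0, -63.0):
    print(level, s_meter_label(level))   # S1, S8, S9, S9+10
```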
10. Settings and controls
When you turn the tuning knob or choose a mode, the client sends a small command packet: usually no more than an identifier (“this is the volume”, “this is the mode”, “this is the filter width”) plus a value. Frequency and mode even have their own packet type. That's all — a few bytes, sent right away.
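Such a command fits in a handful of bytes. The layout below (a 16-bit identifier plus a signed 32-bit value) and the identifier numbers are hypothetical, chosen only to show the scale; the real encoding is not given here.

```python
import struct

# HYPOTHETICAL command layout: identifier (u16) + value (i32), big-endian.
COMMAND = struct.Struct("!Hi")

# HYPOTHETICAL identifiers.
CMD_VOLUME = 0x0001
CMD_MODE = 0x0002
CMD_FILTER_WIDTH = 0x0003


def encode_command(cmd_id: int, value: int) -> bytes:
    return COMMAND.pack(cmd_id, value)


def decode_command(data: bytes) -> tuple:
    cmd_id, value = COMMAND.unpack(data)
    return cmd_id, value


packet = encode_command(CMD_FILTER_WIDTH, 2700)   # e.g. a 2.7 kHz filter
print(len(packet), decode_command(packet))        # 6 (3, 2700)
```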
As already shown in chapter 2, no separate acknowledgement comes back. Instead, the server continuously reports the current state, and that reported value is your confirmation: if you see the frequency change along with your command, the command has arrived. That way the client always stays in sync with what the radio is really doing, even if someone turns the knob on the radio itself.
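The client side of that idea can be sketched as a control that always follows what the server last reported and treats a matching report as the confirmation. The class and method names are hypothetical, and resending after a timeout is not shown.

```python
class SyncedControl:
    """One client-side control; the server's state report is the source of truth."""

    def __init__(self, send) -> None:
        self.send = send          # callable that sends a command packet
        self.reported = None      # last value reported by the server
        self.pending = None       # value requested by the user, not yet seen in a report

    def user_set(self, value) -> None:
        self.pending = value
        self.send(value)          # fire and forget: no acknowledgement expected

    def on_state_report(self, value) -> None:
        self.reported = value     # always follow the radio, even if changed at the radio
        if value == self.pending:
            self.pending = None   # the report is the confirmation

    @property
    def confirmed(self) -> bool:
        return self.pending is None


sent = []
vfo = SyncedControl(sent.append)
vfo.user_set(7_074_000)
print(vfo.confirmed)              # False: nothing reported yet
vfo.on_state_report(7_074_000)
print(vfo.confirmed)              # True: the reported frequency matches
vfo.on_state_report(7_100_000)    # someone turns the knob on the radio itself
print(vfo.reported)               # 7100000
```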
11. The transmit key (PTT)
The transmit key is the most time-critical signal of all — here every millisecond counts. That's why there is no separate, slow button message. Instead, the client sets a flag in the TX audio packets: as soon as you press the key, your outgoing audio packets carry the PTT flag, and the server switches the radio to transmit. The press and your speech thus travel along in one and the same stream — nothing that has to wait on anything else.
Because several clients can be connected at the same time, the server makes sure that only one transmits at a time. If a second client wants to transmit while the first is already on the air, it gets a short “denied” message back, so the two never transmit over each other.
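Server-side, the PTT flag and the one-transmitter rule might look like this. The flag bit, the client identifiers and the returned strings are hypothetical; in the real protocol the flag rides in the TX audio packets as described above.

```python
PTT_FLAG = 0x01   # HYPOTHETICAL bit in the TX audio packet's flags


class PttArbiter:
    """Server side: at most one client holds the transmitter at a time."""

    def __init__(self) -> None:
        self.owner = None   # client currently transmitting, if any

    def on_tx_audio(self, client_id: str, flags: int) -> str:
        if flags & PTT_FLAG:
            if self.owner is None:
                self.owner = client_id        # first come, first served
            return "keyed" if self.owner == client_id else "denied"
        if self.owner == client_id:
            self.owner = None                 # key released: back to receive
            return "released"
        return "rx"


arbiter = PttArbiter()
print(arbiter.on_tx_audio("A", PTT_FLAG))   # keyed
print(arbiter.on_tx_audio("B", PTT_FLAG))   # denied: A is already transmitting
print(arbiter.on_tx_audio("A", 0))          # released
```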
Safety: the transmitter cannot stay keyed by accident
For remote operation this is crucial: a transmitter that accidentally gets “stuck” (keeps transmitting) is dangerous — for your equipment, for the band, and for your licence. That is why the transmit key is deliberately not an on/off latch, but a flag that the client re-sends and re-confirms about 50 times per second. The transmitter therefore stays keyed only as long as that stream of “PTT-on” packets keeps arriving.
If the connection drops while you are transmitting — Wi-Fi hiccups, the laptop goes to sleep, the app crashes — that stream stops by itself, and two independent safety nets take over:
- Packet timeout (~0.5 s): if no packets arrive for half a second, the server releases the transmitter automatically.
- Heartbeat timeout (~2 s): if the “heartbeat” that keeps the connection alive is also absent for more than two seconds, the server treats the connection as lost — PTT released, plus an alarm.
So the worst a network failure can do is keep the transmitter keyed for about half a second, two seconds at most, before it returns to receive; it can never leave it transmitting endlessly. This is the classic dead-man's switch principle.
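The two safety nets amount to a watchdog that the server checks periodically. This sketch uses the approximate timeouts from the text; passing the current time in as a number (for example from a monotonic clock) and the action names are choices made for the example.

```python
PACKET_TIMEOUT_S = 0.5      # no PTT packets for this long -> release the transmitter
HEARTBEAT_TIMEOUT_S = 2.0   # no heartbeat for this long -> connection lost


class PttWatchdog:
    """Dead-man's switch: PTT only stays on while fresh 'PTT-on' packets keep coming."""

    def __init__(self, now: float) -> None:
        self.keyed = False
        self.connected = True
        self.last_ptt_packet = now
        self.last_heartbeat = now

    def on_tx_packet(self, now: float, ptt_on: bool) -> None:
        self.last_ptt_packet = now
        self.keyed = ptt_on

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now
        self.connected = True

    def check(self, now: float) -> list:
        """Call periodically; returns the safety actions taken at this moment."""
        actions = []
        if self.keyed and now - self.last_ptt_packet > PACKET_TIMEOUT_S:
            self.keyed = False
            actions.append("release_ptt")
        if self.connected and now - self.last_heartbeat > HEARTBEAT_TIMEOUT_S:
            self.connected = False
            if self.keyed:
                self.keyed = False
                actions.append("release_ptt")
            actions.append("connection_lost_alarm")
        return actions


dog = PttWatchdog(now=0.0)
dog.on_tx_packet(0.0, ptt_on=True)   # transmitting, then the link goes silent
print(dog.check(0.3))   # []
print(dog.check(0.6))   # ['release_ptt']
print(dog.check(2.1))   # ['connection_lost_alarm']
```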
12. Connecting and securing
When connecting, client and server first exchange their capabilities. The client announces what it supports (wideband audio, spectrum, a second receiver…), and the server answers with the intersection: only what both can handle is used. That way a newer side automatically switches off its extras toward an older one, within the same protocol version. A different protocol version is rejected outright; server and clients must then be updated together.
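The negotiation boils down to a version check followed by a bitwise AND of capability flags. The flag names, bit positions and version numbers below are hypothetical.

```python
from enum import IntFlag


class Cap(IntFlag):
    """HYPOTHETICAL capability bits."""
    WIDEBAND_AUDIO = 1 << 0
    SPECTRUM = 1 << 1
    SECOND_RECEIVER = 1 << 2
    VRX = 1 << 3


def negotiate(client_version: int, client_caps: Cap,
              server_version: int, server_caps: Cap) -> Cap:
    """Reject a different protocol version; otherwise use only the shared features."""
    if client_version != server_version:
        raise ConnectionRefusedError("protocol version mismatch: update both sides")
    return client_caps & server_caps


newer_client = Cap.WIDEBAND_AUDIO | Cap.SPECTRUM | Cap.VRX
older_server = Cap.WIDEBAND_AUDIO | Cap.SPECTRUM
print(negotiate(3, newer_client, 3, older_server) == Cap.WIDEBAND_AUDIO | Cap.SPECTRUM)  # True
```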
If the server is reachable from the internet, you can set a password. The password itself never goes over the line: the server sends a random “challenge”, and the client replies with an HMAC computed from that challenge and the password. Optionally, a time-based one-time code (TOTP, as used by authenticator apps) can be required on top.
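Both mechanisms can be sketched with Python's standard library. The choice of SHA-256 for the challenge response, the 16-byte challenge and the function names are assumptions; the TOTP function follows RFC 6238 with its common defaults (HMAC-SHA1, 30-second steps, 6 digits), as used by authenticator apps.

```python
import hashlib
import hmac
import os
import struct


def make_challenge() -> bytes:
    return os.urandom(16)                                   # server: fresh random nonce


def answer_challenge(password: str, challenge: bytes) -> bytes:
    # Client: proves knowledge of the password without sending it.
    return hmac.new(password.encode(), challenge, hashlib.sha256).digest()


def verify_answer(password: str, challenge: bytes, answer: bytes) -> bool:
    expected = answer_challenge(password, challenge)
    return hmac.compare_digest(expected, answer)            # constant-time comparison


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 time-based one-time password (HMAC-SHA1 variant)."""
    counter = int(unix_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                                 # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


challenge = make_challenge()
reply = answer_challenge("my password", challenge)
print(verify_answer("my password", challenge, reply))      # True
print(totp(b"12345678901234567890", 59))                    # 287082 (RFC 6238 test secret)
```

A fresh challenge per login means a recorded reply is useless later, and the TOTP code adds a second factor that changes every 30 seconds.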
13. The latency budget
If you add everything up, you can see where the delay between “audio at the radio” and “audio in your headphones” comes from.
That is the network and codec budget, counted from the moment the audio is ready to be sent. For VRX audio, the channelizer block delay (~16 ms, see the VRX document) comes on top before the Opus encoding; for the Yaesu radios, the USB CODEC and CAT add their own delay.
This is why the choices in this document are what they are: UDP instead of TCP, short Opus frames, a minimal adaptive buffer, commands without slow acknowledgements, and the PTT flag riding along in the audio stream. Each follows the same rule: remove every delay you can.
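A rough way to add up the chain yourself is sketched below. Only the 20 ms frame and the ~16 ms VRX channelizer delay come from the text; the network delay and buffer depth in the example are illustrative inputs, not measurements, and codec processing time is left as an optional parameter.

```python
FRAME_MS = 20.0            # Opus chunk size (from the text)
VRX_CHANNELIZER_MS = 16.0  # approximate channelizer block delay for VRX audio (from the text)


def audio_latency_ms(network_one_way_ms: float, buffer_packets: int,
                     codec_extra_ms: float = 0.0, vrx: bool = False) -> float:
    """Rough one-way audio delay; the arguments are values you measure or choose."""
    total = FRAME_MS                       # a 20 ms chunk must be complete before it is sent
    total += codec_extra_ms                # encoder/decoder processing, if known
    total += network_one_way_ms            # time on the wire
    total += buffer_packets * FRAME_MS     # time spent waiting in the jitter buffer
    if vrx:
        total += VRX_CHANNELIZER_MS
    return total


print(audio_latency_ms(30.0, 2))            # 90.0 ms with these illustrative inputs
print(audio_latency_ms(30.0, 2, vrx=True))  # 106.0 ms
```

Written out like this, it is clear why every saved buffer slot matters: each one is a full 20 ms frame.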
In closing — everything together
The server packs the audio (compressed with Opus, in chunks of 20 ms), the spectrum (bins, compactly coded) and the S-meter into numbered packets, and sends them over one UDP connection on port 4580 to the client. Your controls travel back as small command packets; the server reports the state continuously, so client and radio stay in sync. A minimal jitter buffer turns the bumpy arrival back into smooth audio, and everywhere the same thing comes first: as little delay as possible.
Together with the sister document “How a VRX works”, this covers the whole chain: from radio wave, via demodulation, to the audio and picture on your screen, wherever you are.