From server to client — how ThetisLink gets everything across the network

The big idea

The server sits at home, right next to the radio and Thetis. The client (on your laptop, PC or Android phone) can be anywhere — in the living room, at the office, on the other side of the world. Between those two runs a single thin network connection. Everything has to go over it: the audio from the receiver, the spectrum and waterfall display, the signal strength, and every control you press.

The big challenge is delay (latency). When you press the transmit key or turn the tuning knob, it has to happen now — not half a second later. That's why the whole of ThetisLink is built around one principle:

Design priority. Latency > bandwidth > features. As little delay as possible in the audio and the transmit key comes first. Anything that adds unnecessary delay — extra buffers, queues, acknowledgements back and forth — is avoided.

Figure 1 — The chain. The radio and Thetis produce the audio and the spectrum; the TL2 server packs it up and sends it over the network; the client plays it back and sends your controls back.

This document is about the right-hand part: the TL2 server → the network → the client. How the audio comes into being (the demodulation, the FFT channelizer) is covered in the sister document “How a VRX works”.

1. Two kinds of traffic

Everything that goes over the line falls into two groups, and they have very different requirements:

Continuous streams — the audio, the spectrum, the S-meter. These arrive dozens of times per second, endlessly. Here what counts is: on time and smooth. A packet that arrives too late is worthless: you no longer want to hear the audio from 100 ms ago, you'd rather move on.
One-off commands — setting a frequency, choosing a mode, sliding the volume, the transmit key. These only arrive when you do something. Here what counts is: reacting right away.

Both groups travel over the same connection, but how they travel determines whether the system feels fast and smooth. That brings us to the most important choice.

2. Why UDP and not TCP

There are two ways to send data over the internet. A comparison with the postal service helps:

TCP is like registered mail: every packet is acknowledged, and if one gets lost, it is sent again and everything behind it waits until it has arrived. Guaranteed complete — but a single hiccup brings the whole stream to a standstill.
UDP is like ordinary postcards: you drop them in the mailbox and they (usually) arrive, but there is no acknowledgement and no resending. If one gets lost, it's simply gone — the rest travels on unhindered.

For live audio, that second one is exactly what you want. A lost 20 ms packet is, at most, a tiny click you might hear; waiting for it would be far worse. That's why ThetisLink uses UDP — for all traffic, audio as well as commands.

But surely commands mustn't get lost? Correct. There's a different trick for that: the server continuously sends back the current state (the frequency, the mode…). If the reported value doesn't change after you press a button, the client just sends the command again. There is no slow per-packet acknowledgement, yet the reported-back state still gives certainty.

In ThetisLink. A single UDP socket on port 4580 carries everything: audio, spectrum, S-meter, settings and the transmit key. No separate TCP connection. The server does enlarge its UDP buffers, because the Windows default is too small for the large spectrum packets.
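As a rough illustration of the socket setup described above, here is a minimal Python sketch. Only port 4580 comes from the text; the function name `open_socket`, the 4 MiB buffer size and the bind address are assumptions, and the operating system may grant a different buffer size than the one requested.

```python
import socket

THETISLINK_PORT = 4580            # port named in the text
WANTED_BUFFER = 4 * 1024 * 1024   # assumed size; the text gives no figure


def open_socket(port: int = THETISLINK_PORT, host: str = "0.0.0.0") -> socket.socket:
    """Open one UDP socket and ask the OS for larger send/receive buffers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # These are requests: the OS may cap or round the value it really uses.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WANTED_BUFFER)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WANTED_BUFFER)
    sock.bind((host, port))
    return sock


def granted_receive_buffer(sock: socket.socket) -> int:
    """Read back the receive buffer size the OS actually applied."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

Reading the value back with `granted_receive_buffer` is a cheap way to check whether the enlargement took effect on a given system.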

3. Finding the server (auto-discovery)

Before anything can travel, the client needs to know where the server is. That can happen in two ways:

Automatically (mDNS). The server calls out on the local network: “here I am, I'm a ThetisLink server.” The client listens and shows it in a list — you don't need to know any IP address. The same mechanism your phone uses to find a printer or Chromecast.
Manually. If the client is on a different network (over VPN, or across the internet), those announcements don't reach it, because mDNS stays within the local network. There you just type in the IP address and the port yourself.

In ThetisLink. Auto-discovery uses standard mDNS (service type _thetislink._udp); the server puts its version and a friendly name in the announcement. Outside the local network: manual address and port 4580.

4. One packet, one envelope

Because everything travels over the same socket, the receiver of each packet must be able to see immediately what it is. That's why every packet starts with a short, fixed header of 4 bytes — like the address on an envelope:

Figure 2 — Every envelope opens with a marker, the protocol number, the type (audio, spectrum, frequency, S-meter, …) and a few individual flags. After that follows the payload that belongs to that type.

The type is the core: from it the receiver sees whether this is a piece of RX1 audio, a spectrum row, a new frequency, an S-meter reading or a press of the transmit key. That way a single socket can carry dozens of different kinds of messages mixed together without them getting confused.

Analogy. One mailbox, but every letter has a clear label in the top left: “AUDIO”, “PICTURE”, “BUTTON”. The mail sorter doesn't even have to open the letter to know where it should go.
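To make the envelope concrete, here is a small Python sketch of packing and reading such a header. The text names four fields in 4 bytes; one byte per field, the field order, and the values of `MARKER`, `PROTOCOL_VERSION` and the type code are all assumptions made for illustration, not the real ThetisLink wire format.

```python
import struct

# Assumed layout: marker, protocol number, type, flags (1 byte each).
HEADER = struct.Struct("!BBBB")

MARKER = 0x54            # hypothetical marker byte
PROTOCOL_VERSION = 2     # hypothetical protocol number
TYPE_AUDIO_RX1 = 0x01    # hypothetical type code


def pack_packet(packet_type: int, flags: int, payload: bytes) -> bytes:
    """Put the 4-byte envelope in front of the payload."""
    return HEADER.pack(MARKER, PROTOCOL_VERSION, packet_type, flags) + payload


def parse_packet(data: bytes) -> tuple[int, int, bytes]:
    """Return (type, flags, payload); reject anything that is not ours."""
    if len(data) < HEADER.size:
        raise ValueError("packet shorter than the header")
    marker, version, packet_type, flags = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("wrong marker")
    if version != PROTOCOL_VERSION:
        raise ValueError("unsupported protocol number")
    return packet_type, flags, data[HEADER.size:]
```

The receiver only has to look at the first four bytes to decide where the rest goes, which is the "label in the top left" from the analogy.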

5. Audio: Opus over UDP

The demodulated audio (from the VRX chain, or directly from Thetis or a Yaesu) is a stream of audio samples. Sending those uncompressed would waste bandwidth. That's why it is first compressed with the Opus codec, a modern, highly efficient audio codec that is especially good at low latency.

The audio is cut into chunks of 20 ms; each chunk is coded separately as an Opus block and put into one UDP packet. Besides the Opus bytes themselves, that packet contains two important numbers:

a sequence number — packet 1, 2, 3, … so the client knows the correct order;
a timestamp — when this chunk is supposed to play.

Figure 3 — An audio packet: the standard header, a sequence number, a timestamp, the length, and then the Opus block with about 20 ms of audio. (For a VRX channel there is also a small channel number included, so VRX1 and VRX2 can be told apart.)
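The audio packet from Figure 3 could be sketched like this in Python. The field order follows the caption; the field widths (32-bit sequence number, 32-bit timestamp in milliseconds, 16-bit length) and the names `AudioPacket`, `encode_audio` and `decode_audio` are assumptions for illustration. The 4-byte header is treated as opaque bytes here, and the VRX channel number is left out.

```python
import struct
from dataclasses import dataclass

HEADER_SIZE = 4                        # the fixed envelope from chapter 4
AUDIO_FIELDS = struct.Struct("!IIH")   # assumed widths: seq, timestamp, length


@dataclass
class AudioPacket:
    sequence: int
    timestamp_ms: int
    opus: bytes


def encode_audio(header: bytes, packet: AudioPacket) -> bytes:
    """Header, then sequence number, timestamp, length, then the Opus block."""
    if len(header) != HEADER_SIZE:
        raise ValueError("header must be 4 bytes")
    fields = AUDIO_FIELDS.pack(
        packet.sequence & 0xFFFFFFFF,      # wrap to 32 bits
        packet.timestamp_ms & 0xFFFFFFFF,
        len(packet.opus),
    )
    return header + fields + packet.opus


def decode_audio(data: bytes) -> AudioPacket:
    """Read the fields back; refuse a packet whose Opus block is cut short."""
    sequence, timestamp_ms, length = AUDIO_FIELDS.unpack_from(data, HEADER_SIZE)
    start = HEADER_SIZE + AUDIO_FIELDS.size
    opus = data[start:start + length]
    if len(opus) != length:
        raise ValueError("truncated Opus block")
    return AudioPacket(sequence, timestamp_ms, opus)
```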

Narrow or wide — the same choice as in the VRX chain

Just as with the iFFT choice (see the VRX document), the audio can be narrowband or wideband:

Narrowband: 8 kHz audio, ~12.8 kbit/s — fine for speech, very little data.
Wideband: 16 kHz audio, ~20–24 kbit/s (depending on the stream) — distinctly clearer, roughly double the data.

Which of the two is used is decided by the client with a single switch; the server encodes accordingly. (On the TX side, when you transmit yourself, wideband is always used for the best microphone quality.)
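A quick calculation shows what these bitrates mean per packet. The sketch below uses only the 20 ms frame length and the approximate bitrates from the text; the function names are made up for illustration. Because the bitrates are averages, real Opus blocks will vary in size around these figures.

```python
FRAME_MS = 20  # frame length from the text


def packets_per_second(frame_ms: int = FRAME_MS) -> int:
    """One packet per frame."""
    return 1000 // frame_ms


def opus_bytes_per_frame(bitrate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Average Opus payload per frame: bits per second * seconds / 8."""
    return bitrate_bps * frame_ms // 8000


narrowband = opus_bytes_per_frame(12_800)   # ~12.8 kbit/s
wideband_low = opus_bytes_per_frame(20_000)  # ~20 kbit/s
wideband_high = opus_bytes_per_frame(24_000)  # ~24 kbit/s
```

So a narrowband frame averages about 32 bytes of Opus data and a wideband frame about 50 to 60 bytes, sent 50 times per second.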

In ThetisLink. Opus, 20 ms frames. For the regular Thetis RX the codec is tuned for speech (with error correction/FEC and silence suppression/DTX, so that a lost packet is partly recoverable and silence costs almost no data). For the VRX channels, by contrast, silence suppression is off (and there is no FEC): that's continuous reception, not a phone call, and you also want to hear the noise in between. The Yaesu audio has its own settings for narrowband and wideband.

6. Many audio streams over one line

ThetisLink can play more than one thing at a time: the main receiver (RX1), the second receiver (RX2/VFO-B), the virtual receivers VRX1 and VRX2, and the audio of a connected Yaesu radio. All those streams travel through the same socket. How do they stay apart?

Very simple: each kind of stream has its own packet type (and each its own sequence-number counter). From the type label the client immediately sees which audio channel a packet belongs to and sends it to the right playback buffer.

Figure 4 — All audio channels go through the same socket and are pulled apart again on the receiving side by their type label, each to its own playback buffer.

In addition, there is a “bundled” variant that puts several channels into one packet with the same sequence number and timestamp — handy when channels must run exactly in step (for example a stereo left/right image).

7. The jitter buffer

Packets don't all travel equally fast over the network. One takes 8 ms, the next 21 ms, sometimes one arrives before its predecessor. That varying delay is called jitter. If you played each packet right away as soon as it came in, it would sound jerky and jumbled.

The solution is a small jitter buffer on the client side: a little waiting room for a handful of packets. Incoming packets are placed there in the right spot by their sequence number; playback runs through it at a fixed, steady tick. That way a bumpy arrival becomes a smooth playback.

Figure 5 — Packets arrive unevenly and sometimes swapped (4 before 3). The buffer sorts them by sequence number and passes them on evenly. If one is missing, the codec briefly fills that gap instead of stopping.

This is exactly the trade-off from the beginning: a deeper buffer is smoother but slower; a shallow buffer is faster but more sensitive to dropouts. ThetisLink therefore keeps the buffer as small as possible and adjusts the depth automatically: if it measures little jitter, the buffer shrinks (less delay); if the network gets erratic, it grows just enough to prevent hiccups.

In ThetisLink. The buffer measures the jitter continuously (a moving average, with a fast reaction to peaks) and aims for the smallest depth that still plays smoothly: usually just a few frames, so several tens of milliseconds. Opus bridges the gap left by a lost packet with packet loss concealment (PLC); on the speech RX, where FEC is on, part of it can even be truly recovered. If the buffer does run empty, playback is paused for just a moment and refilled instead of starting to stutter.

8. The spectrum across the line

The spectrum and waterfall display is a row of bins (see the VRX document): per bin a value that says how much energy is there. Such a row can be large — thousands to tens of thousands of bins. Sending all of it fully and at full precision every time would cost a lot of bandwidth, so here the second priority comes into play: bandwidth.

Two knobs keep that in check:

How many bytes per bin. The current server sends 1 byte per bin (256 levels) — plenty for a smooth picture. The protocol can also describe 2 bytes per bin for more dynamic range, but that is reserved for the future.
How many times per second. The picture is refreshed by default about 15 times per second; that is adjustable (calmer = less data, faster = smoother).

Each spectrum packet contains, besides the bins, also the center frequency, the width (span) and a reference level, so the client can draw the picture at the right place and scale. And just as with audio, every source — RX1, RX2, VRX1, VRX2 — has its own packet type.
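The two knobs translate directly into a data rate, and the 1-byte coding is a simple scaling. The Python sketch below illustrates both. The 1 byte per bin and 15 frames per second come from the text; the dB range of -140 to -20, the linear mapping and the function names are assumptions for illustration.

```python
def quantize_bins(levels_db, floor_db=-140.0, ceiling_db=-20.0) -> bytes:
    """Map each dB level onto 0..255 (256 levels, 1 byte per bin)."""
    span = ceiling_db - floor_db
    out = bytearray()
    for level in levels_db:
        clipped = min(max(level, floor_db), ceiling_db)
        out.append(round((clipped - floor_db) / span * 255))
    return bytes(out)


def spectrum_kbit_per_s(bins: int, bytes_per_bin: int = 1, frames_per_s: int = 15) -> float:
    """Payload rate of the bins alone, without headers."""
    return bins * bytes_per_bin * frames_per_s * 8 / 1000
```

For example, a row of 4096 bins at the default settings comes to about 492 kbit/s, many times the audio stream, which is why both knobs matter.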

In ThetisLink. Zooming and panning happen largely on the server: the client asks “I want this little piece, at this resolution”, and the server sends only that cut-out piece instead of always the whole wide picture. That saves a lot of data without giving up detail where you're looking.
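Server-side zooming could look roughly like the following Python sketch: take the requested slice of bins and reduce it to the requested number of points. Keeping the peak of each group is an assumption chosen so that narrow signals stay visible; the text does not say how the server reduces the data, and the name `cut_out` is made up.

```python
def cut_out(bins, start: int, stop: int, width: int) -> list:
    """Return at most `width` values covering bins[start:stop]."""
    window = bins[start:stop]
    if not window or width <= 0:
        return []
    if len(window) <= width:
        return list(window)        # already small enough: send as is
    out = []
    for i in range(width):
        lo = i * len(window) // width
        hi = (i + 1) * len(window) // width
        out.append(max(window[lo:hi]))   # assumed: keep the peak of each group
    return out
```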

9. The S-meter

The signal strength comes in along three different routes, depending on the source:

Main receiver (RX1/RX2): the server sends a ready-made value — the power in dBm — in a compact packet of a few bytes. Depending on the chosen meter source, that is an average or peak from Thetis, or a value derived from the spectrum bins.
VRX channels: for these the server sends no separate S-meter. The client computes the strength itself from the received spectrum: it sums the power of the bins within the passband and converts that, with a fixed calibration correction, to dBm (see the VRX document, chapter on the S-meter).
Yaesu radio: the meter value is baked into the radio's status messages and comes directly via CAT from the set (its own scale, no dBm).

The scale (the familiar S-units: S9 = −73 dBm, each S-unit 6 dB) is always drawn in the client. The server delivers the raw number; the client makes the needle out of it.
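Both halves of this chapter fit in a short Python sketch: summing bin power to a dBm value (as the client does for the VRX channels) and turning dBm into S-units. The S9 = -73 dBm and 6 dB per S-unit figures come from the text; the assumption that bins are already expressed in dB, the default calibration of 0 dB and the function names are illustrative only.

```python
import math


def passband_dbm(bins_db, calibration_db: float = 0.0) -> float:
    """Sum the power of the bins in the passband, then convert back to dB."""
    total = sum(10 ** (db / 10) for db in bins_db)   # dB -> linear power
    if total <= 0:
        return float("-inf")
    return 10 * math.log10(total) + calibration_db


def dbm_to_s_meter(dbm: float) -> tuple[float, float]:
    """Return (S-units, dB over S9) with S9 = -73 dBm and 6 dB per S-unit."""
    if dbm >= -73:
        return 9, dbm + 73
    return max(0.0, 9 + (dbm + 73) / 6), 0.0
```

Two equal bins at -80 dB add up to about -77 dBm (3 dB more than one bin), and -53 dBm reads as S9 + 20 dB.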

In ThetisLink. So who computes the value differs per source, but the drawing of the S-meter is the same everywhere in the client — that way every meter looks the same.

10. Settings and controls

When you turn the tuning knob or choose a mode, the client sends a small command packet: usually no more than an identifier (“this is the volume”, “this is the mode”, “this is the filter width”) plus a value. Frequency and mode even have their own packet type. That's all — a few bytes, sent right away.

As chapter 2 already showed, no separate acknowledgement comes back. Instead, the server continuously reports back the current state. That reported-back value is your confirmation: if you see the frequency change along with it, the command has arrived. That way the client always stays in sync with what the radio is really doing, even if someone turns the knob on the radio itself.

Does the client request the whole state on connect? No. There is no big “give-me-everything” exchange. The server broadcasts its state continuously anyway, so within a second of connecting the client automatically has the frequency, mode, S-meter, spectrum and device status.

In ThetisLink. Commands are “fire-and-forget” (one id + value). The reported-back state serves as confirmation; if it doesn't match after a press, the client can quietly send it again.
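The fire-and-forget pattern with state as confirmation could be sketched like this in Python. The class name `CommandTracker`, the 0.3 s retry delay and the callback shape are assumptions for illustration; a real client would probably also give up after a few tries so that it does not fight someone turning the knob on the radio.

```python
class CommandTracker:
    """Send once, then let the reported state decide whether to resend."""

    def __init__(self, send, retry_after_s: float = 0.3) -> None:
        self.send = send                     # callable(control_id, value)
        self.retry_after_s = retry_after_s   # assumed delay before a resend
        self.pending: dict = {}              # control id -> (value, time sent)

    def set(self, control_id, value, now: float) -> None:
        self.send(control_id, value)
        self.pending[control_id] = (value, now)

    def on_state(self, control_id, reported, now: float) -> None:
        """Call for every state value the server reports back."""
        wanted = self.pending.get(control_id)
        if wanted is None:
            return
        value, sent_at = wanted
        if reported == value:
            del self.pending[control_id]     # confirmed by the reported state
        elif now - sent_at >= self.retry_after_s:
            self.send(control_id, value)     # quietly try again
            self.pending[control_id] = (value, now)
```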

11. The transmit key (PTT)

The transmit key is the most time-critical signal of all: here every millisecond counts. That's why there is no separate, slow button message. Instead, the client sets a flag in the TX audio packets: as soon as you press the key, your outgoing audio packets carry the PTT flag, and the server switches the radio to transmit. The press and your speech thus travel in one and the same stream, so neither has to wait for the other.

Because several clients can be connected at the same time, the server makes sure that only one at a time transmits. If a second client wants to transmit while the first is already busy, it gets a short “denied” message back instead of them talking over each other.

Safety: the transmitter cannot stay keyed by accident

For remote operation this is crucial: a transmitter that accidentally gets “stuck” (keeps transmitting) is dangerous — for your equipment, for the band, and for your licence. That is why the transmit key is deliberately not an on/off latch, but a flag that the client re-sends and re-confirms about 50 times per second. The transmitter therefore stays keyed only as long as that stream of “PTT-on” packets keeps arriving.

If the connection drops while you are transmitting — Wi-Fi hiccups, the laptop goes to sleep, the app crashes — that stream stops by itself, and two independent safety nets take over:

Packet timeout (~0.5 s): if no packets arrive for half a second, the server releases the transmitter automatically.
Heartbeat timeout (~2 s): if the “heartbeat” that keeps the connection alive is also absent for more than two seconds, the server treats the connection as lost — PTT released, plus an alarm.

So the worst a network failure can do is return your transmitter to receive, typically within about half a second and at most after about two seconds. It can never leave the transmitter keyed indefinitely. This is the classic dead-man's switch principle.

In ThetisLink. For the Thetis transmitter, PTT is a flag in the TX audio packets, not a separate handshake. The server arranges the exclusivity (one transmitter at a time) and reports a collision with a PTT-denied message; during transmit the S-meter also switches from reception strength (dBm) to output power (watts). Two safety timers catch a dropped connection: after ~500 ms without packets, or ~2 s without a heartbeat, TX falls back to RX automatically (with an alarm). For a connected Yaesu radio the transmit key does not run via the audio flag but as a separate command, which the server sends to the set as a CAT command.

12. Connecting and securing

When connecting, client and server first briefly exchange what they can both do. The client states which capabilities it supports (wideband audio, spectrum, a second receiver…), and the server answers with the intersection: only what they can both handle is used. That way a newer side automatically turns off its extras toward an older one, within the same protocol version. A completely different protocol version is rejected; server and clients must then be updated together.
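If the capabilities are thought of as bit flags, the intersection is a single AND operation. The Python sketch below illustrates that idea; the flag names and values, the `negotiate` function and the use of bit flags at all are assumptions, since the text does not describe how the capabilities are encoded.

```python
from enum import IntFlag


class Capability(IntFlag):
    # Hypothetical flags, named after the examples in the text.
    WIDEBAND_AUDIO = 1
    SPECTRUM = 2
    SECOND_RECEIVER = 4


def negotiate(client_version: int, client_caps: Capability,
              server_version: int, server_caps: Capability) -> Capability:
    """Reject a different protocol version, otherwise keep the common set."""
    if client_version != server_version:
        raise ValueError("protocol version mismatch: update server and clients together")
    return client_caps & server_caps
```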

If the server is open on the internet, you can set a password. The password itself never goes over the line: the server sends a random “challenge”, the client answers it with a computed answer based on the password (HMAC), and optionally there is also a time code (TOTP, like an authenticator app) on top.
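A challenge-response of this kind can be sketched with Python's standard library. The text only says that HMAC is used; the choice of SHA-256, the 16-byte challenge and the function names below are assumptions, and the optional TOTP step is left out.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: a fresh random challenge for every login attempt."""
    return secrets.token_bytes(16)


def answer(password: str, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode("utf-8"), challenge, hashlib.sha256).digest()


def verify(password: str, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the answer and compare in constant time."""
    return hmac.compare_digest(answer(password, challenge), response)
```

Because the challenge is different every time, a recorded answer is useless for a later login.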

In ThetisLink. The capability exchange runs along in the heartbeat message that also keeps the connection alive. Authentication is optional and only needed if the server is reachable from outside.

13. The latency budget

If you add everything up, you can see where the delay between “audio at the radio” and “audio in your headphones” comes from:

Figure 6 — The fixed chunks (one Opus frame, the jitter buffer, the playback) are each only a few tens of milliseconds. The network journey itself is the only part that ThetisLink does not control — that's why all the rest is kept as tight as possible.

This is the network and codec budget, from the moment the audio is ready to be sent. For VRX audio there is, before the Opus coding, also the channelizer block delay on top (~16 ms, see the VRX document); for the Yaesu radios the USB CODEC and CAT come into play.
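Adding up the budget is simple arithmetic, sketched below in Python. The 20 ms Opus frame and the ~16 ms channelizer delay come from the text; the network, jitter buffer depth and playback figures are inputs you would have to measure, and the example values in the note after the code are purely illustrative.

```python
FRAME_MS = 20          # one Opus frame, from the text
CHANNELIZER_MS = 16    # VRX channelizer block delay, from the text


def audio_latency_ms(network_ms: int, jitter_frames: int,
                     playback_ms: int, vrx: bool = False) -> int:
    """Sum of the parts between 'audio ready' and 'audio in your headphones'."""
    total = FRAME_MS                      # collecting one frame before coding
    total += network_ms                   # the only part ThetisLink cannot control
    total += jitter_frames * FRAME_MS     # depth of the jitter buffer
    total += playback_ms                  # sound card output buffer
    if vrx:
        total += CHANNELIZER_MS
    return total
```

With, say, 15 ms of network delay, a buffer of three frames and 10 ms of playback buffering, `audio_latency_ms(15, 3, 10)` comes to 105 ms, or 121 ms for a VRX channel.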

This is why the choices in this document are the way they are: UDP instead of TCP, short Opus frames, a minimal adaptive buffer, commands without slow acknowledgement, and the PTT flag carried in the audio stream. Each follows the same rule: remove the delay you can remove.

In closing — everything together

The server packs the audio (compressed with Opus, in chunks of 20 ms), the spectrum (bins, compactly coded) and the S-meter into numbered packets, and sends them over one UDP connection on port 4580 to the client. Your controls travel back as small command packets; the server reports the state continuously, so client and radio stay in sync. A minimal jitter buffer turns the bumpy arrival back into smooth audio, and everywhere the same thing comes first: as little delay as possible.

Together with the sister document “How a VRX works”, that covers the whole chain: from radio wave, via demodulation, to the audio and picture on your screen, wherever you are.