The big idea
A VRX (virtual receiver) turns one wide stream of radio data into, in principle, as many independent receivers as you like. The SDR digitises an entire band in one go. The server holds on to that wide stream and “cuts” out a narrow slice of its own for each listener — each on its own frequency, mode and bandwidth. Nobody gets in anybody else's way.
1. From radio wave to numbers (IQ)
A modern SDR (Software Defined Radio) converts a wide chunk of the band directly into numbers. At each moment it measures two values, called I and Q. Together these describe not only how strong the signal is, but also its phase — and it is precisely that phase we will need later on to recognise frequencies and modulation.
2. The wide, unfiltered signal
It starts with an unfiltered time signal, and it is wide: in our case up to 1.536 MHz. That is a fast stream of I/Q pairs (just over one and a half million per second) containing everything there is to hear on that piece of band: all stations jumbled together, strong and weak, next to and on top of each other. At this point nothing has been filtered out yet; it is the complete, wide “photo” of the band.
Our goal: to later pull out one narrow radio channel from this (for example an SSB signal of 3 kHz), neatly filtered, ready to listen to. To do that we first need to know where the frequencies are — and that is exactly what the FFT does.
3. The FFT — time becomes frequency
The wide signal varies over time. But we want to know which frequencies are in it, in other words where the stations are. To find out, the whole signal goes through an FFT, which converts the time domain into the frequency domain (exactly the picture a spectrum analyser shows).
FFT stands for Fast Fourier Transform. The Fourier transform is a mathematical way to convert from the time domain to the frequency domain. It is a change of presentation: the same information, displayed differently. Nothing is lost and nothing is added; the signal “in time” and the same signal “in frequency” contain exactly the same information.
Because it is a computation, the conversion takes some processing time, and you first need a whole block of samples before you can compute anything. That is the practical limitation: it adds a little delay (latency).
[Figure: One and the same signal, in time and in frequency. Left: a few familiar signals as they look in time; right: the same signals in the frequency domain.]
The output of the FFT is a row of bins (slots), each covering a narrow frequency range. That is what the next section is about.
For the enthusiast: the formulas of the FFT and the iFFT
The FFT is a fast way to compute the discrete Fourier transform (DFT). For a block of N complex samples x[0]…x[N−1] the forward path (time → frequency) is:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, \dots, N-1$$

and the reverse path (frequency → time) is almost the same sum, with a plus sign in the exponent and a division by N:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{+j 2\pi k n / N}, \qquad n = 0, \dots, N-1$$

Each X[k] is a complex number (a hand, §1): the length is how much of frequency k is present, the angle is the phase. Bin k belongs to frequency $f_k = k \cdot F_s / N$ (with $F_s$ the sample rate); bins above N/2 count as negative frequencies (§6).
The core is the term $e^{-j 2\pi k n / N}$: via Euler, $e^{j\theta} = \cos\theta + j\sin\theta$, that is precisely a rotating hand with speed k. The FFT in effect compares the signal against all those rotation speeds at once and adds up where they “keep pace”. The fast in FFT is a clever computation scheme that does this in N·log N steps instead of N², most efficient when N is a power of 2 (§8).
In this chain the forward FFT runs at the wide block size $N_{fft}$ (after multiplication with the Hann window, §5); after selecting/zeroing bins (§7, §9) the reverse iFFT runs at the smaller size $N_{ifft}$ (128 or 256).
One warning if you put this next to the source code: the textbook iFFT above divides by N, but the implementation scales by a fixed factor $1/N_{fft}$ (not $1/N_{ifft}$). The final level is set by the AGC anyway, so that exact scale factor is not critical, but do not blindly assume 1/N for it.
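The two formulas translate almost literally into code. The sketch below is a direct O(N²) DFT and inverse DFT in plain Python (no FFT speed-up), using the textbook 1/N scaling rather than the fixed 1/N_fft scale mentioned above. `dft`, `idft` and `bin_frequency` are illustrative names, not functions from the ThetisLink source.

```python
import cmath
import math

def dft(x):
    """Forward DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N). O(N^2), for illustration only."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def idft(spectrum):
    """Inverse DFT with textbook scaling: x[n] = (1/N) * sum_k X[k] * exp(+j*2*pi*k*n/N)."""
    n_pts = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]

def bin_frequency(k, n_pts, fs):
    """Centre frequency of bin k: f_k = k * Fs / N; bins above N/2 are negative frequencies."""
    if k > n_pts // 2:
        k -= n_pts
    return k * fs / n_pts

# A complex tone (a "rotating hand") that turns 3 times per 16-sample block.
N, FS = 16, 1000.0
tone = [cmath.exp(2j * math.pi * 3 * n / N) for n in range(N)]
spectrum = dft(tone)
peak = max(range(N), key=lambda k: abs(spectrum[k]))
print(peak, bin_frequency(peak, N, FS))          # 3 187.5
roundtrip_ok = all(abs(a - b) < 1e-9 for a, b in zip(idft(spectrum), tone))
```

Forward then inverse returns the original samples: nothing is lost, nothing is added.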
4. The spectrum in small slices (bins)
Because we will later want to filter sharply, we chop the whole wide band into very small frequency slices. Each slice is called a bin. In our case one bin is 62.5 Hz wide. The smaller the bin, the more sharply we can cut later.
The bin size follows directly from the FFT size: bin = bandwidth / number of FFT points.
To get that 62.5 Hz over a band of 1.536 MHz, we therefore need 1,536,000 / 62.5 = 24,576 FFT points.
That is therefore a large FFT. (With a narrower band — for example 384 kHz — it is correspondingly smaller: 384,000 / 62.5 = 6,144 points.) The bin size deliberately always stays 62.5 Hz, regardless of the bandwidth, so that the rest of the arithmetic stays simple and consistent.
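The same arithmetic as a small helper, assuming complex I/Q sampling (so the sample rate equals the bandwidth). It also computes how long it takes to collect one block, N / Fs, which always equals 1 / bin width. That link between bin size and latency comes back in §18 and in the closing section. The function names are illustrative.

```python
def fft_size(bandwidth_hz, bin_hz=62.5):
    """FFT points needed for a given bin width (complex I/Q: sample rate equals bandwidth)."""
    points = bandwidth_hz / bin_hz
    if points != int(points):
        raise ValueError("bandwidth must be a whole number of bins")
    return int(points)

def block_duration_ms(n_fft, sample_rate_hz):
    """Time to collect one block: N / Fs, which always equals 1 / bin width."""
    return 1000 * n_fft / sample_rate_hz

for bandwidth in (1_536_000, 384_000):
    n = fft_size(bandwidth)
    print(f"{bandwidth} Hz -> {n} points, block of {block_duration_ms(n, bandwidth):.1f} ms")
```

Both bandwidths give a 16 ms block: a fixed bin width means a fixed block time, whatever the FFT size.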
5. Windowing — softening the edges
The FFT works on a block of samples that we cut out hard. As a result, at the start the signal jumps abruptly from 0 to its first value, and at the end it drops just as abruptly back to 0. Those are two steps.
And you already saw in §3 what a step means according to Fourier: a broad spread of harmonics (think of the step and the square in figure 2). Precisely those artificial harmonics, caused by the abrupt edges and not by the signal itself, smear energy into bins where it does not belong. That is spectral leakage. The larger the value at which the block starts or ends, the larger the step and the worse the leakage.
The solution: multiply the block by a soft curve that smoothly runs to 0 at both edges (a Hann window). Then there is no longer an abrupt step at start and end → far fewer artificial harmonics → a much cleaner spectrum. (It does not become completely leak-free — the window itself also has a shape — but the leakage drops drastically.)
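A small experiment makes the difference visible. This plain-Python sketch assumes a 64-point block and the periodic Hann variant (which Hann variant the chain uses is not stated here). It places a tone exactly halfway between two bins, the worst case, and compares the level ten and a half bins away with a hard cut and with the Hann window.

```python
import cmath
import math

def hann(n_pts):
    """Periodic Hann window: starts at 0, rises smoothly to 1 in the middle, and falls back."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_pts) for n in range(n_pts)]

def spectrum_db(x):
    """Magnitude of each DFT bin in dB relative to the strongest bin."""
    n_pts = len(x)
    mags = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts)))
            for k in range(n_pts)]
    peak = max(mags)
    return [20 * math.log10(m / peak) if m > 0 else -300.0 for m in mags]

N = 64
# A tone exactly halfway between bins 10 and 11: the worst case for leakage.
tone = [cmath.exp(2j * math.pi * 10.5 * n / N) for n in range(N)]
rect_db = spectrum_db(tone)                                    # hard cut, no window
hann_db = spectrum_db([s * w for s, w in zip(tone, hann(N))])  # soft edges
# Leakage level 10.5 bins away from the tone (bin 21):
print(f"rectangular: {rect_db[21]:.1f} dB   Hann: {hann_db[21]:.1f} dB")
```

With the hard cut the far-away bin still catches a clearly visible amount of energy; with the Hann window it drops by a few tens of dB.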
For the enthusiast: why the spectrum display overlaps so much
The window tapers the block edges to zero, so an event that lands right at an edge barely counts in that block. By computing the FFT 8× as often, (a) every piece of signal lands in at least one block near the centre — full weighting, nothing is structurally lost — and (b) you get many more waterfall rows per second: a smoother image and finer time resolution, so short events (CW dots, a lightning impulse, a deep fade) aren't missed. Unlike the channelizer, the spectrum display doesn't reconstruct audio, so that extra overlap is purely for the image; the only cost is a little more computation.
6. One complex FFT (I and Q)
Our time signal is complex: it consists of I and Q together (I + jQ, §1).
That is why we use a complex FFT — which processes I and Q together in one operation.
And whether you implement it as two ordinary (real) FFTs — one for I and one for Q — or as one complex FFT, makes no difference to the final result: the outcome is exactly the same, provided you combine the two outcomes correctly (the I outcome plus j × the Q outcome). Only the implementation differs; the complex FFT is simply the natural, compact form for a complex (I/Q) signal.
Why we need both I and Q in the first place: only with both can the FFT tell positive and negative frequencies apart — see the frequencies above and below the tuning point separately. With strength only (one real number) the spectrum becomes symmetric and you cannot distinguish left and right of the centre; for an SDR that is useless, because there are different stations there.
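Both claims are easy to check numerically. In this sketch (plain Python, naive DFT, toy size N = 16) a tone at −3 bins is built from I and Q. One complex DFT of I + jQ equals DFT(I) + j·DFT(Q), and only the complex version tells −3 apart from +3.

```python
import cmath
import math

def dft(x):
    """Naive DFT, enough for a 16-point demonstration."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

N = 16
i_part = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
q_part = [-math.sin(2 * math.pi * 3 * n / N) for n in range(N)]  # I + jQ = exp(-j*2*pi*3*n/N)

# One complex DFT of I + jQ ...
combined = dft([i + 1j * q for i, q in zip(i_part, q_part)])
# ... equals two real DFTs combined as DFT(I) + j * DFT(Q).
separate = [a + 1j * b for a, b in zip(dft(i_part), dft(q_part))]
assert all(abs(c - s) < 1e-9 for c, s in zip(combined, separate))

# With I and Q together the tone lands only in bin N - 3, i.e. at -3 bins (left of centre).
print([round(abs(v)) for v in combined])
# With I alone (a real signal) it shows up at both +3 and -3: the spectrum is symmetric.
real_only = dft(i_part)
print([round(abs(v)) for v in real_only])
```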
7. You only want a small slice
After the FFT we have very many small slices of spectrum (bins). But you are only interested in a very small part: one radio channel. An SSB signal, for example, is about 3 kHz wide — that is only a handful of bins (3000 / 62.5 ≈ 48 bins) out of the many thousands.
We want to get rid of all the other bins. How we turn those few wanted bins back into audible sound is the subject of the next section.
8. The iFFT — back to sound
You cannot hear frequencies; we have to go back to a signal in time. That is what the inverse FFT (iFFT) does: precisely the reverse operation of the FFT. We take only the bins that we want to hear and put them into an iFFT.
The iFFT is most efficient when the number of points is a power of 2 (32, 64, 128, 256, …). So: take the number of bins you want to hear and round it up to the next power of 2. That is the minimal idea; in practice ThetisLink, for practical reasons, chooses a slightly more generous fixed size (§9 explains why that is handy).
The result is the information of only that slice of frequency, converted back into a time signal: the familiar I/Q signal, neatly filtered to roughly the band you wanted to hear, and with a much lower sample rate (because we went from tens of thousands of bins back to a few hundred at most). The FFT and iFFT together therefore also do the decimation at the same time: bringing the high IQ rate down to a low audio rate.
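The sizing rule and the resulting decimation, as a sketch: round the wanted bin count up to a power of 2, and the output rate follows as (number of iFFT bins) × (bin width). The 24,576-point figure assumes the 1.536 MHz case from §4; the helper names are illustrative.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

def output_rate_hz(n_ifft, bin_hz=62.5):
    """An iFFT over n_ifft bins of bin_hz each yields n_ifft * bin_hz samples per second."""
    return n_ifft * bin_hz

wanted_bins = round(3000 / 62.5)                     # 48 bins for a ~3 kHz SSB channel
n_ifft = next_power_of_two(wanted_bins)              # rounded up to 64
print(wanted_bins, n_ifft, output_rate_hz(n_ifft))   # 48 64 4000.0
print("decimation factor:", 24_576 // n_ifft)        # from 24,576 bins down to 64
```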
9. In practice: fixed sample rate + zeros = filter
In practice we do it just a bit differently, for two reasons:
- We want a fixed, tidy sample rate that fits nicely with the further audio processing in the computer.
- We want to be able to vary the bandwidth quickly — for example by dragging the filter edge in the spectrum.
That is why we do not take exactly “the bins we want to hear”, but a fixed number of bins corresponding to, for example, 8 kHz. That is more spectrum than we are interested in, but the sample rate is now fixed at 8 kS/s. (8000 / 62.5 = 128 bins → an iFFT of 128 points, and that is conveniently a tidy power of 2.)
Next, in the iFFT input we set to zero all the frequencies we do not want to hear. What remains is a time signal neatly filtered to ~3 kHz, with a constant sample rate of 8 kS/s. Setting bins to zero is therefore the filter, and a razor-sharp one, because we cut exactly on bin boundaries.
Want to hear a wider or narrower slice? Then you simply fill in more or fewer zeros in the right place in the iFFT. More bins open = wider passband; more zeros = narrower. That way you change your bandwidth directly, without changing anything about the sample rate.
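A minimal sketch of §7 to §9 in plain Python. Assumptions, all for illustration: a toy 512-point wide FFT at 32 kS/s (so still 62.5 Hz bins), no Hann window and no overlap-add (§5, §14), a passband given as offsets in Hz from a centre bin, and the 1/N_fft output scaling mentioned in §3 so that a tone keeps its amplitude. `extract_channel` is a hypothetical name, not the ThetisLink function.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT (stand-in for the large FFT)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def extract_channel(wide_spectrum, centre_bin, n_ifft, lo_hz, hi_hz, bin_hz=62.5):
    """Copy n_ifft bins around centre_bin, zero those outside [lo_hz, hi_hz], inverse-transform."""
    n_fft = len(wide_spectrum)
    narrow = [0j] * n_ifft                        # zeros everywhere by default
    for offset in range(-(n_ifft // 2), n_ifft // 2):
        if lo_hz <= offset * bin_hz <= hi_hz:     # the bin mask: x1 inside, x0 outside
            # negative offsets wrap to the top half of the iFFT input (negative frequencies)
            narrow[offset % n_ifft] = wide_spectrum[(centre_bin + offset) % n_fft]
    # Small inverse DFT, scaled by 1/n_fft so a tone keeps its original amplitude.
    return [sum(narrow[k] * cmath.exp(2j * math.pi * k * n / n_ifft) for k in range(n_ifft)) / n_fft
            for n in range(n_ifft)]

N_FFT, CENTRE = 512, 100     # toy sizes: 512 points at 32 kS/s still gives 62.5 Hz bins
# Three tones at -1000, +1000 and +3000 Hz from the channel centre.
wide = [sum(cmath.exp(2j * math.pi * b * n / N_FFT) for b in (CENTRE - 16, CENTRE + 16, CENTRE + 48))
        for n in range(N_FFT)]
# USB passband 300-2700 Hz in a 128-bin iFFT: 128 * 62.5 Hz = 8 kS/s output.
audio_iq = extract_channel(dft(wide), CENTRE, 128, 300, 2700)
```

Only the +1000 Hz tone survives: the −1000 Hz tone (other sideband) and the +3000 Hz tone (outside the passband) land on zeroed bins. A different bandwidth is just a different `lo_hz`/`hi_hz`; the output rate stays 8 kS/s.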
10. Fine-tuning below the bin (the NCO)
One problem: bins lie on a fixed grid of 62.5 Hz. But an SSB station is almost never exactly on such a grid point — and 62.5 Hz off, a voice already sounds clearly too high or too low. So we must be able to tune more finely than the bin grid allows.
An NCO (numerically controlled oscillator) is in effect a small, perfectly pure tone that we multiply against our signal. As a result the whole signal shifts by exactly as many Hz as we want — smoothly, without clicks, and with arbitrarily fine precision.
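A minimal NCO sketch in plain Python, assuming the phase is carried over from one block to the next (that is what keeps the shift click-free at block boundaries). `nco_shift` is an illustrative name; a real implementation might use a lookup table or a recursive oscillator instead of calling `cmath.exp` for every sample.

```python
import cmath
import math

def nco_shift(iq, shift_hz, sample_rate_hz, start_phase=0.0):
    """Multiply each sample by exp(j*phase), advancing the phase by a fixed step per sample.
    Returns the shifted block and the phase to start the next block with (no clicks at the seam)."""
    step = 2 * math.pi * shift_hz / sample_rate_hz
    phase = start_phase
    out = []
    for s in iq:
        out.append(s * cmath.exp(1j * phase))
        phase = (phase + step) % (2 * math.pi)   # phase accumulator, kept within one turn
    return out, phase

FS = 8000
tone = [cmath.exp(2j * math.pi * 1000 * n / FS) for n in range(200)]
# Shift down by 23.4 Hz (far less than one 62.5 Hz bin), processed as two blocks.
first, phase = nco_shift(tone[:100], -23.4, FS)
second, _ = nco_shift(tone[100:], -23.4, FS, phase)
shifted = first + second
# Instantaneous frequency from the phase step between consecutive samples.
freqs = [cmath.phase(b * a.conjugate()) * FS / (2 * math.pi) for a, b in zip(shifted, shifted[1:])]
```

Every entry of `freqs`, including the one across the block seam, comes out at 976.6 Hz.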
11. Demodulation — getting sound out of the signal
We now have a clean, narrow I/Q signal on the right frequency. Demodulation extracts the audible sound from it, and that works just a bit differently per mode, because the information is hidden in it differently.
SSB (USB / LSB) — just the real part
With single sideband, the speech is directly in the signal once it has been shifted to zero frequency (that was already done by the bin selection and the NCO). The sound is simply the real part of the signal. Upper (USB) or lower (LSB) sideband you already determined by choosing which bins you kept.
AM — reading the envelope
With AM the information is in the strength (amplitude). We measure the magnitude of the signal — the envelope — and subtract the constant carrier (DC), so that only the speech/music remains.
FM — how fast the phase rotates
With FM the information is in the frequency: the transmitter pushes the frequency back and forth with the sound. Frequency is “how fast the phase rotates”. We measure from sample to sample how much the phase rotated — that difference is the sound.
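The three demodulators fit in a few lines each. This sketch assumes the I/Q input has already been shifted so the wanted signal sits at 0 Hz (by the bin selection and the NCO). Removing the AM carrier with the block average is a simplification chosen for brevity; a running average or high-pass filter would be more usual in a continuous stream. All names are illustrative.

```python
import cmath
import math

def demod_ssb(iq):
    """SSB: sideband already chosen by the bin selection; the audio is simply the real part."""
    return [s.real for s in iq]

def demod_am(iq):
    """AM: envelope |I + jQ|, minus its average (the constant carrier) over this block."""
    envelope = [abs(s) for s in iq]
    carrier = sum(envelope) / len(envelope)
    return [e - carrier for e in envelope]

def demod_fm(iq, sample_rate_hz):
    """FM: phase rotation from one sample to the next, expressed as frequency in Hz."""
    return [cmath.phase(cur * prev.conjugate()) * sample_rate_hz / (2 * math.pi)
            for prev, cur in zip(iq, iq[1:])]

FS = 8000
# AM: carrier at 0 Hz, 50% modulated by a 400 Hz tone; the carrier phase (0.7 rad) does not matter.
am_iq = [(1 + 0.5 * math.cos(2 * math.pi * 400 * n / FS)) * cmath.exp(0.7j) for n in range(200)]
am_audio = demod_am(am_iq)          # recovers 0.5 * cos(2*pi*400*n/FS)
# FM: a carrier sitting 500 Hz above the tuning point reads as a constant +500 Hz.
fm_iq = [cmath.exp(2j * math.pi * 500 * n / FS) for n in range(50)]
fm_audio = demod_fm(fm_iq, FS)
```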
12. AGC — automatic volume
One station is blasting, another is a whisper. The AGC (automatic gain control) adjusts the volume automatically: weak signals are amplified, strong ones turned down, so that everything comes through at a pleasant, even level in your headphones.
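To make the idea concrete, here is a very small AGC sketch: it tracks the signal level (rising quickly, falling slowly) and scales each sample toward a target. The parameters `target`, `attack`, `decay` and `max_gain` are assumed values for illustration; the AGC actually used in the chain may work differently (for instance in how it handles hang time).

```python
import cmath
import math

def agc(samples, target=0.3, attack=0.5, decay=0.001, max_gain=1000.0):
    """Track the signal level (rise fast, fall slowly) and scale each sample toward `target`."""
    level = 1e-6
    out = []
    for s in samples:
        mag = abs(s)
        coeff = attack if mag > level else decay   # fast attack, slow release
        level += coeff * (mag - level)
        gain = min(target / level, max_gain)      # cap the gain so pure noise is not blown up
        out.append(s * gain)
    return out

FS = 8000

def tone(amplitude, n_samples=400):
    """Constant-strength complex tone at 700 Hz."""
    return [amplitude * cmath.exp(2j * math.pi * 700 * n / FS) for n in range(n_samples)]

whisper = agc(tone(0.001))
blaster = agc(tone(1.0))
print(round(abs(whisper[-1]), 3), round(abs(blaster[-1]), 3))
```

A signal a thousand times weaker ends up at the same output level as the strong one.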
13. Narrowband vs wideband (8 or 16 kHz)
The iFFT choice from §9 directly determines the audio sample rate. Two settings:
- Narrowband — 128 bins = 8 kHz: audio up to ~4 kHz, fine for speech (telephone quality), little data.
- Wideband — 256 bins = 16 kHz: audio up to ~8 kHz, noticeably clearer, ~2× as much data.
The same trade-off as always: quality versus data usage. On a wired network, wideband is almost free; on a tight connection you choose narrowband.
14. Overlap-add — seamless sound
We process the signal in little blocks. If you simply stick them end to end, you hear a little click at every seam — especially because the window already tapered the edges (§5). Solution: let each block overlap the previous one by half and add them together. Where one block fades out, the next one comes up — together a smooth, uninterrupted stream.
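Why half overlap works so well with a Hann window can be checked in a few lines: shifted copies of a periodic Hann window, spaced half a window apart, add up to exactly 1, so a constant signal comes out constant and there are no seams. This plain-Python sketch assumes the periodic Hann variant; `overlap_add` is an illustrative helper.

```python
import math

def hann(n_pts):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_pts) for n in range(n_pts)]

def overlap_add(blocks, hop):
    """Add equally long blocks, each starting `hop` samples after the previous one."""
    size = len(blocks[0])
    out = [0.0] * (hop * (len(blocks) - 1) + size)
    for b, block in enumerate(blocks):
        for n, value in enumerate(block):
            out[b * hop + n] += value
    return out

N = 16
w = hann(N)
# Ten windowed blocks of a constant signal (value 1), each overlapping the previous by half.
summed = overlap_add([w[:] for _ in range(10)], N // 2)
# Away from the very first and last half-block, every sample is 1.0 (up to rounding).
print([round(v, 12) for v in summed[N // 2 : -N // 2]])
```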
15. The spectrum and waterfall display
The nice picture in the client — the spectrum (energy per frequency) and the waterfall (the same, scrolling downward over time) — comes from a separate, much higher-resolution FFT. The audio needs small, fast little blocks (low delay); the display instead wants large blocks (sharp resolution). Two different trade-offs, two separate FFTs.
- dB scale: strength logarithmic, so that a whisper and a cannon shot are both visible at once.
- Peak-hold with decay: peaks jump up immediately and then sink away calmly — calmer on the eye.
- Zoom and pan: zooming in on a slice of band; the display centres around your tuning frequency.
- Dragging filter edges: the translucent bar shows your passband. Drag the edges and you set your bandwidth directly — exactly the open bins from §9.
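The dB scale and the peak-hold-with-decay rule from the list above can be sketched per bin as follows. The 0.5 dB-per-frame decay and the −200 dB floor are assumed values, not taken from the client.

```python
import math

def to_db(power, floor_db=-200.0):
    """Power to dB; the floor avoids log10(0) for empty bins."""
    return 10 * math.log10(power) if power > 0 else floor_db

def peak_hold(held_db, new_db, decay_db_per_frame=0.5):
    """Per bin: jump up to a new peak at once, otherwise sink by a fixed amount per frame."""
    return [max(new, held - decay_db_per_frame) for held, new in zip(held_db, new_db)]

frame1 = [to_db(p) for p in (1e-10, 1e-3, 1.0)]   # -100, -30 and 0 dB
frame2 = [to_db(p) for p in (1e-10, 1e-9, 1e-9)]  # the two strong bins suddenly drop to -90 dB
held = peak_hold([-200.0] * 3, frame1)
held = peak_hold(held, frame2)
print([round(v, 1) for v in held])                # [-100.0, -30.5, -0.5]
```

The held peaks sink only 0.5 dB per frame instead of dropping straight to −90 dB, which is what makes the display calmer on the eye.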
16. Signal strength — the S-meter
How strong is a signal? The basis is surprisingly simple: the power of a complex sample is I² + Q², the length of the hand squared (think back to the clock analogy from §1). The longer the hand, the stronger the signal.
For a whole channel you do not measure one sample, but add up the power of all bins within your passband — the fine spectrum bins from §15 (not the coarser audio bins from §9). That sum is the strength of precisely that channel — signal and noise together, within your filter. That is immediately handy: if you set your filter narrower, you measure less noise along with it.
Because signals span an enormous range (from a whisper just above the noise to a deafening station), we use, just as with sound, a logarithmic scale: dB, and in this case dBm (power relative to 1 milliwatt). Radio amateurs read this off as S-units: on HF, S9 = −73 dBm, and each S-unit lower is 6 dB weaker. Above S9 one counts further in dB (“S9 + 20 dB”).
One more thing: the FFT gives a relative power (a number without an absolute unit). To turn that into a real dBm, the scale must be calibrated. That is done with a fixed calibration offset, so that the displayed value matches the actual signal strength. And to prevent the meter from twitching nervously, the value is averaged (and/or the peak held briefly) — just like the slow mechanism of an old-fashioned needle S-meter.
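The whole S-meter path (sum the bin powers, convert to dBm, read off S-units) as a sketch. `calibration_db` stands for the fixed calibration offset; its value here is hypothetical. Rounding partial S-units down is a design choice of this sketch, and the averaging/peak-hold step is left out.

```python
import math

def channel_dbm(bin_powers, calibration_db):
    """Sum the power (I^2 + Q^2) of all bins in the passband and convert to dBm.
    calibration_db is a hypothetical fixed offset mapping FFT units to real dBm."""
    return 10 * math.log10(sum(bin_powers)) + calibration_db

def s_meter(dbm, s9_dbm=-73.0, db_per_s_unit=6.0):
    """HF convention: S9 = -73 dBm, 6 dB per S-unit below, 'S9+x dB' above."""
    if dbm >= s9_dbm:
        over = round(dbm - s9_dbm)
        return "S9" if over == 0 else f"S9+{over} dB"
    s = 9 + (dbm - s9_dbm) / db_per_s_unit
    return f"S{max(0, math.floor(s))}"          # partial S-units are rounded down

for level in (-85.0, -73.0, -53.0):
    print(level, s_meter(level))                 # S7, S9, S9+20 dB
```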
17. Why this is so powerful — one FFT, many receivers
The real advantage of this method becomes apparent as soon as you want to hear many different, independent signals at the same time.
The classic way: each receiver gets its own expensive high-speed downconverter and filters — physical hardware, anew for each receiver. Ten listeners = ten times that expensive chain. That quickly gets out of hand and scales poorly.
This method turns it around: you compute the large FFT only once and share it. Each listener pulls their own narrow slice out of it with a small iFFT. And pulling that slice out costs — in proportion to that one large FFT — very little computation. So you do the expensive part once; every extra listener after that is almost free.
That is why this method scales so well: all the heavy work is in that one large FFT. Want a hundred people listening at the same time, each on their own frequency? You still compute only one large FFT, not a hundred expensive receive chains. That is the power of this approach.
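A rough way to put numbers on “almost free”: use N·log₂N as a proxy for FFT cost and compare the shared 24,576-point FFT with one 128-point iFFT per listener, both per block. This is only an operation-count sketch; it ignores the per-listener NCO, demodulation, AGC and Opus encoding, which add to the per-listener share.

```python
import math

def fft_cost(n):
    """Rough operation count of an FFT: proportional to N * log2(N)."""
    return n * math.log2(n)

big = fft_cost(24_576)          # the shared wide FFT, computed once per block
small = fft_cost(128)           # one listener's narrowband iFFT per block
for listeners in (1, 10, 100):
    total = big + listeners * small
    print(f"{listeners:3d} listeners: {total / big:.2f}x the cost of the wide FFT alone")
```

By this crude measure even a hundred listeners add only about a quarter on top of the single wide FFT.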
18. Filter properties in detail
In §9 we saw that “setting bins to zero” amounts to filtering. But what kind of filter do you then get exactly? Below are the properties you also encounter in a receiver specification — filter shape, dynamic range and group delay — with the numbers included. This is deeper material; you do not need it for the big picture.
Filter shape and steepness
Selecting bins is a rectangular mask in the frequency domain: a bin participates fully (×1) or not at all (×0). The filter edge therefore lies on a grid of one bin (62.5 Hz).
The mask choice itself is rock-hard, but the actual transition is partly determined by the main lobe of the Hann window (§5): that is ~1–2 bins wide, so the slope runs over about 60–125 Hz. (It is the same window property that also sets the stopband floor — see the next paragraph.) For an SSB filter of 2.4 kHz that is a transition region of only ~3–5% of the passband width — much steeper than a crystal or mechanical filter in a classic receiver (transition often 10–30%).
That transition in Hz is roughly constant (that ~bin), regardless of how wide your filter is set. For wide filters the shape factor (ratio of the width at a deep level to that at −6 dB) is therefore very close to 1 — almost an ideal rectangle. For a very narrow filter (CW, e.g. 300 Hz) that same fixed ~62.5 Hz slope becomes a larger fraction of the passband, and the shape factor as a ratio increases — although the slope in Hz remains equally steep.
Dynamic range and ultimate suppression
Two things bound the dynamic range (amplitude range):
- The computational side is hardly a limit. Everything runs in 32-bit floating point; the ~24 bit mantissa gives, around a chosen scale, about 144 dB of relative precision (the numerical noise floor), so the computational precision is rarely the first limit. The real ceiling lies before the channelizer: the ADC/DDC of the SDR (the ANAN has a 16-bit ADC ≈ 96 dB instantaneous, and in a narrow channel effectively more thanks to process gain).
- The ultimate out-of-band suppression is set by the window, not by the bin mask. A strong signal just outside your passband leaks via the sidelobes of the Hann window (§5) a little into the passband anyway — through the nearest edge bin (the last retained, non-zeroed bin). Two things to read the numbers by: the 0 dB reference is the passband level (hence dBc), and the distance is counted from that band edge — not from the band centre or 0 Hz (for SSB the band isn't even symmetric about 0). The first sidelobe sits at ~−31 dBc, ~1 bin (62.5 Hz) outside the edge; beyond that it drops ~18 dB per octave, where an "octave" = a doubling of the distance from the edge (≈ −49 dBc at 2 bins, −67 at 4, −85 at 8). A neighbour right next to the band (~−31 dBc) is therefore the practical limit; further away it drops quickly.
Want to suppress deeper? Then that is the classic window trade-off: a Blackman-Harris window achieves ~−90 dB sidelobes, but the main lobe (and thus the transition region) becomes ~2× as wide. Steeper and deeper at the same time simply takes more length: a larger window, a larger FFT, or a long FIR; it is not free (see the figure).
So the FFT/iFFT method gives very sharp filtering, but with a fixed shape (rectangular bin mask + Hann skirt). An additional FIR is only worthwhile when you actually need a different shape — for example a deliberate passband tilt, a matched/raised-cosine shape for data modes, or a notch inside the band. Making it steeper is possible too, but only with a very large FIR (hundreds of taps), not a simple one. For listening to SSB, AM and FM a flat, sharp band is exactly what you want, so an additional FIR is not needed here.
Group delay — flat, like a linear-phase FIR
This is a strong asset of the method. The bin mask is real (a bin participates or not: a number without phase rotation), the Hann window is symmetric, and the NCO is a pure frequency shift (only phase rotation, no amplitude change). Together they introduce no phase distortion: in the passband the response is linear in phase, just like a FIR filter. Consequence:
- The group delay is flat across the whole passband: all frequencies come through equally fast, so no phase distortion. A very steep filter edge does still give some post-/pre-ringing, but thanks to the linear phase it is symmetric — not the skewed, smeared-to-one-side distortion of an analogue or IIR filter.
- Two numbers you should not confuse: the group delay itself is about half a block length (~8 ms). The total algorithmic delay is larger, because you must first fill a whole block before you can compute: $N_{fft} / F_s = 1 / \text{bin width} = 1 / 62.5\ \text{Hz} = 16$ ms. Both are larger than with an analogue filter, but they are constant.
A classic receiver filter (crystal, mechanical, or an IIR DSP filter) instead has a low absolute delay, but exhibits a group-delay peak around the band edges — the phase does not run neatly linearly there, which smears transients. The FFT method swaps that around: a slightly higher but flat delay, without phase distortion. For listening to SSB, AM and FM that delay is inaudible; the flat phase you instead hear as a clean, “tight” sound.
Optional: a FIR after the iFFT
The bin selection already gives a steep, near-rectangular passband, but the exact shape is determined by the window (a slightly sloping edge and a sidelobe level that sits at ~−31 dB right next to the band and rolls off ~18 dB/octave beyond). Want to give the frequency response a precisely specified shape? Then you can put a FIR filter after the iFFT. That is relatively cheap, because after the iFFT everything runs at the low output rate (8 or 16 kHz). Such a FIR could:
- make the skirt sharper still — hugging the ideal rectangular mask more closely;
- flatten the slightly sloping passband edge;
- push the stopband far below the window floor (a true −60 to −80 dB);
- set the transition region to exactly the right size.
But there is a trade-off. A symmetric (linear-phase) FIR keeps the group delay flat; a sharper skirt then costs more taps — i.e. more latency. If you want it sharper without that extra delay, you give up the linear phase and get group-delay distortion at the band edges back (just like a classic analogue or crystal filter). In short: sharpness versus flat phase/latency — a deliberate design choice. It remains the well-known recipe: coarse channel selection + decimation with the FFT, fine shaping with a FIR.
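A sketch of such a FIR at the 8 kS/s output rate. It uses a classic windowed-sinc low-pass design, which is only one of many FIR design methods and is chosen here for brevity, not because the source specifies it. The symmetric taps give linear phase, and the group delay of (taps − 1)/2 samples shows the trade-off above: more sharpness needs more taps, and more taps mean more latency. `lowpass_fir` and `group_delay_ms` are illustrative names.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc low-pass (symmetric Hann window): symmetric taps, hence linear phase."""
    fc = cutoff_hz / sample_rate_hz              # normalised cut-off (cycles per sample)
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        sinc = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [h / gain for h in taps]              # normalise to unity gain at 0 Hz

def group_delay_ms(num_taps, sample_rate_hz):
    """A symmetric FIR delays every frequency by (num_taps - 1) / 2 samples."""
    return (num_taps - 1) / 2 / sample_rate_hz * 1000

h = lowpass_fir(63, 3000, 8000)
for taps in (31, 127, 511):
    print(f"{taps} taps at 8 kS/s: group delay {group_delay_ms(taps, 8000):.3f} ms")
```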
In closing — everything together
One wide time signal (up to 1.536 MHz) goes through a large complex FFT and is laid out in bins of 62.5 Hz; we keep only the bins of the desired channel and set the rest to zero (= filtering); a fixed iFFT (128/256 bins) converts that back into an I/Q time signal with a constant sample rate; the NCO shifts it precisely onto frequency; per mode we demodulate it; the AGC controls the volume; and as compact audio compressed with Opus, it goes to the client. Because this is a separate cut-out per listener, many people can listen at the same time and independently within the same band.
And unfortunately it is not free. The FFT and iFFT cost computation time that grows with the size (roughly proportionally — strictly speaking N·log N). But the delay sits mostly elsewhere: you must first collect a whole block of samples before you can compute, and the more finely you split up the spectrum (narrower bins), the longer that block must be. Fine frequency detail and low latency therefore pull against each other — the classic time-frequency trade-off. (In this chain it is fixed at 62.5 Hz per bin ≈ 16 ms block; a higher DDC rate makes the FFT larger and the view wider at the same detail and the same latency, but costs more computing power.) So a balance must always be found between delay (latency) and spectrum detail — precisely the trade-off that keeps coming back throughout this whole story.
That is, in a nutshell, how a VRX gets from radio wave to sound. Want to know how that sound (plus the spectrum, the S-meter and all the controls) then reaches your client over the network? That is in the sister document “From server to client”.