How a VRX works

From a wide “photo” of the band to the sound in your headphones — an explanation in plain language for those without a DSP background. With pictures.

The big idea

A VRX (virtual receiver) turns one wide stream of radio data into, in principle, as many independent receivers as you like. The SDR digitises an entire band in one go. The server holds on to that wide stream and “cuts” out a narrow slice of its own for each listener — each on its own frequency, mode and bandwidth. Nobody gets in anybody else's way.

Analogy. Picture a satellite photo of an entire city. One shot, but a hundred people can zoom in on it at the same time, each on their own street. The VRX does that with radio spectrum instead of a photo.
[Figure 1 diagram: wide IQ from the SDR → FFT (→ frequencies) → channel select + filter → iFFT (→ time) → fine-tune (NCO) → demod + volume (AGC) → sound → client]
Figure 1 — The whole chain at a glance. Each block is explained below.

1. From radio wave to numbers (IQ)

A modern SDR (Software Defined Radio) converts a wide chunk of the band directly into numbers. At each moment it measures two values, called I and Q. Together these describe not only how strong the signal is, but also its phase — and it is precisely that phase we will need later on to recognise frequencies and modulation.

Analogy. I and Q are like the x and y position of a hand on a clock. One number (strength only) tells you how long the hand is; two numbers (I and Q) also tell you which way it points. That direction (phase) is half the story.
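
The clock-hand picture can be tried out in a few lines. This is a minimal sketch, not code from any particular receiver: the function name `iq_to_polar` is made up for illustration, and it only shows that one I/Q pair carries both a length (strength) and a direction (phase).

```python
import cmath
import math

def iq_to_polar(i: float, q: float) -> tuple[float, float]:
    """Return (length, angle in degrees) of the 'clock hand' for one I/Q pair."""
    magnitude, phase_rad = cmath.polar(complex(i, q))
    return magnitude, math.degrees(phase_rad)

# Two samples with the same strength but a different direction (phase).
print(iq_to_polar(3.0, 4.0))    # length 5, pointing up and to the right
print(iq_to_polar(-3.0, 4.0))   # length 5, pointing up and to the left
```

With strength alone both samples would look identical; the angle is what tells them apart.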

2. The wide, unfiltered signal

It starts with an unfiltered time signal, and that signal is wide: in our case up to 1.536 MHz. That is a fast stream of I/Q pairs (up to 1,536,000 per second, so just over one and a half million) containing everything there is to hear on that piece of band: all stations jumbled together, strong and weak, next to and on top of each other. At this point nothing has been filtered out yet; it is the complete, wide “photo” of the band.

Our goal: to later pull out one narrow radio channel from this (for example an SSB signal of 3 kHz), neatly filtered, ready to listen to. To do that we first need to know where the frequencies are — and that is exactly what the FFT does.

3. The FFT — time becomes frequency

The wide signal is a waveform that runs over time. But we want to know which frequencies are in it, in other words where the stations are. To find out, the whole signal goes through an FFT, which converts the time domain into the frequency domain (exactly the picture a spectrum analyser shows).

FFT stands for Fast Fourier Transform. Fourier is a mathematical way to convert from the time domain to the frequency domain. It is a conversion to a different presentation: the same information, displayed differently. Nothing is lost and nothing is added — the signal “in time” and the same signal “in frequency” contain exactly the same information.

Because the FFT is a computation, the conversion costs some computation time, and you first need a whole block of samples before you can compute anything. That is the practical limitation: it costs a little delay (latency).

Analogy. A prism splits white light into a rainbow of colours. The FFT is the prism for radio: it splits the signal into all its frequencies. The light before and after the prism is the same light — just presented differently.

One and the same signal — in time and in frequency

A few familiar signals, on the left as they look in time, on the right how those same signals look in the frequency domain:

[Figure 2 diagram: five signals, each drawn in time (left) and in frequency (right)]
sine: 1 frequency
step: lots low, falling off fast
square: fundamental + odd harmonics
impulse: all frequencies equally strong (flat)
sawtooth: fundamental + all harmonics
Figure 2 — The same signal, two presentations. A pure sine is one single frequency; a short impulse instead contains all frequencies at once; square and sawtooth are a fundamental plus a series of harmonics. You meet that short, powerful impulse in practice as lightning / an atmospheric discharge: one very brief signal that contains every frequency at once, which is why it shows up in the waterfall as a horizontal streak right across the band.

The output of the FFT is a row of bins (slots), each for a narrow frequency range. That is what the next chapter is about.

For the enthusiast: the formulas of the FFT and the iFFT

The FFT is a fast way to compute the discrete Fourier transform (DFT). For a block of N complex samples x[0]…x[N−1] the forward path (time → frequency) is:

$$X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j 2\pi k n / N} \qquad \text{for } k = 0 \ldots N-1$$

and the reverse path (frequency → time) is almost the same sum, with a plus sign in the exponent and a division by N:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \, e^{+j 2\pi k n / N}$$

Each X[k] is a complex number (a hand, §1): the length is how much of frequency k is present, the angle is the phase. Bin k belongs to frequency $f_k = k \cdot F_s / N$ (with $F_s$ the sample rate); bins above N/2 count as negative frequencies (§6).

The core is the term $e^{-j 2\pi k n / N}$. Via Euler, $e^{j\theta} = \cos\theta + j \sin\theta$, so that term is precisely a rotating hand with speed k. The FFT in effect compares the signal against all those rotation speeds at once and adds up where they “keep pace”. The fast in FFT is a clever computation scheme that does this in N·log N steps instead of N², and it is most efficient when N is a power of 2 (§8).
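
The two sums above can be written out literally. The sketch below is the slow N² version, typed straight from the formulas (so not an FFT and not the ThetisLink code); the names `dft` and `idft` are illustrative. It checks two claims from the text: a pure tone lands in one bin, and forward plus reverse gives back the original samples.

```python
import cmath

def dft(x):
    """Forward path: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def idft(spectrum):
    """Reverse path: plus sign in the exponent and a division by N."""
    n_pts = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]

N = 16
TONE_BIN = 3
# A rotating hand that makes exactly 3 turns in one block.
x = [cmath.exp(2j * cmath.pi * TONE_BIN * n / N) for n in range(N)]

X = dft(x)
strongest = max(range(N), key=lambda k: abs(X[k]))
max_error = max(abs(a - b) for a, b in zip(x, idft(X)))
print("strongest bin:", strongest)
print("round-trip error:", max_error)
```

The round-trip error is only floating-point rounding, which is the "nothing is lost and nothing is added" statement in numbers.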

In this chain the forward FFT runs at the wide block size Nfft (after multiplication with the Hann window, §5); after selecting/zeroing bins (§7, §9) the reverse iFFT runs at the smaller size Nifft (128 or 256).

One warning if you put this next to the source code: the textbook iFFT above divides by N, but the implementation scales by a fixed factor 1/Nfft (not 1/Nifft). The final level is set by the AGC anyway, so that exact scale factor is not critical — but do not blindly assume 1/N for it.

4. The spectrum in small slices (bins)

Because we will later want to filter sharply, we chop the whole wide band into very small frequency slices. Each slice is called a bin. In our case one bin is 62.5 Hz wide. The smaller the bin, the more sharply we can cut later.

The bin size follows directly from the FFT size: bin = bandwidth / number of FFT points. To get that 62.5 Hz over a band of 1.536 MHz, we therefore need:

1,536,000 Hz ÷ 62.5 Hz = 24,576 FFT points → a large FFT (≈ 24,576 bins)

That is therefore a large FFT. (With a narrower band — for example 384 kHz — it is correspondingly smaller: 384,000 / 62.5 = 6,144 points.) The bin size deliberately always stays 62.5 Hz, regardless of the bandwidth, so that the rest of the arithmetic stays simple and consistent.
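
The same arithmetic as a small helper, for checking other bandwidths. The function name `fft_points` is made up for this sketch; the 62.5 Hz bin width and the two example bandwidths are the ones from the text.

```python
BIN_HZ = 62.5  # fixed bin width used throughout the text

def fft_points(bandwidth_hz: float, bin_hz: float = BIN_HZ) -> int:
    """Number of FFT points needed so that each bin is bin_hz wide."""
    points = bandwidth_hz / bin_hz
    if points != int(points):
        raise ValueError("bandwidth is not a whole number of bins")
    return int(points)

for bandwidth in (1_536_000, 384_000):
    print(bandwidth, "Hz ->", fft_points(bandwidth), "points")
```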

[Figure 3 diagram: the frequency axis divided into equal slices; 1 bin = 62.5 Hz]
Figure 3 — The spectrum divided into bins of 62.5 Hz each.

5. Windowing — softening the edges

The FFT works on a block of samples that we cut out hard. As a result, at the start the signal jumps abruptly from 0 to its first value, and at the end it drops just as abruptly from its last value back to 0. Those are two steps.

And you already saw in §3 what a step means according to Fourier: a broad spread of harmonics (think of the step and the square in figure 2). Precisely those artificial harmonics, caused by the abrupt edges and not by the signal itself, smear energy into bins where it does not belong. That is spectral leakage. The larger the value at which the block starts or ends, the larger the step and the worse the leakage.

The solution: multiply the block by a soft curve that smoothly runs to 0 at both edges (a Hann window). Then there is no longer an abrupt step at start and end → far fewer artificial harmonics → a much cleaner spectrum. (It does not become completely leak-free — the window itself also has a shape — but the leakage drops drastically.)
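
To see the effect in numbers, the sketch below takes a block holding 5.5 cycles of a tone (the worst case, halfway between two bins) and measures how much energy ends up more than four bins away from it, once with a hard cut and once with a Hann window. Everything here is illustrative: the block size of 64, the guard of four bins and the helper names are assumptions of this sketch, and the slow DFT stands in for a real FFT.

```python
import cmath
import math

def dft(x):
    """Slow textbook DFT; fine for a 64-sample demonstration."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def leak_fraction(block, tone_bin, guard=4.0):
    """Share of the total energy that lands more than `guard` bins from the tone."""
    power = [abs(v) ** 2 for v in dft(block)]
    near = sum(p for k, p in enumerate(power) if abs(k - tone_bin) <= guard)
    return 1.0 - near / sum(power)

N = 64
CYCLES = 5.5  # not a whole number of cycles, so the block edges do not line up
tone = [cmath.exp(2j * cmath.pi * CYCLES * n / N) for n in range(N)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
windowed = [s * w for s, w in zip(tone, hann)]

rect_leak = leak_fraction(tone, CYCLES)
hann_leak = leak_fraction(windowed, CYCLES)
print(f"hard cut:    {rect_leak:.4%} of the energy far from the tone")
print(f"Hann window: {hann_leak:.4%} of the energy far from the tone")
```

The Hann figure is not zero, in line with the remark that the window has a shape of its own, but it is far smaller than the hard-cut figure.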

[Figure 4 diagram. Left: hard cut, starting at +1 and ending at −1; big steps → strong harmonics → leakage. Right: Hann window → sharp, far less leakage. Horizontal axes: time.]
Figure 4 — Left: 5.5 waves that start at +1 and end at −1. The block therefore starts with a big step from 0 to +1 and stops with a big step from −1 to 0; big steps = strong harmonics = lots of leakage. Right: the Hann window smoothly tapers both edges to 0 → no abrupt step, far less leakage.
In ThetisLink. Both the audio channelizer and the spectrum display use a Hann window. The audio channelizer overlaps successive blocks by 50% (see §14); the spectrum display overlaps even more — ~87.5% (a hop of ⅛ block) — for a smoother image.
For the enthusiast: why the spectrum display overlaps so much

The window tapers the block edges to zero, so an event that lands right at an edge barely counts in that block. By computing the FFT 8× as often, (a) every piece of signal lands in at least one block near the centre — full weighting, nothing is structurally lost — and (b) you get many more waterfall rows per second: a smoother image and finer time resolution, so short events (CW dots, a lightning impulse, a deep fade) aren't missed. Unlike the channelizer, the spectrum display doesn't reconstruct audio, so that extra overlap is purely for the image; the only cost is a little more computation.

6. One complex FFT (I and Q)

Frequently asked question. Does all of this go into one FFT?

Our time signal is complex: it consists of I and Q together (I + jQ, §1). That is why we use a complex FFT — which processes I and Q together in one operation.

And whether you implement it as two ordinary (real) FFTs — one for I and one for Q — or as one complex FFT, makes no difference to the final result: the outcome is exactly the same, provided you combine the two outcomes correctly (the I outcome plus j × the Q outcome). Only the implementation differs; the complex FFT is simply the natural, compact form for a complex (I/Q) signal.
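
That equivalence follows from the transform being linear, and it is easy to check on a handful of samples. A minimal sketch with made-up sample values and a slow textbook DFT (the helper name `dft` is illustrative):

```python
import cmath

def dft(x):
    """Slow textbook DFT; accepts real or complex samples."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

# Arbitrary example values for the I and Q streams.
i_part = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0, 0.3, 0.6]
q_part = [0.2, -0.4, 0.9, 0.1, -0.7, 0.35, 0.0, -0.5]

# One complex transform of I + jQ ...
one_complex = dft([complex(i, q) for i, q in zip(i_part, q_part)])
# ... versus two real transforms, combined as (I outcome) + j * (Q outcome).
two_real = [fi + 1j * fq for fi, fq in zip(dft(i_part), dft(q_part))]

max_diff = max(abs(a - b) for a, b in zip(one_complex, two_real))
print("largest difference:", max_diff)
```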

Why we need both I and Q in the first place: only with both can the FFT tell positive and negative frequencies apart — see the frequencies above and below the tuning point separately. With strength only (one real number) the spectrum becomes symmetric and you cannot distinguish left and right of the centre; for an SDR that is useless, because there are different stations there.

Analogy. With strength only you know how loudly someone shouts, but not whether it comes from the left or the right. With I and Q you hear the difference between left and right — between frequencies above and below the centre.

7. You only want a small slice

After the FFT we have very many small slices of spectrum (bins). But you are only interested in a very small part: one radio channel. An SSB signal, for example, is about 3 kHz wide — that is only a handful of bins (3000 / 62.5 ≈ 48 bins) out of the many thousands.

We want to get rid of all the other bins. How we turn those few wanted bins back into audible sound, we'll see now.

8. The iFFT — back to sound

You cannot hear frequencies; we have to go back to a signal in time. That is what the inverse FFT (iFFT) does: precisely the reverse operation of the FFT. We take only the bins that we want to hear and put them into an iFFT.

The iFFT computes most efficiently when you take a power of 2 number of points (32, 64, 128, 256, …). So: take the number of bins you want to hear and round it up to the nearest power of 2. That is the minimal idea; in practice ThetisLink chooses, for practical reasons, a slightly more generous, fixed size — why that is handy you can read in §9.

The result is the information of only that slice of frequency, converted back into a time signal: the familiar I/Q signal. It is neatly filtered to roughly the band you wanted to hear, and it has a much lower sample rate, because we went from many thousands of bins (24,576 at full width) back to a few hundred at most. The FFT + iFFT together therefore also do the decimation: they bring the high IQ rate down to a low audio rate.

Analogy. The FFT takes the rainbow out of the light; the iFFT joins the remaining colours back together into one beam. Because we filtered out almost all the colours, what comes out is a narrow, clean signal: exactly the station you are listening to.
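
A toy version of this step, heavily scaled down so it runs instantly: a 64-point forward transform, 8 bins kept, and an 8-point inverse transform. It is a sketch of the principle only. It uses slow textbook sums instead of an FFT, leaves out the window and the overlap, and the function name `channelize` is made up. The output is scaled by one over the large size, following the scaling remark in §3.

```python
import cmath

def dft(x):
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

def channelize(block, first_bin, n_out):
    """Keep n_out bins starting at first_bin; return n_out low-rate time samples."""
    n_in = len(block)
    spectrum = dft(block)
    kept = [spectrum[(first_bin + k) % n_in] for k in range(n_out)]
    # Small inverse transform, scaled by 1/n_in (not 1/n_out).
    return [sum(kept[k] * cmath.exp(2j * cmath.pi * k * m / n_out) for k in range(n_out)) / n_in
            for m in range(n_out)]

N_IN, N_OUT = 64, 8          # 64 samples in, 8 samples out: decimation by 8
FIRST_BIN, TONE_BIN = 20, 22
block = [cmath.exp(2j * cmath.pi * TONE_BIN * n / N_IN) for n in range(N_IN)]

out = channelize(block, FIRST_BIN, N_OUT)
# The tone sat 2 bins above the start of the slice, so it comes out as bin 2
# of the small transform, at one eighth of the original sample rate.
expected = [cmath.exp(2j * cmath.pi * (TONE_BIN - FIRST_BIN) * m / N_OUT) for m in range(N_OUT)]
worst = max(abs(a - b) for a, b in zip(out, expected))
print("samples in:", N_IN, "samples out:", len(out), "error:", worst)
```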

9. In practice: fixed sample rate + zeros = filter

In practice we do it just a bit differently, for two reasons:

That is why we do not take exactly “the bins we want to hear”, but a fixed number of bins corresponding to, for example, 8 kHz. That is more spectrum than we are interested in, but the sample rate is now fixed at 8 kS/s. (8000 / 62.5 = 128 bins → an iFFT of 128 points, and that is conveniently a tidy power of 2.)

Next, we set zeros in that iFFT on all the frequencies we do not want to hear. What remains is, in time, a signal neatly filtered to ~3 kHz, and at the same time a constant sample rate of 8 kS/s. Setting to zero is therefore the filter — razor-sharp, because we cut exactly on bin boundaries.

Want to hear a wider or narrower slice? Then you simply fill in more or fewer zeros in the right place in the iFFT. More bins open = wider passband; more zeros = narrower. That way you change your bandwidth directly, without changing anything about the sample rate.
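
Written as a mask of ones and zeros, the idea looks like this. The sketch assumes the layout of the USB example (carrier at index 0, open bins directly above it); whether bin 0 itself counts as open, and the name `usb_mask`, are assumptions made here for illustration.

```python
BIN_HZ = 62.5

def usb_mask(n_ifft: int, bandwidth_hz: float, bin_hz: float = BIN_HZ) -> list[int]:
    """1 = bin stays open, 0 = bin is zeroed. Open bins start at index 0 (the carrier)."""
    open_bins = min(n_ifft, round(bandwidth_hz / bin_hz))
    return [1] * open_bins + [0] * (n_ifft - open_bins)

N_IFFT = 128
for bandwidth in (3000.0, 2400.0, 500.0):
    mask = usb_mask(N_IFFT, bandwidth)
    # The mask length never changes, so the output rate stays at 128 * 62.5 Hz.
    print(f"{bandwidth:6.0f} Hz -> {sum(mask):3d} open bins, output rate {len(mask) * BIN_HZ:.0f} S/s")
```

Only the number of ones changes with the bandwidth; the length of the mask, and with it the sample rate, stays put.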

[Figure 5 diagram: the 128-bin iFFT (= 8 kHz, fixed rate), USB example, along the frequency axis. The carrier (0 Hz) sits at index 0 on the left; the bins next to it are open (~3 kHz); the rest are zeroed (gone).]
Figure 5 — A fixed iFFT (here 128 bins = 8 kHz). The passband sits against the carrier (0 Hz, left): for USB you keep the bins just above it (the lowest audio frequencies) and zero the rest. More or fewer zeros = wider or narrower bandwidth. (LSB is the mirror image on the other side; AM keeps both.)
In ThetisLink. The iFFT is 128 bins in narrowband (= 8 kHz) or 256 bins in wideband (= 16 kHz). You drag the filter edges on the spectrum; the client/server translate that into “how many bins open” (in steps of 62.5 Hz). SSB keeps one sideband open, AM/FM both.

10. Fine-tuning below the bin (the NCO)

One problem: bins lie on a fixed grid of 62.5 Hz. But an SSB station is almost never exactly on such a grid point — and 62.5 Hz off, a voice already sounds clearly too high or too low. So we must be able to tune more finely than the bin grid allows.

Analogy. The bins bring you to the nearest parking space (on 62.5 Hz). But you want to be right at the door. The NCO is the last bit of walking: it shifts the signal that small remainder further (down to a fraction of a Hz).
[Figure 6 diagram: bins along the frequency axis, with the nearest bin and the desired frequency marked; the NCO shifts the remainder (≤ 31 Hz).]
Figure 6 — The bin gets you close; the NCO shifts the last remainder (at most half a bin).

An NCO (numerically controlled oscillator) is in effect a small, perfectly pure tone that we multiply against our signal. As a result the whole signal shifts by exactly as many Hz as we want — smoothly, without clicks, and with arbitrarily fine precision.
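
In code, that multiplication is one line per sample. The sketch below shifts a 1000 Hz test tone down by 20.25 Hz and then measures the result from the average phase step. The names `nco_shift` and `mean_frequency` and the test values are made up for illustration; a long-running NCO would normally keep a wrapped phase accumulator instead of computing the phase from the sample index.

```python
import cmath
import math

def nco_shift(samples, shift_hz, sample_rate_hz):
    """Multiply by a pure complex tone: the whole signal moves by shift_hz."""
    step = 2 * math.pi * shift_hz / sample_rate_hz  # phase advance per sample
    return [s * cmath.exp(1j * step * n) for n, s in enumerate(samples)]

def mean_frequency(samples, sample_rate_hz):
    """Average rotation speed of the hand, converted to Hz."""
    total = sum(cmath.phase(b * a.conjugate()) for a, b in zip(samples, samples[1:]))
    return total / (len(samples) - 1) * sample_rate_hz / (2 * math.pi)

FS = 8000.0
tone = [cmath.exp(2j * math.pi * 1000.0 * n / FS) for n in range(64)]
shifted = nco_shift(tone, -20.25, FS)

print("before:", round(mean_frequency(tone, FS), 3), "Hz")
print("after: ", round(mean_frequency(shifted, FS), 3), "Hz")
```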

11. Demodulation — getting sound out of the signal

We now have a clean, narrow I/Q signal on the right frequency. Demodulation extracts the audible sound from it, and that works just a bit differently per mode, because the information is hidden in it differently.

SSB (USB / LSB) — just the real part

With single sideband, the speech is directly in the signal once it has been shifted to zero frequency (that was already done by the bin selection and the NCO). The sound is simply the real part of the signal. Upper (USB) or lower (LSB) sideband you already determined by choosing which bins you kept.

AM — reading the envelope

With AM the information is in the strength (amplitude). We measure the magnitude of the signal — the envelope — and subtract the constant carrier (DC), so that only the speech/music remains.

FM — how fast the phase rotates

With FM the information is in the frequency: the transmitter pushes the frequency back and forth with the sound. Frequency is “how fast the phase rotates”. We measure from sample to sample how much the phase rotated — that difference is the sound.

[Figure 7 diagram. AM, read the envelope: the orange envelope (top + bottom) touches the peaks, and that is the sound; horizontal axis: time. FM, how fast the hand rotates: fast rotation = high tone, slow = low tone.]
Figure 7 — AM: the sound is the envelope (orange). FM: the sound is how fast the phase hand rotates.
In ThetisLink. Supported: USB, LSB, AM, SAM and FM. SSB = real part, AM = magnitude, FM = phase difference between successive samples. SAM keeps — like AM — both sidebands, but does not read the envelope: it takes the real part (a coherent detector) and subtracts a slowly tracking DC value. That gives less distortion than ordinary AM, provided you are sitting neatly on the carrier (this implementation does not yet have automatic carrier tracking/PLL).
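
The three basic detectors fit in a few lines each. This is a simplified sketch of the ideas described above, not the ThetisLink code: the function names are made up, the AM detector removes the carrier by subtracting the average of one block (a real receiver would track it slowly), and the FM output is left in radians per sample without audio scaling.

```python
import cmath
import math

def demod_ssb(iq):
    """SSB: the sound is the real part."""
    return [s.real for s in iq]

def demod_am(iq):
    """AM: magnitude (envelope) minus its average, which stands in for the carrier."""
    envelope = [abs(s) for s in iq]
    carrier = sum(envelope) / len(envelope)
    return [e - carrier for e in envelope]

def demod_fm(iq):
    """FM: how far the phase rotated from one sample to the next."""
    return [cmath.phase(b * a.conjugate()) for a, b in zip(iq, iq[1:])]

N = 64
# AM test signal: carrier of strength 1, modulated 50% by 4 cycles of a cosine.
am_signal = [(1 + 0.5 * math.cos(2 * math.pi * 4 * k / N)) * cmath.exp(0.3j) for k in range(N)]
# FM test signal: a hand rotating at a constant 0.2 rad per sample.
fm_signal = [cmath.exp(0.2j * k) for k in range(N)]

print("AM audio, first samples:", [round(v, 3) for v in demod_am(am_signal)[:4]])
print("FM output, first samples:", [round(v, 3) for v in demod_fm(fm_signal)[:4]])
```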

12. AGC — automatic volume

One station is blasting, another is a whisper. The AGC (automatic gain control) adjusts the volume automatically: weak signals are amplified, strong ones turned down, so that everything comes through at a pleasant, even level in your headphones.

Analogy. Like a sound engineer at an interview: if someone blasts out loud, you turn down quickly (“attack”); if it goes quiet, you bring it back up calmly (“decay”). The AGC does that automatically — here turning down quickly (10 ms) and up slowly (500 ms), so that a short peak does not immediately mute everything.
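
One common way to build such a control is an envelope follower with two time constants. The sketch below is an assumption about the general shape of an AGC, not a description of the actual ThetisLink algorithm: only the 10 ms and 500 ms figures come from the text, while the function name, the target level and the gain limit are invented for the example.

```python
import math

def agc(samples, sample_rate_hz, target=0.5, attack_s=0.010, decay_s=0.500, max_gain=100.0):
    """Follow the signal level (fast when it rises, slow when it falls) and scale to target."""
    attack = 1.0 - math.exp(-1.0 / (attack_s * sample_rate_hz))
    decay = 1.0 - math.exp(-1.0 / (decay_s * sample_rate_hz))
    envelope = 1e-6
    out = []
    for s in samples:
        level = abs(s)
        # Louder than the current estimate: react quickly. Quieter: recover slowly.
        coeff = attack if level > envelope else decay
        envelope += coeff * (level - envelope)
        gain = min(target / max(envelope, 1e-6), max_gain)
        out.append(s * gain)
    return out

FS = 8000.0
weak = agc([0.1] * 8000, FS)     # one second of a weak, steady signal
strong = agc([2.0] * 8000, FS)   # one second of a strong, steady signal
print("weak settles at:  ", round(weak[-1], 3))
print("strong settles at:", round(strong[-1], 3))
```

Both signals end up at the same level, which is the whole point; the two time constants only decide how quickly that happens in each direction.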

13. Narrowband vs wideband (8 or 16 kHz)

The iFFT choice from §9 directly determines the audio sample rate. Two settings:

The same trade-off as always: quality versus data usage. On a wired network, wideband is almost free; on a tight connection you choose narrowband.

In ThetisLink. One switch in the client puts the RX audio — the Thetis reception and the VRX channels — on narrow- or wideband (reception only; transmitting stays wideband). The latency stays the same; only the amount of data changes.

14. Overlap-add — seamless sound

We process the signal in little blocks. If you simply stick them end to end, you hear a little click at every seam — especially because the window already tapered the edges (§5). Solution: let each block overlap the previous one by half and add them together. Where one block fades out, the next one comes up — together a smooth, uninterrupted stream.

[Figure 8 diagram: overlapping blocks along the time axis; 1 block = N samples, hop = ½ block; sum = constant, seamless audio.]
Figure 8 — Overlap-add happens in time: the horizontal axis is time, not frequency. Successive, tapered audio blocks (each a Hann window, §5) start half a block length apart, so they overlap 50% and are summed. Where one block fades out, the next comes up — the sum is a flat, seamless audio stream.
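
That the overlapped windows really add up to a flat line can be checked directly. The sketch uses the periodic form of the Hann window, 0.5 − 0.5·cos(2πn/N), for which the sum at a hop of half a block is exactly 1; the block size of 16 and the helper names are just for illustration.

```python
import math

def hann(n_pts):
    """Periodic Hann window: starts at 0 and rises and falls smoothly."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_pts) for n in range(n_pts)]

def overlap_add(blocks, hop):
    """Place each block `hop` samples after the previous one and sum the overlaps."""
    n_pts = len(blocks[0])
    out = [0.0] * (hop * (len(blocks) - 1) + n_pts)
    for b, block in enumerate(blocks):
        for n, value in enumerate(block):
            out[b * hop + n] += value
    return out

N = 16
summed = overlap_add([hann(N)] * 6, hop=N // 2)
# Skip the first and last half block, where only one window is present.
steady = summed[N // 2 : -(N // 2)]
print("min:", min(steady), "max:", max(steady))
```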

15. The spectrum and waterfall display

The nice picture in the client — the spectrum (energy per frequency) and the waterfall (the same, scrolling downward over time) — comes from a separate, much higher-resolution FFT. The audio needs small, fast little blocks (low delay); the display instead wants large blocks (sharp resolution). Two different trade-offs, two separate FFTs.

Analogy. The spectrum is a snapshot (a bar chart of “how much signal where”); the waterfall is a timeline scrolling downward, so that you still see a short beep from seconds ago.

16. Signal strength — the S-meter

How strong is a signal? The basis is surprisingly simple: the power of a complex sample is I² + Q², the length of the hand squared (think back to the clock analogy from §1). The longer the hand, the stronger the signal.

For a whole channel you do not measure one sample, but add up the power of all bins within your passband — the fine spectrum bins from §15 (not the coarser audio bins from §9). That sum is the strength of precisely that channel — signal and noise together, within your filter. That is immediately handy: if you set your filter narrower, you measure less noise along with it.

[Figure 9 diagram: power of the bins in the band → add up → dBm. Horizontal axis: frequency. Meter scale: S1, 3, 5, 7, S9, +20, +40, +60; S9 = −73 dBm.]
Figure 9 — The strength of a channel = the summed power of the bins in the passband, converted to dBm and plotted on the familiar S scale.

Because signals span an enormous range (from a whisper just above the noise to a deafening station), we use — just as with sound — a logarithmic scale: dB, and in this case dBm (power relative to 1 milliwatt). Radio amateurs read this off as S-units: on HF, S9 = −73 dBm, and each S-unit lower is 6 dB weaker. Above S9 one counts further in dB (“S9 + 20 dB”).

One more thing: the FFT gives a relative power (a number without an absolute unit). To turn that into a real dBm, the scale must be calibrated. That is done with a fixed calibration offset, so that the displayed value matches the actual signal strength. And to prevent the meter from twitching nervously, the value is averaged (and/or the peak held briefly) — just like the slow mechanism of an old-fashioned needle S-meter.

In ThetisLink. For the VRX channels, the client itself computes the S-meter: it integrates the power of the received spectrum bins within the passband, converts that with a fixed (empirically calibrated) offset to dBm, and applies meter ballistics — fast attack, slow decay, plus a short peak-hold. The ordinary main RX (RX1/RX2) has its own route: there the server delivers a ready-made value.
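
The steps of this section (add up the bin powers, convert to dBm with an offset, read off S-units) can be sketched as follows. The function names and the offset of −83 dB are invented for the example; a real offset has to come from calibration. Meter ballistics are left out.

```python
import math

S9_DBM = -73.0      # HF convention from the text
DB_PER_S_UNIT = 6.0

def channel_dbm(bin_powers, calibration_offset_db):
    """Sum the relative power of the bins in the passband and convert to dBm."""
    total = max(sum(bin_powers), 1e-30)  # guard against log of zero
    return 10.0 * math.log10(total) + calibration_offset_db

def s_reading(dbm):
    """Translate dBm into the S scale: S-units up to S9, dB over S9 above that."""
    over = dbm - S9_DBM
    if over > 0:
        return f"S9+{over:.0f} dB"
    s_units = max(0.0, 9.0 + over / DB_PER_S_UNIT)
    return f"S{s_units:.0f}"

# Ten bins of relative power 1.0 each, with a made-up calibration offset.
level = channel_dbm([1.0] * 10, calibration_offset_db=-83.0)
print(level, "dBm ->", s_reading(level))
print(s_reading(-79.0), s_reading(-53.0))
```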

17. Why this is so powerful — one FFT, many receivers

The real advantage of this method becomes apparent as soon as you want to hear many different, independent signals at the same time.

The classic way: each receiver gets its own expensive high-speed downconverter and filters — physical hardware, anew for each receiver. Ten listeners = ten times that expensive chain. That quickly gets out of hand and scales poorly.

This method turns it around: you compute the large FFT only once and share it. Each listener pulls their own narrow slice out of it with a small iFFT. And pulling that slice out costs — in proportion to that one large FFT — very little computation. So you do the expensive part once; every extra listener after that is almost free.

[Figure 10 diagram: one large FFT (all bins along the frequency axis, shared by everyone) feeding three small iFFTs: listener 1 on 3.630 LSB, listener 2 on 3.700 USB, listener 3 on 3.800 AM.]
Figure 10 — The principle: compute the large FFT once; after that each listener taps off their slice with their own small, cheap iFFT — each on a different frequency, mode and bandwidth.

That is why this method scales so well: all the heavy work is in that one large FFT, and every extra listener after that is almost free. Want a hundred people listening in at the same time, each on their own frequency? Then you still compute only one large FFT — not a hundred expensive receive chains. That is the power of this approach.
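
A back-of-the-envelope count shows the size of the effect. The sketch uses N·log2(N) as a rough stand-in for the work of one transform and ignores everything else (demodulation, AGC, audio coding), so the numbers are an order-of-magnitude illustration and not a measurement. The function names are made up.

```python
import math

def transform_cost(n_points: int) -> float:
    """Rough work estimate for one FFT or iFFT of n_points."""
    return n_points * math.log2(n_points)

def total_cost(listeners: int, n_fft: int = 24_576, n_ifft: int = 128) -> float:
    """One shared large FFT plus one small iFFT per listener."""
    return transform_cost(n_fft) + listeners * transform_cost(n_ifft)

shared = transform_cost(24_576)
per_listener = transform_cost(128)
print(f"one extra listener costs about {per_listener / shared:.2%} of the large FFT")
for listeners in (1, 10, 100):
    print(listeners, "listeners ->", round(total_cost(listeners) / shared, 2), "x the large FFT")
```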

Back to the satellite photo. You take that one expensive photo once. Letting a hundred people zoom in on it afterwards costs almost nothing — exactly like one FFT feeding a hundred cheap iFFT cut-outs.
In ThetisLink (TL2). This is the principle that makes the method so powerful with very many simultaneous users. TL2 itself uses it more modestly: there is one VRX per receiver — VRX1 on VFO-A (RX1) and VRX2 on VFO-B (RX2). Multiple clients can listen in to the same VRX, but they then share the settings of that channel. The scalability above is the headroom this architecture offers, not a limit that is already being approached.

18. Filter properties in detail

In §9 we saw that “setting bins to zero” amounts to filtering. But what kind of filter do you then get exactly? Below are the properties you also encounter in a receiver specification — filter shape, dynamic range and group delay — with the numbers included. This is deeper material; you do not need it for the big picture.

Filter shape and steepness

Selecting bins is a rectangular mask in the frequency domain: a bin participates fully (×1) or not at all (×0). The filter edge therefore lies on a grid of one bin:

bin width = Fs / Nfft = 1,536,000 / 24,576 = 62.5 Hz

The mask choice itself is rock-hard, but the actual transition is partly determined by the main lobe of the Hann window (§5): that is ~1–2 bins wide, so the slope runs over about 60–125 Hz. (It is the same window property that also sets the stopband floor — see the next paragraph.) For an SSB filter of 2.4 kHz that is a transition region of only ~3–5% of the passband width — much steeper than a crystal or mechanical filter in a classic receiver (transition often 10–30%).

That transition in Hz is roughly constant (that ~bin), regardless of how wide your filter is set. For wide filters the shape factor (ratio of the width at a deep level to that at −6 dB) is therefore very close to 1 — almost an ideal rectangle. For a very narrow filter (CW, e.g. 300 Hz) that same fixed ~62.5 Hz slope becomes a larger fraction of the passband, and the shape factor as a ratio increases — although the slope in Hz remains equally steep.
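
The percentages follow from dividing the fixed slope by the passband width. A small sketch of that arithmetic, with a made-up function name and the 1 to 2 bin slope taken from the text:

```python
BIN_HZ = 62.5

def transition_percent(passband_hz: float, transition_bins: float) -> float:
    """Width of the filter slope as a percentage of the passband width."""
    return 100.0 * transition_bins * BIN_HZ / passband_hz

for name, width in (("SSB 2.4 kHz", 2400.0), ("CW 300 Hz", 300.0)):
    low = transition_percent(width, 1)
    high = transition_percent(width, 2)
    print(f"{name}: {low:.1f}% to {high:.1f}% of the passband")
```

The slope is the same number of Hz in both cases; only its share of the passband changes.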

Dynamic range and ultimate suppression

Two things bound the dynamic range (amplitude range):

Want to suppress deeper? Then that is the classic window trade-off: a Blackman-Harris window achieves ~−90 dB sidelobes, but the main lobe (and thus the transition region) becomes ~2× as wide. Steeper and deeper at the same time simply takes more compute length — a larger window, a larger FFT, or a long FIR; it is not free (see the figure).

[Figure 11 plot. Vertical axis: level in dBc relative to the passband, from 0 down to −150. Horizontal axis: frequency relative to the tuning point, −6 to +3 kHz (1 bin = 62.5 Hz). Marked: passband (LSB, 3 kHz: −3…0 kHz); ideal mask; resultant; bins outside the edge: 1: −31, 2: −49, 4: −67, 8: −85 …; 32-bit compute/quantisation noise at ~−145 dBc.]
Figure 11 — The filter response for a concrete 3 kHz LSB channel, with the tuning point (the carrier) at 0 kHz. LSB sits below the carrier, so the passband runs from −3 to 0 kHz; for convenience the upper band edge falls exactly on 0 Hz (left −6 kHz, right +3 kHz). The resultant (blue) is flat at 0 dBc in the band; at each edge it descends the real Hann cascade: ~−31 dBc at 1 bin (62.5 Hz) outside the edge, then ~18 dB per octave — so −49 at 2 bins, −67 at 4, −85 at 8, −103 at 16… — sinking down to the 32-bit compute/quantisation noise floor (~−145 dBc), the real lower bound (not the −60 drawn earlier for simplicity). The x-axis is frequency relative to the tuning point; 1 bin = 62.5 Hz, and 3 kHz = 48 bins. The ideal mask (grey) is the perfect rectangle.
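
The cascade in the caption is a simple rule of thumb: start at −31 dBc one bin outside the edge, lose 18 dB for every doubling of the distance, and stop at the noise floor. As a sketch (the function name is made up, and the rule is only the approximation quoted in the caption, not an exact window response):

```python
import math

def hann_skirt_dbc(bins_outside_edge: float,
                   first_db: float = -31.0,
                   slope_db_per_octave: float = 18.0,
                   floor_db: float = -145.0) -> float:
    """Approximate level at a given distance (in bins, >= 1) outside the band edge."""
    level = first_db - slope_db_per_octave * math.log2(bins_outside_edge)
    return max(level, floor_db)

for bins in (1, 2, 4, 8, 16):
    print(f"{bins:2d} bins ({bins * 62.5:6.1f} Hz) outside the edge: {hann_skirt_dbc(bins):.0f} dBc")
```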

So the FFT/iFFT method gives very sharp filtering, but with a fixed shape (rectangular bin mask + Hann skirt). An additional FIR is only worthwhile when you actually need a different shape — for example a deliberate passband tilt, a matched/raised-cosine shape for data modes, or a notch inside the band. Making it steeper is possible too, but only with a very large FIR (hundreds of taps), not a simple one. For listening to SSB, AM and FM a flat, sharp band is exactly what you want, so an additional FIR is not needed here.

Group delay — flat, like a linear-phase FIR

This is a strong asset of the method. The bin mask is real (a bin participates or not — a number without phase rotation), the Hann window is symmetric, and the NCO is a pure frequency shift (only phase rotation, no amplitude). Together these add no skewed phase: in the passband the response is linear in phase — just like a FIR filter. Consequence:

A classic receiver filter (crystal, mechanical, or an IIR DSP filter) instead has a low absolute delay, but exhibits a group-delay peak around the band edges — the phase does not run neatly linearly there, which smears transients. The FFT method swaps that around: a slightly higher but flat delay, without phase distortion. For listening to SSB, AM and FM that delay is inaudible; the flat phase you instead hear as a clean, “tight” sound.

Optional: a FIR after the iFFT

The bin selection already gives a steep, near-rectangular passband, but the exact shape is determined by the window (a slightly sloping edge and a sidelobe level that sits at ~−31 dB right next to the band and rolls off ~18 dB/octave beyond). Want to give the transfer a precisely specified shape? Then you can put a FIR filter after the iFFT. That is relatively cheap, because after the iFFT everything runs at the low output rate (8 or 16 kHz). Such a FIR could:

But there is a trade-off. A symmetric (linear-phase) FIR keeps the group delay flat; a sharper skirt then costs more taps — i.e. more latency. If you want it sharper without that extra delay, you give up the linear phase and get group-delay distortion at the band edges back (just like a classic analogue or crystal filter). In short: sharpness versus flat phase/latency — a deliberate design choice. It remains the well-known recipe: coarse channel selection + decimation with the FFT, fine shaping with a FIR.

In ThetisLink. The current chain uses a Hann analysis window with 50% overlap-add and the rectangular bin mask — no extra FIR and no synthesis window. For amateur SSB/AM/FM that response is more than enough; the steep slopes (~62.5 Hz transition) and the flat group delay come for free. A post-processing FIR is the logical knob if a textbook-exact passband shape or deeper stopband is ever needed.

In closing — everything together

One wide time signal (up to 1.536 MHz) goes through a large complex FFT and is laid out in bins of 62.5 Hz; we keep only the bins of the desired channel and set the rest to zero (= filtering); a fixed iFFT (128/256 bins) converts that back into an I/Q time signal with a constant sample rate; the NCO shifts it precisely onto frequency; per mode we demodulate it; the AGC controls the volume; and as compact audio compressed with Opus, it goes to the client. Because this is a separate cut-out per listener, many people can listen at the same time and independently within the same band.

And unfortunately it is not free. The FFT and iFFT cost computation time that grows with the size (roughly proportionally — strictly speaking N·log N). But the delay sits mostly elsewhere: you must first collect a whole block of samples before you can compute, and the more finely you split up the spectrum (narrower bins), the longer that block must be. Fine frequency detail and low latency therefore pull against each other — the classic time-frequency trade-off. (In this chain it is fixed at 62.5 Hz per bin ≈ 16 ms block; a higher DDC rate makes the FFT larger and the view wider at the same detail and the same latency, but costs more computing power.) So a balance must always be found between delay (latency) and spectrum detail — precisely the trade-off that keeps coming back throughout this whole story.
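
The trade-off in numbers: the block you have to collect lasts one over the bin width, whatever the sample rate is. A minimal sketch with a made-up function name; the 31.25 Hz and 125 Hz rows are hypothetical alternatives shown only for comparison.

```python
def block_for(sample_rate_hz: float, bin_hz: float) -> tuple[int, float]:
    """Return (FFT points, block duration in ms) for a given rate and bin width."""
    points = int(sample_rate_hz / bin_hz)
    duration_ms = 1000.0 * points / sample_rate_hz
    return points, duration_ms

# Same bin width at two rates: more points, same waiting time.
print(block_for(1_536_000, 62.5))
print(block_for(384_000, 62.5))
# Hypothetical finer and coarser bins at the full rate.
print(block_for(1_536_000, 31.25))
print(block_for(1_536_000, 125.0))
```

Halving the bin width doubles the block time, and widening the view at the same bin width leaves the block time alone but makes the FFT larger.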

That is, in a nutshell, how a VRX gets from radio wave to sound. Want to know how that sound (plus the spectrum, the S-meter and all the controls) then reaches your client over the network? That is in the sister document “From server to client”.