Ant64 Audio System
Vision
The Ant64 audio section should match or exceed the specification of the best synthesizers currently available (reference: Waldorf Quantum MK2, Sequential Prophet X), while remaining capable of authentic analog synthesis emulation from an entirely digital signal path, with an optional analog output/character stage. All DSP runs in the FireStorm using fixed-point arithmetic exclusively — no floating point anywhere in the signal path.
Competitive Reference — State of the Art (2025/26)
| Synth | Voices | Osc/voice | Synthesis types | Filters/voice | Analog? |
|---|---|---|---|---|---|
| Waldorf Quantum MK2 | 16 | 3 (×5 algos) | WT, VA, Granular, Resonator, Kernel/FM | 2× analog 24/12dB LP + digital SVF | Hybrid |
| Sequential Prophet X | 16 | 4 (2 VA + 2 sample) | VA + sample | Stereo analog LP | Hybrid |
| Sequential Prophet-10 | 10 | 2 | VA (subtractive) | Analog LP | Full analog |
| Moog One | 8/16 | 3 | VA (subtractive) | Analog LP/HP/Notch | Full analog |
| Access Virus TI2 | 80 | 3 | VA + WT + FM | Digital multimode | Full digital |
| Waldorf Kyra (FPGA) | 128 | 10 | VA + WT | Digital 12/24dB LP/BP/HP | Full digital |
The Kyra is the key benchmark — a commercially shipped pure-FPGA synth proving 128 voices with 10 oscillators each is achievable in real hardware. Ant64 FireStorm targets this as a minimum, not a ceiling.
Ant64 targets:
- 128 voices minimum for VA and FM engines (matching Kyra, far exceeding all others)
- 256 voices for sample playback and simpler FM configurations
- Three first-class synthesis paradigms — Analog Style, Digital Sample, FM/OP
- All three available simultaneously and mixable per patch
- Fixed-point FPGA DSP throughout, optional analog output character stage
Three Synthesis Paradigms
The Ant64 treats synthesis as three equal, first-class paradigms — not one engine with bolt-on extras. Any voice slot can run any engine. Patches can layer all three simultaneously.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ PARADIGM 1 │ │ PARADIGM 2 │ │ PARADIGM 3 │
│ │ │ │ │ │
│ ANALOG STYLE │ │ DIGITAL SAMPLE │ │ YAMAHA / FM │
│ (VA / Subtr.) │ │ (S&S / PCM) │ │ (OP-based) │
│ │ │ │ │ │
│ Juno · Moog │ │ Korg M1 · JD800 │ │ DX7 · TX81Z │
│ Prophet · 303 │ │ ROMpler · S&S │ │ OPL · OPN │
│ M-86 · Hoover │ │ Piano · Strings │ │ FM bass · EP │
│ 128 voices │ │ 256 voices │ │ 128 voices │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
└───────────────────►▼◄────────────────────┘
Voice Mix Bus (S32)
Global FX Chain · Output Stage
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│ Pulse RP2350 │
│ MIDI in/out · patch management · parameter control │
│ voice allocation · sequencer · jog dials │
└───────────────────────┬─────────────────────────────────┘
│ QSPI + 4-line PIO fast transfer
┌───────────────────────▼─────────────────────────────────┐
│ FireStorm (GoWin 138k / 60k) │
│ │
│ Time-multiplexed DSP pipeline — 128–256 voices │
│ (one physical pipeline serves all voices sequentially) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Voice Mix Bus (S36, all voices accumulated) │ │
│ └──────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼───────────────────────────┐ │
│ │ Global FX chain (Reverb · EQ · Compressor) │ │
│ └──────────────────────┬───────────────────────────┘ │
└─────────────────────────┼───────────────────────────────┘
│ Stereo S24 PCM → WM8958/60 codec
┌───────────────▼─────────────────┐
│ Optional Analog Stage │
│ (character · saturation · VCF) │
└───────────────┬─────────────────┘
│ Line out / headphones / HDMI / optical
FireStorm Clock Architecture & Voice Budget
Clock Domains
The GoWin GW5AT-138 is fabricated on TSMC's 22nm ULP process. It contains several independent clock domains fed from on-chip PLLs:
| Domain | Clock | Source | Notes |
|---|---|---|---|
| BSRAM | 380MHz | Hard silicon spec | Dedicated hard memory blocks — not constrained by fabric routing |
| Fabric | ~200MHz (target) | PLL → synthesis result | Programmable LUT/FF logic — speed depends on critical path |
| DSP blocks | ~300MHz+ | Hard silicon | Dedicated multiply-accumulate blocks, independent of fabric |
| Pixel clock | 74.25MHz (720p) · 148.5MHz (1080p) | PLL | HDMI timing generation |
| Audio clock | 12.288MHz (derived) | PLL | 48kHz × 256 = standard I²S master clock |
The fabric clock is not a fixed silicon spec — it is the maximum frequency at which the synthesised logic meets timing after place-and-route. A deeply pipelined design with short combinational paths achieves a higher fabric clock than one with long unbroken logic chains.
Fabric Clock Estimation — Derived from BSRAM
The BSRAM:fabric clock ratio on comparable FPGAs:
| Device | Process | BSRAM | Fabric (complex design) | Ratio |
|---|---|---|---|---|
| Xilinx Artix-7 | 28nm | ~600MHz | 250–350MHz | 1.7–2.4× |
| Intel Cyclone 10 GX | 20nm | ~600MHz | 200–350MHz | 1.7–3× |
| GoWin GW2A | 55nm | ~200MHz | ~150MHz | ~1.3× |
| GoWin GW5AT (this device) | 22nm | 380MHz | ~200–250MHz | ~1.5–1.9× |
For a complex pipelined design combining an audio DSP engine and a 2D rasterizer, a ratio of 1.5–1.9× is typical. This gives a fabric clock estimate of:
Conservative target: 200MHz (ratio 1.9×)
Optimistic target: 250MHz (ratio 1.5× — achievable with careful timing closure)
200MHz is used as the design target throughout this document. All voice counts and performance figures are stated at 200MHz. 250MHz figures are noted where relevant as the optimistic case.
A key property of the 380MHz BSRAM is that at 200MHz fabric:
1 fabric clock period = 5.0ns
1 BSRAM clock period = 2.63ns
BSRAM cycles per fabric cycle: 5.0 / 2.63 = 1.9×
A BSRAM read issued at one fabric clock edge returns its result before the next fabric clock edge. BSRAM reads in the audio DSP pipeline are effectively zero-stall — the tanh lookup table, wavetable samples, and BLEP correction table are all available within a single fabric pipeline stage (the BBD delay lines themselves live in SRAM A — see BSRAM Capacity Allocation).
Cycle Budget — 200MHz at 48kHz
Fabric clock: 200,000,000 Hz
Audio sample rate: 48,000 Hz
Cycles per sample period: 4,167 cycles
Allocation:
┌────────────────────────────────────────┬───────────────────┐
│ Voice pipeline (all engines) │ ~3,700 cycles │
│ Post-mix global effects │ ~385 cycles │
│ Control / state management │ ~82 cycles │
│ │ │
│ TOTAL │ 4,167 cycles │
└────────────────────────────────────────┴───────────────────┘
Rasterizer: runs on SEPARATE fabric resources and SRAM B simultaneously.
Does not consume any of the above 4,167 cycles.
Voice Pipeline — Cycle Cost Per Voice
The audio engine is time-multiplexed: one physical DSP pipeline is shared across all voices. Each voice is processed in sequence within a single sample period. The pipeline is fully pipelined — a new voice enters each clock cycle, independent of other voices (no inter-voice data dependency).
VA Voice (2 oscillators, ladder filter 4-pole, SVF, BBD, VCA, 2 envelopes, 2 LFOs)
Stage Cycles Notes
─────────────────────────────────────────────────────────────────
Load voice state from BSRAM 1 380MHz BSRAM — returns within 1 fabric cycle
Phase accumulator × 2 oscillators 2 Parallel
Wavetable reads × 2 (BSRAM) 2 Parallel, BLEP table also read here
Oscillator mix 1
Ladder pole 1 (multiply + tanh + sum) 3 tanh → BSRAM LUT, 1 cycle read
Ladder pole 2 3 Sequentially dependent on pole 1
Ladder pole 3 3 Sequentially dependent on pole 2
Ladder pole 4 3 Sequentially dependent on pole 3
SVF (2 integrators, parallel) 3
BBD delay tap read + wet/dry mix 1 BSRAM, zero-stall
VCA 1
Envelope ADSR update 2
LFO update 1 Can overlap with envelope
Write state back to BSRAM + BBD write 1 Pipelined, no stall
Mix bus accumulate 1
─────────────────────────────────────────────────────────────────
TOTAL 29 cycles per VA voice
The four ladder poles are the critical serialised path — each pole's output feeds the next pole's input within the same sample, so they cannot be parallelised. This is the physics of the ladder filter topology and applies to any correct implementation. The tanh nonlinearity is a BSRAM lookup at 380MHz — it adds only 1 cycle, not a multi-cycle stall as it would be with a computed approximation.
FM Voice (6-operator, DX7-compatible algorithms with partial branch parallelism)
Stage Cycles Notes
─────────────────────────────────────────────────────────────────
Phase accumulator × 6 operators 12 2 cycles each
Sine LUT × 6 (BSRAM) 6 1 cycle each, pipelined
Modulation apply + level × 6 12 2 cycles each
Parallel branch reduction (algorithm) 6 Some algorithms allow 2 ops simultaneously
Output routing + carrier sum 3
Envelope per operator × 6 6 Overlap with pipeline stages above
─────────────────────────────────────────────────────────────────
TOTAL (with ~30% parallelism saving) ~28 cycles per 6-op FM voice
DX7 algorithms with parallel carrier branches (e.g. algorithms 1, 2, 5, 6) allow two operators to be computed simultaneously, reducing effective cycle count. The 28-cycle figure reflects this average across all 32 algorithms.
Sample Voice (S&S playback with linear interpolation)
Stage Cycles Notes
─────────────────────────────────────────────────────────────────
Sample read × 2 (for interpolation) 2 BSRAM or SRAM A depending on sample size
Linear interpolation (multiply + add) 2
Loop boundary check + wrap 1
Stereo pan matrix (2 multiplies) 2
VCA 1
─────────────────────────────────────────────────────────────────
TOTAL 7 cycles per sample voice
Voice Count Derivation
Available cycles for voice pipeline: ~3,700 (leaving ~467 for effects + control)
| Engine | Cycles/voice | Voices at 200MHz | Voices at 250MHz |
|---|---|---|---|
| VA (full: 2 osc, ladder+SVF, BBD) | 29 | 128 | 192 |
| VA (optimised timing closure) | 29 | 192 | 256 |
| FM 6-operator | 28 | 128 | 192 |
| FM 4-operator | ~19 | 192 | 256 |
| Sample playback | 7 | 512 | 512 |
128 VA voices is the conservative, guaranteed target at 200MHz — matching the Waldorf Kyra, the only comparable shipped FPGA synthesiser, but with a far more complex per-voice signal chain (nonlinear ladder + SVF + BBD vs the Kyra's simpler digital filter). 192 VA voices is achievable with careful timing closure and pipeline optimisation at 200MHz, and 256 at 250MHz.
The Kyra achieved 128 voices × 10 oscillators on a Xilinx Artix-7 (28nm) with a simpler filter architecture. The FireStorm runs on 22nm silicon with a faster BSRAM and a deeper per-voice chain. The comparison is favourable.
Mixed-Engine Voice Allocation
When multiple engines run simultaneously, the cycle budget is shared:
At 200MHz (3,700 cycle voice budget):
128 VA (29×128 = 3,712) ............. full budget — VA only
96 VA + 48 FM (2,784 + 1,344) ..... 4,128 — slightly over, needs optimisation
96 VA + 32 FM + 64 sample .......... 2,784 + 896 + 448 = 4,128 — same
  64 VA + 64 FM + 64 sample .......... 1,856 + 1,792 + 448 = 4,096 — still over the 3,700 budget; 64 VA + 32 FM + 64 sample (3,200) fits comfortably
At 250MHz (4,800 cycle voice budget):
192 VA ................................ 5,568 — exceeds, use 160 VA at 250MHz
  128 VA + 64 FM + 64 sample .......... 3,712 + 1,792 + 448 = 5,952 — exceeds budget
  128 VA + 48 FM + 64 sample .......... 3,712 + 1,344 + 448 = 5,504 — still over; reduce the FM or sample count to fit the ~4,800 budget
The voice allocator in the workstation app manages the cycle budget dynamically. Patches declare their engine type; the allocator enforces the total cycle budget and notifies the musician if a configuration exceeds capacity.
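The allocator's budget check reduces to a single inequality per configuration, using the per-voice cycle costs from the tables above. A minimal sketch — the function and constant names are illustrative, not the actual allocator API:

```cpp
#include <cassert>
#include <cstdint>

// Per-voice cycle costs from the tables above (200MHz design target).
constexpr uint32_t kCyclesVA     = 29;
constexpr uint32_t kCyclesFM6op  = 28;
constexpr uint32_t kCyclesSample = 7;
constexpr uint32_t kVoiceBudget  = 3700;   // cycles per 48kHz sample period

// Returns true if the requested engine mix fits the voice pipeline budget.
constexpr bool mix_fits(uint32_t va, uint32_t fm6, uint32_t sample) {
    return va * kCyclesVA + fm6 * kCyclesFM6op + sample * kCyclesSample
           <= kVoiceBudget;
}
```

The real allocator would run this check whenever a patch declares its engine mix, before committing voices.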
Practical default targets (stated in marketing/spec sheet):
| Spec | Value | Basis |
|---|---|---|
| VA voices | 128 | Guaranteed at 200MHz, conservative |
| FM voices (6-op) | 128 | Guaranteed at 200MHz |
| Sample voices | 512 | Comfortable at 200MHz |
| Mixed simultaneous | up to 192 | Depends on engine mix — allocator managed |
| At optimised 250MHz | up to 256 | All engines, with timing closure |
Post-Mix Global Effects — Cycle Budget
Global effects run after all voice outputs are summed to a stereo bus. They are not per-voice — the cycle cost is the same regardless of voice count.
Effect Cycles Notes
──────────────────────────────────────────────────────────────────
FDN Reverb (8×8 feedback delay network,
8 delay lines, full matrix multiply) 300 Largest single effect
4-band parametric EQ (biquad × 4) 40 Stereo — 2 channels × 4 × 5 muls
Stereo compressor / limiter 30
STFT analysis hop amortised 15 512-pt FFT ÷ 512 samples = ~15/sample
Mix bus saturation (soft clip) 5
──────────────────────────────────────────────────────────────────
TOTAL 390 cycles (9.4% of 4,167 sample budget)
The effects budget is trivially small relative to the total. The reverb, EQ, and compressor together consume fewer cycles than processing two VA voices.
Rasterizer Performance
The rasteriser runs on completely independent fabric and SRAM B — orthogonal to the audio DSP pipeline. It consumes none of the 4,167-cycle audio sample budget. Full rasteriser performance figures (triangles/frame, cycle costs, spectrogram render timing) are documented in the Display Architecture reference.
BSRAM Capacity Allocation
GoWin GW5AT-138 total BSRAM: 6.3Mb = ~787KB
| Use | Size | Notes |
|---|---|---|
| Voice state (256 VA voices × 63 bytes) | 16KB | Phases, filter states, envelopes, LFOs |
| Wavetable (256pt × 16-bit) | 512B | One cycle per waveform — multiple stored |
| BLEP correction table (1024 × 16-bit) | 2KB | Anti-aliasing correction per oscillator |
| tanh lookup (4096 × 36-bit) | 18KB | Nonlinear ladder saturation |
| Sine LUT for FM (4096 × 16-bit) | 8KB | FM operator sine generation |
| Font atlas (128 glyphs × 16×16 × 1-bit) | 4KB | ImGui text rendering |
| Colour LUT for spectrogram (256 × 3B) | 768B | Viridis / Inferno / etc. — swap at runtime |
| Scratch / intermediate | ~30KB | Pipeline staging, STFT twiddle factors |
| Total used | ~79KB | 10% of available BSRAM |
BSRAM usage is light — 90% is available for additional tables, larger LUTs, expanded wavetable sets, or future features. The BBD delay lines are intentionally kept in SRAM A (dedicated 36-bit audio bus) rather than BSRAM, because 128 voices × 25ms stereo delay at 36-bit is ≈1.4MB, which far exceeds the 787KB BSRAM ceiling. SRAM A handles this cleanly with no bus contention.
Per-Voice Engine
Each voice runs time-multiplexed on a shared fixed-point DSP pipeline in FireStorm. At 200MHz with 128 VA voices, each voice gets approximately 29 pipeline cycles per sample period — all within the 4,167-cycle budget at 48kHz.
Oscillator Block (3 per voice)
Each oscillator independently selects its synthesis algorithm:
1. Virtual Analog (VA) — subtractive, Juno/Prophet/Moog territory
   - Phase accumulator: U32 tuning word (gives ~0.01 cent resolution at 48 kHz)
   - Waveforms: sawtooth, PWM-sawtooth (Alpha Juno style — essential for M-86/hoover), pulse, triangle, sine, square, sub-octave
- BLEP anti-aliasing via pre-computed correction table in BRAM
- Hard sync between oscillators
- Cross-modulation (oscillator 1 FM-ing oscillator 2)
2. Wavetable — PPG/Waldorf territory
- 128-step wavetables, up to 128 tables per bank
- Smooth interpolation between table positions (bilinear, S24 fixed-point)
- Position sweepable by LFO, envelope, mod matrix, or MIDI note
- Custom wavetable upload from AntOS via DBFS
3. FM — Yamaha DX7 and beyond
- Up to 6 operators per oscillator slot (configurable algorithm routing)
   - S24.8 fixed-point for operator levels
   - All classic DX7 algorithms plus free routing mode
- Operator envelopes: 4-stage (matches DX7 rate/level model)
4. Granular
- Circular sample buffer (up to 4 seconds at 48 kHz, stored in FPGA BRAM or external SRAM)
- Per-grain: position scatter, pitch scatter, duration, envelope shape, pan
- Grain density: 1–200 grains/second
- Live input granularisation supported
5. Physical Modeling
- Karplus-Strong string model (delay line in BRAM + one-pole LP damping filter)
- Waveguide wind/reed model
- Suitable for plucked strings, bowed strings, blown tubes
6. Additive
- 64 partials per voice, each with independent amplitude/frequency envelope
- IFFT resynthesis path (optional, higher latency)
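The VA oscillator's U32 phase accumulator (algorithm 1 above) needs only one 64-bit integer computation to derive its tuning word. A minimal sketch, assuming the 48 kHz rate from the spec; the millihertz input format is an illustrative choice to keep the routine integer-only:

```cpp
#include <cassert>
#include <cstdint>

// U32 tuning word for the phase accumulator: word = f/fs * 2^32,
// computed in 64-bit so the intermediate product cannot overflow.
// One LSB corresponds to fs/2^32 ≈ 11 µHz at 48 kHz, comfortably
// inside the ~0.01 cent figure quoted above.
constexpr uint32_t kSampleRate = 48000;

constexpr uint32_t tuning_word(uint64_t freq_millihz) {
    return static_cast<uint32_t>(
        (freq_millihz << 32) / (uint64_t{kSampleRate} * 1000));
}
```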
Filter Block (2 per voice)
Two fully independent filter units per voice, series or parallel routing:
Filter 1 — Nonlinear Ladder (Moog/Roland character)
- Huovilainen model: 4× cascaded one-pole LP stages with nonlinear tanh feedback
- tanh implemented as 12-bit addressed BRAM lookup table (pre-computed, S1.15 format)
- Modes: 24 dB/oct LP · 12 dB/oct LP · 18 dB/oct LP
- Full resonance to self-oscillation
- Input: S24, internal state: S32 with careful scaling to prevent overflow
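A fixed-point sketch of one Huovilainen-style pole, y += g·(tanh(x) − tanh(y)). In hardware the tanh is a BSRAM table read; a direct computation stands in for the table here (the table itself would be generated offline, so no float enters the FPGA signal path). Names and the S1.15 scaling are illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

constexpr int32_t kOne15 = 1 << 15;   // 1.0 in S1.15

// Stand-in for the 4096-entry BSRAM tanh table.
int32_t tanh_s15(int32_t x) {
    return static_cast<int32_t>(
        std::lround(std::tanh(x / double(kOne15)) * kOne15));
}

// One ladder pole: y is the pole state, x the input, g the cutoff
// coefficient, all in S1.15. Four of these in series form the filter,
// which is why the poles serialise as described above.
int32_t ladder_pole(int32_t& y, int32_t x, int32_t g) {
    y += static_cast<int32_t>(
        (int64_t(g) * (tanh_s15(x) - tanh_s15(y))) >> 15);
    return y;
}
```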
Filter 2 — State Variable Filter (SVF)
- Simultaneous LP / HP / BP / Notch outputs selectable per voice
- 12 or 24 dB/oct
- Suitable for formant filtering, comb filter mode, phaser stages
- Topology: Chamberlin two-integrator loop, S32 fixed-point
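The Chamberlin loop is three multiply-accumulate lines per sample, with LP/HP/BP falling out of the same pass (notch is lp + hp). A sketch in S32 fixed point with S1.15 coefficients, where f ≈ 2·sin(π·fc/fs) and q = 1/Q; struct and field names are illustrative:

```cpp
#include <cassert>
#include <cstdint>

struct SVF {
    int32_t lp = 0, bp = 0, hp = 0;
    void tick(int32_t in, int32_t f, int32_t q) {
        lp += static_cast<int32_t>((int64_t(f) * bp) >> 15);   // integrator 1
        hp  = in - lp - static_cast<int32_t>((int64_t(q) * bp) >> 15);
        bp += static_cast<int32_t>((int64_t(f) * hp) >> 15);   // integrator 2
    }
};
```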
Filter routing
- Serial: F1 → F2 (4-pole into SVF — very flexible)
- Parallel: F1 + F2 mixed (phase cancellation effects possible)
- F2 only, F1 only
Envelope Generators (4 per voice)
Independent assignment to any modulation destination:
- ENV1 — standard ADSR with optional hold segment
- ENV2 — ADSR (typically filter)
- ENV3 — ADSR (free assign)
- ENV4 — Multi-stage loopable (up to 8 segments, Waldorf-style)
Each envelope is exponential (not linear) for musical response. Implemented as a multiply-accumulate with a per-segment decay constant. Rate responds to keyboard tracking (higher notes = faster envelopes, like real analog).
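The multiply-accumulate form can be sketched in a few lines: every sample the level closes a fixed fraction of the remaining distance to the segment target, which yields the exponential RC-style curve. Widths and names here are illustrative assumptions (level/target U16, per-sample rate coefficient U0.16):

```cpp
#include <cassert>
#include <cstdint>

// One envelope tick: move `coeff`/65536 of the way toward `target`.
inline uint16_t env_tick(uint16_t level, uint16_t target, uint32_t coeff) {
    int32_t diff = int32_t(target) - int32_t(level);
    return static_cast<uint16_t>(level + ((int64_t(diff) * coeff) >> 16));
}
```

Keyboard tracking then reduces to scaling `coeff` per note before the segment starts.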
LFO Block (4 per voice + 2 global)
Per-voice LFOs (4):
- Waveforms: sine, triangle, sawtooth, reverse saw, square, sample & hold, smoothed S&H
- Rate: 0.01 Hz – 20 kHz (audio-rate modulation supported)
- Sync: free, tempo-sync, note-triggered, key-sync
- Delay + fade-in time
Global LFOs (2):
- Same spec but shared across all voices (authentic to Juno/JP-8 behaviour)
- Can be switched to per-voice for richer polymodulation
Modulation Matrix
64 slots, each: Source → Destination × Amount
Sources include: ENV1-4, LFO1-6, velocity, aftertouch (mono + poly), mod wheel, pitch bend, key tracking, note number, random (per-note), MIDI CC 0–127, oscillator output (audio-rate mod)
Destinations include: all oscillator parameters (pitch, PW, wavetable pos, FM ratio/index), both filter cutoffs, both resonances, all envelope rates/levels, all LFO rates/depths, VCA level, pan, effect parameters
This exceeds the Quantum's modulation depth and matches a mid-size Eurorack system in routing flexibility.
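A matrix pass of this shape is a single loop over the 64 slots: each active slot adds an amount-scaled source into its destination accumulator. The sketch below assumes bipolar S1.15 amounts and integer source/destination IDs; all names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <array>

struct ModSlot { uint8_t src; uint8_t dst; int16_t amount; };

// Per control tick (or per sample for audio-rate sources): sum each
// slot's scaled source into its destination's modulation accumulator.
void eval_matrix(const std::array<ModSlot, 64>& slots,
                 const int32_t* sources, int32_t* dests) {
    for (const ModSlot& s : slots) {
        if (s.amount == 0) continue;   // unused slot
        dests[s.dst] += static_cast<int32_t>(
            (int64_t(sources[s.src]) * s.amount) >> 15);
    }
}
```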
Amplitude (VCA)
- One VCA per voice: sample × envelope_level, single S32 multiply
- Velocity scaling: linear or exponential curve, configurable per patch
- Pan position: per-voice stereo placement (constant power)
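Constant-power placement means the two channel gains lie on a quarter circle, so L² + R² = 1 at every pan position and perceived loudness stays flat across the sweep. A sketch (in hardware a quarter-sine BSRAM table would replace cos/sin):

```cpp
#include <cassert>
#include <cmath>

// pan: 0.0 = hard left, 0.5 = centre, 1.0 = hard right.
void pan_gains(double pan, double& left, double& right) {
    const double theta = pan * 1.5707963267948966;   // pan scaled to 0..pi/2
    left  = std::cos(theta);
    right = std::sin(theta);
}
```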
Paradigm 2 — Digital Sample Engine (S&S / PCM + Live Sampling)
Inspired by the Korg M1, Roland JD-800/JV series, and the S&S (Sample and Synthesis) approach of the late 1980s–90s. The codec's audio input extends this far beyond a traditional ROMpler — the Ant64 can sample both ahead of time and in real time.
This is a significant competitive differentiator. The Sequential Prophet X has no live audio input at all — samples must be loaded via USB. The Waldorf Quantum MK2 had live input but has been discontinued (April 2025). No current production synth at any price combines live real-time sampling with the full synthesis architecture described here.
Voice budget: up to 256 simultaneous voices — sample playback is computationally cheaper than VA (no nonlinear filter), so the pipeline can run more simultaneously.
Sample Sources — Four Modes
Mode A — Pre-loaded Samples (traditional S&S)
- Samples stored in DBFS as BLOBs, loaded to SRAM on patch activate
- Sample format: 16-bit signed PCM, mono or stereo, 44.1/48 kHz
- Velocity layers: up to 8 per note zone
- Full keyboard mapping: different sample per key range
- Import from SD card, USB, or AntOS file manager
- Use case: piano, strings, brass, choir, drum kits — realistic acoustic instruments
Mode B — Real-Time Live Sampling
- Audio input → ADC on the WM8958/60 codec → DMA ring buffer in SRAM
- Capture on demand: press record, play a note/chord/phrase — captured immediately
- Auto-loop detection: FireStorm DSP finds zero-crossings for clean loop points
- Latency from capture to playable: < 5ms (one DMA buffer period)
- Use case: sample a guitar chord, a vocal phrase, an external synth — play it instantly across the keyboard at any pitch
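One way the auto-loop search could work, sketched under the assumption that "clean loop points" means the positive-going zero crossings nearest a candidate position, so both loop ends splice where the waveform crosses zero in the same direction (the actual FireStorm detector is not specified here):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Search outward from `from` for the nearest rising zero crossing.
// Returns the crossing index, or `from` unchanged if none exists.
size_t nearest_rising_zero(const int16_t* buf, size_t len, size_t from) {
    for (size_t d = 0; d < len; ++d) {
        size_t fwd = from + d;
        size_t back = from - d;   // wraps for d > from; bounds check rejects it
        if (fwd >= 1 && fwd < len && buf[fwd - 1] < 0 && buf[fwd] >= 0)
            return fwd;
        if (back >= 1 && back < len && buf[back - 1] < 0 && buf[back] >= 0)
            return back;
    }
    return from;
}
```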
Mode C — Resample the Synth Output
- Route the Ant64's own mixed output back through the ADC input
- Capture a complex layered patch as a single sample
- Then play that sample back through a new synthesis layer on top
- Classic technique: Quantum's "self-recording", Ensoniq workflow
- Use case: freeze a complex evolving pad as a static sample, layer it with VA leads; capture a granular texture and play it chromatically; reduce polyphony load by freezing background layers
Mode D — Live Granular (streaming granular)
- Audio input streams directly into the granular engine without pre-recording
- No buffer delay: granular parameters applied to live input in real time
- Grain position jitter, pitch scatter, density, envelope all modulatable live
- Use case: real-time granular processing of a vocalist, guitarist, or any audio source — turns the Ant64 into a live granular effects processor as well as a synthesizer
Sample Oscillator (all modes)
- Phase accumulator reads through sample buffer (interpolated, U32 pointer)
- Pitch shifting: ±48 semitones from root pitch with fine-tune
- Loop modes: no loop · forward loop · ping-pong · release loop
- Loop crossfade: short crossfade window at loop point removes clicks (FireStorm DSP)
- Stereo samples: both channels preserved through stereo voice path
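The interpolated read splits the U32 phase pointer into an integer index and a fractional blend, matching the two reads plus multiply-add in the cycle table. A sketch with an illustrative split point; the caller guarantees idx + 1 is in bounds (loop wrap is handled elsewhere):

```cpp
#include <cassert>
#include <cstdint>

// Linear-interpolated sample read from a U32 phase pointer.
int16_t sample_read(const int16_t* buf, uint32_t phase, unsigned frac_bits) {
    uint32_t idx  = phase >> frac_bits;                 // sample pair index
    uint32_t frac = phase & ((1u << frac_bits) - 1);    // blend position
    int32_t a = buf[idx];
    int32_t b = buf[idx + 1];
    return static_cast<int16_t>(a + ((int64_t(b - a) * frac) >> frac_bits));
}
```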
Sample Filter (SVF)
- Chamberlin two-integrator SVF: LP / HP / BP / Notch, 12 or 24 dB/oct
- Lighter compute than nonlinear ladder — enables 256 voice budget
- ENV2 → filter cutoff for classic S&S brightness shaping
Envelopes (2 per voice, ADSR)
- ENV1 → VCA (volume) — exponential, keyboard-tracked rates
- ENV2 → filter cutoff
Granular Engine (all sample sources)
- Grain size: 1ms – 500ms
- Grain density: 1 – 200 grains/second
- Position scatter: random offset from playhead position (creates texture)
- Pitch scatter: random pitch variation per grain (±24 semitones)
- Pan scatter: random stereo position per grain
- Grain envelope: Gaussian, trapezoidal, or rectangular window
- Reverse grains: per-grain random reversal flag
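Grain spawning at 1–200 grains/second can be driven by a fixed-point accumulator rather than a divider: add density/fs each sample and fire a grain whenever the accumulator rolls past 1.0. A sketch with illustrative U16.16 scaling and hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

struct GrainClock {
    uint32_t acc = 0;
    uint32_t step;   // grains-per-sample in U16.16
    GrainClock(uint32_t grains_per_sec, uint32_t fs)
        : step(static_cast<uint32_t>((uint64_t(grains_per_sec) << 16) / fs)) {}
    int tick() {     // returns 1 on samples where a new grain starts
        acc += step;
        if (acc >= (1u << 16)) { acc -= (1u << 16); return 1; }
        return 0;
    }
};
```

Scatter parameters would then be applied per spawned grain from a PRNG, not per sample.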
Layering with Other Paradigms
Any voice slot can layer a sample engine voice under or over a VA or FM voice. A patch can combine: FM electric piano (P3) + VA analog pad (P1) + live granular texture from mic input (P2 Mode D) — all playing simultaneously, all going through the shared filter and effects chain. No current hardware synth does all three at once.
Paradigm 3 — FM / Operator Engine (Yamaha Style)
Full frequency modulation synthesis in the DX7 / TX81Z tradition. An FM voice is built from operators — each operator is a simple unit: a sine wave oscillator with its own ADSR envelope and output level. Operators modulate each other according to an algorithm (a routing diagram), producing complex, harmonically rich timbres from simple building blocks.
Voice budget: up to 128 simultaneous voices at 6 operators per voice.
What an Operator Is
┌───────────────────────────────────┐
│ OPERATOR │
│ │
│ Frequency ratio (coarse + fine) │
│ ──► Phase accumulator (U32) │
│ ──► Sine lookup (BRAM, 1024pt) │
│ ──► × Output level (TL, 0–99) │
│ ──► × ADSR envelope │
│ ──► × Velocity scaling │
│ ──► × Key rate scaling │
│ ──► Output (modulates or sums) │
└───────────────────────────────────┘
Each operator produces a sine wave (or alternative waveform — see below) at a ratio of the base pitch, shaped by its own envelope and level. Carriers sum to audio output. Modulators feed their output into the phase of another operator, adding harmonics.
Operator Count and Waveforms
| Mode | Operators/voice | Voices | Reference |
|---|---|---|---|
| 2-op | 2 | 256 | OPL2 (AdLib) |
| 4-op | 4 | 192 | TX81Z, OPN |
| 6-op | 6 | 128 | DX7, DX5 |
| 8-op | 8 | 96 | Beyond DX7 — Ant64 exclusive |
Waveforms per operator — the TX81Z already extended DX7's sine-only operators to 8 waveforms. Ant64 supports 16 per operator, stored as 1024-point tables in BRAM:
| # | Waveform | Character |
|---|---|---|
| 0 | Sine | Classic FM, pure |
| 1 | Half sine | Brighter, more even harmonics |
| 2 | Absolute sine | Full-wave rectified, buzzy |
| 3 | Quarter sine (pulse) | Hollow, wooden |
| 4 | Sawtooth | Harsh, bright |
| 5 | Square | Hollow, 303-like when FM modulated |
| 6 | Triangle | Soft, flute-like |
| 7–15 | Custom wavetable | User-defined, uploaded via AntOS |
Mixing waveforms across operators gives far more timbral variety than DX7's sine-only approach — closer to the Vector synthesis of the Prophet VS.
FM Algorithms
An algorithm defines which operators are carriers (summed to output) and which are modulators (fed into another operator's phase). Ant64 supports:
- All 32 DX7 algorithms (fully compatible — DX7 patch import supported)
- All 8 TX81Z/DX11 4-op algorithms
- Free routing mode — any operator can modulate any other, including self-feedback, parallel chains, stacked towers — a full modular FM signal graph per voice
Per-Operator ADSR
Each operator has an independent 4-stage envelope controlling its output level:
- Attack rate, Decay rate, Sustain level, Release rate
- Exponential curves (matches DX7 behaviour — rate, not time)
- Keyboard rate scaling: higher notes use faster envelope rates (natural instrument feel)
- Velocity sensitivity per operator: velocity can boost modulator depth for dynamic timbre
This is the key musical insight of FM — modulator envelope depth controls brightness. A fast modulator attack with slow decay gives a percussive click + evolving tone. Slow modulator attack gives a swelling, building timbre. This is what makes DX7 electric pianos, bells, marimbas, and basses so expressive.
Fixed-Point FM Implementation
FM synthesis is entirely integer arithmetic — perfect for fixed-point FPGA:
// Per operator per sample:
phase_acc += frequency_word; // U32 add — wraps naturally
mod_input = previous_operator_out; // S16 from modulating operator
index = (phase_acc >> 22) + (mod_input >> feedback_shift); // 10-bit table address
output = sine_lut[index & 0x3FF]; // BRAM lookup → S16
output = (output * envelope_level) >> 15; // S16 × U15 → S16
The entire 6-operator voice runs in ~60 multiplies and ~60 BRAM lookups per sample cycle. At a 200MHz fabric clock with 48kHz audio, that's 4,167 cycles per sample period — shared across all voices in a time-multiplexed pipeline. At ~28 cycles per 6-op FM voice, 128 simultaneous FM voices fit within the cycle budget with headroom for post-mix effects. See the Clock Architecture & Voice Budget section for full derivation.
DX7 Patch Compatibility
The FM engine is designed to be DX7 sysex-compatible — DX7 .syx patch banks can be
imported via AntOS and converted to Ant64 FM patch format. This gives instant access to
the vast library of DX7 patches: electric pianos, basses, bells, mallets, brass, strings.
Global Effects Chain (post-mix)
Running after all 128–256 voices are summed:
BBD Chorus (M-86 / Juno authentic)
- Circular delay buffer: 2048 samples (~42 ms at 48 kHz)
- Two modulated read taps (one per stereo channel)
- Modulation LFO: triangle, rate/depth configurable
- Mode I (subtle) and Mode II (deep) matching Alpha Juno BBD character
- Sub-sample linear interpolation to prevent zipper noise
- Additional modes: ensemble (3 taps), flanger (short delay + feedback), rotary
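The modulated tap with sub-sample interpolation can be sketched as follows, assuming the 2048-sample circular buffer from the spec; the U16.16 delay format is an illustrative choice. The LFO supplies `delay_fp`, and the blend between adjacent taps is what removes zipper noise as the tap sweeps:

```cpp
#include <cassert>
#include <cstdint>

// One modulated BBD read tap. delay_fp is the delay in samples, U16.16.
int16_t bbd_tap(const int16_t* buf, uint32_t write_idx, uint32_t delay_fp) {
    const uint32_t mask = 2047;            // 2048-sample circular buffer
    uint32_t di   = delay_fp >> 16;        // whole-sample delay
    uint32_t frac = delay_fp & 0xFFFFu;    // sub-sample position
    int32_t a = buf[(write_idx - di)     & mask];
    int32_t b = buf[(write_idx - di - 1) & mask];
    return static_cast<int16_t>(a + ((int64_t(b - a) * frac) >> 16));
}
```

The second stereo channel runs the same read with its own LFO phase.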
Reverb
- Algorithmic: Schroeder/FDN hybrid (8 delay lines in BRAM)
- Types: room, hall, plate, spring (spring particularly useful for guitar/analog character)
- Pre-delay: 0–250 ms
Delay
- Stereo delay, up to 2 seconds
- Tap tempo sync, ping-pong mode
- High-frequency damping per feedback tap (analog tape character)
Distortion / Saturation
- Soft clip (tanh lookup) and hard clip stages
- Bitcrusher (sample rate reduction + bit depth reduction)
- Useful for lo-fi / rave character on individual voices or globally
Real-Time Sound FX Engine
The Ant64 is a home computer as well as a synthesizer. Games, demos, AntOS system events, and applications all need real-time sound effects — not music, but responsive, low-latency audio events triggered by code. The SFX engine is a dedicated subsystem within FireStorm, separate from the music voice pool, providing guaranteed voice availability for non-musical audio regardless of what the music engine is doing.
SFX Voice Pool
A reserved partition of FireStorm voices dedicated to SFX — not shared with music voices. Default allocation: 32 SFX voices (drawn from the total FireStorm budget, leaving 96–224 for music depending on model). Configurable at boot — a game might want 48 SFX voices; a pure music application might release them all to music.
FireStorm total voice budget (200MHz, 48kHz):
┌─────────────────────────────┬──────────────────────────────┐
│ Music voices (default) │ SFX voices (default) │
│ 96 VA + 48 FM + 64 sample │ 32 voices (any engine type) │
│ = 208 music voices │ = reserved, always available│
└─────────────────────────────┴──────────────────────────────┘
Partition is configurable — SFX pool size set at application launch.
SFX voices use the same DSP engines (VA, sample, FM) as music voices — any synthesis type is available for sound effects, not just sample playback. The distinction is ownership and priority: music voices are managed by the sequencer and workstation app; SFX voices are managed by the SFX API.
Priority System
When all 32 SFX voices are in use and a new SFX is triggered, the priority system decides which voice to steal:
| Priority level | Description | Stealing behaviour |
|---|---|---|
| Critical | System sounds, UI feedback | Never stolen — reserved slots |
| High | Important game events (player death, explosion) | Steals from Low first |
| Normal | General SFX (footsteps, impacts, pickups) | Default level |
| Low | Ambient, background, non-essential | Stolen first |
Oldest-first stealing within a priority level — the voice that has been playing longest is the one that gets cut when a new higher-priority sound needs a slot.
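The stealing rule in the table can be expressed as: take a free voice if one exists; otherwise steal the oldest voice at the lowest eligible priority level, never touching Critical. A sketch with hypothetical type and field names:

```cpp
#include <cassert>
#include <cstdint>
#include <array>

enum Priority : uint8_t { Low = 0, Normal = 1, High = 2, Critical = 3 };

struct SfxVoice { bool active; Priority prio; uint32_t start_time; };

// Returns the voice index to use for a new sound, or -1 if nothing
// may be stolen (all eligible voices are Critical or higher priority).
int allocate_voice(const std::array<SfxVoice, 32>& pool, Priority new_prio) {
    int best = -1;
    for (int i = 0; i < 32; ++i) {
        const SfxVoice& v = pool[i];
        if (!v.active) return i;                          // free voice: take it
        if (v.prio == Critical || v.prio > new_prio) continue;
        if (best < 0 ||
            v.prio < pool[best].prio ||                   // lowest priority first
            (v.prio == pool[best].prio &&
             v.start_time < pool[best].start_time))       // then oldest first
            best = i;
    }
    return best;
}
```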
Procedural SFX Generation (SFXR-style)
SFX do not need to be pre-recorded samples. Procedural generation from parameterised waveforms covers the entire vocabulary of classic game and system audio — and does it in a few bytes of patch data rather than kilobytes of sample data.
Inspired by DrPetter's SFXR (2007) — the tool that established the vocabulary of indie game audio — the Ant64 SFX engine generates sounds from a compact parameter set evaluated in real time on FireStorm:
SFX Patch (compact binary, ~32 bytes):
┌─────────────────────────────────────────────────────────────┐
│ Waveform type: square / saw / sine / noise / triangle │
│ Base frequency: Hz (or note) │
│ Frequency sweep: Hz/sec (positive = rising, neg = falling)│
│ Frequency delta: acceleration on sweep (exponential feel) │
│ │
│ Amplitude envelope: │
│ Attack: ms │
│ Sustain: ms │
│ Punch: 0–1 (instant volume spike at note on) │
│ Decay: ms │
│ │
│ Duty cycle (square wave): 0–1, with sweep │
│ Vibrato: rate + depth │
│ Arpeggiate: frequency multiplier + speed (chip tune jumps) │
│ │
│ Low-pass filter: cutoff + resonance + cutoff sweep │
│ High-pass filter: cutoff + sweep │
│ │
│ Phaser: offset + sweep + feedback │
│ Retro noise: bit depth + sample rate reduction │
└─────────────────────────────────────────────────────────────┘
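One plausible packing of this parameter set into the stated ~32 bytes; field names and widths are illustrative, not the actual FireStorm register layout. Placing the 16-bit fields first lets the struct pack with no padding:

```cpp
#include <cassert>
#include <cstdint>

struct SFXPatch {
    uint16_t base_freq_hz;
    int16_t  freq_sweep;          // Hz/sec, signed
    int16_t  freq_delta;          // sweep acceleration
    uint16_t attack_ms;
    uint16_t sustain_ms;
    uint16_t decay_ms;
    uint16_t lp_cutoff;
    uint16_t hp_cutoff;
    uint8_t  waveform;            // square / saw / sine / noise / triangle
    uint8_t  punch;               // 0..255 maps to 0..1
    uint8_t  duty, duty_sweep;
    uint8_t  vibrato_rate, vibrato_depth;
    uint8_t  arp_mult, arp_speed;
    uint8_t  lp_resonance, lp_sweep, hp_sweep;
    uint8_t  phaser_offset, phaser_sweep, phaser_feedback;
    uint8_t  noise_bit_depth, noise_sr_div;
};
static_assert(sizeof(SFXPatch) == 32, "SFX patch must stay 32 bytes");
```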
Built-in SFX archetypes (one-touch generation, randomisable):
| Archetype | Parameters tuned for | Example |
|---|---|---|
| Coin / pickup | Rising square freq sweep, short decay | Mario coin, item collect |
| Laser / shoot | Falling saw sweep, fast decay | Retro shoot-em-up shot |
| Explosion | Noise + low-pass, long decay, punch | Any explosion |
| Jump | Rising freq sweep, medium decay | Platform game jump |
| Power-up | Arpeggiated rising sequence | Level complete, item upgrade |
| Hit / hurt | Noise burst, short, pitch drop | Damage received |
| Select / blip | Short square tone, fast attack/decay | Menu navigation |
| Zap | Noise + frequency wobble, medium decay | Electric / magic effect |
| Rumble | Sub-bass noise, long sustain | Earthquake, engine |
| Ping | Sine, slow decay | Sonar, notification |
| Alert | Two-tone alternating, repeating | Alarm, warning |
| Ambient hum | Sine + slight vibrato, sustained | Engine, machinery |
Each archetype has randomisable parameters — press RANDOMISE to get a variant in
the same family. This is the SFXR workflow: generate, audition, adjust, accept.
SFX API (AntOS Scripting Bindings + Big Core C++ API)
From AntOS scripts / games (little core):
-- Play a named SFX from the library
sfx.play("coin_pickup")
-- Play with parameter overrides
sfx.play("laser", { pitch = 880, sweep = -200, volume = 0.8 })
-- Play a procedurally generated SFX from a patch struct
sfx.play_patch(my_patch)
-- Trigger at a specific FireStorm voice (bypass priority system)
sfx.play_voice(14, "explosion")
-- Play with spatial position (stereo pan derived from position)
sfx.play_spatial("footstep", { x = 0.3, distance = 1.0 })
-- Stop all SFX of a given name
sfx.stop("ambient_hum")
-- Set global SFX volume (independent of music volume)
sfx.set_volume(0.7)
From big core C++ (games / bare-metal applications):
// Immediate trigger — lowest latency path, direct QSPI write to FireStorm
SFX::play("coin_pickup");
// Parameterised
SFX::play("laser", SFXParams{ .pitch = 880.0f, .sweep = -200.0f });
// Procedural patch
SFXPatch patch = SFXPresets::explosion();
patch.decay_ms = 800;
SFX::play(patch);
// Spatial (2D game — x position maps to stereo pan)
SFX::play_spatial("footstep", Vec2{player.x, player.y});
The big core path writes directly to FireStorm voice registers via QSPI — the lowest possible latency, no OS involvement. The little core path goes through AntOS IPC but is still sub-millisecond for a register write sequence.
SFX Library in DBFS
SFX patches are stored in DBFS as 32-byte compact binary structs — the same storage system used for music patches. A full library of 256 SFX patches occupies 8KB. The library ships with a default set covering the archetypes above and is fully replaceable by the user.
DBFS SFX entry (~32 bytes):
├─ name: char[12] "coin_pickup\0"
├─ waveform: uint8 WAVE_SQUARE
├─ base_freq: uint16 Q8.8 MIDI note (72.00 = C5, 523.25 Hz)
├─ envelope: 4× uint16 attack/sustain/punch/decay
├─ sweep: int16 freq sweep rate
├─ filter: 4× uint16 lp_cutoff/lp_res/hp_cutoff/sweep
├─ arp: 2× uint16 multiplier/speed
└─ flags: uint8 retro_noise | phaser | vibrato
Optional Analog Output Stage
A small analog board between the DAC and the line output can add genuine analog character without compromising digital precision:
Option A — Passive (simplest):
- Output transformer for warmth and harmonic saturation
- No active components in signal path
Option B — Active character stage:
- Op-amp saturation stage (TL072 or similar, run warm)
- Single-pole analog LP filter for gentle HF rolloff (removes any DAC artifacts)
- Drives to line level (phono out) and internal speakers
Option C — Analog VCF (maximum authenticity):
- CEM3320 (the Prophet-5 Rev3 filter chip) or SSI2140/SSI2144 (modern successors to the SSM2040/2044 filters) or
- Discrete transistor ladder (Moog style)
- DAC output → analog VCF → line out
- Controlled via a dedicated CV output (DAC-driven CV from FireStorm)
- This gives a genuine analog filter stage in the same family as the Prophet-5 or Minimoog
- Can be bypassed digitally for clean output
Option C is the most ambitious but gives the Ant64 something the Waldorf Quantum doesn't have: a real, classic analog filter chip in the signal path.
Environmental / Acoustic Post-Processing
Any audio source on the Ant64 — music voices, SFX, MOD player, live input, speech synthesis — can be routed through an environmental processing preset that simulates the acoustic character of a physical space or transmission medium. Underwater, cavern, large hall, metal pipe, telephone, outer space — the processing transforms the dry sound into something that belongs in that environment.
This is distinct from the global FX chain (which applies to the music mix). Environmental processing operates as insert or send buses on individual voices or voice groups, and as a global environment applied to the entire output mix. Multiple environments can run simultaneously — SFX voices in one space, music voices in another.
Implementation on FireStorm
Each environment preset is a configuration of existing FireStorm DSP blocks — no new hardware is needed. The blocks are already present: FDN reverb, parametric EQ, ladder filter, BBD chorus, distortion/saturation, delay, bitcrusher. An environment is a named set of parameters applied to these blocks.
Any audio source
│
├──→ [EQ curve] ← shape the frequency response
│
├──→ [Modulation] ← pitch wobble / tremolo / chorus
│
├──→ [FDN Reverb] ← room size, decay, diffusion, damping
│
├──→ [Delay] ← pre-delay, echo, flutter
│
├──→ [Noise floor] ← add ambient background noise
│
└──→ Processed output
All blocks are in the existing FireStorm DSP pipeline — applying an environment preset is a register write from the big core or AntOS, taking effect within one sample period. Crossfading between environments (smooth transition as a character moves from a room into a corridor) is a linear parameter interpolation over a configurable number of bars or seconds.
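The crossfade itself is nothing more than linear interpolation over the preset's register values. A sketch (names hypothetical; the real parameter set is whatever the environment blocks expose):

```cpp
#include <cstdint>

// Linear interpolation of one environment parameter from its starting
// value toward the target, at control tick `step` of `total` ticks.
// Each tick's result is simply written to the FireStorm register.
int32_t lerp_param(int32_t start, int32_t target, int step, int total) {
    return start + (int32_t)(((int64_t)(target - start) * step) / total);
}
```

Run once per parameter per control tick, this gives the smooth room-to-corridor transition described above with no extra DSP hardware.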
Environment Presets
Underwater
The defining acoustic characteristics of underwater audio: dramatic high-frequency absorption (water absorbs high frequencies rapidly), pressure-induced pitch variation, soft low-frequency resonance, and the physical sensation of sound transmitted through a dense medium.
Processing chain:
├─ Low-pass filter: aggressive, -24dB/oct at 600–900Hz
│ cutoff slowly wobbles ±50Hz at 0.3Hz (pressure variation)
├─ Resonance: mild peak at 400Hz (water column resonance character)
├─ Chorus: 2-tap, slow rate (0.2Hz), moderate depth — water diffusion
├─ Pitch modulation: ±4 cents at 0.15Hz (density-of-medium effect)
├─ Reverb: medium decay (1.2s), high diffusion, heavily damped highs
│ pre-delay 8ms — sound travels slower in water (relative feel)
├─ Volume: -4dB overall — water absorbs energy
└─ Optional: broadband noise floor at -48dB — bubbles, water movement
Distinctive and immediately recognisable. Works on any source: muffled underwater music, distant underwater explosions, speech that sounds like it's heard through a pool wall.
Cavern / Cave
Stone surfaces reflect sound with moderate HF absorption. Long reverb tails, strong early reflections from close walls, low-frequency resonance in the cave body, potential flutter echo between parallel surfaces.
Processing chain:
├─ EQ: -3dB shelf above 6kHz (stone absorbs some high frequencies)
│ +2dB at 200–400Hz (room resonance / low-frequency build)
├─ Early reflections: 4–6 discrete delays at 15–60ms, -6 to -18dB
│ simulating close stone walls
├─ Reverb: long decay (2.5–4s), medium diffusion, stone character
│ decay time varies with cavern size preset
├─ Flutter echo: optional — delay at ~80ms with feedback 0.5–0.7 for
│ parallel wall flutter (narrow canyon feel)
├─ Pre-delay: 20–40ms (distance to nearest wall)
└─ Sub-bass boost: +3dB below 80Hz — caves resonate at low frequencies
Variants: small cave (tighter reflections, shorter decay), large cavern (longer pre-delay, 4–6s decay), ice cave (brighter reflections, less HF absorption), lava tube (more low-end, slight distortion character).
Large Hall / Cathedral
The classic reverberant space. Long pre-delay (distance to first reflection), very long decay, high diffusion, wide stereo spread. The sound of music meant to fill a large resonant space.
Processing chain:
├─ EQ: gentle air boost (+2dB at 10kHz) — hall brightness
│ slight low-mid cut (-1.5dB at 300Hz) — reduce muddiness
├─ Pre-delay: 40–80ms (distance to first wall in a large hall)
├─ Reverb: very long decay (3–8s), very high diffusion
│ early reflections at 40–120ms
├─ Stereo spread: maximum — reverb tail fills the full stereo field
└─ Late reverb: gradual HF rolloff in tail (air absorption over distance)
Small Room / Studio
Close, intimate acoustic space. Short reverb, audible early reflections, relatively dry compared to hall. The sound of a padded room or recording booth.
Processing chain:
├─ EQ: slight boxiness (+1dB at 400Hz — close wall resonance)
├─ Early reflections: 4 reflections at 8–25ms, -4 to -10dB
├─ Reverb: short decay (0.3–0.8s), low diffusion
└─ Pre-delay: 2–8ms
Metal Pipe / Tunnel
Resonant cylindrical geometry creates strong modal resonances — specific frequencies ring loudly while others are suppressed. Flutter echo between parallel surfaces. The distinctive metallic coloration of sounds heard through a pipe or ventilation shaft.
Processing chain:
├─ Resonant EQ: sharp peaks at pipe resonant frequencies
│ f_n = n × c / (2L) where L = pipe length, c = 344m/s
│ Example: 5m pipe → resonances at 34Hz, 68Hz, 103Hz...
│ Implemented as 4–6 narrow bandpass peaks in EQ
├─ HF cut: -18dB above 3kHz (pipe walls absorb high frequencies)
├─ Flutter echo: delay at ~30ms (pipe diameter), feedback 0.6–0.75
│ creates the metallic ringing character
├─ Reverb: short, low diffusion (cylindrical geometry = coherent echo)
└─ Distortion: mild saturation (metallic surface coloration)
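The modal series from the formula above can be computed directly when configuring the resonant EQ peaks — a small sketch (function name illustrative):

```cpp
#include <vector>

// Modal resonances of an open pipe, f_n = n * c / (2L).
// Used to place the narrow bandpass peaks of the Metal Pipe environment.
std::vector<double> pipe_modes(double length_m, int count, double c = 344.0) {
    std::vector<double> f;
    for (int n = 1; n <= count; ++n)
        f.push_back(n * c / (2.0 * length_m));
    return f;
}
```

For a 5m pipe this yields 34.4, 68.8, 103.2 Hz…, matching the worked example in the chain above.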
Telephone / Radio Transmission
Bandpass filtering to match the frequency response of telephone audio (300–3400Hz) or AM radio. Adds noise, mild compression, slight saturation. Immediately recognisable as "heard over a communication channel."
Telephone:
├─ Bandpass: 300Hz–3,400Hz (ITU-T G.711 telephone band)
├─ Distortion: mild saturation (analogue circuit character)
├─ Noise: white noise at -50dB (line noise)
├─ Compression: heavy (3:1, fast attack) — telephone dynamic range limiting
└─ Volume: -2dB overall
AM Radio:
├─ Bandpass: 100Hz–5,000Hz (AM broadcast bandwidth)
├─ Noise: pink noise at -40dB + occasional crackle bursts
├─ Distortion: moderate saturation (AM demodulator character)
└─ Slight flutter: 0.5Hz pitch modulation at ±2 cents (carrier instability)
Walkie-talkie / CB:
├─ Bandpass: 400Hz–2,800Hz (narrower than telephone)
├─ Hard clipping: aggressive — squelch character
├─ White noise: -35dB (radio static)
└─ Gate: noise gate opens on signal (squelch simulation)
Outer Space (Sci-Fi Convention)
Physically, space is silent — no medium to carry sound. The sci-fi convention is a large, reverberant, pristine space with very slow decay and no air absorption. The sound of something massive happening in a vacuum, as heard by the audience rather than physics.
Processing chain:
├─ EQ: flat — no air absorption, all frequencies preserved equally
├─ Reverb: very long decay (6–15s), very high diffusion
│ no HF rolloff in the tail (no air = no absorption)
├─ Pre-delay: 100–200ms (great distance, vastness of space)
├─ Stereo: extreme width — a 180° spatial impression
├─ Pitch shift: very slight down (-5 cents) — gravitational scale suggestion
└─ No noise floor: space is completely silent between events
Variants: close explosion (short pre-delay, massive low-end boost), distant signal (more pre-delay, high-frequency roll-off simulating transmission distance).
Custom Environment
All environment parameters are exposed via the workstation app and AntOS API. Any combination of EQ, reverb, delay, modulation, noise, distortion, and filter settings can be saved as a named custom environment preset in DBFS. Environments are small parameter structs (~128 bytes) and shareable over the gossip network.
Routing Architecture
FireStorm audio sources:
┌────────────────────────────────────────────────────────────┐
│ Music voices (VA, FM, Sample, 303) │
│ SFX voices (procedural, sample-based) │
│ MOD player channels │
│ Live audio input (ADC) │
│ SAM speech synthesis (from Pulse via PIO) │
└──────────────────────┬─────────────────────────────────────┘
│
┌──────────▼──────────────────────────┐
│ Environment Bus Router │
│ │
│ Per-voice or per-group assignment: │
│ ├─ Music → Environment A │
│ ├─ SFX → Environment B │
│ ├─ Voice 1–8 → Environment C │
│ └─ ADC input → Environment D │
└──────────┬──────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
[Env A: Hall] [Env B: Cavern] [Env C: Underwater]
│ │ │
└───────────────┴───────────────┘
│
[Master mix bus]
│
[Global FX chain]
│
[WM8958 output]
Up to 4 simultaneous environments at full polyphony — constrained only by the FireStorm cycle budget. Each active environment is a separate FDN reverb instance plus EQ and modulation chain. At 200MHz fabric and 48kHz sample rate the budget is ≈4,167 cycles per sample; 4 × 390 cycles (effects budget) = 1,560 cycles, leaving ≈2,600 cycles for voice DSP — still comfortable at 64–96 VA voices per environment.
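The budget arithmetic can be captured as compile-time constants (integer division, so the figures round down by a cycle or two):

```cpp
// Per-sample cycle budget at 200 MHz fabric / 48 kHz sample rate.
constexpr int kFabricHz       = 200000000;
constexpr int kSampleRateHz   = 48000;
constexpr int kCyclesPerSample = kFabricHz / kSampleRateHz; // 4166
constexpr int kEnvCycles       = 4 * 390;                   // 4 environments
constexpr int kVoiceCycles     = kCyclesPerSample - kEnvCycles; // left for voices
```

Keeping the split explicit like this makes it easy to re-budget if the per-environment effects cost changes.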
API
Big core C++ (workstation app, bare-metal):
// Apply environment to a voice group
Audio::setEnvironment(GROUP_SFX, ENV_CAVERN);
Audio::setEnvironment(GROUP_MUSIC, ENV_HALL);
Audio::setEnvironment(GROUP_ADC, ENV_UNDERWATER);
// Crossfade to a new environment over 2 seconds
Audio::crossfadeEnvironment(GROUP_MUSIC, ENV_TUNNEL, 2.0f);
// Custom environment from a preset struct
EnvironmentPreset my_env = EnvironmentPresets::cavern_large();
my_env.reverb_decay_s = 5.0f;
Audio::setCustomEnvironment(GROUP_SFX, my_env);
AntOS scripting bindings:
-- Set environment for a voice group
audio.set_env("sfx", "cavern")
audio.set_env("music", "hall")
-- Smooth crossfade
audio.crossfade_env("music", "underwater", 3.0)
-- Load custom environment from DBFS
local env = dbfs.load_env("my_cave_preset")
audio.set_custom_env("all", env)
303 Acid Engine
The TB-303 is a special case that warrants its own dedicated engine mode rather than being shoehorned into the general voice architecture. What makes it distinctive is not just the filter — it is the complete interaction between the oscillator, the dual envelope system, the accent circuit, the slide/portamento, and the step sequencer. Get any one of these wrong and it stops sounding like acid.
Why the 303 is Hard to Clone
Most 303 clones fail because they treat it as "sawtooth + 18dB filter." The real story:
- The filter is actually 4-pole but with interacting (non-buffered) poles, giving an effective ~18 dB/oct rolloff with a distinctive resonance character and an unusual 10 Hz peak in the resonance feedback circuit (Tim Stinchcombe's 2009 analysis)
- There are two independent envelope generators — not one
- The accent circuit is an RC-based sweep that accumulates across consecutive accented notes — this is what gives the "increasingly distressed animal cry" when accents repeat
- The slide uses a fixed time (not fixed rate) — so the pitch change slows down as the interval shrinks
- The square wave is derived from the sawtooth via single-transistor waveshaping, giving it a subtly different character to a clean square wave
- Overdrive of the output is intrinsic to the acid sound — the filter output into a slightly clipping output stage adds harmonics essential to the genre
TB-303 Engine Architecture (FireStorm)
Step Sequencer (Pulse RP2350)
├─ Note pitch
├─ Gate (normal / extended for slide)
├─ Slide flag → portamento circuit
└─ Accent flag → accent circuit
VCO
├─ Sawtooth (phase accumulator, U32)
└─ Square (derived: sawtooth → single-pole HP waveshaper, not clean square)
VEG (Volume Envelope Generator)
├─ Attack: sharp (near-instant), fixed
├─ Decay: exponential, FIXED long time (~200ms typical)
└─ Drives VCA only
MEG (Main Envelope Generator)
├─ Attack: sharp, fixed
├─ Decay: variable (Decay pot) on normal notes
├─ Decay: FORCED to minimum on accented notes
└─ Drives VCF cutoff via Env Mod amount
Accent Circuit ← triggered only on accented notes
├─ Takes MEG output through RC smoothing network (47kΩ + 33nF)
├─ Adds smoothed voltage to VCA level (louder note)
├─ Adds smoothed voltage to VCF cutoff (brighter note)
└─ CRITICAL: RC does not fully discharge between consecutive accents
→ Repeated accents cause progressively higher filter peaks
→ This stacking behaviour IS the acid sound
Portamento / Slide Circuit
├─ Fixed slide TIME (not rate) — glide always takes same duration
├─ Activated when current note has slide flag set
└─ Stays active until the note after the slid-to note begins
VCF — Diode Ladder (303-specific model)
├─ 4-pole with non-buffered interacting stages (unlike Moog buffered ladder)
├─ Deliberately mismatched capacitor on first pole (C18 ≈ half value)
→ Prevents self-oscillation; overdrive occurs instead
├─ Resonance → overdrive, not self-osc — implement as tanh saturation at
resonance feedback path with clipping asymptote
├─ High-pass filter in resonance feedback loop (unusual 10 Hz peak)
└─ Additional HP filtering between filter output and VCA (bass rolloff)
VCA → Output Saturation Stage
└─ Soft-clip (tanh) at output — essential for acid harmonic character
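The fixed-TIME slide in the architecture above is easy to get wrong; a fixed-point sketch makes the behaviour concrete (Q16.16 pitch, names illustrative). Because the per-sample step is interval ÷ time, a small interval simply glides more slowly — the defining 303 behaviour:

```cpp
#include <cstdint>

// Fixed-TIME portamento: the glide always takes slide_samples samples
// regardless of interval size. Pitch is a Q16.16 note number.
struct Slide {
    int32_t pitch;      // current pitch, Q16.16
    int32_t step;       // per-sample increment
    int32_t remaining;  // samples left in the glide
    void start(int32_t target, int32_t slide_samples) {
        step = (int32_t)(((int64_t)target - pitch) / slide_samples);
        remaining = slide_samples;
    }
    void tick() {
        if (remaining > 0) { pitch += step; --remaining; }
    }
};
```

A fixed-RATE implementation (constant cents/sec) sounds subtly but audibly wrong on acid lines, which is why the distinction is called out in the architecture.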
Fixed-Point Implementation Notes
The accent accumulation is the trickiest part to get right in fixed-point:
// Conceptual fixed-point accent RC model
// Each sample tick:
accent_voltage = (int32_t)(((int64_t)accent_voltage * decay_coeff) >> 30); // S1.30 one-pole RC decay
if (accent_triggered) {
    accent_voltage += (int32_t)(((int64_t)meg_output * accent_amount) >> 30); // add MEG contribution
    // deliberately no reset — consecutive accents stack on the residue
}
vcf_cutoff += accent_voltage >> accent_depth_shift;                        // brighter note
vca_level  += (int32_t)(((int64_t)accent_voltage * vca_accent_scale) >> 16); // louder note
The RC time constant maps to a per-sample decay coefficient just below 1.0 — approximately
1 - (1 / (sample_rate × RC_time)). At 48 kHz with RC ≈ 47kΩ × 33nF = 1.55ms,
decay_coeff ≈ 0.9866 in S1.30 fixed-point. The near-unity multiply needs a 64-bit
intermediate (or an S1.62 accumulator) so repeated rounding does not bleed the accent voltage away.
The mismatched first pole in the diode ladder is modelled by giving the first one-pole stage a slightly different frequency coefficient — approximately 0.5× the capacitance means ~2× the cutoff frequency for that stage. This is what prevents the feedback phase from reaching 360° and suppresses self-oscillation.
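A sketch of the mismatched-pole idea as four cascaded integer one-poles — stage 0 runs at roughly twice the cutoff coefficient of the others (half the capacitance). Coefficients are illustrative, and the resonance feedback, saturation, and HP stages described above are omitted:

```cpp
#include <cstdint>

// Four cascaded one-pole lowpasses; stage 0 deliberately mismatched
// (~2x coefficient, modelling C18 at half value). Coefficient g is a
// Q1.15-style value; the full 303 model adds feedback and tanh clipping.
struct DiodeLadder4 {
    int32_t state[4] = {0, 0, 0, 0};
    int32_t g = 3000;                        // base coefficient (illustrative)
    int32_t tick(int32_t in) {
        int32_t x = in;
        for (int s = 0; s < 4; ++s) {
            int32_t gs = (s == 0) ? 2 * g : g;   // mismatched first pole
            state[s] += (int32_t)(((int64_t)(x - state[s]) * gs) >> 15);
            x = state[s];
        }
        return x;                            // 4-pole lowpass output
    }
};
```

With the first pole shifted up, the feedback path never accumulates a full 360° of phase at unity gain, which is why the real circuit overdrives instead of self-oscillating.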
303 Step Sequencer
The sequencer is as much part of the 303 sound as the synth itself. The workstation app drives the 303 engine via the Pulse sequencer:
- 16 steps (extendable to 32)
- Per-step: pitch (3 octave range), gate length (normal / extended), slide, accent
- Tempo sync to MIDI clock or internal BPM
- Pattern storage in DBFS as compact binary (16 bytes per pattern)
- Live step entry mode (authentic 303-style programming)
- Pattern chain and randomise modes
303 vs General Voice Engine
The 303 engine runs as a dedicated single-voice mode in FireStorm — it does not use the general 32-voice architecture. When activated it takes one voice slot and runs the 303-specific DSP pipeline with the dual envelope and accent circuit. Multiple 303 engines can run simultaneously (one per voice slot) for multi-track acid patterns.
Reference Sounds / Targets
| Sound | Origin | Key requirements |
|---|---|---|
| TB-303 acid bass | Roland TB-303 | Diode ladder + dual ENV + accent accumulation + slide + output overdrive |
| M-86 / Hoover | Alpha Juno patch #86 "What The?" | PWM-sawtooth, fast LFO on PWM, BBD chorus, dropping pitch env |
| Juno pad | Juno-60 / 106 | Sawtooth + sub, 4-pole LP, BBD chorus |
| Prophet brass | Prophet-5 | 2× saw, LP filter with env attack, slight detune |
| DX7 bass/piano | Yamaha DX7 | 6-op FM, algorithm 5 (piano), algorithm 14 (bass) |
| Minimoog lead | Minimoog Model D | 3× oscillator unison, ladder filter self-osc |
| Wavetable sweep | PPG Wave / Waldorf | WT position swept by envelope, resonant LP |
| Karplus string | Physical modeling | Short exciter noise burst, delay line damping |
| Granular cloud | Any granular | Live input, position scatter, long grains |
Fixed-Point Arithmetic Summary
All signal-path arithmetic uses fixed-point. No floating-point units instantiated in FireStorm.
| Block | Format | Notes |
|---|---|---|
| Phase accumulator | U32 | Natural overflow = waveform wrap |
| Audio samples | S24 | 144 dB dynamic range, well beyond DAC |
| Mix bus accumulation | S32 | Headroom for 128+ voices summed without saturation |
| Filter state variables | S32 | Prevents overflow in resonant feedback path |
| Filter tanh | S1.15 lookup | 12-bit address, BRAM, pre-computed |
| Envelope levels | U24 | Exponential segments via multiply-accumulate |
| LFO | S16 | Sub-audio, no precision issue |
| FM operator levels | S24.8 | Extra fractional bits for smooth FM ratios |
| Wavetable samples | S16 | Storage; interpolated to S24 in DSP path |
| Modulation amounts | S16 | Signed, allows bipolar modulation |
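The S1.15 tanh lookup row can be illustrated as a 4096-entry table built once at init (the drive constant and function names are illustrative; on FireStorm the table would live in BRAM):

```cpp
#include <cstdint>
#include <cmath>
#include <array>

// Pre-computed S1.15 tanh lookup, 4096 entries = 12-bit address.
// Input x is S1.15 in [-1, 1); built offline or at init, then the DSP
// path does a single table read per sample instead of computing tanh.
std::array<int16_t, 4096> build_tanh_lut(float drive = 2.0f) {
    std::array<int16_t, 4096> lut{};
    for (int i = 0; i < 4096; ++i) {
        float x = (i - 2048) / 2048.0f;            // index -> [-1, 1)
        lut[i] = (int16_t)std::lround(std::tanh(drive * x) * 32767.0f);
    }
    return lut;
}

int16_t tanh_s115(const std::array<int16_t, 4096>& lut, int16_t x) {
    return lut[((int32_t)x + 32768) >> 4];         // S1.15 -> 12-bit address
}
```

Note the table build uses floating point, which is fine — it happens off the signal path; only the integer lookup runs per sample.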
Integration with AntOS
- Pulse RP2350: receives MIDI, runs audio sequencing, jog dial input, joypad input, retro SAM-style speech synthesis; sends sequencer events and register writes to FireStorm over QSPI (control plane); streams bulk data including rendered speech PCM to FireStorm over the 4-line PIO fast transfer bus; hi-speed UART to SG2000; acts as SPI slave to DeMon. Note: Pulse has no audio codec — audio I/O is handled directly by FireStorm
- DeMon RP2350: system supervisor; JTAG to SG2000 and FireStorm for debug/programming; QSPI to FireStorm for boot and control; SPI slave to SG2000; hi-speed UART to SG2000 for debug streams and boot messaging; SPI master to Pulse
- FireStorm (FPGA): pure sample-rate DSP, register-mapped voice parameters, no OS; contains the FireStorm EE execution engine alongside audio DSP and rasterizer; WM8958/WM8960 audio codec connected directly — handles all audio output routing: 2× internal stereo speakers, phono (line) out, HDMI audio embed, optical out (S/PDIF / TOSLINK); plus stereo audio input for live sampling
- SG2000 big core (C906 @ 1GHz, bare metal): the Music Workstation App — a full native C++ application compiled for the big core. No OS. Direct hardware access. Dear ImGui UI rendered to FireStorm. Controls all voice parameters, patch management, sequencer, and the visual editing pages (A, D, W, H, S, R, E, F, M). This is the primary user-facing application — think of it as the instrument's firmware.
- AntOS (little core, C906 @ 700MHz): system OS — shell, networking, file management, debug server/client, MIDI routing config, gossip P2P. All audio system libraries are exposed with AntOS scripting bindings so scripts can query and control any aspect of the audio engine — voice parameters, patch loading, sequencer state, FFT data feeds. AntOS is not responsible for real-time audio; that belongs entirely to the big core and FireStorm.
- DBFS: patches stored as compact binary structs (~256 bytes per patch), banks as BLOB
- USB MIDI: handled by Pulse, exposed to AntOS as a virtual MIDI port
Exceeding the Waldorf Quantum MK2 — and Everything Else
Note: The Waldorf Quantum MK2 was discontinued in April 2025. No current production synthesizer combines the full feature set described below. The Ant64 is targeting a space that currently has no occupant.
| Feature | Quantum MK2 †disc. | Prophet X | Virus TI2 | Waldorf Kyra | Ant64 Target |
|---|---|---|---|---|---|
| Max voices | 16 | 16 | 80 | 128 | 128–256 |
| Synthesis paradigms | 1 hybrid | 1 hybrid | 1 VA+WT | 1 VA+WT | 3 (VA · Sample · FM) |
| FM engine | Kernel only | No | Yes | No | Full 8-op, free routing |
| Full S&S sample engine | Granular only | Yes (150GB lib) | No | No | Yes + user samples |
| DX7 sysex import | No | No | No | No | Yes |
| Live audio input | Yes | No | No | No | Yes |
| Real-time sampling | Yes (limited) | No | No | No | Yes — 4 modes |
| Resample own output | Yes | No | No | No | Yes |
| Live granular input | Yes (limited) | No | No | No | Yes — streaming |
| 303 acid engine | No | No | No | No | Yes (dedicated) |
| M-86 / Hoover authentic | No | No | No | No | Yes |
| Real analog filter option | Hybrid on-board | Yes (analog LP) | No | No | Optional SSI2144 |
| RGB performance UI | No | No | No | No | 8× RGB jog dials |
| Open / hackable DSP | No | No | No | No | Yes — FPGA bitstream |
| Still in production | No | Yes | Yes | Yes | Yes — Ant64S/Ant64/Ant64C |
| Price | ~€4,800 | ~€3,800 | ~€3,000 | ~€1,800 | Multiple tiers |
Physical UI — 8 Jog Dials (Pulse RP2350)
Eight endless rotary encoders with integrated push buttons are connected directly to Pulse. Each dial optionally has an RGB LED (e.g. WS2812B/SK6812) driven via a single DMA-backed data line from Pulse — all 8 LEDs chained, full strip refresh in ~240µs (192 bits at the 800kbps WS2812 data rate), updated at 60 Hz.
Input Model (per dial)
Each dial provides three distinct physical inputs:
| Input | Action |
|---|---|
| Rotate | Increment / decrement current parameter (relative, no jump-on-pickup) |
| Push | Context action: confirm / reset to default / toggle mode |
| Push + Rotate | Fine adjust (smaller step size) or alternate parameter |
With one dial designated as Shift (hold push, turn others), effective logical control count is 24 without adding hardware.
Parameter Page System
8 dials cannot cover the full synth engine in one view. A paged system is used, with the current page indicated by dial LED colour. Turning any dial while on a page instantly updates that parameter in the FireStorm voice registers via Pulse.
| Page | LED Colour | Dial assignments |
|---|---|---|
| OSC | Amber | Osc1 pitch, Osc1 PW/WT pos, Osc2 pitch, Osc2 detune, Osc mix, Sub level, Noise level, Sync/mode |
| FILTER | Blue | Cutoff, Resonance, Filter type, Env amount, Env polarity, Key tracking, Drive, Filter routing |
| ENV | Green | Attack, Decay, Sustain, Release, ENV2 Attack, ENV2 Decay, ENV2 Sustain, ENV2 Release |
| LFO | Purple | LFO1 rate, LFO1 depth, LFO1 waveform, LFO1 dest, LFO2 rate, LFO2 depth, LFO2 waveform, LFO2 dest |
| FX | Cyan | Chorus rate, Chorus depth, Chorus mode, Reverb size, Reverb mix, Delay time, Delay feedback, Master FX mix |
| MOD | White | Mod slot select, Source, Destination, Amount, ×4 quick-assign slots |
| 303 | Red | Cutoff, Resonance, Env mod, Decay, Accent, Slide time, Waveform (push=toggle), Tempo |
| PATCH | Magenta | Patch select, Bank select, Save, Compare, Voice count, Unison detune, Bend range, Portamento |
Page is selected by a dedicated page button (or double-tap any dial push), or via AntOS UI.
RGB LED Behaviour
Colour = Parameter Value (on active page)
Hue sweeps across the current parameter's range as the dial is turned:
Low value ────────────────────────────── High value
Blue → Cyan → Green → Yellow → Orange → Red
Brightness = how far from default. At default value: dim. At extreme: full brightness. This means a quick glance shows the "shape" of a patch across all 8 parameters.
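One possible value→hue/brightness mapping, sketched below — the exact palette is a design choice and everything here is illustrative. Hue runs blue (low) to red (high); brightness scales with distance from the parameter's default, with a dim floor so the LED never goes fully dark:

```cpp
#include <cstdint>
#include <cstdlib>

struct Rgb { uint8_t r, g, b; };

// Map a 0..127 parameter value to an LED colour: hue 240 (blue) -> 0 (red),
// brightness proportional to distance from the default value.
Rgb value_to_led(uint8_t value, uint8_t def) {
    int hue  = 240 - (value * 240) / 127;       // 240 = blue ... 0 = red
    int dist = std::abs((int)value - (int)def); // far from default = bright
    int v    = 32 + (dist * 223) / 127;         // dim floor at default
    // Minimal HSV->RGB for saturation = 1
    int h6 = hue * 6, seg = h6 / 360, rem = (h6 % 360) * 255 / 360;
    uint8_t p = 0, q = (uint8_t)(255 - rem), t = (uint8_t)rem;
    uint8_t r, g, b;
    switch (seg) {
        case 0:  r = 255; g = t;   b = p;   break;
        case 1:  r = q;   g = 255; b = p;   break;
        case 2:  r = p;   g = 255; b = t;   break;
        case 3:  r = p;   g = q;   b = 255; break;
        default: r = t;   g = p;   b = 255; break;
    }
    return { (uint8_t)(r * v / 255), (uint8_t)(g * v / 255), (uint8_t)(b * v / 255) };
}
```

Integer-only maths keeps this cheap enough to run for all 8 dials inside the 60 Hz LED frame callback on Pulse.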
Breathing = Live Modulation
If a parameter is being modulated by an LFO or envelope, its LED breathes — pulsing in brightness at the modulation rate. Immediately shows what is moving without any display. At audio-rate modulation (FM), LED glows solid at modulation depth colour.
Page Colour Identity
When switching pages all 8 dials briefly flash their new page colour then settle into value-hue mode. You always know which page you are on from the tint of the LEDs.
303 Mode — Sequencer Feedback
In 303 mode the LEDs reflect the live sequencer state:
| Dial | RGB behaviour |
|---|---|
| Cutoff (1) | Flashes briefly on each MEG envelope trigger |
| Resonance (2) | Brightness tracks resonance value continuously |
| Env Mod (3) | Pulses on each note gate |
| Decay (4) | Glow duration tracks current decay time visually |
| Accent (5) | Builds in brightness across consecutive accented steps — mirrors the RC accent accumulation in hardware. Resets when accent chain breaks. |
| Slide (6) | Glows cyan during active portamento slide |
| Waveform (7) | Amber = sawtooth, Blue = square |
| Tempo (8) | Pulses white on every beat (16th note flash, brighter on beat 1) |
The accent dial building in brightness across repeated accents makes the accumulation circuit visible — an immediate diagnostic and a striking performance visual.
Shift / Modifier State
- Dial whose button is held as Shift: glows white
- Dials with available shift-functions: glow their page colour at reduced brightness
- Dials with no shift-function: go dark
Limit Warning
When a parameter reaches its minimum or maximum, the dial flashes white once — a tactile+visual "end stop" replacing any on-screen message.
Voice Activity (optional poly view)
In a dedicated "voice view" mode (hold page button), the 8 dials represent 8 of the 32 voice slots: lit = voice currently sounding, brightness = amplitude, colour = synthesis engine type (VA=amber, WT=blue, FM=yellow, granular=green, 303=red).
WS2812B / SK6812 Implementation on Pulse RP2350
Pulse GPIO (1 pin) ──► WS2812B chain ──► LED0 ──► LED1 ──► ... ──► LED7
- RP2350 PIO state machine generates the WS2812 protocol (standard PIO program)
- DMA transfers 8× 24-bit GRB values (192 bits) per frame — negligible CPU overhead
- Refresh rate: 60 Hz (16.7ms frame time, well within WS2812B timing)
- Total data per frame: 24 bytes
- Power: ~60mA per LED at full white (≈20mA per colour channel) × 8 = 480mA max; run at 30–50% brightness for ~150–240mA total — easily supplied from the Pulse board regulator
Pulse maintains a uint8_t led_grb[8][3] buffer. The synth engine and sequencer write
into this buffer; a 60 Hz timer DMA-blasts it to the LED chain. No blocking, no polling.
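The frame path reduces to a packing step from `led_grb` into the 32-bit words a PIO TX FIFO typically consumes (colour in the top 24 bits, GRB order, as in the standard RP2 WS2812 PIO example). The PIO program and DMA channel setup are assumed and omitted here:

```cpp
#include <cstdint>

// Pack the 8x3 GRB byte buffer into left-justified 32-bit words for DMA
// into the WS2812 PIO state machine's TX FIFO. Low byte is unused.
void pack_frame(const uint8_t led_grb[8][3], uint32_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        out[i] = ((uint32_t)led_grb[i][0] << 24) |   // G
                 ((uint32_t)led_grb[i][1] << 16) |   // R
                 ((uint32_t)led_grb[i][2] << 8);     // B
    }
}
```

The 60 Hz timer calls this, then triggers one DMA transfer of 8 words — no per-bit CPU work at all.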
MIDI Connectivity
The Ant64 has the full professional MIDI connectivity stack — both traditional and modern. All MIDI handling runs on Pulse RP2350.
DIN MIDI (5-pin) — Ant64C only
- MIDI In — receive notes, CC, program change, sysex, clock from any hardware device
- MIDI Out — transmit from internal sequencer to external hardware synths, drum machines, rack modules, vintage gear — anything made since 1983
- MIDI Thru — hardwired pass-through of MIDI In signal, no latency, no CPU involvement
DIN MIDI is present only on the Ant64C (Creative Edition). The Ant64 and Ant64S have USB MIDI only. Having all three DIN ports on the Ant64C makes it a first-class citizen in a traditional hardware MIDI studio — no adaptors needed for vintage gear.
MIDI Thru in particular is increasingly rare on modern synths. The Ant64C restores the full traditional MIDI port set.
USB MIDI
- Class-compliant USB MIDI device — no driver needed on any OS
- Simultaneously acts as MIDI host — can connect USB MIDI controllers directly (keyboards, pad controllers, wind controllers) without a computer in the chain
- MIDI over USB alongside DIN simultaneously — computer DAW + hardware rack at the same time
- AntOS exposes USB MIDI as a virtual port accessible from scripts and the workstation app
MIDI Feature Set (Pulse firmware)
- Full 16-channel receive and transmit
- MIDI clock master and slave — sync internal sequencer to external gear or DAW
- MIDI Machine Control (MMC) — transport control from DAW
- Sysex passthrough and sysex-based patch dump/restore
- MPE (MIDI Polyphonic Expression) receive — per-note pitch bend, pressure, slide from MPE controllers (Roli Seaboard, Linnstrument, Expressive E Osmose)
- MIDI Learn on all synth parameters — any knob/slider on any controller can map to any Ant64 parameter in real time
Pulse Sequencer
The Pulse RP2350 runs a polyphonic multi-track sequencer alongside the MIDI and voice allocation engine. This is not a simple arpeggiator — it is a full composition tool.
Architecture
Pulse Sequencer
├─ 16 tracks (any combination of types below)
├─ Up to 64 steps per pattern, variable step length
├─ Pattern chain → Song mode (patterns → arrangement)
├─ MIDI clock sync (master or slave)
└─ All output routed to:
├─ Internal FireStorm voice engine (any paradigm)
├─ DIN MIDI Out (control external hardware)
└─ USB MIDI Out (control DAW / computer)
Track Types
| Track | Description |
|---|---|
| Melodic | Polyphonic pitch sequence, velocity, gate length per step |
| Drum/Rhythm | 16-slot per-step pattern, each slot → different MIDI note / voice |
| 303 Acid | Dedicated 303 sequencer (pitch, gate, slide, accent per step) |
| Chord | Step-based chord sequence with voicing control |
| CC Automation | Records and plays back MIDI CC curves — automate any parameter |
| Arpeggiator | MIDI note input → arpeggiated output, multiple modes |
| Euclidean | Mathematical rhythm generation (hits, steps, rotation, offset) |
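The Euclidean track's hits/steps/rotation parameters map onto the classic even-distribution algorithm. A Bresenham-style sketch (function name illustrative):

```cpp
#include <vector>

// Euclidean rhythm: distribute `hits` as evenly as possible across
// `steps` using a Bresenham-style accumulator, then rotate the pattern.
std::vector<bool> euclid(int hits, int steps, int rotation = 0) {
    std::vector<bool> pat(steps, false);
    int acc = 0;
    for (int i = 0; i < steps; ++i) {
        acc += hits;
        if (acc >= steps) {              // error accumulator overflows -> hit
            acc -= steps;
            pat[(i + rotation) % steps] = true;
        }
    }
    return pat;
}
```

For example, `euclid(3, 8)` yields the familiar three-against-eight pattern; `rotation` shifts where the cycle starts without changing the spacing.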
Sequencer Features
- Real-time recording — play notes live, capture to sequencer
- Step entry — program steps one at a time, 303-style
- Probability per step — 0–100% chance of triggering (generative / evolving patterns)
- Parameter locks — per-step value overrides for any synth parameter (Elektron-style)
- Swing / shuffle — adjustable timing offset on even steps
- Polyrhythm — each track can have independent step count and time division
- Pattern chain — order patterns into a song arrangement
- Live pattern switching — seamless pattern change on bar boundary
- MIDI thru routing — incoming MIDI notes merged with sequencer output on MIDI Out
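The per-step features above (probability, parameter locks, gate length) can be sketched as a simple step record. A minimal sketch; the names are illustrative, not the actual Pulse firmware structures:

```python
import random

class Step:
    """Hypothetical per-step record: note data plus per-step probability
    and Elektron-style parameter locks."""
    def __init__(self, note=None, velocity=100, gate=0.5,
                 probability=100, plocks=None):
        self.note = note                  # MIDI note number, or None for a rest
        self.velocity = velocity          # 1-127
        self.gate = gate                  # gate length as a fraction of the step
        self.probability = probability    # 0-100% chance of triggering
        self.plocks = plocks or {}        # per-step overrides, e.g. {"cutoff": 800}

def fires(step, rng=random.random):
    """Decide whether a step triggers on this pass (probability per step)."""
    return step.note is not None and rng() * 100 < step.probability

always = Step(note=60, probability=100)
never  = Step(note=60, probability=0)
```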
303 Track (dedicated)
As described in the 303 acid engine section — full per-step pitch, gate type (normal/extended), slide flag, accent flag. Authentic Roland-style programming workflow. Multiple 303 tracks can run simultaneously (full acid setup: bass line + lead line + chord stab from one box).
Amiga MOD Player (Pulse) + MOD Editor (SG2000 Big Core)
The MOD system follows the same editor/player split as the live coder: the player runs bare-metal on Pulse; the editor is a page in the workstation app on the SG2000 big core. The little core's role is limited to the IPC API that routes edited pattern data from the big core to Pulse.
Pulse runs the player engine — pattern sequencer, effect processor, sample mixer, Paula emulation. It reads module files from DBFS or SD card and handles all real-time playback duties bare-metal at 300MHz.
SG2000 big core (Page K in the workstation app) provides the visual editor — pattern grid, track view, instrument editor, sample browser, song arranger. The musician edits here; changes are serialised and sent to Pulse via the little core IPC.
The original ProTracker MOD player ran on a 7.09MHz Motorola 68000. Pulse runs at 300MHz — roughly 40× the headroom — making a faithful, extended MOD player trivially accommodated alongside the live code player, sequencer, MIDI, jog dials, and SAM speech synthesis.
Supported Formats
| Format | Origin | Channels | Notes |
|---|---|---|---|
| .MOD | Amiga ProTracker / SoundTracker | 4–8 | Original Amiga format — 31 samples, 64-row patterns |
| .XM | FastTracker 2 (DOS) | Up to 32 | Extended patterns, envelopes, vibrato table |
| .S3M | ScreamTracker 3 (DOS) | Up to 32 | Stereo panning, OPL2 channel support |
| .IT | Impulse Tracker (DOS) | Up to 64 | Most expressive tracker format — filters, NNA |
The MOD player reads module files from DBFS or the SD card. Samples are streamed from storage for large modules rather than pre-loaded entirely into RAM.
Architecture on Pulse
DBFS / SD card
│
│ Module file (samples + pattern data + song order)
▼
Pattern sequencer
├─ Song position counter (order list → pattern index)
├─ Pattern step counter (rows 0–63 / 0–255)
├─ Per-channel note/instrument/effect parsing
└─ Tick clock (BPM × ticks-per-row, default 6 ticks/row at 125 BPM)
│
▼
Effect processor (per channel, per tick)
├─ 0xy Arpeggio ├─ 3xx Tone portamento
├─ 1xx Porta up ├─ 4xy Vibrato
├─ 2xx Porta down ├─ Axy Volume slide
├─ 5xy Porta + vol ├─ Bxx Pattern jump
├─ 6xy Vibrato + vol ├─ Cxx Set volume
├─ 7xy Tremolo ├─ Dxx Pattern break
├─ 9xx Sample offset ├─ Exx Extended effects
└─ Fxx Set speed/BPM └─ Gxx Set global volume
│
▼
Sample mixer (fixed-point, per channel)
├─ Linear interpolation between samples
├─ Amiga-accurate Paula hardware emulation (optional)
│ └─ Low-pass filter characteristic (RC filter on Paula output)
├─ Per-channel volume + panning
└─ Stereo mix bus → FireStorm audio output
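The tick clock arithmetic is standard ProTracker timing: one tick lasts 2.5/BPM seconds, which tells the mixer exactly how many output frames to render per tick. A minimal sketch:

```python
def tick_seconds(bpm):
    """ProTracker tick duration: 2.5 / BPM seconds per tick."""
    return 2.5 / bpm

def samples_per_tick(bpm, sample_rate=48000):
    """Audio frames the sample mixer must produce per sequencer tick."""
    return round(sample_rate * tick_seconds(bpm))

# Default MOD timing (6 ticks/row at 125 BPM):
#   tick  = 20 ms, row = 6 ticks = 120 ms
```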
Amiga Paula Emulation
The distinctive Amiga sound is partly the DAC and partly the Paula chip's hardware low-pass filter — a simple RC filter that rounded off the harsh edges of 8-bit samples. The MOD player offers:
- Accurate mode: Paula RC filter emulated per-channel — the authentic warm Amiga sound. Samples retain the characteristic 8-bit warmth.
- Clean mode: no filter — full-fidelity playback. 16-bit or 24-bit samples sound as sharp as the original recording allows.
- FireStorm mode: MOD channels routed as voice inputs to the VA engine — each channel feeds a ladder filter and BBD chorus. Amiga samples processed through fully analog-modelled circuitry. The intersection of 1985 and 2026.
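Accurate mode's Paula RC character amounts to a one-pole low-pass per channel. A fixed-point sketch in Q16, consistent with the integer-only DSP rule; the cutoff value is illustrative, since the real Amiga component values vary by board revision:

```python
import math

def rc_coeff_q16(cutoff_hz, sample_rate=48000):
    """One-pole low-pass coefficient a = dt/(RC+dt), scaled to Q16.
    Cutoff here is an assumption, not a measured Paula value."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    return int(round(65536 * dt / (rc + dt)))

def rc_lowpass_q16(samples, coeff, state=0):
    """y[n] = y[n-1] + a*(x[n] - y[n-1]) in all-integer Q16 arithmetic."""
    out = []
    for x in samples:
        state += (coeff * ((x << 16) - state)) >> 16
        out.append(state >> 16)
    return out
```

The same structure works per MOD channel before the pan/mix stage, with one filter state per channel.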
Integration with the Sequencer and FireStorm
The MOD player is not isolated from the rest of the audio system:
- MOD channels can be routed to any FireStorm voice slot — the sample data feeds the sample engine directly, with all VA/filter/chorus processing available on top
- The MOD player's BPM clock can sync to the Pulse sequencer clock or MIDI clock — MOD patterns and tracker sequences run in tight synchronisation
- MOD note events can trigger the live coding interpreter (see below) — a MOD file can drive a live coding performance as its clock source
SMPS Player (Pulse) + SMPS Editor (SG2000 Big Core)
SMPS (Sega Music Player System) is the music driver format used across the Sega Mega Drive / Genesis library — Sonic the Hedgehog, Streets of Rage, Gunstar Heroes, Phantasy Star IV, and hundreds of others. It describes music in terms of FM operator parameters, note sequences, envelopes, LFO settings, tempos, and PSG (square wave) voices.
The Ant64 plays SMPS data natively through FireStorm's own synthesis engines — no chip emulation. The FireStorm FM engine is already a superset of what the YM2612 provides. SMPS is understood as a music format and its data mapped directly to FireStorm voice parameters. The result sounds like the original — or better, because FireStorm has no DAC ladder noise, no 8-bit depth limitation, and the full VA and effects chain available if wanted.
What SMPS Describes
SMPS files contain:
Header
├─ Tempo (tick rate)
├─ Channel count (FM channels + PSG channels)
├─ Pointers to per-channel data blocks
│
Per FM channel data block:
├─ Instrument (FM voice) data:
│ ├─ Algorithm (0–7) — operator routing
│ ├─ Feedback level
│ └─ Per-operator (4 operators):
│ ├─ Multiple (frequency ratio)
│ ├─ Detune
│ ├─ Total level (volume/carrier level)
│ ├─ Key scaling
│ ├─ Attack rate
│ ├─ First decay rate
│ ├─ Second decay rate (sustain rate)
│ ├─ Sustain level
│ ├─ Release rate
│ ├─ AM enable
│ └─ SSG-EG (envelope generator shape)
├─ Note sequence (pitch + duration codes)
├─ Volume / pan settings
├─ LFO settings (rate, PM depth, AM depth)
├─ Modulation (vibrato, tremolo patterns)
└─ Loop / repeat / jump commands
│
Per PSG channel data block:
├─ Note sequence (pitch + duration codes)
├─ Volume envelope pointer
└─ Noise mode (channel 4 only: white / periodic)
│
DAC channel (channel 6, if used):
└─ PCM sample index + playback commands
This is complete, self-contained music data. The synthesis parameters are all present in the file — there is nothing chip-specific about them. Operator ratios, envelope shapes, and note sequences are universal FM synthesis concepts: each YM2612 algorithm describes a carrier/modulator topology that any 4-operator FM engine can reproduce.
Mapping SMPS to FireStorm
FM Channels (6 per SMPS file → FireStorm FM engine)
The YM2612 is a 4-operator, 8-algorithm FM engine with sine-only waveforms. FireStorm is 6-operator with free routing and 16 waveforms per operator — a proper superset. SMPS FM voices map directly:
SMPS FM instrument → FireStorm FM voice
──────────────────────────────────────────────────
Algorithm 0–7 (4-op routing) → Operators 1–4, routing per algorithm
Operators 5–6 unused (or add harmonics)
Feedback level → Operator 1 self-modulation depth
Operator multiple → Operator frequency ratio
Operator detune → Fine detune (same concept)
Total level → Operator output level
Key scaling → Key scale rate
Attack / decay / sustain → EG stage rates (same parameters)
Release rate → EG release
AM enable → LFO AM sensitivity
SSG-EG → Extended EG shape (FireStorm EG superset)
The 8 YM2612 algorithms map to FireStorm operator routing configurations:
Algo 0: [1→2→3→4] Serial stack (deepest FM modulation)
Algo 1: [(1+2)→3→4] Two modulators summed into 3
Algo 2: [(1+(2→3))→4] Mixed
Algo 3: [((1→2)+3)→4] Mixed
Algo 4: [(1→2)+(3→4)] Two pairs — FM bass + FM lead simultaneously
Algo 5: [(1→(2+3+4))] One modulator into three carriers
Algo 6: [(1→2)+(3)+(4)] One FM pair + two pure carriers
Algo 7: [1+2+3+4] All carriers — brightest, most additive
All 8 map exactly to FireStorm routing configurations. The additional operators (5, 6) can be left silent for faithful reproduction, or used to add harmonic richness beyond what the original could produce.
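The algorithm list above can be encoded as routing data for a free-routing engine. A sketch using a hypothetical edge-list encoding (operator numbers 1–4; edges are modulator→target connections):

```python
# Hypothetical encoding of the 8 YM2612 algorithms as (modulation edges,
# carrier set) — the routing data a free-routing 6-op engine would need
# to reproduce each 4-op topology. Operators 5-6 stay silent.
YM2612_ALGOS = {
    0: {"edges": [(1, 2), (2, 3), (3, 4)], "carriers": [4]},          # serial stack
    1: {"edges": [(1, 3), (2, 3), (3, 4)], "carriers": [4]},
    2: {"edges": [(1, 4), (2, 3), (3, 4)], "carriers": [4]},
    3: {"edges": [(1, 2), (2, 4), (3, 4)], "carriers": [4]},
    4: {"edges": [(1, 2), (3, 4)],         "carriers": [2, 4]},       # two FM pairs
    5: {"edges": [(1, 2), (1, 3), (1, 4)], "carriers": [2, 3, 4]},
    6: {"edges": [(1, 2)],                 "carriers": [2, 3, 4]},
    7: {"edges": [],                       "carriers": [1, 2, 3, 4]}, # additive
}

def is_modulator(algo, op):
    """True if the operator modulates another operator in this algorithm."""
    return any(src == op for src, _ in YM2612_ALGOS[algo]["edges"])
```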
PSG Channels (3 square wave + 1 noise → FireStorm VA engine)
The SN76489 PSG provides three square wave oscillators and a noise channel. These map directly to FireStorm VA voices with the square/pulse waveform selected:
SMPS PSG channel → FireStorm VA voice
──────────────────────────────────────────────────────
Note pitch → Phase accumulator frequency
Volume envelope → VCA amplitude envelope
Channel 4 white noise → VA noise oscillator
Channel 4 periodic noise → VA oscillator at low frequency (buzzy pitch)
No filter or chorus added for default playback — clean square waves, faithfully reproducing the PSG character. Optional: VA engine post-processing (slight LP filter to soften edges, or BitCrusher for authentic 4-bit volume steps).
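The pitch side of this mapping is a direct frequency conversion: the SN76489 derives tone frequency from a 10-bit period register, and the VA voice needs the equivalent phase-accumulator increment. A sketch, assuming an NTSC master clock and a 32-bit accumulator (the accumulator width is an assumption, not the FireStorm spec):

```python
PSG_CLOCK = 3579545  # NTSC master clock feeding the SN76489, in Hz

def psg_freq_hz(period_reg):
    """SN76489 tone frequency: clock / (32 * N) for a 10-bit period value N."""
    return PSG_CLOCK / (32 * period_reg)

def phase_increment(freq_hz, sample_rate=48000, acc_bits=32):
    """Per-sample phase-accumulator increment for a VA oscillator
    at the same pitch."""
    return int(round(freq_hz / sample_rate * (1 << acc_bits)))
```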
DAC Channel (Channel 6 PCM → FireStorm Sample Engine)
SMPS Channel 6 switches between FM synthesis and 8-bit PCM playback — famously used for the electric bass slap in Sonic 1 and drum hits in many titles. PCM samples are stored in the SMPS data or referenced externally. These load directly into the FireStorm sample engine:
SMPS DAC sample data (8-bit, ~8kHz) → FireStorm sample voice
├─ Upsampled to 48kHz (linear interpolation)
├─ Bit depth extended to 16-bit (dithered)
└─ Played at original pitch via sample engine pitch tracking
Optional: the sample engine can apply the ladder filter and BBD chorus to DAC samples — the classic Mega Drive bass through a Juno chorus is an immediately recognisable and striking combination.
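The upsample and bit-extend steps above can be sketched as integer linear interpolation (dithering omitted for clarity):

```python
def upsample_linear(samples_8bit, ratio=6):
    """Upsample by an integer ratio with linear interpolation, extending
    signed 8-bit samples to 16-bit (left shift by 8). ratio=6 corresponds
    to ~8 kHz -> 48 kHz."""
    out = []
    for i in range(len(samples_8bit) - 1):
        a, b = samples_8bit[i] << 8, samples_8bit[i + 1] << 8
        for k in range(ratio):
            # k/ratio of the way from a to b, in integer arithmetic
            out.append(a + (b - a) * k // ratio)
    out.append(samples_8bit[-1] << 8)
    return out
```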
SMPS Format Variants
SMPS was extended and modified across Sega's library. The Pulse player handles the documented variants:
| Variant | Example games | Notes |
|---|---|---|
| SMPS Z80 | Sonic 1, Sonic 2, early MD | Original — data structure well documented |
| SMPS/68k | Sonic 3, S&K | 68k-resident driver, same data model |
| SMPS2 | Streets of Rage series | Extended PSG envelopes, extra features |
| Clone Driver v2 | Homebrew / fan games | Community standard, fully documented |
The community disassembly projects (S1/S2/S3 disassemblies, SMPS2ASM) produce clean, well-labelled SMPS data — these are ideal input for both the player and the editor.
Pulse SMPS Player Architecture
SMPS file (from DBFS or SD card)
│
▼
Pulse SMPS parser (bare-metal, RP2350 @ 300MHz)
├─ Read header → channel count, tempo, channel pointers
├─ Load FM instrument tables into working memory
├─ Load note/duration/command sequences per channel
└─ Initialise tick clock at specified tempo
│
▼
Pulse tick engine (same infrastructure as MOD and live code players)
Per tick, per channel:
├─ Advance note pointer
├─ Decode note / rest / loop / jump commands
├─ Apply modulation (vibrato depth, tremolo)
├─ Apply volume / pan updates
└─ Write voice parameters to FireStorm via QSPI
│
▼
FireStorm FM engine (channels 1–6)
FireStorm VA engine (PSG channels 1–3 + noise)
FireStorm Sample engine (DAC channel 6)
│
▼
Mix bus → WM8958 → audio output
The tick clock is shared with the MOD player and live coder — SMPS, MOD, and live code can run simultaneously, all locked to the same tempo reference.
SMPS Editor (Page V — Workstation App, SG2000 Big Core)
A dedicated workstation page for viewing, editing, and authoring SMPS data. The editor shows the music as it is — note sequences, FM instrument parameters, PSG envelopes — in a form the musician can understand and modify.
┌─────────────────────────────────────────────────────────────┐
│ PAGE V — SMPS EDITOR │
│ │
│ File: green_hill.smps Variant: SMPS Z80 [PLAY] [STOP] │
│ │
│ FM CH1 ████░░░░░░ Instrument 3 C4 tick 12/48 │
│ FM CH2 ██████░░░░ Instrument 7 G3 tick 12/48 │
│ FM CH3 ░░░░░░░░░░ (rest) │
│ FM CH4 ████░░░░░░ Instrument 1 E2 tick 06/48 │
│ FM CH5 ██░░░░░░░░ Instrument 5 A3 tick 12/48 │
│ FM CH6 ██████████ DAC sample 2 (bass slap) │
│ PSG 1 ████░░░░░░ C5 vol env 4 │
│ PSG 2 ░░░░░░░░░░ (rest) │
│ PSG 3 ██░░░░░░░░ G5 vol env 2 │
│ NOISE ░░████░░░░ periodic vol 10 │
│ │
│ [INSTRUMENTS] [PATTERNS] [SONG ORDER] [EXPORT] │
└─────────────────────────────────────────────────────────────┘
Instrument editor: all 4-operator FM parameters for each SMPS instrument, with an algorithm diagram (same as Page F but constrained to 4-operator / 8-algorithm topology). Changes sent live to Pulse player while music is playing.
Pattern editor: note grid per channel — pitch, duration, volume, modulation commands. Familiar to anyone who has used a tracker.
Song editor: the SMPS channel data is structured as patterns with loop/jump commands — the song editor visualises this as an arrangement of segments.
Export: write modified data back as a valid SMPS binary — compatible with Mega Drive homebrew tools and emulators. Useful for fan composers and ROM hackers.
Convert to native: export the SMPS data as a native Ant64 FM patch set and piano roll sequence in Page R — converting Mega Drive music into a fully editable Ant64 arrangement with all native synthesis features available.
Live Code Player (Pulse) + Live Coder Editor (SG2000 Big Core)
The Ant64 implements a live music coding interpreter in the tradition of Sonic Pi, TidalCycles, ORCA, and FoxDot — systems where music is written as code, evaluated while playing, and changes take effect at the next musical boundary without stopping or interrupting playback.
The key property that distinguishes live coding from conventional sequencing is temporal hot-swap: the musician edits and evaluates code while the music plays. The interpreter accepts the new definition, compiles it to an event stream, and slots it in at the next quantisation boundary — the next bar, the next 4 bars, or the next phrase, depending on configuration. No gap. No click. No restart.
Architecture
The system is split across three processors by role, with clear ownership at each level:
SG2000 big core SG2000 little core Pulse RP2350
───────────────── ────────────────── ─────────────────────
Workstation App (bare metal) AntOS (OS duties) Hard real-time engines
──────────────────────────── ───────────────── ─────────────────────
Page L — Live Coder editor IPC / transfer API Live code player
├─ Code editor (syntax hl) ├─ Receives compiled ├─ Expression evaluator
├─ Evaluation feedback │ data from big core ├─ Pattern → event stream
├─ Error display inline ├─ Routes to Pulse ├─ Tick clock
├─ Code history / versions └─ OS / networking ├─ Event queue
└─ Output console ├─ Quantised hot-swap
├─ MIDI output
MOD Editor ├─ FireStorm regs
├─ Pattern / track view │
├─ Sample browser MOD player
├─ Instrument editor ├─ Pattern sequencer
└─ Song arranger ├─ Effect processor
├─ Sample mixer
└─ Paula emulation
Both editors live on the SG2000 big core, in the bare-metal workstation app alongside all other pages. Page L is the live coding editor. The MOD editor is a separate page (Page K — see below). The musician writes and edits everything here.
The little core handles OS duties only — AntOS networking, file management, debug server, gossip. Its role in the music pipeline is narrow and passive: it provides the API and IPC channel through which the big core's editors send compiled pattern data down to Pulse. It does not parse, evaluate, or schedule anything musical.
Pulse owns both player engines — the MOD player and the live code player run bare-metal on Pulse at 300MHz. Pulse receives compiled pattern data from the big core (via little core IPC), holds it, and performs the atomic hot-swap at the next quantisation boundary. Pulse is the only processor that touches the tick clock and event queue. Both engines share Pulse's scheduler infrastructure — the same tick clock, the same event queue, the same FireStorm register write path.
The Live Coding Language
A small, purpose-designed expression language — not a general scripting language. Concise enough to type live, expressive enough to describe complex rhythmic and harmonic structures. Inspired by the pattern notation of TidalCycles and the readability of Sonic Pi.
-- Basic pattern: note sequence at specified intervals
play [c4 e4 g4 c5] every 1 bar
-- Euclidean rhythm with velocity variation
drum kick every euclid(3, 8) vel [100 80 90]
-- Conditional: alternate every 2 bars
play [c4 e4] |> alt [g3 b3] every 2 bars
-- Parameter modulation: filter cutoff swept over 4 bars
sweep cutoff 200 4000 over 4 bars
-- Polyrhythm: two patterns at different cycle lengths
play [c4 d4 e4 f4] every 3 steps
play [g3 a3] every 2 steps
-- Sample trigger from MOD player sample bank
trig sample "snare_amiga" every euclid(5, 16)
-- Reference a FireStorm voice by name and set parameters
voice "bass" | filter 800 | res 0.7 | play [c2 c2 g1 c2]
-- Route output to MIDI channel
play [c4 e4 g4] on midi 1
Patterns evaluate to event streams. The interpreter resolves them relative to the current BPM and sends timestamped note-on/note-off and parameter events to Pulse's scheduler. Every evaluation is quantised — every 1 bar means the change takes effect at the next bar boundary, never mid-bar.
Hot-Swap Quantisation
Musician edits code → hits Evaluate (Page L, big core)
│
▼
Big core serialises the expression to a compact binary form
and passes it to little core via hardware mailbox IPC
│
▼
Little core (AntOS IPC API) routes the data to Pulse
over hi-speed UART — its only role in this pipeline
│
▼
Pulse receives the compiled expression
Pulse evaluator parses and generates the event stream
(Pulse at 300MHz — typically < 2ms, well ahead of any bar boundary)
│
▼
Pulse holds the new stream in a pending slot
continues playing the current active stream uninterrupted
│
▼
Next quantisation boundary (bar / phrase / configured interval)
│
▼
Pulse atomic swap: pending stream becomes active
old stream discarded
→ No gap, no click, no restart — music continues uninterrupted
The quantisation interval is configurable: 1 beat, 1 bar, 2 bars, 4 bars, or manual (swap only when the musician explicitly triggers it). A tighter interval means faster response to edits; a wider interval means more musical coherence between changes.
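The pending/active swap can be sketched as a double-buffered stream per track. Names here are hypothetical, not the Pulse firmware API:

```python
class HotSwapTrack:
    """Minimal sketch of the quantised hot-swap: a newly compiled stream
    waits in `pending` and becomes `active` only on a quantisation
    boundary, so playback is never interrupted mid-bar."""
    def __init__(self, quantum_ticks):
        self.quantum = quantum_ticks   # e.g. ticks per bar
        self.active = None             # currently playing event stream
        self.pending = None            # compiled stream awaiting the boundary

    def submit(self, stream):
        self.pending = stream          # a newer evaluation overwrites it

    def on_tick(self, tick):
        # Atomic swap exactly on the boundary; the old stream is discarded.
        if self.pending is not None and tick % self.quantum == 0:
            self.active, self.pending = self.pending, None
        return self.active

track = HotSwapTrack(quantum_ticks=96)   # e.g. one bar at 24 PPQN in 4/4
track.submit("pattern_v2")
```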
Integration with Other Systems
The live coder is not isolated — it drives and interacts with the full audio system:
| Integration | Description |
|---|---|
| FireStorm voices | Any voice by name — full VA/FM/sample engine access |
| MOD player | Trigger MOD samples by name, sync to MOD BPM as clock source |
| 303 engine | Generate acid patterns procedurally — acid [c2 eb2 f2] slide [3,7] |
| Jog dials | Any dial assignable as a live variable — tempo = dial_1 * 200 |
| MIDI out | Drive external hardware — DIN MIDI, USB MIDI |
| FFT / spectrogram | Read live spectral data as input to pattern conditions |
| Euclidean rhythms | First-class euclid(hits, steps, rotation) operator |
| Probability | maybe(0.7) — 70% chance of triggering |
| Randomness | rand, choose [...], shuffle [...] — seeded or free |
Page L — Live Coder Editor (Workstation App, SG2000 Big Core)
A dedicated page in the workstation app. The musician writes and edits code here while the music plays. The editor renders to the HDMI output via the FireStorm rasteriser alongside any other active page.
┌────────────────────────────────────────────────────────┐
│ PAGE L — LIVE CODER │
│ │
│ > play [c4 e4 g4 c5] every 1 bar │ ← active (green)
│ > drum kick every euclid(3,8) │ ← active (green)
│ > sweep cutoff 200 4000 over 4 bars_ │ ← cursor
│ │
│ [EVAL] at next bar · BPM: 124 · Bar: 003.2 │
│ │
│ ✓ pattern: c4 e4 g4 c5 — swapped at bar 003 │
│ ✓ drum: euclid(3,8) — active │
│ ! sweep: parse error — missing 'over' │ ← error (amber)
└────────────────────────────────────────────────────────┘
Features:
- Syntax highlighting — notes, keywords, operators, voice names in distinct colours
- Inline error display — errors shown on the relevant line, not a separate panel
- Evaluation status — each active expression shown with its swap status
- Bar/beat position counter — shows exactly where in the musical timeline the cursor is
- Code history — previous evaluations stored, navigable with jog dial
- Split view — two code buffers side by side, each evaluating independently
- MUTE per expression — silence an active pattern without deleting it
- SOLO — mute all other active patterns, play only the selected one
Music Generation
The Ant64 supports music generation at three independent tiers, each with different latency, capability, and network dependency characteristics. All three produce the same output: note and pattern data in the same format as hand-edited content. Generated data flows into the editors — Page L, Page K, or Page R — where the musician can inspect, edit, reject, or accept it before it plays. Generation is a source, not an override.
Generation tier
───────────────
Tier 1: Algorithmic (big core, instant, offline) ──┐
Tier 2: TPU inference (SG2000, local, fast) ──┼──→ Note / pattern data
Tier 3: AI API (little core async, network) ──┘ (same format as edited)
│
▼
Page L — live coder
Page K — MOD editor
Page R — piano roll
│
▼
Pulse player
Tier 1 — Algorithmic Generation (SG2000 Big Core)
Deterministic mathematical algorithms running bare-metal on the big core. Instant output, no network, no model loading. These are the tools of formal and stochastic composition — the same methods used by Xenakis, Messiaen, Steve Reich, and the demoscene tracker community.
Euclidean Rhythms (Bjorklund's Algorithm)
Distribute N hits across M steps as evenly as possible. Toussaint showed that the resulting patterns reproduce many of the core rhythmic patterns of world music traditions — they emerge from small integer ratios, not from cultural convention.
euclid(2, 8)  = [x . . . x . . .]  ← Two evenly spaced hits (half-bar pulse)
euclid(3, 8) = [x . . x . . x .] ← Afro-Cuban clave, tresillo
euclid(4, 8) = [x . x . x . x .] ← Standard four-on-the-floor (even)
euclid(5, 8)  = [x . x x . x x .]  ← Cinquillo (Cuban)
euclid(3, 16) = [x . . . . x . . . . . x . . . .] ← Sparse kick pattern
euclid(5, 12) = [x . . x . x . . x . x .]  ← Venda clapping pattern (South Africa)
euclid(7, 12) = [x . x x . x . x x . x .]  ← Bembé bell pattern (12/8 West African)
euclid(9, 16) = [x . x x . x . x . x x . x . x .] ← Complex polyrhythm
The rotation parameter shifts the pattern by N steps — same hits, different downbeat:
euclid(3, 8, rotation: 2) starts two positions into the pattern, changing the
phrasing without changing the density.
First-class syntax in the live coder: drum kick every euclid(3, 8)
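A sketch of Bjorklund's algorithm producing the patterns listed above (1 = hit, 0 = rest):

```python
def euclid(hits, steps, rotation=0):
    """Bjorklund's algorithm: distribute `hits` as evenly as possible over
    `steps`. `rotation` shifts the pattern left — same density, different
    downbeat."""
    if steps <= 0 or hits <= 0:
        return [0] * max(steps, 0)
    if hits >= steps:
        return [1] * steps
    a = [[1] for _ in range(hits)]            # groups beginning with a hit
    b = [[0] for _ in range(steps - hits)]    # groups beginning with a rest
    while len(b) > 1:
        # Pair off groups (the Euclidean-GCD descent), carry the remainder
        m = min(len(a), len(b))
        a, b = [a[i] + b[i] for i in range(m)], (a[m:] or b[m:])
    pattern = [v for group in a + b for v in group]
    r = rotation % steps
    return pattern[r:] + pattern[:r]
```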
Markov Chains
Build a transition probability matrix from an existing sequence — a melody, a chord progression, a rhythm — and generate new sequences with the same statistical character. The generated output has the same "feel" as the input without repeating it literally.
Training sequence (input): C4 E4 G4 E4 C4 D4 F4 A4 F4 D4 ...
│
Build transition matrix:
From C4 → E4 (0.6), D4 (0.3), G4 (0.1)
From E4 → G4 (0.5), C4 (0.3), D4 (0.2)
From G4 → E4 (0.7), A4 (0.2), C5 (0.1)
│
Generate new sequence:
C4 → E4 → G4 → A4 → F4 → D4 → F4 → E4 ...
(statistically similar, not identical)
Order of the Markov chain determines how much context is used:
- Order 1: each note depends only on the previous note — loose, improvisational feel
- Order 2: each note depends on the previous two — more phrase coherence
- Order 3+: longer-range dependencies — closer to the training material's style
Training material can be: a melody played live into the sequencer, a loaded MIDI file, a MOD module's pattern data, or a manually entered note sequence in Page R.
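A minimal order-N Markov sketch; duplicate entries in the transition lists stand in for the probability weights:

```python
import random
from collections import defaultdict

def build_chain(sequence, order=1):
    """Transition table: tuple of `order` previous notes -> possible next
    notes. Repeats in the value list encode transition probability."""
    chain = defaultdict(list)
    for i in range(len(sequence) - order):
        chain[tuple(sequence[i:i + order])].append(sequence[i + order])
    return chain

def generate(chain, seed, steps, rng=random.Random(1)):
    """Walk the chain from `seed` (length = order); fall back to a random
    known state if the walk reaches a context with no continuation."""
    out = list(seed)
    for _ in range(steps):
        state = tuple(out[-len(seed):])
        nxt = chain.get(state) or chain[rng.choice(list(chain))]
        out.append(rng.choice(nxt))
    return out
```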
L-Systems (Lindenmayer Systems)
Recursive string rewriting rules that produce self-similar, fractal-like structures. Originally developed to model plant growth, they produce musical phrases with natural hierarchical structure — phrases within phrases within phrases, all related by the same generative rule.
Simple melodic L-system:
Axiom: A
Rules: A → A B A (A expands to three elements)
B → B A B (B expands to three elements)
Depth 1: A B A (3 notes)
Depth 2: ABA BAB ABA (9 notes)
Depth 3: (27 notes — self-similar phrase structure at 3 levels)
Map symbols to musical elements:
A = root note (C4), duration = 1 beat
B = fifth (G4), duration = 0.5 beat
→ Generates a self-similar melody whose phrase lengths triple at each level (3, 9, 27 …)
More complex rules can encode pitch, duration, dynamics, and articulation separately. L-systems produce the kind of recursive phrase structure found in Bach and in generative ambient music — coherent at multiple timescales simultaneously.
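The expansion itself is plain string rewriting. A sketch using the rules above:

```python
def lsystem(axiom, rules, depth):
    """Expand the axiom `depth` times under the rewrite rules."""
    s = axiom
    for _ in range(depth):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

def to_notes(symbols, note_map):
    """Map each symbol to a (pitch, beats) pair, skipping unmapped symbols."""
    return [note_map[ch] for ch in symbols if ch in note_map]

phrase = lsystem("A", {"A": "ABA", "B": "BAB"}, depth=2)  # 9-symbol phrase
```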
Cellular Automata
A grid of cells, each alive or dead, updated each step by local neighbourhood rules. Musical mapping: each row = one time step, each column = one pitch or drum voice. Live cells = note triggers. The evolution of the grid becomes the evolution of the music.
Wolfram Rule 110 (computationally universal — generates complex non-repeating patterns):
Step 0: . . . . . . . . X . . . . . . . ← single seed
Step 1: . . . . . . . X X . . . . . . .
Step 2: . . . . . . X X . . . . . . . .
Step 3: . . . . . X X X . . . . . . . .
Step 4: . . . . X X . X . . . . . . . .
... (continues, never exactly repeating)
Map to drum grid: X = hit, . = rest
16 columns = 16 pitches or drum voices
Each row = one bar step
→ Evolving, non-repeating rhythmic pattern from a single starting cell
Conway's Life mapped to a piano roll: stable structures (still lifes) = sustained chords; oscillators = repeating rhythmic figures; gliders = melodic lines that travel across the pitch space over time.
The seed pattern and the rule number are the only inputs. Different seeds with the same rule produce related but distinct patterns — a generative variation system with mathematical coherence.
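An elementary CA step is a table lookup on each 3-cell neighbourhood, with the rule number supplying the table. A sketch (wrap-around edges are an implementation choice, not part of the rule):

```python
def ca_step(cells, rule=110):
    """One update of an elementary cellular automaton: bit v of the rule
    number gives the next state for neighbourhood value v = 4L + 2C + R."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2
                      + cells[(i + 1) % n])) & 1
            for i in range(n)]

def ca_pattern(width=16, steps=8, rule=110):
    """Rows of the evolution: each row is one bar step, each column one
    pitch or drum voice, starting from a single seed cell."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps - 1):
        rows.append(ca_step(rows[-1], rule))
    return rows
```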
Functional Harmony Generator
Generates chord progressions using tonal harmony rules: Roman numeral grammar, voice leading constraints, tension/resolution patterns, secondary dominants, borrowed chords, modal interchange.
Key: C major
Style: jazz (allows substitutions, extensions)
Generate:
I Maj7 → VI m7 → II m7 → V7 → I Maj7 ← basic ii-V-I with turnaround
I Maj7 → ♭VII7 → IV Maj7 → I ← modal interchange (Mixolydian ♭VII)
I Maj9 → ♯IV ø7 → IV Maj7 → III7 → VI7 ← tritone substitution chain
Voice leading:
├─ Common tones held across chord changes where possible
├─ Contrary motion preferred over parallel motion
├─ No parallel fifths (classical) / allowed (jazz)
└─ Voice range constraints per part (SATB or lead+bass)
Output: chord symbols + voice-led individual note sequences, ready to feed into the piano roll or live coder.
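A deliberately minimal sketch of the realisation step: Roman-numeral degrees to MIDI note stacks, root-position diatonic sevenths only, with no voice leading, substitutions, or borrowed chords:

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]   # semitones above the tonic
DIATONIC_SEVENTHS = {                   # scale degree -> intervals above root
    1: [0, 4, 7, 11],  # Imaj7
    2: [0, 3, 7, 10],  # ii m7
    3: [0, 3, 7, 10],  # iii m7
    4: [0, 4, 7, 11],  # IVmaj7
    5: [0, 4, 7, 10],  # V7
    6: [0, 3, 7, 10],  # vi m7
    7: [0, 3, 6, 10],  # vii ø7 (half-diminished)
}

def realise(progression, tonic=60):
    """List of scale degrees -> list of chords as MIDI notes (60 = C4)."""
    return [[tonic + MAJOR_SCALE[d - 1] + iv for iv in DIATONIC_SEVENTHS[d]]
            for d in progression]

two_five_one = realise([2, 5, 1])   # ii m7 -> V7 -> Imaj7 in C major
```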
Stochastic / Probabilistic Generation
Xenakis formalised the use of probability distributions in musical composition. The Ant64 implements several:
| Distribution | Characteristic | Musical use |
|---|---|---|
| Gaussian | Bell curve around a centre | Pitch clusters around a tonic; velocity variation around a target level |
| Poisson | Event frequency over time | Note density — sparse or dense passages with statistical consistency |
| Random walk | Each step ±Δ from previous | Melodic lines that wander plausibly — not random jumps |
| Brownian (1/f²) | Slower drift than random walk | Slow harmonic movement, pad evolution |
| Pink noise (1/f) | Statistics match real music | Rhythmic and melodic sequences whose density variation matches natural music |
| Cauchy | Heavy-tailed — occasional large jumps | Surprising melodic leaps in otherwise stepwise lines |
Pink noise (1/f) is particularly significant: Voss and Clarke showed that loudness and pitch fluctuations in real music have approximately 1/f power spectra. Sequences generated with 1/f statistics share the macro-level character of real music — similar distributions of phrase lengths, interval sizes, and dynamic variation.
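A common way to get approximately 1/f statistics is the Voss algorithm: sum several random sources, each updated half as often as the previous one. A sketch; the output is a raw control value to be mapped onto a scale:

```python
import random

def pink_sequence(steps, octaves=4, rng=random.Random(7)):
    """Voss-style 1/f sequence: source k is redrawn every 2^k steps, so slow
    sources contribute long-range drift and fast sources local detail."""
    sources = [rng.random() for _ in range(octaves)]
    out = []
    for n in range(steps):
        for k in range(octaves):
            if n % (1 << k) == 0:
                sources[k] = rng.random()
        out.append(sum(sources))      # in [0, octaves)
    return out
```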
Fractals
Self-similar structures at multiple timescales. The Mandelbrot and Julia sets map to pitch and duration via boundary proximity. The dragon curve and Koch snowflake produce rhythmic patterns with fractal self-similarity.
More musically direct: fractal melody using the midpoint displacement algorithm. Start with two notes, recursively insert midpoints with a random offset that halves each recursion — produces a melody that is smooth at large scales but detailed at small scales. Sounds like natural melodic improvisation.
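The midpoint displacement melody can be sketched directly; pitch values are MIDI note numbers and the recursion halves the displacement range each level:

```python
import random

def midpoint_melody(start, end, depth, spread, rng=random.Random(3)):
    """Fractal melody: recursively insert the midpoint of each note pair,
    offset by a random amount whose range halves per recursion level —
    smooth at large scales, detailed at small scales."""
    notes = [start, end]
    for _ in range(depth):
        nxt = []
        for a, b in zip(notes, notes[1:]):
            mid = (a + b) / 2 + rng.uniform(-spread, spread)
            nxt += [a, mid]
        nxt.append(notes[-1])
        notes = nxt
        spread /= 2                    # detail shrinks at finer scales
    return [round(n) for n in notes]   # snap to MIDI note numbers
```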
Tier 2 — Local Neural Inference (SG2000 TPU)
The SG2000 contains a 0.5 TOPS INT8 tensor processing unit — sufficient for running small trained neural networks locally with no network dependency.
What fits in 0.5 TOPS INT8:
| Model type | Use | Notes |
|---|---|---|
| Small LSTM (1–2 layers, 128–256 hidden) | Melody continuation | Trained on a corpus — continues a started phrase in the same style |
| Tiny Transformer (2–4 heads, 64–128 dim) | Chord suggestion | Given a melody fragment, suggest appropriate harmony |
| Style classifier | Genre/mood detection | Classify the current playing style, feed back to generation |
| Groove quantiser | Humanise timing | Learned timing offsets per beat position — makes quantised patterns feel live |
The TPU runs inference on the SG2000's big core side — a model is loaded from DBFS, given a context window (the last N notes or bars), and returns a continuation. Inference time for a small LSTM on a few hundred tokens is well under 100ms — fast enough to generate the next bar before Pulse needs it.
Model distribution via gossip network: trained model files are compact (LSTM ~500KB, tiny transformer ~2MB) and shareable over the Ant64 mesh. Musicians can share style models the same way they share patches.
Tier 3 — External AI Generation (AntOS AI Library, Little Core)
The AntOS AI library provides async API access to external large language and music generation models over WiFi or Ethernet. The little core handles the network request; the big core sends the prompt and receives the result; the result is queued as pattern data and plays when it arrives — latency is irrelevant because playback is quantised.
Supported API targets (configurable):
| Service | Generation type | Notes |
|---|---|---|
| OpenAI (GPT-4o / o1) | Natural language → note data | Prompt in plain English, result parsed to events |
| Anthropic (Claude) | Same | Structured music generation via system prompt |
| Google (Gemini) | Same | Also supports music description and analysis |
| MusicGen / AudioCraft | Audio generation | Result → Page S sample, not note data |
| ABC Notation endpoints | Score generation | ABC notation → note events — clean round-trip |
| Custom endpoint | Configurable | Point to any compatible API |
The request/response cycle:
Musician types prompt on Page L or a dedicated generation panel
(e.g. "16-bar walking bass line in Cm, jazz feel, medium tempo")
│
▼
Big core formats the request (note data format spec in system prompt)
Passes to little core via mailbox IPC
│
▼
Little core (AntOS AI library) makes the API call over WiFi/Ethernet
Async — does not block the big core or Pulse
│
▼
Response received — little core parses to note event format
Passes result to big core via mailbox IPC
│
▼
Big core inserts result into Page L / Page K / Page R for review
Musician can edit, transpose, truncate, loop before committing
│
▼
Commit → Pulse player queue → plays at next quantisation boundary
Natural language generation examples (live coder syntax):
-- Ask AI for a pattern (async — plays when response arrives)
play ai("funky 16th-note bassline in E minor, syncopated")
-- With style reference
play ai("melody in the style of Boards of Canada, melancholic, C major")
-- Rhythmic only
drum ai("complex polyrhythmic pattern, 5 against 4, kick and hi-hat")
-- Constrained — must fit current harmony
play ai("fill in the gaps of this melody", context: current_bar)
-- Chord progression
chords ai("jazz reharmonisation of a I-IV-V in Bb")
The AI library formats a structured system prompt that specifies the output format (note name, octave, duration, velocity as JSON or a compact domain-specific format), ensuring the response can be reliably parsed to playable event data. The musician never sees the raw API response — only the musical result in the editor.
Offline graceful degradation: if no network is available, Tier 3 calls return an error with a suggestion to use a Tier 1 or Tier 2 equivalent. The system never hangs waiting for a network response — the timeout is configurable and short.
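The parse step on the little core reduces to note-event conversion. The exact wire format is defined by the AI library's system prompt and is not fixed here, but the note-name-to-MIDI conversion at its heart can be sketched (input assumed uppercase A–G with optional #/b, as in "C4" or "F#3"):

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the little core's parse step: convert a note
// name like "C4" or "F#3" to a MIDI note number (middle C = C4 = 60).
// The real response format is whatever the AI library's system prompt
// specifies — this only illustrates the conversion.
int noteToMidi(const std::string& name) {
    // Semitone offsets of the natural notes A..G within an octave.
    static const int base[7] = { 9, 11, 0, 2, 4, 5, 7 }; // A B C D E F G
    int semitone = base[name[0] - 'A'];
    size_t i = 1;
    if (i < name.size() && (name[i] == '#' || name[i] == 'b')) {
        semitone += (name[i] == '#') ? 1 : -1;
        ++i;
    }
    int octave = std::stoi(name.substr(i));
    // MIDI convention: note = (octave + 1) * 12 + semitone
    return (octave + 1) * 12 + semitone;
}
```

Everything downstream — durations, velocities, insertion into Page L/K/R — operates on plain note events; the raw API text never leaves the little core.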
The Generative Feedback Loop
The three tiers combine with the spectral analysis pipeline to create a closed generative loop — generate, analyse, refine, regenerate:
Tier 1/2/3 generation
│
▼
Pattern data → Pulse plays it
│
▼
Audio output → Page D spectrogram (live input mode)
│
▼
Spectral analysis → harmonic content extracted
│
├──→ Feed harmonic data back as constraint to Tier 1
│ ("generate variations that preserve these harmonics")
│
├──→ SEND to Page H → edit harmonics → IFFT → new wavetable
│ (the generated music reshapes the synthesis voice)
│
└──→ Feed to AI prompt as musical context (Tier 3)
("here is the spectral analysis of what I just played —
generate a complementary melody")
Generated music becomes input to synthesis source creation, which changes the timbre of the instrument that plays the next generation pass. Each cycle can produce something genuinely novel without any repetition.
Generation Language Extensions (Live Coder)
The live coder syntax gains first-class generation operators:
-- Euclidean rhythm
drum kick every euclid(3, 8)
drum snare every euclid(2, 8, rotation: 4)
-- Markov chain continuation from a seed sequence
play markov(seed: [c4 e4 g4 e4 c4], order: 2, steps: 16)
-- L-system melody
play lsystem(axiom: "A", rules: {A: "ABA", B: "BAB"}, depth: 3, map: {A: c4, B: g4})
-- Cellular automaton rhythm (Rule 110, 16-column drum grid)
drum ca(rule: 110, seed: current_bar, voices: [kick snare hihat])
-- Harmonic progression (functional harmony generator)
chords harmony(key: "Cm", style: "jazz", bars: 8)
-- Stochastic melody (Gaussian pitch around tonic, Poisson density)
play stochastic(centre: c4, spread: 5, density: 0.6, steps: 16)
-- Pink noise sequence (1/f statistics)
play pink(root: c4, scale: "minor", steps: 32)
-- Random walk melody
play walk(start: c4, step: 2, steps: 16, scale: "dorian")
-- Local TPU model continuation
play tpu(model: "blues_lstm", context: last_bars(4), steps: 8)
-- External AI (async, plays when ready)
play ai("melodic fill, 2 bars, match current harmony")
-- Hybrid: generate with algorithm, humanise with TPU groove model
play euclid(5, 16) |> humanise(model: "jazz_groove")
All generation operators produce event streams in the same format as hand-typed patterns. They can be piped through any live coder operator — transpose, reverse, stretch, filter, humanise — regardless of how they were generated.
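As one concrete example, the euclid() operator can be sketched without Bjorklund's full recursive algorithm — the compact modular formulation below produces evenly distributed Euclidean patterns (equivalent to Bjorklund's output up to rotation; the shipping implementation may differ, and the rotation semantics here are an assumption):

```cpp
#include <cassert>
#include <string>

// Sketch of euclid(pulses, steps, rotation): distribute `pulses` hits
// as evenly as possible over `steps`, using the compact
// "(i * pulses) mod steps < pulses" formulation.
std::string euclid(int pulses, int steps, int rotation = 0) {
    std::string pattern(steps, '.');
    for (int i = 0; i < steps; ++i) {
        int j = ((i + rotation) % steps + steps) % steps; // safe modulo
        if ((j * pulses) % steps < pulses) pattern[i] = 'x';
    }
    return pattern;
}
```

euclid(3, 8) yields the classic tresillo `x..x..x.` — the same event-stream shape as any hand-typed pattern, ready for piping through transpose, reverse, or humanise.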
Page G — Music Generator (Workstation App, SG2000 Big Core)
A dedicated workstation page for generation — separate from Page L (live coding) so the generation workflow has its own space without cluttering the code editor.
┌────────────────────────────────────────────────────────────┐
│ PAGE G — MUSIC GENERATOR │
│ │
│ MODE: [Algorithmic ▼] KEY: [Cm] BARS: [8] BPM: [124] │
│ │
│ Algorithm ○ Euclidean ● Markov ○ L-System ○ CA │
│ ○ Harmony ○ Stoch. ○ Pink noise │
│ │
│ Markov order: [2] Seed: [current selection in Page R] │
│ Steps: [16] Scale: [natural minor] │
│ │
│ [GENERATE] [PREVIEW] [SEND TO PAGE R] [SEND TO L] │
│ │
│ AI PROMPT ───────────────────────────────────────────── │
│ > funky bassline in Cm, 8 bars, syncopated_ │
│ [ASK AI] Status: waiting for response... │
│ │
│ TPU MODEL: [blues_lstm ▼] Context: [last 4 bars] │
│ [CONTINUE] │
└────────────────────────────────────────────────────────────┘
All three tiers accessible from one page. Generated output previews in a small piano
roll at the bottom of the panel before committing. SEND TO PAGE R inserts into the
arrangement. SEND TO L inserts as a live coder expression.
Page G is added to the workstation page list alongside A, D, K, L, V, W, H, S, R, E, F, M.
Video System — Light Synth
The Ant64 has integrated audio-reactive video output — a light synth driven by the same synthesis engine data that produces the audio. Four modes: audio-reactive visualiser (waveform, FFT, Lissajous), synth parameter visualiser (each voice rendered as a visual element), generative light synth (MIDI/sequencer events drive procedural visuals), and VJ tool (MIDI-triggered clip playback with parameter control).
All audio system parameters are exposed as input data to the video system. Voice pitch, envelope state, LFO modulation, filter cutoff, and note velocity all map to visual parameters — position, colour, size, motion, brightness. The light synth treats video the way the audio engine treats sound: parametric, generative, driven from synthesis data.
Full light synth documentation — modes, synthesis analogy table, scripting API, and hardware output specifications — is in the Display Architecture reference.
The audio system's role is to make its live data (voice states, FFT data, MIDI events, sequencer position) available to the video system via the AntOS scripting bindings. The display hardware (FireStorm compositor, output clocking, layer system) is entirely the display system's concern.
Complete System Overview
Pulling all capabilities together, the Ant64 is not a synthesizer with extras — it is a complete audio-visual instrument and MIDI hub.
┌──────────────────────────────────────────────────────────────────────────────┐
│ ANT64 │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ SG2002 / SG2000 SoC │ │
│ │ ┌──────────────────────────┐ ┌───────────────────────────────┐ │ │
│ │ │ BIG CORE (C906 @ 1GHz) │ │ LITTLE CORE (C906 @ 700MHz) │ │ │
│ │ │ Bare metal · no OS │ │ AntOS · Luau · DBFS │ │ │
│ │ │ Music Workstation App │◄─►│ Gossip · Shell · File mgmt │ │ │
│ │ │ C++ · Dear ImGui │ │ Hardware Mailbox IPC │ │ │
│ │ └────────────┬─────────────┘ └───────────────────────────────┘ │ │
│ │ │ QSPI (NOR mode) — control plane │ │
│ │ │ MIPI 4-lane × 2 — data plane (future) │ │
│ └───────────────┼─────────────────────────────────────────────────────┘ │
│ │ ▲ SPI slave │
│ ┌───────────────▼──────────────────┴──────────────────────────────────┐ │
│ │ FireStorm (FPGA) │ │
│ │ GoWin 138k (Ant64/Ant64C) · GoWin 60k (Ant64S) │ │
│ │ │ │
│ │ FireStorm EE · Audio DSP 128+ voices · 2D Rasterizer │ │
│ │ Hard RISC-V debug core [Ant64/Ant64C only — GoWin 138k] │ │
│ │ VA · FM · Sample · Granular · Filters · BBD · Effects │ │
│ │ Fixed-point · No float · HDMI/VGA/(DisplayPort Ant64C) │ │
│ │ WM8958/WM8960 audio codec (direct) · Optical out │ │
│ │ → 2× internal stereo speakers · Phono out · HDMI audio │ │
│ │ │ │
│ │ SRAM A (36-bit) — audio · SRAM B (36-bit) — rasterizer │ │
│ │ DDR3 (32-bit, 1/2GB) — samples/wavetables [Ant64/Ant64C] │ │
│ │ 8MB embedded PSRAM [Ant64S] │ │
│ └──┬──────────────────────────────────────────┬───────────────────────┘ │
│ │ QSPI — register writes / control │ JTAG (debug/program) │
│ │ 4-line PIO — fast bulk data transfer │ │
│ ┌──▼──────────────────────────────┐ ┌─────────▼──────────────────────┐ │
│ │ Pulse RP2350 │ │ DeMon RP2350 │ │
│ │ MIDI · Joypads · Jog dials │ │ System supervisor │ │
│ │ Audio sequencer · SAM speech │ │ JTAG → SG2000 + FireStorm │ │
│ │ 4× 3.5mm trigger/CV inputs │ │ │ │
│ │ USB MIDI host+device │ │ QSPI → FireStorm │ │
│ │ DIN MIDI (Ant64C) │ │ SPI slave ← SG2000 │ │
│ │ 8× RGB jog dials (WS2812B) │ │ Hi-speed UART ↔ SG2000 │ │
│ │ Hi-speed UART ↔ SG2000 │ │ SPI master → Pulse │ │
│ │ SPI slave ← DeMon │ │ Boot · Watchdog · Debug │ │
│ │ │ │ WiFi/BT (ESP-C5) · RTC │ │
│ └─────────────────────────────────┘ └────────────────────────────────┘ │
│ │
│ OUTPUTS: 2× internal stereo speakers · Phono out · Optical audio │
│ HDMI/VGA/(DP Ant64C) · DIN MIDI (Ant64C) · USB MIDI │
│ INPUTS: Stereo audio (phono) · DIN MIDI (Ant64C) · USB MIDI · Keyboard │
└──────────────────────────────────────────────────────────────────────────────┘
How Does It Compare? Is There Anything Like It?
Short answer: No. Nothing like it exists or has existed.
The Ant64 occupies a category of one. To understand why, consider what you would need to buy to match its combined capabilities in 2026:
| Capability | Best dedicated hardware | Price |
|---|---|---|
| 128-voice VA + WT synth | Waldorf Kyra | ~€1,800 (discont.) |
| Full FM synthesis (6-op, DX7 compat) | Yamaha Montage M | ~€4,500 |
| Live sampling + S&S engine | Sequential Prophet X | ~€3,800 |
| Live granular input | Waldorf Quantum MK2 | ~€4,800 (discont.) |
| Multi-track hardware sequencer | Squarp Pyramid MK3 | ~€700 |
| DIN MIDI In/Out/Thru hub | iConnectivity mioXL | ~€400 |
| Video synthesizer / visualiser | Critter & Guitari EYESY | ~€500 |
| 303 acid engine | Roland TB-03 | ~€350 |
| Total | 8 separate devices | ~€16,850+ |
And that stack still wouldn't have:
- All three synthesis paradigms layerable per voice
- RGB performance UI on the synth controls
- DX7 sysex import
- Audio-reactive video tied directly to the synthesis engine
- An open, hackable FPGA DSP layer
- A unified OS (AntOS) tying everything together with a scripting language
The Closest Historical Precedents
Fairlight CMI (1979–85) — combined sampler + synthesis + sequencer + video display. Cost £20,000–£50,000. The Ant64 is architecturally more capable in synthesis depth and has video output where the Fairlight had only a CRT display.
Synclavier (1975–92) — FM + sampling + sequencer, professional studio standard. Cost $200,000+. The Ant64 has comparable synthesis capability in a hobby platform.
Edirol/Roland CG-8 Visual Synthesizer (2003) — combined MIDI-controlled audio and video synthesis. Discontinued. No audio synthesis — it was a video processor only. No current equivalent exists.
Conclusion: The Ant64 at full spec is the first hobbyist-accessible device to combine professional-grade polyphonic synthesis (all paradigms), a hardware sequencer, full MIDI connectivity, live sampling, and integrated video synthesis in a single open platform. The commercial equivalent does not exist in 2026.
Visual Editing System — Fairlight-Inspired
The Fairlight CMI's defining characteristic was not just its sound but its visual interface — a light pen on a green CRT that let you literally draw sounds, compose sequences on a grid, and sculpt waveforms by hand. In 1979–85 this cost £30,000. The Ant64 implements the same paradigm as a dedicated native application — the Ant64 Music Workstation App.
This is a full standalone bare-metal C++ application on the SG2000 big core, using Dear ImGui as the UI framework, rendered to the HDMI video output via the FireStorm rasteriser, and navigated via the 8 jog dials plus mouse or stylus. It communicates with FireStorm, Pulse, and DBFS over defined IPC interfaces. Think of it the way a DAW relates to an OS — it runs on the platform but is its own substantial piece of software, with direct hardware access and no OS overhead.
The Fairlight's Pages — and Their Ant64 Equivalents
| Fairlight page | Function | Ant64 equivalent |
|---|---|---|
| Page 4 | 32 harmonic amplitude sliders | Page H — Harmonic Editor |
| Page 5 | Per-harmonic envelope profiles | Page H ENV view |
| Page 6 | Freehand waveform drawing | Page W — Waveform Drawing |
| Page 8 | Sample recording and display | Page S — Sample Editor |
| Page D | 3D spectral waterfall (STFT) | Page D — Spectral Waterfall (extended) |
| Page R | Graphical grid sequencer | Page R — Piano Roll Sequencer |
| (none) | Envelope ADSR visual editor | Page E — Envelope Editor |
| (none) | FM algorithm node graph | Page F — FM Algorithm Editor |
| (none) | Modulation matrix visual | Page M — Mod Matrix Editor |
| (none) | Phasor / DFT decomposition | Page A — Audio Analyser (new) |
| (none) | Live music coding editor | Page L — Live Coder (new) |
| (none) | MOD / tracker editor | Page K — MOD Editor (new) |
| (none) | SMPS / VGM editor | Page V — SMPS Editor (new) |
| (none) | Music generation | Page G — Music Generator (new) |
Input Devices for Visual Editing
The Fairlight used a light pen — a stylus held against the CRT that detected the electron beam position. The Ant64 equivalents:
| Input | Role |
|---|---|
| Mouse (USB, via Pulse) | Primary cursor control — point, click, drag to draw |
| 8 jog dials | Dial 1 = X cursor; Dial 2 = Y/amplitude; Dial 3–8 = context parameters; push = confirm/select |
| QWERTY keyboard (USB) | Command entry (Fairlight-style: type SAW, TRI, SIN) |
| Stylus tablet (USB, optional) | Pressure-sensitive drawing — stroke pressure → waveform amplitude |
| MIDI keyboard | Enter notes in sequencer grid by playing them live |
The jog-dial-as-cursor approach is particularly natural for waveform editing: Dial 1 steps through sample points, Dial 2 adjusts amplitude — fully navigable without a mouse if preferred. Exactly the tactile feel the Fairlight was going for.
Page W — Waveform Drawing (Fairlight Page 6)
Draw and sculpt a waveform cycle directly on screen. 256 points per cycle, amplitude range −128 to +127. Changes heard immediately — FireStorm updates within one frame.
Drawing modes:
- DRAW — freehand: drag cursor to paint amplitude continuously
- JOIN — each point joins the last with a straight line (Fairlight default)
- PLOT — set individual points without affecting neighbours
Macro waveforms — type or button-press fills current segment:
SAW · SQ n (square, pulse width n) · TRI · SIN · NOISE
Transform operations:
- INV — invert vertically (flip around zero)
- REV — reverse horizontally (mirror in time)
- SQZ — squeeze (compress amplitude toward zero — soften the wave)
- MRG — merge: interpolate all segments between two defined endpoints
- MIX — blend two waveforms at a user ratio (crossfade between timbres)
- CPY — copy segment to a range of segments
128-segment system — matching the Fairlight exactly:
- The full sound is divided into 128 time segments, each with its own waveform
- The sound evolves through all 128 as it plays — this is how you get organic, living timbres that change over time rather than a static waveform looping
- Draw segment 1 as a bright sawtooth, segment 64 as a sine, MRG between them: the sound smoothly morphs from harsh to pure over its duration
- This is the Fairlight's characteristic evolving pad / orchestral hit sound
Live preview — [PLAY] triggers the voice immediately on any keypress so you
hear your drawn waveform in context as you work.
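The MRG transform reduces to per-point linear interpolation between the two endpoint waveforms. A sketch in integer arithmetic, in the spirit of the fixed-point pipeline (the actual FireStorm segment memory layout is not specified here):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch of Page W's MRG: compute the waveform for segment `seg` by
// linearly interpolating between the drawn endpoint segments segA and
// segB, so the timbre morphs smoothly across the 128-segment sound.
// 256 points per cycle, amplitudes -128..+127, integer maths only.
using Cycle = std::array<int8_t, 256>;

Cycle mergeSegment(const Cycle& a, const Cycle& b, int seg, int segA, int segB) {
    Cycle out{};
    int span = segB - segA;   // distance between the two endpoint segments
    int t = seg - segA;       // position of this segment within the span
    for (int n = 0; n < 256; ++n) {
        // a + (b - a) * t / span, truncated toward zero
        int v = a[n] + ((b[n] - a[n]) * t) / span;
        out[n] = static_cast<int8_t>(v);
    }
    return out;
}
```

Draw segment 1, draw segment 128, run this for every segment in between — the drawn sawtooth-to-sine morph falls out directly.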
Page H — Harmonic Editor (Fairlight Page 4/5)
Additive synthesis through visual harmonic control. 32 vertical sliders — one per harmonic partial — showing amplitude. Drag to sculpt the frequency spectrum directly.
Views:
- AMP — harmonic amplitudes (the fundamental timbral shape)
- PHASE — phase offset per partial (subtle textural effect)
- ENV — per-harmonic envelope decay rate (each partial fades at its own speed)
Per-harmonic envelopes are the Fairlight's secret for convincing acoustic sounds. High harmonics in a real piano decay faster than low ones. Draw steeper ENV curves for H8–H32 and flatter ones for H1–H4 — the result sounds like a real instrument's natural harmonic decay, not a filter envelope smearing everything together.
[→ PAGE W] converts the current harmonic profile to a waveform via IFFT — bridges
the additive synthesis view and the waveform drawing view.
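Conceptually the conversion is an inverse Fourier sum — each of the 32 partials contributes one sinusoid to the 256-point cycle. A floating-point sketch for clarity (the hardware signal path itself is fixed-point):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Sketch of the Page H → Page W conversion: build one 256-point
// wavetable cycle from 32 harmonic amplitudes and phases.
// Floats used here for readability; FireStorm's path is fixed-point.
std::array<double, 256> harmonicsToCycle(const std::array<double, 32>& amp,
                                         const std::array<double, 32>& phase) {
    const double kTau = 6.283185307179586; // 2 * pi
    std::array<double, 256> cycle{};
    for (int n = 0; n < 256; ++n) {
        double x = 0.0;
        for (int h = 0; h < 32; ++h) {
            // Harmonic h+1 completes h+1 full cycles across the table.
            x += amp[h] * std::sin(kTau * (h + 1) * n / 256.0 + phase[h]);
        }
        cycle[n] = x;
    }
    return cycle;
}
```

A single non-zero slider on H1 reproduces a pure sine; stacking sliders sculpts the spectrum directly, exactly as the AMP view implies.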
Page S — Sample Editor (Fairlight Page 8/D)
Full visual waveform display for recorded samples. Scrollable, zoomable, with all standard sample editing operations controlled graphically.
Visual controls:
- Drag START, END, LOOP S, LOOP E markers along the waveform
- Shaded region shows the active loop with crossfade zone highlighted
- Zoom from full sample view down to individual sample cycles for precision loop setting
Operations:
- TRIM — remove audio outside markers · NORM — normalise to full scale
- REVERSE — flip in time · X-FADE — set loop crossfade length (0–100ms)
- RECORD — live waveform display as audio is captured — watch the waveform draw itself in real time
All changes feed directly to FireStorm sample parameters. Loop crossfade length maps to the hardware crossfade DSP block in real time — no bounce/reload needed.
Page R — Graphical Sequencer (Fairlight Page R)
The piano roll. Click to place notes, drag to resize them, sweep to paint rhythms. The Fairlight's Page R is the direct ancestor of every DAW piano roll in use today — the Ant64 implements the original paradigm, hardware-native.
Grid editing:
- Click empty cell → place note · Drag right → extend length · Right-click → delete
- Hold and sweep across cells → paint multiple notes in one gesture
- MIDI keyboard input → notes appear at current step position in real time
Per-step detail (zoom in):
- Velocity shown as vertical fill within the cell
- Gate length shown as horizontal fill
- Probability shown as partial transparency (50% = 50% chance on each cycle)
- Parameter lock shown as a coloured dot (per-step timbre/filter/pitch override)
8 tracks, each independently configurable:
- Track type: melodic · drum · CC automation · 303 acid
- Track voice: any FireStorm voice, DIN MIDI channel, or USB MIDI channel
- Track step count: 1–64 independent per track (polyrhythm built-in)
Song editor — patterns chain into a full arrangement: Patterns displayed as blocks in a timeline; drag to reorder, double-click to edit.
Page E — Envelope Editor
Visual multi-stage envelope editing. Drag breakpoints with mouse or navigate with Dial 1/2. All four envelopes per voice displayed simultaneously for comparison. ENV4 (8-stage loopable) shows all 8 breakpoints; drag to reshape complex LFO-like envelopes. Changes update FireStorm envelope parameters in real time.
Page F — FM Algorithm Editor
Visual node graph for FM operator routing. Each of the 8 operators is a box; modulation connections are arrows. Carriers (→ audio output) shown in green; modulators in amber.
Drag from one operator to another to create or destroy a modulation path. Click an operator to expand its parameters inline: frequency ratio, output level, envelope shape (mini-display), feedback amount.
Supports all 32 DX7 algorithms displayed as the original Yamaha diagrams, plus
completely free routing beyond anything DX7 offered — any operator can modulate any
other, including chains, stacks, and feedback loops. DX7 .syx import populates the
node graph automatically.
Page M — Modulation Matrix Editor
All 64 modulation slots displayed as a visual grid. Each active slot shown as a line: source on left, destination on right, line thickness proportional to modulation amount. Active modulators animate — thickness pulses with the modulator's live value so you can see every LFO and envelope moving in the display while the patch plays.
Page A — Audio Analyser (Phasor / DFT Decomposition)
Inspired by Sebastian Lague's Fourier transform visualisation. A single FFT frame of audio decomposed into individual rotating phasors — one circle per frequency component. Each circle's radius equals the component's amplitude; it rotates at that component's frequency. The tip of the final (outermost) phasor traces the reconstructed waveform in real time. This is the geometric intuition behind the Fourier transform made visible and interactive.
┌──────────────────────────────────────────────────────┐
│ PAGE A — PHASOR DECOMPOSITION │
│ │
│ ○──────────────────── fundamental │
│ │ ○──────────── 2nd harmonic │
│ │ │ ○──── 3rd harmonic │
│ │ │ │ ● tip traces waveform │
│ │ │ │ │
│ [amplitude rings rotating at each frequency] │
│ │
│ Right panel: reconstructed waveform (sum of all) │
│ Bottom: amplitude bars per harmonic (like Page H) │
└──────────────────────────────────────────────────────┘
Analysis engine (SG2000 big core, C906 @ 1GHz):
- Cooley-Tukey FFT, 4096-point, Hann windowed
- At 44.1kHz: ~10.8Hz per bin — resolves individual harmonics of notes above ~20Hz
- Fundamental detection: autocorrelation or harmonic product spectrum
- Output: N amplitude + phase values for the detected harmonic series
Input sources (selectable):
- Loaded sample (from Page S)
- Live ADC input (mic or line in via WM8958)
- Resample-own-output — analyse the live output of any FireStorm voice or the full mix
- Manual FFT on any memory buffer
Rendering (FireStorm rasteriser):
- Phasor circles: each drawn as an unfilled circle (radius = amplitude), endpoint dot
- Chain rendered sequentially — innermost (fundamental) first, tip of each feeds the next circle's centre
- Reconstructed waveform traced in a contrasting colour at the right
- Amplitude bars along the bottom mirror Page H (same data, different view)
- All rendered by FireStorm 2D rasteriser from a draw list built by the CPU — zero pixel work on the application processor
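The chain geometry is a running complex sum — each circle's centre is the previous circle's tip. A sketch of the per-frame computation the CPU would perform before emitting the draw list (t is time in cycles of the fundamental; the data layout is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Sketch of the Page A phasor chain for one animation instant:
// returns the centre of every circle plus the final tip. Circle k
// spins at (k+1) times the fundamental rate with radius amp[k].
std::vector<std::complex<double>>
phasorChain(const std::vector<double>& amp,
            const std::vector<double>& phase, double t) {
    const double kTau = 6.283185307179586; // 2 * pi
    std::vector<std::complex<double>> pts;
    std::complex<double> tip(0.0, 0.0);
    pts.push_back(tip);                    // centre of the first circle
    for (size_t k = 0; k < amp.size(); ++k) {
        tip += std::polar(amp[k], kTau * (k + 1) * t + phase[k]);
        pts.push_back(tip);                // centre of the next circle
    }
    return pts;                            // pts.back() traces the waveform
}
```

The rasteriser draws one circle per adjacent point pair; sampling pts.back() over a full cycle of t reconstructs the waveform shown in the right panel.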
Key interaction:
- Jog dial 1: scrub through the input audio — phasors update per frame
- Jog dial 2: zoom into a frequency range
- SEND command (or dedicated button): push current harmonic analysis directly to Page H — populates amplitude and phase sliders automatically
- FREEZE command: locks phasors at current frame for inspection
The resynthesis pipeline — closing the loop:
Live audio or sample
↓
[FFT — Page A]
↓ SEND
[Page H — edit amplitude, phase, per-harmonic envelope]
↓ IFFT
[Page W — resulting waveform, editable further]
↓
FireStorm wavetable playback (128–256 voices)
This is a genuine resynthesis workflow. The musician analyses a real-world sound, the Ant64 decomposes it to its harmonic content, they edit the harmonics by hand, and the result plays back as a synthesised voice. Nothing else in the current instrument market offers this as a first-class built-in workflow.
Page D — Spectral Analysis (Spectrogram, Waterfall, Spectrum)
Page D is the Ant64's full spectral analysis page. All four display modes draw from the same underlying Short-Time Fourier Transform (STFT) engine — they are different projections of identical data, switchable at any time without recomputing the analysis.
The spectrogram is the default and primary view: the most legible, the most useful for practical sound design, and the form used in professional audio analysis tools worldwide. The 3D waterfall is available as an alternate mode — a homage to the Fairlight CMI's iconic display, now running at 60fps with filled shading rather than the original's slow wireframe.
Mode 1 — Spectrogram (Default)
Time runs left to right, frequency runs bottom to top, amplitude is encoded as colour intensity. The entire life of a sound is visible at once as a 2D heat map. This is what audio engineers, researchers, and linguists actually use for analysis work.
Freq │
20k │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
10k │ ░░░░░░▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
5k │ ░░░░░░████░░░░░░░░░░░░░░░░▓░░░░░░░░░░░░░░░░░
2k │ ░░░░░░████▓░░░░░░░░░░░░░░░█░░░░░░░░░░░░░░░░░ ← formant
1k │ ░░░░░░████████▓▓░░░░░░░░░░█░░░░░░░░░░░░░░░░░
500 │ ░░░░░████████████▓░░░░░░░░██▓░░░░░░░░░░░░░░░
200 │ ░░░▓█████████████████▓▓░░░███████▓░░░░░░░░░░
100 │ ░░░███████████████████████████████████▓░░░░░ ← fundamental
20 │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
└──────────────────────────────────────────────→ Time
↑ attack ↑ tail off
Colour scale (cool → warm with amplitude):
black → deep blue → cyan → green → yellow → orange → red → white
silent loud
What becomes immediately visible:
| Feature | What you see |
|---|---|
| Attack transient | Bright vertical flash across all frequencies at note onset |
| Harmonic series | Horizontal bands at fundamental + partials, evenly spaced in log view |
| Inharmonicity | Slight stretching of partials — visible in piano, bells, metal |
| Formants | Broad bright horizontal bands — distinguish vowel sounds in voice |
| Vibrato / pitch drift | Horizontal bands that waver — out-of-tune recording visible immediately |
| Filter sweep | Bright energy moving up or down the frequency axis over time |
| Loop discontinuity | Smeared vertical burst at the loop point — find the click |
| Noise floor | Diffuse low-level colour across all frequencies |
| Room reflections | Faint echo of the attack transient arriving slightly later |
| Noise vs tone | Broadband noise = vertical smear; pure tone = thin horizontal line |
Practical uses:
- Sample editing: find exactly where to trim, where a loop click is, where the tail becomes pure noise
- Vocal analysis: see formant structure; tune SAM speech synth Throat/Mouth to match
- Filter design: watch the cutoff move in real time while turning a filter parameter
- Tuning: vibrato, drift, and intonation all visible without listening
- Patch comparison: load two samples, compare their spectrograms side by side
Mode 2 — 3D Waterfall (Fairlight Homage)
The Fairlight CMI's Page D — a 3D mountain-range display of the evolving spectrum. Frequency on X, amplitude on Y, time receding into Z. The Ant64 version runs at 60fps with filled, Gouraud-shaded polygons and the same cool-to-warm colour map as the spectrogram. The original Fairlight rendered a slow wireframe on hardware that was extraordinary for 1979 but would be a slideshow by modern standards.
┌──────────────────────────────────────────────────────┐
│ MODE 2 — 3D WATERFALL │
│ │
│ Z (time, past →) │
│ \ │
│ \ ████ │
│ \ █ ████ █ │
│ \█ █████ ██ │
│ ────────────────── X (frequency, low → high) │
│ Y = amplitude │
│ │
│ Filled + Gouraud shaded · 60fps · amplitude → hue │
└──────────────────────────────────────────────────────┘
Best for: demonstrations, understanding the temporal shape of a sound intuitively, presentations, and the Fairlight CMI experience. Less practical than the spectrogram for detailed editing work — the 3D projection occludes information that the flat view makes explicit.
Jog dials 5 and 6 rotate the perspective (yaw and pitch) in real time.
Mode 3 — Live Spectrum (Single Frame)
A classic spectrum analyser bar display — frequency on X, amplitude on Y, for the current moment only. This is the mode you see on hi-fi equipment and DAW channel strips. Useful for checking a live mix, monitoring a voice's output character in real time, or watching a filter sweep without the time history of the spectrogram.
┌──────────────────────────────────────────────────────┐
│ MODE 3 — LIVE SPECTRUM │
│ │
│ dB │
│ 0 │ ██ │
│ -12 │ ██ ████ █ │
│ -24 │ ████ ████ ███ █ │
│ -36 │ ██████████ ████ ██ █ │
│ -48 │ ████████████████████████ ███ │
│ └──────────────────────────────── Freq → │
│ Peak hold · RMS overlay · dB scale switchable │
└──────────────────────────────────────────────────────┘
Features: peak hold (decaying dots above each bar), RMS overlay as a smooth curve, linear or logarithmic frequency axis, dB or linear amplitude scale. Updates at the full frame rate of the STFT hop — effectively real time.
Mode 4 — Phasor View (Link to Page A)
A live single-frame phasor decomposition — the same display as Page A but embedded in
Page D for quick access without leaving the spectral analysis context. Press PAGE A
to jump to the full Page A view for detailed interaction.
STFT Engine Parameters
Shared across all four modes.
The STFT Pipeline — Step by Step
The spectrogram is built by running a succession of overlapping FFT frames across the audio and stacking the results. Each step in the chain has a specific role:
Audio stream (PCM, 44.1kHz)
│
│ Step 1 — SEGMENT
│ Slice the audio into overlapping frames.
│ Each frame is FFT_SIZE samples long.
│ Each successive frame is offset by HOP_SIZE samples.
│
├── frame 0: samples 0 → 4095
├── frame 1: samples 512 → 4607 (offset by hop = 512)
├── frame 2: samples 1024 → 5119
├── frame 3: samples 1536 → 5631
│ ...
│ overlap = FFT_SIZE − HOP_SIZE = 4096 − 512 = 3584 samples
│ overlap ratio = 3584 / 4096 = 87.5%
│
│ Step 2 — WINDOW
│ Multiply each frame sample-by-sample by the window function.
│ Tapers the frame smoothly to zero at both edges.
│ Eliminates the sharp discontinuity the FFT would otherwise see.
│
│ w[n] = 0.5 × (1 − cos(2π·n / (N−1))) ← Hann window
│
│ Amplitude
│ 1.0 │ ╭─────────────╮
│ │ ╭╯ ╰╮
│ 0.5 │ ╭╯ ╰╮
│ │ ╭╯ ╰╮
│ 0.0 │╭╯ ╰╮
│ └─────────────────────────────── sample index 0 → N
│
│ Step 3 — FFT
│ Cooley-Tukey FFT on the windowed frame.
│ Output: N/2 complex bins, each representing one frequency component.
│ Extract magnitude: |bin| = √(re² + im²)
│ Extract phase: φ = atan2(im, re) (used by Page A)
│
│ Step 4 — LOG / dB CONVERSION
│ Convert linear magnitude to decibels:
│
│ dB[k] = 20 × log₁₀(magnitude[k] + ε)
│
│ ε is a small floor value (~10⁻¹⁰) to avoid log(0).
│ Result: −96dB (silence / noise floor) to 0dB (full scale).
│
│ Step 5 — NORMALISE
│ Map the dB range to 0.0 – 1.0:
│
│ t[k] = clamp((dB[k] − dB_floor) / (dB_ceil − dB_floor), 0, 1)
│
│ dB_floor and dB_ceil are user configurable (default: −96 to 0).
│
│ Step 6 — COLOUR LUT LOOKUP
│ Use t[k] as an index into a 256-entry RGB colour lookup table.
│ The LUT lives in FireStorm BRAM (256 × 3 bytes = 768 bytes).
│ Swapping colour maps = writing a new LUT to BRAM — zero render cost.
│
│ pixel colour = LUT[round(t[k] × 255)]
│
└── Repeat for every frame → stack of colourised rows → spectrogram
The 87.5% overlap ratio sounds high but is correct for a Hann window. Because the window tapers to zero at the edges, samples near the frame boundaries are heavily attenuated. A 75–87.5% overlap ensures every audio sample falls near the centre of at least one frame where it carries full weight. Without sufficient overlap, faint events between frames can disappear entirely from the display.
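Steps 1 and 4–6 condense to a handful of arithmetic operations. A sketch using the default parameters from the pipeline above (the FFT itself, step 3, is elided — any radix-2 Cooley-Tukey implementation fits):

```cpp
#include <cassert>
#include <cmath>

// Frame geometry (step 1), with the default parameters from the text.
constexpr int kFftSize = 4096;
constexpr int kHop     = 512;
constexpr int kOverlap = kFftSize - kHop;            // 3584 samples = 87.5%
constexpr int frameStart(int frame) { return frame * kHop; }

// Per-bin post-processing (steps 4-6): linear magnitude → dB →
// normalised 0..1 → 256-entry colour LUT index.
int magnitudeToLutIndex(double magnitude,
                        double dbFloor = -96.0, double dbCeil = 0.0) {
    const double eps = 1e-10;                        // avoid log10(0)
    double db = 20.0 * std::log10(magnitude + eps);  // step 4
    double t = (db - dbFloor) / (dbCeil - dbFloor);  // step 5
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return static_cast<int>(std::lround(t * 255.0)); // step 6
}
```

Full-scale magnitude maps to LUT entry 255 (white/hot end); silence clamps to entry 0 (black).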
Windowing — Spectral Leakage and Its Cure
The FFT assumes the analysis window is a fragment of an infinitely repeating signal. If the audio does not start and end at exactly the same value at the frame edges — which almost never happens with real-world sound — the FFT sees a sharp discontinuity. That discontinuity has energy at every frequency: a pure 440Hz tone smears energy into 441Hz, 450Hz, 600Hz and beyond. This is spectral leakage — energy appearing in bins it has no business being in, obscuring faint harmonics near strong ones.
Multiplying by a window function that tapers smoothly to zero at both edges eliminates the discontinuity. The Ant64 offers four window choices, selectable per analysis session:
| Window | Characteristics | Best for |
|---|---|---|
| Hann | Raised cosine, 2 terms. Good frequency resolution, good leakage suppression. The practical default for almost all audio work. | General purpose — synths, instruments, voice |
| Blackman-Harris | 4-term cosine. Very high leakage suppression (~92dB side-lobe rejection). Wider spectral peaks — less frequency precision, but faint harmonics near strong ones become visible. | Complex tonal sounds — distinguishing close harmonics |
| Flat-top | Optimised for amplitude accuracy, not frequency resolution. Very wide peaks, very accurate peak amplitude measurement. | Calibration — measuring the precise level of a specific harmonic |
| Rectangular | No windowing — maximum time resolution, maximum leakage. Every sample weighted equally. | Transient / percussion analysis — when timing matters more than frequency accuracy |
The Hann formula for reference (N = FFT size, n = sample index 0..N−1):
w[n] = 0.5 × (1 − cos(2πn / (N−1)))
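The effect is easy to verify numerically: a sine whose frequency falls exactly between two bins (the worst case) leaks far more energy into distant bins under a rectangular window than under Hann. A naive-DFT sketch, for illustration only (the real engine uses the FFT, and N here is deliberately small):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Naive DFT magnitude at a single bin — O(N) per bin, illustration only.
double dftMag(const std::vector<double>& x, int k) {
    const double kTau = 6.283185307179586;
    double re = 0.0, im = 0.0;
    int n = static_cast<int>(x.size());
    for (int i = 0; i < n; ++i) {
        re += x[i] * std::cos(kTau * k * i / n);
        im -= x[i] * std::sin(kTau * k * i / n);
    }
    return std::sqrt(re * re + im * im);
}

// Leakage from a 10.5-bin sine, measured far from the peak (bin k),
// with and without the Hann window from the formula above.
double leakageAt(int k, bool useHann) {
    const double kTau = 6.283185307179586;
    const int n = 256;
    std::vector<double> x(n);
    for (int i = 0; i < n; ++i) {
        double w = useHann ? 0.5 * (1.0 - std::cos(kTau * i / (n - 1))) : 1.0;
        x[i] = w * std::sin(kTau * 10.5 * i / n);
    }
    return dftMag(x, k);
}
```

Comparing the two at a bin ~30 bins away from the tone shows the rectangular window's leakage sitting orders of magnitude above Hann's — exactly the smearing that hides faint harmonics next to strong ones.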
Colour Maps — Remapping Greyscale to Colour
Once each FFT bin magnitude is normalised to 0.0–1.0, it is mapped through a colour lookup table stored in FireStorm BRAM. The choice of colour map significantly affects what is visible — a naive greyscale misses quiet features; a poorly chosen colour scale can create false structure that looks like real signal.
Four maps are available, each suited to different tasks:
| Map | Colours | Character | Best for |
|---|---|---|---|
| Viridis | Dark purple → blue → teal → green → yellow → bright yellow | Perceptually uniform — equal steps in magnitude appear as equal steps in perceived brightness. Colourblind-safe. Readable printed in greyscale. | Default. General analysis, publication, colourblind users |
| Inferno | Black → deep red → orange → yellow → white | Slightly better quiet-end contrast than Viridis. Dramatic, high dynamic range feel. | Sounds with a lot of detail in the noise floor |
| Classic hot | Black → red → orange → yellow → white | Traditional audio analyser look. Familiar to DAW and spectrum analyser users. | Users coming from a DAW background |
| Green phosphor | Black → dark green → bright green → white | The Fairlight CMI aesthetic. Monochrome but with identity. | Mode 2 (3D Waterfall) — the authentic Fairlight look |
The LUT is 256 × 3 bytes = 768 bytes of BRAM — negligible. Switching maps at runtime is a single BRAM write from the CPU; the FireStorm rasteriser sees the new table immediately on the next frame with no pipeline stall.
Parameters Reference
| Parameter | Value | Notes |
|---|---|---|
| FFT size | 2048 or 4096 pt | User selectable — frequency vs. time resolution trade-off |
| Hop size | 512 samples | New frame every ~11.6ms at 44.1kHz; 87.5% overlap at 4096-pt FFT (75% at 2048-pt) |
| Window | Hann (default) / Blackman-Harris / Flat-top / Rectangular | See window table above |
| Frequency resolution | ~10.8Hz/bin (4096pt) · ~21.5Hz/bin (2048pt) | Resolves harmonics of notes above ~20Hz at 4096pt |
| History depth | 128–1024 frames | 1.5 to 11.9 seconds at 44.1kHz / 512-hop |
| Colour map | Viridis (default) / Inferno / Classic hot / Green phosphor | LUT in FireStorm BRAM — zero-cost swap |
| dB range | −96dB to 0dB | Configurable floor and ceiling |
| Amplitude → colour | linear mag → log (dB) → normalise → LUT index | Full pipeline described above |
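The amplitude → colour row above can be sketched end-to-end: linear magnitude to dB, clamp to the configured floor/ceiling, normalise, then quantise to a 0–255 LUT index. A minimal illustration (floating point for clarity, and the function name is illustrative — the shipped pipeline runs fixed-point in FireStorm):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Map a linear FFT magnitude (0.0..1.0) to a 0..255 colour LUT index.
// floor_db / ceil_db are the configurable dB range (defaults -96..0).
uint8_t mag_to_lut_index(double mag, double floor_db = -96.0, double ceil_db = 0.0) {
    const double db = 20.0 * std::log10(std::max(mag, 1e-12)); // avoid log(0)
    const double t = (db - floor_db) / (ceil_db - floor_db);   // normalise to 0..1
    const double clamped = std::min(1.0, std::max(0.0, t));
    return static_cast<uint8_t>(clamped * 255.0 + 0.5);        // round to LUT index
}
```

Full-scale magnitude lands on index 255, silence clamps to index 0, and everything between follows the log curve — which is why quiet harmonics remain visible.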
Frequency resolution trade-off:
- 4096-point FFT: fine frequency resolution (~10.8Hz/bin), coarser time (~93ms/frame)
- 2048-point FFT: coarser frequency resolution (~21.5Hz/bin), finer time (~46ms/frame)
For tonal sounds (synths, pitched instruments): use 4096. For percussive sounds (drums, transients): use 2048 — attack timing matters more than harmonic detail. The Rectangular window pairs naturally with 2048 for maximum transient precision.
Input Sources
All four modes share the same input selection:
| Source | Description |
|---|---|
| Loaded sample | Analysis run over entire sample offline — full history available |
| Live ADC | Mic or line in via WM8958 — continuous real-time scrolling display |
| Resample-own-output | Any FireStorm voice or the full mix — watch synthesis live |
| Memory buffer | Any PCM buffer held in DBFS — for analysis of intermediate results |
Key Interactions
| Control | Action |
|---|---|
| Jog dial 1 | Time scrub / scroll speed |
| Jog dial 2 | Frequency zoom — focus on a harmonic region |
| Jog dial 3 | History depth / waterfall length |
| Jog dial 4 | Amplitude range / dB floor |
| Jog dial 5 | Perspective yaw (Mode 2 only) |
| Jog dial 6 | Perspective tilt / pitch (Mode 2 only) |
| 1 / 2 / 3 / 4 | Switch display mode |
| LOG / LIN | Toggle log or linear frequency axis |
| PLAY | Scrub loaded sample — all modes scroll in sync |
| MARK | Place time cursor — use with Page S to jump to that sample point |
| SEND | Push selected frame's spectrum to Page A and Page H |
| FREEZE | Lock display at current frame for inspection |
Relationship to Other Pages
Page D shows the time evolution of the spectrum — the whole life of the sound. Page A shows a single moment — the geometric decomposition of one frozen frame. Page H exposes the harmonic content as editable sliders, fed by analysis from either. Page S shares the same time cursor — marks placed in Page D jump to the exact sample position in the waveform editor, making it easy to find a click or a loop problem and fix it without switching mental context.
Page D (time cursor MARK) ──────────→ Page S (jump to sample position)
Page D (SEND one frame) ──────────→ Page A (phasor view of that frame)
Page D (SEND one frame) ──────────→ Page H (harmonic sliders populated)
Page A (SEND) ──────────→ Page H (same path, from phasor view)
Page H (IFFT) ──────────→ Page W (resulting waveform)
Page W ──────────→ FireStorm wavetable engine
Spectral Drawing — Painting Sound Directly onto the Spectrogram
Page D's spectrogram is not read-only. The drawing tools work directly on the frequency domain canvas — painting magnitude values per bin per frame. The ISTFT engine reconstructs audio from whatever is drawn, in real time. The spectrogram becomes a two-dimensional instrument: frequency is one axis, time is the other, and brightness is amplitude.
The Drawing Tools
| Tool | Behaviour | Sonic result |
|---|---|---|
| Pencil | Sets magnitude at cursor — brush size = frequency bandwidth | Pure tone or narrow band |
| Line | Sweep between two points — straight or curved | Glissando, pitch sweep |
| Fill | Flood fill a frequency region | Noise band — bandwidth = region height |
| Spray | Random magnitude scatter in a region | Textured, granular noise |
| Eraser | Sets bins to silence | Remove unwanted content |
| Clone | Copy a time-frequency region, paste elsewhere | Repeat a motif, delay effect |
| Mirror | Reflect a region horizontally or vertically | Symmetric spectral structures |
What You Draw — What You Hear
| Drawing | Sound |
|---|---|
| Horizontal line at fixed frequency | Sustained pure tone at that pitch |
| Diagonal line rising left to right | Glissando — smooth upward pitch sweep |
| Stack of horizontal lines at harmonic intervals | Pitched timbre — more lines = richer |
| Broad soft horizontal band | Filtered noise — bandwidth = band height |
| Formant blobs at ~500Hz, ~1500Hz, ~2500Hz | Vowel "ah" — move bands to change vowel |
| Short vertical smear | Click, transient, attack |
| Bright isolated dot | Short blip or percussive hit |
| Anything | Something worth hearing |
Phase Synthesis
Drawing provides magnitudes — phase must be synthesised. Three strategies:
| Mode | Method | Character |
|---|---|---|
| Coherent (default) | Phase advances per bin by 2π × k × hop / N each frame | Clean, smooth tones — best for melodic drawing |
| Random | Independent random phase per bin per frame | Natural-sounding textures, noise, complex surfaces |
| Zero | All phases set to zero | Symmetric signal, can sound buzzy — useful for waveform design |
The NOISE toggle switches between Coherent and Random — Coherent for pitched lines, Random for textured fills, the two most common drawing scenarios.
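The Coherent mode's per-frame phase recursion follows directly from the formula in the table. A minimal sketch (doubles for clarity; the shipped DSP is fixed-point):

```cpp
#include <cmath>
#include <vector>

// Coherent phase synthesis: bin k advances by 2*pi*k*hop/N per frame,
// so a constant-magnitude drawn line reconstructs as a phase-continuous tone.
std::vector<double> coherent_phases(int N, int hop, int bin, int frames) {
    const double PI = 3.14159265358979323846;
    const double step = 2.0 * PI * bin * hop / N; // per-frame phase increment
    std::vector<double> phase(frames);
    double p = 0.0;
    for (int f = 0; f < frames; ++f) {
        phase[f] = p;
        p = std::fmod(p + step, 2.0 * PI);        // wrap to [0, 2*pi)
    }
    return phase;
}
```

Note the special case: a bin whose frequency is an exact multiple of the frame rate (e.g. bin 8 at N=4096, hop=512) advances by a full 2π per frame, so its phase is identical every frame — the continuous-tone condition.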
ISTFT Reconstruction
The reverse of the analysis pipeline — drawn magnitudes reconstructed to PCM:
Drawn spectrogram (magnitude per bin per frame)
│
│ Phase synthesis (coherent / random / zero)
│ Combine magnitude + phase → complex FFT bins
│
↓
IFFT per frame → time-domain segment (N samples)
│
│ Multiply by synthesis Hann window
│ (same window used in analysis — ensures perfect reconstruction
│ when hop and window are matched)
│
↓
Overlap-add with hop size
(successive frames summed at their offset positions)
│
↓
Continuous PCM audio stream → FireStorm output
Overlap-add with matched analysis and synthesis windows gives perfect reconstruction of whatever spectrum is on the canvas — a drawing that exactly reproduces the original magnitudes (with synthesised phase standing in for the original phases) sounds extremely close to the original, though not sample-identical. Simplified drawings produce simplified audio. The quality of the reconstruction is entirely determined by what was drawn.
Playback is live — drawing and listening happen simultaneously. As the musician paints, the audio updates within one hop period (~11.6ms). The canvas is a live instrument.
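Why matched windows reconstruct cleanly can be checked numerically: with a periodic Hann (denominator N rather than the N−1 of the analysis formula) and hop = N/4, the squared windows overlap-add to a constant 1.5 everywhere, so the output is the input times a fixed, removable gain. A toy sketch under those assumptions, skipping the FFT→IFFT round trip (an identity when nothing is edited):

```cpp
#include <cmath>
#include <vector>

// Analysis window * (FFT/IFFT identity) * synthesis window, overlap-added.
// With periodic Hann and hop = N/4, interior samples equal 1.5 * input.
std::vector<double> ola_roundtrip(const std::vector<double>& x, int N, int hop) {
    const double PI = 3.14159265358979323846;
    std::vector<double> y(x.size(), 0.0);
    for (size_t s = 0; s + N <= x.size(); s += hop)
        for (int n = 0; n < N; ++n) {
            // periodic Hann — satisfies the constant-overlap-add condition
            const double w = 0.5 * (1.0 - std::cos(2.0 * PI * n / N));
            y[s + n] += x[s + n] * w * w;
        }
    return y;
}
```

Dividing the output by that constant 1.5 recovers the input exactly on the interior — which is what "matched hop and window" buys in the pipeline above.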
Spectral Tracing — Simplifying Real Sounds
The reference layer places the original sound's spectrogram underneath the drawing canvas at reduced opacity, like tracing paper over a photograph. The musician draws on top — tracing the components they want, ignoring those they do not. Playback reconstructs only what was drawn.
┌─────────────────────────────────────────────────────────┐
│ LAYER 0 — Reference (read-only, ghost) │
│ Original sound spectrogram │
│ 50–70% opacity — visible but not dominant │
│ Plays back only when A/B mode held │
├─────────────────────────────────────────────────────────┤
│ LAYER 1 — Drawing (editable, full opacity) │
│ User-drawn spectrogram — starts empty │
│ This is what the ISTFT reconstructs and plays │
└─────────────────────────────────────────────────────────┘
Ghost opacity is controlled by jog dial 8. Hold A/B to temporarily play the reference layer for direct comparison — the ear judges the tracing quality faster than any meter.
What Tracing Reveals
The process of tracing is itself an education in acoustics. Working from coarse to fine:
Pass 1 — Fundamental only
A single horizontal line at the pitch of the note.
Plays back as a pure sine wave. How much identity survives?
Usually: recognisably the right pitch but no timbre.
Pass 2 — Harmonics
Add lines at 2×, 3×, 4× the fundamental.
Match brightness to the original. Timbre begins to emerge.
Three or four harmonics recovers much of the character.
Pass 3 — Formants (for voice and acoustic instruments)
Draw broad soft bands at the formant frequencies.
For voice — ~500Hz, ~1500Hz, ~2500Hz for "ah".
The vowel identity appears. Move the bands: the vowel changes.
Pass 4 — Transients
Short vertical smears at note attack points.
Articulation and pluck/bow character return.
Pass 5 — Noise (optional)
Spray tool for breath, rosin, room, string noise.
The difference between clinical and organic.
Your choice whether to include it.
Each pass is a separate sub-layer, independently toggleable. Mute the noise layer alone to hear the clean tone underneath. Solo the formants to hear just the vowel character. The sub-layers are a spectral mixing desk — each drawn element is a channel.
What Tracing Achieves
Manual psychoacoustic compression. You decide which frequency content the ear actually needs. Often far less than the original recording contains. MP3 does this algorithmically; here the musician decides, with their ears as the judge. The result is frequently more musical than algorithmic compression because the choices are deliberate.
Clean noise reduction. Rather than subtracting an estimated noise floor (which leaves artefacts), draw only the signal content and leave the noise unpainted. It simply does not exist in the reconstruction.
Source separation by hand. Two instruments overlapping in a recording have different harmonic series. Trace one, ignore the other. Imperfect but often effective for melodic material where the fundamental lines are distinct.
Understanding a sound's identity. The tracing process reveals which components are perceptually essential. A violin traced to just four harmonics still sounds like a violin. Remove the transient smear and it loses its bow attack. Remove the noise layer and it loses its rosin. Each element's contribution becomes viscerally clear.
Spectral Drawing as Synthesis Source
Every synthesis engine on the Ant64 accepts sources derived from a spectrogram drawing. The drawing is a universal source editor — the physicist's view of sound made directly manipulable, with all three synthesis paradigms reachable from a single canvas.
Path 1 → Additive Synthesis / Wavetable (Page H / Page W)
The drawn horizontal lines are additive synthesis components. Each line is a sine oscillator at a given frequency with a magnitude that varies over time. The SEND command extracts the harmonic content from the drawing and populates Page H directly — amplitude sliders, phase values, and per-harmonic envelopes all filled from the drawing. IFFT converts to a waveform cycle in Page W. FireStorm plays it with 128 polyphonic voices.
What tracing adds that pure synthesis cannot: the relative amplitudes and formant positions were derived from a real-world sound. The acoustic physics of the original instrument constrain and guide the synthesis. The result starts in a space that already sounds plausible rather than requiring the musician to discover it from scratch.
Path 2 → Sample Engine (PCM)
Run the ISTFT over the full time extent of the drawing — not just one cycle, the entire canvas. The result is a PCM audio buffer. Load it directly into the sample engine as a new source. It plays back with the full sample engine feature set: pitch tracking, loop points, velocity layers, multi-sample key zones.
This is the right path for sounds that cannot be reduced to a repeating cycle:
- A traced vowel transition ("ah" → "ee") — formant bands move over time
- A traced drum hit — transient smear and tail as a single gesture
- A traced environmental texture — wind, breath, room, rain
- A drawn melodic phrase — time-varying pitch and amplitude by hand
The drawn version is always cleaner than the original: noise not drawn is not present. Bleed from other instruments not traced simply does not exist in the reconstruction. The sample engine then plays this cleaned, simplified source with full polyphony.
Path 3 → FM Synthesis (Page F)
The most analytical path. When the drawing contains a harmonic series — a fundamental and its partials — the frequency ratios between the drawn lines define an FM operator configuration:
Traced harmonics: 100Hz 200Hz 300Hz 400Hz 500Hz
Ratio to fundamental: 1:1 2:1 3:1 4:1 5:1
FM operator ratios: C=1 M=2 M=3 M=4 M=5
Modulation indices: derived from relative amplitudes of each partial
The system cannot fully auto-generate FM patches from complex drawings — the magnitude-to-index relationship is not straightforwardly invertible for arbitrary spectra. But it analyses the drawing and proposes a starting patch: detected ratios, suggested modulation indices, recommended algorithm. The musician opens Page F and refines from a meaningful starting point rather than a blank operator graph.
Inharmonic sounds are where this is most powerful. A bell's partials are slightly stretched — visible in the spectrogram as non-integer spacing. The analysis tells the musician immediately that a non-integer FM ratio is needed, and approximately what it should be. Metallic textures, physical models, and acoustic percussion all reveal their FM structure in the drawing.
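The ratio-detection step can be sketched as: divide each traced partial by the fundamental and snap to the nearest integer within a tolerance — partials that refuse to snap mark the source as inharmonic and keep their exact non-integer ratio. Function name and tolerance here are illustrative, not the actual Page F analysis:

```cpp
#include <cmath>
#include <vector>

// Propose FM operator ratios from traced partial frequencies.
// A partial within 'tol' (relative) of an integer multiple snaps to it;
// otherwise the exact non-integer ratio is kept (inharmonic source).
std::vector<double> propose_fm_ratios(const std::vector<double>& partials_hz,
                                      double fundamental_hz, double tol = 0.02) {
    std::vector<double> ratios;
    for (double f : partials_hz) {
        const double r = f / fundamental_hz;
        const double snapped = std::round(r);
        ratios.push_back(std::fabs(r - snapped) <= tol * snapped ? snapped : r);
    }
    return ratios;
}
```

On the 100/200/300/400/500Hz example above this yields 1:2:3:4:5; a bell partial at 267Hz over a 100Hz fundamental stays at 2.67, flagging the non-integer FM ratio directly.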
The Unified Source Creation Pipeline
Real-world sound / live input / own output
│
▼
Page D — load as reference (ghost layer, Layer 0)
│
▼
Draw / trace on Layer 1 — simplified, essential, intentional
│
│ Play back drawing during tracing — live ISTFT
│ A/B against reference — ear judges quality
│ Sub-layers: harmonics / formants / transients / noise
│
├──── PATH 1 ──────────────────────────────────────────────┐
│ SEND to Page H (harmonic sliders populated) │
│ Edit amplitude, phase, per-harmonic envelope │
│ IFFT → Page W (waveform cycle) │
│ → FireStorm additive / wavetable engine │
│ → 128 polyphonic voices │
│ │
├──── PATH 2 ──────────────────────────────────────────────┤
│ ISTFT full time extent → PCM buffer │
│ → Sample engine (256 voices) │
│ → Loop points, velocity layers, key zones │
│ → Cleaner than original — noise not drawn = absent │
│ │
├──── PATH 3 ──────────────────────────────────────────────┤
│ Harmonic ratio analysis → FM patch proposal │
│ → Page F (operator graph, suggested ratios + indices)│
│ → Refine by ear │
│ → FM engine (128 voices) │
│ → Especially powerful for inharmonic / metallic │
│ │
└──── PATH 4 ──────────────────────────────────────────────┘
ISTFT single cycle → Page W wavetable
→ VA oscillator (wavetable mode)
→ Nonlinear ladder filter + SVF
→ BBD chorus · VCA envelope
→ 128 polyphonic voices
→ Drawing is raw material — filter/envelope/chorus
reshape it dynamically on every note
Path 4 → Analog / VA Engine (Wavetable Oscillator Mode)
The VA engine accepts any wavetable as its oscillator source — including one derived from a spectrogram drawing. This is distinct from the other three paths because the drawing is not the final sound: it is the input to an ongoing analog signal chain. The filter, envelope, chorus, and saturation all act on it dynamically, every note, in real time.
Drawn / traced wavetable (from Page W via IFFT, or direct from Page D)
│
▼
VA oscillator — wavetable mode
(cycles through the waveform at the played pitch, with BLEP anti-aliasing)
│
▼
Nonlinear ladder filter
← cutoff envelope (ADSR) · resonance · keyboard tracking · velocity mod
← self-oscillation available at high resonance
│
▼
SVF — parallel or series with ladder
← independent cutoff · character: LP / BP / HP / notch
│
▼
BBD chorus
← rate · depth · stereo spread · bucket-brigade clock rate
│
▼
VCA + amplifier envelope
│
▼
128 polyphonic voices
The filter does not know or care where the waveform came from. It processes whatever spectral content the wavetable contains with full analog nonlinearity — the tanh saturation in the ladder stages, the self-oscillation at high resonance, the phase relationships of the BBD delay line. These interactions are not predictable from the drawing alone; they emerge from the physics of the emulated circuit.
What this produces that other paths cannot:
A drawn wavetable through the VA chain is dynamically reshaped with every note. The filter sweeps, envelopes open and close, the chorus introduces time-varying modulation. The source is fixed; the sound is alive. The other three paths (additive, sample, FM) play back the drawn content more or less directly. Path 4 treats it as raw material.
Example combinations:
| Drawn source | VA treatment | Result |
|---|---|---|
| Traced violin harmonics | Moog ladder filter sweep | Violin-Moog hybrid — the filter strips partials as the cutoff closes, just as it would with a saw wave, but with violin harmonic spacing |
| Drawn vowel "ah" formants | Ladder resonance at formant frequency + slow filter LFO | Talking filter — vowel character modulates with the resonance peak sweeping through it |
| Traced bell (inharmonic partials) | BBD chorus at slow rate | Metallic shimmer — chorus detune beats against the inharmonic partials, impossible to program conventionally |
| 303 acid engine + traced bass clarinet wavetable | Diode ladder + accent RC circuit | Acid character acting on clarinet harmonics — the resonant squeal at frequencies the original 303 never had |
| M-86 hoover envelope shape + traced choir vowel | Detuned oscillators + filter snap | Hoover-vowel hybrid — the hoover envelope on vowel spectral content |
| Prime-numbered harmonics only (drawn) | Self-oscillating ladder + slow LFO | Unpredictable beating between the sparse harmonic series and the filter's own resonant frequency |
Wavetable scanning:
If the drawing spans multiple frames in time, the VA engine can scan through them — the oscillator advances through the wavetable as a function of note length, velocity, an LFO, or an envelope. A drawing that transitions from "ah" to "ee" over 2 seconds becomes a vowel-morphing oscillator source, with the filter adding its own movement on top. This is wavetable synthesis in the classical Waldorf/PPG sense, with the wavetable itself authored from a spectrogram drawing rather than pre-baked ROM.
Updating the unified pipeline:
Path 4 sits alongside Path 1 — both use Page W as the wavetable source. The difference is the destination: Path 1 feeds directly to the FireStorm additive engine; Path 4 routes through the full VA signal chain first.
Draw / trace on Page D
│
└──── PATH 4 ──────────────────────────────────────────────┐
ISTFT single cycle → Page W wavetable │
→ VA oscillator (wavetable mode) │
→ Nonlinear ladder filter + SVF │
→ BBD chorus │
→ VCA envelope │
→ 128 polyphonic voices │
→ Dynamic — filter, envelope, chorus reshape the │
source on every note, in real time │
The key distinction across all four paths:
| Path | Engine | Source is... | Sound is... |
|---|---|---|---|
| 1 — Additive / WT | FireStorm wavetable | The final timbre | Static per cycle, dynamic via modulation matrix |
| 2 — Sample | PCM playback | The complete sound, full duration | As drawn — cleaned, simplified original |
| 3 — FM | Operator synthesis | A starting patch approximation | Dynamically generated from operator interaction |
| 4 — VA | Oscillator → analog chain | Raw spectral material | Continuously reshaped by filter, envelope, chorus |
Sounds That Have Never Existed
Because the drawing is unconstrained by physical acoustics, the musician can create sources impossible in the real world:
- The formant structure of a human vowel with the harmonic spacing of a bell
- The transient of a snare drum with the sustained body of a cello
- A voice that speaks in perfect harmonic series (every partial exactly integer)
- A piano with no inharmonicity — perfectly locked partials
- A sound with energy only at prime-numbered harmonics
- A choir vowel that morphs continuously from "ah" to "ee" to "oo" over 4 seconds
These are not available from any physical instrument, any sample library, or any conventional synthesis workflow. They exist only in the frequency domain, and the spectrogram canvas is the place to create them.
Gossip Network — Sharing Drawn Sources
Sources created by spectral drawing are compact:
| Source type | Typical size |
|---|---|
| Page H harmonic patch (64 harmonics, full envelopes) | ~4KB |
| Page W wavetable (256 samples, 16-bit) | 512 bytes |
| FM patch (8 operators, full parameters) | ~1KB |
| VA patch (wavetable + filter + envelope + chorus params) | ~2KB |
| PCM sample (4 seconds, 44.1kHz, 16-bit mono) | ~353KB |
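The PCM row is straightforward arithmetic — a one-line sketch for confirmation:

```cpp
#include <cstdint>

// Bytes for a mono 16-bit PCM buffer: seconds * sample rate * 2 bytes/sample.
constexpr int64_t pcm_bytes_mono16(double seconds, int64_t rate_hz) {
    return static_cast<int64_t>(seconds * static_cast<double>(rate_hz)) * 2;
}
```

4 seconds at 44.1kHz mono 16-bit is 352,800 bytes — the ~353KB in the table.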
All are shareable over the Ant64 gossip network to any other Ant64 on the same mesh. A musician on one machine traces a sound, creates a patch, shares it — another musician receives it, modifies the drawing, sends a variation back. Spectral tracing becomes a collaborative instrument design workflow across machines.
The Full Analysis-to-Synthesis Pipeline
┌──────────────────────────────────────────────────────────────────────┐
│ INPUT SOURCES │
│ Live ADC mic/line · Loaded sample · Resample-own-output │
│ Import image as sound (PNG pixel brightness → FFT magnitudes) │
└──────────────────────────┬───────────────────────────────────────────┘
│
┌───────────────▼─────────────────┐
│ FFT / STFT ENGINE │
│ SG2000 big core · C906 @ 1GHz │
│ Cooley-Tukey · 4096pt │
│ Hann / BH / Flat-top / Rect │
│ 512-sample hop · 87.5% overlap│
└───┬─────────────────────┬───────┘
│ │
┌────────────▼──────┐ ┌───────────▼───────────────────────────────┐
│ PAGE A │ │ PAGE D │
│ Phasor view │ │ Mode 1: Spectrogram (default) │
│ Single frame │ │ Mode 2: 3D Waterfall (Fairlight homage) │
│ Phase-aware │ │ Mode 3: Live Spectrum (bar graph) │
│ Rotating circles │ │ Mode 4: Phasor (links Page A) │
│ │ │ │
│ │ │ DRAWING TOOLS — canvas is live: │
│ │ │ Layer 0: reference ghost (original) │
│ │ │ Layer 1: drawing (plays back via ISTFT) │
│ │ │ Sub-layers: harmonics / formants / │
│ │ │ transients / noise │
└────────┬──────────┘ └──────────┬──────────────┬───────────┬────┬──────────┘
│ SEND │ SEND │ ISTFT │ FM │ ISTFT
│ (1 frame) │ to H+A │ full time │ ↓ │ 1 cycle
└──────────┬─────────────┘ │ │ │
│ │ │ │
┌────────────▼────────────┐ │ │ │
│ PAGE H │ │ │ │
│ Harmonic Editor │ │ │ │
│ Amplitude · Phase │ │ │ │
│ Per-harmonic envelopes │ │ │ │
└────────────┬────────────┘ │ │ │
│ IFFT │ │ │
┌────────────▼────────────┐ │ │ │
│ PAGE W ├───────────────┼───────────┼────┘
│ Waveform Drawing │ │ │
│ Fine sculpt post-IFFT │ │ │
└──────┬─────────────┬────┘ │ │
│ │ │ │
│ (additive) │ (VA wavetable) │ │
┌──────▼──────┐ ┌───▼─────────────────┐ │ │
│ FireStorm │ │ VA ENGINE │ │ │
│ Additive / │ │ Wavetable osc │ │ │
│ Wavetable │ │ Ladder filter │ │ │
│ 128 voices │ │ SVF · BBD chorus │ │ │
└─────────────┘ │ VCA envelope │ │ │
│ 128 voices │ │ │
└─────────────────────┘ │ │
┌─────────▼─────┐ ┌─▼──────────┐
│ Sample Engine│ │ PAGE F │
│ PCM buffer │ │ FM Graph │
│ 256 voices │ │ 128 voices│
│ Loop · Vel. │ │ Inharmon- │
└───────────────┘ │ ic ready │
└────────────┘
This is a complete draw → play → refine → synthesise loop. A musician can take any sound from the real world, trace its essential components, discard what is not needed, and export the result to any of the four synthesis engines on the Ant64 — all without leaving the workstation app. Or they can draw from nothing, creating sounds constrained only by imagination rather than physical acoustics. The VA path in particular treats the drawing as raw material rather than a finished sound — handing it to the filter, chorus, and envelope to reshape dynamically on every note.
Application Architecture
FireStorm is the ImGui rendering backend for the workstation app.
The SG2000 big core builds the Dear ImGui draw list each frame — vertices, indices, and draw commands — and DMAs it to FireStorm, which rasterises it into the framebuffer in hardware. The CPU does zero pixel work. FireStorm's audio DSP and 2D rasteriser run on orthogonal FPGA resources (SRAM A for audio, SRAM B for rasteriser) and never contend. Full rendering architecture, rasteriser pipeline, and 3D two-pass LOD system are documented in the Display Architecture reference.
Why this matters for the music app: The big core runs pure C++ application logic — patch management, sequencer state, UI event handling, DBFS I/O — with no OS overhead. The full 1GHz C906 is available for application work. FireStorm handles all rendering in parallel with audio DSP.
FireStorm Internal Hard RISC-V Core
The GoWin 138k FPGA (GW5AST), used in the Ant64 and Ant64C only, contains a built-in hard RISC-V processor core — silicon on the die, not synthesised from LUTs. The GoWin 60k in the Ant64S does not have this core. On the Ant64/Ant64C it is used for internal FireStorm hardware debugging.
What "hard" means: A soft RISC-V core implemented in LUTs would consume thousands of LUT4 resources — fabric that would otherwise be available for audio DSP voices or rasteriser logic. The hard core costs zero LUTs. It is free silicon, already there.
What it does on the Ant64:
The hard RISC-V runs independently of all other FireStorm activity, monitoring internal state and providing a debug window into the FPGA's live operation:
- Voice status monitoring — which FireStorm voices are active, envelope states, DSP pipeline occupancy
- Performance counters — audio DSP cycle budgets, rasteriser triangle throughput, memory bus utilisation on SRAM A, SRAM B, DDR3
- Error detection — underruns, overflows, FIFO stalls, QSPI protocol errors
- Register inspection — read/write any FireStorm register from inside the FPGA without going via the external QSPI bus
- Debug stream output — sends status packets to DeMon over a dedicated internal channel; DeMon forwards them to the SG2000 via hi-speed UART for display in AntOS
- Assertion checking — configurable internal assertions that halt or flag specific voice states, timing violations, or data integrity issues during development
Why this matters for development:
Debugging an FPGA design traditionally means either adding logic analyser probes externally (slow, invasive) or instantiating an internal debug core like Xilinx's ILA (consumes LUTs). The hard RISC-V gives a third option: a resident debug processor that has full visibility into FireStorm's internals at silicon speed, consuming no programmable fabric, and communicating with the outside world through DeMon's existing JTAG and UART infrastructure.
It also has a role in production use — monitoring audio DSP health in real time and surfacing meaningful diagnostics to AntOS rather than opaque hardware failures.
Future FPGA: GoWin 7 Series (11nm)
By the time the Ant64 reaches production release, the GoWin 7 series at 11nm may be available. If so, the implications are significant:
What the move from 22nm to 11nm means in practice:
- Roughly 2× the logic density for the same die area — a 138k-equivalent chip could shrink to half the size, or a same-size chip could offer ~250–300k LUTs
- Lower power consumption at equivalent clock speeds — important for a device with internal speakers and a passive or modest cooling solution
- Higher maximum clock speeds — the audio DSP pipeline and rasteriser could run faster, giving more margin for complex voice algorithms
- Potentially lower cost — smaller die at advanced node typically reduces per-chip cost once yields mature
What this means for the Ant64 architecture:
The FireStorm voice engine is designed to be node-agnostic — the fixed-point DSP pipeline, the QSPI/MIPI interfaces, the memory controllers, are all defined functionally. Porting to a GoWin 7 series FPGA would be a re-synthesis and re-timing exercise, not a redesign. The same HDL runs on both.
If a GoWin 7 device with 200k+ LUTs at $26 or less becomes available, the Ant64 could realistically push beyond 128 voices without any architectural changes — simply more resources available for time-multiplexed DSP pipeline slots.
Current position: The design is proceeding on GoWin 138k (22nm, $26/chip confirmed). GoWin 7 series availability will be evaluated at the time. The architecture is designed to benefit from it without depending on it.
AntOS Debug Server / Client — Remote Ant64 Debugging
AntOS running on the little core (700MHz C906) of the SG2000/SG2002 includes a built-in debug server and client. This means one Ant64 can debug another Ant64 over the network — no external probe, no USB cable, no debug pod required.
Ant64 A Ant64 B
───────────────── ──────────────────
AntOS little core AntOS little core
dbg client WiFi dbg server
(developer machine) ◄──────────► (target machine)
│ │
AntOS shell exposes:
- inspect memory - SG2000 memory
- read registers - DeMon registers
- set breakpoints - FireStorm state
- stream logs - Pulse state
- upload code - AntOS internals
- control execution - log streams
What the debug server exposes on the target Ant64:
- SG2000 memory inspection — read/write any address in the big core's address space, including FireStorm register map, shared DRAM regions, ImGui draw list buffers, voice parameter tables
- AntOS internals — scripting VM state, DBFS contents, gossip subsystem, active scripts and their stack traces
- DeMon passthrough — the dbg server can forward commands to DeMon, which in turn uses its JTAG to access FireStorm and the SG2000 debug ports. Full chip debug from a remote AntOS shell.
- FireStorm state (via DeMon JTAG) — voice registers, DSP pipeline state, hard RISC-V debug stream (Ant64/Ant64C)
- Pulse state (via DeMon SPI) — sequencer patterns, MIDI state, jog dial positions, LED buffer
- Live log streaming — all subsystem logs forwarded in real time to the client's AntOS terminal
Transport:
The debug server uses a standard TCP/IP socket connection — not gossip. Gossip is the right tool for P2P discovery and broadcast (patch sharing, presence, chat), but for a debug session you want a direct TCP connection: reliable, ordered, low latency, compatible with standard tooling, and no relay overhead.
- Connect by IP address or hostname over WiFi (ESP-C5 on DeMon) or Ethernet (Ant64C)
- Standard BSD socket API on both client and server — straightforward to implement in AntOS's network layer
- Gossip can be used for peer discovery — finding which Ant64s on the network have the debug server active — but the actual debug session runs over TCP directly
- Ant64C's Ethernet port gives a dedicated wired channel independent of WiFi, useful for high-bandwidth log streaming or memory dump operations
Security:
The debug server requires explicit activation in AntOS — it is off by default. Once enabled, it can be locked to specific peer addresses (by Ant64 identity from the gossip subsystem) so only a trusted machine can connect.
Practical workflow:
# On target Ant64B — enable debug server
> dbg server start
# On developer Ant64A — connect and inspect
> dbg connect ant64b.local
[connected to Ant64B]
dbg> memory read 0x08000000 64 # read FireStorm voice registers
dbg> log stream firestorm # live FireStorm audio DSP log
dbg> script traceback # AntOS script stack on target
dbg> demon jtag firestorm regs # dump FireStorm registers via DeMon
Why this matters:
A developer with two Ant64s can sit at one machine running the music app and debug the other machine's AntOS scripts, FireStorm voice state, or Pulse sequencer live — without touching the target device, without interrupting its audio output, and without any external hardware. This is a significantly better development experience than most embedded platforms offer even with dedicated debug hardware.
It also means the Ant64 development community can help each other debug remotely — with permission, a community member can connect to another's Ant64 to help diagnose a FireStorm audio issue or an AntOS script problem.
ImGui draw list format — what the big core sends to FireStorm each frame:
// ImDrawData structure (simplified):
ImDrawList {
ImVector<ImDrawCmd> CmdBuffer; // draw commands
ImVector<ImDrawIdx> IdxBuffer; // index buffer (uint16)
ImVector<ImDrawVert> VtxBuffer; // vertex buffer
}
struct ImDrawVert {
float x, y; // position
float u, v; // texture coords (for font atlas)
uint32_t col; // RGBA colour
};
struct ImDrawCmd {
uint32_t ElemCount; // number of indices for this draw call
ImTextureID TextureId; // font atlas or null
ImVec4 ClipRect; // scissor rectangle
};
Every ImGui primitive — windows, buttons, waveform curves, piano roll notes, FM routing lines — is ultimately triangles in this format. FireStorm receives the packed vertex/index buffers via DMA and rasterises them in hardware. The full rasteriser pipeline is documented in the Display Architecture reference.
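A host-side sketch of how a consumer walks the command list, using stand-in types (std::vector in place of ImVector, simplified field names) rather than the real ImGui headers. Each command consumes ElemCount indices from the shared index buffer, three per triangle, starting where the previous command left off:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct DrawVert { float x, y, u, v; uint32_t col; };
struct DrawCmd  { uint32_t elem_count; uint32_t texture_id; float clip[4]; };

struct DrawList {
    std::vector<DrawCmd>  cmds;
    std::vector<uint16_t> indices;   // 16-bit indices, as in ImDrawIdx
    std::vector<DrawVert> verts;
};

// Walk the command buffer the way the rasteriser would: each command
// covers elem_count indices (elem_count / 3 triangles) of the shared
// index buffer, in order.
size_t count_triangles(const DrawList& dl) {
    size_t idx = 0, tris = 0;
    for (const DrawCmd& cmd : dl.cmds) {
        // (here the hardware would set the scissor to cmd.clip and
        //  bind cmd.texture_id before rasterising the span)
        tris += cmd.elem_count / 3;
        idx  += cmd.elem_count;
    }
    return idx <= dl.indices.size() ? tris : 0;   // reject malformed lists
}
```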
3D Rendering Architecture
3D world rendering on FireStorm uses a two-pass technique — far geometry rendered back-to-front with painter's algorithm (no z-buffer), near geometry with a 15/16-bit z-buffer covering only the near zone [0, D]. This solves the perspective z-buffer precision problem and allows LOD chunk baking for distant geometry. Full documentation including z-buffer format analysis, LOD chunk pipeline, vertex formats, and the complete frame render sequence is in the Display Architecture reference. See also: James Lambert's N64 world renderer series for the underlying technique.
FireStorm has three independent memory buses, each dedicated to a specific workload. No bus is shared between audio DSP and the rasterizer — they run in parallel with zero contention.
The Ant64 ships in three hardware tiers with different memory architectures:
| Tier | Bus A | Bus B | Bus C | Contention |
|---|---|---|---|---|
| Premium | 36-bit SRAM | 36-bit SRAM | 32-bit DRAM | None — fully dedicated |
| Standard | 32-bit SRAM | 32-bit SRAM | 32-bit DRAM | None — fully dedicated |
| Starter | Embedded PSRAM (shared) | — | — | Arbitrated shared bus |
Premium and Standard share the same three-bus no-contention architecture — audio DSP and rasterizer each get a fully dedicated SRAM bus. The Starter has a single 8MB PSRAM embedded in the FPGA, shared between all workloads, requiring bus arbitration but remaining fully capable.
Ant64 / Ant64C Ant64S
───────────── ──────
FireStorm (GoWin 138k) FireStorm (GoWin 60k)
│ │
┌──────┼──────┐ ┌─────┴──────┐
│ │ │ │ 8MB PSRAM │
┌───▼──┐┌──▼──┐┌──▼──┐ │ (embedded │
│DDR3 ││SRAM ││SRAM │ │ in FPGA) │
│32-bit││ A ││ B │ └────────────┘
│ 1/2GB││36bit││36bit│ All audio+video
│ 2GB ││fast ││fast │ shared on single
└──────┘└─────┘└─────┘ PSRAM bus
samples audio raster
DSP izer
DRAM Bus — 32-bit, up to 2GB
Standard DDR SDRAM interface. Large capacity at moderate speed — suitable for data that is large but not accessed at audio sample rate or pixel clock rate.
Contents:
- Long audio samples (piano multisamples, orchestra, full recordings)
- Wavetable banks (large collections, hundreds of tables)
- DX7 / FM patch banks, sysex archives
- DBFS working buffers (patch I/O, sample import/export)
- MIPI transfer staging buffers (draw lists in transit, audio stream capture)
- Operating scratch space for the music workstation app
- Video frames for the VJ tool (Mode 4 clip storage)
At 2GB this is effectively unlimited for any musical purpose. A full multisample piano library (all keys, 8 velocity layers) might be 500MB. A complete DX7 library is a few MB. There is headroom for years of content.
SRAM Bus A — Audio DSP Dedicated
All bits are data. No parity on either tier.
On the Premium Ant64, standard SRAM chips come physically 36 bits wide (32 data + 4 parity). All 36 bits are used as data — the 4 parity bits become 4 extra data bits. On the Standard Ant64, 32-bit SRAM chips are used directly.
Audio precision by tier:
Premium — 36-bit SRAM:
- S4.31 format: ±16 full scale, 31 bits fractional
- ~186dB dynamic range internally
- 4 integer bits give 24dB (16×) of headroom above full scale — with modest per-voice gain scaling, the mix bus cannot clip internally under any musical input when summing 128 voices
- FM phase accumulators: sub-cent accuracy across the full keyboard range
Standard — 32-bit SRAM:
- S1.31 format: ±1 full scale, 31 bits fractional
- Same ~186dB dynamic range — fractional precision is identical
- 1 integer bit: mix bus accumulates with careful per-voice gain staging to prevent overflow; still far beyond any commercial synthesizer's internal precision
- In practice indistinguishable from the Premium tier in audio output quality
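A host-side model of the fixed-point arithmetic above may make the formats concrete. This is not the FPGA implementation; it leans on the GCC/Clang __int128 extension for the wide intermediate product:

```cpp
#include <cstdint>

// Q-format with 31 fractional bits, held in an int64_t.
// S1.31 values occupy 33 bits (±1 full scale); S4.31 values occupy
// 36 bits (±16 full scale). The same code handles both; only the
// legal value range differs.
constexpr int     FRAC_BITS = 31;
constexpr int64_t Q_ONE     = 1LL << FRAC_BITS;   // 1.0

// Fixed-point multiply: wide intermediate product, then discard the
// extra 31 fractional bits. __int128 is a GCC/Clang extension.
int64_t qmul(int64_t a, int64_t b) {
    return (int64_t)(((__int128)a * b) >> FRAC_BITS);
}

// Saturating mix-bus accumulate for the S1.31 (Standard tier) case:
// clamp to ±1 full scale instead of wrapping around.
int64_t s131_sat_add(int64_t acc, int64_t x) {
    int64_t s = acc + x;
    if (s >  Q_ONE - 1) return  Q_ONE - 1;
    if (s < -Q_ONE)     return -Q_ONE;
    return s;
}
```

Here qmul(Q_ONE/2, Q_ONE/2) gives Q_ONE/4 (0.5 × 0.5 = 0.25), and the saturating add models the careful gain staging the Standard tier's single integer bit requires.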
Contents of SRAM A:
- Active voice sample data (current notes playing — hot samples)
- BBD chorus delay line (2048 × 36-bit samples)
- Reverb FDN delay lines (8 lines × up to 4096 samples each)
- Physical modeling waveguide delay lines
- Granular grain buffer (current live granular window)
- FM sine lookup table (1024 × 36-bit, single-cycle access)
- Filter tanh lookup table (4096 × 36-bit, nonlinear ladder)
- BLEP correction table (NCO anti-aliasing)
- Voice envelope/LFO state registers (128 voices × ~32 values)
Every memory access in the audio DSP hot path is a single-cycle SRAM A read or write. No DRAM latency, no bus contention with the rasterizer.
SRAM Bus B — Rasterizer Dedicated
All bits are data. No parity on either tier.
The rasterizer has its own completely independent SRAM bus. The audio DSP never touches this bus and the rasterizer never touches SRAM A.
Framebuffer pixel format by tier:
Premium — 36-bit SRAM → RGB12 (3 pixels per word):
Word[35:24] = Pixel N+0 (R[11:8] G[11:8] B[11:8] — top 4 bits of each 12-bit channel)
Word[23:12] = Pixel N+1 (12-bit RGB)
Word[11: 0] = Pixel N+2 (12-bit RGB)
4096 levels per channel → 68 billion colours internally. HDMI output dithers to 8-bit per channel. Zero banding in smooth gradients. Jog dial colour sweeps and waveform backgrounds look flawless.
Standard — 32-bit SRAM → RGB888 (1 pixel per word, 8 bits spare):
Word[31:24] = spare (alpha, stencil, or flags)
Word[23:16] = Red (8-bit, 256 levels)
Word[15: 8] = Green (8-bit, 256 levels)
Word[ 7: 0] = Blue (8-bit, 256 levels)
Standard 24-bit colour — identical to a PC monitor's native format. The 8 spare bits per pixel can carry per-pixel alpha for UI compositing. Visually excellent; only noticeable vs RGB12 in the subtlest gradients.
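A sketch of the Premium 3-pixels-per-word packing, modelling the 36-bit word in the low bits of a uint64_t (illustrative only; the real framebuffer writes happen in FPGA fabric):

```cpp
#include <cstdint>

// Premium framebuffer: three 12-bit pixel fields per 36-bit word,
// per the Word[35:24] / Word[23:12] / Word[11:0] layout above.
// The 36-bit word is modelled in the low 36 bits of a uint64_t.
uint64_t pack3(uint16_t p0, uint16_t p1, uint16_t p2) {
    return ((uint64_t)(p0 & 0xFFF) << 24) |
           ((uint64_t)(p1 & 0xFFF) << 12) |
            (uint64_t)(p2 & 0xFFF);
}

// Extract pixel n (n = 0, 1, 2) from a packed word.
uint16_t unpack(uint64_t word, int n) {
    return (uint16_t)((word >> (24 - 12 * n)) & 0xFFF);
}
```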
Framebuffer sizes — double-buffered:
| Resolution | Premium RGB12 | Standard RGB888 |
|---|---|---|
| 640×480 | 900 KB | 2.4 MB |
| 1280×720 (720p) | 2.7 MB | 7.4 MB |
| 1920×1080 (1080p) | 6.0 MB | 16.6 MB |
720p double-buffered fits comfortably in modest fast SRAM on both tiers. 1080p is achievable on both tiers with sufficient SRAM capacity.
Contents of SRAM B (both tiers):
- Framebuffer A (active display) and Framebuffer B (being rendered)
- ImGui draw list buffer A and B (double-buffered)
- Font atlas texture (BRAM preferred; SRAM B fallback)
The rasterizer reads the draw list, writes pixels to the framebuffer, and the HDMI timing generator reads the framebuffer — all on SRAM B, all simultaneously, all without touching SRAM A.
Why Three Buses Matter
The critical insight is that audio and video are completely memory-isolated on both tiers:
| | Audio DSP | Rasterizer |
|---|---|---|
| Memory bus | SRAM A (36-bit premium / 32-bit std) | SRAM B (36-bit premium / 32-bit std) |
| Access pattern | Streaming + random (samples + LUT) | Sequential (framebuffer write) |
| Clock domain | Audio clock (derived from 48kHz) | Pixel clock (~74MHz for 720p) |
| Contention | None | None |
No arbitration logic. No bus priority. No stall cycles. Both workloads run at full SRAM bandwidth simultaneously, every clock cycle, on both tiers.
DRAM holds cold data. Hot data for each workload lives in its dedicated SRAM. The result is a memory hierarchy perfectly matched to a real-time audio-visual synthesizer — and it holds whether you're building a Premium or Standard Ant64.
Tier summary:
| Feature | Ant64C (Creative) | Ant64 (Power) | Ant64S (Starter) |
|---|---|---|---|
| FPGA | GoWin 138k | GoWin 138k | GoWin 60k |
| CPU | SG2000, 512MB DDR3 | SG2000, 512MB DDR3 | SG2002, 256MB DDR3 |
| FireStorm cores | 4 | 4 | 2 |
| FireStorm instruction width | 36-bit | 36-bit | 32-bit |
| FireStorm registers | 128 (2×D0-D31 + 2×A0-A31) | 128 | 64 (2×D0-D15 + 2×A0-A15) |
| FPGA memory | 2GB | 1GB | 8MB PSRAM (embedded) |
| Bus A | 36-bit SRAM | 36-bit SRAM | PSRAM (shared) |
| Bus B | 36-bit SRAM | 36-bit SRAM | — |
| Bus C | 32-bit DDR3 | 32-bit DDR3 | — |
| Audio precision | S4.31 (36-bit) | S4.31 (36-bit) | 32-bit (managed) |
| Framebuffer format | RGB12 (3px/word) | RGB12 (3px/word) | 32-bit (RGB888) |
| Bus contention | None | None | Shared/arbitrated |
| Supervisors | Pulse + DeMon | Pulse + DeMon | 1× RP2350 |
| DIN MIDI In/Out/Thru | ✔ | — | — |
| USB MIDI | ✔ | ✔ | ✔ |
| Optical digital audio | ✔ (FireStorm) | — | — |
| Ethernet | ✔ | — | — |
| WiFi | 2.4 + 5GHz | 2.4 + 5GHz | 2.4GHz |
| Display outputs | HDMI / VGA / DisplayPort | HDMI / VGA | HDMI / VGA |
Ant64S (Starter) Memory Architecture — Embedded PSRAM
The Ant64S uses a GoWin 60k FPGA with 8MB of PSRAM embedded inside the FPGA itself. There are no external memory buses on the FPGA side. Audio DSP and rasterizer share this single embedded PSRAM — bus arbitration is required, but the smaller FPGA and 8MB constraint already define the workload scope.
Embedded PSRAM characteristics:
- Integrated inside the GoWin 60k FPGA package — no external memory chips
- 8MB total — adequate for the Ant64S's audio and graphics scope
- Single shared bus: audio DSP and rasterizer time-share access
- Audio DSP gets absolute priority — rasterizer stalls rather than audio
What 8MB supports on Ant64S:
- Active voice sample data for currently playing notes
- Framebuffer (640×480 × 32-bit = 1.2MB per buffer, 2.4MB double-buffered)
- ImGui draw list buffer (~256KB)
- BBD chorus and reverb delay lines
- FM sine LUT, filter tables
- Patch data for current instrument
Long samples and large wavetable banks are loaded from the SG2002's DDR3 (256MB) into the FPGA's PSRAM as needed, managed by the FireStorm subsystem.
FireStorm instruction width on Ant64S: 32-bit. On Ant64 and Ant64C, FireStorm instructions are 36-bit — which is why those models have 36-bit SRAM buses. On the Ant64S, 32-bit instructions mean 32-bit memory buses are sufficient and no parity bits are repurposed.
FireStorm cores on Ant64S: 2 (vs 4 on Ant64/Ant64C). Registers: 64 total (2× D0–D15 data, 2× A0–A15 address).
Ant64S is not a cut-down synthesiser. The audio engine, synthesis paradigms, and voice architecture are the same. The constraints are memory capacity and FireStorm core count — not synthesis capability. It is best thought of as the same instrument in a smaller, lower-cost FPGA with less RAM, able to run all but the most demanding configurations.
Font atlas — stored in FPGA BRAM (on-chip) as a pre-rendered 1bpp texture. FireStorm samples it in a single clock cycle without touching SRAM B at all.
Performance envelope (rasterizer): At 720p/60fps, FireStorm has ~16.7ms per frame. A complex ImGui music editor frame with 15,000 triangles × 50 cycles @ 100MHz = 7.5ms rasterization. The 1.35MB draw list + framebuffer transfers on SRAM B take ~1ms at fast SRAM speeds. Total: ~8.5ms, leaving 8ms headroom. 1080p/60fps is achievable with a faster FPGA clock.
CPU ↔ FireStorm Interconnect: Two Buses, Two Roles
The SG2002 big core communicates with FireStorm over two complementary buses — a low-latency control plane and a high-bandwidth data plane.
Bus 1 — QSPI in NOR Addressing Mode (control plane)
FireStorm presents itself to the CPU as a NOR flash device over QSPI. The SG2002's existing QSPI controller talks directly to FireStorm using the standard NOR command set — no custom driver needed. From C++ the FPGA address space is just a pointer:
volatile uint32_t* fpga = (volatile uint32_t*)FPGA_QSPI_BASE;
// Write a voice parameter — a single memory write, ~100ns:
fpga[VOICE_REG(voice, PARAM_CUTOFF)] = cutoff_value;
// Signal new draw list ready:
fpga[DRAWLIST_PTR] = (uint32_t)drawlist_addr;
fpga[DRAWLIST_LEN] = (uint32_t)drawlist_bytes;
fpga[DRAWLIST_CMD] = RENDER_GO;
// Read DSP status:
uint32_t env = fpga[DSP_STATUS(voice, ENV1_LEVEL)];
QSPI NOR address map (FireStorm internal):
FPGA_BASE + 0x00000 Voice register bank (128 voices × 64 regs = 32KB)
FPGA_BASE + 0x08000 Global synth registers (tempo, FX, routing = 4KB)
FPGA_BASE + 0x09000 DSP status readback, read-only (4KB)
FPGA_BASE + 0x0A000 ImGui draw list buffer A (256KB)
FPGA_BASE + 0x4A000 ImGui draw list buffer B (256KB, double-buffered)
FPGA_BASE + 0x8A000 Rasterizer control registers (256 bytes)
QSPI bandwidth: ~40 MB/s. Adequate for register writes and small transfers. A single voice parameter write is ~100ns. Real-time jog dial events at 100/sec consume ~10µs/sec of bus time — negligible.
QSPI is the control plane — low latency, always-on, zero overhead per write.
Bus 2 — Dual 4-Lane MIPI D-PHY (data plane, future)
A dual 4-lane + clock MIPI D-PHY link between the SG2002 and FireStorm, running simultaneously in both directions. The SG2002's existing MIPI CSI/DSI PHYs are repurposed as a general-purpose high-speed serial link.
SG2002 FireStorm
MIPI TX (4 lane + clk) ─────────► MIPI D-PHY RX ~500 MB/s CPU→FPGA
MIPI RX (4 lane + clk) ◄───────── MIPI D-PHY TX ~500 MB/s FPGA→CPU
Full duplex ~1 GB/s aggregate — 12× QSPI in each direction simultaneously.
CPU → FPGA (MIPI link A, ~500 MB/s):
| Transfer | Size | Time |
|---|---|---|
| ImGui draw list (typical) | 150 KB | 0.3 ms |
| ImGui draw list (complex) | 300 KB | 0.6 ms |
| Full 1080p framebuffer | 4 MB | 8 ms |
| Wavetable bank (128 tables) | 2 MB | 4 ms |
| 1s stereo 48kHz sample | 192 KB | 0.4 ms |
| DX7 sysex patch bank | 4 KB | instant |
Large sample banks load mid-performance in milliseconds. 1080p UI becomes viable — the 4MB framebuffer transfers in 8ms, leaving 8ms for rasterization within a 16.7ms frame budget.
FPGA → CPU (MIPI link B, ~500 MB/s):
| Stream | Purpose |
|---|---|
| Mixed stereo audio (384 KB/s) | Lossless resample-own-output — no ADC, no analog roundtrip |
| Raw waveform, all 128 voices | Oscilloscope / FFT display at full resolution |
| DSP state bulk (4 KB/frame) | All envelope/LFO values for rich UI animation |
| ADC input stream | Live sample capture at full rate |
| Framebuffer readback | Screenshot, video stream, network output |
The audio stream back to CPU is the key feature here — FireStorm streams the mixed stereo output to the CPU at 384 KB/s (trivial on a 500 MB/s link). The resample function captures this directly to DRAM with zero quality loss: no DAC, no ADC, no analog stage, no quantisation error. It is a perfect digital copy.
Two buses, two roles — summary:
| | QSPI | MIPI |
|---|---|---|
| Bandwidth | ~40 MB/s | ~500 MB/s each direction |
| Duplex | Half | Full simultaneous |
| Purpose | Control plane | Data plane |
| Use | Register writes, render trigger | Draw lists, samples, audio stream |
| Latency | ~100ns per write | Buffer latency (ms range) |
| Status | Current | Future |
Pulse ↔ FireStorm interconnect:
Pulse (RP2350) connects to FireStorm via two independent interfaces:
- QSPI bus — control plane, same NOR addressing mode as the SG2002 big core. Pulse reads/writes FireStorm registers directly: voice parameters, audio engine control, sequencer state. Low latency, memory-mapped from Pulse's perspective.
- 4-line PIO fast transfer bus — four GPIO lines running custom RP2350 PIO state machines, forming a dedicated high-bandwidth data path between Pulse and FireStorm entirely separate from QSPI.
At RP2350 PIO speeds with 4 parallel lines, this bus can sustain well over 100 MB/s with a tight custom protocol — far beyond what QSPI can offer for streaming workloads. This is the data plane between Pulse and FireStorm, complementing QSPI's role as the control plane.
What the 4-line bus carries:
- Live sample data from FireStorm's ADC input → FireStorm SRAM (Pulse triggers and controls the capture; FireStorm's codec is the source)
- Bulk voice parameter updates when many voices change simultaneously (e.g. patch change affecting all 128 voices at once)
- Sequencer event bursts — dense note-on/off blocks at the start of a bar where QSPI's per-transaction overhead would add up
- Speech synthesis PCM buffers rendered by Pulse, streamed to FireStorm for mixing into the audio output
- Wavetable or sample block transfers from Pulse's PSRAM → FireStorm SRAM
The PIO protocol is fully defined in Pulse firmware — the RP2350 PIO state machine handles the bit-level framing. FireStorm implements the receiver logic in FPGA fabric. Both sides are deterministic and clock-accurate.
Pulse responsibilities:
- MIDI receive/transmit (DIN on Ant64C, USB MIDI all models)
- Jog dial quadrature decode (encoder events → SG2002 big core via mailbox)
- RGB LED state (WS2812B chain, DMA-driven)
- Joypad / controller input (USB host)
- Audio sequencer (pattern playback, timing, voice triggering via QSPI + PIO to FireStorm)
- Retro speech synthesis — SAM-style formant speech engine (see below)
Note: audio codec (WM8958/WM8960) is on FireStorm directly, not Pulse. Pulse has no audio I/O path — it is purely a control and sequencing processor. Speech audio is rendered by Pulse and streamed to FireStorm for output.
Trigger / CV Inputs — 4× 3.5mm TS Jacks
Four trigger/CV input ports on the jog dial controller (the Rotary satellite MCU, an ATtiny series AVR connected to Pulse via I2C). The jog dial controller runs at 5V natively, making 5V-compatible inputs straightforward.
Connector: 3.5mm TS mono jack — industry standard. Compatible with:
- Eurorack gate/trigger (0–10V)
- Teenage Engineering sync cables
- Roland/Korg sync
- Drum machine trigger outputs
- Footswitches
- Any CV gate source made in the last 40 years
Per-port pinout:
| Pin | Signal |
|---|---|
| Tip | Input signal |
| Sleeve | GND |
| (separate pin on PCB) | 5V out — powers passive sensors, LEDs, contact closures |
Protection circuit per input — NPN transistor buffer:
External signal (0–12V supported)
│
[R1 10kΩ] ── NPN base (2N3904 / BC547, ~3p each)
│ │
Collector Emitter ── GND
│
[R2 10kΩ] ── 5V
│
ATtiny GPIO ── [100nF cap to GND]
- R1 limits base current — safe at 0–12V input (Eurorack 10V gate: 1mA, well within ratings)
- Transistor saturates on high input → GPIO pulled LOW (inverted in firmware)
- MCU GPIO only ever sees 0 or 5V — fully isolated from external voltage
- Below ~0.6V threshold → transistor off → GPIO HIGH — excellent noise immunity
- 100nF cap provides hardware debounce for footswitches
The ATtiny's ADC mode (not just digital) gives:
- Configurable threshold detection in firmware
- Rough velocity from trigger slope (crossing speed)
- Input 3 or 4 can optionally act as 0–5V CV input for pitch/filter/parameter modulation, not just gates — configurable per port in AntOS
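A sketch of what the threshold-plus-slope logic might look like in the ATtiny firmware. THRESHOLD and the velocity scaling are illustrative values, not documented constants:

```cpp
#include <cstdint>

// ADC samples are 8-bit (0-255 over 0-5V). THRESHOLD is a firmware-
// configurable value, not a fixed Ant64 constant.
constexpr uint8_t THRESHOLD = 128;   // ~2.5V

// Returns a velocity of 1-127 on a rising threshold crossing, 0 otherwise.
// Velocity is derived from how fast the signal crosses the threshold:
// a hard drum trigger jumps in one sample, a slow footswitch ramps.
int trigger_velocity(uint8_t prev, uint8_t curr) {
    if (prev >= THRESHOLD || curr < THRESHOLD) return 0;   // no rising edge
    int slope = curr - prev;     // ADC counts per sample period
    int vel = slope / 2;         // crude scaling into MIDI velocity range
    if (vel < 1)   vel = 1;
    if (vel > 127) vel = 127;
    return vel;
}
```

The 100nF hardware debounce means the firmware can sample at a modest rate without chattering on footswitch inputs.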
BOM per port: 2 resistors + 1 transistor + 1 capacitor + 1 jack = ~£0.15. Total for 4 ports: ~£0.60 additional component cost.
Firmware routing (configurable in AntOS per port):
| Function | Description |
|---|---|
| Sidechain trigger | Duck specified voices on trigger — kick pumping effect |
| Secondary sidechain | Independent sidechain for snare gating, etc. |
| External clock / sync | Replace internal BPM — lock to drum machine, Eurorack |
| Note trigger | Trigger a specific voice or sequencer step |
| Punch-in effect | Activate KO II-style effect on trigger |
| Scene commit | Trigger a scene advance — hands-free live performance |
| Record arm | Footswitch to arm/disarm sampling without touching the machine |
| CV gate | 0–5V modulation → any FireStorm parameter via mod matrix |
Default suggested mapping:
- Input 1 → Sidechain A (kick)
- Input 2 → Sidechain B (snare / clap)
- Input 3 → External clock sync
- Input 4 → Footswitch / record arm
All remappable in AntOS.
Pulse Speech Synthesizer — SAM Style
Pulse runs a retro formant speech synthesizer in the tradition of S.A.M. (Software Automatic Mouth, 1982) — the iconic robotic voice of the Commodore 64, Atari 8-bit, and Apple II era. The RP2350 at 300MHz has enormous headroom for this alongside its sequencing duties; the original SAM ran on a 1MHz 6502.
What SAM-style synthesis is:
SAM converts text to speech using two stages:
Text input
│
▼
Reciter (text-to-phoneme)
Text → phoneme string using English pronunciation rules
e.g. "HELLO" → /HH EH L OW/
│
▼
Phoneme-to-speech (formant synthesis)
Each phoneme → three formant frequencies (F1, F2, F3)
→ three formant amplitudes (A1, A2, A3)
→ pitch contour
→ voiced/unvoiced classification
Combined via additive synthesis → audio samples
│
▼
PCM audio buffer → streamed to FireStorm via PIO fast transfer bus
The formant synthesis produces the characteristic 1980s robotic voice — not because the algorithm is simple, but because it is computing speech with very limited resolution (8-bit samples, low sample rate) in a deliberately constrained way. On the Ant64, running at full RP2350 speed with the output going to the WM8958 codec, the same algorithm produces a much cleaner sound — authentically SAM-like but with modern fidelity.
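A minimal illustration of the voiced-formant idea (not the actual SAM code): three sine partials at the formant frequencies restart their phase at every glottal period of the fundamental, which is roughly where SAM's voiced output gets its buzzy character. The formant values in the usage note are hypothetical:

```cpp
#include <cmath>
#include <vector>

// Minimal voiced-formant sketch. Three sine partials at formant
// frequencies f1..f3 with amplitudes a1..a3; the partials' phases
// are reset at every glottal period of the fundamental f0.
std::vector<float> render_voiced(float f1, float f2, float f3,
                                 float a1, float a2, float a3,
                                 float f0, float seconds, float sr) {
    const float kPi    = 3.14159265f;
    const float period = sr / f0;          // samples per glottal cycle
    const int   n      = (int)(seconds * sr);
    std::vector<float> out(n);
    float t = 0.0f;                        // samples since last glottal pulse
    for (int i = 0; i < n; ++i) {
        if (t >= period) t -= period;      // phase reset: the voiced "buzz"
        const float w = 2.0f * kPi * t / sr;
        const float s = a1 * std::sin(w * f1)
                      + a2 * std::sin(w * f2)
                      + a3 * std::sin(w * f3);
        out[i] = s / (a1 + a2 + a3);       // normalise to roughly ±1
        t += 1.0f;
    }
    return out;
}
```

An /AA/-like vowel might be render_voiced(730, 1090, 2440, 1.0f, 0.5f, 0.25f, 110, 0.5f, 48000) — formant values here are textbook approximations, not SAM's tables.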
Implementation on Pulse:
The open-source C port of SAM (s-macke/SAM, ~39KB) runs directly on the RP2350. No OS needed — a bare-metal C function call. The pipeline:
- AntOS or the music app sends a text string to Pulse via SPI/UART
- Pulse's Reciter converts text → phoneme string
- Phoneme engine renders PCM samples into a buffer in Pulse's PSRAM
- Completed buffer streamed to FireStorm via the 4-line PIO fast transfer bus
- FireStorm mixes speech into the audio output alongside music voices
- Result: speech comes out of the internal speakers, phono out, HDMI, or optical
Parameters (matching original SAM):
| Parameter | Range | Effect |
|---|---|---|
| Speed | 0–255 | Talking rate (72 = normal SAM) |
| Pitch | 0–255 | Voice fundamental frequency (64 = normal) |
| Throat | 0–255 | Formant F1 shaping — voice quality (128 = normal) |
| Mouth | 0–255 | Formant F2 shaping — vowel colour (128 = normal) |
Preset voices (matching original SAM manual):
| Voice | Speed | Pitch | Throat | Mouth |
|---|---|---|---|---|
| SAM (default) | 72 | 64 | 128 | 128 |
| Elf | 72 | 64 | 110 | 160 |
| Little Robot | 92 | 60 | 190 | 190 |
| Stuffy Guy | 82 | 72 | 110 | 105 |
| Little Old Lady | 82 | 32 | 145 | 145 |
| Extra-Terrestrial | 100 | 64 | 150 | 200 |
Musical uses beyond basic speech:
SAM can sing — phoneme durations and pitch can be driven by the sequencer, turning the speech engine into a vocal synthesizer:
- Vocoder-style leads: sequence phoneme events from the pattern sequencer, drive pitch from MIDI note data, get a singing robot voice
- Arpeggio speech: rapid phoneme triggering at musical intervals
- Daft Punk style: pitch-shifted formant speech over a chord progression
- System announcements: AntOS can use Pulse speech for boot messages, error alerts, patch names spoken aloud — authentic 8-bit computer character
- Live performance: jog dials control Speed/Pitch/Throat/Mouth in real time while the sequencer drives the phoneme stream
Extension beyond SAM:
The RP2350 has far more processing power than a 6502. The speech engine can be extended beyond the original SAM algorithm:
- Higher sample rate output (original SAM: ~7kHz; Pulse can do 48kHz)
- Smoother formant interpolation between phonemes
- Additional phoneme sets (non-English languages)
- Pitch envelope per phoneme (more natural intonation)
- The SG2000's TPU could assist with neural TTS for more natural speech, while Pulse handles the retro formant mode independently
IPC — CPU ↔ Pulse:
- Jog dial events: Pulse sends encoder events to SG2002 big core via mailbox
- MIDI events: Pulse forwards to big core for sequencer and UI feedback
- LED state: big core writes to Pulse LED buffer (RGB for jog dials)
IPC — Big core ↔ Little core (AntOS):
- Hardware mailbox registers: memory-mapped, interrupt-driven, non-blocking
- App → AntOS: DBFS load/save requests, sample load, MIDI route config
- AntOS → App: patch data ready, jog dial events, MIDI events, system events
Consistent UI conventions (all pages):
- Dial 1/2 = cursor X/Y navigation without mouse
- Dial 3–8 = context-sensitive parameter adjustment for current selection
- Any dial push = confirm / select / toggle
- Mouse primary for freehand drawing (waveform, harmonic sliders)
- MIDI keyboard = note input in Page R piano roll
Retro aesthetic option:
Green-on-black phosphor rendering mode, selectable per page. Implemented as an ImGui style override — background #001400, foreground #00FF41, custom draw list colours. Not a gimmick: the Fairlight's visual identity is inseparable from its cultural impact, and having it as an option is a genuine homage. The music app ships with both a modern colour theme and the classic phosphor theme.
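In real ImGui the override writes into ImGui::GetStyle().Colors[...]. A self-contained sketch of the colour conversion such an override needs, with a stand-in Vec4 so it does not depend on the ImGui headers:

```cpp
#include <cstdint>

struct Vec4 { float x, y, z, w; };   // stand-in for ImVec4

// Convert 0xRRGGBB to normalised RGBA floats, the form ImGui style
// colours (e.g. ImGuiCol_WindowBg, ImGuiCol_Text) expect.
Vec4 hex_rgba(uint32_t rgb, float alpha = 1.0f) {
    return { ((rgb >> 16) & 0xFF) / 255.0f,
             ((rgb >>  8) & 0xFF) / 255.0f,
             ( rgb        & 0xFF) / 255.0f,
             alpha };
}

// Phosphor theme constants from the text:
const Vec4 PHOSPHOR_BG = hex_rgba(0x001400);   // near-black green
const Vec4 PHOSPHOR_FG = hex_rgba(0x00FF41);   // bright phosphor green
```

With ImGui available, applying the theme is just assigning these into style.Colors[ImGuiCol_WindowBg] and style.Colors[ImGuiCol_Text].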
Competitive Comparison and Cost Analysis
What Compares to the Ant64?
The honest answer is: nothing currently in production does. The Ant64 occupies a category that has no living occupant. To understand why, it helps to look at what each existing product does well, then see how many of them you would need to buy to match the Ant64's combined capability.
The Closest Competitors — and Their Gaps
Waldorf Kyra (FPGA, 128 voices, VA+WT)
The closest thing in voice count and FPGA architecture. Discontinued August 2023. Was the only other FPGA-based synthesizer with 128+ voices at production scale.
What it had: 128 voices, 8-part multitimbral, excellent VA sound, solid build. What it lacked: No FM, no sampling, no computer, no video, no speech synthesis, no DIN MIDI (desktop module only), poor software support that ultimately killed it, no open platform. Retail was ~€1,600–2,000 and is now gone.
Waldorf Quantum MK2 (hybrid, 16 voices)
The previous gold standard for synthesis depth per voice. Discontinued April 2025.
What it had: Excellent hybrid architecture, granular sampling, wavetable, live audio input, polyphonic aftertouch. What it lacked: Only 16 voices, no FM, no video, no computer, no DIN MIDI, closed platform, ~€4,800.
Sequential Prophet X (hybrid, 16 voices, ~€3,800)
Best-in-class sample+synthesis combination, excellent analog filters.
What it had: 150GB sample library, analog filters per voice, solid build quality. What it lacked: No live audio input (USB only for samples), 16 voices, no FM, no video, no computer, no speech synthesis, no open platform.
Access Virus TI2 (digital VA, 80 voices, ~€3,000)
The long-reigning high-polyphony VA synthesizer. Discontinued but still used widely.
What it had: 80 voices, excellent VA engine, solid FM-like features. What it lacked: No sampling, no video, no computer, no speech synthesis, closed.
Fairlight CMI (historical, 1979–1985)
The only machine that historically combined sampling, synthesis, sequencing, and visual editing in a single system with a comparable philosophy.
What it had: Everything the Ant64 draws inspiration from — visual waveform drawing, graphical sequencer (Page R), sampling, synthesis, the light pen interface. What it cost: £20,000–50,000 at launch (equivalent to £150,000–400,000 today). What it lacked vs Ant64: FM synthesis, 128+ voice polyphony, MIDI (early models), optical audio, retro speech synth, RGB control surface, open hackable platform.
The Stack You Would Need to Match Ant64C
| Ant64C capability | Closest equivalent | Price (2026) |
|---|---|---|
| 128-voice VA+WT synth | Waldorf Kyra (discontinued) | ~€1,600 used |
| Full FM synthesis (6-op+) | Yamaha Montage M / MODX+ | ~€2,000+ |
| Live sampling + S&S engine | Sequential Prophet X | ~€3,800 |
| Granular synthesis | Waldorf Quantum MK2 (discont.) | ~€3,500 used |
| DIN MIDI In/Out/Thru | iConnectivity mioXL hub | ~€400 |
| Multi-track hardware sequencer | Squarp Pyramid MK3 | ~€700 |
| 303 acid engine | Roland TB-03 | ~€350 |
| Video synthesizer / visualiser | Critter & Guitari EYESY | ~€500 |
| Fairlight-style visual editor | Nothing available | — |
| Retro speech synthesizer | Nothing available | — |
| Home computer (AntOS, coding) | Raspberry Pi 400 | ~€70 |
| Internal stereo speakers | External monitors | ~€100 |
| Optical audio out | External DAC | ~€100 |
| Total | 10+ separate devices | ~€13,000+ |
And that stack still doesn't give you: all synthesis engines layerable per voice, the RGB jog dial performance surface, DX7 sysex import, Fairlight-style waveform drawing, the open FPGA bitstream, the FireStorm EE custom execution engine, or the unified AntOS operating system tying everything together.
The Waldorf Kyra Is the Most Direct Point of Comparison
The Kyra was ~€1,600 at clearance for a 128-voice FPGA synth with no computer, no FM, no sampling, no video, no speech synthesis, and no active development. It was discontinued because it never fulfilled its potential — the FPGA was capable of far more than Waldorf ever shipped.
The Ant64 targets everything the Kyra had plus everything it never got plus things nobody has attempted in a single product. It is also open — the FPGA bitstream, the FireStorm EE ISA, AntOS — none of it is locked down. The community can add synthesis engines. The Kyra community could only petition Waldorf for updates and wait.
Why No One Has Done This Before
The combination that makes the Ant64 unique has been technically possible for several years but has not been commercially attempted because:
- Commercial synth companies fear cannibalising existing product lines — a truly open, hackable synthesizer that does everything undermines upsell.
- FPGA expertise is rare in music hardware companies — most synth makers use DSP chips or CPUs, not FPGAs. GoWin making competitive 138k FPGAs at accessible prices is relatively recent.
- The home computer + synthesizer combination was last attempted in the 1980s (C64, Atari ST, Amiga) and the paradigm was abandoned as PCs and DAWs took over. Nobody has attempted to revive it with modern silicon until now.
- Video + audio integration in a single instrument has simply never been productised for musicians. LZX makes video synths for visual artists. Synth companies make audio synths for musicians. Nobody built the bridge.