Ant64 Audio System

Vision

The Ant64 audio section should match or exceed the specification of the best synthesizers currently available (reference: Waldorf Quantum MK2, Sequential Prophet X), while remaining capable of authentic analog-synthesis emulation from an entirely digital signal path, with an optional analog output/character stage. All DSP runs in the FireStorm using fixed-point arithmetic exclusively — no floating point anywhere in the signal path.


Competitive Reference — State of the Art (2025/26)

Synth                      Voices  Osc/voice         Synthesis types                          Filters/voice                       Analog?
Waldorf Quantum MK2        16      3 (×5 algos)      WT, VA, granular, resonator, kernel/FM   2× analog 24/12dB LP + digital SVF  Hybrid
Sequential Prophet X       16      4 (2 VA + 2 smp)  VA + sample                              Stereo analog LP                    Hybrid
Sequential Prophet-10      10      2                 VA (subtractive)                         Analog LP                           Full analog
Moog One                   8/16    3                 VA (subtractive)                         Analog LP/HP/Notch                  Full analog
Access Virus TI2           80      3                 VA + WT + FM                             Digital multimode                   Full digital
Waldorf Kyra (FPGA)        128     10                VA + WT                                  Digital 12/24dB LP/BP/HP            Full digital

The Kyra is the key benchmark — a commercially shipped pure-FPGA synth proving 128 voices with 10 oscillators each is achievable in real hardware. Ant64 FireStorm targets this as a minimum, not a ceiling.

Ant64 targets:

  • 128 voices minimum for VA and FM engines (matching Kyra, far exceeding all others)
  • 256 voices for sample playback and simpler FM configurations
  • Three first-class synthesis paradigms — Analog Style, Digital Sample, FM/OP
  • All three available simultaneously and mixable per patch
  • Fixed-point FPGA DSP throughout, optional analog output character stage

Three Synthesis Paradigms

The Ant64 treats synthesis as three equal, first-class paradigms — not one engine with bolt-on extras. Any voice slot can run any engine. Patches can layer all three simultaneously.

┌─────────────────┐  ┌─────────────────┐   ┌─────────────────┐
│   PARADIGM 1    │  │   PARADIGM 2    │   │   PARADIGM 3    │
│                 │  │                 │   │                 │
│  ANALOG STYLE   │  │ DIGITAL SAMPLE  │   │   YAMAHA / FM   │
│  (VA / Subtr.)  │  │   (S&S / PCM)   │   │  (OP-based)     │
│                 │  │                 │   │                 │
│ Juno · Moog     │  │ Korg M1 · JD800 │   │ DX7 · TX81Z     │
│ Prophet · 303   │  │ ROMpler · S&S   │   │ OPL · OPN       │
│ M-86 · Hoover   │  │ Piano · Strings │   │ FM bass · EP    │
│ 128 voices      │  │ 256 voices      │   │ 128 voices      │
└────────┬────────┘  └────────┬────────┘   └────────┬────────┘
         └───────────────────►▼◄────────────────────┘
                      Voice Mix Bus (S36)
                   Global FX Chain · Output Stage

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                    Pulse RP2350                         │
│  MIDI in/out · patch management · parameter control     │
│  voice allocation · sequencer · jog dials               │
└───────────────────────┬─────────────────────────────────┘
                        │ QSPI + 4-line PIO fast transfer
┌───────────────────────▼─────────────────────────────────┐
│            FireStorm (GoWin 138k / 60k)                 │
│                                                         │
│  Time-multiplexed DSP pipeline — 128–256 voices         │
│  (one physical pipeline serves all voices sequentially) │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │  Voice Mix Bus (S36, all voices accumulated)     │   │
│  └──────────────────────┬───────────────────────────┘   │
│                         │                               │
│  ┌──────────────────────▼───────────────────────────┐   │
│  │  Global FX chain (Reverb · EQ · Compressor)      │   │
│  └──────────────────────┬───────────────────────────┘   │
└─────────────────────────┼───────────────────────────────┘
                          │ Stereo S24 PCM → WM8958/60 codec
          ┌───────────────▼─────────────────┐
          │  Optional Analog Stage          │
          │  (character · saturation · VCF) │
          └───────────────┬─────────────────┘
                          │ Line out / headphones / HDMI / optical

FireStorm Clock Architecture & Voice Budget

Clock Domains

The GoWin GW5AT-138 is fabricated on the TSMC 22nm ULP process. It contains several independent clock domains fed from on-chip PLLs:

Domain        Clock                                Source                  Notes
BSRAM         380MHz                               Hard silicon spec       Dedicated hard memory blocks — not constrained by fabric routing
Fabric        ~200MHz (target)                     PLL → synthesis result  Programmable LUT/FF logic — speed depends on critical path
DSP blocks    ~300MHz+                             Hard silicon            Dedicated multiply-accumulate blocks, independent of fabric
Pixel clock   74.25MHz (720p) · 148.5MHz (1080p)   PLL                     HDMI timing generation
Audio clock   12.288MHz (derived)                  PLL                     48kHz × 256 = standard I²S master clock

The fabric clock is not a fixed silicon spec — it is the maximum frequency at which the synthesised logic meets timing after place-and-route. A deeply pipelined design with short combinational paths achieves a higher fabric clock than one with long unbroken logic chains.

Fabric Clock Estimation — Derived from BSRAM

The BSRAM:fabric clock ratio on comparable FPGAs:

Device                     Process  BSRAM    Fabric (complex design)  Ratio
Xilinx Artix-7             28nm     ~600MHz  250–350MHz               1.7–2.4×
Intel Cyclone 10 GX        20nm     ~600MHz  200–350MHz               1.7–3×
GoWin GW2A                 55nm     ~200MHz  ~150MHz                  ~1.3×
GoWin GW5AT (this device)  22nm     380MHz   ~200–250MHz              ~1.5–1.9×

For a complex pipelined design combining an audio DSP engine and a 2D rasterizer, a ratio of 1.5–1.9× is typical. This gives a fabric clock estimate of:

  Conservative target:  200MHz  (ratio 1.9×)
  Optimistic target:    250MHz  (ratio 1.5× — achievable with careful timing closure)

200MHz is used as the design target throughout this document. All voice counts and performance figures are stated at 200MHz. 250MHz figures are noted where relevant as the optimistic case.

A key property of the 380MHz BSRAM is that at 200MHz fabric:

  1 fabric clock period  =  5.0ns
  1 BSRAM clock period   =  2.63ns

  BSRAM cycles per fabric cycle:  5.0 / 2.63 = 1.9×

A BSRAM read issued at one fabric clock edge returns its result before the next fabric clock edge. BSRAM reads in the audio DSP pipeline are effectively zero-stall — the tanh lookup table, wavetable samples, BLEP correction table, and BBD delay taps are all available within a single fabric pipeline stage.


Cycle Budget — 200MHz at 48kHz

  Fabric clock:            200,000,000 Hz
  Audio sample rate:            48,000 Hz
  Cycles per sample period:      4,167 cycles

  Allocation:
  ┌────────────────────────────────────────┬───────────────────┐
  │  Voice pipeline (all engines)          │  ~3,700 cycles    │
  │  Post-mix global effects               │    ~385 cycles    │
  │  Control / state management            │     ~82 cycles    │
  │                                        │                   │
  │  TOTAL                                 │   4,167 cycles    │
  └────────────────────────────────────────┴───────────────────┘

  Rasterizer: runs on SEPARATE fabric resources and SRAM B simultaneously.
  Does not consume any of the above 4,167 cycles.
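
The budget arithmetic above can be checked in a few lines of C (a sketch — the function names are illustrative, not part of any Ant64 API):

```c
#include <assert.h>
#include <stdint.h>

/* Cycles available per audio sample period (integer floor). */
static uint32_t cycles_per_sample(uint32_t fabric_hz, uint32_t sample_hz) {
    return fabric_hz / sample_hz;
}

/* Voices that fit a cycle budget at a fixed per-voice cost. */
static uint32_t max_voices(uint32_t budget_cycles, uint32_t cycles_per_voice) {
    return budget_cycles / cycles_per_voice;
}
```

Integer division floors 200MHz/48kHz to 4,166; this document rounds to 4,167 — the one-cycle difference is noise next to the ~82-cycle control reserve.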

Voice Pipeline — Cycle Cost Per Voice

The audio engine is time-multiplexed: one physical DSP pipeline is shared across all voices, and each voice is processed in sequence within a single sample period. Because there is no inter-voice data dependency, consecutive voices can be interleaved through the pipeline stages to hide latency — the per-voice cycle counts below are therefore conservative, serial-equivalent budgets.

VA Voice (2 oscillators, ladder filter 4-pole, SVF, BBD, VCA, 2 envelopes, 2 LFOs)

  Stage                                  Cycles   Notes
  ─────────────────────────────────────────────────────────────────
  Load voice state from BSRAM                1    380MHz BSRAM — returns within 1 fabric cycle
  Phase accumulator × 2 oscillators          2    Parallel
  Wavetable reads × 2 (BSRAM)                2    Parallel, BLEP table also read here
  Oscillator mix                             1
  Ladder pole 1 (multiply + tanh + sum)      3    tanh → BSRAM LUT, 1 cycle read
  Ladder pole 2                              3    Sequentially dependent on pole 1
  Ladder pole 3                              3    Sequentially dependent on pole 2
  Ladder pole 4                              3    Sequentially dependent on pole 3
  SVF (2 integrators, parallel)              3
  BBD delay tap read + wet/dry mix           1    BSRAM, zero-stall
  VCA                                        1
  Envelope ADSR update                       2
  LFO update                                 1    Can overlap with envelope
  Write state back to BSRAM + BBD write      1    Pipelined, no stall
  Mix bus accumulate                         1
  ─────────────────────────────────────────────────────────────────
  TOTAL                                     29 cycles per VA voice
                                            (listed stages sum to 28; 1 cycle reserved as pipeline margin)

The four ladder poles are the critical serialised path — each pole's output feeds the next pole's input within the same sample, so they cannot be parallelised. This is the physics of the ladder filter topology and applies to any correct implementation. The tanh nonlinearity is a BSRAM lookup at 380MHz — it adds only 1 cycle, not a multi-cycle stall as it would be with a computed approximation.
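
A software model of one pole illustrates the structure (a C sketch; a placeholder soft-clip curve stands in for the pre-computed tanh table, and all names are illustrative, not the FireStorm RTL):

```c
#include <assert.h>
#include <stdint.h>

static int16_t tanh_lut[4096];   /* models the 4096-entry BSRAM saturation table */

static void tanh_lut_init(void) {
    for (int i = 0; i < 4096; i++) {
        int32_t x = (i - 2048) * 16;   /* map table address back to S16 range */
        if (x >  24576) x =  24576;    /* placeholder clip standing in for tanh */
        if (x < -24576) x = -24576;
        tanh_lut[i] = (int16_t)x;
    }
}

static int16_t tanh_lookup(int16_t x) {
    static int ready = 0;
    if (!ready) { tanh_lut_init(); ready = 1; }
    return tanh_lut[(uint16_t)(x + 32768) >> 4];   /* 12-bit table address */
}

/* One nonlinear ladder pole: y += g * (tanh(in) - tanh(y)), g in Q15.
   Four of these in series form the 24dB/oct ladder. */
static int16_t ladder_pole(int16_t in, int16_t *state, int16_t g_q15) {
    int32_t diff = (int32_t)tanh_lookup(in) - (int32_t)tanh_lookup(*state);
    *state = (int16_t)(*state + ((diff * (int32_t)g_q15) >> 15));
    return *state;
}
```

Each pole is one multiply, one table read, and one accumulate — the serial dependency between poles, not the per-pole cost, is what sets the 12-cycle ladder budget.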

FM Voice (6-operator, DX7-compatible algorithms with partial branch parallelism)

  Stage                                  Cycles   Notes
  ─────────────────────────────────────────────────────────────────
  Phase accumulator × 6 operators           12    2 cycles each
  Sine LUT × 6 (BSRAM)                       6    1 cycle each, pipelined
  Modulation apply + level × 6              12    2 cycles each
  Parallel branch reduction (algorithm)      6    Some algorithms allow 2 ops simultaneously
  Output routing + carrier sum               3
  Envelope per operator × 6                  6    Overlap with pipeline stages above
  ─────────────────────────────────────────────────────────────────
  TOTAL (with ~30% parallelism saving)     ~28 cycles per 6-op FM voice

DX7 algorithms with parallel carrier branches (e.g. algorithms 1, 2, 5, 6) allow two operators to be computed simultaneously, reducing effective cycle count. The 28-cycle figure reflects this average across all 32 algorithms.

Sample Voice (S&S playback with linear interpolation)

  Stage                                  Cycles   Notes
  ─────────────────────────────────────────────────────────────────
  Sample read × 2 (for interpolation)        2    BSRAM or SRAM A depending on sample size
  Linear interpolation (multiply + add)      2
  Loop boundary check + wrap                 1
  Stereo pan matrix (2 multiplies)           2
  VCA                                        1
  ─────────────────────────────────────────────────────────────────
  TOTAL                                      7 cycles per sample voice
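
The interpolated read at the heart of the sample voice looks like this in software (a C sketch; the buffer length is assumed to be a power of two here, which the real engine need not require):

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point sample read with linear interpolation.
   pos is a U32 phase: the top bits index the buffer, the low 16 bits
   are the sub-sample fraction. len_mask = buffer_length - 1. */
static int16_t sample_read_lerp(const int16_t *buf, uint32_t len_mask,
                                uint32_t pos) {
    uint32_t idx  = (pos >> 16) & len_mask;     /* integer sample index */
    uint32_t frac = pos & 0xFFFFu;              /* 16-bit fraction      */
    int32_t s0 = buf[idx];
    int32_t s1 = buf[(idx + 1) & len_mask];     /* wrap (forward loop)  */
    return (int16_t)(s0 + (((s1 - s0) * (int32_t)frac) >> 16));
}
```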

Voice Count Derivation

Available cycles for voice pipeline: ~3,700 (leaving ~467 for effects + control)

Engine                                Cycles/voice    Voices at 200MHz  Voices at 250MHz
VA (full: 2 osc, ladder+SVF, BBD)     29              128               160
VA (optimised, interleaved pipeline)  <29 effective   192               256
FM 6-operator                         28              128               ~170
FM 4-operator                         ~19             192               ~250
Sample playback                       7               512               512

128 VA voices is the conservative, guaranteed target at 200MHz — matching the Waldorf Kyra, the only comparable shipped FPGA synthesiser, but with a far more complex per-voice signal chain (nonlinear ladder + SVF + BBD vs the Kyra's simpler digital filter). 192 VA voices is achievable with careful timing closure and pipeline optimisation at 200MHz, and 256 at 250MHz.

The Kyra achieved 128 voices × 10 oscillators on a Xilinx Artix-7 (28nm) with a simpler filter architecture. The FireStorm runs on 22nm silicon with a faster BSRAM and a deeper per-voice chain. The comparison is favourable.

Mixed-Engine Voice Allocation

When multiple engines run simultaneously, the cycle budget is shared:

  At 200MHz (3,700 cycle voice budget):

  128 VA (29×128 = 3,712) ............. full budget — VA only
   96 VA + 48 FM (2,784 + 1,344)  ..... 4,128 — slightly over, needs optimisation
   96 VA + 32 FM + 64 sample .......... 2,784 + 896 + 448 = 4,128 — same
   64 VA + 48 FM + 64 sample .......... 1,856 + 1,344 + 448 = 3,648 — comfortable

  At 250MHz (~4,800 cycle voice budget):

  192 VA (29×192 = 5,568) ............. exceeds — use 160 VA (29×160 = 4,640)
  128 VA + 64 FM + 64 sample .......... 3,712 + 1,792 + 448 = 5,952 — exceeds budget
   96 VA + 48 FM + 64 sample .......... 2,784 + 1,344 + 448 = 4,576 — comfortable

The voice allocator in the workstation app manages the cycle budget dynamically. Patches declare their engine type; the allocator enforces the total cycle budget and notifies the musician if a configuration exceeds capacity.
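
The allocator's admission check reduces to a dot product against the per-voice costs above (a sketch — the actual allocator API in the workstation app is not specified here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-voice cycle costs from the tables above. */
enum { CYC_VA = 29, CYC_FM6 = 28, CYC_SAMPLE = 7 };

/* Does a requested engine mix fit the voice cycle budget? */
static bool mix_fits(uint32_t va, uint32_t fm6, uint32_t smp,
                     uint32_t budget_cycles) {
    uint32_t need = va * CYC_VA + fm6 * CYC_FM6 + smp * CYC_SAMPLE;
    return need <= budget_cycles;
}
```

When the check fails, the allocator rejects the configuration and notifies the musician rather than dropping samples mid-performance.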

Practical default targets (stated in marketing/spec sheet):

Spec                  Value      Basis
VA voices             128        Guaranteed at 200MHz, conservative
FM voices (6-op)      128        Guaranteed at 200MHz
Sample voices         512        Comfortable at 200MHz
Mixed simultaneous    up to 192  Depends on engine mix — allocator managed
At optimised 250MHz   up to 256  All engines, with timing closure

Post-Mix Global Effects — Cycle Budget

Global effects run after all voice outputs are summed to a stereo bus. They are not per-voice — the cycle cost is the same regardless of voice count.

  Effect                                    Cycles    Notes
  ──────────────────────────────────────────────────────────────────
  FDN Reverb (8×8 feedback delay network,
    8 delay lines, full matrix multiply)       300    Largest single effect
  4-band parametric EQ (biquad × 4)             40    Stereo — 2 channels × 4 × 5 muls
  Stereo compressor / limiter                   30
  STFT analysis hop amortised                   15    512-pt FFT ÷ 512 samples = ~15/sample
  Mix bus saturation (soft clip)                 5
  ──────────────────────────────────────────────────────────────────
  TOTAL                                        390 cycles   (9.4% of 4,167 sample budget)

The effects budget is trivially small relative to the total. The reverb, EQ, and compressor together consume fewer cycles than processing two VA voices.


Rasterizer Performance

The rasteriser runs on completely independent fabric and SRAM B — orthogonal to the audio DSP pipeline. It consumes none of the 4,167-cycle audio sample budget. Full rasteriser performance figures (triangles/frame, cycle costs, spectrogram render timing) are documented in the Display Architecture reference.


BSRAM Capacity Allocation

GoWin GW5AT-138 total BSRAM: 6.3Mb = ~787KB

Use                                        Size    Notes
Voice state (256 VA voices × 63 bytes)     16KB    Phases, filter states, envelopes, LFOs
Wavetable (256pt × 16-bit)                 512B    One cycle per waveform — multiple stored
BLEP correction table (1024 × 16-bit)      2KB     Anti-aliasing correction per oscillator
tanh lookup (4096 × 36-bit)                18KB    Nonlinear ladder saturation
Sine LUT for FM (4096 × 16-bit)            8KB     FM operator sine generation
Font atlas (128 glyphs × 16×16 × 1-bit)    4KB     ImGui text rendering
Colour LUT for spectrogram (256 × 3B)      768B    Viridis / Inferno / etc. — swap at runtime
Scratch / intermediate                     ~30KB   Pipeline staging, STFT twiddle factors
Total used                                 ~79KB   10% of available BSRAM

BSRAM usage is light — 90% is available for additional tables, larger LUTs, expanded wavetable sets, or future features. The BBD delay lines are intentionally kept in SRAM A (dedicated 36-bit audio bus) rather than BSRAM: 128 voices × 25ms stereo delay at 36-bit samples is roughly 1.4MB (128 × 1,200 samples × 2 channels × 4.5 bytes), which far exceeds the 787KB BSRAM ceiling. SRAM A handles this cleanly with no bus contention.


Per-Voice Engine

Each voice runs time-multiplexed on a shared fixed-point DSP pipeline in FireStorm. At 200MHz with 128 VA voices, each voice gets approximately 29 pipeline cycles per sample period — all within the 4,167-cycle budget at 48kHz.

Oscillator Block (up to 3 per voice; the 29-cycle VA budget above assumes 2)

Each oscillator independently selects its synthesis algorithm:

1. Virtual Analog (VA) — subtractive, Juno/Prophet/Moog territory

  • Phase accumulator: U32 tuning word (gives ~0.01 cent resolution at 48 kHz)
  • Waveforms: sawtooth, PWM-sawtooth (Alpha Juno style — essential for M-86/Hoover), pulse, triangle, sine, square, sub-octave
  • BLEP anti-aliasing via pre-computed correction table in BRAM
  • Hard sync between oscillators
  • Cross-modulation (oscillator 1 FM-ing oscillator 2)
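
The U32 tuning word is the standard phase-accumulator ratio f/fs scaled by 2^32 (a sketch; taking the frequency in milli-hertz is an illustrative choice for integer precision, not a stated Ant64 format):

```c
#include <assert.h>
#include <stdint.h>

/* U32 phase-accumulator tuning word: word = (f / fs) * 2^32.
   Computed in 64-bit to avoid overflow; freq_millihz = frequency * 1000. */
static uint32_t tuning_word(uint32_t freq_millihz, uint32_t fs) {
    return (uint32_t)(((uint64_t)freq_millihz << 32) /
                      ((uint64_t)fs * 1000u));
}
```

At 48kHz the least-significant bit of the word corresponds to about 11 micro-hertz, which is where the ~0.01 cent resolution claim comes from.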

2. Wavetable — PPG/Waldorf territory

  • 128-step wavetables, up to 128 tables per bank
  • Smooth interpolation between table positions (bilinear, S24 fixed-point)
  • Position sweepable by LFO, envelope, mod matrix, or MIDI note
  • Custom wavetable upload from AntOS via DBFS

3. FM — Yamaha DX7 and beyond

  • Up to 6 operators per oscillator slot (configurable algorithm routing)
  • S24.8 fixed-point for operator levels
  • All classic DX7 algorithms plus free routing mode
  • Operator envelopes: 4-stage (matches DX7 rate/level model)

4. Granular

  • Circular sample buffer (up to 4 seconds at 48 kHz, stored in FPGA BRAM or external SRAM)
  • Per-grain: position scatter, pitch scatter, duration, envelope shape, pan
  • Grain density: 1–200 grains/second
  • Live input granularisation supported

5. Physical Modeling

  • Karplus-Strong string model (delay line in BRAM + one-pole LP damping filter)
  • Waveguide wind/reed model
  • Suitable for plucked strings, bowed strings, blown tubes

6. Additive

  • 64 partials per voice, each with independent amplitude/frequency envelope
  • IFFT resynthesis path (optional, higher latency)

Filter Block (2 per voice)

Two fully independent filter units per voice, series or parallel routing:

Filter 1 — Nonlinear Ladder (Moog/Roland character)

  • Huovilainen model: 4× cascaded one-pole LP stages with nonlinear tanh feedback
  • tanh implemented as 12-bit addressed BRAM lookup table (pre-computed, S1.15 format)
  • Modes: 24 dB/oct LP · 12 dB/oct LP · 18 dB/oct LP
  • Full resonance to self-oscillation
  • Input: S24, internal state: S32 with careful scaling to prevent overflow

Filter 2 — State Variable Filter (SVF)

  • Simultaneous LP / HP / BP / Notch outputs selectable per voice
  • 12 or 24 dB/oct
  • Suitable for formant filtering, comb filter mode, phaser stages
  • Topology: Chamberlin two-integrator loop, S32 fixed-point
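
The Chamberlin loop is two multiply-accumulates per sample, with all four responses falling out of the same update (a fixed-point C sketch; the Q15 coefficient formats are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Chamberlin two-integrator SVF, one update per sample.
   f_q15 ~ 2*sin(pi*fc/fs) in Q15; q_q15 is the damping (1/Q) in Q15. */
typedef struct { int32_t lp, bp; } Svf;
typedef struct { int32_t lp, hp, bp, notch; } SvfOut;

static SvfOut svf_tick(Svf *s, int32_t in, int32_t f_q15, int32_t q_q15) {
    SvfOut o;
    s->lp += (f_q15 * s->bp) >> 15;                  /* integrator 1   */
    o.hp   = in - s->lp - ((q_q15 * s->bp) >> 15);   /* high-pass node */
    s->bp += (f_q15 * o.hp) >> 15;                   /* integrator 2   */
    o.lp = s->lp;
    o.bp = s->bp;
    o.notch = o.hp + s->lp;                          /* notch = HP + LP */
    return o;
}
```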

Filter routing

  • Serial: F1 → F2 (4-pole into SVF — very flexible)
  • Parallel: F1 + F2 mixed (phase cancellation effects possible)
  • F2 only, F1 only

Envelope Generators (4 per voice)

Independent assignment to any modulation destination:

  • ENV1 — standard ADSR with optional hold segment
  • ENV2 — ADSR (typically filter)
  • ENV3 — ADSR (free assign)
  • ENV4 — Multi-stage loopable (up to 8 segments, Waldorf-style)

Each envelope is exponential (not linear) for musical response. Implemented as a multiply-accumulate with a per-segment decay constant. Rate responds to keyboard tracking (higher notes = faster envelopes, like real analog).
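
The per-segment multiply-accumulate is a single fixed-point multiply per sample (a sketch; the Q16 level and Q15 coefficient formats are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* One exponential decay step: level *= k, with k in Q15 just below 1.0.
   Repeated once per sample, this produces the exponential curve;
   keyboard tracking simply selects a smaller k for higher notes. */
static uint32_t env_decay_step(uint32_t level_q16, uint32_t k_q15) {
    return (uint32_t)(((uint64_t)level_q16 * k_q15) >> 15);
}
```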


LFO Block (4 per voice + 2 global)

Per-voice LFOs (4):

  • Waveforms: sine, triangle, sawtooth, reverse saw, square, sample & hold, smoothed S&H
  • Rate: 0.01 Hz – 20 kHz (audio-rate modulation supported)
  • Sync: free, tempo-sync, note-triggered, key-sync
  • Delay + fade-in time

Global LFOs (2):

  • Same spec but shared across all voices (authentic to Juno/JP-8 behaviour)
  • Can be switched to per-voice for richer polymodulation

Modulation Matrix

64 slots, each: Source → Destination × Amount

Sources include: ENV1-4, LFO1-6, velocity, aftertouch (mono + poly), mod wheel, pitch bend, key tracking, note number, random (per-note), MIDI CC 0–127, oscillator output (audio-rate mod)

Destinations include: all oscillator parameters (pitch, PW, wavetable pos, FM ratio/index), both filter cutoffs, both resonances, all envelope rates/levels, all LFO rates/depths, VCA level, pan, effect parameters

This exceeds the Quantum's modulation depth and matches a mid-size Eurorack system in routing flexibility.


Amplitude (VCA)

  • One VCA per voice: sample × envelope_level, single S32 multiply
  • Velocity scaling: linear or exponential curve, configurable per patch
  • Pan position: per-voice stereo placement (constant power)


Paradigm 2 — Digital Sample Engine (S&S / PCM + Live Sampling)

Inspired by the Korg M1, Roland JD-800/JV series, and the S&S (Sample and Synthesis) approach of the late 1980s–90s. The DAC chip's audio input extends this far beyond a traditional ROMpler — the Ant64 can sample both ahead of time and in real time.

This is a significant competitive differentiator. The Sequential Prophet X has no live audio input at all — samples must be loaded via USB. The Waldorf Quantum MK2 had live input but has been discontinued (April 2025). No current production synth at any price combines live real-time sampling with the full synthesis architecture described here.

Voice budget: up to 256 simultaneous voices as the standard allocation — sample playback is computationally cheaper than VA (no nonlinear filter, 7 cycles per voice), so the pipeline can run far more sample voices than VA voices (the cycle budget allows up to 512).


Sample Sources — Four Modes

Mode A — Pre-loaded Samples (traditional S&S)

  • Samples stored in DBFS as BLOBs, loaded to SRAM on patch activate
  • Sample format: 16-bit signed PCM, mono or stereo, 44.1/48 kHz
  • Velocity layers: up to 8 per note zone
  • Full keyboard mapping: different sample per key range
  • Import from SD card, USB, or AntOS file manager
  • Use case: piano, strings, brass, choir, drum kits — realistic acoustic instruments

Mode B — Real-Time Live Sampling

  • Audio input → ADC on DAC chip → DMA ring buffer in SRAM
  • Capture on demand: press record, play a note/chord/phrase — captured immediately
  • Auto-loop detection: FireStorm DSP finds zero-crossings for clean loop points
  • Latency from capture to playable: < 5ms (one DMA buffer period)
  • Use case: sample a guitar chord, a vocal phrase, an external synth — play it instantly across the keyboard at any pitch
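
The zero-crossing search behind auto-loop detection can be sketched in a few lines (illustrative C; the shipping version runs as a hardware scan in FireStorm DSP):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first positive-going zero crossing at or after `from`.
   Returns the sample index, or len if none is found. Looping between
   two such crossings gives a click-free loop point. */
static size_t find_zero_crossing(const int16_t *buf, size_t len, size_t from) {
    for (size_t i = (from ? from : 1); i < len; i++) {
        if (buf[i - 1] < 0 && buf[i] >= 0)
            return i;
    }
    return len;
}
```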

Mode C — Resample the Synth Output

  • Route the Ant64's own mixed output back through the ADC input
  • Capture a complex layered patch as a single sample
  • Then play that sample back through a new synthesis layer on top
  • Classic technique: Quantum's "self-recording", Ensoniq workflow
  • Use case: freeze a complex evolving pad as a static sample, layer it with VA leads; capture a granular texture and play it chromatically; reduce polyphony load by freezing background layers

Mode D — Live Granular (streaming granular)

  • Audio input streams directly into the granular engine without pre-recording
  • No buffer delay: granular parameters applied to live input in real time
  • Grain position jitter, pitch scatter, density, envelope all modulatable live
  • Use case: real-time granular processing of a vocalist, guitarist, or any audio source — turns the Ant64 into a live granular effects processor as well as a synthesizer

Sample Oscillator (all modes)

  • Phase accumulator reads through sample buffer (interpolated, U32 pointer)
  • Pitch shifting: ±48 semitones from root pitch with fine-tune
  • Loop modes: no loop · forward loop · ping-pong · release loop
  • Loop crossfade: short crossfade window at loop point removes clicks (FireStorm DSP)
  • Stereo samples: both channels preserved through stereo voice path

Sample Filter (SVF)

  • Chamberlin two-integrator SVF: LP / HP / BP / Notch, 12 or 24 dB/oct
  • Lighter compute than nonlinear ladder — enables 256 voice budget
  • ENV2 → filter cutoff for classic S&S brightness shaping

Envelopes (2 per voice, ADSR)

  • ENV1 → VCA (volume) — exponential, keyboard-tracked rates
  • ENV2 → filter cutoff

Granular Engine (all sample sources)

  • Grain size: 1ms – 500ms
  • Grain density: 1 – 200 grains/second
  • Position scatter: random offset from playhead position (creates texture)
  • Pitch scatter: random pitch variation per grain (±24 semitones)
  • Pan scatter: random stereo position per grain
  • Grain envelope: Gaussian, trapezoidal, or rectangular window
  • Reverse grains: per-grain random reversal flag

Layering with Other Paradigms

Any voice slot can layer a sample engine voice under or over a VA or FM voice. A patch can combine: FM electric piano (P3) + VA analog pad (P1) + live granular texture from mic input (P2 Mode D) — all playing simultaneously, all going through the shared filter and effects chain. No current hardware synth does all three at once.


Paradigm 3 — FM / Operator Engine (Yamaha Style)

Full frequency modulation synthesis in the DX7 / TX81Z tradition. An FM voice is built from operators — each operator is a simple unit: a sine wave oscillator with its own ADSR envelope and output level. Operators modulate each other according to an algorithm (a routing diagram), producing complex, harmonically rich timbres from simple building blocks.

Voice budget: up to 128 simultaneous voices at 6 operators per voice.

What an Operator Is

  ┌───────────────────────────────────┐
  │           OPERATOR                │
  │                                   │
  │  Frequency ratio (coarse + fine)  │
  │  ──► Phase accumulator (U32)      │
  │  ──► Sine lookup (BRAM, 1024pt)   │
  │  ──► × Output level (TL, 0–99)    │
  │  ──► × ADSR envelope              │
  │  ──► × Velocity scaling           │
  │  ──► × Key rate scaling           │
  │  ──► Output (modulates or sums)   │
  └───────────────────────────────────┘

Each operator produces a sine wave (or alternative waveform — see below) at a ratio of the base pitch, shaped by its own envelope and level. Carriers sum to audio output. Modulators feed their output into the phase of another operator, adding harmonics.

Operator Count and Waveforms

Mode   Operators/voice  Voices  Reference
2-op   2                256     OPL2 (AdLib)
4-op   4                192     TX81Z, OPN
6-op   6                128     DX7, DX5
8-op   8                96      Beyond DX7 — Ant64 exclusive

Waveforms per operator — the TX81Z already extended DX7's sine-only operators to 8 waveforms. Ant64 supports 16 per operator, stored as 1024-point tables in BRAM:

#     Waveform              Character
0     Sine                  Classic FM, pure
1     Half sine             Brighter, more even harmonics
2     Absolute sine         Full-wave rectified, buzzy
3     Quarter sine (pulse)  Hollow, wooden
4     Sawtooth              Harsh, bright
5     Square                Hollow, 303-like when FM modulated
6     Triangle              Soft, flute-like
7–15  Custom wavetable      User-defined, uploaded via AntOS

Mixing waveforms across operators gives far more timbral variety than DX7's sine-only approach — closer to the Vector synthesis of the Prophet VS.

FM Algorithms

An algorithm defines which operators are carriers (summed to output) and which are modulators (fed into another operator's phase). Ant64 supports:

  • All 32 DX7 algorithms (fully compatible — DX7 patch import supported)
  • All 8 TX81Z/DX11 4-op algorithms
  • Free routing mode — any operator can modulate any other, including self-feedback, parallel chains, stacked towers — a full modular FM signal graph per voice

Per-Operator ADSR

Each operator has an independent 4-stage envelope controlling its output level:

  • Attack rate, Decay rate, Sustain level, Release rate
  • Exponential curves (matches DX7 behaviour — rate, not time)
  • Keyboard rate scaling: higher notes use faster envelope rates (natural instrument feel)
  • Velocity sensitivity per operator: velocity can boost modulator depth for dynamic timbre

This is the key musical insight of FM — modulator envelope depth controls brightness. A fast modulator attack with slow decay gives a percussive click + evolving tone. Slow modulator attack gives a swelling, building timbre. This is what makes DX7 electric pianos, bells, marimbas, and basses so expressive.

Fixed-Point FM Implementation

FM synthesis is entirely integer arithmetic — perfect for fixed-point FPGA:

// Per operator, per sample — all integer, no floats:
uint32_t phase_acc;                              // U32 — wraps naturally on overflow
phase_acc += frequency_word;                     // U32 tuning word add
int16_t mod_input = previous_operator_out;       // S16 from modulating operator
uint32_t index = (phase_acc >> 22)               // top 10 bits of phase
               + (mod_input >> feedback_shift);  // plus scaled modulation
int16_t output = sine_lut[index & 0x3FF];        // 1024-pt BRAM lookup → S16
output = (int16_t)(((int32_t)output * envelope_level) >> 15);  // S16 × U15 → S16

The entire 6-operator voice needs only a few dozen multiplies and BRAM lookups per sample. At a 200MHz fabric clock with 48kHz audio, that's 4,167 cycles per sample period — shared across all voices in a time-multiplexed pipeline. At ~28 cycles per 6-op FM voice, 128 simultaneous FM voices fit within the cycle budget with headroom for post-mix effects. See the Clock Architecture & Voice Budget section for the full derivation.

DX7 Patch Compatibility

The FM engine is designed to be DX7 sysex-compatible — DX7 .syx patch banks can be imported via AntOS and converted to Ant64 FM patch format. This gives instant access to the vast library of DX7 patches: electric pianos, basses, bells, mallets, brass, strings.


Global Effects Chain (post-mix)

Running after all 128–256 voices are summed:

BBD Chorus (M-86 / Juno authentic)

  • Circular delay buffer: 2048 samples (~42 ms at 48 kHz)
  • Two modulated read taps (one per stereo channel)
  • Modulation LFO: triangle, rate/depth configurable
  • Mode I (subtle) and Mode II (deep) matching Alpha Juno BBD character
  • Sub-sample linear interpolation to prevent zipper noise
  • Additional modes: ensemble (3 taps), flanger (short delay + feedback), rotary
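
A modulated tap with sub-sample interpolation looks like this in software (a C sketch of one chorus channel; the Q8 fractional delay format is an illustrative choice):

```c
#include <assert.h>
#include <stdint.h>

/* Read one chorus tap from a 2048-sample ring buffer at a fractional
   delay behind the write position. delay_q8 is the delay in samples
   with 8 fractional bits; the LFO sweeps this value each sample. */
static int16_t chorus_tap(const int16_t ring[2048], uint32_t write_pos,
                          uint32_t delay_q8) {
    uint32_t whole = delay_q8 >> 8;
    uint32_t frac  = delay_q8 & 0xFFu;
    uint32_t i0 = (write_pos - whole)     & 2047u;   /* newer sample     */
    uint32_t i1 = (write_pos - whole - 1) & 2047u;   /* one sample older */
    int32_t s0 = ring[i0], s1 = ring[i1];
    /* linear interpolation toward the older sample as frac grows */
    return (int16_t)(s0 + (((s1 - s0) * (int32_t)frac) >> 8));
}
```

The interpolation is what prevents zipper noise: as the LFO sweeps the delay, the output glides between adjacent samples instead of jumping.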

Reverb

  • Algorithmic: Schroeder/FDN hybrid (8 delay lines in BRAM)
  • Types: room, hall, plate, spring (spring particularly useful for guitar/analog character)
  • Pre-delay: 0–250 ms

Delay

  • Stereo delay, up to 2 seconds
  • Tap tempo sync, ping-pong mode
  • High-frequency damping per feedback tap (analog tape character)

Distortion / Saturation

  • Soft clip (tanh lookup) and hard clip stages
  • Bitcrusher (sample rate reduction + bit depth reduction)
  • Useful for lo-fi / rave character on individual voices or globally

Real-Time Sound FX Engine

The Ant64 is a home computer as well as a synthesizer. Games, demos, AntOS system events, and applications all need real-time sound effects — not music, but responsive, low-latency audio events triggered by code. The SFX engine is a dedicated subsystem within FireStorm, separate from the music voice pool, providing guaranteed voice availability for non-musical audio regardless of what the music engine is doing.

SFX Voice Pool

A reserved partition of FireStorm voices dedicated to SFX — not shared with music voices. Default allocation: 32 SFX voices (drawn from the total FireStorm budget, leaving 96–224 for music depending on model). Configurable at boot — a game might want 48 SFX voices; a pure music application might release them all to music.

  FireStorm total voice budget (200MHz, 48kHz):
  ┌─────────────────────────────┬──────────────────────────────┐
  │  Music voices (default)     │  SFX voices (default)        │
  │  96 VA + 48 FM + 64 sample  │  32 voices (any engine type) │
  │  = 208 music voices         │  = reserved, always available│
  └─────────────────────────────┴──────────────────────────────┘
  Partition is configurable — SFX pool size set at application launch.

SFX voices use the same DSP engines (VA, sample, FM) as music voices — any synthesis type is available for sound effects, not just sample playback. The distinction is ownership and priority: music voices are managed by the sequencer and workstation app; SFX voices are managed by the SFX API.

Priority System

When all 32 SFX voices are in use and a new SFX is triggered, the priority system decides which voice to steal:

Priority level  Description                                      Stealing behaviour
Critical        System sounds, UI feedback                       Never stolen — reserved slots
High            Important game events (player death, explosion)  Steals from Low first
Normal          General SFX (footsteps, impacts, pickups)        Default level
Low             Ambient, background, non-essential               Stolen first

Oldest-first stealing within a priority level — the voice that has been playing longest is the one that gets cut when a new higher-priority sound needs a slot.
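
The stealing rule — lowest priority first, oldest within a level, Critical never touched — is a simple scan (a C sketch; the struct layout is illustrative):

```c
#include <assert.h>
#include <stdint.h>

enum { PRI_LOW = 0, PRI_NORMAL = 1, PRI_HIGH = 2, PRI_CRITICAL = 3 };

typedef struct {
    uint8_t  priority;   /* one of the PRI_* levels        */
    uint32_t age;        /* samples since the voice started */
} SfxVoice;

/* Pick the voice to steal: lowest priority first, oldest within that
   level. Returns the index, or -1 if every voice is Critical. */
static int pick_steal(const SfxVoice *v, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (v[i].priority == PRI_CRITICAL) continue;   /* never stolen */
        if (best < 0 ||
            v[i].priority < v[best].priority ||
            (v[i].priority == v[best].priority && v[i].age > v[best].age))
            best = i;
    }
    return best;
}
```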

Procedural SFX Generation (SFXR-style)

SFX do not need to be pre-recorded samples. Procedural generation from parameterised waveforms covers the entire vocabulary of classic game and system audio — and does it in a few bytes of patch data rather than kilobytes of sample data.

Inspired by DrPetter's SFXR (2007) — the tool that established the vocabulary of indie game audio — the Ant64 SFX engine generates sounds from a compact parameter set evaluated in real time on FireStorm:

  SFX Patch (compact binary, ~32 bytes):
  ┌─────────────────────────────────────────────────────────────┐
  │  Waveform type:    square / saw / sine / noise / triangle   │
  │  Base frequency:   Hz (or note)                             │
  │  Frequency sweep:  Hz/sec (positive = rising, neg = falling)│
  │  Frequency delta:  acceleration on sweep (exponential feel) │
  │                                                             │
  │  Amplitude envelope:                                        │
  │    Attack:   ms                                             │
  │    Sustain:  ms                                             │
  │    Punch:    0–1 (instant volume spike at note on)          │
  │    Decay:    ms                                             │
  │                                                             │
  │  Duty cycle (square wave):  0–1, with sweep                 │
  │  Vibrato:   rate + depth                                    │
  │  Arpeggiate: frequency multiplier + speed (chip tune jumps) │
  │                                                             │
  │  Low-pass filter: cutoff + resonance + cutoff sweep         │
  │  High-pass filter: cutoff + sweep                           │
  │                                                             │
  │  Phaser: offset + sweep + feedback                          │
  │  Retro noise: bit depth + sample rate reduction             │
  └─────────────────────────────────────────────────────────────┘
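Evaluated per sample, the patch above is only a few arithmetic operations. A floating-point reference sketch of the amplitude envelope and swept frequency — the FireStorm implementation would be fixed-point, and all names here are illustrative:

```cpp
#include <cassert>
#include <cmath>

struct SfxEnv { float attack_ms, sustain_ms, punch, decay_ms; };

// Amplitude at time t (ms): linear attack to 1.0; during sustain, a
// "punch" spike that starts at (1 + punch) and fades back to 1.0;
// then linear decay to silence.
float sfx_amplitude(const SfxEnv& e, float t_ms) {
    if (t_ms < e.attack_ms)
        return t_ms / e.attack_ms;
    t_ms -= e.attack_ms;
    if (t_ms < e.sustain_ms) {
        float s = 1.0f - t_ms / e.sustain_ms;   // 1 → 0 across sustain
        return 1.0f + e.punch * s;              // punch spike fades out
    }
    t_ms -= e.sustain_ms;
    if (t_ms < e.decay_ms)
        return 1.0f - t_ms / e.decay_ms;
    return 0.0f;
}

// Frequency sweep with acceleration ("delta"): closed form of
// integrating f' = sweep + delta * t, clamped at 0 Hz.
float sweep_freq(float base_hz, float sweep_hz_per_s, float delta, float t_s) {
    float f = base_hz + sweep_hz_per_s * t_s + 0.5f * delta * t_s * t_s;
    return f > 0.0f ? f : 0.0f;
}
```

A falling-saw "laser" is then just sweep_freq with a negative sweep driving the phase accumulator, with sfx_amplitude as the per-sample gain.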

Built-in SFX archetypes (one-touch generation, randomisable):

Archetype Parameters tuned for Example
Coin / pickup Rising square freq sweep, short decay Mario coin, item collect
Laser / shoot Falling saw sweep, fast decay Retro shoot-em-up shot
Explosion Noise + low-pass, long decay, punch Any explosion
Jump Rising freq sweep, medium decay Platform game jump
Power-up Arpeggiated rising sequence Level complete, item upgrade
Hit / hurt Noise burst, short, pitch drop Damage received
Select / blip Short square tone, fast attack/decay Menu navigation
Zap Noise + frequency wobble, medium decay Electric / magic effect
Rumble Sub-bass noise, long sustain Earthquake, engine
Ping Sine, slow decay Sonar, notification
Alert Two-tone alternating, repeating Alarm, warning
Ambient hum Sine + slight vibrato, sustained Engine, machinery

Each archetype has randomisable parameters — press RANDOMISE to get a variant in the same family. This is the SFXR workflow: generate, audition, adjust, accept.

SFX API (AntOS Scripting Bindings + Big Core C++ API)

From AntOS scripts / games (little core):

-- Play a named SFX from the library
sfx.play("coin_pickup")

-- Play with parameter overrides
sfx.play("laser", { pitch = 880, sweep = -200, volume = 0.8 })

-- Play a procedurally generated SFX from a patch struct
sfx.play_patch(my_patch)

-- Trigger at a specific FireStorm voice (bypass priority system)
sfx.play_voice(14, "explosion")

-- Play with spatial position (stereo pan derived from position)
sfx.play_spatial("footstep", { x = 0.3, distance = 1.0 })

-- Stop all SFX of a given name
sfx.stop("ambient_hum")

-- Set global SFX volume (independent of music volume)
sfx.set_volume(0.7)

From big core C++ (games / bare-metal applications):

// Immediate trigger — lowest latency path, direct QSPI write to FireStorm
SFX::play("coin_pickup");

// Parameterised
SFX::play("laser", SFXParams{ .pitch = 880.0f, .sweep = -200.0f });

// Procedural patch
SFXPatch patch = SFXPresets::explosion();
patch.decay_ms = 800;
SFX::play(patch);

// Spatial (2D game — x position maps to stereo pan)
SFX::play_spatial("footstep", Vec2{player.x, player.y});

The big core path writes directly to FireStorm voice registers via QSPI — the lowest possible latency, no OS involvement. The little core path goes through AntOS IPC but is still sub-millisecond for a register write sequence.

SFX Library in DBFS

SFX patches are stored in DBFS as compact binary structs (~64 bytes including the 16-byte name) — the same storage system used for music patches. A full library of 256 SFX patches occupies around 16KB. The library ships with a default set covering the archetypes above and is fully replaceable by the user.

  DBFS SFX entry (~64 bytes):
  ├─ name:       char[16]   "coin_pickup\0"
  ├─ waveform:   uint8      WAVE_SQUARE
  ├─ base_freq:  float32    523.25 (C5)
  ├─ envelope:   4× float32 attack/sustain/punch/decay
  ├─ sweep:      float32    freq sweep rate
  ├─ filter:     4× float32 lp_cutoff/lp_res/hp_cutoff/sweep
  ├─ arp:        2× float32 multiplier/speed
  └─ flags:      uint8      retro_noise | phaser | vibrato
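Packed exactly as listed, the entry maps onto a C++ struct along these lines (a sketch; `#pragma pack` keeps the on-disk layout byte-exact — the listed fields total 66 bytes packed):

```cpp
#include <cassert>
#include <cstdint>

#pragma pack(push, 1)
struct SfxDbfsEntry {
    char    name[16];      // "coin_pickup\0"
    uint8_t waveform;      // WAVE_SQUARE, WAVE_SAW, ...
    float   base_freq;     // Hz, e.g. 523.25f (C5)
    float   envelope[4];   // attack / sustain / punch / decay
    float   sweep;         // freq sweep rate
    float   filter[4];     // lp_cutoff / lp_res / hp_cutoff / sweep
    float   arp[2];        // multiplier / speed
    uint8_t flags;         // retro_noise | phaser | vibrato bits
};
#pragma pack(pop)

// 16 + 1 + 4 + 16 + 4 + 16 + 8 + 1 = 66 bytes on disk
static_assert(sizeof(SfxDbfsEntry) == 66, "on-disk layout changed");
```

The float fields are a storage convenience only — they would be converted to the fixed-point formats of the signal path at patch-load time, keeping the no-floats rule intact inside FireStorm.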

Optional Analog Output Stage

A small analog board between the DAC and the line output can add genuine analog character without compromising digital precision:

Option A — Passive (simplest):

  • Output transformer for warmth and harmonic saturation
  • No active components in signal path

Option B — Active character stage:

  • Op-amp saturation stage (TL072 or similar, run warm)
  • Single-pole analog LP filter for gentle HF rolloff (removes any DAC artifacts)
  • Drives to line level (phono out) and internal speakers

Option C — Analog VCF (maximum authenticity):

  • CEM3320 / SSI2144 (modern reissue of the Prophet-5 filter chip) or
  • Discrete transistor ladder (Moog style)
  • DAC output → analog VCF → line out
  • Controlled via a dedicated CV output (DAC-driven CV from FireStorm)
  • This gives a genuine analog filter stage identical to a Prophet-5 or Minimoog
  • Can be bypassed digitally for clean output

Option C is the most ambitious but gives the Ant64 something the Waldorf Quantum doesn't have: a real, classic analog filter chip in the signal path.


Environmental / Acoustic Post-Processing

Any audio source on the Ant64 — music voices, SFX, MOD player, live input, speech synthesis — can be routed through an environmental processing preset that simulates the acoustic character of a physical space or transmission medium. Underwater, cavern, large hall, metal pipe, telephone, outer space — the processing transforms the dry sound into something that belongs in that environment.

This is distinct from the global FX chain (which applies to the music mix). Environmental processing operates as insert or send buses on individual voices or voice groups, and as a global environment applied to the entire output mix. Multiple environments can run simultaneously — SFX voices in one space, music voices in another.

Implementation on FireStorm

Each environment preset is a configuration of existing FireStorm DSP blocks — no new hardware is needed. The blocks are already present: FDN reverb, parametric EQ, ladder filter, BBD chorus, distortion/saturation, delay, bitcrusher. An environment is a named set of parameters applied to these blocks.

  Any audio source
        │
        ├──→  [EQ curve]           ← shape the frequency response
        │
        ├──→  [Modulation]         ← pitch wobble / tremolo / chorus
        │
        ├──→  [FDN Reverb]         ← room size, decay, diffusion, damping
        │
        ├──→  [Delay]              ← pre-delay, echo, flutter
        │
        ├──→  [Noise floor]        ← add ambient background noise
        │
        └──→  Processed output

All blocks are in the existing FireStorm DSP pipeline — applying an environment preset is a register write from the big core or AntOS, taking effect within one sample period. Crossfading between environments (smooth transition as a character moves from a room into a corridor) is a linear parameter interpolation over a configurable number of bars or seconds.
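Since an environment is just a named parameter set, the crossfade is per-parameter linear interpolation stepped at the control rate. A sketch with a hypothetical three-parameter preset (the real preset carries the full EQ/reverb/delay/modulation set):

```cpp
#include <cassert>
#include <cmath>

struct EnvParams {          // a few representative parameters
    float reverb_decay_s;
    float lp_cutoff_hz;
    float wet_mix;          // 0..1
};

// Linear interpolation between two presets, t in [0, 1].
EnvParams env_lerp(const EnvParams& a, const EnvParams& b, float t) {
    auto mix = [t](float x, float y) { return x + (y - x) * t; };
    return { mix(a.reverb_decay_s, b.reverb_decay_s),
             mix(a.lp_cutoff_hz,   b.lp_cutoff_hz),
             mix(a.wet_mix,        b.wet_mix) };
}

// Per-tick crossfade driver: fade_s seconds at tick_hz control rate,
// clamped once the fade completes.
EnvParams env_crossfade_step(const EnvParams& from, const EnvParams& to,
                             int tick, float fade_s, float tick_hz) {
    float t = tick / (fade_s * tick_hz);
    return env_lerp(from, to, t > 1.0f ? 1.0f : t);
}
```

Interpolated values are written to the FireStorm DSP block registers each tick, giving the smooth room-to-corridor transition described above.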

Environment Presets

Underwater

The defining acoustic characteristics of underwater audio: dramatic high-frequency absorption (water absorbs high frequencies rapidly), pressure-induced pitch variation, soft low-frequency resonance, and the physical sensation of sound transmitted through a dense medium.

  Processing chain:
  ├─ Low-pass filter:  aggressive, -24dB/oct at 600–900Hz
  │                   cutoff slowly wobbles ±50Hz at 0.3Hz (pressure variation)
  ├─ Resonance:        mild peak at 400Hz (water column resonance character)
  ├─ Chorus:          2-tap, slow rate (0.2Hz), moderate depth — water diffusion
  ├─ Pitch modulation: ±4 cents at 0.15Hz (density-of-medium effect)
  ├─ Reverb:          medium decay (1.2s), high diffusion, heavily damped highs
  │                   pre-delay 8ms — kept short: sound travels ~4× faster in water
  ├─ Volume:          -4dB overall — water absorbs energy
  └─ Optional:        broadband noise floor at -48dB — bubbles, water movement

Distinctive and immediately recognisable. Works on any source: muffled underwater music, distant underwater explosions, speech that sounds like it's heard through a pool wall.


Cavern / Cave

Stone surfaces reflect sound with moderate HF absorption. Long reverb tails, strong early reflections from close walls, low-frequency resonance in the cave body, potential flutter echo between parallel surfaces.

  Processing chain:
  ├─ EQ:              -3dB shelf above 6kHz (stone absorbs some high frequencies)
  │                   +2dB at 200–400Hz (room resonance / low-frequency build)
  ├─ Early reflections: 4–6 discrete delays at 15–60ms, -6 to -18dB
  │                     simulating close stone walls
  ├─ Reverb:          long decay (2.5–4s), medium diffusion, stone character
  │                   decay time varies with cavern size preset
  ├─ Flutter echo:    optional — delay at ~80ms with feedback 0.5–0.7 for
  │                   parallel wall flutter (narrow canyon feel)
  ├─ Pre-delay:       20–40ms (distance to nearest wall)
  └─ Sub-bass boost:  +3dB below 80Hz — caves resonate at low frequencies

Variants: small cave (tighter reflections, shorter decay), large cavern (longer pre-delay, 4–6s decay), ice cave (brighter reflections, less HF absorption), lava tube (more low-end, slight distortion character).


Large Hall / Cathedral

The classic reverberant space. Long pre-delay (distance to first reflection), very long decay, high diffusion, wide stereo spread. The sound of music meant to fill a large resonant space.

  Processing chain:
  ├─ EQ:              gentle air boost (+2dB at 10kHz) — hall brightness
  │                   slight low-mid cut (-1.5dB at 300Hz) — reduce muddiness
  ├─ Pre-delay:       40–80ms (distance to first wall in a large hall)
  ├─ Reverb:          very long decay (3–8s), very high diffusion
  │                   early reflections at 40–120ms
  ├─ Stereo spread:   maximum — reverb tail fills the full stereo field
  └─ Late reverb:     gradual HF rolloff in tail (air absorption over distance)

Small Room / Studio

Close, intimate acoustic space. Short reverb, audible early reflections, relatively dry compared to hall. The sound of a padded room or recording booth.

  Processing chain:
  ├─ EQ:              slight boxiness (+1dB at 400Hz — close wall resonance)
  ├─ Early reflections: 4 reflections at 8–25ms, -4 to -10dB
  ├─ Reverb:          short decay (0.3–0.8s), low diffusion
  └─ Pre-delay:       2–8ms

Metal Pipe / Tunnel

Resonant cylindrical geometry creates strong modal resonances — specific frequencies ring loudly while others are suppressed. Flutter echo between parallel surfaces. The distinctive metallic coloration of sounds heard through a pipe or ventilation shaft.

  Processing chain:
  ├─ Resonant EQ:     sharp peaks at pipe resonant frequencies
  │                   f_n = n × c / (2L) where L = pipe length, c = 344m/s
  │                   Example: 5m pipe → resonances at 34Hz, 68Hz, 103Hz...
  │                   Implemented as 4–6 narrow bandpass peaks in EQ
  ├─ HF cut:          -18dB above 3kHz (pipe walls absorb high frequencies)
  ├─ Flutter echo:    delay at ~30ms (pipe diameter), feedback 0.6–0.75
  │                   creates the metallic ringing character
  ├─ Reverb:          short, low diffusion (cylindrical geometry = coherent echo)
  └─ Distortion:      mild saturation (metallic surface coloration)
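The modal series above follows directly from f_n = n·c/(2L); a quick numeric check of the 5m example:

```cpp
#include <cassert>
#include <cmath>

// n-th longitudinal resonance of an open pipe of length L metres,
// speed of sound c in m/s (344 m/s in air at ~20°C).
float pipe_mode_hz(int n, float length_m, float c = 344.0f) {
    return n * c / (2.0f * length_m);
}
```

For L = 5m this gives 34.4Hz, 68.8Hz, 103.2Hz... — the values the resonant EQ peaks are placed at.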

Telephone / Radio Transmission

Bandpass filtering to match the frequency response of telephone audio (300–3400Hz) or AM radio. Adds noise, mild compression, slight saturation. Immediately recognisable as "heard over a communication channel."

  Telephone:
  ├─ Bandpass:        300Hz–3,400Hz (ITU-T G.711 telephone band)
  ├─ Distortion:      mild saturation (analogue circuit character)
  ├─ Noise:           white noise at -50dB (line noise)
  ├─ Compression:     heavy (3:1, fast attack) — telephone dynamic range limiting
  └─ Volume:          -2dB overall

  AM Radio:
  ├─ Bandpass:        100Hz–5,000Hz (AM broadcast bandwidth)
  ├─ Noise:           pink noise at -40dB + occasional crackle bursts
  ├─ Distortion:      moderate saturation (AM demodulator character)
  └─ Slight flutter:  0.5Hz pitch modulation at ±2 cents (carrier instability)

  Walkie-talkie / CB:
  ├─ Bandpass:        400Hz–2,800Hz (narrower than telephone)
  ├─ Hard clipping:   aggressive — squelch character
  ├─ White noise:     -35dB (radio static)
  └─ Gate:            noise gate opens on signal (squelch simulation)

Outer Space (Sci-Fi Convention)

Physically, space is silent — no medium to carry sound. The sci-fi convention is a large, reverberant, pristine space with very slow decay and no air absorption. The sound of something massive happening in a vacuum, as heard by the audience rather than physics.

  Processing chain:
  ├─ EQ:              flat — no air absorption, all frequencies preserved equally
  ├─ Reverb:          very long decay (6–15s), very high diffusion
  │                   no HF rolloff in the tail (no air = no absorption)
  ├─ Pre-delay:       100–200ms (great distance, vastness of space)
  ├─ Stereo:          extreme width — a 180° spatial impression
  ├─ Pitch shift:     very slight down (-5 cents) — gravitational scale suggestion
  └─ No noise floor:  space is completely silent between events

Variants: close explosion (short pre-delay, massive low-end boost), distant signal (more pre-delay, high-frequency roll-off simulating transmission distance).


Custom Environment

All environment parameters are exposed via the workstation app and AntOS API. Any combination of EQ, reverb, delay, modulation, noise, distortion, and filter settings can be saved as a named custom environment preset in DBFS. Environments are small parameter structs (~128 bytes) and shareable over the gossip network.


Routing Architecture

  FireStorm audio sources:
  ┌────────────────────────────────────────────────────────────┐
  │  Music voices (VA, FM, Sample, 303)                        │
  │  SFX voices (procedural, sample-based)                     │
  │  MOD player channels                                       │
  │  Live audio input (ADC)                                    │
  │  SAM speech synthesis (from Pulse via PIO)                 │
  └──────────────────────┬─────────────────────────────────────┘
                         │
              ┌──────────▼──────────────────────────┐
              │  Environment Bus Router             │
              │                                     │
              │  Per-voice or per-group assignment: │
              │  ├─ Music → Environment A           │
              │  ├─ SFX   → Environment B           │
              │  ├─ Voice 1–8 → Environment C       │
              │  └─ ADC input → Environment D       │
              └──────────┬──────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
  [Env A: Hall]   [Env B: Cavern]  [Env C: Underwater]
         │               │               │
         └───────────────┴───────────────┘
                         │
                  [Master mix bus]
                         │
                  [Global FX chain]
                         │
                  [WM8958 output]

Up to 4 simultaneous environments at full polyphony — constrained only by the FireStorm cycle budget. Each active environment is a separate FDN reverb instance plus EQ and modulation chain. At 200MHz fabric, 4 × 390 cycles (effects budget) = 1,560 cycles, leaving 2,607 cycles for voice DSP — still comfortable at 64–96 VA voices per environment.

API

Big core C++ (workstation app, bare-metal):

// Apply environment to a voice group
Audio::setEnvironment(GROUP_SFX,   ENV_CAVERN);
Audio::setEnvironment(GROUP_MUSIC, ENV_HALL);
Audio::setEnvironment(GROUP_ADC,   ENV_UNDERWATER);

// Crossfade to a new environment over 2 seconds
Audio::crossfadeEnvironment(GROUP_MUSIC, ENV_TUNNEL, 2.0f);

// Custom environment from a preset struct
EnvironmentPreset my_env = EnvironmentPresets::cavern_large();
my_env.reverb_decay_s = 5.0f;
Audio::setCustomEnvironment(GROUP_SFX, my_env);

AntOS scripting bindings:

-- Set environment for a voice group
audio.set_env("sfx", "cavern")
audio.set_env("music", "hall")

-- Smooth crossfade
audio.crossfade_env("music", "underwater", 3.0)

-- Load custom environment from DBFS
local env = dbfs.load_env("my_cave_preset")
audio.set_custom_env("all", env)

The TB-303 is a special case that warrants its own dedicated engine mode rather than being shoehorned into the general voice architecture. What makes it distinctive is not just the filter — it is the complete interaction between the oscillator, the dual envelope system, the accent circuit, the slide/portamento, and the step sequencer. Get any one of these wrong and it stops sounding like acid.

Why the 303 is Hard to Clone

Most 303 clones fail because they treat it as "sawtooth + 18dB filter." The real story:

  • The filter is actually 4-pole but with interacting (non-buffered) poles, giving an effective ~18 dB/oct rolloff with a distinctive resonance character and an unusual 10 Hz peak in the resonance feedback circuit (Tim Stinchcombe's 2009 analysis)
  • There are two independent envelope generators — not one
  • The accent circuit is an RC-based sweep that accumulates across consecutive accented notes — this is what gives the "increasingly distressed animal cry" when accents repeat
  • The slide uses a fixed time (not fixed rate) — so the pitch change slows down as the interval shrinks
  • The square wave is derived from the sawtooth via single-transistor waveshaping, giving it a subtly different character to a clean square wave
  • Overdrive of the output is intrinsic to the acid sound — the filter output into a slightly clipping output stage adds harmonics essential to the genre

TB-303 Engine Architecture (FireStorm)

   Step Sequencer (Pulse RP2350)
   ├─ Note pitch
   ├─ Gate (normal / extended for slide)
   ├─ Slide flag → portamento circuit
   └─ Accent flag → accent circuit

   VCO
   ├─ Sawtooth (phase accumulator, U32)
   └─ Square (derived: sawtooth → single-pole HP waveshaper, not clean square)

   VEG (Volume Envelope Generator)
   ├─ Attack: sharp (near-instant), fixed
   ├─ Decay: exponential, FIXED long time (~200ms typical)
   └─ Drives VCA only

   MEG (Main Envelope Generator)
   ├─ Attack: sharp, fixed
   ├─ Decay: variable (Decay pot) on normal notes
   ├─ Decay: FORCED to minimum on accented notes
   └─ Drives VCF cutoff via Env Mod amount

   Accent Circuit  ← triggered only on accented notes
   ├─ Takes MEG output through RC smoothing network (47kΩ + 33nF)
   ├─ Adds smoothed voltage to VCA level (louder note)
   ├─ Adds smoothed voltage to VCF cutoff (brighter note)
   └─ CRITICAL: RC does not fully discharge between consecutive accents
      → Repeated accents cause progressively higher filter peaks
      → This stacking behaviour IS the acid sound

   Portamento / Slide Circuit
   ├─ Fixed slide TIME (not rate) — glide always takes same duration
   ├─ Activated when current note has slide flag set
   └─ Stays active until the note after the slid-to note begins

   VCF — Diode Ladder (303-specific model)
   ├─ 4-pole with non-buffered interacting stages (unlike Moog buffered ladder)
   ├─ Deliberately mismatched capacitor on first pole (C18 ≈ half value)
      → Prevents self-oscillation; overdrive occurs instead
   ├─ Resonance → overdrive, not self-osc — implement as tanh saturation at
      resonance feedback path with clipping asymptote
   ├─ High-pass filter in resonance feedback loop (unusual 10 Hz peak)
   └─ Additional HP filtering between filter output and VCA (bass rolloff)

   VCA → Output Saturation Stage
   └─ Soft-clip (tanh) at output — essential for acid harmonic character
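The filter block above can be sketched as a floating-point reference model — the FireStorm version would be fixed-point with a BRAM tanh lookup, and the coefficients here are illustrative rather than measured:

```cpp
#include <cassert>
#include <cmath>

// Floating-point reference model of the 303-style ladder sketch above.
// Four cascaded one-pole stages; the first runs at ~2x cutoff (C18 at
// roughly half value), which keeps the loop from self-oscillating —
// resonance overdrives into the tanh instead.
struct Ladder303 {
    float s[4] = {0, 0, 0, 0};   // one-pole stage states
    float g    = 0.1f;           // base cutoff coefficient (0..~0.8)
    float res  = 3.0f;           // resonance feedback gain

    float process(float in) {
        // tanh-saturated resonance feedback path
        float x = in - std::tanh(res * s[3]);
        s[0] += 2.0f * g * (x    - s[0]);   // mismatched first pole, ~2x fc
        s[1] +=        g * (s[0] - s[1]);
        s[2] +=        g * (s[1] - s[2]);
        s[3] +=        g * (s[2] - s[3]);
        return std::tanh(1.5f * s[3]);      // output soft-clip stage
    }
};
```

Driving the input harder pushes both the resonance path and the output stage into saturation — the overdrive-instead-of-self-oscillation behaviour the diagram calls out.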

Fixed-Point Implementation Notes

The accent accumulation is the trickiest part to get right in fixed-point:

// Conceptual fixed-point accent RC model (S1.30 state, 64-bit intermediates)
// Each sample tick:
accent_voltage = (int32_t)(((int64_t)accent_voltage * decay_coeff) >> 30);
if (accent_triggered) {
    // add smoothed MEG contribution
    accent_voltage += (int32_t)(((int64_t)meg_output * accent_amount) >> 30);
}
vcf_cutoff += accent_voltage >> accent_depth_shift;                          // brighter note
vca_level  += (int32_t)(((int64_t)accent_voltage * vca_accent_scale) >> 16); // louder note

The RC time constant maps to a decay_coeff below but near 1.0 — approximately 1 - (1 / (sample_rate × RC_time)). At 48 kHz with RC = 47kΩ × 33nF ≈ 1.55ms, that gives decay_coeff ≈ 0.9866 in S1.30 fixed-point. Coefficients near 1.0 need sufficient precision — use S2.30 state or a 64-bit (S1.62) accumulator so repeated multiplies do not erode the accent voltage through rounding.
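A self-contained check of the coefficient arithmetic (Q30 state, 64-bit intermediates; function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Q30 coefficient for a one-pole RC decay: v *= coeff each sample,
// with coeff ≈ 1 - 1/(fs * RC).
int32_t rc_decay_coeff_q30(double sample_rate, double rc_seconds) {
    double coeff = 1.0 - 1.0 / (sample_rate * rc_seconds);
    return (int32_t)std::lround(coeff * (double)(1 << 30));
}

// One decay step: Q30 * Q30 multiply via a 64-bit intermediate.
int32_t rc_step(int32_t v_q30, int32_t coeff_q30) {
    return (int32_t)(((int64_t)v_q30 * coeff_q30) >> 30);
}
```

After fs × RC ≈ 74 samples (one time constant), the state should sit near 1/e ≈ 0.37 of its starting value — a quick way to verify the fixed-point model tracks the analogue RC.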

The mismatched first pole in the diode ladder is modelled by giving the first one-pole stage a slightly different frequency coefficient — approximately 0.5× the capacitance means ~2× the cutoff frequency for that stage. This is what prevents the feedback phase from reaching 360° and suppresses self-oscillation.

303 Step Sequencer

The sequencer is as much part of the 303 sound as the synth itself. The workstation app drives the 303 engine via the Pulse sequencer:

  • 16 steps (extendable to 32)
  • Per-step: pitch (3 octave range), gate length (normal / extended), slide, accent
  • Tempo sync to MIDI clock or internal BPM
  • Pattern storage in DBFS as compact binary (16 bytes per pattern)
  • Live step entry mode (authentic 303-style programming)
  • Pattern chain and randomise modes

303 vs General Voice Engine

The 303 engine runs as a dedicated single-voice mode in FireStorm — it does not use the general polyphonic voice architecture. When activated it takes one voice slot and runs the 303-specific DSP pipeline with the dual envelope and accent circuit. Multiple 303 engines can run simultaneously (one per voice slot) for multi-track acid patterns.


Reference Sounds / Targets

Sound Origin Key requirements
TB-303 acid bass Roland TB-303 Diode ladder + dual ENV + accent accumulation + slide + output overdrive
M-86 / Hoover Alpha Juno patch #86 "What The?" PWM-sawtooth, fast LFO on PWM, BBD chorus, dropping pitch env
Juno pad Juno-60 / 106 Sawtooth + sub, 4-pole LP, BBD chorus
Prophet brass Prophet-5 2× saw, LP filter with env attack, slight detune
DX7 bass/piano Yamaha DX7 6-op FM, algorithm 5 (piano), algorithm 14 (bass)
Minimoog lead Minimoog Model D 3× oscillator unison, ladder filter self-osc
Wavetable sweep PPG Wave / Waldorf WT position swept by envelope, resonant LP
Karplus string Physical modeling Short exciter noise burst, delay line damping
Granular cloud Any granular Live input, position scatter, long grains

Fixed-Point Arithmetic Summary

All signal-path arithmetic uses fixed-point. No floating-point units instantiated in FireStorm.

Block Format Notes
Phase accumulator U32 Natural overflow = waveform wrap
Audio samples S24 144 dB dynamic range, well beyond DAC
Mix bus accumulation S32 Headroom for 128+ voices summed without saturation
Filter state variables S32 Prevents overflow in resonant feedback path
Filter tanh S1.15 lookup 12-bit address, BRAM, pre-computed
Envelope levels U24 Exponential segments via multiply-accumulate
LFO S16 Sub-audio, no precision issue
FM operator levels S24.8 Extra fractional bits for smooth FM ratios
Wavetable samples S16 Storage; interpolated to S24 in DSP path
Modulation amounts S16 Signed, allows bipolar modulation

Integration with AntOS

  • Pulse RP2350: receives MIDI, runs audio sequencing, jog dial input, joypad input, retro SAM-style speech synthesis; sends sequencer events and register writes to FireStorm over QSPI (control plane); streams bulk data including rendered speech PCM to FireStorm over the 4-line PIO fast transfer bus; hi-speed UART to SG2000; acts as SPI slave to DeMon. Note: Pulse has no audio codec — audio I/O is handled directly by FireStorm
  • DeMon RP2350: system supervisor; JTAG to SG2000 and FireStorm for debug/programming; QSPI to FireStorm for boot and control; SPI slave to SG2000; hi-speed UART to SG2000 for debug streams and boot messaging; SPI master to Pulse
  • FireStorm (FPGA): pure sample-rate DSP, register-mapped voice parameters, no OS; contains the FireStorm EE execution engine alongside audio DSP and rasterizer; WM8958/WM8960 audio codec connected directly — handles all audio output routing: 2× internal stereo speakers, phono (line) out, HDMI audio embed, optical out (S/PDIF / TOSLINK); plus stereo audio input for live sampling
  • SG2000 big core (C906 @ 1GHz, bare metal): the Music Workstation App — a full native C++ application compiled for the big core. No OS. Direct hardware access. Dear ImGui UI rendered to FireStorm. Controls all voice parameters, patch management, sequencer, and the visual editing pages (A, D, W, H, S, R, E, F, M). This is the primary user-facing application — think of it as the instrument's firmware.
  • AntOS (little core, C906 @ 700MHz): system OS — shell, networking, file management, debug server/client, MIDI routing config, gossip P2P. All audio system libraries are exposed with AntOS scripting bindings so scripts can query and control any aspect of the audio engine — voice parameters, patch loading, sequencer state, FFT data feeds. AntOS is not responsible for real-time audio; that belongs entirely to the big core and FireStorm.
  • DBFS: patches stored as compact binary structs (~256 bytes per patch), banks as BLOB
  • USB MIDI: handled by Pulse, exposed to AntOS as a virtual MIDI port

Exceeding the Waldorf Quantum MK2 — and Everything Else

Note: The Waldorf Quantum MK2 was discontinued in April 2025. No current production synthesizer combines the full feature set described below. The Ant64 is targeting a space that currently has no occupant.

Feature Quantum MK2 †disc. Prophet X Virus TI2 Waldorf Kyra Ant64 Target
Max voices 16 16 80 128 128–256
Synthesis paradigms 1 hybrid 1 hybrid 1 VA+WT 1 VA+WT 3 (VA · Sample · FM)
FM engine Kernel only No Yes No Full 8-op, free routing
Full S&S sample engine Granular only Yes (150GB lib) No No Yes + user samples
DX7 sysex import No No No No Yes
Live audio input Yes No No No Yes
Real-time sampling Yes (limited) No No No Yes — 4 modes
Resample own output Yes No No No Yes
Live granular input Yes (limited) No No No Yes — streaming
303 acid engine No No No No Yes (dedicated)
M-86 / Hoover authentic No No No No Yes
Real analog filter option Hybrid on-board Yes (analog LP) No No Optional SSI2144
RGB performance UI No No No No 8× RGB jog dials
Open / hackable DSP No No No No Yes — FPGA bitstream
Still in production No Yes Yes Yes Yes — Ant64S/Ant64/Ant64C
Price ~€4,800 ~€3,800 ~€3,000 ~€1,800 Multiple tiers

Physical UI — 8 Jog Dials (Pulse RP2350)

Eight endless rotary encoders with integrated push buttons are connected directly to Pulse. Each dial optionally has an RGB LED (e.g. WS2812B/SK6812) driven via a single DMA-backed data line from Pulse — all 8 LEDs chained, full strip refresh in ~240µs (192 bits at 800 kbit/s), updated at 60 Hz.

Input Model (per dial)

Each dial provides three distinct physical inputs:

Input Action
Rotate Increment / decrement current parameter (relative, no jump-on-pickup)
Push Context action: confirm / reset to default / toggle mode
Push + Rotate Fine adjust (smaller step size) or alternate parameter

With one dial designated as Shift (hold push, turn others), effective logical control count is 24 without adding hardware.


Parameter Page System

8 dials cannot cover the full synth engine in one view. A paged system is used, with the current page indicated by dial LED colour. Turning any dial while on a page instantly updates that parameter in the FireStorm voice registers via Pulse.

Page LED Colour Dial assignments
OSC Amber Osc1 pitch, Osc1 PW/WT pos, Osc2 pitch, Osc2 detune, Osc mix, Sub level, Noise level, Sync/mode
FILTER Blue Cutoff, Resonance, Filter type, Env amount, Env polarity, Key tracking, Drive, Filter routing
ENV Green Attack, Decay, Sustain, Release, ENV2 Attack, ENV2 Decay, ENV2 Sustain, ENV2 Release
LFO Purple LFO1 rate, LFO1 depth, LFO1 waveform, LFO1 dest, LFO2 rate, LFO2 depth, LFO2 waveform, LFO2 dest
FX Cyan Chorus rate, Chorus depth, Chorus mode, Reverb size, Reverb mix, Delay time, Delay feedback, Master FX mix
MOD White Mod slot select, Source, Destination, Amount, ×4 quick-assign slots
303 Red Cutoff, Resonance, Env mod, Decay, Accent, Slide time, Waveform (push=toggle), Tempo
PATCH Magenta Patch select, Bank select, Save, Compare, Voice count, Unison detune, Bend range, Portamento

Page is selected by a dedicated page button (or double-tap any dial push), or via AntOS UI.


RGB LED Behaviour

Colour = Parameter Value (on active page)

Hue sweeps across the current parameter's range as the dial is turned:

Low value  ──────────────────────────────  High value
  Blue → Cyan → Green → Yellow → Orange → Red

Brightness = how far from default. At default value: dim. At extreme: full brightness. This means a quick glance shows the "shape" of a patch across all 8 parameters.
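The mapping can be sketched as follows — the hue and brightness curves here are illustrative choices, not the shipped firmware:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// value and def (the parameter's default) are normalised to [0, 1].
// Hue sweeps 240° (blue) → 0° (red) across the range; brightness
// grows with distance from the default so untouched parameters sit dim.
Rgb param_led(float value, float def) {
    float hue    = 240.0f * (1.0f - value);                  // 240=blue .. 0=red
    float bright = 0.15f + 0.85f * std::fabs(value - def);   // dim at default
    // minimal HSV→RGB for full saturation
    float h = hue / 60.0f;
    float x = 1.0f - std::fabs(std::fmod(h, 2.0f) - 1.0f);
    float r = 0, g = 0, b = 0;
    switch ((int)h) {
        case 0: r = 1; g = x; break;    // red → yellow
        case 1: r = x; g = 1; break;    // yellow → green
        case 2: g = 1; b = x; break;    // green → cyan
        case 3: g = x; b = 1; break;    // cyan → blue
        default: b = 1; break;          // hue 240 exactly
    }
    auto q = [bright](float c) { return (uint8_t)(255.0f * bright * c); };
    return { q(r), q(g), q(b) };
}
```

Pulse writes the result straight into its LED frame buffer; one such evaluation per dial per frame is negligible work at 60 Hz.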

Breathing = Live Modulation

If a parameter is being modulated by an LFO or envelope, its LED breathes — pulsing in brightness at the modulation rate. Immediately shows what is moving without any display. At audio-rate modulation (FM), LED glows solid at modulation depth colour.

Page Colour Identity

When switching pages all 8 dials briefly flash their new page colour then settle into value-hue mode. You always know which page you are on from the tint of the LEDs.

303 Mode — Sequencer Feedback

In 303 mode the LEDs reflect the live sequencer state:

Dial RGB behaviour
Cutoff (1) Flashes briefly on each MEG envelope trigger
Resonance (2) Brightness tracks resonance value continuously
Env Mod (3) Pulses on each note gate
Decay (4) Glow duration tracks current decay time visually
Accent (5) Builds in brightness across consecutive accented steps — mirrors the RC accent accumulation in hardware. Resets when accent chain breaks.
Slide (6) Glows cyan during active portamento slide
Waveform (7) Amber = sawtooth, Blue = square
Tempo (8) Pulses white on every beat (16th note flash, brighter on beat 1)

The accent dial building in brightness across repeated accents makes the accumulation circuit visible — an immediate diagnostic and a striking performance visual.

Shift / Modifier State

  • Dial whose button is held as Shift: glows white
  • Dials with available shift-functions: glow their page colour at reduced brightness
  • Dials with no shift-function: go dark

Limit Warning

When a parameter reaches its minimum or maximum, the dial flashes white once — a tactile+visual "end stop" replacing any on-screen message.

Voice Activity (optional poly view)

In a dedicated "voice view" mode (hold page button), the 8 dials represent 8 of the active voice slots: lit = voice currently sounding, brightness = amplitude, colour = synthesis engine type (VA=amber, WT=blue, FM=yellow, granular=green, 303=red).


WS2812B / SK6812 Implementation on Pulse RP2350

Pulse GPIO (1 pin) ──► WS2812B chain ──► LED0 ──► LED1 ──► ... ──► LED7
  • RP2350 PIO state machine generates the WS2812 protocol (standard PIO program)
  • DMA transfers 8× 24-bit GRB values (192 bits) per frame — negligible CPU overhead
  • Refresh rate: 60 Hz (16.7ms frame time, well within WS2812B timing)
  • Total data per frame: 24 bytes
  • Power: ~20mA per LED at full white × 8 = 160mA max; run at 30-50% brightness for ~50-80mA total — easily supplied from the Pulse board regulator

Pulse maintains a uint8_t led_grb[8][3] buffer. The synth engine and sequencer write into this buffer; a 60 Hz timer DMA-blasts it to the LED chain. No blocking, no polling.
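
A minimal sketch of that buffer-to-wire path in C — `ws2812_pack` and `led_build_frame` are hypothetical names, and the PIO/DMA setup itself is omitted:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch of the Pulse LED frame buffer: 8 dials, bytes held
   in the G, R, B order the WS2812B expects on the wire. */
static uint8_t led_grb[8][3];   /* [dial][G, R, B] */

/* Pack one LED's G, R, B bytes into the left-justified 24-bit word a
   typical RP2350 PIO WS2812 program shifts out MSB-first. */
static inline uint32_t ws2812_pack(uint8_t g, uint8_t r, uint8_t b)
{
    return ((uint32_t)g << 24) | ((uint32_t)r << 16) | ((uint32_t)b << 8);
}

/* Build the 8-word DMA frame from the buffer — 24 bytes of payload,
   handed to the DMA channel once per 16.7 ms tick. */
static void led_build_frame(uint32_t frame[8])
{
    for (int i = 0; i < 8; i++)
        frame[i] = ws2812_pack(led_grb[i][0], led_grb[i][1], led_grb[i][2]);
}
```

The synth engine and sequencer only ever touch `led_grb`; the timer callback rebuilds and sends the frame, so writers never block on LED timing.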


MIDI Connectivity

The Ant64 has the full professional MIDI connectivity stack — both traditional and modern. All MIDI handling runs on Pulse RP2350.

DIN MIDI (5-pin) — Ant64C only

  • MIDI In — receive notes, CC, program change, sysex, clock from any hardware device
  • MIDI Out — transmit from internal sequencer to external hardware synths, drum machines, rack modules, vintage gear — anything made since 1983
  • MIDI Thru — hardwired pass-through of MIDI In signal, no latency, no CPU involvement

DIN MIDI is present only on the Ant64C (Creative Edition). The Ant64 and Ant64S have USB MIDI only. Having all three DIN ports on the Ant64C makes it a first-class citizen in a traditional hardware MIDI studio — no adaptors needed for vintage gear.

MIDI Thru is increasingly rare on modern synths. The Waldorf Quantum MK2 had no DIN MIDI at all. The Ant64C restores the full traditional MIDI port set.

USB MIDI

  • Class-compliant USB MIDI device — no driver needed on any OS
  • Simultaneously acts as MIDI host — can connect USB MIDI controllers directly (keyboards, pad controllers, wind controllers) without a computer in the chain
  • MIDI over USB alongside DIN simultaneously — computer DAW + hardware rack at the same time
  • AntOS exposes USB MIDI as a virtual port accessible from scripts and the workstation app

MIDI Feature Set (Pulse firmware)

  • Full 16-channel receive and transmit
  • MIDI clock master and slave — sync internal sequencer to external gear or DAW
  • MIDI Machine Control (MMC) — transport control from DAW
  • Sysex passthrough and sysex-based patch dump/restore
  • MPE (MIDI Polyphonic Expression) receive — per-note pitch bend, pressure, slide from MPE controllers (Roli Seaboard, Linnstrument, Expressive E Osmose)
  • MIDI Learn on all synth parameters — any knob/slider on any controller can map to any Ant64 parameter in real time
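
Clock master/slave timing reduces to one formula — MIDI clock is defined as 24 pulses per quarter note. A small sketch (helper name hypothetical):

```c
#include <stdint.h>
#include <assert.h>

/* MIDI clock runs at 24 pulses per quarter note (PPQN).
   Period of one clock pulse in microseconds for a given BPM —
   the interval a Pulse hardware timer would be programmed with. */
static inline uint32_t midi_clock_period_us(uint32_t bpm)
{
    /* one quarter note = 60,000,000 us / BPM; 24 pulses per quarter */
    return 60000000u / (bpm * 24u);
}
```

At 125 BPM this gives a 20 ms pulse interval — comfortably serviced by a timer interrupt with no audible jitter.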

Pulse Sequencer

The Pulse RP2350 runs a polyphonic multi-track sequencer alongside the MIDI and voice allocation engine. This is not a simple arpeggiator — it is a full composition tool.

Architecture

  Pulse Sequencer
  ├─ 16 tracks (any combination of types below)
  ├─ Up to 64 steps per pattern, variable step length
  ├─ Pattern chain → Song mode (patterns → arrangement)
  ├─ MIDI clock sync (master or slave)
  └─ All output routed to:
       ├─ Internal FireStorm voice engine (any paradigm)
       ├─ DIN MIDI Out (control external hardware)
       └─ USB MIDI Out (control DAW / computer)

Track Types

Track          Description
─────────────  ──────────────────────────────────────────────────────────────
Melodic        Polyphonic pitch sequence, velocity, gate length per step
Drum/Rhythm    16-slot per-step pattern, each slot → different MIDI note / voice
303 Acid       Dedicated 303 sequencer (pitch, gate, slide, accent per step)
Chord          Step-based chord sequence with voicing control
CC Automation  Records and plays back MIDI CC curves — automate any parameter
Arpeggiator    MIDI note input → arpeggiated output, multiple modes
Euclidean      Mathematical rhythm generation (hits, steps, rotation, offset)

Sequencer Features

  • Real-time recording — play notes live, capture to sequencer
  • Step entry — program steps one at a time, 303-style
  • Probability per step — 0–100% chance of triggering (generative / evolving patterns)
  • Parameter locks — per-step value overrides for any synth parameter (Elektron-style)
  • Swing / shuffle — adjustable timing offset on even steps
  • Polyrhythm — each track can have independent step count and time division
  • Pattern chain — order patterns into a song arrangement
  • Live pattern switching — seamless pattern change on bar boundary
  • MIDI thru routing — incoming MIDI notes merged with sequencer output on MIDI Out
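
The swing feature above can be sketched as a per-step timing offset — a hypothetical helper, assuming 16th steps where odd-indexed (off-beat) steps are delayed by up to half a step:

```c
#include <stdint.h>
#include <assert.h>

/* Absolute trigger time of a step, with swing applied to off-beat steps.
   swing_pct = 0 → straight timing; swing_pct = 100 → off-beats land a
   full half-step late (maximum shuffle). Names are illustrative. */
static uint32_t step_time_us(uint32_t step, uint32_t step_len_us,
                             uint32_t swing_pct)
{
    uint32_t t = step * step_len_us;
    if (step & 1)                               /* off-beat steps only  */
        t += (step_len_us * swing_pct) / 200u;  /* up to half a step    */
    return t;
}
```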

303 Track (dedicated)

As described in the 303 acid engine section — full per-step pitch, gate type (normal/extended), slide flag, accent flag. Authentic Roland-style programming workflow. Multiple 303 tracks can run simultaneously (full acid setup: bass line + lead line + chord stab from one box).


Amiga MOD Player (Pulse) + MOD Editor (SG2000 Big Core)

The MOD system follows the same editor/player split as the live coder: the player runs bare-metal on Pulse; the editor is a page in the workstation app on the SG2000 big core. The little core's role is limited to the IPC API that routes edited pattern data from the big core to Pulse.

Pulse runs the player engine — pattern sequencer, effect processor, sample mixer, Paula emulation. It reads module files from DBFS or SD card and handles all real-time playback duties bare-metal at 300MHz.

SG2000 big core (Page K in the workstation app) provides the visual editor — pattern grid, track view, instrument editor, sample browser, song arranger. The musician edits here; changes are serialised and sent to Pulse via the little core IPC.

The original ProTracker MOD player ran on a 7.09MHz Motorola 68000. Pulse runs at 300MHz — roughly 40× the headroom — making a faithful, extended MOD player trivially accommodated alongside the live code player, sequencer, MIDI, jog dials, and SAM speech synthesis.

Supported Formats

Format  Origin                           Channels  Notes
──────  ───────────────────────────────  ────────  ──────────────────────────────────────────────
.MOD    Amiga ProTracker / SoundTracker  4–8       Original Amiga format — 31-sample, 64-pattern
.XM     FastTracker 2 (DOS)              Up to 32  Extended patterns, envelopes, vibrato table
.S3M    ScreamTracker 3 (DOS)            Up to 32  Stereo panning, OPL2 channel support
.IT     Impulse Tracker (DOS)            Up to 64  Most expressive tracker format — filters, NNA

The MOD player reads module files from DBFS or the SD card. Samples are streamed from storage for large modules rather than pre-loaded entirely into RAM.

Architecture on Pulse

  DBFS / SD card
       │
       │  Module file (samples + pattern data + song order)
       ▼
  Pattern sequencer
  ├─ Song position counter (order list → pattern index)
  ├─ Pattern step counter (rows 0–63 / 0–255)
  ├─ Per-channel note/instrument/effect parsing
  └─ Tick clock (tick rate = BPM × 2/5 Hz — 50 Hz at the default 125 BPM, 6 ticks/row)
       │
       ▼
  Effect processor (per channel, per tick)
  ├─ 0xy  Arpeggio        ├─ 3xx  Tone portamento
  ├─ 1xx  Porta up        ├─ 4xy  Vibrato
  ├─ 2xx  Porta down      ├─ Axy  Volume slide
  ├─ 5xy  Porta + vol     ├─ Bxx  Pattern jump
  ├─ 6xy  Vibrato + vol   ├─ Cxx  Set volume
  ├─ 7xy  Tremolo         ├─ Dxx  Pattern break
  ├─ 9xx  Sample offset   ├─ Exx  Extended effects
  └─ Fxx  Set speed/BPM   └─ Gxx  Set global volume
       │
       ▼
  Sample mixer (fixed-point, per channel)
  ├─ Linear interpolation between samples
  ├─ Amiga-accurate Paula hardware emulation (optional)
  │  └─ Low-pass filter characteristic (RC filter on Paula output)
  ├─ Per-channel volume + panning
  └─ Stereo mix bus → FireStorm audio output
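
The linear-interpolation fetch at the top of the mixer can be sketched in fixed point — a 16.16 phase against 8-bit MOD sample data (helper name hypothetical):

```c
#include <stdint.h>
#include <assert.h>

/* Fetch one output sample with linear interpolation.
   pos_16_16 is a 16.16 fixed-point read position; smp is signed 8-bit
   MOD sample data, widened to 16-bit on the way out. */
static int16_t mod_fetch_lerp(const int8_t *smp, uint32_t pos_16_16)
{
    uint32_t idx  = pos_16_16 >> 16;             /* integer sample index   */
    uint32_t frac = pos_16_16 & 0xFFFF;          /* 16-bit fractional part */
    int32_t a = (int32_t)smp[idx]     * 256;     /* widen 8-bit → 16-bit   */
    int32_t b = (int32_t)smp[idx + 1] * 256;
    /* a + (b - a) * frac, entirely in integer arithmetic */
    return (int16_t)(a + (((b - a) * (int32_t)frac) >> 16));
}
```

Each channel advances its phase by a per-note increment every output sample; the same fetch serves all formats, since XM/S3M/IT samples are widened the same way.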

Amiga Paula Emulation

The distinctive Amiga sound is partly the DAC and partly the Paula chip's hardware low-pass filter — a simple RC filter that rounded off the harsh edges of 8-bit samples. The MOD player offers:

  • Accurate mode: Paula RC filter emulated per-channel — the authentic warm Amiga sound. Samples retain the characteristic 8-bit warmth.
  • Clean mode: no filter — full-fidelity playback. 16-bit or 24-bit samples sound as sharp as the original recording allows.
  • FireStorm mode: MOD channels routed as voice inputs to the VA engine — each channel feeds a ladder filter and BBD chorus. Amiga samples processed through fully analog-modelled circuitry. The intersection of 1985 and 2026.
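
The Paula RC filter in Accurate mode reduces to a one-pole low-pass; a fixed-point sketch (the coefficient value is illustrative, not a measured match for the Amiga circuit):

```c
#include <stdint.h>
#include <assert.h>

/* One-pole low-pass: y += (x - y) * k.  State is kept in Q15 so the
   filter retains sub-sample precision between steps; k_q15 is the
   coefficient in Q15 (32767 ≈ 1.0, i.e. no filtering). */
typedef struct { int32_t y; } rc_lp_t;

static int16_t rc_lp_step(rc_lp_t *f, int16_t x, int16_t k_q15)
{
    int32_t diff = ((int32_t)x << 15) - f->y;
    f->y += (int32_t)(((int64_t)diff * k_q15) >> 15);
    return (int16_t)(f->y >> 15);
}
```

One instance per channel gives the per-channel Accurate mode; Clean mode simply bypasses the step.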

Integration with the Sequencer and FireStorm

The MOD player is not isolated from the rest of the audio system:

  • MOD channels can be routed to any FireStorm voice slot — the sample data feeds the sample engine directly, with all VA/filter/chorus processing available on top
  • The MOD player's BPM clock can sync to the Pulse sequencer clock or MIDI clock — MOD patterns and tracker sequences run in tight synchronisation
  • MOD note events can trigger the live coding interpreter (see below) — a MOD file can drive a live coding performance as its clock source

SMPS Player (Pulse) + SMPS Editor (SG2000 Big Core)

SMPS (Sega Music Player System) is the music driver format used across the Sega Mega Drive / Genesis library — Sonic the Hedgehog, Streets of Rage, Gunstar Heroes, Phantasy Star IV, and hundreds of others. It describes music in terms of FM operator parameters, note sequences, envelopes, LFO settings, tempos, and PSG (square wave) voices.

The Ant64 plays SMPS data natively through FireStorm's own synthesis engines — no chip emulation. The FireStorm FM engine is already a superset of what the YM2612 provides. SMPS is understood as a music format and its data mapped directly to FireStorm voice parameters. The result sounds like the original — or better, because FireStorm has no DAC ladder noise, no 8-bit depth limitation, and the full VA and effects chain available if wanted.


What SMPS Describes

SMPS files contain:

  Header
  ├─ Tempo (tick rate)
  ├─ Channel count (FM channels + PSG channels)
  ├─ Pointers to per-channel data blocks
  │
  Per FM channel data block:
  ├─ Instrument (FM voice) data:
  │   ├─ Algorithm (0–7) — operator routing
  │   ├─ Feedback level
  │   └─ Per-operator (4 operators):
  │       ├─ Multiple (frequency ratio)
  │       ├─ Detune
  │       ├─ Total level (volume/carrier level)
  │       ├─ Key scaling
  │       ├─ Attack rate
  │       ├─ First decay rate
  │       ├─ Second decay rate (sustain rate)
  │       ├─ Sustain level
  │       ├─ Release rate
  │       ├─ AM enable
  │       └─ SSG-EG (envelope generator shape)
  ├─ Note sequence (pitch + duration codes)
  ├─ Volume / pan settings
  ├─ LFO settings (rate, PM depth, AM depth)
  ├─ Modulation (vibrato, tremolo patterns)
  └─ Loop / repeat / jump commands
  │
  Per PSG channel data block:
  ├─ Note sequence (pitch + duration codes)
  ├─ Volume envelope pointer
  └─ Noise mode (channel 4 only: white / periodic)
  │
  DAC channel (channel 6, if used):
  └─ PCM sample index + playback commands

This is complete, self-contained music data. The synthesis parameters are all present in the file — there is nothing chip-specific about them. Operator ratios, envelope shapes, and note sequences are universal FM synthesis concepts. Algorithm 4 on a YM2612 is the same carrier/modulator topology as Algorithm 4 on any 4-operator FM engine.


Mapping SMPS to FireStorm

FM Channels (6 per SMPS file → FireStorm FM engine)

The YM2612 is a 4-operator, 8-algorithm FM engine with sine-only waveforms. FireStorm is 6-operator with free routing and 16 waveforms per operator — a proper superset. SMPS FM voices map directly:

  SMPS FM instrument           →   FireStorm FM voice
  ──────────────────────────────────────────────────
  Algorithm 0–7 (4-op routing) →   Operators 1–4, routing per algorithm
                                    Operators 5–6 unused (or add harmonics)
  Feedback level               →   Operator 1 self-modulation depth
  Operator multiple            →   Operator frequency ratio
  Operator detune              →   Fine detune (same concept)
  Total level                  →   Operator output level
  Key scaling                  →   Key scale rate
  Attack / decay / sustain     →   EG stage rates (same parameters)
  Release rate                 →   EG release
  AM enable                    →   LFO AM sensitivity
  SSG-EG                       →   Extended EG shape (FireStorm EG superset)

The 8 YM2612 algorithms map to FireStorm operator routing configurations:

  Algo 0: [1→2→3→4]           Serial stack (deepest FM modulation)
  Algo 1: [(1+2)→3→4]         Two carriers into 3
  Algo 2: [(1+(2→3))→4]       Mixed
  Algo 3: [((1→2)+3)→4]       Mixed
  Algo 4: [(1→2)+(3→4)]       Two pairs — FM bass + FM lead simultaneously
  Algo 5: [(1→(2+3+4))]       One modulator into three carriers
  Algo 6: [(1→2)+(3)+(4)]     One FM pair + two pure carriers
  Algo 7: [1+2+3+4]           All carriers — brightest, most additive

All 8 map exactly to FireStorm routing configurations. The additional operators (5, 6) can be left silent for faithful reproduction, or used to add harmonic richness beyond what the original could produce.
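
The mapping can be captured as a small lookup table — which operators are carriers (audible outputs) in each algorithm; everything else is a modulator. This is a sketch: FireStorm's actual routing registers are not shown.

```c
#include <stdint.h>
#include <assert.h>

/* Carrier mask per YM2612 algorithm: bit n set = operator n+1 is a carrier. */
static const uint8_t ym2612_carriers[8] = {
    0x8,  /* Algo 0: 1→2→3→4       — op 4 only    */
    0x8,  /* Algo 1: (1+2)→3→4     — op 4 only    */
    0x8,  /* Algo 2: (1+(2→3))→4   — op 4 only    */
    0x8,  /* Algo 3: ((1→2)+3)→4   — op 4 only    */
    0xA,  /* Algo 4: (1→2)+(3→4)   — ops 2 and 4  */
    0xE,  /* Algo 5: 1→(2+3+4)     — ops 2, 3, 4  */
    0xE,  /* Algo 6: (1→2)+3+4     — ops 2, 3, 4  */
    0xF,  /* Algo 7: 1+2+3+4       — all four     */
};

/* Carrier count — useful when normalising voice output level. */
static int carrier_count(int algo)
{
    int n = 0;
    for (int b = 0; b < 4; b++)
        n += (ym2612_carriers[algo] >> b) & 1;
    return n;
}
```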

PSG Channels (3 square wave + 1 noise → FireStorm VA engine)

The SN76489 PSG provides three square wave oscillators and a noise channel. These map directly to FireStorm VA voices with the square/pulse waveform selected:

  SMPS PSG channel             →   FireStorm VA voice
  ──────────────────────────────────────────────────────
  Note pitch                   →   Phase accumulator frequency
  Volume envelope              →   VCA amplitude envelope
  Channel 4 white noise        →   VA noise oscillator
  Channel 4 periodic noise     →   VA oscillator at low frequency (buzzy pitch)

No filter or chorus added for default playback — clean square waves, faithfully reproducing the PSG character. Optional: VA engine post-processing (slight LP filter to soften edges, or BitCrusher for authentic 4-bit volume steps).

DAC Channel (Channel 6 PCM → FireStorm Sample Engine)

SMPS Channel 6 switches between FM synthesis and 8-bit PCM playback — famously used for the electric bass slap in Sonic 1 and drum hits in many titles. PCM samples are stored in the SMPS data or referenced externally. These load directly into the FireStorm sample engine:

  SMPS DAC sample data (8-bit, ~8kHz)  →  FireStorm sample voice
  ├─ Upsampled to 48kHz (linear interpolation)
  ├─ Bit depth extended to 16-bit (dithered)
  └─ Played at original pitch via sample engine pitch tracking

Optional: the sample engine can apply the ladder filter and BBD chorus to DAC samples — the classic Mega Drive bass through a Juno chorus is an immediately recognisable and striking combination.


SMPS Format Variants

SMPS was extended and modified across Sega's library. The Pulse player handles the documented variants:

Variant          Example games               Notes
───────────────  ──────────────────────────  ─────────────────────────────────────────
SMPS Z80         Sonic 1, Sonic 2, early MD  Original — data structure well documented
SMPS/68k         Sonic 3, S&K                68k-resident driver, same data model
SMPS2            Streets of Rage series      Extended PSG envelopes, extra features
Clone Driver v2  Homebrew / fan games        Community standard, fully documented

The community disassembly projects (S1/S2/S3 disassemblies, SMPS2ASM) produce clean, well-labelled SMPS data — these are ideal input for both the player and the editor.


Pulse SMPS Player Architecture

  SMPS file (from DBFS or SD card)
          │
          ▼
  Pulse SMPS parser (bare-metal, RP2350 @ 300MHz)
  ├─ Read header → channel count, tempo, channel pointers
  ├─ Load FM instrument tables into working memory
  ├─ Load note/duration/command sequences per channel
  └─ Initialise tick clock at specified tempo
          │
          ▼
  Pulse tick engine (same infrastructure as MOD and live code players)
  Per tick, per channel:
  ├─ Advance note pointer
  ├─ Decode note / rest / loop / jump commands
  ├─ Apply modulation (vibrato depth, tremolo)
  ├─ Apply volume / pan updates
  └─ Write voice parameters to FireStorm via QSPI
          │
          ▼
  FireStorm FM engine (channels 1–6)
  FireStorm VA engine (PSG channels 1–3 + noise)
  FireStorm Sample engine (DAC channel 6)
          │
          ▼
  Mix bus → WM8958 → audio output

The tick clock is shared with the MOD player and live coder — SMPS, MOD, and live code can run simultaneously, all locked to the same tempo reference.


SMPS Editor (Page V — Workstation App, SG2000 Big Core)

A dedicated workstation page for viewing, editing, and authoring SMPS data. The editor shows the music as it is — note sequences, FM instrument parameters, PSG envelopes — in a form the musician can understand and modify.

  ┌─────────────────────────────────────────────────────────────┐
  │  PAGE V — SMPS EDITOR                                       │
  │                                                             │
  │  File: green_hill.smps  Variant: SMPS Z80  [PLAY] [STOP]   │
  │                                                             │
  │  FM CH1  ████░░░░░░  Instrument 3  C4  tick 12/48          │
  │  FM CH2  ██████░░░░  Instrument 7  G3  tick 12/48          │
  │  FM CH3  ░░░░░░░░░░  (rest)                                 │
  │  FM CH4  ████░░░░░░  Instrument 1  E2  tick 06/48          │
  │  FM CH5  ██░░░░░░░░  Instrument 5  A3  tick 12/48          │
  │  FM CH6  ██████████  DAC sample 2  (bass slap)             │
  │  PSG 1   ████░░░░░░  C5  vol env 4                         │
  │  PSG 2   ░░░░░░░░░░  (rest)                                 │
  │  PSG 3   ██░░░░░░░░  G5  vol env 2                         │
  │  NOISE   ░░████░░░░  periodic  vol 10                       │
  │                                                             │
  │  [INSTRUMENTS]  [PATTERNS]  [SONG ORDER]  [EXPORT]         │
  └─────────────────────────────────────────────────────────────┘

Instrument editor: all 4-operator FM parameters for each SMPS instrument, with an algorithm diagram (same as Page F but constrained to 4-operator / 8-algorithm topology). Changes sent live to Pulse player while music is playing.

Pattern editor: note grid per channel — pitch, duration, volume, modulation commands. Familiar to anyone who has used a tracker.

Song editor: the SMPS channel data is structured as patterns with loop/jump commands — the song editor visualises this as an arrangement of segments.

Export: write modified data back as a valid SMPS binary — compatible with Mega Drive homebrew tools and emulators. Useful for fan composers and ROM hackers.

Convert to native: export the SMPS data as a native Ant64 FM patch set and piano roll sequence in Page R — converting Mega Drive music into a fully editable Ant64 arrangement with all native synthesis features available.


Live Code Player (Pulse) + Live Coder Editor (SG2000 Big Core)

The Ant64 implements a live music coding interpreter in the tradition of Sonic Pi, TidalCycles, ORCA, and FoxDot — systems where music is written as code, evaluated while playing, and changes take effect at the next musical boundary without stopping or interrupting playback.

The key property that distinguishes live coding from conventional sequencing is temporal hot-swap: the musician edits and evaluates code while the music plays. The interpreter accepts the new definition, compiles it to an event stream, and slots it in at the next quantisation boundary — the next bar, the next 4 bars, or the next phrase, depending on configuration. No gap. No click. No restart.

Architecture

The system is split across three processors by role, with clear ownership at each level:

  SG2000 big core                SG2000 little core        Pulse RP2350
  ─────────────────              ──────────────────        ─────────────────────
  Workstation App (bare metal)   AntOS (OS duties)         Hard real-time engines
  ────────────────────────────   ─────────────────         ─────────────────────
  Page L — Live Coder editor     IPC / transfer API        Live code player
  ├─ Code editor (syntax hl)     ├─ Receives compiled      ├─ Expression evaluator
  ├─ Evaluation feedback         │  data from big core     ├─ Pattern → event stream
  ├─ Error display inline        ├─ Routes to Pulse        ├─ Tick clock
  ├─ Code history / versions     └─ OS / networking        ├─ Event queue
  └─ Output console                                        ├─ Quantised hot-swap
                                                           ├─ MIDI output
  MOD Editor                                               ├─ FireStorm regs
  ├─ Pattern / track view                                  │
  ├─ Sample browser                                        MOD player
  ├─ Instrument editor                                     ├─ Pattern sequencer
  └─ Song arranger                                         ├─ Effect processor
                                                           ├─ Sample mixer
                                                           └─ Paula emulation

Both editors live on the SG2000 big core, in the bare-metal workstation app alongside all other pages. Page L is the live coding editor. The MOD editor is a separate page (Page K — see below). The musician writes and edits everything here.

The little core handles OS duties only — AntOS networking, file management, debug server, gossip. Its role in the music pipeline is narrow and passive: it provides the API and IPC channel through which the big core's editors send compiled pattern data down to Pulse. It does not parse, evaluate, or schedule anything musical.

Pulse owns both player engines — the MOD player and the live code player run bare-metal on Pulse at 300MHz. Pulse receives compiled pattern data from the big core (via little core IPC), holds it, and performs the atomic hot-swap at the next quantisation boundary. Pulse is the only processor that touches the tick clock and event queue. Both engines share Pulse's scheduler infrastructure — the same tick clock, the same event queue, the same FireStorm register write path.

The Live Coding Language

A small, purpose-designed expression language — not a general scripting language. Concise enough to type live, expressive enough to describe complex rhythmic and harmonic structures. Inspired by the pattern notation of TidalCycles and the readability of Sonic Pi.

  -- Basic pattern: note sequence at specified intervals
  play [c4 e4 g4 c5] every 1 bar

  -- Euclidean rhythm with velocity variation
  drum kick every euclid(3, 8) vel [100 80 90]

  -- Conditional: alternate every 2 bars
  play [c4 e4] |> alt [g3 b3] every 2 bars

  -- Parameter modulation: filter cutoff swept over 4 bars
  sweep cutoff 200 4000 over 4 bars

  -- Polyrhythm: two patterns at different cycle lengths
  play [c4 d4 e4 f4] every 3 steps
  play [g3 a3]       every 2 steps

  -- Sample trigger from MOD player sample bank
  trig sample "snare_amiga" every euclid(5, 16)

  -- Reference a FireStorm voice by name and set parameters
  voice "bass" | filter 800 | res 0.7 | play [c2 c2 g1 c2]

  -- Route output to MIDI channel
  play [c4 e4 g4] on midi 1

Patterns evaluate to event streams. The interpreter resolves them relative to the current BPM and sends timestamped note-on/note-off and parameter events to Pulse's scheduler. Every evaluation is quantised — every 1 bar means the change takes effect at the next bar boundary, never mid-bar.

Hot-Swap Quantisation

  Musician edits code → hits Evaluate (Page L, big core)
          │
          ▼
  Big core serialises the expression to a compact binary form
  and passes it to little core via hardware mailbox IPC
          │
          ▼
  Little core (AntOS IPC API) routes the data to Pulse
  over hi-speed UART — its only role in this pipeline
          │
          ▼
  Pulse receives the compiled expression
  Pulse evaluator parses and generates the event stream
  (Pulse at 300MHz — typically < 2ms, well ahead of any bar boundary)
          │
          ▼
  Pulse holds the new stream in a pending slot
  continues playing the current active stream uninterrupted
          │
          ▼
  Next quantisation boundary (bar / phrase / configured interval)
          │
          ▼
  Pulse atomic swap: pending stream becomes active
  old stream discarded
  → No gap, no click, no restart — music continues uninterrupted

The quantisation interval is configurable: 1 beat, 1 bar, 2 bars, 4 bars, or manual (swap only when the musician explicitly triggers it). A tighter interval means faster response to edits; a wider interval means more musical coherence between changes.
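
The swap itself can be sketched as a two-pointer scheme — the tick handler only ever reads `active`, and the swap is a single pointer store at the boundary (names hypothetical):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

typedef struct { int dummy; } stream_t;   /* compiled event stream (opaque) */

typedef struct {
    stream_t *active;    /* currently playing                          */
    stream_t *pending;   /* parsed and waiting for the boundary        */
    uint32_t  quantum;   /* swap interval in ticks (e.g. 1 bar = 96)   */
} swapper_t;

/* Called from the tick clock. Returns 1 if a swap happened on this tick.
   A 32-bit pointer store is atomic on the RP2350, so playback never sees
   a half-updated stream. */
static int swapper_tick(swapper_t *s, uint32_t tick)
{
    if (s->pending && (tick % s->quantum) == 0) {
        s->active  = s->pending;
        s->pending = NULL;
        return 1;
    }
    return 0;
}
```

The old stream is discarded only after the store, so an evaluation arriving late simply waits one more quantum — never interrupting the bar in progress.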

Integration with Other Systems

The live coder is not isolated — it drives and interacts with the full audio system:

Integration        Description
─────────────────  ─────────────────────────────────────────────────────────────
FireStorm voices   Any voice by name — full VA/FM/sample engine access
MOD player         Trigger MOD samples by name, sync to MOD BPM as clock source
303 engine         Generate acid patterns procedurally — acid [c2 eb2 f2] slide [3,7]
Jog dials          Any dial assignable as a live variable — tempo = dial_1 * 200
MIDI out           Drive external hardware — DIN MIDI, USB MIDI
FFT / spectrogram  Read live spectral data as input to pattern conditions
Euclidean rhythms  First-class euclid(hits, steps, rotation) operator
Probability        maybe(0.7) — 70% chance of triggering
Randomness         rand, choose [...], shuffle [...] — seeded or free

Page L — Live Coder Editor (Workstation App, SG2000 Big Core)

A dedicated page in the workstation app. The musician writes and edits code here while the music plays. The editor renders to the HDMI output via the FireStorm rasteriser alongside any other active page.

  ┌────────────────────────────────────────────────────────┐
  │  PAGE L — LIVE CODER                                   │
  │                                                        │
  │  > play [c4 e4 g4 c5] every 1 bar                      │  ← active (green)
  │  > drum kick every euclid(3,8)                         │  ← active (green)
  │  > sweep cutoff 200 4000 over 4 bars_                  │  ← cursor
  │                                                        │
  │  [EVAL] at next bar  ·  BPM: 124  ·  Bar: 003.2        │
  │                                                        │
  │  ✓  pattern: c4 e4 g4 c5 — swapped at bar 003          │
  │  ✓  drum: euclid(3,8) — active                         │
  │  !  sweep: parse error — missing 'over'                │  ← error (amber)
  └────────────────────────────────────────────────────────┘

Features:

  • Syntax highlighting — notes, keywords, operators, voice names in distinct colours
  • Inline error display — errors shown on the relevant line, not a separate panel
  • Evaluation status — each active expression shown with its swap status
  • Bar/beat position counter — shows exactly where in the musical timeline the cursor is
  • Code history — previous evaluations stored, navigable with jog dial
  • Split view — two code buffers side by side, each evaluating independently
  • MUTE per expression — silence an active pattern without deleting it
  • SOLO — mute all other active patterns, play only the selected one

Music Generation

The Ant64 supports music generation at three independent tiers, each with different latency, capability, and network dependency characteristics. All three produce the same output: note and pattern data in the same format as hand-edited content. Generated data flows into the editors — Page L, Page K, or Page R — where the musician can inspect, edit, reject, or accept it before it plays. Generation is a source, not an override.

  Generation tier
  ───────────────
  Tier 1: Algorithmic (big core, instant, offline)   ──┐
  Tier 2: TPU inference (SG2000, local, fast)        ──┼──→  Note / pattern data
  Tier 3: AI API (little core async, network)        ──┘     (same format as edited)
                                                                      │
                                                                      ▼
                                                           Page L — live coder
                                                           Page K — MOD editor
                                                           Page R — piano roll
                                                                      │
                                                                      ▼
                                                                Pulse player

Tier 1 — Algorithmic Generation (SG2000 Big Core)

Deterministic mathematical algorithms running bare-metal on the big core. Instant output, no network, no model loading. These are the tools of formal and stochastic composition — the same methods used by Xenakis, Messiaen, Steve Reich, and the demoscene tracker community.


Euclidean Rhythms (Bjorklund's Algorithm)

Distribute N hits across M steps as evenly as possible. The resulting patterns correspond closely to core rhythmic patterns found across many world music traditions — they emerge from small integer ratios, not from cultural convention.

  euclid(2, 8)  = [x . . . x . . .]  ← Habanera / basic clave
  euclid(3, 8)  = [x . . x . . x .]  ← Afro-Cuban clave, tresillo
  euclid(4, 8)  = [x . x . x . x .]  ← Standard four-on-the-floor (even)
  euclid(5, 8)  = [x . x x . x x .]  ← Cinquillo (Cuban)
  euclid(3, 16) = [x . . . . x . . . . . x . . . .]  ← Sparse kick pattern
  euclid(5, 12) = [x . . x . x . . x . x .]  ← Yoruba bell pattern (Bembé)
  euclid(7, 12) = [x . x x . x . x x . x .]  ← Bembé variation
  euclid(9, 16) = [x . x x . x . x . x x . x . x .]  ← Complex polyrhythm

The rotation parameter shifts the pattern by N steps — same hits, different downbeat: euclid(3, 8, rotation: 2) starts two positions into the pattern, changing the phrasing without changing the density.

First-class syntax in the live coder: drum kick every euclid(3, 8)
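
A compact implementation is the quantisation ("bucket") form of the algorithm — note that its output can be a rotation of the textbook Bjorklund sequence, which the rotation parameter absorbs anyway:

```c
#include <stdint.h>
#include <assert.h>

/* Euclidean rhythm: step i is a hit when the running hit total crosses an
   integer multiple of `steps`.  rotation shifts the starting point, as in
   the live coder's euclid(hits, steps, rotation). */
static void euclid(int hits, int steps, int rotation, uint8_t *pattern)
{
    for (int i = 0; i < steps; i++) {
        int j = (i + rotation) % steps;
        pattern[i] = ((j * hits) % steps) < hits;
    }
}
```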


Markov Chains

Build a transition probability matrix from an existing sequence — a melody, a chord progression, a rhythm — and generate new sequences with the same statistical character. The generated output has the same "feel" as the input without repeating it literally.

  Training sequence (input):  C4 E4 G4 E4 C4 D4 F4 A4 F4 D4 ...
                                         │
                              Build transition matrix:
                              From C4 → E4 (0.6), D4 (0.3), G4 (0.1)
                              From E4 → G4 (0.5), C4 (0.3), D4 (0.2)
                              From G4 → E4 (0.7), A4 (0.2), C5 (0.1)
                                         │
                              Generate new sequence:
                              C4 → E4 → G4 → A4 → F4 → D4 → F4 → E4 ...
                              (statistically similar, not identical)

Order of the Markov chain determines how much context is used:

  • Order 1: each note depends only on the previous note — loose, improvisational feel
  • Order 2: each note depends on the previous two — more phrase coherence
  • Order 3+: longer-range dependencies — closer to the training material's style

Training material can be: a melody played live into the sequencer, a loaded MIDI file, a MOD module's pattern data, or a manually entered note sequence in Page R.
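
An order-1 chain can be sketched as a count matrix plus a seeded generator — deterministic seeding means a generated line can be reproduced exactly from a patch (names hypothetical):

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

#define NOTES 128

typedef struct {
    uint16_t count[NOTES][NOTES];   /* count[a][b]: note a followed by b */
    uint32_t rng;                   /* LCG state — seed for reproducibility */
} markov_t;

/* Count every adjacent pair in the training sequence. */
static void markov_train(markov_t *m, const uint8_t *seq, int len)
{
    for (int i = 0; i + 1 < len; i++)
        m->count[seq[i]][seq[i + 1]]++;
}

/* Sample the next note in proportion to observed transition counts. */
static uint8_t markov_next(markov_t *m, uint8_t cur)
{
    uint32_t total = 0;
    for (int n = 0; n < NOTES; n++) total += m->count[cur][n];
    if (total == 0) return cur;                 /* no data: hold the note */
    m->rng = m->rng * 1664525u + 1013904223u;   /* Numerical Recipes LCG  */
    uint32_t r = m->rng % total;
    for (int n = 0; n < NOTES; n++) {
        if (r < m->count[cur][n]) return (uint8_t)n;
        r -= m->count[cur][n];
    }
    return cur;
}
```

Higher orders replace `cur` with a short history key; the sampling step is unchanged.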


L-Systems (Lindenmayer Systems)

Recursive string rewriting rules that produce self-similar, fractal-like structures. Originally developed to model plant growth, they produce musical phrases with natural hierarchical structure — phrases within phrases within phrases, all related by the same generative rule.

  Simple melodic L-system:
  Axiom:  A
  Rules:  A → A B A       (A expands to three elements)
          B → B A B       (B expands to three elements)
  Depth 1: A B A          (3 notes)
  Depth 2: ABA BAB ABA    (9 notes)
  Depth 3: (27 notes — self-similar phrase structure at 3 levels)

  Map symbols to musical elements:
  A = root note (C4), duration = 1 beat
  B = fifth (G4),  duration = 0.5 beat
  → Generates a self-similar melody with phrase lengths growing as powers of 3

More complex rules can encode pitch, duration, dynamics, and articulation separately. L-systems produce the kind of recursive phrase structure found in Bach and in generative ambient music — coherent at multiple timescales simultaneously.
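
The A/B example above can be expanded with a few lines of string rewriting (buffer sizes are illustrative):

```c
#include <string.h>
#include <assert.h>

/* Expand the two-symbol L-system A → ABA, B → BAB to the given depth.
   Depth d of a single-symbol axiom yields 3^d symbols.  Returns the
   length written to `out`. */
static int lsys_expand(const char *axiom, int depth, char *out, int cap)
{
    char buf[2][256];                     /* sketch size: up to 3^5 = 243 */
    strncpy(buf[0], axiom, sizeof buf[0]);
    buf[0][sizeof buf[0] - 1] = '\0';
    int cur = 0;
    for (int d = 0; d < depth; d++) {
        int nxt = cur ^ 1, j = 0;
        for (const char *p = buf[cur]; *p && j + 3 < (int)sizeof buf[0]; p++) {
            const char *rule = (*p == 'A') ? "ABA" : "BAB";
            memcpy(&buf[nxt][j], rule, 3);  /* rewrite symbol in place */
            j += 3;
        }
        buf[nxt][j] = '\0';
        cur = nxt;
    }
    strncpy(out, buf[cur], cap);
    out[cap - 1] = '\0';
    return (int)strlen(out);
}
```

Mapping each output symbol to a (pitch, duration) pair then yields the melody directly.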


Cellular Automata

A grid of cells, each alive or dead, updated each step by local neighbourhood rules. Musical mapping: each row = one time step, each column = one pitch or drum voice. Live cells = note triggers. The evolution of the grid becomes the evolution of the music.

  Wolfram Rule 110 (computationally universal — generates complex non-repeating patterns):

  Step 0:  . . . . . . . . X . . . . . . .   ← single seed
  Step 1:  . . . . . . . X X . . . . . . .
  Step 2:  . . . . . . X X X . . . . . . .
  Step 3:  . . . . . X X . X . . . . . . .
  Step 4:  . . . . X X X X X . . . . . . .
  ...      (continues, never exactly repeating)

  Map to drum grid: X = hit, . = rest
  16 columns = 16 pitches or drum voices
  Each row = one bar step
  → Evolving, non-repeating rhythmic pattern from a single starting cell

Conway's Life mapped to a piano roll: stable structures (still lifes) = sustained chords; oscillators = repeating rhythmic figures; gliders = melodic lines that travel across the pitch space over time.

The seed pattern and the rule number are the only inputs. Different seeds with the same rule produce related but distinct patterns — a generative variation system with mathematical coherence.
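
An elementary CA update is a three-cell table lookup against the rule number's bits. A minimal Python sketch (the Ant64's own implementation is not shown here; wrap-around edges are an assumption):

```python
def step(cells, rule=110):
    """One update of an elementary CA; cells is a list of 0/1, edges wrap around."""
    n = len(cells)
    out = []
    for i in range(n):
        l, c, r = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        out.append((rule >> (l * 4 + c * 2 + r)) & 1)  # neighbourhood indexes a rule bit
    return out

def run(width, steps, rule=110):
    """Evolve from a single seed cell; each row becomes one bar of triggers."""
    row = [0] * width
    row[width // 2] = 1
    grid = [row]
    for _ in range(steps):
        grid.append(step(grid[-1], rule))
    return grid

for row in run(16, 4):
    print("".join("X" if c else "." for c in row))
```

Changing `rule` or the seed row is all it takes to produce a related-but-distinct pattern family.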


Functional Harmony Generator

Generates chord progressions using tonal harmony rules: Roman numeral grammar, voice leading constraints, tension/resolution patterns, secondary dominants, borrowed chords, modal interchange.

  Key: C major
  Style: jazz (allows substitutions, extensions)

  Generate:
  I Maj7  →  VI m7  →  II m7  →  V7  →  I Maj7   ← basic ii-V-I with turnaround
  I Maj7  →  ♭VII7  →  IV Maj7  →  I               ← modal interchange (Mixolydian ♭VII)
  I Maj9  →  ♯IV ø7 →  IV Maj7  →  III7  →  VI7    ← tritone substitution chain

  Voice leading:
  ├─ Common tones held across chord changes where possible
  ├─ Contrary motion preferred over parallel motion
  ├─ No parallel fifths (classical) / allowed (jazz)
  └─ Voice range constraints per part (SATB or lead+bass)

Output: chord symbols + voice-led individual note sequences, ready to feed into the piano roll or live coder.
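
At its simplest, the Roman-numeral grammar is a transition table: each chord lists its idiomatic successors, and generation is a constrained walk through it. A minimal sketch — the grammar below is a small hypothetical fragment, not the Ant64's full rule set, and voice leading is omitted:

```python
import random

# Hypothetical transition grammar: each chord lists common successors.
GRAMMAR = {
    "IMaj7":  ["VIm7", "IVMaj7", "IIm7"],
    "VIm7":   ["IIm7", "IVMaj7"],
    "IIm7":   ["V7"],
    "IVMaj7": ["V7", "IIm7"],
    "V7":     ["IMaj7", "VIm7"],      # resolve home, or deceptive cadence
}

def progression(bars, start="IMaj7", rng=random.Random(3)):
    """Walk the grammar for `bars` chords, starting from the tonic."""
    out = [start]
    while len(out) < bars:
        out.append(rng.choice(GRAMMAR[out[-1]]))
    return out

print(" -> ".join(progression(8)))
```

Substitutions, borrowed chords, and modal interchange extend the same structure: they are extra entries in the successor lists, enabled or disabled by the selected style.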


Stochastic / Probabilistic Generation

Xenakis formalised the use of probability distributions in musical composition. The Ant64 implements several:

Distribution Characteristic Musical use
Gaussian Bell curve around a centre Pitch clusters around a tonic; velocity variation around a target level
Poisson Event frequency over time Note density — sparse or dense passages with statistical consistency
Random walk Each step ±Δ from previous Melodic lines that wander plausibly — not random jumps
Brownian (1/f²) Slower drift than random walk Slow harmonic movement, pad evolution
Pink noise (1/f) Statistics match real music Rhythmic and melodic sequences whose density variation matches natural music
Cauchy Heavy-tailed — occasional large jumps Surprising melodic leaps in otherwise stepwise lines

Pink noise (1/f) is particularly significant: the power spectrum of real music falls off as approximately 1/f. Sequences generated with 1/f statistics are statistically similar to real music at the macro level — they show the same distribution of phrase lengths, interval sizes, and dynamic variation.
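
One standard way to get 1/f statistics is the Voss-McCartney construction: sum several random sources where source k is re-rolled every 2^k steps. A Python sketch (the scale mapping at the end is a hypothetical illustration):

```python
import random

def pink_sequence(steps, n_sources=4, rng=random.Random(1)):
    """Voss-McCartney: sum of sources re-rolled at power-of-two intervals.
    The running sum has approximately 1/f statistics."""
    sources = [rng.randint(0, 5) for _ in range(n_sources)]
    out = []
    for t in range(steps):
        for k in range(n_sources):
            if t % (1 << k) == 0:          # source k changes every 2^k steps
                sources[k] = rng.randint(0, 5)
        out.append(sum(sources))           # value range 0 .. 5 * n_sources
    return out

SCALE = [0, 2, 3, 5, 7, 8, 10]             # natural minor intervals in semitones

def to_pitches(values, root=60):           # 60 = middle C as a MIDI note
    """Fold raw values onto scale degrees across octaves."""
    return [root + 12 * (v // 7) + SCALE[v % 7] for v in values]

print(to_pitches(pink_sequence(16)))
```

The slowly re-rolled sources provide long-range drift while the fast ones provide detail — which is exactly the multi-timescale character the table above attributes to 1/f sequences.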


Fractals

Self-similar structures at multiple timescales. The Mandelbrot and Julia sets map to pitch and duration via boundary proximity. The dragon curve and Koch snowflake produce rhythmic patterns with fractal self-similarity.

More musically direct: fractal melody using the midpoint displacement algorithm. Start with two notes, recursively insert midpoints with a random offset that halves each recursion — produces a melody that is smooth at large scales but detailed at small scales. Sounds like natural melodic improvisation.
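
The midpoint displacement algorithm described above fits in a few lines. A Python sketch using MIDI note numbers (parameter names are illustrative):

```python
import random

def midpoint_melody(start, end, depth, spread=12, rng=random.Random(7)):
    """Recursively insert displaced midpoints; the random offset halves per level,
    so the line is smooth at large scales and detailed at small ones."""
    if depth == 0:
        return [start, end]
    mid = (start + end) // 2 + rng.randint(-spread, spread)
    left = midpoint_melody(start, mid, depth - 1, spread // 2, rng)
    right = midpoint_melody(mid, end, depth - 1, spread // 2, rng)
    return left + right[1:]                # drop the duplicated midpoint

line = midpoint_melody(60, 72, depth=4)    # 2^4 + 1 = 17 notes, MIDI numbers
print(line)
```

Snapping the result to a scale (as in the stochastic generators) keeps the fractal contour while guaranteeing every note is playable in the current key.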


Tier 2 — Local Neural Inference (SG2000 TPU)

The SG2000 contains a 0.5 TOPS INT8 tensor processing unit — sufficient for running small trained neural networks locally with no network dependency.

What fits in 0.5 TOPS INT8:

Model type Use Notes
Small LSTM (1–2 layers, 128–256 hidden) Melody continuation Trained on a corpus — continues a started phrase in the same style
Tiny Transformer (2–4 heads, 64–128 dim) Chord suggestion Given a melody fragment, suggest appropriate harmony
Style classifier Genre/mood detection Classify the current playing style, feed back to generation
Groove quantiser Humanise timing Learned timing offsets per beat position — makes quantised patterns feel live

The big core drives TPU inference: a model is loaded from DBFS, given a context window (the last N notes or bars), and returns a continuation. Inference time for a small LSTM over a few hundred tokens is well under 100ms — fast enough to generate the next bar before Pulse needs it.

Model distribution via gossip network: trained model files are compact (LSTM ~500KB, tiny transformer ~2MB) and shareable over the Ant64 mesh. Musicians can share style models the same way they share patches.


Tier 3 — External AI Generation (AntOS AI Library, Little Core)

The AntOS AI library provides async API access to external large language and music generation models over WiFi or Ethernet. The little core handles the network request; the big core sends the prompt and receives the result. The result is queued as pattern data and plays when it arrives — latency is irrelevant because playback is quantised.

Supported API targets (configurable):

Service Generation type Notes
OpenAI (GPT-4o / o1) Natural language → note data Prompt in plain English, result parsed to events
Anthropic (Claude) Same Structured music generation via system prompt
Google (Gemini) Same Also supports music description and analysis
MusicGen / AudioCraft Audio generation Result → Page S sample, not note data
ABC Notation endpoints Score generation ABC notation → note events — clean round-trip
Custom endpoint Configurable Point to any compatible API

The request/response cycle:

  Musician types prompt on Page L or a dedicated generation panel
  (e.g. "16-bar walking bass line in Cm, jazz feel, medium tempo")
          │
          ▼
  Big core formats the request (note data format spec in system prompt)
  Passes to little core via mailbox IPC
          │
          ▼
  Little core (AntOS AI library) makes the API call over WiFi/Ethernet
  Async — does not block the big core or Pulse
          │
          ▼
  Response received — little core parses to note event format
  Passes result to big core via mailbox IPC
          │
          ▼
  Big core inserts result into Page L / Page K / Page R for review
  Musician can edit, transpose, truncate, loop before committing
          │
          ▼
  Commit → Pulse player queue → plays at next quantisation boundary

Natural language generation examples (live coder syntax):

  -- Ask AI for a pattern (async — plays when response arrives)
  play ai("funky 16th-note bassline in E minor, syncopated")

  -- With style reference
  play ai("melody in the style of Boards of Canada, melancholic, C major")

  -- Rhythmic only
  drum ai("complex polyrhythmic pattern, 5 against 4, kick and hi-hat")

  -- Constrained — must fit current harmony
  play ai("fill in the gaps of this melody", context: current_bar)

  -- Chord progression
  chords ai("jazz reharmonisation of a I-IV-V in Bb")

The AI library formats a structured system prompt that specifies the output format (note name, octave, duration, velocity as JSON or a compact domain-specific format), ensuring the response can be reliably parsed to playable event data. The musician never sees the raw API response — only the musical result in the editor.
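
A sketch of the parsing step, assuming a hypothetical JSON note format of the kind the system prompt could request — the actual Ant64 wire format is not specified here:

```python
import json

# Hypothetical structured response the system prompt could ask for.
raw = '''[
  {"note": "C2",  "beats": 1.0, "vel": 96},
  {"note": "Eb2", "beats": 0.5, "vel": 80},
  {"note": "G2",  "beats": 0.5, "vel": 84}
]'''

NAMES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def parse_note(name):
    """'Eb2' -> MIDI number; b/# accidentals supported. C4 = 60 convention."""
    pitch = NAMES[name[0]]
    rest = name[1:]
    if rest[0] in "b#":
        pitch += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    return pitch + 12 * (int(rest) + 1)

events = [(parse_note(e["note"]), e["beats"], e["vel"]) for e in json.loads(raw)]
print(events)
```

Anything that fails to parse is rejected before it reaches the editor — which is why the musician only ever sees well-formed note data, never the raw response.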

Offline graceful degradation: if no network is available, Tier 3 calls return an error with a suggestion to use a Tier 1 or Tier 2 equivalent. The system never hangs waiting for a network response — the timeout is configurable and short.


The Generative Feedback Loop

The three tiers combine with the spectral analysis pipeline to create a closed generative loop — generate, analyse, refine, regenerate:

  Tier 1/2/3 generation
          │
          ▼
  Pattern data → Pulse plays it
          │
          ▼
  Audio output → Page D spectrogram (live input mode)
          │
          ▼
  Spectral analysis → harmonic content extracted
          │
          ├──→  Feed harmonic data back as constraint to Tier 1
          │     ("generate variations that preserve these harmonics")
          │
          ├──→  SEND to Page H → edit harmonics → IFFT → new wavetable
          │     (the generated music reshapes the synthesis voice)
          │
          └──→  Feed to AI prompt as musical context (Tier 3)
                ("here is the spectral analysis of what I just played —
                  generate a complementary melody")

Generated music becomes input to synthesis source creation, which changes the timbre of the instrument that plays the next generation pass. Each cycle can produce something genuinely novel without any repetition.


Generation Language Extensions (Live Coder)

The live coder syntax gains first-class generation operators:

  -- Euclidean rhythm
  drum kick every euclid(3, 8)
  drum snare every euclid(2, 8, rotation: 4)

  -- Markov chain continuation from a seed sequence
  play markov(seed: [c4 e4 g4 e4 c4], order: 2, steps: 16)

  -- L-system melody
  play lsystem(axiom: "A", rules: {A: "ABA", B: "BAB"}, depth: 3, map: {A: c4, B: g4})

  -- Cellular automaton rhythm (Rule 110, 16-column drum grid)
  drum ca(rule: 110, seed: current_bar, voices: [kick snare hihat])

  -- Harmonic progression (functional harmony generator)
  chords harmony(key: "Cm", style: "jazz", bars: 8)

  -- Stochastic melody (Gaussian pitch around tonic, Poisson density)
  play stochastic(centre: c4, spread: 5, density: 0.6, steps: 16)

  -- Pink noise sequence (1/f statistics)
  play pink(root: c4, scale: "minor", steps: 32)

  -- Random walk melody
  play walk(start: c4, step: 2, steps: 16, scale: "dorian")

  -- Local TPU model continuation
  play tpu(model: "blues_lstm", context: last_bars(4), steps: 8)

  -- External AI (async, plays when ready)
  play ai("melodic fill, 2 bars, match current harmony")

  -- Hybrid: generate with algorithm, humanise with TPU groove model
  play euclid(5, 16) |> humanise(model: "jazz_groove")

All generation operators produce event streams in the same format as hand-typed patterns. They can be piped through any live coder operator — transpose, reverse, stretch, filter, humanise — regardless of how they were generated.
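
The `euclid(k, n)` operator above can be sketched with the (i·k) mod n construction, which yields a rotation of the classic Bjorklund pattern — a Python sketch, not the Ant64's implementation:

```python
def euclid(k, n, rotation=0):
    """k pulses spread as evenly as possible over n steps."""
    pat = [1 if (i * k) % n < k else 0 for i in range(n)]
    return pat[rotation % n:] + pat[:rotation % n]

print(euclid(3, 8))    # [1, 0, 0, 1, 0, 0, 1, 0] — the Cuban tresillo shape
print(euclid(5, 16))
```

Because the operator returns a plain step list, its output pipes through the same transforms — rotate, reverse, humanise — as any hand-typed pattern.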


Page G — Music Generator (Workstation App, SG2000 Big Core)

A dedicated workstation page for generation — separate from Page L (live coding) so the generation workflow has its own space without cluttering the code editor.

  ┌────────────────────────────────────────────────────────────┐
  │  PAGE G — MUSIC GENERATOR                                  │
  │                                                            │
  │  MODE: [Algorithmic ▼]  KEY: [Cm]  BARS: [8]  BPM: [124]   │
  │                                                            │
  │  Algorithm  ○ Euclidean  ● Markov  ○ L-System  ○ CA        │
  │             ○ Harmony    ○ Stoch.  ○ Pink noise            │
  │                                                            │
  │  Markov order: [2]   Seed: [current selection in Page R]   │
  │  Steps: [16]         Scale: [natural minor]                │
  │                                                            │
  │  [GENERATE]   [PREVIEW]   [SEND TO PAGE R]   [SEND TO L]   │
  │                                                            │
  │  AI PROMPT ─────────────────────────────────────────────   │
  │  > funky bassline in Cm, 8 bars, syncopated_               │
  │  [ASK AI]  Status: waiting for response...                 │
  │                                                            │
  │  TPU MODEL: [blues_lstm ▼]  Context: [last 4 bars]         │
  │  [CONTINUE]                                                │
  └────────────────────────────────────────────────────────────┘

All three tiers accessible from one page. Generated output previews in a small piano roll at the bottom of the panel before committing. SEND TO PAGE R inserts into the arrangement. SEND TO L inserts as a live coder expression.

Page G is added to the workstation page list alongside A, D, K, L, W, H, S, R, E, F, M.


Video System — Light Synth

The Ant64 has integrated audio-reactive video output — a light synth driven by the same synthesis engine data that produces the audio. Four modes: audio-reactive visualiser (waveform, FFT, Lissajous), synth parameter visualiser (each voice rendered as a visual element), generative light synth (MIDI/sequencer events drive procedural visuals), and VJ tool (MIDI-triggered clip playback with parameter control).

All audio system parameters are exposed as input data to the video system. Voice pitch, envelope state, LFO modulation, filter cutoff, and note velocity all map to visual parameters — position, colour, size, motion, brightness. The light synth treats video the way the audio engine treats sound: parametric, generative, driven from synthesis data.

Full light synth documentation — modes, synthesis analogy table, scripting API, and hardware output specifications — is in the Display Architecture reference.

The audio system's role is to make its live data (voice states, FFT data, MIDI events, sequencer position) available to the video system via the AntOS scripting bindings. The display hardware (FireStorm compositor, output clocking, layer system) is entirely the display system's concern.


Complete System Overview

Pulling all capabilities together, the Ant64 is not a synthesizer with extras — it is a complete audio-visual instrument and MIDI hub.

┌──────────────────────────────────────────────────────────────────────────────┐
│                                 ANT64                                        │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐     │
│  │                       SG2002 / SG2000 SoC                           │     │
│  │  ┌──────────────────────────┐   ┌───────────────────────────────┐   │     │
│  │  │  BIG CORE (C906 @ 1GHz)  │   │  LITTLE CORE (C906 @ 700MHz)  │   │     │
│  │  │  Bare metal · no OS      │   │  AntOS · Luau · DBFS          │   │     │
│  │  │  Music Workstation App   │◄─►│  Gossip · Shell · File mgmt   │   │     │
│  │  │  C++ · Dear ImGui        │   │  Hardware Mailbox IPC         │   │     │
│  │  └────────────┬─────────────┘   └───────────────────────────────┘   │     │
│  │               │ QSPI (NOR mode) — control plane                     │     │
│  │               │ MIPI 4-lane × 2 — data plane (future)               │     │
│  └───────────────┼─────────────────────────────────────────────────────┘     │
│                  │                  ▲ SPI slave                              │
│  ┌───────────────▼──────────────────┴──────────────────────────────────┐     │
│  │                         FireStorm (FPGA)                            │     │
│  │    GoWin 138k (Ant64/Ant64C)  ·  GoWin 60k (Ant64S)                 │     │
│  │                                                                     │     │
│  │  FireStorm EE  ·  Audio DSP 128+ voices  ·  2D Rasterizer           │     │
│  │  Hard RISC-V debug core [Ant64/Ant64C only — GoWin 138k]            │     │
│  │  VA · FM · Sample · Granular · Filters · BBD · Effects              │     │
│  │  Fixed-point · No float  ·  HDMI/VGA/(DisplayPort Ant64C)           │     │
│  │  WM8958/WM8960 audio codec (direct) · Optical out                   │     │
│  │  → 2× internal stereo speakers · Phono out · HDMI audio             │     │
│  │                                                                     │     │
│  │  SRAM A (36-bit) — audio  ·  SRAM B (36-bit) — rasterizer           │     │
│  │  DDR3 (32-bit, 1/2GB) — samples/wavetables  [Ant64/Ant64C]          │     │
│  │  8MB embedded PSRAM [Ant64S]                                        │     │
│  └──┬──────────────────────────────────────────┬───────────────────────┘     │
│     │ QSPI — register writes / control           │ JTAG (debug/program)      │
│     │ 4-line PIO — fast bulk data transfer      │                            │
│  ┌──▼──────────────────────────────┐  ┌─────────▼──────────────────────┐     │
│  │        Pulse RP2350             │  │        DeMon RP2350            │     │
│  │  MIDI · Joypads · Jog dials     │  │  System supervisor             │     │
│  │  Audio sequencer · SAM speech   │  │  JTAG → SG2000 + FireStorm     │     │
│  │  4× 3.5mm trigger/CV inputs     │  │                                │     │
│  │  USB MIDI host+device           │  │  QSPI → FireStorm              │     │
│  │  DIN MIDI (Ant64C)              │  │  SPI slave ← SG2000            │     │
│  │  8× RGB jog dials (WS2812B)     │  │  Hi-speed UART ↔ SG2000        │     │
│  │  Hi-speed UART ↔ SG2000         │  │  SPI master → Pulse            │     │
│  │  SPI slave ← DeMon              │  │  Boot · Watchdog · Debug       │     │
│  │                                 │  │  WiFi/BT (ESP-C5) · RTC        │     │
│  └─────────────────────────────────┘  └────────────────────────────────┘     │
│                                                                              │
│  OUTPUTS: 2× internal stereo speakers · Phono out · Optical audio            │
│           HDMI/VGA/(DP Ant64C) · DIN MIDI (Ant64C) · USB MIDI                │
│  INPUTS:  Stereo audio (phono) · DIN MIDI (Ant64C) · USB MIDI · Keyboard     │
└──────────────────────────────────────────────────────────────────────────────┘

How Does It Compare? Is There Anything Like It?

Short answer: No. Nothing like it exists or has existed.

The Ant64 occupies a category of one. To understand why, consider what you would need to buy to match its combined capabilities in 2026:

Capability Best dedicated hardware Price
128-voice VA + WT synth Waldorf Kyra ~€1,800 (discont.)
Full FM synthesis (6-op, DX7 compat) Yamaha Montage M ~€4,500
Live sampling + S&S engine Sequential Prophet X ~€3,800
Live granular input Waldorf Quantum MK2 ~€4,800 (discont.)
Multi-track hardware sequencer Squarp Pyramid MK3 ~€700
DIN MIDI In/Out/Thru hub iConnectivity mioXL ~€400
Video synthesizer / visualiser Critter & Guitari EYESY ~€500
303 acid engine Roland TB-03 ~€350
Total 8 separate devices ~€16,850+

And that stack still wouldn't have:

  • All three synthesis paradigms layerable per voice
  • RGB performance UI on the synth controls
  • DX7 sysex import
  • Audio-reactive video tied directly to the synthesis engine
  • An open, hackable FPGA DSP layer
  • A unified OS (AntOS) tying everything together with a scripting language

The Closest Historical Precedents

Fairlight CMI (1979–85) — combined sampler + synthesis + sequencer + video display. Cost £20,000–£50,000. The Ant64 is architecturally more capable in synthesis depth and has video output where the Fairlight had only a CRT display.

Synclavier (1975–92) — FM + sampling + sequencer, professional studio standard. Cost $200,000+. The Ant64 has comparable synthesis capability in a hobby platform.

Edirol/Roland CG-8 Visual Synthesizer (2003) — a MIDI-controlled video synthesizer, played like an instrument alongside audio gear. It performed no audio synthesis — video only. Discontinued; no current equivalent exists.

Conclusion: The Ant64 at full spec is the first hobbyist-accessible device to combine professional-grade polyphonic synthesis (all paradigms), a hardware sequencer, full MIDI connectivity, live sampling, and integrated video synthesis in a single open platform. The commercial equivalent does not exist in 2026.


Visual Editing System — Fairlight-Inspired

The Fairlight CMI's defining characteristic was not just its sound but its visual interface — a light pen on a green CRT that let you literally draw sounds, compose sequences on a grid, and sculpt waveforms by hand. In 1979–85 this cost £30,000. The Ant64 implements the same paradigm as a dedicated native application — the Ant64 Music Workstation App.

This is a full standalone bare-metal C++ application on the SG2000 big core, using Dear ImGui as the UI framework, rendered to the HDMI video output via the FireStorm rasteriser, navigated via the 8 jog dials plus mouse or stylus. It communicates with FireStorm, Pulse, and DBFS over defined IPC interfaces. Think of it the way a DAW relates to an OS — it runs on the platform but is its own substantial piece of software with direct hardware access and no OS overhead.

The Fairlight's Pages — and Their Ant64 Equivalents

Fairlight page Function Ant64 equivalent
Page 4 32 harmonic amplitude sliders Page H — Harmonic Editor
Page 5 Per-harmonic envelope profiles Page H ENV view
Page 6 Freehand waveform drawing Page W — Waveform Drawing
Page 8 Sample recording and display Page S — Sample Editor
Page D 3D spectral waterfall (STFT) Page D — Spectral Waterfall (extended)
Page R Graphical grid sequencer Page R — Piano Roll Sequencer
(none) Envelope ADSR visual editor Page E — Envelope Editor
(none) FM algorithm node graph Page F — FM Algorithm Editor
(none) Modulation matrix visual Page M — Mod Matrix Editor
(none) Phasor / DFT decomposition Page A — Audio Analyser (new)
(none) Live music coding editor Page L — Live Coder (new)
(none) MOD / tracker editor Page K — MOD Editor (new)
(none) SMPS / VGM editor Page V — SMPS Editor (new)
(none) Music generation Page G — Music Generator (new)

Input Devices for Visual Editing

The Fairlight used a light pen — a stylus held against the CRT that detected the electron beam position. The Ant64 equivalents:

Input Role
Mouse (USB, via Pulse) Primary cursor control — point, click, drag to draw
8 jog dials Dial 1 = X cursor; Dial 2 = Y/amplitude; Dial 3–8 = context parameters; push = confirm/select
QWERTY keyboard (USB) Command entry (Fairlight-style: type SAW, TRI, SIN)
Stylus tablet (USB, optional) Pressure-sensitive drawing — stroke pressure → waveform amplitude
MIDI keyboard Enter notes in sequencer grid by playing them live

The jog-dial-as-cursor approach is particularly natural for waveform editing: Dial 1 steps through sample points, Dial 2 adjusts amplitude — fully navigable without a mouse if preferred. Exactly the tactile feel the Fairlight was going for.


Page W — Waveform Drawing (Fairlight Page 6)

Draw and sculpt a waveform cycle directly on screen. 256 points per cycle, amplitude range −128 to +127. Changes heard immediately — FireStorm updates within one frame.

Drawing modes:

  • DRAW — freehand: drag cursor to paint amplitude continuously
  • JOIN — each point joins the last with a straight line (Fairlight default)
  • PLOT — set individual points without affecting neighbours

Macro waveforms — type or button-press fills current segment: SAW · SQ n (square, pulse width n) · TRI · SIN · NOISE

Transform operations:

  • INV — invert vertically (flip around zero)
  • REV — reverse horizontally (mirror in time)
  • SQZ — squeeze (compress amplitude toward zero — soften the wave)
  • MRG — merge: interpolate all segments between two defined endpoints
  • MIX — blend two waveforms at a user ratio (crossfade between timbres)
  • CPY — copy segment to a range of segments

128-segment system — matching the Fairlight exactly:

  • The full sound is divided into 128 time segments, each with its own waveform
  • The sound evolves through all 128 as it plays — this is how you get organic, living timbres that change over time rather than a static waveform looping
  • Draw segment 1 as a bright sawtooth, segment 64 as a sine, MRG between them: the sound smoothly morphs from harsh to pure over its duration
  • This is the Fairlight's characteristic evolving pad / orchestral hit sound

Live preview — [PLAY] triggers the voice immediately on any keypress so you hear your drawn waveform in context as you work.
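
The MRG morph between two drawn segments is a per-sample linear interpolation across the intervening segments. A Python sketch (illustrative — the FireStorm implementation is fixed-point):

```python
import math

def mrg(seg_a, seg_b, n_segments):
    """Interpolate a chain of intermediate segment waveforms between two
    endpoint segments, as MRG does across the 128-segment sound."""
    chain = []
    for s in range(n_segments):
        t = s / (n_segments - 1)           # 0.0 at seg_a, 1.0 at seg_b
        chain.append([round(a * (1 - t) + b * t) for a, b in zip(seg_a, seg_b)])
    return chain

# Segment 1: bright sawtooth; segment 64: pure sine (256 points, -127..+127)
saw  = [round(127 * (i / 255 * 2 - 1)) for i in range(256)]
sine = [round(127 * math.sin(2 * math.pi * i / 256)) for i in range(256)]
morph = mrg(saw, sine, 64)                 # 64 segments morphing harsh -> pure
print(len(morph), morph[0][:4])
```

Playing through the chain in order is what produces the Fairlight-style evolving timbre: each segment is one step of the harsh-to-pure journey.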


Page H — Harmonic Editor (Fairlight Page 4/5)

Additive synthesis through visual harmonic control. 32 vertical sliders — one per harmonic partial — showing amplitude. Drag to sculpt the frequency spectrum directly.

Views:

  • AMP — harmonic amplitudes (the fundamental timbral shape)
  • PHASE — phase offset per partial (subtle textural effect)
  • ENV — per-harmonic envelope decay rate (each partial fades at its own speed)

Per-harmonic envelopes are the Fairlight's secret for convincing acoustic sounds. High harmonics in a real piano decay faster than low ones. Draw steeper ENV curves for H8–H32 and flatter ones for H1–H4 — the result sounds like a real instrument's natural harmonic decay, not a filter envelope smearing everything together.

[→ PAGE W] converts the current harmonic profile to a waveform via IFFT — bridges the additive synthesis view and the waveform drawing view.
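
The slider-to-waveform conversion is additive synthesis in its most direct form: each slider is the amplitude of one sine partial, and the cycle is their sum. A Python sketch of that step (the real path uses an IFFT and fixed-point arithmetic):

```python
import math

def harmonics_to_wave(amps, phases=None, length=256):
    """Render harmonic amplitude sliders (Page H) to one wavetable cycle
    (Page W), written as a direct sum of sines."""
    phases = phases or [0.0] * len(amps)
    wave = []
    for n in range(length):
        t = 2 * math.pi * n / length
        wave.append(sum(a * math.sin((h + 1) * t + p)
                        for h, (a, p) in enumerate(zip(amps, phases))))
    return wave

# A sawtooth-like spectrum: amplitude 1/h for the first 8 partials
amps = [1.0 / (h + 1) for h in range(8)] + [0.0] * 24
cycle = harmonics_to_wave(amps)
print(len(cycle))  # 256
```

Per-harmonic envelopes extend this by making each `amps[h]` a function of time rather than a constant — each partial fades on its own curve.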


Page S — Sample Editor (Fairlight Page 8/D)

Full visual waveform display for recorded samples. Scrollable, zoomable, with all standard sample editing operations controlled graphically.

Visual controls:

  • Drag START, END, LOOP S, LOOP E markers along the waveform
  • Shaded region shows the active loop with crossfade zone highlighted
  • Zoom from full sample view down to individual sample cycles for precision loop setting

Operations:

  • TRIM — remove audio outside markers · NORM — normalise to full scale
  • REVERSE — flip in time · X-FADE — set loop crossfade length (0–100ms)
  • RECORD — live waveform display as audio is captured — watch it draw itself in

All changes feed directly to FireStorm sample parameters. Loop crossfade length maps to the hardware crossfade DSP block in real time — no bounce/reload needed.


Page R — Graphical Sequencer (Fairlight Page R)

The piano roll. Click to place notes, drag to resize them, sweep to paint rhythms. The direct ancestor of every DAW piano roll in use today — the Ant64 has the original paradigm, hardware-native.

Grid editing:

  • Click empty cell → place note · Drag right → extend length · Right-click → delete
  • Hold and sweep across cells → paint multiple notes in one gesture
  • MIDI keyboard input → notes appear at current step position in real time

Per-step detail (zoom in):

  • Velocity shown as vertical fill within the cell
  • Gate length shown as horizontal fill
  • Probability shown as partial transparency (50% = 50% chance on each cycle)
  • Parameter lock shown as a coloured dot (per-step timbre/filter/pitch override)

8 tracks, each independently configurable:

  • Track type: melodic · drum · CC automation · 303 acid
  • Track voice: any FireStorm voice, DIN MIDI channel, or USB MIDI channel
  • Track step count: 1–64 independent per track (polyrhythm built-in)

Song editor — patterns chain into a full arrangement: Patterns displayed as blocks in a timeline; drag to reorder, double-click to edit.


Page E — Envelope Editor

Visual multi-stage envelope editing. Drag breakpoints with mouse or navigate with Dial 1/2. All four envelopes per voice displayed simultaneously for comparison. ENV4 (8-stage loopable) shows all 8 breakpoints; drag to reshape complex LFO-like envelopes. Changes update FireStorm envelope parameters in real time.


Page F — FM Algorithm Editor

Visual node graph for FM operator routing. Each of the 8 operators is a box; modulation connections are arrows. Carriers (→ audio output) shown in green; modulators in amber.

Drag from one operator to another to create or destroy a modulation path. Click an operator to expand its parameters inline: frequency ratio, output level, envelope shape (mini-display), feedback amount.

Supports all 32 DX7 algorithms displayed as the original Yamaha diagrams, plus completely free routing beyond anything DX7 offered — any operator can modulate any other, including chains, stacks, and feedback loops. DX7 .syx import populates the node graph automatically.
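
At the sample level, one modulator-carrier arrow in the node graph is classic phase modulation. A minimal floating-point sketch of a two-operator pair (the FireStorm engine itself is fixed-point; parameter names are illustrative):

```python
import math

def fm_sample(t, ratio=2.0, index=1.5, f0=110.0):
    """Two-operator FM: a modulator at ratio*f0 drives the carrier's phase;
    `index` sets modulation depth and therefore sideband brightness."""
    mod = math.sin(2 * math.pi * ratio * f0 * t)
    return math.sin(2 * math.pi * f0 * t + index * mod)

sr = 44100
frame = [fm_sample(n / sr) for n in range(256)]
print(len(frame))
```

Chains and stacks in the editor compose this same operation: each operator's output becomes a phase offset for the one it points at.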


Page M — Modulation Matrix Editor

All 64 modulation slots displayed as a visual grid. Each active slot shown as a line: source on left, destination on right, line thickness proportional to modulation amount. Active modulators animate — thickness pulses with the modulator's live value so you can see every LFO and envelope moving in the display while the patch plays.


Page A — Audio Analyser (Phasor / DFT Decomposition)

Inspired by Sebastian Lague's Fourier transform visualisation. A single FFT frame of audio decomposed into individual rotating phasors — one circle per frequency component. Each circle's radius equals the component's amplitude; it rotates at that component's frequency. The tip of the final (outermost) phasor traces the reconstructed waveform in real time. This is the geometric intuition behind the Fourier transform made visible and interactive.

        ┌──────────────────────────────────────────────────────┐
        │  PAGE A — PHASOR DECOMPOSITION                       │
        │                                                      │
        │         ○──────────────────── fundamental            │
        │         │  ○──────────── 2nd harmonic                │
        │         │  │  ○──── 3rd harmonic                     │
        │         │  │  │  ● tip traces waveform               │
        │         │  │  │                                      │
        │  [amplitude rings rotating at each frequency]        │
        │                                                      │
        │  Right panel: reconstructed waveform (sum of all)    │
        │  Bottom: amplitude bars per harmonic (like Page H)   │
        └──────────────────────────────────────────────────────┘

Analysis engine (SG2000 big core, C906 @ 1GHz):

  • Cooley-Tukey FFT, 4096-point, Hann windowed
  • At 44.1kHz: ~10.8Hz per bin — resolves individual harmonics for fundamentals down to roughly 30–40Hz (adjacent harmonics need a few bins of separation)
  • Fundamental detection: autocorrelation or harmonic product spectrum
  • Output: N amplitude + phase values for the detected harmonic series

Input sources (selectable):

  • Loaded sample (from Page S)
  • Live ADC input (mic or line in via WM8958)
  • Resample-own-output — analyse the live output of any FireStorm voice or the full mix
  • Manual FFT on any memory buffer

Rendering (FireStorm rasteriser):

  • Phasor circles: each drawn as an unfilled circle (radius = amplitude), endpoint dot
  • Chain rendered sequentially — innermost (fundamental) first, tip of each feeds the next circle's centre
  • Reconstructed waveform traced in a contrasting colour at the right
  • Amplitude bars along the bottom mirror Page H (same data, different view)
  • All rendered by FireStorm 2D rasteriser from a draw list built by the CPU — zero pixel work on the application processor

Key interaction:

  • Jog dial 1: scrub through the input audio — phasors update per frame
  • Jog dial 2: zoom into a frequency range
  • SEND command (or dedicated button): push current harmonic analysis directly to Page H — populates amplitude and phase sliders automatically
  • FREEZE command: locks phasors at current frame for inspection

The resynthesis pipeline — closing the loop:

Live audio or sample
        ↓
    [FFT — Page A]
        ↓  SEND
    [Page H — edit amplitude, phase, per-harmonic envelope]
        ↓  IFFT
    [Page W — resulting waveform, editable further]
        ↓
    FireStorm wavetable playback (128–256 voices)

This is a genuine resynthesis workflow. The musician analyses a real-world sound, the Ant64 decomposes it to its harmonic content, they edit the harmonics by hand, and the result plays back as a synthesised voice. Nothing else in the current instrument market offers this as a first-class built-in workflow.
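
The analyse → edit → resynthesize loop reduces to a forward transform, a spectral edit, and an inverse transform. A naive-DFT Python sketch of the whole round trip (illustrative — the Ant64 uses a fixed-point Cooley-Tukey FFT):

```python
import cmath, math

def dft(x):
    """Naive DFT — O(N^2), fine for a short illustrative frame."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

N = 64
tone = [math.sin(2 * math.pi * n / N) + 0.3 * math.sin(2 * math.pi * 3 * n / N)
        for n in range(N)]                 # fundamental + quiet 3rd harmonic

spectrum = dft(tone)                       # analyse        (Page A)
spectrum[3] *= 2                           # boost the 3rd harmonic (Page H edit)
spectrum[N - 3] *= 2                       # ...and its conjugate bin
rebuilt = idft(spectrum)                   # inverse transform back  (Page W)
print(round(max(rebuilt), 3))
```

Note the conjugate bin: for a real signal, every harmonic lives in a mirrored pair of bins, and both must be edited together or the resynthesized waveform stops being real-valued.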


Page D — Spectral Analysis (Spectrogram, Waterfall, Spectrum)

Page D is the Ant64's full spectral analysis page. All four display modes draw from the same underlying Short-Time Fourier Transform (STFT) engine — they are different projections of identical data, switchable at any time without recomputing the analysis.

The spectrogram is the default and primary view: the most legible, the most useful for practical sound design, and the form used in professional audio analysis tools worldwide. The 3D waterfall is available as an alternate mode — a homage to the Fairlight CMI's iconic display, now running at 60fps with filled shading rather than the original's slow wireframe.


Mode 1 — Spectrogram (Default)

Time runs left to right, frequency runs bottom to top, amplitude is encoded as colour intensity. The entire life of a sound is visible at once as a 2D heat map. This is what audio engineers, researchers, and linguists actually use for analysis work.

  Freq │
  20k  │  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
  10k  │  ░░░░░░▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
   5k  │  ░░░░░░████░░░░░░░░░░░░░░░░▓░░░░░░░░░░░░░░░░░
   2k  │  ░░░░░░████▓░░░░░░░░░░░░░░░█░░░░░░░░░░░░░░░░░  ← formant
   1k  │  ░░░░░░████████▓▓░░░░░░░░░░█░░░░░░░░░░░░░░░░░
  500  │  ░░░░░████████████▓░░░░░░░░██▓░░░░░░░░░░░░░░░
  200  │  ░░░▓█████████████████▓▓░░░███████▓░░░░░░░░░░
  100  │  ░░░███████████████████████████████████▓░░░░░  ← fundamental
   20  │  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
       └──────────────────────────────────────────────→ Time
          ↑ attack                         ↑ tail off

Colour scale (cool → warm with amplitude):

  black → deep blue → cyan → green → yellow → orange → red → white
  silent                                                      loud

What becomes immediately visible:

Feature What you see
Attack transient Bright vertical flash across all frequencies at note onset
Harmonic series Horizontal bands at fundamental + partials, evenly spaced in log view
Inharmonicity Slight stretching of partials — visible in piano, bells, metal
Formants Broad bright horizontal bands — distinguish vowel sounds in voice
Vibrato / pitch drift Horizontal bands that waver — out-of-tune recording visible immediately
Filter sweep Bright energy moving up or down the frequency axis over time
Loop discontinuity Smeared vertical burst at the loop point — find the click
Noise floor Diffuse low-level colour across all frequencies
Room reflections Faint echo of the attack transient arriving slightly later
Noise vs tone Broadband noise = vertical smear; pure tone = thin horizontal line

Practical uses:

  • Sample editing: find exactly where to trim, where a loop click is, where the tail becomes pure noise
  • Vocal analysis: see formant structure; tune SAM speech synth Throat/Mouth to match
  • Filter design: watch the cutoff move in real time while turning a filter parameter
  • Tuning: vibrato, drift, and intonation all visible without listening
  • Patch comparison: load two samples, compare their spectrograms side by side

Mode 2 — 3D Waterfall (Fairlight Homage)

The Fairlight CMI's Page D — a 3D mountain-range display of the evolving spectrum. Frequency on X, amplitude on Y, time receding into Z. The Ant64 version runs at 60fps with filled, Gouraud-shaded polygons and the same cool-to-warm colour map as the spectrogram. The original Fairlight rendered a slow wireframe on hardware that was extraordinary for 1979 but would be a slideshow by modern standards.

        ┌──────────────────────────────────────────────────────┐
        │  MODE 2 — 3D WATERFALL                               │
        │                                                      │
        │  Z (time, past →)                                    │
        │   \                                                  │
        │    \  ████                                           │
        │     \ █  ████  █                                     │
        │      \█     █████  ██                                │
        │       ────────────────── X (frequency, low → high)   │
        │                Y = amplitude                         │
        │                                                      │
        │  Filled + Gouraud shaded · 60fps · amplitude → hue   │
        └──────────────────────────────────────────────────────┘

Best for: demonstrations, understanding the temporal shape of a sound intuitively, presentations, and the Fairlight CMI experience. Less practical than the spectrogram for detailed editing work — the 3D projection occludes information that the flat view makes explicit.

Jog dials 5 and 6 rotate the perspective (yaw and pitch) in real time.


Mode 3 — Live Spectrum (Single Frame)

A classic spectrum analyser bar display — frequency on X, amplitude on Y, for the current moment only. This is the mode you see on hi-fi equipment and DAW channel strips. Useful for checking a live mix, monitoring a voice's output character in real time, or watching a filter sweep without the time history of the spectrogram.

        ┌──────────────────────────────────────────────────────┐
        │  MODE 3 — LIVE SPECTRUM                              │
        │                                                      │
        │  dB                                                  │
        │   0  │         ██                                    │
        │ -12  │    ██  ████ █                                 │
        │ -24  │   ████ ████ ███ █                             │
        │ -36  │  ██████████ ████ ██  █                        │
        │ -48  │ ████████████████████████ ███                  │
        │       └──────────────────────────────── Freq →       │
        │  Peak hold · RMS overlay · dB scale switchable       │
        └──────────────────────────────────────────────────────┘

Features: peak hold (decaying dots above each bar), RMS overlay as a smooth curve, linear or logarithmic frequency axis, dB or linear amplitude scale. Updates once per STFT hop (~86 frames per second at the default 512-sample hop) — effectively real time.
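The peak-hold behaviour can be sketched as a simple per-bar update rule — the marker is pushed up instantly by the live level and otherwise decays linearly per hop. This is an illustrative sketch; `DECAY_PER_FRAME` is a made-up constant, not a spec figure, and the FPGA implementation would use fixed-point.

```python
DECAY_PER_FRAME = 0.01   # normalised amplitude lost per STFT hop (illustrative)

def update_peaks(levels: list[float], peaks: list[float]) -> list[float]:
    """Each bar's peak marker: pushed up by the live level, else decaying."""
    return [max(lvl, pk - DECAY_PER_FRAME) for lvl, pk in zip(levels, peaks)]

peaks = update_peaks([0.8, 0.2, 0.5], [0.0, 0.0, 0.0])  # markers jump to live levels
peaks = update_peaks([0.1, 0.1, 0.1], peaks)            # then decay slowly
```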


Mode 4 — Phasor View (Link to Page A)

A live single-frame phasor decomposition — the same display as Page A but embedded in Page D for quick access without leaving the spectral analysis context. Press PAGE A to jump to the full Page A view for detailed interaction.


STFT Engine Parameters

Shared across all four modes — the full set is listed in the Parameters Reference table below.


The STFT Pipeline — Step by Step

The spectrogram is built by running a succession of overlapping FFT frames across the audio and stacking the results. Each step in the chain has a specific role:

  Audio stream (PCM, 44.1kHz)
          │
          │  Step 1 — SEGMENT
          │  Slice the audio into overlapping frames.
          │  Each frame is FFT_SIZE samples long.
          │  Each successive frame is offset by HOP_SIZE samples.
          │
          ├── frame 0:  samples    0 →  4095
          ├── frame 1:  samples  512 →  4607   (offset by hop = 512)
          ├── frame 2:  samples 1024 →  5119
          ├── frame 3:  samples 1536 →  5631
          │   ...
          │   overlap = FFT_SIZE − HOP_SIZE = 4096 − 512 = 3584 samples
          │   overlap ratio = 3584 / 4096 = 87.5%
          │
          │  Step 2 — WINDOW
          │  Multiply each frame sample-by-sample by the window function.
          │  Tapers the frame smoothly to zero at both edges.
          │  Eliminates the sharp discontinuity the FFT would otherwise see.
          │
          │      w[n] = 0.5 × (1 − cos(2π·n / (N−1)))   ← Hann window
          │
          │  Amplitude
          │    1.0 │        ╭─────────────╮
          │        │      ╭╯               ╰╮
          │    0.5 │    ╭╯                   ╰╮
          │        │  ╭╯                       ╰╮
          │    0.0 │╭╯                           ╰╮
          │        └───────────────────────────────  sample index 0 → N
          │
          │  Step 3 — FFT
          │  Cooley-Tukey FFT on the windowed frame.
          │  Output: N/2 + 1 unique complex bins (DC to Nyquist, real input),
          │  each representing one frequency component.
          │  Extract magnitude: |bin| = √(re² + im²)
          │  Extract phase:     φ     = atan2(im, re)   (used by Page A)
          │
          │  Step 4 — LOG / dB CONVERSION
          │  Convert linear magnitude to decibels:
          │
          │      dB[k] = 20 × log₁₀(magnitude[k] + ε)
          │
          │  ε is a small floor value (~10⁻¹⁰) to avoid log(0).
          │  Result: −96dB (silence / noise floor) to 0dB (full scale).
          │
          │  Step 5 — NORMALISE
          │  Map the dB range to 0.0 – 1.0:
          │
          │      t[k] = clamp((dB[k] − dB_floor) / (dB_ceil − dB_floor), 0, 1)
          │
          │  dB_floor and dB_ceil are user configurable (default: −96 to 0).
          │
          │  Step 6 — COLOUR LUT LOOKUP
          │  Use t[k] as an index into a 256-entry RGB colour lookup table.
          │  The LUT lives in FireStorm BRAM (256 × 3 bytes = 768 bytes).
          │  Swapping colour maps = writing a new LUT to BRAM — zero render cost.
          │
          │      pixel colour = LUT[round(t[k] × 255)]
          │
          └── Repeat for every frame → stack of colourised rows → spectrogram

The 87.5% overlap ratio sounds high but is correct for a Hann window. Because the window tapers to zero at the edges, samples near the frame boundaries are heavily attenuated. A 75–87.5% overlap ensures every audio sample falls near the centre of at least one frame where it carries full weight. Without sufficient overlap, faint events between frames can disappear entirely from the display.
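The Step 1 segmentation arithmetic can be checked in a few lines — the constants are the Ant64 defaults from the Parameters Reference; `frame_starts` is an illustrative helper, not a spec API:

```python
# Step 1 (SEGMENT) sketch: frame start offsets and the overlap arithmetic.
# FFT_SIZE and HOP_SIZE are the Ant64 defaults (4096-pt FFT, 512-sample hop).

FFT_SIZE = 4096
HOP_SIZE = 512

def frame_starts(num_samples: int) -> list[int]:
    """Start index of every complete frame that fits in the buffer."""
    return list(range(0, num_samples - FFT_SIZE + 1, HOP_SIZE))

overlap = FFT_SIZE - HOP_SIZE        # 3584 samples shared between neighbours
overlap_ratio = overlap / FFT_SIZE   # 0.875 — the 87.5% quoted above

starts = frame_starts(44100)         # one second of audio at 44.1kHz
# frame 0 covers samples 0-4095, frame 1 covers 512-4607, and so on
```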


Windowing — Spectral Leakage and Its Cure

The FFT assumes the analysis window is a fragment of an infinitely repeating signal. If the audio does not start and end at exactly the same value at the frame edges — which almost never happens with real-world sound — the FFT sees a sharp discontinuity. That discontinuity has energy at every frequency: a pure 440Hz tone smears energy into 441Hz, 450Hz, 600Hz and beyond. This is spectral leakage — energy appearing in bins it has no business being in, obscuring faint harmonics near strong ones.

Multiplying by a window function that tapers smoothly to zero at both edges eliminates the discontinuity. The Ant64 offers four window choices, selectable per analysis session:

Window Characteristics Best for
Hann Raised cosine, 2 terms. Good frequency resolution, good leakage suppression. The practical default for almost all audio work. General purpose — synths, instruments, voice
Blackman-Harris 4-term cosine. Very high leakage suppression (~92dB side-lobe rejection). Wider spectral peaks — less frequency precision, but faint harmonics near strong ones become visible. Complex tonal sounds — distinguishing close harmonics
Flat-top Optimised for amplitude accuracy, not frequency resolution. Very wide peaks, very accurate peak amplitude measurement. Calibration — measuring the precise level of a specific harmonic
Rectangular No windowing — maximum time resolution, maximum leakage. Every sample weighted equally. Transient / percussion analysis — when timing matters more than frequency accuracy

The Hann formula for reference (N = FFT size, n = sample index 0..N−1):

  w[n] = 0.5 × (1 − cos(2πn / (N−1)))
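A direct transcription of the formula as a sketch — the production version would be a precomputed fixed-point table in BRAM, per the no-floating-point rule:

```python
import math

def hann(N: int) -> list[float]:
    """Symmetric Hann: w[n] = 0.5 * (1 - cos(2*pi*n / (N - 1)))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (N - 1)))
            for n in range(N)]

w = hann(4096)
# Tapers to zero at both edges, peaks near 1.0 at the centre, and is
# symmetric — exactly the shape sketched in the pipeline diagram above.
```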

Colour Maps — Remapping Greyscale to Colour

Once each FFT bin magnitude is normalised to 0.0–1.0, it is mapped through a colour lookup table stored in FireStorm BRAM. The choice of colour map significantly affects what is visible — a naive greyscale misses quiet features; a poorly chosen colour scale can create false structure that looks like real signal.

Four maps are available, each suited to different tasks:

Map Colours Character Best for
Viridis Dark purple → blue → teal → green → yellow → bright yellow Perceptually uniform — equal steps in magnitude appear as equal steps in perceived brightness. Colourblind-safe. Readable printed in greyscale. Default. General analysis, publication, colourblind users
Inferno Black → deep red → orange → yellow → white Slightly better quiet-end contrast than Viridis. Dramatic, high dynamic range feel. Sounds with a lot of detail in the noise floor
Classic hot Black → red → orange → yellow → white Traditional audio analyser look. Familiar to DAW and spectrum analyser users. Users coming from a DAW background
Green phosphor Black → dark green → bright green → white The Fairlight CMI aesthetic. Monochrome but with identity. Mode 2 (3D Waterfall) — the authentic Fairlight look

The LUT is 256 × 3 bytes = 768 bytes of BRAM — negligible. Switching maps at runtime is a single BRAM write from the CPU; the FireStorm rasteriser sees the new table immediately on the next frame with no pipeline stall.
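The lookup path (Steps 4-6 of the pipeline) can be sketched as follows. The grey ramp is a stand-in for the real Viridis/Inferno tables, which would be precomputed; `magnitude_to_rgb` is an illustrative name, and the floats stand in for the FPGA's fixed-point arithmetic:

```python
import math

# Stand-in grey ramp: 256 entries x 3 bytes = 768 bytes, the BRAM footprint
# quoted above. A real map (Viridis, Inferno...) is just a different table.
LUT = [(i, i, i) for i in range(256)]

DB_FLOOR, DB_CEIL = -96.0, 0.0
EPSILON = 1e-10                      # small floor value to avoid log(0)

def magnitude_to_rgb(mag: float) -> tuple[int, int, int]:
    db = 20.0 * math.log10(mag + EPSILON)                            # Step 4
    t = max(0.0, min(1.0, (db - DB_FLOOR) / (DB_CEIL - DB_FLOOR)))   # Step 5
    return LUT[round(t * 255)]                                       # Step 6
```

Swapping colour maps replaces `LUT` and nothing else — the magnitude-to-index mapping is unchanged, which is why the swap costs nothing at render time.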


Parameters Reference
Parameter Value Notes
FFT size 2048 or 4096 pt User selectable — frequency vs. time resolution trade-off
Hop size 512 samples New frame every ~11.6ms at 44.1kHz; 87.5% overlap at 4096-pt, 75% at 2048-pt
Window Hann (default) / Blackman-Harris / Flat-top / Rectangular See window table above
Frequency resolution ~10.8Hz/bin (4096pt) · ~21.5Hz/bin (2048pt) Resolves harmonics of notes above ~20Hz at 4096pt
History depth 128–1024 frames 1.5 to 11.8 seconds at 44.1kHz / 512-hop
Colour map Viridis (default) / Inferno / Classic hot / Green phosphor LUT in FireStorm BRAM — zero-cost swap
dB range −96dB to 0dB Configurable floor and ceiling
Amplitude → colour linear mag → log (dB) → normalise → LUT index Full pipeline described above

Frequency resolution trade-off:

  4096-point FFT:  fine frequency resolution (~10.8Hz/bin)   coarser time (~93ms/frame)
  2048-point FFT:  coarser frequency (~21.5Hz/bin)            finer time  (~46ms/frame)

For tonal sounds (synths, pitched instruments): use 4096. For percussive sounds (drums, transients): use 2048 — attack timing matters more than harmonic detail. The Rectangular window pairs naturally with 2048 for maximum transient precision.
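The trade-off numbers come straight from the sample rate — a quick check, with illustrative helper names:

```python
SAMPLE_RATE = 44100

def bin_resolution_hz(fft_size: int) -> float:
    """Width of one FFT bin in Hz."""
    return SAMPLE_RATE / fft_size

def frame_duration_ms(fft_size: int) -> float:
    """Time span covered by one analysis frame."""
    return 1000.0 * fft_size / SAMPLE_RATE

hop_ms = 1000.0 * 512 / SAMPLE_RATE   # ~11.6ms — one new column per hop
# 4096-pt: ~10.8 Hz/bin over ~93ms;  2048-pt: ~21.5 Hz/bin over ~46ms
```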


Input Sources

All four modes share the same input selection:

Source Description
Loaded sample Analysis run over entire sample offline — full history available
Live ADC Mic or line in via WM8958 — continuous real-time scrolling display
Resample-own-output Any FireStorm voice or the full mix — watch synthesis live
Memory buffer Any PCM buffer held in DBFS — for analysis of intermediate results

Key Interactions

Control Action
Jog dial 1 Time scrub / scroll speed
Jog dial 2 Frequency zoom — focus on a harmonic region
Jog dial 3 History depth / waterfall length
Jog dial 4 Amplitude range / dB floor
Jog dial 5 Perspective yaw (Mode 2 only)
Jog dial 6 Perspective tilt / pitch (Mode 2 only)
1 / 2 / 3 / 4 Switch display mode
LOG / LIN Toggle log or linear frequency axis
PLAY Scrub loaded sample — all modes scroll in sync
MARK Place time cursor — use with Page S to jump to that sample point
SEND Push selected frame's spectrum to Page A and Page H
FREEZE Lock display at current frame for inspection

Relationship to Other Pages

Page D shows the time evolution of the spectrum — the whole life of the sound. Page A shows a single moment — the geometric decomposition of one frozen frame. Page H exposes the harmonic content as editable sliders, fed by analysis from either. Page S shares the same time cursor — marks placed in Page D jump to the exact sample position in the waveform editor, making it easy to find a click or a loop problem and fix it without switching mental context.

  Page D (time cursor MARK)  ──────────→  Page S (jump to sample position)
  Page D (SEND one frame)    ──────────→  Page A (phasor view of that frame)
  Page D (SEND one frame)    ──────────→  Page H (harmonic sliders populated)
  Page A (SEND)              ──────────→  Page H (same path, from phasor view)
  Page H (IFFT)              ──────────→  Page W (resulting waveform)
  Page W                     ──────────→  FireStorm wavetable engine

Spectral Drawing — Painting Sound Directly onto the Spectrogram

Page D's spectrogram is not read-only. The drawing tools work directly on the frequency domain canvas — painting magnitude values per bin per frame. The ISTFT engine reconstructs audio from whatever is drawn, in real time. The spectrogram becomes a two-dimensional instrument: frequency is one axis, time is the other, and brightness is amplitude.

The Drawing Tools

Tool Behaviour Sonic result
Pencil Sets magnitude at cursor — brush size = frequency bandwidth Pure tone or narrow band
Line Sweep between two points — straight or curved Glissando, pitch sweep
Fill Flood fill a frequency region Noise band — bandwidth = region height
Spray Random magnitude scatter in a region Textured, granular noise
Eraser Sets bins to silence Remove unwanted content
Clone Copy a time-frequency region, paste elsewhere Repeat a motif, delay effect
Mirror Reflect a region horizontally or vertically Symmetric spectral structures

What You Draw — What You Hear

Drawing Sound
Horizontal line at fixed frequency Sustained pure tone at that pitch
Diagonal line rising left to right Glissando — smooth upward pitch sweep
Stack of horizontal lines at harmonic intervals Pitched timbre — more lines = richer
Broad soft horizontal band Filtered noise — bandwidth = band height
Formant blobs at ~500Hz, ~1500Hz, ~2500Hz Vowel "ah" — move bands to change vowel
Short vertical smear Click, transient, attack
Bright isolated dot Short blip or percussive hit
Anything Something worth hearing

Phase Synthesis

Drawing provides magnitudes — phase must be synthesised. Three strategies:

Mode Method Character
Coherent (default) Phase advances per bin by 2π × k × hop / N each frame Clean, smooth tones — best for melodic drawing
Random Independent random phase per bin per frame Natural-sounding textures, noise, complex surfaces
Zero All phases set to zero Symmetric signal, can sound buzzy — useful for waveform design

NOISE toggle switches between Coherent and Random. Coherent for pitched lines, Random for textured fills — the two most common drawing scenarios.
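Coherent mode's phase advance can be sketched directly from the table's formula — `coherent_phase` is an illustrative helper using the default frame sizes, with the result wrapped to a principal value:

```python
import math

FFT_SIZE, HOP_SIZE = 4096, 512

def coherent_phase(bin_k: int, frame_m: int) -> float:
    """Phase of bin k at frame m: advances by 2*pi*k*hop/N per frame,
    wrapped into (-pi, pi]."""
    phi = 2.0 * math.pi * bin_k * HOP_SIZE * frame_m / FFT_SIZE
    return math.atan2(math.sin(phi), math.cos(phi))

# A bin whose advance is a whole number of cycles per hop (k*hop/N integer)
# returns to the same phase every frame — the drawn tone is seamless
# across frame boundaries, which is why Coherent suits melodic drawing.
```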

ISTFT Reconstruction

The reverse of the analysis pipeline — drawn magnitudes reconstructed to PCM:

  Drawn spectrogram (magnitude per bin per frame)
          │
          │  Phase synthesis (coherent / random / zero)
          │  Combine magnitude + phase → complex FFT bins
          │
          ↓
  IFFT per frame → time-domain segment (N samples)
          │
          │  Multiply by synthesis Hann window
          │  (same window used in analysis — ensures perfect reconstruction
          │   when hop and window are matched)
          │
          ↓
  Overlap-add with hop size
  (successive frames summed at their offset positions)
          │
          ↓
  Continuous PCM audio stream → FireStorm output

Overlap-add with matched analysis and synthesis windows gives perfect reconstruction — a drawing that exactly reproduces the original spectrogram, played back with the original frame phases, produces audio indistinguishable from the original. Simplified drawings produce simplified audio. The quality of the reconstruction is entirely determined by what was drawn.

Playback is live — drawing and listening happen simultaneously. As the musician paints, the audio updates within one hop period (~11.6ms). The canvas is a live instrument.
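The matched-window round trip works because the squared window, overlap-added at every hop, sums to a constant that can be divided out. A sketch of that identity at the default 87.5% overlap — note the assumption: it uses the periodic Hann variant (denominator N rather than the N−1 of the analysis formula), which makes the sum exactly constant:

```python
import math

N, HOP = 4096, 512   # the default 87.5%-overlap configuration

def hann_periodic(size: int) -> list[float]:
    # Periodic Hann (denominator N, not N-1) — the variant whose squared
    # overlap-add is exactly constant; an assumption for this sketch.
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / size))
            for n in range(size)]

wsq = [x * x for x in hann_periodic(N)]   # analysis window x synthesis window

# Overlap-add the squared window at every hop. In any fully-overlapped
# region the sum is constant (3.0 here) — dividing by that constant is
# what makes the matched-window round trip a perfect reconstruction.
total = [0.0] * (2 * N)
for start in range(0, N + 1, HOP):
    for n in range(N):
        total[start + n] += wsq[n]

mid = total[N : N + HOP]   # a span covered by the full set of 8 overlaps
```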


Spectral Tracing — Simplifying Real Sounds

The reference layer places the original sound's spectrogram underneath the drawing canvas at reduced opacity, like tracing paper over a photograph. The musician draws on top — tracing the components they want, ignoring those they do not. Playback reconstructs only what was drawn.

  ┌─────────────────────────────────────────────────────────┐
  │  LAYER 0 — Reference (read-only, ghost)                 │
  │  Original sound spectrogram                             │
  │  50–70% opacity — visible but not dominant              │
  │  Plays back only when A/B mode held                     │
  ├─────────────────────────────────────────────────────────┤
  │  LAYER 1 — Drawing (editable, full opacity)             │
  │  User-drawn spectrogram — starts empty                  │
  │  This is what the ISTFT reconstructs and plays          │
  └─────────────────────────────────────────────────────────┘

Ghost opacity is controlled by jog dial 8. Hold A/B to temporarily play the reference layer for direct comparison. The ear judges the tracing quality faster than any meter.

What Tracing Reveals

The process of tracing is itself an education in acoustics. Working from coarse to fine:

  Pass 1 — Fundamental only
  A single horizontal line at the pitch of the note.
  Plays back as a pure sine wave. How much identity survives?
  Usually: recognisably the right pitch but no timbre.

  Pass 2 — Harmonics
  Add lines at 2×, 3×, 4× the fundamental.
  Match brightness to the original. Timbre begins to emerge.
  Three or four harmonics recovers much of the character.

  Pass 3 — Formants (for voice and acoustic instruments)
  Draw broad soft bands at the formant frequencies.
  For voice — ~500Hz, ~1500Hz, ~2500Hz for "ah".
  The vowel identity appears. Move the bands: the vowel changes.

  Pass 4 — Transients
  Short vertical smears at note attack points.
  Articulation and pluck/bow character return.

  Pass 5 — Noise (optional)
  Spray tool for breath, rosin, room, string noise.
  The difference between clinical and organic.
  Your choice whether to include it.

Each pass is a separate sub-layer, independently toggleable. Mute the noise layer alone to hear the clean tone underneath. Solo the formants to hear just the vowel character. The sub-layers are a spectral mixing desk — each drawn element is a channel.

What Tracing Achieves

Manual psychoacoustic compression. You decide which frequency content the ear actually needs. Often far less than the original recording contains. MP3 does this algorithmically; here the musician decides, with their ears as the judge. The result is frequently more musical than algorithmic compression because the choices are deliberate.

Clean noise reduction. Rather than subtracting an estimated noise floor (which leaves artefacts), draw only the signal content and leave the noise unpainted. It simply does not exist in the reconstruction.

Source separation by hand. Two instruments overlapping in a recording have different harmonic series. Trace one, ignore the other. Imperfect but often effective for melodic material where the fundamental lines are distinct.

Understanding a sound's identity. The tracing process reveals which components are perceptually essential. A violin traced to just four harmonics still sounds like a violin. Remove the transient smear and it loses its bow attack. Remove the noise layer and it loses its rosin. Each element's contribution becomes viscerally clear.


Spectral Drawing as Synthesis Source

Every synthesis engine on the Ant64 accepts sources derived from a spectrogram drawing. The drawing is a universal source editor — the physicist's view of sound made directly manipulable, with all three synthesis paradigms reachable from a single canvas.

Path 1 → Additive Synthesis / Wavetable (Page H / Page W)

The drawn horizontal lines are additive synthesis components. Each line is a sine oscillator at a given frequency with a magnitude that varies over time. The SEND command extracts the harmonic content from the drawing and populates Page H directly — amplitude sliders, phase values, and per-harmonic envelopes all filled from the drawing. IFFT converts to a waveform cycle in Page W. FireStorm plays it with 128 polyphonic voices.
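What SEND extracts can be sketched as reading the magnitude spectrum at multiples of the fundamental. `harmonic_amplitudes` and the nearest-bin mapping are illustrative, assuming the default analysis parameters:

```python
SAMPLE_RATE, FFT_SIZE = 44100, 4096

def harmonic_amplitudes(mags: list[float], f0: float, count: int) -> list[float]:
    """Amplitude of harmonics 1..count of fundamental f0, read from the
    magnitude spectrum at the nearest FFT bin."""
    out = []
    for h in range(1, count + 1):
        k = round(h * f0 * FFT_SIZE / SAMPLE_RATE)   # bin nearest h * f0
        out.append(mags[k] if k < len(mags) else 0.0)
    return out
```

A 100Hz fundamental lands in bin 9 (100 × 4096 / 44100 ≈ 9.3), its second harmonic in bin 19 — those per-harmonic amplitudes are what populate the Page H sliders.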

What tracing adds that pure synthesis cannot: the relative amplitudes and formant positions were derived from a real-world sound. The acoustic physics of the original instrument constrain and guide the synthesis. The result starts in a space that already sounds plausible rather than requiring the musician to discover it from scratch.

Path 2 → Sample Engine (PCM)

Run the ISTFT over the full time extent of the drawing — not just one cycle, the entire canvas. The result is a PCM audio buffer. Load it directly into the sample engine as a new source. It plays back with the full sample engine feature set: pitch tracking, loop points, velocity layers, multi-sample key zones.

This is the right path for sounds that cannot be reduced to a repeating cycle:

  • A traced vowel transition ("ah" → "ee") — formant bands move over time
  • A traced drum hit — transient smear and tail as a single gesture
  • A traced environmental texture — wind, breath, room, rain
  • A drawn melodic phrase — time-varying pitch and amplitude by hand

The drawn version is always cleaner than the original: noise not drawn is not present. Bleed from other instruments not traced simply does not exist in the reconstruction. The sample engine then plays this cleaned, simplified source with full polyphony.

Path 3 → FM Synthesis (Page F)

The most analytical path. When the drawing contains a harmonic series — a fundamental and its partials — the frequency ratios between the drawn lines define an FM operator configuration:

  Traced harmonics:       100Hz  200Hz  300Hz  400Hz  500Hz
  Ratio to fundamental:    1:1    2:1    3:1    4:1    5:1
  FM operator ratios:     C=1   M=2    M=3    M=4    M=5
  Modulation indices:     derived from relative amplitudes of each partial

The system cannot fully auto-generate FM patches from complex drawings — the magnitude-to-index relationship is not straightforwardly invertible for arbitrary spectra. But it analyses the drawing and proposes a starting patch: detected ratios, suggested modulation indices, recommended algorithm. The musician opens Page F and refines from a meaningful starting point rather than a blank operator graph.

Inharmonic sounds are where this is most powerful. A bell's partials are slightly stretched — visible in the spectrogram as non-integer spacing. The analysis tells the musician immediately that a non-integer FM ratio is needed, and approximately what it should be. Metallic textures, physical models, and acoustic percussion all reveal their FM structure in the drawing.
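The ratio analysis itself is simple arithmetic — a sketch reducing traced partial frequencies to candidate operator ratios, with an inharmonicity flag (the 2% tolerance is illustrative, not a spec value):

```python
def propose_ratios(partials_hz: list[float]) -> list[float]:
    """Frequency ratio of each traced partial to the fundamental."""
    f0 = partials_hz[0]
    return [f / f0 for f in partials_hz]

def is_inharmonic(ratios: list[float], tol: float = 0.02) -> bool:
    """True if any ratio strays more than tol from an integer —
    the cue that a non-integer FM operator ratio is needed."""
    return any(abs(r - round(r)) > tol for r in ratios)

harmonic = propose_ratios([100.0, 200.0, 300.0, 400.0, 500.0])
bell = propose_ratios([100.0, 203.0, 312.0, 427.0])   # stretched partials
```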

The Unified Source Creation Pipeline

  Real-world sound / live input / own output
           │
           ▼
  Page D — load as reference (ghost layer, Layer 0)
           │
           ▼
  Draw / trace on Layer 1 — simplified, essential, intentional
           │
           │  Play back drawing during tracing — live ISTFT
           │  A/B against reference — ear judges quality
           │  Sub-layers: harmonics / formants / transients / noise
           │
           ├──── PATH 1 ──────────────────────────────────────────────┐
           │     SEND to Page H (harmonic sliders populated)          │
           │     Edit amplitude, phase, per-harmonic envelope         │
           │     IFFT → Page W (waveform cycle)                       │
           │     → FireStorm additive / wavetable engine              │
           │     → 128 polyphonic voices                              │
           │                                                          │
           ├──── PATH 2 ──────────────────────────────────────────────┤
           │     ISTFT full time extent → PCM buffer                  │
           │     → Sample engine (256 voices)                         │
           │     → Loop points, velocity layers, key zones            │
           │     → Cleaner than original — noise not drawn = absent   │
           │                                                          │
           ├──── PATH 3 ──────────────────────────────────────────────┤
           │     Harmonic ratio analysis → FM patch proposal          │
           │     → Page F (operator graph, suggested ratios + indices)│
           │     → Refine by ear                                      │
           │     → FM engine (128 voices)                             │
           │     → Especially powerful for inharmonic / metallic      │
           │                                                          │
           └──── PATH 4 ──────────────────────────────────────────────┘
                 ISTFT single cycle → Page W wavetable
                 → VA oscillator (wavetable mode)
                 → Nonlinear ladder filter + SVF
                 → BBD chorus · VCA envelope
                 → 128 polyphonic voices
                 → Drawing is raw material — filter/envelope/chorus
                   reshape it dynamically on every note

Path 4 → Analog / VA Engine (Wavetable Oscillator Mode)

The VA engine accepts any wavetable as its oscillator source — including one derived from a spectrogram drawing. This is distinct from the other three paths because the drawing is not the final sound: it is the input to an ongoing analog signal chain. The filter, envelope, chorus, and saturation all act on it dynamically, every note, in real time.

  Drawn / traced wavetable (from Page W via IFFT, or direct from Page D)
          │
          ▼
  VA oscillator — wavetable mode
  (cycles through the waveform at the played pitch, with BLEP anti-aliasing)
          │
          ▼
  Nonlinear ladder filter
  ← cutoff envelope (ADSR) · resonance · keyboard tracking · velocity mod
  ← self-oscillation available at high resonance
          │
          ▼
  SVF — parallel or series with ladder
  ← independent cutoff · character: LP / BP / HP / notch
          │
          ▼
  BBD chorus
  ← rate · depth · stereo spread · bucket-brigade clock rate
          │
          ▼
  VCA + amplifier envelope
          │
          ▼
  128 polyphonic voices

The filter does not know or care where the waveform came from. It processes whatever spectral content the wavetable contains with full analog nonlinearity — the tanh saturation in the ladder stages, the self-oscillation at high resonance, the phase relationships of the BBD delay line. These interactions are not predictable from the drawing alone; they emerge from the physics of the emulated circuit.

What this produces that other paths cannot:

A drawn wavetable through the VA chain is dynamically reshaped with every note. The filter sweeps, envelopes open and close, the chorus introduces time-varying modulation. The source is fixed; the sound is alive. The other three paths (additive, sample, FM) play back the drawn content more or less directly. Path 4 treats it as raw material.

Example combinations:

Drawn source VA treatment Result
Traced violin harmonics Moog ladder filter sweep Violin-Moog hybrid — the filter strips partials as the cutoff closes, just as it would a saw's, but with violin harmonic spacing
Drawn vowel "ah" formants Ladder resonance at formant frequency + slow filter LFO Talking filter — vowel character modulates with the resonance peak sweeping through it
Traced bell (inharmonic partials) BBD chorus at slow rate Metallic shimmer — chorus detune beats against the inharmonic partials, impossible to program conventionally
303 acid engine + traced bass clarinet wavetable Diode ladder + accent RC circuit Acid character acting on clarinet harmonics — the resonant squeal at frequencies the original 303 never had
M-86 hoover envelope shape + traced choir vowel Detuned oscillators + filter snap Hoover-vowel hybrid — the hoover envelope on vowel spectral content
Prime-numbered harmonics only (drawn) Self-oscillating ladder + slow LFO Unpredictable beating between the sparse harmonic series and the filter's own resonant frequency

Wavetable scanning:

If the drawing spans multiple frames in time, the VA engine can scan through them — the oscillator advances through the wavetable as a function of note length, velocity, an LFO, or an envelope. A drawing that transitions from "ah" to "ee" over 2 seconds becomes a vowel-morphing oscillator source, with the filter adding its own movement on top. This is wavetable synthesis in the classical Waldorf/PPG sense, with the wavetable itself authored from a spectrogram drawing rather than pre-baked ROM.
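The scan itself can be sketched as a linear crossfade between adjacent frames of the drawn table, with the scan position (0.0–1.0) driven by the envelope or LFO. Names and the toy frame sizes are illustrative:

```python
def scan_sample(frames: list[list[float]], position: float, n: int) -> float:
    """Sample n of the wavetable at fractional scan position 0.0-1.0,
    crossfading linearly between the two nearest frames."""
    x = position * (len(frames) - 1)
    lo = int(x)
    hi = min(lo + 1, len(frames) - 1)
    frac = x - lo
    return frames[lo][n] * (1.0 - frac) + frames[hi][n] * frac

# Two toy 2-sample frames standing in for the "ah" and "ee" endpoints:
frames = [[0.0, 1.0], [1.0, -1.0]]
# position 0.0 plays the first frame, 1.0 the second, 0.5 the halfway blend
```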

Updating the unified pipeline:

Path 4 sits alongside Path 1 — both use Page W as the wavetable source. The difference is the destination: Path 1 feeds directly to the FireStorm additive engine; Path 4 routes through the full VA signal chain first.

  Draw / trace on Page D
           │
           └──── PATH 4 ──────────────────────────────────────────────
                 ISTFT single cycle → Page W wavetable
                 → VA oscillator (wavetable mode)
                 → Nonlinear ladder filter + SVF
                 → BBD chorus
                 → VCA envelope
                 → 128 polyphonic voices
                 → Dynamic — filter, envelope, chorus reshape the
                   source on every note, in real time

The key distinction across all four paths:

Path Engine Source is... Sound is...
1 — Additive / WT FireStorm wavetable The final timbre Static per cycle, dynamic via modulation matrix
2 — Sample PCM playback The complete sound, full duration As drawn — cleaned, simplified original
3 — FM Operator synthesis A starting patch approximation Dynamically generated from operator interaction
4 — VA Oscillator → analog chain Raw spectral material Continuously reshaped by filter, envelope, chorus

Sounds That Have Never Existed

Because the drawing is unconstrained by physical acoustics, the musician can create sources impossible in the real world:

  • The formant structure of a human vowel with the harmonic spacing of a bell
  • The transient of a snare drum with the sustained body of a cello
  • A voice that speaks in perfect harmonic series (every partial exactly integer)
  • A piano with no inharmonicity — perfectly locked partials
  • A sound with energy only at prime-numbered harmonics
  • A choir vowel that morphs continuously from "ah" to "ee" to "oo" over 4 seconds

These are not available from any physical instrument, any sample library, or any conventional synthesis workflow. They exist only in the frequency domain, and the spectrogram canvas is the place to create them.
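As one concrete illustration, the prime-harmonics source above can be rendered additively in a few lines. Float and the helper names are illustrative, not engine code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch: render one cycle whose energy sits only at
// prime-numbered harmonics (2, 3, 5, 7, ...), with 1/n amplitude rolloff.
static bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

std::vector<float> prime_cycle(std::size_t len, int max_harmonic) {
    const double TWO_PI = 6.283185307179586;
    std::vector<float> out(len, 0.0f);
    for (int h = 2; h <= max_harmonic; h++) {
        if (!is_prime(h)) continue;               // skip non-prime partials
        for (std::size_t i = 0; i < len; i++)
            out[i] += (float)((1.0 / h) * std::sin(TWO_PI * h * i / len));
    }
    return out;
}
```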

Gossip Network — Sharing Drawn Sources

Sources created by spectral drawing are compact:

Source type Typical size
Page H harmonic patch (64 harmonics, full envelopes) ~4KB
Page W wavetable (256 samples, 16-bit) 512 bytes
FM patch (8 operators, full parameters) ~1KB
VA patch (wavetable + filter + envelope + chorus params) ~2KB
PCM sample (4 seconds, 44.1kHz, 16-bit mono) ~353KB

All are shareable over the Ant64 gossip network to any other Ant64 on the same mesh. A musician on one machine traces a sound, creates a patch, shares it — another musician receives it, modifies the drawing, sends a variation back. Spectral tracing becomes a collaborative instrument design workflow across machines.


The Full Analysis-to-Synthesis Pipeline

 ┌──────────────────────────────────────────────────────────────────────┐
 │                        INPUT SOURCES                                 │
 │  Live ADC mic/line · Loaded sample · Resample-own-output             │
 │  Import image as sound (PNG pixel brightness → FFT magnitudes)       │
 └──────────────────────────┬───────────────────────────────────────────┘
                            │
            ┌───────────────▼─────────────────┐
            │   FFT / STFT ENGINE             │
            │   SG2000 big core · C906 @ 1GHz │
            │   Cooley-Tukey · 4096pt         │
            │   Hann / BH / Flat-top / Rect   │
            │   512-sample hop · 87.5% overlap│
            └───┬─────────────────────┬───────┘
                │                     │
   ┌────────────▼──────┐  ┌───────────▼───────────────────────────────┐
   │  PAGE A           │  │  PAGE D                                   │
   │  Phasor view      │  │  Mode 1: Spectrogram (default)            │
   │  Single frame     │  │  Mode 2: 3D Waterfall (Fairlight homage)  │
   │  Phase-aware      │  │  Mode 3: Live Spectrum (bar graph)        │
   │  Rotating circles │  │  Mode 4: Phasor (links Page A)            │
   │                   │  │                                           │
   │                   │  │  DRAWING TOOLS — canvas is live:          │
   │                   │  │  Layer 0: reference ghost (original)      │
   │                   │  │  Layer 1: drawing (plays back via ISTFT)  │
   │                   │  │  Sub-layers: harmonics / formants /       │
   │                   │  │             transients / noise            │
   └────────┬──────────┘  └──────────┬──────────────┬───────────┬────┬──────────┘
            │  SEND                  │ SEND         │ ISTFT     │ FM │ ISTFT
            │              (1 frame) │ to H+A       │ full time │ ↓  │ 1 cycle
            └──────────┬─────────────┘              │           │    │
                       │                            │           │    │
          ┌────────────▼────────────┐               │           │    │
          │  PAGE H                 │               │           │    │
          │  Harmonic Editor        │               │           │    │
          │  Amplitude · Phase      │               │           │    │
          │  Per-harmonic envelopes │               │           │    │
          └────────────┬────────────┘               │           │    │
                       │  IFFT                      │           │    │
          ┌────────────▼────────────┐               │           │    │
          │  PAGE W                 ├───────────────┼───────────┼────┘
          │  Waveform Drawing       │               │           │
          │  Fine sculpt post-IFFT  │               │           │
          └──────┬─────────────┬────┘               │           │
                 │             │                    │           │
                 │ (additive)  │ (VA wavetable)     │           │
          ┌──────▼──────┐  ┌───▼─────────────────┐  │           │
          │  FireStorm  │  │  VA ENGINE          │  │           │
          │  Additive / │  │  Wavetable osc      │  │           │
          │  Wavetable  │  │  Ladder filter      │  │           │
          │  128 voices │  │  SVF · BBD chorus   │  │           │
          └─────────────┘  │  VCA envelope       │  │           │
                           │  128 voices         │  │           │
                           └─────────────────────┘  │           │
                                          ┌─────────▼─────┐   ┌─▼──────────┐
                                          │  Sample Engine│   │  PAGE F    │
                                          │  PCM buffer   │   │  FM Graph  │
                                          │  256 voices   │   │  128 voices│
                                          │  Loop · Vel.  │   │  Inharmon- │
                                          └───────────────┘   │  ic ready  │
                                                              └────────────┘

This is a complete draw → play → refine → synthesise loop. A musician can take any sound from the real world, trace its essential components, discard what is not needed, and export the result to any of the four synthesis engines on the Ant64 — all without leaving the workstation app. Or they can draw from nothing, creating sounds constrained only by imagination rather than physical acoustics. The VA path in particular treats the drawing as raw material rather than a finished sound — handing it to the filter, chorus, and envelope to reshape dynamically on every note.


Application Architecture

FireStorm is the ImGui rendering backend for the workstation app.

The SG2000 big core builds the Dear ImGui draw list each frame — vertices, indices, and draw commands — and DMAs it to FireStorm, which rasterises it into the framebuffer in hardware. The CPU does zero pixel work. FireStorm's audio DSP and 2D rasteriser run on orthogonal FPGA resources (SRAM A for audio, SRAM B for rasteriser) and never contend. Full rendering architecture, rasteriser pipeline, and 3D two-pass LOD system are documented in the Display Architecture reference.

Why this matters for the music app: The big core runs pure C++ application logic — patch management, sequencer state, UI event handling, DBFS I/O — with no OS overhead. The full 1GHz C906 is available for application work. FireStorm handles all rendering in parallel with audio DSP.


FireStorm Internal Hard RISC-V Core

The GoWin 138k FPGA (GW5AST), used in the Ant64 and Ant64C only, contains a built-in hard RISC-V processor core — silicon on the die, not synthesized from LUTs. The GoWin 60k in the Ant64S does not have this core. On the Ant64/Ant64C this is used for internal FireStorm hardware debugging.

What "hard" means: A soft RISC-V core implemented in LUTs would consume thousands of LUT4 resources — fabric that would otherwise be available for audio DSP voices or rasterizer logic. The hard core costs zero LUTs. It is free silicon, already there.

What it does on the Ant64:

The hard RISC-V runs independently of all other FireStorm activity, monitoring internal state and providing a debug window into the FPGA's live operation:

  • Voice status monitoring — which FireStorm voices are active, envelope states, DSP pipeline occupancy
  • Performance counters — audio DSP cycle budgets, rasterizer triangle throughput, memory bus utilisation on SRAM A, SRAM B, DDR3
  • Error detection — underruns, overflows, FIFO stalls, QSPI protocol errors
  • Register inspection — read/write any FireStorm register from inside the FPGA without going via the external QSPI bus
  • Debug stream output — sends status packets to DeMon over a dedicated internal channel; DeMon forwards them to the SG2000 via hi-speed UART for display in AntOS
  • Assertion checking — configurable internal assertions that halt or flag specific voice states, timing violations, or data integrity issues during development
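The debug stream's wire format is not formally specified here; a hypothetical fixed-size status record of the kind DeMon could forward to AntOS might look like:

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// Hypothetical packet layout for the hard-core debug stream. The real
// FireStorm format is not specified in this document; this shows the
// general shape of a fixed-size status record.
struct FsDebugPacket {
    uint32_t magic;          // packet identifier, e.g. "FSDB"
    uint16_t active_voices;  // count of currently sounding voices
    uint16_t flags;          // underrun / overflow / FIFO-stall bits
    uint32_t dsp_cycles;     // audio DSP cycles used last block
    uint32_t tri_count;      // rasterizer triangles last frame
};

// Copies the record into a byte buffer for the DeMon channel.
std::size_t pack_packet(const FsDebugPacket& p, uint8_t* buf) {
    std::memcpy(buf, &p, sizeof p);   // native byte order assumed
    return sizeof p;
}
```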

Why this matters for development:

Debugging an FPGA design traditionally means either adding logic analyser probes externally (slow, invasive) or instantiating an internal debug core like Xilinx's ILA (consumes LUTs). The hard RISC-V gives a third option: a resident debug processor that has full visibility into FireStorm's internals at silicon speed, consuming no programmable fabric, and communicating with the outside world through DeMon's existing JTAG and UART infrastructure.

It also has a role in production use — monitoring audio DSP health in real time and surfacing meaningful diagnostics to AntOS rather than opaque hardware failures.


Future FPGA: GoWin 7 Series (11nm)

By the time the Ant64 reaches production release, the GoWin 7 series at 11nm may be available. If so, the implications are significant:

What 22nm → 11nm means in practice:

  • Roughly 2× the logic density for the same die area — a 138k-equivalent chip could shrink to half the size, or a same-size chip could offer ~250–300k LUTs
  • Lower power consumption at equivalent clock speeds — important for a device with internal speakers and a passive or modest cooling solution
  • Higher maximum clock speeds — the audio DSP pipeline and rasterizer could run faster, giving more margin for complex voice algorithms
  • Potentially lower cost — smaller die at advanced node typically reduces per-chip cost once yields mature

What this means for the Ant64 architecture:

The FireStorm voice engine is designed to be node-agnostic — the fixed-point DSP pipeline, the QSPI/MIPI interfaces, the memory controllers, are all defined functionally. Porting to a GoWin 7 series FPGA would be a re-synthesis and re-timing exercise, not a redesign. The same HDL runs on both.

If a GoWin 7 device with 200k+ LUTs at $26 or less becomes available, the Ant64 could realistically push beyond 128 voices without any architectural changes — simply more resources available for time-multiplexed DSP pipeline slots.

Current position: The design is proceeding on GoWin 138k (22nm, $26/chip confirmed). GoWin 7 series availability will be evaluated at the time. The architecture is designed to benefit from it without depending on it.


AntOS Debug Server / Client — Remote Ant64 Debugging

AntOS running on the little core (700MHz C906) of the SG2000/SG2002 includes a built-in debug server and client. This means one Ant64 can debug another Ant64 over the network — no external probe, no USB cable, no debug pod required.

Ant64 A                              Ant64 B
─────────────────                    ──────────────────
AntOS little core                    AntOS little core
  dbg client                 WiFi      dbg server
  (developer machine)  ◄──────────►  (target machine)
       │                                    │
  AntOS shell                         exposes:
  - inspect memory                    - SG2000 memory
  - read registers                    - DeMon registers
  - set breakpoints                   - FireStorm state
  - stream logs                       - Pulse state
  - upload code                       - AntOS internals
  - control execution                 - log streams

What the debug server exposes on the target Ant64:

  • SG2000 memory inspection — read/write any address in the big core's address space, including FireStorm register map, shared DRAM regions, ImGui draw list buffers, voice parameter tables
  • AntOS internals — scripting VM state, DBFS contents, gossip subsystem, active scripts and their stack traces
  • DeMon passthrough — the dbg server can forward commands to DeMon, which in turn uses its JTAG to access FireStorm and the SG2000 debug ports. Full chip debug from a remote AntOS shell.
  • FireStorm state (via DeMon JTAG) — voice registers, DSP pipeline state, hard RISC-V debug stream (Ant64/Ant64C)
  • Pulse state (via DeMon SPI) — sequencer patterns, MIDI state, jog dial positions, LED buffer
  • Live log streaming — all subsystem logs forwarded in real time to the client's AntOS terminal

Transport:

The debug server uses a standard TCP/IP socket connection — not gossip. Gossip is the right tool for P2P discovery and broadcast (patch sharing, presence, chat), but for a debug session you want a direct TCP connection: reliable, ordered, low latency, compatible with standard tooling, and no relay overhead.

  • Connect by IP address or hostname over WiFi (ESP-C5 on DeMon) or Ethernet (Ant64C)
  • Standard BSD socket API on both client and server — straightforward to implement in AntOS's network layer
  • Gossip can be used for peer discovery — finding which Ant64s on the network have the debug server active — but the actual debug session runs over TCP directly
  • Ant64C's Ethernet port gives a dedicated wired channel independent of WiFi, useful for high-bandwidth log streaming or memory dump operations
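A minimal target-side listener, assuming the POSIX-style BSD socket layer described above (the function name and loopback-only binding are illustrative choices, not AntOS API):

```cpp
#include <cstdint>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

// Minimal sketch of the target-side debug server setup. Returns a
// listening fd, or -1 on error. Port 0 picks an ephemeral port.
int dbg_server_listen(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // bind to a trusted iface
    addr.sin_port = htons(port);
    if (bind(fd, (sockaddr*)&addr, sizeof addr) < 0 || listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;   // caller accept()s one debug session at a time
}
```

Peer locking (by gossip identity) would sit in the accept path, rejecting connections from unknown addresses.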

Security:

The debug server requires explicit activation in AntOS — it is off by default. Once enabled, it can be locked to specific peer addresses (by Ant64 identity from the gossip subsystem) so only a trusted machine can connect.

Practical workflow:

# On target Ant64B — enable debug server
> dbg server start

# On developer Ant64A — connect and inspect
> dbg connect ant64b.local
[connected to Ant64B]
dbg> memory read 0x08000000 64     # read FireStorm voice registers
dbg> log stream firestorm          # live FireStorm audio DSP log
dbg> script traceback              # AntOS script stack on target
dbg> demon jtag firestorm regs     # dump FireStorm registers via DeMon

Why this matters:

A developer with two Ant64s can sit at one machine running the music app and debug the other machine's AntOS scripts, FireStorm voice state, or Pulse sequencer live — without touching the target device, without interrupting its audio output, and without any external hardware. This is a significantly better development experience than most embedded platforms offer even with dedicated debug hardware.

It also means the Ant64 development community can help each other debug remotely — with permission, a community member can connect to another's Ant64 to help diagnose a FireStorm audio issue or an AntOS script problem.

ImGui draw list format — what the big core sends to FireStorm each frame:

// ImDrawData structure (simplified):
ImDrawList {
    ImVector<ImDrawCmd>   CmdBuffer;   // draw commands
    ImVector<ImDrawIdx>   IdxBuffer;   // index buffer (uint16)
    ImVector<ImDrawVert>  VtxBuffer;   // vertex buffer
}

struct ImDrawVert {
    float    x, y;      // position
    float    u, v;      // texture coords (for font atlas)
    uint32_t col;       // RGBA colour
};

struct ImDrawCmd {
    uint32_t ElemCount;     // number of indices for this draw call
    ImTextureID TextureId;  // font atlas or null
    ImVec4 ClipRect;        // scissor rectangle
};

Every ImGui primitive — windows, buttons, waveform curves, piano roll notes, FM routing lines — is ultimately triangles in this format. FireStorm receives the packed vertex/index buffers via DMA and rasterises them in hardware. The full rasteriser pipeline is documented in the Display Architecture reference.


3D Rendering Architecture

3D world rendering on FireStorm uses a two-pass technique — far geometry rendered back-to-front with painter's algorithm (no z-buffer), near geometry with a 15/16-bit z-buffer covering only the near zone [0, D]. This solves the perspective z-buffer precision problem and allows LOD chunk baking for distant geometry. Full documentation including z-buffer format analysis, LOD chunk pipeline, vertex formats, and the complete frame render sequence is in the Display Architecture reference. See also: James Lambert's N64 world renderer series for the underlying technique.


FireStorm has three independent memory buses, each dedicated to a specific workload. No bus is shared between audio DSP and the rasterizer — they run in parallel with zero contention.

The Ant64 ships in three hardware tiers with different memory architectures:

Tier Bus A Bus B Bus C Contention
Premium 36-bit SRAM 36-bit SRAM 32-bit DRAM None — fully dedicated
Standard 32-bit SRAM 32-bit SRAM 32-bit DRAM None — fully dedicated
Starter 8MB embedded PSRAM (shared) — — Arbitrated shared bus

Premium and Standard share the same three-bus no-contention architecture — audio DSP and rasterizer each get a fully dedicated SRAM bus. The Starter runs audio and video over a single shared embedded PSRAM bus, requiring arbitration but remaining fully capable.

     Ant64 / Ant64C                    Ant64S
     ─────────────                     ──────
      FireStorm (GoWin 138k)           FireStorm (GoWin 60k)
           │                                 │
    ┌──────┼──────┐                    ┌─────┴──────┐
    │      │      │                    │  8MB PSRAM │
┌───▼──┐┌──▼──┐┌──▼──┐                 │  (embedded │
│DDR3  ││SRAM ││SRAM │                 │  in FPGA)  │
│32-bit││  A  ││  B  │                 └────────────┘
│1-2GB ││36bit││36bit│                 All audio+video
│      ││fast ││fast │                 shared on single
└──────┘└─────┘└─────┘                 PSRAM bus
samples  audio  raster
         DSP    izer

DRAM Bus — 32-bit, up to 2GB

Standard DDR SDRAM interface. Large capacity at moderate speed — suitable for data that is large but not accessed at audio sample rate or pixel clock rate.

Contents:

  • Long audio samples (piano multisamples, orchestra, full recordings)
  • Wavetable banks (large collections, hundreds of tables)
  • DX7 / FM patch banks, sysex archives
  • DBFS working buffers (patch I/O, sample import/export)
  • MIPI transfer staging buffers (draw lists in transit, audio stream capture)
  • Operating scratch space for the music workstation app
  • Video frames for the VJ tool (Mode 4 clip storage)

At 2GB this is effectively unlimited for any musical purpose. A full multisample piano library (all keys, 8 velocity layers) might be 500MB. A complete DX7 library is a few MB. There is headroom for years of content.

SRAM Bus A — Audio DSP Dedicated

All bits are data. No parity on either tier.

On the Premium Ant64, standard SRAM chips come physically 36 bits wide (32 data + 4 parity). All 36 bits are used as data — the 4 parity bits become 4 extra data bits. On the Standard Ant64, 32-bit SRAM chips are used directly.

Audio precision by tier:

Premium — 36-bit SRAM:

  • S4.31 format (1 sign + 4 integer + 31 fractional bits): full scale ±1, representable range ±16
  • ~186dB dynamic range internally
  • 4 integer bits = 24dB (16×) of headroom above full scale — with modest per-voice gain staging, the 128-voice mix bus cannot clip internally under any musical input
  • FM phase accumulators: sub-cent accuracy across the full keyboard range

Standard — 32-bit SRAM:

  • 1.31 format (1 sign + 31 fractional bits): ±1 full scale
  • Same ~186dB dynamic range — fractional precision is identical
  • No integer headroom bits: the mix bus accumulates at extended width with careful per-voice gain staging to prevent overflow; still far beyond any commercial synthesizer's internal precision
  • In practice indistinguishable from the Premium tier in audio output quality
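The gain-staged accumulation can be sketched as follows. The Q15 per-voice gains and the function shape are illustrative; the point is the extended-width accumulate and the saturating store back to the 32-bit sample format:

```cpp
#include <cstdint>

// Sketch of 1.31 fixed-point mix-bus gain staging on the 32-bit tier:
// accumulate voices at 64-bit width, apply a per-voice gain (Q15 here,
// an illustrative choice), then saturate back to the 32-bit sample format.
int32_t mix_voices_q31(const int32_t* samples, const int16_t* gains_q15,
                       int n_voices)
{
    int64_t acc = 0;
    for (int i = 0; i < n_voices; i++)
        acc += ((int64_t)samples[i] * gains_q15[i]) >> 15;  // Q31 * Q15 -> Q31
    // Saturate to the representable range instead of wrapping
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}
```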

Contents of SRAM A:

  • Active voice sample data (current notes playing — hot samples)
  • BBD chorus delay line (2048 × 36-bit samples)
  • Reverb FDN delay lines (8 lines × up to 4096 samples each)
  • Physical modeling waveguide delay lines
  • Granular grain buffer (current live granular window)
  • FM sine lookup table (1024 × 36-bit, single-cycle access)
  • Filter tanh lookup table (4096 × 36-bit, nonlinear ladder)
  • BLEP correction table (NCO anti-aliasing)
  • Voice envelope/LFO state registers (128 voices × ~32 values)

Every memory access in the audio DSP hot path is a single-cycle SRAM A read or write. No DRAM latency, no bus contention with the rasterizer.

SRAM Bus B — Rasterizer Dedicated

All bits are data. No parity on either tier.

The rasterizer has its own completely independent SRAM bus. The audio DSP never touches this bus and the rasterizer never touches SRAM A.

Framebuffer pixel format by tier:

Premium — 36-bit SRAM → RGB12 (12 bits per channel, 1 pixel per word):

Word[35:24] = Red   (12-bit, 4096 levels)
Word[23:12] = Green (12-bit, 4096 levels)
Word[11: 0] = Blue  (12-bit, 4096 levels)

4096 levels per channel → 68 billion colours internally. HDMI output dithers to 8-bit per channel. Zero banding in smooth gradients. Jog dial colour sweeps and waveform backgrounds look flawless.
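One common way to perform that dither — an illustrative sketch, as the actual FireStorm method is not specified here — is an ordered Bayer bias added before truncation:

```cpp
#include <cstdint>

// Sketch: dither one 12-bit channel value down to the HDMI link's 8 bits.
// A 2x2 ordered (Bayer) threshold spreads the truncation error spatially,
// so smooth gradients dither instead of banding.
uint8_t dither_12_to_8(uint16_t v12, int x, int y) {
    static const uint8_t bayer2[2][2] = { {0, 8}, {12, 4} };  // out of 16
    uint16_t t = v12 + bayer2[y & 1][x & 1];   // position-dependent bias
    uint16_t v8 = t >> 4;                      // drop the 4 LSBs
    return (v8 > 255) ? 255 : (uint8_t)v8;     // clamp after the bias
}
```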

Standard — 32-bit SRAM → RGB888 (1 pixel per word, 8 bits spare):

Word[31:24] = spare (alpha, stencil, or flags)
Word[23:16] = Red   (8-bit, 256 levels)
Word[15: 8] = Green (8-bit, 256 levels)
Word[ 7: 0] = Blue  (8-bit, 256 levels)

Standard 24-bit colour — identical to a PC monitor's native format. The 8 spare bits per pixel can carry per-pixel alpha for UI compositing. Visually excellent; only noticeable vs RGB12 in the subtlest gradients.

Framebuffer sizes — double-buffered:

Resolution Premium RGB12 (36bpp) Standard RGB888 (32bpp)
640×480 2.8 MB 2.5 MB
1280×720 (720p) 8.3 MB 7.4 MB
1920×1080 (1080p) 18.7 MB 16.6 MB

720p double-buffered fits in fast SRAM on both tiers given sufficient capacity. 1080p roughly doubles the requirement and is achievable on both tiers with larger SRAM parts.

Contents of SRAM B (both tiers):

  • Framebuffer A (active display) and Framebuffer B (being rendered)
  • ImGui draw list buffer A and B (double-buffered)
  • Font atlas texture (BRAM preferred; SRAM B fallback)

The rasterizer reads the draw list, writes pixels to the framebuffer, and the HDMI timing generator reads the framebuffer — all on SRAM B, all simultaneously, all without touching SRAM A.

Why Three Buses Matters

The critical insight is that audio and video are completely memory-isolated on both tiers:

Audio DSP Rasterizer
Memory bus SRAM A (36-bit premium / 32-bit std) SRAM B (36-bit premium / 32-bit std)
Access pattern Streaming + random (samples + LUT) Sequential (framebuffer write)
Clock domain Audio clock (derived from 48kHz) Pixel clock (~74MHz for 720p)
Contention None None

No arbitration logic. No bus priority. No stall cycles. Both workloads run at full SRAM bandwidth simultaneously, every clock cycle, on both tiers.

DRAM holds cold data. Hot data for each workload lives in its dedicated SRAM. The result is a memory hierarchy perfectly matched to a real-time audio-visual synthesizer — and it holds whether you're building a Premium or Standard Ant64.

Tier summary:

Feature Ant64C (Creative) Ant64 (Power) Ant64S (Starter)
FPGA GoWin 138k GoWin 138k GoWin 60k
CPU SG2000, 512MB DDR3 SG2000, 512MB DDR3 SG2002, 256MB DDR3
FireStorm cores 4 4 2
FireStorm instruction width 36-bit 36-bit 32-bit
FireStorm registers 128 (2×D0-D31 + 2×A0-A31) 128 64 (2×D0-D15 + 2×A0-A15)
FPGA memory 2GB 1GB 8MB PSRAM (embedded)
Bus A 36-bit SRAM 36-bit SRAM PSRAM (shared)
Bus B 36-bit SRAM 36-bit SRAM —
Bus C 32-bit DDR3 32-bit DDR3 —
Audio precision S4.31 (36-bit) S4.31 (36-bit) 32-bit (managed)
Framebuffer format RGB12 (1px/word) RGB12 (1px/word) 32-bit (RGB888)
Bus contention None None Shared/arbitrated
Supervisors Pulse + DeMon Pulse + DeMon 1× RP2350
DIN MIDI In/Out/Thru ✔ — —
USB MIDI ✔ ✔ ✔
Optical digital audio ✔ (FireStorm)
Ethernet ✔ — —
WiFi 2.4 + 5GHz 2.4 + 5GHz 2.4GHz
Display outputs HDMI / VGA / DisplayPort HDMI / VGA HDMI / VGA

Ant64S (Starter) Memory Architecture — Embedded PSRAM

The Ant64S uses a GoWin 60k FPGA with 8MB of PSRAM embedded inside the FPGA itself. There are no external memory buses on the FPGA side. Audio DSP and rasterizer share this single embedded PSRAM — bus arbitration is required, but the smaller FPGA and 8MB constraint already define the workload scope.

Embedded PSRAM characteristics:

  • Integrated inside the GoWin 60k FPGA package — no external memory chips
  • 8MB total — adequate for the Ant64S's audio and graphics scope
  • Single shared bus: audio DSP and rasterizer time-share access
  • Audio DSP gets absolute priority — rasterizer stalls rather than audio

What 8MB supports on Ant64S:

  • Active voice sample data for currently playing notes
  • Framebuffer (640×480 × 32-bit = 1.2MB per buffer, 2.5MB double-buffered)
  • ImGui draw list buffer (~256KB)
  • BBD chorus and reverb delay lines
  • FM sine LUT, filter tables
  • Patch data for current instrument

Long samples and large wavetable banks are loaded from the SG2002's DDR3 (256MB) into the FPGA's PSRAM as needed, managed by the FireStorm subsystem.

FireStorm instruction width on Ant64S: 32-bit. On Ant64 and Ant64C, FireStorm instructions are 36-bit — which is why those models have 36-bit SRAM buses. On the Ant64S, 32-bit instructions mean 32-bit memory buses are sufficient and no parity bits are repurposed.

FireStorm cores on Ant64S: 2 (vs 4 on Ant64/Ant64C). Registers: 64 total (2× D0–D15 data, 2× A0–A15 address).

Ant64S is not a cut-down synthesiser. The audio engine, synthesis paradigms, and voice architecture are the same. The constraints are memory capacity and FireStorm core count — not synthesis capability. It is best thought of as the same instrument in a smaller, lower-cost FPGA with less RAM, running a subset of the most demanding configurations.

Font atlas — stored in FPGA BRAM (on-chip) as a pre-rendered 1bpp texture. FireStorm samples it in a single clock cycle without touching SRAM B at all.

Performance envelope (rasterizer): At 720p/60fps, FireStorm has ~16.7ms per frame. A complex ImGui music editor frame with 15,000 triangles × 50 cycles @ 100MHz = 7.5ms rasterization. The 1.35MB draw list + framebuffer transfers on SRAM B take ~1ms at fast SRAM speeds. Total: ~8.5ms, leaving 8ms headroom. 1080p/60fps is achievable with a faster FPGA clock.


CPU ↔ FireStorm Interconnect: Two Buses, Two Roles

The SG2002 big core communicates with FireStorm over two complementary buses — a low-latency control plane and a high-bandwidth data plane.


Bus 1 — QSPI in NOR Addressing Mode (control plane)

FireStorm presents itself to the CPU as a NOR flash device over QSPI. The SG2002's existing QSPI controller talks directly to FireStorm using the standard NOR command set — no custom driver needed. From C++ the FPGA address space is just a pointer:

volatile uint32_t* fpga = (volatile uint32_t*)FPGA_QSPI_BASE;

// Write a voice parameter — a single memory write, ~100ns:
fpga[VOICE_REG(voice, PARAM_CUTOFF)] = cutoff_value;

// Signal new draw list ready:
fpga[DRAWLIST_PTR] = (uint32_t)drawlist_addr;
fpga[DRAWLIST_LEN] = (uint32_t)drawlist_bytes;
fpga[DRAWLIST_CMD] = RENDER_GO;

// Read DSP status:
uint32_t env = fpga[DSP_STATUS(voice, ENV1_LEVEL)];

QSPI NOR address map (FireStorm internal):

FPGA_BASE + 0x00000  Voice register bank (128 voices × 64 regs = 32KB)
FPGA_BASE + 0x08000  Global synth registers (tempo, FX, routing = 4KB)
FPGA_BASE + 0x09000  DSP status readback, read-only (4KB)
FPGA_BASE + 0x0A000  ImGui draw list buffer A (256KB)
FPGA_BASE + 0x4A000  ImGui draw list buffer B (256KB, double-buffered)
FPGA_BASE + 0x8A000  Rasterizer control registers (256 bytes)
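The VOICE_REG / DSP_STATUS helpers assumed by the earlier C++ snippet follow directly from this map. Written here as constexpr functions; the per-voice field counts are illustrative (64 registers per voice from the map, and 8 status words per voice so that 128 voices fit the 4KB read-only window):

```cpp
#include <cstdint>

// Register-index helpers implied by the QSPI NOR map above. The fpga
// pointer is uint32_t*, so these are word indices, not byte offsets.
constexpr uint32_t VOICE_BANK_BASE  = 0x00000 / 4;  // 128 voices × 64 regs = 32KB
constexpr uint32_t STATUS_BASE      = 0x09000 / 4;  // 4KB read-only window
constexpr uint32_t REGS_PER_VOICE   = 64;
constexpr uint32_t STATUS_PER_VOICE = 8;            // 128 × 8 × 4B = 4KB

constexpr uint32_t VOICE_REG(uint32_t voice, uint32_t param) {
    return VOICE_BANK_BASE + voice * REGS_PER_VOICE + param;
}
constexpr uint32_t DSP_STATUS(uint32_t voice, uint32_t field) {
    return STATUS_BASE + voice * STATUS_PER_VOICE + field;
}
```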

QSPI bandwidth: ~40 MB/s. Adequate for register writes and small transfers. A single voice parameter write is ~100ns. Real-time jog dial events at 100/sec consume ~10µs/sec of bus time — negligible.

QSPI is the control plane — low latency, always-on, zero overhead per write.


Bus 2 — Dual 4-Lane MIPI D-PHY (data plane, future)

A dual 4-lane + clock MIPI D-PHY link between the SG2002 and FireStorm, running simultaneously in both directions. The SG2002's existing MIPI CSI/DSI PHYs are repurposed as a general-purpose high-speed serial link.

SG2002                              FireStorm
  MIPI TX (4 lane + clk) ─────────► MIPI D-PHY RX   ~500 MB/s CPU→FPGA
  MIPI RX (4 lane + clk) ◄───────── MIPI D-PHY TX   ~500 MB/s FPGA→CPU

Full duplex ~1 GB/s aggregate — 12× QSPI in each direction simultaneously.

CPU → FPGA (MIPI link A, ~500 MB/s):

Transfer Size Time
ImGui draw list (typical) 150 KB 0.3 ms
ImGui draw list (complex) 300 KB 0.6 ms
Full 1080p framebuffer 4 MB 8 ms
Wavetable bank (128 tables) 2 MB 4 ms
1s stereo 48kHz sample 192 KB 0.4 ms
DX7 sysex patch bank 4 KB instant

Large sample banks load mid-performance in milliseconds. 1080p UI becomes viable — the 4MB framebuffer transfers in 8ms, leaving 8ms for rasterization within a 16.7ms frame budget.
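The times in the table follow from simple division by the ~500 MB/s link rate; a tiny helper (decimal megabytes assumed):

```cpp
#include <cstdint>

// Transfer-time arithmetic behind the table above: bytes / (500 MB/s),
// returned in whole microseconds.
constexpr uint64_t MIPI_BYTES_PER_SEC = 500ull * 1000 * 1000;

constexpr uint64_t transfer_us(uint64_t bytes) {
    return bytes * 1000000ull / MIPI_BYTES_PER_SEC;
}
```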

FPGA → CPU (MIPI link B, ~500 MB/s):

Stream Purpose
Mixed stereo audio (384 KB/s) Lossless resample-own-output — no ADC, no analog roundtrip
Raw waveform, all 128 voices Oscilloscope / FFT display at full resolution
DSP state bulk (4 KB/frame) All envelope/LFO values for rich UI animation
ADC input stream Live sample capture at full rate
Framebuffer readback Screenshot, video stream, network output

The audio stream back to CPU is the key feature here — FireStorm streams the mixed stereo output to the CPU at 384 KB/s (trivial on a 500 MB/s link). The resample function captures this directly to DRAM with zero quality loss: no DAC, no ADC, no analog stage, no quantisation error. It is a perfect digital copy.

Two buses, two roles — summary:

QSPI MIPI
Bandwidth ~40 MB/s ~500 MB/s each direction
Duplex Half Full simultaneous
Purpose Control plane Data plane
Use Register writes, render trigger Draw lists, samples, audio stream
Latency ~100ns per write Buffer latency (ms range)
Status Current Future

Pulse ↔ FireStorm interconnect:

Pulse (RP2350) connects to FireStorm via two independent interfaces:

  • QSPI bus — control plane, same NOR addressing mode as the SG2002 big core. Pulse reads/writes FireStorm registers directly: voice parameters, audio engine control, sequencer state. Low latency, memory-mapped from Pulse's perspective.

  • 4-line PIO fast transfer bus — four GPIO lines running custom RP2350 PIO state machines, forming a dedicated high-bandwidth data path between Pulse and FireStorm entirely separate from QSPI.

    At RP2350 PIO speeds with 4 parallel lines (4 bits per 150MHz system clock ≈ 75 MB/s raw), a tight custom protocol sustains streaming rates well beyond QSPI's practical throughput. This is the data plane between Pulse and FireStorm, complementing QSPI's role as the control plane.

    What the 4-line bus carries:

    • Live sample capture — ADC audio from FireStorm's codec streamed into FireStorm SRAM at full sample rate without CPU involvement (Pulse triggers and controls the capture; FireStorm's codec is the source)
    • Bulk voice parameter updates when many voices change simultaneously (e.g. patch change affecting all 128 voices at once)
    • Sequencer event bursts — dense note-on/off blocks at the start of a bar where QSPI's per-transaction overhead would add up
    • Wavetable or sample block transfers from Pulse's PSRAM → FireStorm SRAM

    The PIO protocol is fully defined in Pulse firmware — the RP2350 PIO state machine handles the bit-level framing. FireStorm implements the receiver logic in FPGA fabric. Both sides are deterministic and clock-accurate.

Pulse responsibilities:

  • MIDI receive/transmit (DIN on Ant64C, USB MIDI all models)
  • Jog dial quadrature decode (encoder events → SG2002 big core via mailbox)
  • RGB LED state (WS2812B chain, DMA-driven)
  • Joypad / controller input (USB host)
  • Audio sequencer (pattern playback, timing, voice triggering via QSPI + PIO to FireStorm)
  • Retro speech synthesis — SAM-style formant speech engine (see below)

Note: audio codec (WM8958/WM8960) is on FireStorm directly, not Pulse. Pulse has no audio I/O path — it is purely a control and sequencing processor. Speech audio is rendered by Pulse and streamed to FireStorm for output.


Trigger / CV Inputs — 4× 3.5mm TS Jacks

Four trigger/CV input ports sit on the jog dial controller (the Rotary satellite MCU, an ATtiny-series AVR connected to Pulse via I2C). The jog dial controller runs at 5V natively, making 5V-compatible inputs straightforward.

Connector: 3.5mm TS mono jack — industry standard. Compatible with:

  • Eurorack gate/trigger (0–10V)
  • Teenage Engineering sync cables
  • Roland/Korg sync
  • Drum machine trigger outputs
  • Footswitches
  • Any CV gate source made in the last 40 years

Per-port pinout:

Pin                     Signal
Tip                     Input signal
Sleeve                  GND
(separate pin on PCB)   5V out — powers passive sensors, LEDs, contact closures

Protection circuit per input — NPN transistor buffer:

External signal (0–12V supported)
    │
 [R1 10kΩ]
    │
  base ──► NPN (2N3904 / BC547, ~3p each)
             │            │
          emitter      collector ──┬── [R2 10kΩ] ── 5V
             │                     ├── [100nF cap to GND]
            GND                    └── ATtiny GPIO
  • R1 limits base current — safe at 0–12V input (Eurorack 10V gate: 1mA, well within ratings)
  • Transistor saturates on high input → GPIO pulled LOW (inverted in firmware)
  • MCU GPIO only ever sees 0 or 5V — fully isolated from external voltage
  • Below ~0.6V threshold → transistor off → GPIO HIGH — excellent noise immunity
  • 100nF cap provides hardware debounce for footswitches
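
The base-current claim above follows from Ohm's law across R1. A quick sketch, assuming Vbe ≈ 0.7V once the transistor conducts:

```c
/* Base current through R1 for a given input voltage, in mA.
 * Assumes Vbe ≈ 0.7 V; below that the transistor is off. */
static double base_current_ma(double vin_volts, double r1_ohms)
{
    double i = (vin_volts - 0.7) / r1_ohms * 1000.0;
    return i > 0.0 ? i : 0.0;
}
```

A 10V Eurorack gate gives ≈0.93 mA, matching the ~1mA figure in the text; a 12V input gives ≈1.13 mA, still far inside the 2N3904's ratings.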

Reading these inputs with the ATtiny's ADC (rather than as plain digital pins) adds:

  • Configurable threshold detection in firmware
  • Rough velocity from trigger slope (crossing speed)
  • Input 3 or 4 can optionally act as 0–5V CV input for pitch/filter/parameter modulation, not just gates — configurable per port in AntOS
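
The slope-derived velocity idea can be sketched as a two-threshold edge timer: start a tick counter when the signal crosses a low threshold, stop at a high threshold, and map fewer ticks (a faster rise) to a higher velocity. The threshold values and the 1/ticks mapping below are illustrative, not shipped firmware:

```c
#include <stdint.h>

#define THRESH_LO 200   /* 10-bit ADC counts — illustrative values */
#define THRESH_HI 820

typedef struct {
    uint16_t prev;      /* previous ADC reading */
    uint32_t ticks;     /* samples spent between the two thresholds */
} trig_state_t;

/* Returns a velocity 1..127 when a rising edge completes, else 0. */
static uint8_t trig_update(trig_state_t *t, uint16_t adc)
{
    uint8_t vel = 0;
    if (t->prev < THRESH_LO && adc >= THRESH_LO)
        t->ticks = 1;                        /* edge started */
    else if (t->ticks > 0 && adc < THRESH_HI)
        t->ticks++;                          /* still rising */
    if (t->ticks > 0 && adc >= THRESH_HI) {
        uint32_t v = 127 / t->ticks;         /* fast rise -> high velocity */
        vel = (uint8_t)(v < 1 ? 1 : v);
        t->ticks = 0;
    }
    t->prev = adc;
    return vel;
}
```

Called once per ADC sample, this costs a handful of cycles — well within the ATtiny's budget even while servicing all four ports.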

BOM per port: 2 resistors + 1 transistor + 1 capacitor + 1 jack ≈ £0.15
Total for 4 ports: ~£0.60 additional component cost

Firmware routing (configurable in AntOS per port):

Function               Description
Sidechain trigger      Duck specified voices on trigger — kick pumping effect
Secondary sidechain    Independent sidechain for snare gating, etc.
External clock / sync  Replace internal BPM — lock to drum machine, Eurorack
Note trigger           Trigger a specific voice or sequencer step
Punch-in effect        Activate KO II-style effect on trigger
Scene commit           Trigger a scene advance — hands-free live performance
Record arm             Footswitch to arm/disarm sampling without touching the machine
CV gate                0–5V modulation → any FireStorm parameter via mod matrix
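
To give the sidechain rows some flavour in the FPGA's no-floating-point world: a trigger-driven duck is just a gain that snaps down on trigger and relaxes back toward unity. A minimal one-pole sketch in Q15 fixed point — the constants and the one-pole release are illustrative, not the actual FireStorm DSP:

```c
#include <stdint.h>

#define Q15_ONE   32767
#define DUCK_GAIN  8192    /* gain while ducked, ~0.25 in Q15 */
#define REL_COEFF    327   /* ~0.01 in Q15: release approach rate */

static int32_t g_duck_gain = Q15_ONE;

/* Trigger-driven ducker: gain snaps down on trigger, then relaxes
 * back toward unity with a one-pole ramp. Integer arithmetic only. */
static int16_t duck_process(int16_t in, int trig)
{
    if (trig)
        g_duck_gain = DUCK_GAIN;                       /* instant attack */
    /* g += (1 - g) * rel  — one-pole approach to unity, all Q15 */
    g_duck_gain += (int32_t)(Q15_ONE - g_duck_gain) * REL_COEFF >> 15;
    return (int16_t)((int32_t)in * g_duck_gain >> 15);
}
```

The same structure maps directly onto a multiply-accumulate in FPGA fabric, one instance per duckable voice group.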

Default suggested mapping:

  • Input 1 → Sidechain A (kick)
  • Input 2 → Sidechain B (snare / clap)
  • Input 3 → External clock sync
  • Input 4 → Footswitch / record arm

All remappable in AntOS.


Pulse Speech Synthesizer — SAM Style

Pulse runs a retro formant speech synthesizer in the tradition of S.A.M. (Software Automatic Mouth, 1982) — the iconic robotic voice of the Commodore 64, Atari 8-bit, and Apple II era. The RP2350 at 300MHz has enormous headroom for this alongside its sequencing duties; the original SAM ran on a 1MHz 6502.

What SAM-style synthesis is:

SAM converts text to speech using two stages:

Text input
    │
    ▼
Reciter (text-to-phoneme)
    Text → phoneme string using English pronunciation rules
    e.g. "HELLO" → /HH EH L OW/
    │
    ▼
Phoneme-to-speech (formant synthesis)
    Each phoneme → three formant frequencies (F1, F2, F3)
                 → three formant amplitudes (A1, A2, A3)
                 → pitch contour
                 → voiced/unvoiced classification
    Combined via additive synthesis → audio samples
    │
    ▼
PCM audio buffer → streamed to FireStorm via PIO fast transfer bus

The formant synthesis produces the characteristic 1980s robotic voice — not because the algorithm is simple, but because it is computing speech with very limited resolution (8-bit samples, low sample rate) in a deliberately constrained way. On the Ant64, running at full RP2350 speed with the output going to the WM8958 codec, the same algorithm produces a much cleaner sound — authentically SAM-like but with modern fidelity.

Implementation on Pulse:

The open-source C port of SAM (s-macke/SAM, ~39KB) runs directly on the RP2350. No OS needed — a bare-metal C function call. The pipeline:

  1. AntOS or the music app sends a text string to Pulse via SPI/UART
  2. Pulse's Reciter converts text → phoneme string
  3. Phoneme engine renders PCM samples into a buffer in Pulse's PSRAM
  4. Completed buffer streamed to FireStorm via the 4-line PIO fast transfer bus
  5. FireStorm mixes speech into the audio output alongside music voices
  6. Result: speech comes out of the internal speakers, phono out, HDMI, or optical

Parameters (matching original SAM):

Parameter  Range  Effect
Speed      0–255  Talking rate (72 = normal SAM)
Pitch      0–255  Voice fundamental frequency (64 = normal)
Throat     0–255  Formant F1 shaping — voice quality (128 = normal)
Mouth      0–255  Formant F2 shaping — vowel colour (128 = normal)

Preset voices (matching original SAM manual):

Voice              Speed  Pitch  Throat  Mouth
SAM (default)         72     64     128    128
Elf                   72     64     110    160
Little Robot          92     60     190    190
Stuffy Guy            82     72     110    105
Little Old Lady       82     32     145    145
Extra-Terrestrial    100     64     150    200
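
The preset table maps directly onto a four-byte parameter record; a sketch, with values copied from the table above (struct and field names are illustrative):

```c
#include <stdint.h>

typedef struct {
    const char *name;
    uint8_t speed, pitch, throat, mouth;   /* each 0–255 */
} sam_voice_t;

static const sam_voice_t sam_presets[] = {
    { "SAM",                72, 64, 128, 128 },
    { "Elf",                72, 64, 110, 160 },
    { "Little Robot",       92, 60, 190, 190 },
    { "Stuffy Guy",         82, 72, 110, 105 },
    { "Little Old Lady",    82, 32, 145, 145 },
    { "Extra-Terrestrial", 100, 64, 150, 200 },
};
```

Jog dials mapped to the four fields give real-time voice morphing between any of these presets.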

Musical uses beyond basic speech:

SAM can sing — phoneme durations and pitch can be driven by the sequencer, turning the speech engine into a vocal synthesizer:

  • Vocoder-style leads: sequence phoneme events from the pattern sequencer, drive pitch from MIDI note data, get a singing robot voice
  • Arpeggio speech: rapid phoneme triggering at musical intervals
  • Daft Punk style: pitch-shifted formant speech over a chord progression
  • System announcements: AntOS can use Pulse speech for boot messages, error alerts, patch names spoken aloud — authentic 8-bit computer character
  • Live performance: jog dials control Speed/Pitch/Throat/Mouth in real time while the sequencer drives the phoneme stream

Extension beyond SAM:

The RP2350 has far more processing power than a 6502. The speech engine can be extended beyond the original SAM algorithm:

  • Higher sample rate output (original SAM: ~7kHz; Pulse can do 48kHz)
  • Smoother formant interpolation between phonemes
  • Additional phoneme sets (non-English languages)
  • Pitch envelope per phoneme (more natural intonation)
  • The SG2002's TPU could assist with neural TTS for more natural speech, while Pulse handles the retro formant mode independently

IPC — Big core ↔ Pulse:

  • Jog dial events: Pulse sends encoder events to SG2002 big core via mailbox
  • MIDI events: Pulse forwards to big core for sequencer and UI feedback
  • LED state: big core writes to Pulse LED buffer (RGB for jog dials)

IPC — Big core ↔ Little core (AntOS):

  • Hardware mailbox registers: memory-mapped, interrupt-driven, non-blocking
  • App → AntOS: DBFS load/save requests, sample load, MIDI route config
  • AntOS → App: patch data ready, jog dial events, MIDI events, system events
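
One plausible shape for the mailbox traffic listed above is a small fixed-size message that an interrupt handler can copy atomically. Everything here — field names, sizes, type codes — is an illustrative assumption; the real register map is not defined in this document:

```c
#include <stdint.h>

/* Illustrative message types for big-core ↔ Pulse / AntOS IPC */
enum msg_type {
    MSG_ENCODER = 1,   /* jog dial delta + push state */
    MSG_MIDI    = 2,   /* raw MIDI bytes              */
    MSG_LED     = 3,   /* RGB state for one jog dial  */
    MSG_SYSTEM  = 4,   /* DBFS requests, patch events */
};

/* 16-byte fixed-size frame: naturally aligned with no padding, so it
 * can be moved with four 32-bit mailbox register accesses. */
typedef struct {
    uint8_t  type;         /* enum msg_type */
    uint8_t  src;          /* 0 = big core, 1 = little core, 2 = Pulse */
    uint16_t len;          /* payload bytes actually used */
    uint8_t  payload[12];
} mbox_msg_t;
```

A fixed frame keeps the receiver non-blocking: no length negotiation, no partial reads, just a FIFO of 16-byte slots.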

Consistent UI conventions (all pages):

  • Dial 1/2 = cursor X/Y navigation without mouse
  • Dial 3–8 = context-sensitive parameter adjustment for current selection
  • Any dial push = confirm / select / toggle
  • Mouse primary for freehand drawing (waveform, harmonic sliders)
  • MIDI keyboard = note input in Page R piano roll

Retro aesthetic option: Green-on-black phosphor rendering mode, selectable per page. Implemented as an ImGui style override — background #001400, foreground #00FF41, custom draw list colours. Not a gimmick: the Fairlight's visual identity is inseparable from its cultural impact, and having it as an option is a genuine homage. The music app ships with both a modern colour theme and the classic phosphor theme.


Competitive Comparison and Cost Analysis

What Compares to the Ant64?

The honest answer is: nothing currently in production does. The Ant64 occupies a category that has no living occupant. To understand why, it helps to look at what each existing product does well, then see how many of them you would need to buy to match the Ant64's combined capability.


The Closest Competitors — and Their Gaps

Waldorf Kyra (FPGA, 128 voices, VA+WT)

The closest thing in voice count and FPGA architecture. Discontinued in August 2023, it was the only other FPGA-based synthesizer shipping 128+ voices at production scale.

What it had: 128 voices, 8-part multitimbral, excellent VA sound, solid build. What it lacked: No FM, no sampling, no computer, no video, no speech synthesis, no DIN MIDI (desktop module only), poor software support that ultimately killed it, no open platform. Retail was ~€1,600–2,000 and is now gone.

Waldorf Quantum MK2 (hybrid, 16 voices)

The previous gold standard for synthesis depth per voice. Discontinued April 2025.

What it had: Excellent hybrid architecture, granular sampling, wavetable, live audio input, polyphonic aftertouch. What it lacked: Only 16 voices, no FM, no video, no computer, no DIN MIDI, closed platform, ~€4,800.

Sequential Prophet X (hybrid, 16 voices, ~€3,800)

Best-in-class sample+synthesis combination, excellent analog filters.

What it had: 150GB sample library, analog filters per voice, solid build quality. What it lacked: No live audio input (USB only for samples), 16 voices, no FM, no video, no computer, no speech synthesis, no open platform.

Access Virus TI2 (digital VA, 80 voices, ~€3,000)

The long-reigning high-polyphony VA synthesizer. Discontinued but still used widely.

What it had: 80 voices, excellent VA engine, solid FM-like features. What it lacked: No sampling, no video, no computer, no speech synthesis, closed.

Fairlight CMI (historical, 1979–1985)

The only machine that historically combined sampling, synthesis, sequencing, and visual editing in a single system with a comparable philosophy.

What it had: Everything the Ant64 draws inspiration from — visual waveform drawing, graphical sequencer (Page R), sampling, synthesis, the light pen interface. What it cost: £20,000–50,000 at launch (equivalent to £150,000–400,000 today). What it lacked vs Ant64: FM synthesis, 128+ voice polyphony, MIDI (early models), optical audio, retro speech synth, RGB control surface, open hackable platform.


The Stack You Would Need to Match Ant64C

Ant64C capability               Closest equivalent               Price (2026)
128-voice VA+WT synth           Waldorf Kyra (discontinued)      ~€1,600 used
Full FM synthesis (6-op+)       Yamaha Montage M / MODX+         ~€2,000+
Live sampling + S&S engine      Sequential Prophet X             ~€3,800
Granular synthesis              Waldorf Quantum MK2 (discont.)   ~€3,500 used
DIN MIDI In/Out/Thru            iConnectivity mioXL hub          ~€400
Multi-track hardware sequencer  Squarp Pyramid MK3               ~€700
303 acid engine                 Roland TB-03                     ~€350
Video synthesizer / visualiser  Critter & Guitari EYESY          ~€500
Fairlight-style visual editor   Nothing available                —
Retro speech synthesizer        Nothing available                —
Home computer (AntOS, coding)   Raspberry Pi 400                 ~€70
Internal stereo speakers        External monitors                ~€100
Optical audio out               External DAC                     ~€100
Total                           10+ separate devices             ~€13,000+

And that stack still doesn't give you: all synthesis engines layerable per voice, the RGB jog dial performance surface, DX7 sysex import, Fairlight-style waveform drawing, the open FPGA bitstream, the FireStorm EE custom execution engine, or the unified AntOS operating system tying everything together.


The Waldorf Kyra Is the Most Direct Point of Comparison

The Kyra was ~€1,600 at clearance for a 128-voice FPGA synth with no computer, no FM, no sampling, no video, no speech synthesis, and no active development. It was discontinued because it never fulfilled its potential — the FPGA was capable of far more than Waldorf ever shipped.

The Ant64 targets everything the Kyra had plus everything it never got plus things nobody has attempted in a single product. It is also open — the FPGA bitstream, the FireStorm EE ISA, AntOS — none of it is locked down. The community can add synthesis engines. The Kyra community could only petition Waldorf for updates and wait.


Why No One Has Done This Before

The combination that makes the Ant64 unique has been technically possible for several years but has not been commercially attempted because:

  1. Commercial synth companies fear cannibalising existing product lines — a truly open, hackable synthesizer that does everything undermines upsell.

  2. FPGA expertise is rare in music hardware companies — most synth makers use DSP chips or CPUs, not FPGAs. GoWin making competitive 138k FPGAs at accessible prices is relatively recent.

  3. The home computer + synthesizer combination was last attempted in the 1980s (C64, Atari ST, Amiga) and the paradigm was abandoned as PCs and DAWs took over. Nobody has attempted to revive it with modern silicon until now.

  4. Video + audio integration in a single instrument has simply never been productised for musicians. LZX makes video synths for visual artists. Synth companies make audio synths for musicians. Nobody built the bridge.

Important: The Ant64 family of home computers is at an early design/prototype stage; everything you see here is subject to change.