FireStorm — FPGA

FireStorm is not a fixed "sound chip" or a simple video controller — it is a reconfigurable, multi-paradigm execution fabric at the heart of the Ant64, hosting the main CPU and the entire chipset on a single die.


Overview

FireStorm is built around the GoWin GW5AST-LV138Pg484A FPGA. It hosts:

  • The FireStorm Execution Engine — a 64-bit RV64GC-compatible RISC-V core with seven custom extensions, synthesised in FPGA fabric (a soft core, ~380 MHz)
  • A hard RISC-V coprocessor — an 800 MHz 32-bit RISC-V core embedded in the FPGA silicon itself. Not user-programmable as an application target — it runs system / debug firmware that monitors FireStorm internals at silicon speed. See Hard RISC-V Coprocessor below.
  • The graphics chipset — layered display pipeline, blitter, copper, rasteriser, raycaster
  • The audio engine — 128+ voice synthesiser + mixer, with codec driven directly from the FPGA
  • All bus arbiters, DMA controllers, and memory interfaces

The whole CPU-plus-chipset is one bitstream. Different bitstreams (personality cartridges) can change everything about the system — instruction set additions, chipset layouts, or even targeting a completely different ISA.

  • Combined with a WM8958 or WM8960 codec for audio I/O + optical digital audio out
  • Runs various hardware emulators — Amiga, Atari STE, SNES, NES, C64, VIC-20, PacMan, Spectrum, MSX2 / Yamaha VDP — implemented either as FireStorm-CPU code or as alternative bitstreams

Memory

  • DDR3 SDRAM — up to 2GB (model-dependent), DDR3-800; 64-bit wide (6.4 GB/s peak) — capacity (512MB / 1GB / 2GB) is the model differentiator, not bus width — direct FPGA access via dedicated controller
    • Ant64S: 512MB · Ant64: 1GB · Ant64C: 2GB
  • Wide-mode SRAM — up to 1M × 36-bit (~4.5MB) of pipeline-synchronous fast SRAM
  • On-chip BSRAM — ~765KB of Block SRAM (6,120 Kbits) plus ~135KB of Shadow SRAM (1,080 Kbits), ~900KB in total on the GW5AST-138
  • FPGA boot flash — config bitstream + read-only assets (system fonts, FM tables, wavetables, palette LUTs); shares its bus (clock + lower four IO lines) with the PSRAM below, distinguished by a separate CS
  • FPGA PSRAM — 16MB octal-DDR PSRAM (3.3 V, 133 MHz, ~266 MB/s; ISSI IS66WVO16M8DBLL-133 or AP Memory APS12808L-3OBMx), on the same boot bus with its own CS and its own upper IO + DQS lines. Primary role: ROM / cartridge / disk images for simulated systems — on a bus independent of DDR3 and SRAM, so personality ROM traffic runs in parallel without contending for either; also general scratch in native mode. See memory for the full treatment.
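
The bandwidth and capacity figures in this list follow from simple arithmetic on the quoted bus widths and clock rates. The short Python sketch below reproduces them as a sanity check. The helper name `peak_bytes_per_s` is made up for illustration, and the results are theoretical peaks, not measured throughput.

```python
# Sanity-check the memory figures quoted in the list above.
# Every input is a number from that list; nothing here is measured.

def peak_bytes_per_s(transfers_per_s: float, bus_bits: int) -> float:
    """Peak bandwidth = transfers per second x bytes moved per transfer."""
    return transfers_per_s * bus_bits / 8

# DDR3-800 means 800 million transfers per second, here on a 64-bit bus.
ddr3_peak = peak_bytes_per_s(800e6, 64)

# Octal-DDR PSRAM: 133 MHz clock, two transfers per clock, 8 data lines.
psram_peak = peak_bytes_per_s(2 * 133e6, 8)

# Wide-mode SRAM: 1M words of 36 bits each, expressed in MB (2**20 bytes).
sram_mb = (1 << 20) * 36 / 8 / (1 << 20)

# On-chip memory is quoted in Kbits; divide by 8 for KB.
block_sram_kb = 6120 / 8
shadow_sram_kb = 1080 / 8

if __name__ == "__main__":
    print(f"DDR3 peak   : {ddr3_peak / 1e9:.1f} GB/s")
    print(f"PSRAM peak  : {psram_peak / 1e6:.0f} MB/s")
    print(f"Wide SRAM   : {sram_mb:.1f} MB")
    print(f"Block SRAM  : {block_sram_kb:.0f} KB")
    print(f"Shadow SRAM : {shadow_sram_kb:.0f} KB")
```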

Display Inputs (MIPI RX)

The FPGA exposes two independent MIPI D-PHY receivers, each 2-lane plus differential clock, matching the ESP32-P4's MIPI DSI TX (2 lanes × 1.5 Gbps = 3 Gbps per supervisor):

  • MIPI RX #0 — fed by DeMon's ESP32-P4 (AntOS UI overlay layer)
  • MIPI RX #1 — fed by Pulse's ESP32-P4 (sequencer / mixer UI overlay layer)

For the return direction, FireStorm has no MIPI TX hardcell — the GW5AST silicon hardens MIPI on RX only. FireStorm instead drives LVDS TX pins into an external LVDS-to-MIPI CSI bridge (Lontium LT9211), one per supervisor, feeding a CSI input on each ESP32-P4 — an asynchronous ~375 MB/s readback path (video capture, telemetry, audio, bulk data) to DeMon and Pulse.

Each input becomes a compositable layer within the chipset's display pipeline, alongside FireStorm's own native layers (tilemaps, sprites, command list, vectors, etc.). The same physical link is also used for bulk data transfer from each supervisor into FireStorm's DDR3 when the framebuffer feed has idle bandwidth.
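
As a rough illustration of how much of each MIPI link is left for bulk transfer, the sketch below subtracts a framebuffer feed from the raw link rate. The 2 × 1.5 Gbps figure comes from the text. The overlay resolution, pixel depth, and frame rate are purely hypothetical, and protocol overhead and blanking are ignored, so treat the result as an upper bound.

```python
LANES = 2
LANE_RATE_BPS = 1.5e9  # per-lane rate quoted in the text

def link_bytes_per_s(lanes: int = LANES, lane_rate_bps: float = LANE_RATE_BPS) -> float:
    """Raw link capacity in bytes per second (no protocol overhead)."""
    return lanes * lane_rate_bps / 8

def overlay_bytes_per_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Bandwidth consumed by an uncompressed framebuffer feed."""
    return width * height * bytes_per_pixel * fps

def idle_bytes_per_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Capacity left over for bulk data once the overlay feed is accounted for."""
    return max(0.0, link_bytes_per_s() - overlay_bytes_per_s(width, height, bytes_per_pixel, fps))

if __name__ == "__main__":
    # Hypothetical overlay: 1280x720, 3 bytes per pixel, 60 frames per second.
    spare = idle_bytes_per_s(1280, 720, 3, 60)
    print(f"link : {link_bytes_per_s() / 1e6:.0f} MB/s")
    print(f"spare: {spare / 1e6:.1f} MB/s")
```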

Security

The FPGA supports both 128-bit AES encrypted and unencrypted bitstream images. The key is stored in a locked, write-once secure location — once set it cannot be read or overwritten. This allows encrypted personality cartridges whilst still permitting unencrypted community images.

The AES key will be set before distribution — if not set first, anyone could lock it with their own key.
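
The write-once behaviour described above can be modelled in a few lines. This is a toy model only: the class `WriteOnceKeySlot` and its methods are invented for illustration and are not a GoWin API, and "decryption" is reduced to a key comparison. It shows why an unprogrammed slot is a risk: whoever programs it first owns it.

```python
class WriteOnceKeySlot:
    """Toy model of a write-once, unreadable AES-128 key slot (hypothetical)."""

    def __init__(self):
        self._key = None

    @property
    def locked(self) -> bool:
        return self._key is not None

    def program(self, key: bytes) -> None:
        """Set the key. Succeeds only once."""
        if self.locked:
            raise PermissionError("key slot is already locked")
        if len(key) != 16:
            raise ValueError("an AES-128 key is 16 bytes")
        self._key = bytes(key)

    def read(self) -> bytes:
        """The key can never be read back."""
        raise PermissionError("key slot is not readable")

    def accepts(self, image_key) -> bool:
        """image_key=None models an unencrypted image, which always loads.
        An encrypted image loads only if it was built with the stored key."""
        if image_key is None:
            return True
        return self.locked and self._key == image_key

if __name__ == "__main__":
    slot = WriteOnceKeySlot()
    slot.program(b"0123456789abcdef")
    print(slot.accepts(None), slot.accepts(b"0123456789abcdef"))
```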

DeMon programs FireStorm via JTAG at boot. While the FPGA is configuring, DeMon's ESP32-P4 MIPI display feed drives the screen with a boot UI or recovery interface — so the user sees something on screen well before FireStorm itself is running.


FPGA Comparison

Feature | eZX Spectrum | Ant64S / Ant64 / Ant64C | Next (KS1) | Next (KS2) | Next (KS3)
Device | GW5AT-60 | GW5AST-138 | XC6LX9 | XC7A15T | XC7A35T
LUT4 | 59,904 | 138,240 | 9,152 | 16,640 | 33,380
Shadow SRAM (Kbits) | 468 | 1,080 | 90 | 200 | 400
Block SRAM (Kbits) | 2,070 | 6,120 | 576 | 900 | 1,800
Number of BSRAM | 118 | 340 | ?? | 25 | 50
DSP multipliers | 118 | 298 | 16 | 45 | 90
Hosts FireStorm EE |  | ✓ (narrow + wide) |  |  | 
Wide-mode SRAM bus |  |  |  |  | 

The eZX Spectrum is a constrained sibling on the smaller GW5AT-60 — no FireStorm soft core and no wide-mode SRAM bank — so every Ant64 model (all now GW5AST-138) is a strict superset of it, which is what lets the eZX run as an Ant64 personality.
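
The resource side of the "strict superset" claim can be checked directly against the comparison figures. The sketch below does only that: it compares raw FPGA resource counts. The dictionary keys and the function `fits_within` are illustrative names, and having more of every resource is a necessary condition for hosting the eZX design, not proof that a given bitstream will fit or meet timing.

```python
# Resource counts copied from the comparison table above.
EZX = {
    "lut4": 59_904,
    "shadow_sram_kbits": 468,
    "block_sram_kbits": 2_070,
    "bsram_blocks": 118,
    "dsp_multipliers": 118,
}
ANT64 = {
    "lut4": 138_240,
    "shadow_sram_kbits": 1_080,
    "block_sram_kbits": 6_120,
    "bsram_blocks": 340,
    "dsp_multipliers": 298,
}

def fits_within(guest: dict, host: dict) -> bool:
    """True if the host has at least as much of every resource as the guest."""
    return all(host[name] >= amount for name, amount in guest.items())

def headroom(guest: dict, host: dict) -> dict:
    """Host-to-guest ratio per resource, rounded to two decimals."""
    return {name: round(host[name] / amount, 2) for name, amount in guest.items()}

if __name__ == "__main__":
    print("eZX fits in Ant64:", fits_within(EZX, ANT64))
    print("headroom:", headroom(EZX, ANT64))
```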


Hard RISC-V Coprocessor

Alongside the FireStorm EE (a soft core in FPGA fabric), the GW5AST-138 silicon contains a hard RISC-V CPU baked into the die itself.

Specifications

  • 800 MHz clock speed
  • 32-bit RISC-V architecture, base ISA + standard extensions
  • Hardware multiplier and divider (RV-M)
  • 32 KB I-cache + 32 KB D-cache, LRU replacement
  • 64 KB ILM (Instruction Local Memory) + 64 KB DLM (Data Local Memory) — tightly-coupled single-cycle SRAM embedded with the core (separate from the caches); deterministic-latency tier for firmware code and hot working data
  • Instruction address space: up to 256 MB
  • Data address space: up to 2 GB
  • XIP (eXecute In Place) — the core can fetch instructions directly from an attached SPI flash through its memory-mapped interface, with the 32 KB I-cache absorbing access latency. This makes running a real operating-system kernel (e.g. FreeRTOS) on the hard core practical: code lives in cheap, large SPI flash; ILM holds the hot path (ISRs, critical sections); DLM holds task control blocks and working state.
  • Hardware Performance Monitors (HPM) — cycle, retired-instruction, cache-miss, branch-miss, and configurable event counters
  • Memory Management Unit (MMU) — virtual-memory support (address translation, paging) for the firmware running on the core
  • Physical Memory Protection (PMP) entries — region-based access control independent of the MMU
  • Hardware stack protection extension — overflow detection in silicon
  • Zero LUT cost — it occupies no programmable fabric; it's free silicon, already there

This is a fully-featured application-class RV32 CPU on silicon, not a stripped-down microcontroller. With XIP-from-flash for code, ILM for hot paths, DLM for state, and an MMU/PMP/HPM/stack-protection feature set, it's perfectly capable of hosting a real RTOS. The capability budget significantly exceeds what its current debug-and-monitor role on the Ant64 requires — there is substantial headroom for future expansion of the firmware's responsibilities.

Memory Hierarchy

The hard core has four distinct memory tiers:

  1. Tightly-coupled local memory (ILM/DLM) — 64 KB instruction + 64 KB data, single-cycle access, deterministic latency. Used for the resident firmware's hot path (interrupt handlers, debug-packet assembly, ring buffers, performance-counter accumulators). Private to the hard core; does not appear in FireStorm's address space.

  2. Cached local memory access — 32 KB I-cache + 32 KB D-cache (LRU) backing a wider address space (256 MB instruction / 2 GB data) for firmware too large for ILM. Code stored in FPGA flash or chipset BSRAM can be fetched and cached; large diagnostic datasets can live in that wider memory and be reached through the caches.

  3. XIP from SPI flash — the core can execute directly from an attached SPI flash through its memory-mapped interface, with the I-cache absorbing fetch latency. This is the realistic home for a full firmware image: a complete FreeRTOS kernel + tasks fits easily, leaving ILM as a fast-path cache for ISRs and critical sections. SPI flash is cheap, dense, and a perfectly comfortable medium for firmware that doesn't need DDR3-scale storage.

  4. Shared DDR3 (configurable) — the GW5AST-138 supports a Shared DDR3: Logic R/W Mode option that allows both the hard core ("MCU side") and FPGA fabric ("FireStorm side") to access DDR3 simultaneously through an arbiter. The address bus is 32-bit, giving the hard core access to the full DDR3 capacity (512 MB on Ant64S, 1 GB on Ant64, 2 GB on Ant64C). When this mode is enabled on the Ant64, the hard core can:

    • Inspect FireStorm's DDR3-resident state in place — sample buffers, voice tables, frame-coherent debug snapshots — without round-tripping through QSPI
    • Share large diagnostic regions with FireStorm (FireStorm writes a buffer, the hard core reads and analyses it; or vice versa)
    • Hold large debug captures or telemetry buffers that wouldn't fit in DLM

    The mode is disabled by default in the GoWin IP configurator and must be explicitly enabled in the bitstream configuration. When disabled, DDR3 is owned exclusively by the FPGA fabric (i.e. FireStorm), and the hard core is restricted to ILM/DLM + I-cache/D-cache + XIP flash. Shared mode adds bus contention with FireStorm's DDR3 accesses, but the hard core's bandwidth requirement is light (occasional reads for inspection), so the impact in practice is minimal.
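
One way to picture the split between ILM and XIP flash is as a placement pass over the firmware's code sections. The sketch below is a hypothetical policy, not the actual linker arrangement: hot sections are packed into the 64 KB ILM, smallest first, and everything else runs from XIP flash through the I-cache. The section names and sizes are invented.

```python
ILM_BYTES = 64 * 1024  # ILM size quoted in the specifications

def place_code(sections, ilm_bytes: int = ILM_BYTES) -> dict:
    """sections: iterable of (name, size_bytes, is_hot).
    Returns {name: "ILM" or "XIP flash"}."""
    placement = {}
    free = ilm_bytes
    # Visit hot sections first, smallest first, so as many as possible fit.
    for name, size, is_hot in sorted(sections, key=lambda s: (not s[2], s[1])):
        if is_hot and size <= free:
            placement[name] = "ILM"
            free -= size
        else:
            placement[name] = "XIP flash"
    return placement

if __name__ == "__main__":
    # Hypothetical firmware image.
    firmware = [
        ("isr_vectors", 8 * 1024, True),
        ("packet_assembly", 32 * 1024, True),
        ("rtos_kernel", 40 * 1024, True),
        ("tasks", 200 * 1024, False),
    ]
    for name, tier in place_code(firmware).items():
        print(f"{name:16s} -> {tier}")
```

With these example sizes the two smaller hot sections land in ILM and the 40 KB kernel spills to XIP flash, which is the situation the I-cache is there to absorb.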

Role

In the current Ant64 design the hard core is not a user-facing CPU. AntOS doesn't run there and user code can't target it directly. Its job is to run resident system / debug firmware with privileged visibility into FireStorm's internals — typically structured as a FreeRTOS image stored in SPI flash and executed via XIP, with multiple tasks coexisting under the RTOS scheduler:

  • Voice-state monitoring task — which FireStorm voices are active, envelope states, DSP pipeline occupancy
  • Performance counter task — audio DSP cycle budgets, rasteriser triangle throughput, memory bus utilisation across DDR3 / SRAM / BSRAM (the hard core's own HPM CSRs add another set of silicon-level counters for its firmware's own use)
  • Error / assertion task — underruns, overflows, FIFO stalls, QSPI protocol errors, audio-engine assertion failures, configurable internal assertions that halt or flag specific voice states / timing violations / data-integrity issues
  • Register inspection task — read or write any FireStorm chipset register from inside the FPGA, with no QSPI traffic, at silicon speed; responds to inspection requests from DeMon
  • Debug stream task — assembles status packets and sends them to DeMon over a dedicated internal channel; DeMon forwards them to AntOS for display
  • (Optional) Shared-DDR3 inspector task — when Shared DDR3 mode is enabled, walks DDR3-resident state on request without disturbing FireStorm

The MMU, PMP, and stack-protection features support running this firmware safely — tasks can be isolated, FreeRTOS's own kernel can be protected, and runaway code paths get caught at silicon level during development.
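
The text does not define the format of the status packets the debug stream task sends to DeMon. Purely as an illustration of what "assembling status packets" could look like, the sketch below frames a payload with a small header and an additive checksum. The layout, the magic value, and the task ID are all assumptions.

```python
import struct

# Hypothetical header: magic (u8), task id (u8), sequence (u16), payload length (u16).
HEADER = struct.Struct("<BBHH")
MAGIC = 0xA6  # arbitrary marker chosen for this example

def build_packet(task_id: int, seq: int, payload: bytes) -> bytes:
    """Frame a payload as header + payload + 8-bit additive checksum."""
    body = HEADER.pack(MAGIC, task_id, seq & 0xFFFF, len(payload)) + payload
    return body + bytes([sum(body) & 0xFF])

def parse_packet(packet: bytes):
    """Validate a packet and return (task_id, seq, payload)."""
    if len(packet) < HEADER.size + 1:
        raise ValueError("packet too short")
    magic, task_id, seq, length = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if len(packet) != HEADER.size + length + 1:
        raise ValueError("length mismatch")
    if sum(packet[:-1]) & 0xFF != packet[-1]:
        raise ValueError("bad checksum")
    return task_id, seq, packet[HEADER.size:-1]

if __name__ == "__main__":
    pkt = build_packet(task_id=2, seq=7, payload=b"\x01\x02")
    print(pkt.hex(), parse_packet(pkt))
```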

Why It Matters

Debugging an FPGA design traditionally means either external logic analyser probes (slow, invasive, limited bandwidth) or instantiating an internal debug core in fabric (consumes thousands of LUTs that would otherwise serve audio DSP, rasteriser logic, or voice count). The hard core is a third option: a resident application-class RV32 processor with full silicon-speed visibility into FireStorm's internals, consuming no programmable fabric, communicating through DeMon's existing JTAG and the AntOS debug server.

In production use it remains active — surfacing meaningful diagnostics to AntOS (which voice underflowed, which DMA stalled, what cycle budget the current frame consumed) instead of opaque hardware failures.

Possible Future Direction — Software CPU Personalities

The hard core's current role is debug and monitoring only. But the 800 MHz RV32 core is a genuine application-class processor, and one plausible future use is hosting software-based CPU personalities.

Today a personality cartridge recreates a guest CPU (Z80, 68000, 6502, …) as a fabric softcore synthesised into the FPGA. An alternative, for guests where a software emulator is preferable to a fabric softcore, would be to run the emulator on the hard core rather than in fabric: the 800 MHz clock, 32 KB I/D caches, and tightly-coupled ILM/DLM are well suited to an interpreter or dynarec hot loop, and Shared DDR3 mode gives the emulator direct access to guest memory alongside FireStorm's chipset. That would let a software CPU personality run at silicon speed while leaving the FPGA fabric free for the chipset rather than spending LUTs on a softcore.

This is a future possibility, not a current capability — the hard core ships running debug firmware, and nothing about software CPU personalities is implemented yet. It's noted here because the hardware headroom exists: the same silicon that watches FireStorm during development could later carry emulated guest CPUs in production.

Distinction from FireStorm

Aspect | FireStorm EE | Hard RISC-V Coprocessor
Where | FPGA fabric (soft core) | FPGA silicon (hard core)
Architecture | RV64GC + 7 custom extensions | RV32 base + M; MMU + PMP + HPM; hardware stack protection
Clock | ~380 MHz | 800 MHz
L1 caches | none (Xcrisp prefetch buffers, BSRAM scratchpads) | 32 KB I-cache + 32 KB D-cache (LRU)
Local memory | n/a | 64 KB ILM + 64 KB DLM tightly-coupled
Address space | 64-bit (DDR3 + SRAM + BSRAM + chipset MMIO) | 256 MB instruction / 2 GB data (32-bit); can reach DDR3 via Shared DDR3 mode
Programmable by user | yes — applications, games, demos | no — system / debug firmware only
Visible to AntOS | yes, as the application CPU | only as a diagnostic data source
Configurable | bitstream-defined; personality cartridges can replace it | fixed (silicon)

The two are architecturally independent. FireStorm runs the user-facing software stack at its own clock and has its own memory subsystem; the hard core runs its debug firmware at its faster silicon clock and watches the chipset from the inside. There is no shared instruction stream, no shared register file, and no scheduler coupling between them. The one point of physical sharing is DDR3, when Shared DDR3: Logic R/W Mode is enabled — both can read/write DDR3 concurrently through the arbiter, which is what makes the hard core's diagnostic role on the Ant64 substantially more capable. Even there, the coupling is just bus arbitration; there's no software-visible interaction unless the firmware on each side cooperates by convention (e.g. a shared debug-buffer region).
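
A shared debug-buffer region of the kind mentioned above is commonly organised as a single-producer, single-consumer ring. The sketch below models one over a plain `bytearray`. The layout (two 32-bit indices followed by data) is an assumption, not a documented Ant64 convention, and real hardware would also need cache maintenance and memory barriers, which this model omits.

```python
import struct

class DebugRing:
    """Byte ring over a shared region. Hypothetical layout:
    [head: u32][tail: u32][data ...]. Only the producer writes head and
    only the consumer writes tail, so the two sides need no lock."""

    HDR = struct.Struct("<II")

    def __init__(self, region: bytearray):
        self.region = region
        self.capacity = len(region) - self.HDR.size

    def push(self, data: bytes) -> bool:
        """Producer side. Returns False if the data does not fit."""
        head, tail = self.HDR.unpack_from(self.region, 0)
        used = (head - tail) % self.capacity
        if len(data) > self.capacity - 1 - used:  # one slot stays empty
            return False
        for byte in data:
            self.region[self.HDR.size + head] = byte
            head = (head + 1) % self.capacity
        struct.pack_into("<I", self.region, 0, head)  # publish after the data
        return True

    def pop(self, max_bytes: int) -> bytes:
        """Consumer side. Returns up to max_bytes of pending data."""
        head, tail = self.HDR.unpack_from(self.region, 0)
        out = bytearray()
        while tail != head and len(out) < max_bytes:
            out.append(self.region[self.HDR.size + tail])
            tail = (tail + 1) % self.capacity
        struct.pack_into("<I", self.region, 4, tail)
        return bytes(out)

if __name__ == "__main__":
    shared = bytearray(8 + 64)  # stands in for a DDR3 region
    producer, consumer = DebugRing(shared), DebugRing(shared)
    producer.push(b"voice 12 underrun")
    print(consumer.pop(64))
```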

Connections to the Rest of the System

The FPGA has several external interfaces to the supporting chips:

Interface | Connected to | Purpose
JTAG | DeMon (ESP32-P4) | FPGA bitstream programming, flash access, FireStorm debug
QSPI (register window) | DeMon, Pulse, optional accelerator | Each gets its own FRAM-mapped register window
MIPI RX #0 (2-lane + clock) | DeMon ESP32-P4 MIPI TX | AntOS UI display feed + bulk-data path
MIPI RX #1 (2-lane + clock) | Pulse ESP32-P4 MIPI TX | Sequencer UI feed + bulk audio data
LVDS TX → DeMon (via LT9211 bridge) | DeMon ESP32-P4 MIPI CSI | FireStorm → DeMon return: capture, telemetry, audio, bulk data
LVDS TX → Pulse (via LT9211 bridge) | Pulse ESP32-P4 MIPI CSI | FireStorm → Pulse return: mixer readback, telemetry, bulk data
FPGA expansion bus | Pi Zero 2W (optional) | Accelerator data path
GPIO / status | DeMon | Reconfig, READY, DONE, reset, interrupt
Audio I2S | WM8958 / WM8960 codec | Audio output
Video out | HDMI / VGA / DP transmitters | Display

JTAG

Signal Lines

Signal | Direction | Description
TCK | In (from DeMon) | Clock
TMS | In | Mode select
TDI | In | Data in
TDO | Out | Data out

Status Pins

  • READY
  • CONFIG
  • DONE

DeMon (ESP32-P4) drives JTAG, monitors status pins, and can re-configure the FPGA on demand. The same JTAG chain also gives DeMon access to FPGA flash for asset updates.

On the Tang Primer 25K prototype board, the debug test points are the JTAG lines.
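
For readers unfamiliar with how TMS steers a JTAG port, the sketch below models the standard IEEE 1149.1 TAP state machine and finds the TMS sequence needed to reach a given state. It is generic JTAG behaviour, not anything specific to the GoWin part or to DeMon's firmware, and the function names are illustrative.

```python
from collections import deque

# state -> (next state if TMS=0, next state if TMS=1), per IEEE 1149.1.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}

def walk(state: str, tms_bits) -> str:
    """Apply one TMS value per TCK rising edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state

def tms_path(start: str, goal: str) -> list:
    """Shortest TMS sequence from start to goal (breadth-first search)."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for tms in (0, 1):
            nxt = TAP_NEXT[state][tms]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [tms]))
    raise ValueError("unreachable state")

if __name__ == "__main__":
    print(tms_path("Test-Logic-Reset", "Shift-DR"))
    print(tms_path("Test-Logic-Reset", "Shift-IR"))
```

A useful property of this state machine is that holding TMS high for five clocks returns the TAP to Test-Logic-Reset from any state, which is how a host can resynchronise without a dedicated reset line.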


QSPI Register Windows (FRAM-style)

The main supporting chips — DeMon, Pulse, and the optional accelerator (Pi Zero 2W) — interface to FireStorm via a FRAM (Fake QSPI RAM) register window. Hosts map this directly into their memory space:

  • All chips get an aliased view into FireStorm's chipset register file
  • Each FRAM16 interface leaves 12MB spare per chip in the memory map
  • Independent windows — each chip can have a different mapping and different access permissions

This is the same model as classic memory-mapped peripherals: each external chip writes to "FireStorm address X" and the chipset register at address X updates. The programming model is identical to POKE on a C64 or peek/poke on the Amiga.
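
The peek/poke model with per-host windows can be sketched as follows. This is a behavioural toy, not the real register map: the class names, addresses, window sizes, and permissions are all invented. It shows two hosts seeing the same register through differently mapped windows, one of them read-only.

```python
class ChipsetRegisters:
    """Stand-in for FireStorm's chipset register file (sparse, 32-bit values)."""

    def __init__(self):
        self._regs = {}

    def read(self, addr: int) -> int:
        return self._regs.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self._regs[addr] = value & 0xFFFFFFFF

class RegisterWindow:
    """One host's view: its own base mapping, size, and access permission."""

    def __init__(self, chipset, base: int, size: int, writable: bool = True):
        self.chipset = chipset
        self.base = base
        self.size = size
        self.writable = writable

    def _translate(self, offset: int) -> int:
        if not 0 <= offset < self.size:
            raise IndexError("offset outside this window")
        return self.base + offset

    def peek(self, offset: int) -> int:
        return self.chipset.read(self._translate(offset))

    def poke(self, offset: int, value: int) -> None:
        if not self.writable:
            raise PermissionError("window is read-only")
        self.chipset.write(self._translate(offset), value)

if __name__ == "__main__":
    chipset = ChipsetRegisters()
    # Hypothetical mappings: DeMon sees everything, Pulse a read-only slice.
    demon = RegisterWindow(chipset, base=0x0000, size=0x1000)
    pulse = RegisterWindow(chipset, base=0x0800, size=0x0100, writable=False)
    demon.poke(0x0810, 0x3F)
    print(hex(pulse.peek(0x10)))
```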

Both DeMon and Pulse — now both ESP32-P4-based with similar internal architecture — drive a MIPI feed into the FPGA in addition to their QSPI register windows. The two MIPI feeds become composited display layers within the FireStorm chipset (one per supervisor), giving each its own real-time UI surface alongside the main application output.


Audio Codec

The WM8958 or WM8960 codec is wired to an I2S output, with I2C controlled by Pulse. The DAC drives:

  • Headphone / microphone connection
  • 2× internal speakers
  • Audio sampler connections
  • Phono line out

WM8958 — I/O Routing

Input | Source | Output | Destination
L1 (in1) | Headphone microphone (mono) | LHP (2nd DAC) | Headphone L — 5.3mW
R1 (in1) |  | RHP (2nd DAC) | Headphone R
L2 (in1) | LSAM — Sampler left | LSPK (1st DAC) | L internal speaker — 2W
R2 (in1) | RSAM — Sampler right | RSPK (1st DAC) | R internal speaker
L3 (in2) | (unused — reclaimed) | L line out 1 | 
R3 (in2) | (unused — reclaimed) | R line out 1 | 
L4 (in2) |  | L line out 2 | 
R4 (in2) |  | R line out 2 | 

AIF1 is connected directly to FireStorm.

WM8960 — I/O Routing

Input | Source | Output | Destination
L1 | Headphone microphone (mono) | LHP | Headphone socket L
R1 |  | RHP | Headphone socket R
L2 | LSAM — sampler socket left | LSPK | Left internal speaker
R2 | RSAM — sampler socket right | RSPK | Right internal speaker

The L3/R3 inputs are unused in the routing above and are free for other uses (e.g., Pi Zero 2W audio if the accelerator generates audio).


Prototype 1 — Tang Primer 25K

Prototype 1 features: HDMI out · 64MB 5ns SDRAM · 8MB QSPI NOR Flash · ESP32-P4 MIPI interface · I2S raw audio from Pulse · VSync out · HSync out · Pixel clock out · Pulse Sequencer SPI master · I2C system master (all chips).

(Pinouts unchanged from previous prototype board; see hardware schematics for details.)


Special Pins

Pin | Default | Description
IOB01A | RECONFIG_N | Global reset for GowinCONFIG logic — active low
IOB33B | EMCCLK | Optional external clock input for master mode configuration. Ignored in JTAG/slave modes.
IOB37A | READY | High = device can be programmed; Low = device cannot be programmed
IOB37B | MCS_N/CSO_B | MSPI mode: enable signal, active-low
IOB52A | MODE1 | Set to 0 via resistor — Master mode, FPGA reads config from external Flash via SPI
IOB54A | MI2 | MSPI X4 mode: parallel data bit 2 — connects to DQ2/WP#/IO2 of external Flash
IOB54B | MI3 | MSPI X4 mode: parallel data bit 3 — connects to DQ3/HOLD#/IO3 of external Flash
IOB58A | MISO | MSPI X1: serial data input. X2/X4: parallel data bit 1 — connects to DQ1/SO/IO1
IOB58B | MOSI | MSPI: serial instruction/address output. X2/X4: parallel data bit 0 — connects to DQ0/SI/IO0
IOB62A | CCLK | Slave mode: clock input from external source. Master mode: clock output
IOB62B | MODE0 | Set to 1 via resistor — Master mode, FPGA reads config from external Flash via SPI
IOB64A | DONE | Input: low = delay activation until high. Output: high = config complete; low = config failed or incomplete
IOR01A | TCK | JTAG: serial clock input
IOR01B | TDI | JTAG: serial data input
IOR03A | TMS | JTAG: serial mode input
IOR03B | TDO | JTAG: serial data output
IOT29A | PUDC_B | Active-low weak pull-up selection during configuration. Low = all GPIOs weak pull-up; High = all GPIOs high impedance. Must not be left floating.
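
The READY and DONE descriptions above are enough to sketch how a supervisor might interpret the pins while configuring the FPGA. The code below is an illustration only: `describe` and `wait_for_done` are hypothetical helpers, the pin reader is passed in as a callable because the real GPIO access is not described here, and the timeout values are arbitrary.

```python
import time

def describe(ready: bool, done: bool) -> str:
    """Map the READY and DONE levels to a human-readable state."""
    if done:
        return "configuration complete"
    if ready:
        return "not configured, can be programmed"
    return "not configured, cannot be programmed yet"

def wait_for_done(read_done, timeout_s: float = 1.0, poll_s: float = 0.001) -> bool:
    """Poll DONE until it goes high or the timeout expires.
    read_done is any zero-argument callable returning the pin level."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_done():
            return True
        time.sleep(poll_s)
    return False

if __name__ == "__main__":
    # Simulated pin that goes high on the third sample.
    samples = iter([False, False, True])
    print(describe(ready=True, done=False))
    print("DONE seen:", wait_for_done(lambda: next(samples)))
```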

Important: The Ant64 family of home computers is at an early design/prototype stage; everything you see here is subject to change.