FireStorm

FireStorm is the CPU and chipset of Ant64 in a single FPGA. The "main CPU" and "custom chip" are the same silicon — the part of Ant64 where graphics, audio, and software are not just rendered, but performed.
CPU and Chipset in One
Most modern systems split work between a general-purpose CPU and a separate GPU or chipset, connected over a bus (PCIe, AXI, AHB). Every interaction across that bus costs latency and predictability. The Ant64 puts the CPU and the chipset on the same FPGA, sharing the same memory ports, the same scratchpad BSRAM, and the same clock domain.
This means:
- The CPU writes a sprite attribute and the next scanline reflects it.
- The CPU starts a blit and the blitter responds in the same cycle, with completion visible in a register before the next instruction.
- The CPU programs the Copper and the Copper modifies palette mid-scanline.
- The CPU is part of the timing graph alongside the audio engine and the display pipeline.
There is no driver. There is no kernel mode-switch. There is no "GPU command buffer". The CPU just uses the chipset, the way a 68000 used Paula and Denise.
FireStorm Execution Engine
The CPU at the heart of FireStorm is a 64-bit RISC-V core — fully RV64GC compatible so existing RISC-V code runs unmodified — plus a stack of seven custom extensions designed for the Ant64's specific workloads:
- Xcrisp — memory primitives (auto-inc, indexed addressing, memory-fused arithmetic, block memory, B-tree primitives, PIC family, compare-mem-branch)
- Xstack — hardware BSRAM stacks for U/S/M privilege
- Xcond — predicated R-type instructions
- Xlate — per-register memory translators (endian swap, bit-reverse, etc.)
- Xctx — hardware context switching (32 contexts)
- Xmath — games / audio / DSP math (fused MAC, saturating, transcendentals, BAM trig, vector bundles, 2D math, quaternion, distance heuristics)
- Xwide — 64-register file and wider immediates in wide mode
The microarchitecture is shallow out-of-order: instructions issue in order, and a register scoreboard lets them complete out of order. The core dual-issues both RVC pairs and 32-bit pairs, uses DSP-block-backed multipliers, and runs at ~380 MHz on the GoWin GW5AST.
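To make "in-order issue, out-of-order completion" concrete, here is a toy Python model of a register scoreboard. It is only an illustration: the `Instr` format, the latencies, and the single-issue simplification are assumptions made for this sketch, not a description of FireStorm's actual pipeline (which dual-issues).

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: int        # destination register number
    srcs: tuple      # source register numbers
    latency: int     # cycles from issue to result (assumed values)

def simulate(program):
    """Issue strictly in program order; let results complete out of order.

    Returns a list of (name, issue_cycle, complete_cycle).
    """
    busy_until = {}   # scoreboard: register -> cycle its pending write lands
    timeline = []
    cycle = 0
    for ins in program:
        # Stall while any source (RAW) or the destination (WAW) is still pending.
        hazards = [busy_until.get(r, 0) for r in (*ins.srcs, ins.dest)]
        cycle = max(cycle, *hazards)
        done = cycle + ins.latency
        busy_until[ins.dest] = done
        timeline.append((ins.name, cycle, done))
        cycle += 1    # one issue slot per cycle in this simplified model
    return timeline

program = [
    Instr("mul x5, x1, x2", dest=5, srcs=(1, 2), latency=4),
    Instr("add x6, x3, x4", dest=6, srcs=(3, 4), latency=1),
    Instr("add x7, x5, x6", dest=7, srcs=(5, 6), latency=1),
]
timeline = simulate(program)
for name, issued, done in timeline:
    print(f"{name:16} issue={issued} complete={done}")
```

In this model the independent `add x6` issues after the multiply but completes before it, while `add x7` waits until the multiply's result is available.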
The Real-Time Chipset
Around the CPU, on the same FPGA, the FireStorm chipset runs in lockstep with the video beam:
- Hardware sprites, layers, palettes, and raster effects managed directly in hardware
- Blitter primitives — hardware sprites, tilemaps, and more driven by Copper or by the FireStorm EE
- Objects can be placed on screen instantly and modified while a frame is being drawn
- Audio synthesis shares the same precise timing domain — sound and visuals stay perfectly synchronised
- Because it operates on scanlines rather than pixels, the system scales naturally — trade resolution for object count and effects as needed
- Dedicated hardware resources remain always available for overlays, cursors, and diagnostics
- Colony Connection: a high-speed connection between Ant64 computers and peripherals. See the Colony Connection page for details.
The Copper
The Copper is a beam-synchronised list processor — the Amiga's defining trick, brought back. It executes a list of operations tied to scanline position:
- Modify any chipset register at a specific scanline
- Modify palettes mid-scanline
- Inject commands into the command-list rendering layer
- Trigger FireStorm EE interrupts at exact beam positions
Combined with FireStorm's deterministic timing, this lets you do tricks that were impossible on the original Amiga: mix the CPU into the raster loop. The Copper can wake a paused FireStorm context (via Xctx HALT/RESUME) at a specific scanline, run a few hundred cycles of game code, and yield back before the next scanline.
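A Copper list can be thought of as a sorted sequence of "wait for this beam position, then write this register" steps. The sketch below models that idea in Python. The register names, the (line, x) encoding, and the `run_frame` helper are all hypothetical and do not reflect the real Copper instruction format.

```python
from typing import NamedTuple

class CopperOp(NamedTuple):
    line: int      # scanline to wait for
    x: int         # horizontal position within that line
    reg: str       # hypothetical register name
    value: int

def run_frame(copper_list, regs, lines, width):
    """Walk the beam over one frame, applying each op when the beam reaches it.

    Returns, per scanline, the list of (x, reg, value) writes that landed there.
    """
    ops = sorted(copper_list, key=lambda op: (op.line, op.x))
    applied = {line: [] for line in range(lines)}
    i = 0
    for line in range(lines):
        for x in range(width):
            # Apply every op whose wait position the beam has now reached.
            while i < len(ops) and (ops[i].line, ops[i].x) <= (line, x):
                op = ops[i]
                regs[op.reg] = op.value
                applied[line].append((x, op.reg, op.value))
                i += 1
    return applied

regs = {"PALETTE_0": 0x000000, "SCROLL_X": 0}
copper_list = [
    CopperOp(line=100, x=0,   reg="SCROLL_X",  value=16),
    CopperOp(line=100, x=160, reg="PALETTE_0", value=0x0000FF),  # mid-scanline change
    CopperOp(line=50,  x=0,   reg="PALETTE_0", value=0xFF0000),
]
applied = run_frame(copper_list, regs, lines=240, width=320)
```

Here the palette entry changes once at the start of line 50 and again halfway across line 100, which is the kind of raster effect the list above describes.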
System Frame-Sync Output
FireStorm's display compositor generates a system-wide vertical-sync signal as a dedicated hardware output — a single 3.3 V TTL line that pulses once per display frame, at the moment the last visible scanline has been drawn. The pulse means exactly: the previous frame is done; the compositor is ready to accept the next frame's input. It is the system's frame-rate heartbeat.
Logically the same event drives the analog VGA / SCART vertical-sync pin (see RGB Out), so the entire machine — internal supervisors, external displays, anyone watching the pin on a logic analyser — sees frame edges at the same instant.
FireStorm chipset draws frame N
        │
        ▼
── vsync pulse (last visible line of frame N drawn) ──► to DeMon, Pulse, VGA / SCART, …
        │
        ▼
DeMon and Pulse push frame N+1 over MIPI (video + audio chunk)
        │
        ▼
FireStorm composites frame N+1 (MIPI layers + chipset layers)
        │
        ▼
── vsync pulse for frame N+1 ──► (repeats at the display refresh rate)
What the signal coordinates:
- DeMon and Pulse wait for vsync, then push the next frame of their MIPI display layer — UI + sprite + tilemap layers, composited locally in PSRAM by the 2D-PPA — and the next frame's audio chunk along the same link (SID / SAM on DeMon, AMY and emulator audio on Pulse). Between vsyncs the MIPI link is idle.
- Anything else with per-frame work (Copper instruction-list swap, Blitter next-frame job-queue swap, application-side per-frame logic) can rendezvous on the same pulse if it wants frame-locked behaviour.
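One common way to get frame-locked behaviour is to prepare the next frame's list in a back buffer and swap it in on the vsync pulse. The sketch below shows that pattern in Python. `FrameLocked`, `submit`, and `on_vsync` are illustrative names, not an Ant64 API, and how the pulse is actually delivered to software is not modelled.

```python
class FrameLocked:
    """Double-buffered per-frame data that only changes on a vsync pulse."""

    def __init__(self, initial):
        self.active = initial    # what the hardware would read this frame
        self.pending = None      # what software has prepared for the next frame

    def submit(self, new_list):
        # Safe at any time during the frame: the active list is untouched.
        self.pending = new_list

    def on_vsync(self):
        # Called once per frame on the vsync pulse; swap only if work is ready.
        if self.pending is not None:
            self.active, self.pending = self.pending, None
        return self.active

copper = FrameLocked(initial=["list A"])
copper.submit(["list B"])
mid_frame = copper.active          # still list A until the pulse arrives
after_pulse = copper.on_vsync()    # the swap happens here
```

If nothing was submitted during a frame, `on_vsync` simply keeps the current list, so a late producer repeats the previous frame rather than tearing.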
Typical supervisor framebuffers are small: about 320 × 240 in RGB888 or RGBA8888 (~225–300 KB), well under 400 KB and tiny against MIPI's per-frame budget (~6 MB per frame at 60 Hz from the 375 MB/s link, so a typical supervisor frame uses roughly 4–5 % of it). FireStorm scales each MIPI layer to the active output resolution via per-layer pixel replication (Render Small, Output Big), so a supervisor doesn't have to render at 4K or even 1080p.
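Pixel replication amounts to integer nearest-neighbour scaling: each source pixel becomes a `scale` × `scale` block in the output. A minimal Python sketch follows. The list-of-rows framebuffer, the `replicate` name, and the use of a single integer factor for both axes are assumptions for illustration, since FireStorm does this in hardware per layer.

```python
def replicate(frame, scale):
    """Scale a frame (a list of rows of pixels) by an integer factor."""
    out = []
    for row in frame:
        wide = [px for px in row for _ in range(scale)]   # repeat horizontally
        out.extend(list(wide) for _ in range(scale))      # repeat vertically
    return out

small = [[1, 2],
         [3, 4]]
big = replicate(small, 2)
for row in big:
    print(row)
```

With the same approach, a 320 × 240 layer replicated by a factor of 4 covers 1280 × 960 output pixels while the supervisor still renders only the small frame.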
Audio chunked per frame keeps audio and video in lockstep without a separate clock — one frame of audio at 48 kHz / 60 Hz is ≈ 800 stereo samples (~3 KB), trivial next to the video.
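The video and audio budget figures above can be re-derived with a few lines of arithmetic. The sketch assumes 16-bit samples for audio (the sample width is not stated above) and treats the link rate as 375 × 10^6 bytes per second.

```python
WIDTH, HEIGHT = 320, 240
LINK_BYTES_PER_S = 375_000_000   # 375 MB/s MIPI link
FPS = 60
SAMPLE_RATE = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2             # assumption: 16-bit samples

rgb888 = WIDTH * HEIGHT * 3      # bytes per supervisor frame, 24-bit pixels
rgba8888 = WIDTH * HEIGHT * 4    # bytes per supervisor frame, 32-bit pixels
per_frame_budget = LINK_BYTES_PER_S / FPS

samples_per_frame = SAMPLE_RATE // FPS
audio_bytes = samples_per_frame * CHANNELS * BYTES_PER_SAMPLE

print(f"RGB888 frame:   {rgb888} bytes, {rgb888 / per_frame_budget:.1%} of budget")
print(f"RGBA8888 frame: {rgba8888} bytes, {rgba8888 / per_frame_budget:.1%} of budget")
print(f"Audio chunk:    {samples_per_frame} stereo samples, {audio_bytes} bytes")
```

The two framebuffer sizes come out at 225 KiB and 300 KiB against a budget of 6.25 MB per frame, and the audio chunk at 800 stereo samples.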
The FPGA
FireStorm is built on a GoWin GW5AST-138 FPGA (138K LUT4, 298 DSP blocks) on all three boards — Ant64S, Ant64, and Ant64C. The same bitstream architecture targets every board, with the same layer count, scratchpad, and wide-mode SRAM access; the models differ in DDR3 capacity and peripheral set, not in FPGA fabric.
The GW5AST-138 silicon also contains a hard RISC-V coprocessor — an 800 MHz 32-bit RISC-V CPU baked into the FPGA die itself, separate from the FireStorm soft core. It isn't a user-facing CPU: it runs resident system / debug firmware that monitors FireStorm internals at silicon speed (voice states, performance counters, error detection, register inspection), with zero LUT cost since it lives in silicon rather than fabric. See the FPGA reference for details.
Full technical details — pin maps, memory architecture, codec wiring, JTAG, hard coprocessor — are covered in the FPGA reference page.