Ant64 Memory Architecture

Overview

The Ant64 family's memory architecture centres on the FireStorm FPGA, which hosts both the main CPU (a 64-bit RISC-V core) and the entire chipset (graphics, audio, blitter, copper, raycaster). Because CPU and chipset live in the same silicon, they share the same memory ports — but the architecture uses multiple physically-separate memory banks so that critical bandwidth consumers (instruction fetch, blitter, audio DSP, scratchpad-resident data) don't contend for the same bus.

There is no separate main-CPU DRAM: the FireStorm Execution Engine (EE) runs from the FPGA's DDR3 and the wide-mode SRAM bank. It shares the DDR3 bus with the chipset blocks, under an arbiter tuned for predictable real-time behaviour, while the SRAM bus is reserved for its wide-mode instruction fetch.

On-Chip BSRAM

GoWin GW5AST-138: 340 BSRAM blocks × 18 Kbits = ~765KB total.

Dual-port, up to 380 MHz, 72-bit data width — ~3.4 GB/s per block at full speed. Used for:

Sprite line buffers, tilemap data, clip tables, palette RAM
CRT simulation LUTs, texture cache hot set
Copper lists, command-list buffers
FireStorm-specific structures: Xstack BSRAM stacks (U/S/M), Xctx hardware contexts, scratchpad for hot data structures, FireStorm prefetch buffers (no I-cache)
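
The capacity and per-block bandwidth figures above follow directly from the block count, block size, port width, and clock. A quick check, using only the numbers quoted in this section (the variable names are illustrative):

```python
# Back-of-envelope check of the BSRAM figures quoted above.
BLOCKS = 340                  # BSRAM blocks on the GW5AST-138
BITS_PER_BLOCK = 18 * 1024    # 18 Kbit per block
PORT_WIDTH_BITS = 72          # data width quoted above
CLOCK_HZ = 380e6              # maximum BSRAM clock quoted above

total_kb = BLOCKS * BITS_PER_BLOCK / 8 / 1024
block_gb_per_s = PORT_WIDTH_BITS * CLOCK_HZ / 8 / 1e9

print(f"total BSRAM:    {total_kb:.0f} KB")
print(f"per-block peak: {block_gb_per_s:.2f} GB/s")
```

The per-block number is a peak for one port running flat out; how many blocks a given structure spans, and at what clock, is a design decision not covered here.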

Hard RISC-V Coprocessor Local Memory

The hard RISC-V coprocessor embedded in the GW5AST-138 silicon (see Hard RISC-V Coprocessor) has its own memory hierarchy inside the silicon, separate from BSRAM and from FireStorm's memory subsystem:

Tightly-coupled local memory (private to the hard core):

64 KB ILM (Instruction Local Memory) — holds the hard-core debug / monitor firmware's hot path (interrupt handlers, debug-packet assembly, ring-buffer code); single-cycle fetch, deterministic latency
64 KB DLM (Data Local Memory) — working state (debug ring buffers, performance-counter accumulators, accumulated diagnostics); single-cycle access

ILM and DLM are private to the hard core — they do not appear in FireStorm's address space, and FireStorm cannot read or write them directly.

Cached wider memory access (also private):

32 KB I-cache + 32 KB D-cache with LRU replacement
256 MB instruction address space
2 GB data address space (32-bit address bus)

The cache hierarchy lets the hard core run firmware too large to fit in ILM and process diagnostic datasets too large to fit in DLM.

XIP from SPI flash:

The hard core supports XIP (eXecute In Place) from an attached SPI flash through its memory-mapped interface. This is the realistic home for a full firmware image: the FreeRTOS kernel + monitoring / debug / telemetry tasks live in cheap, dense SPI flash, are fetched on demand via the SPI interface, and the 32 KB I-cache absorbs SPI access latency to keep execution fluent. ILM stays available for the hottest code paths (ISRs, critical sections) and DLM for FreeRTOS task control blocks and working state.

This is what makes the hard core practical to use: it is not a fixed-firmware monitor that has to fit in 64 KB of ILM, but a real RTOS-hosted firmware target with caching, MMU, PMP, and proper task structure.

Shared DDR3 (configurable):

The GW5AST-138 silicon supports an optional Shared DDR3: Logic R/W Mode that lets both the hard core ("MCU side") and the FPGA fabric ("FireStorm side") read/write DDR3 concurrently through the same arbiter. The hard core's 32-bit data address space maps cleanly onto Ant64's DDR3 (1 GB on Ant64, 2 GB on Ant64C).

When this option is enabled in the bitstream configuration:

The hard core can inspect FireStorm's DDR3-resident state in place — sample buffers, voice state tables, large diagnostic snapshots — without round-tripping through QSPI
FireStorm and the hard core can share regions by convention — FireStorm writes a debug buffer, the hard core reads and analyses it (or vice versa) — no IPC protocol needed beyond the agreed-on memory layout
The hard core's firmware can grow beyond 64 KB ILM by being placed in DDR3 and cached through its L1s

The option is disabled by default and must be explicitly turned on in the GoWin IP configurator. When disabled, DDR3 belongs exclusively to FireStorm and the hard core is confined to ILM/DLM + cached BSRAM/flash. Shared mode introduces some bus contention with FireStorm's DDR3 accesses, but the hard core's bandwidth use is light (occasional reads for inspection), so the impact is minimal in practice.

Diagnostic data still flows primarily outward — the hard core observes FireStorm's chipset registers from inside the FPGA, accumulates state in DLM, and writes debug packets out over the dedicated internal channel to DeMon. Shared DDR3 is an additional capability for richer inspection, not a replacement for that primary path.

Wide-Mode SRAM — 1M × 36-bit

A single bank of pipeline-synchronous fast SRAM, mapped as 1M × 36-bit (~4.5MB) on its own dedicated bus, exclusively for FireStorm wide-mode code:

Parameter	Value
Organisation	1M × 36-bit
Capacity	~4.5MB
Clock speed	200 MHz
Bandwidth	~900 MB/s
Access mode	Pipeline — 1-cycle latency, burst
Wait states	None

Pipeline mode: address on cycle N, data valid on cycle N+1. Burst mode: sequential addresses stream at full 200 MHz clock rate after the initial pipeline fill — ideal for sequential instruction streaming. No wait states, no refresh, no page boundaries, no bus turnaround penalty.

The 36-bit word width is matched to FireStorm's wide-mode instruction format exactly. 32 bits encode a standard RV64 instruction; the extra 4-bit nibble provides the register-extension prefix that doubles the CPU's architectural register file from 32 to 64 GPRs and 32 to 64 FPRs (see FireStorm EE §7).
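
To make the 32 + 4 split concrete, here is a minimal sketch of packing and unpacking one SRAM word, plus a check of the capacity and bandwidth figures in the table above. The position of the prefix nibble (bits 35:32 here) and the function names are assumptions for illustration; the text states only that the word holds a 32-bit instruction and a 4-bit register-extension prefix, not how the nibble is laid out or decoded.

```python
# Illustrative only: placing the prefix nibble in bits 35:32 is an assumption.
WORD_BITS = 36
INSN_MASK = (1 << 32) - 1

def split_wide_word(word: int):
    """Return (prefix_nibble, rv_instruction) for one 36-bit SRAM word."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 36 bits")
    return (word >> 32) & 0xF, word & INSN_MASK

def join_wide_word(prefix: int, insn: int) -> int:
    """Inverse of split_wide_word."""
    if not 0 <= prefix <= 0xF or not 0 <= insn <= INSN_MASK:
        raise ValueError("field out of range")
    return (prefix << 32) | insn

# Capacity and bandwidth of the 1M x 36-bit bank at 200 MHz.
WORDS = 1 << 20
capacity_mb = WORDS * WORD_BITS / 8 / (1 << 20)   # megabytes
bandwidth_mb_s = WORD_BITS * 200e6 / 8 / 1e6      # one word per clock

print(capacity_mb, bandwidth_mb_s)
```

One 36-bit word per 200 MHz clock is exactly the ~900 MB/s in the table, so burst mode delivers one complete wide-mode instruction every cycle.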

This bus is exclusively FireStorm's instruction fetch bus. No other device shares it. FireStorm fetches wide-mode instructions at full burst bandwidth regardless of what the blitter, audio DSP, or any other subsystem is doing.

The 36-bit wide-mode SRAM bank backs wide mode and its wide-mode-only features (Xcond predication, wide register file, indexed loads).

FireStorm DDR3

The main DDR3 bank is shared between the FireStorm EE and the chipset blocks (blitter, audio DSP). All models use the same DDR3 speed grade and the same 64-bit bus; they differ only in capacity:

DDR3-800 (800 MT/s, 400 MHz I/O clock with double-data-rate transfer)

Model	Capacity	Bus width	Peak bandwidth	Sustained
Ant64S	512 MB	64-bit	6.4 GB/s	~4.0–5.0 GB/s
Ant64	1 GB	64-bit	6.4 GB/s	~4.0–5.0 GB/s
Ant64C	2 GB	64-bit	6.4 GB/s	~4.0–5.0 GB/s

DDR3 is a 64-bit bus at 6.4 GB/s peak; wide-mode instruction fetch is kept off it by the 36-bit SRAM bank. The only memory difference between models is DDR3 capacity (512 MB / 1 GB / 2 GB); the bus width, the SRAM bank, and the instruction-fetch path are the same.

6.4 GB/s peak gives substantial headroom — even pathological mixed workloads (heavy rasteriser + large sample streaming + dense CPU working set) stay well below saturation. Typical mixed workloads run at 1.5–2 GB/s, leaving most of the budget unused, and the BSRAM scratchpad and wide-mode SRAM absorb the hottest accesses on top of that.
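
The peak figure and the headroom claim can be checked from the numbers already given. The sketch below assumes nothing beyond DDR3-800 on a 64-bit bus and the 1.5–2 GB/s typical load quoted above; the helper name is illustrative.

```python
MT_PER_S = 800e6     # DDR3-800: 800 mega-transfers per second
BUS_BITS = 64

peak_gb_s = MT_PER_S * BUS_BITS / 8 / 1e9

def utilisation(load_gb_s: float, budget_gb_s: float = peak_gb_s) -> float:
    """Fraction of a bandwidth budget consumed by a given load."""
    return load_gb_s / budget_gb_s

for load in (1.5, 2.0):
    print(f"{load} GB/s: {utilisation(load):.0%} of peak, "
          f"{utilisation(load, 4.0):.0%} of the low-end sustained figure")
```

Measured against the ~4.0 GB/s low end of the sustained range rather than the theoretical peak, a 2 GB/s load is about half the budget, which is the more conservative way to read the headroom.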

Used for:

FireStorm narrow-mode code — vanilla RV64GC binaries and .text sections (full address range)
FireStorm data — heap, stack overflow, application memory, file system cache
Art library, audio sample banks, streamed audio, bitmap framebuffers
Cartridge / SD-streamed assets

A simple 8 KB direct-mapped write-through D-cache covers DDR3 data accesses for FireStorm, catching the patterns the BSRAM scratchpad can't predict (pointer chasing, hash buckets, dynamically-allocated data). Streaming code uses cache-bypass via bit 63 of the address to avoid polluting the cache.
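
A minimal sketch of how software might form a cache-bypass address, and how a direct-mapped cache of this size picks a line. Only the 8 KB size, the direct mapping, and the use of bit 63 come from the text; the 32-byte line size and the helper names are assumptions made for illustration.

```python
BYPASS_BIT = 1 << 63          # bit 63 set: access bypasses the D-cache
ADDR_MASK = (1 << 64) - 1     # keep values within a 64-bit address
CACHE_BYTES = 8 * 1024
LINE_BYTES = 32               # assumption: line size is not given in the text
NUM_LINES = CACHE_BYTES // LINE_BYTES

def uncached(addr: int) -> int:
    """Alias of addr that streams past the D-cache."""
    return (addr | BYPASS_BIT) & ADDR_MASK

def is_uncached(addr: int) -> bool:
    return bool(addr & BYPASS_BIT)

def cache_index(addr: int) -> int:
    """Line a cached access would occupy in a direct-mapped cache."""
    plain = addr & (ADDR_MASK ^ BYPASS_BIT)   # strip the bypass bit
    return (plain // LINE_BYTES) % NUM_LINES

sample_buffer = 0x0000_0000_4000_0000         # hypothetical DDR3 address
print(hex(uncached(sample_buffer)), cache_index(sample_buffer))
```

In a direct-mapped cache, two addresses exactly 8 KB apart land on the same line and evict each other, which is why large sequential streams are better sent through the bypass alias than through the cache.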

Arbitration between FireStorm code/data fetches, blitter texture reads, and audio DSP DMA is handled by the FireStorm DDR3 arbiter. The arbiter prioritises:

FireStorm EE instruction fetch (narrow-mode, or wide-mode misses not served by the SRAM bank)
FireStorm D-cache fills
Audio DSP DMA (real-time priority)
Blitter sequential reads (burst-friendly)
Background DMA from Xcrisp DMACPY operations
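
The priority order above can be modelled as a strict fixed-priority pick. This is only a behavioural sketch: the text gives the ordering but not the mechanism, so treating it as strict priority (with no aging, burst grouping, or bandwidth reservation) is an assumption, and the enum names are invented for the example.

```python
from enum import IntEnum

class Requester(IntEnum):
    """Lower value means higher priority, in the order listed above."""
    EE_IFETCH = 0        # FireStorm EE instruction fetch
    DCACHE_FILL = 1      # FireStorm D-cache fills
    AUDIO_DMA = 2        # audio DSP DMA
    BLITTER_READ = 3     # blitter sequential reads
    XCRISP_DMACPY = 4    # background DMACPY transfers

def grant(pending):
    """Return the highest-priority pending requester, or None if idle."""
    return min(pending, default=None)

print(grant({Requester.BLITTER_READ, Requester.AUDIO_DMA}))
```

A real arbiter would also need to bound how long a low-priority requester can be starved; the text does not say how FireStorm handles that.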

FPGA Flash + PSRAM — Third Memory Bus

FireStorm's third external memory bus carries two devices that share their lower data lines and clock but have separate chip-select lines: the configuration / asset flash and a 16MB octal-DDR PSRAM (covered below). The flash and the PSRAM share the clock and the lower four IO lines (IO0–3) — the lines the FPGA boots from; the PSRAM adds its own upper four IO lines (IO4–7) plus a DQS strobe for full x8 operation, and each device has its own CS#. Only one device is selected at any instant.

The flash is a read-only asset store for built-in system data that never changes at runtime:

Built-in wavetables — standard waveforms (sine, square, saw, triangle, noise variants)
FM voice data — operator configurations, envelope presets, algorithm tables
Standard font bitmaps — the system font and built-in character sets
Default palette data — factory palettes, CRT profile LUTs
Boot graphics — splash screen, AntOS boot assets

Flash read bandwidth is lower than SRAM but adequate for these workloads — font data and FM tables are small enough to be copied to BSRAM or DDR3 at boot time for faster runtime access if needed. The key advantage is that none of this consumes DDR3 capacity: static assets that never change live in flash; the fast RAM is reserved for scene data that does.

DeMon has access to the FPGA flash via the FPGA's JTAG chain — it can reprogram flash contents without the FPGA bitstream being involved.

FPGA PSRAM — 16MB Octal DDR

On that bus is a 16MB octal-DDR PSRAM (ISSI IS66WVO16M8DBLL-133 or AP Memory APS12808L-3OBMx — see below). It shares the clock and the lower four IO lines with the flash — the same lines the FPGA uses to read its configuration at power-up — and brings its own upper four IO lines (IO4–7) and DQS strobe for full x8 operation; the two devices are distinguished purely by their separate CS# lines. The arrangement is safe across boot because the PSRAM's CS# is held deasserted at power-on: while the FPGA configures itself from the flash over the shared lower lines, the PSRAM stays off the bus and the flash wins the cold boot uncontested. Once configuration completes, the GoWin MSPI pins revert to fabric I/O, FireStorm drives both CS# lines itself, and the PSRAM runs as a full octal-DDR device.

Unlike the read-only flash, the PSRAM is a volatile read/write store on the third bus, independent of the DDR3 and 36-bit SRAM buses. Its primary role is holding ROM data for simulated systems: OS / firmware images, game-cartridge contents, disk and tape images, and the fixed asset sets a personality core needs. The supervisor loads these images into the PSRAM; the personality core then reads them as the simulated machine's ROM/cartridge address space (swapping a cartridge or disk is just the host rewriting that PSRAM region).

Keeping that data in the PSRAM is a deliberate bus-isolation choice. Because the PSRAM is on its own bus — shared only with the config flash, which is essentially idle at runtime — a personality core streams cartridge and ROM bytes from it in parallel with FireStorm's wide-mode instruction fetch on the SRAM bus and its working data on the DDR3 bus, without contending for either. The simulated machine's ROM traffic stays off the two buses the host CPU depends on for throughput, so adding a faithful personality doesn't steal bandwidth from code fetch or working memory.

This is also what makes faithful compatibility cheap. The eZX Spectrum board pairs the same 16MB octal-DDR PSRAM with its SRAM, so FireStorm carrying the identical part lets the eZX cartridge reproduce that machine's memory map exactly — ROM sets, Classic-mode image staging, and tape/disk buffering land where the eZX expects them, served from the same independent bus. Other personalities use it the same way for their own OS ROMs and cartridge images.

In native Ant64 mode the 16MB is available as general-purpose working store — staging, caching, scratch — which likewise keeps that load off DDR3 and SRAM. (The native-mode allocation is left to system software; the hardware just provides the 16MB on its own bus.)

The trade-off is first-access latency: even at 266 MB/s streaming, the PSRAM's read latency (5 clocks ≈ 38 ns at 133 MHz before a burst begins) is higher than the parallel SRAM or DDR3. The win is high bulk bandwidth on a contention-free bus, not single-access latency — so latency-critical bytes can still be staged from PSRAM into BSRAM, SRAM, or DDR3 at load time, while bulk ROM and cartridge images live in the PSRAM and stream from it during execution.

Part and performance. The device is a 16MB octal (xSPI / OPI) DDR PSRAM in the 3.3 V grade, chosen so it can share the 3.3 V bus with the NOR flash — either the ISSI IS66WVO16M8DBLL-133 or the equivalent AP Memory APS12808L-3OBMx. Both are JEDEC xSPI / Xccela OPI x8 DDR parts in the BGA-24 footprint at 3.3 V / 133 MHz with write-latency code 5, so they are interchangeable second sources needing no controller or layout change. At the 133 MHz octal-DDR rate either part moves 8 bits × 2 edges × 133 MHz = 2128 Mbit/s ≈ 266 MB/s, with a DQS strobe for read capture and a 5-clock latency (≈38 ns at 133 MHz) before a burst streams. Both use fixed/variable-latency burst read/write modes — there is no quad-SDR page-crossing clock penalty.

That is far more than any simulated ROM requires. The 8/16-bit targets clock their ROM at the CPU bus rate:

System	ROM access cadence	Peak ROM bandwidth
Amiga 500 (68000 @ 7.16 MHz, 16-bit)	word / ~558 ns	~3.6 MB/s
Atari ST (68000 @ 8 MHz, 16-bit)	word / 500 ns	~4.0 MB/s
Mega Drive (68000 @ 7.67 MHz, 16-bit)	word / ~521 ns	~3.8 MB/s
SNES (65816 @ 3.58 MHz FastROM, 8-bit)	byte / ~280 ns	~3.6 MB/s

The PSRAM out-runs every one by ~65–75× (266 MB/s against 3.6–4.0 MB/s), and even a worst-case cold random read (its 5-clock read latency, ≈38 ns at 133 MHz, plus the command/address phase, a few tens of ns in total) lands well inside the 68000's ~500 ns bus cycle and the SNES's ~280 ns FastROM cycle, so the simulated machine never stalls on it. A faithful personality throttles its CPU to the original clock regardless, so the PSRAM is never the limiting factor, and period-correct wait states are reproduced deliberately.
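
The margin can be reproduced from the table: each target's peak ROM bandwidth is its access width divided by its access period, and the ratio to the PSRAM's 266 MB/s follows. The sketch uses only figures from this section; the helper name is illustrative.

```python
PSRAM_MB_S = 266.0
READ_LATENCY_NS = 5 / 133e6 * 1e9    # 5 clocks at 133 MHz

# (system, bytes per access, access period in ns), from the table above
targets = [
    ("Amiga 500",    2, 558),
    ("Atari ST",     2, 500),
    ("Mega Drive",   2, 521),
    ("SNES FastROM", 1, 280),
]

def rom_mb_s(bytes_per_access: int, period_ns: float) -> float:
    return bytes_per_access / period_ns * 1e3    # bytes/ns to MB/s

ratios = [PSRAM_MB_S / rom_mb_s(width, period) for _, width, period in targets]

for (name, width, period), ratio in zip(targets, ratios):
    print(f"{name:12s} {rom_mb_s(width, period):4.1f} MB/s  "
          f"PSRAM margin {ratio:3.0f}x")
print(f"read latency {READ_LATENCY_NS:.1f} ns")
```

The ratios come out between roughly 66× and 75×, and the read latency is a small fraction of even the shortest access period in the table.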

Why the 3.3 V grade. These OctalRAM parts come in two grades: 1.8 V at up to 200 MHz (~400 MB/s) and 3.3 V at up to 133 MHz (~266 MB/s). The 3.3 V grade is the one used here, specifically so the PSRAM can share the flash bus — the faster 1.8 V grade would need its own level-shifted 1.8 V bus and couldn't sit on the 3.3 V flash lines. The octal-DDR width is what keeps throughput high despite the lower 3.3 V clock ceiling: x8 at double data rate gives 266 MB/s at 133 MHz, four times what a quad-SDR part manages at the same clock. The ~130 MB/s given up versus the 1.8 V grade is irrelevant to the workload — 266 MB/s already beats every simulated ROM by nearly two orders of magnitude.
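
The grade comparison reduces to one formula: bus width × edges per clock × clock rate. A small sketch using the clock ceilings quoted above (the function name is illustrative):

```python
def peak_mb_s(bus_bits: int, edges_per_clock: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s for a serial-memory interface."""
    return bus_bits * edges_per_clock * clock_mhz / 8

octal_ddr_3v3 = peak_mb_s(8, 2, 133)   # the part used here
octal_ddr_1v8 = peak_mb_s(8, 2, 200)   # the faster 1.8 V grade
quad_sdr_133 = peak_mb_s(4, 1, 133)    # a quad-SDR part at the same clock

print(octal_ddr_3v3, octal_ddr_1v8, quad_sdr_133)
print("octal DDR vs quad SDR:", octal_ddr_3v3 / quad_sdr_133)
print("given up vs 1.8 V grade:", octal_ddr_1v8 - octal_ddr_3v3, "MB/s")
```

Doubling the width and doubling the edges each contribute a factor of two, which is where the fourfold advantage over quad SDR comes from.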

DeMon reaches the PSRAM over the same JTAG-chain path it uses for the flash bus. The PSRAM sits on the boot bus, which (with its flash+PSRAM pair) is common to every FireStorm board.

The FRAM Model — FireStorm Chipset Registers as Memory-Mapped Peripherals

From the perspective of every external chip — Pulse, DeMon, optional accelerator — the FireStorm chipset is simply a block of memory-mapped addresses accessible over QSPI. There is no special protocol, no command layer, no driver handshake. Writing to an address updates a FireStorm chipset register. Reading from an address returns a register value. It is exactly the same programming model as any other memory-mapped peripheral on an embedded system — the FPGA just has a very large and very capable register file.

From the perspective of the FireStorm EE itself, chipset registers live in the standard MMIO region (0xFxxx_xxxx, see FireStorm EE §5.4) and are accessed with ordinary load/store instructions. No QSPI needed — the CPU and chipset share the same internal address fabric.

This means:

The FireStorm EE writes a palette entry by storing a 32-bit value to the palette RAM address. The palette updates on the next scanline. No IPC, no kernel call, no DMA setup.
DeMon writes a display mode register to force a specific CRT profile at boot — one 16-bit write to the right QSPI address.
Pulse writes audio parameter registers — volume, pitch, envelope — as ordinary memory writes, timed by its own PIO logic.
Any chip can read back status registers, scanline counters, blitter job IDs, or completion flags as simple memory reads.

The FPGA controls what address ranges are visible in each external chip's FRAM window and what they map to internally. A window can expose palette RAM, sprite attribute tables, layer registers, the Copper list, audio registers, blitter job queues, or any region of DDR3 (within addressable bounds). Each of the three FRAM interfaces (Pulse, DeMon, accelerator) is independent — they can have different mappings, different access permissions, and never contend with each other.
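
As a rough model of the windowing described above, each external chip can be thought of as owning a private translation table from its QSPI-visible addresses to FireStorm-internal addresses, with per-window permissions. Everything concrete in this sketch is hypothetical: the class names, the window sizes, and the internal addresses (chosen only to sit in the 0xFxxx_xxxx MMIO region) are invented for illustration and are not the real register map.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    ext_base: int     # address as seen by the external chip over QSPI
    size: int         # window length in bytes
    int_base: int     # corresponding FireStorm-internal address
    writable: bool

class FramPort:
    """One external chip's view; each port has its own independent table."""
    def __init__(self, windows):
        self.windows = list(windows)

    def translate(self, ext_addr: int, write: bool) -> int:
        for w in self.windows:
            if w.ext_base <= ext_addr < w.ext_base + w.size:
                if write and not w.writable:
                    raise PermissionError("window is read-only for this chip")
                return w.int_base + (ext_addr - w.ext_base)
        raise LookupError("address not mapped in this chip's window")

# Hypothetical mappings: Pulse may write audio registers, DeMon may only
# read the same block but can write a display-mode register block.
pulse = FramPort([Window(0x0000, 0x400, 0xF000_1000, writable=True)])
demon = FramPort([Window(0x0000, 0x400, 0xF000_1000, writable=False),
                  Window(0x0400, 0x100, 0xF000_2000, writable=True)])

print(hex(pulse.translate(0x10, write=True)))
print(hex(demon.translate(0x404, write=True)))
```

Because each port carries its own table, changing what one chip can see never affects another chip's mapping, which matches the independence described above.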

If you wrote games or demos in the 80s or 90s, you already know this feeling. POKE $D000 on the C64 and a sprite moved. Write to $DFF180 on the Amiga and a colour changed on the next scanline. The machine did what you told it, immediately, because you spoke to it directly. The FRAM model gives external chips that feeling on the Ant64 — and the FireStorm EE has it natively, since chipset registers are just MMIO from its perspective.

Inter-Subsystem Data Paths

DeMon → FireStorm (AntOS UI + bulk data in + SID and SAM audio): DeMon's ESP32-P4 MIPI TX drives FireStorm's MIPI RX #0 — a 2-lane D-PHY link at 1.5 Gbps per lane, 3 Gbps total (~375 MB/s peak). Traffic is frame-locked by FireStorm's system vsync: on each pulse DeMon bursts the next frame (typical 320 × 240 RGB framebuffer ~225–300 KB + a per-frame audio chunk), and the link is idle between bursts — so the peak figure is headroom, not the steady-state load. Primarily used to push DeMon's CPU-rendered AntOS UI, sprite and tilemap layers (via the P4 2D-PPA blitter) into the FireStorm compositor. The same physical link also carries audio data from DeMon's Triple SID engine (L/C/R, 9 SID voices) to the FireStorm mixer, plus arbitrary bulk data (asset uploads, OTA payloads, debug captures) into FireStorm DDR3 at D-PHY speeds — orders of magnitude faster than QSPI for large transfers.

Pulse → FireStorm (sequencer UI + sprite/tilemap layers + bulk audio): Pulse's ESP32-P4 MIPI TX drives FireStorm's MIPI RX #1 — a 2-lane D-PHY link at 1.5 Gbps per lane, 3 Gbps total (~375 MB/s peak). Same frame-locked cadence as the DeMon link: a vsync pulse from FireStorm triggers Pulse to burst the next frame's video + audio over MIPI; idle between pulses. Carries Pulse's CPU-rendered sequencer / mixer / sample-browser UI, sprite and tilemap layers (via the P4 2D-PPA blitter), AMY synth output, classic-emulator audio, and sample data into FireStorm DDR3. Pulse retains a QSPI register window for low-latency control writes (note-on / note-off / parameter updates) where the MIPI link's latency is not appropriate.
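
To see why the peak figure is headroom rather than steady-state load, it helps to work out how long one frame's burst occupies the link. The sketch assumes a 60 Hz frame rate and ignores D-PHY protocol overhead and the audio chunk, neither of which is quantified in the text; the function name is illustrative.

```python
LINK_BYTES_PER_S = 2 * 1.5e9 / 8    # two lanes at 1.5 Gbps = 375 MB/s
FRAME_HZ = 60                       # assumption: frame rate for these links

def burst_ms(width: int, height: int, bytes_per_pixel: int,
             extra_bytes: int = 0) -> float:
    """Time on the wire for one frame's payload, in milliseconds."""
    payload = width * height * bytes_per_pixel + extra_bytes
    return payload / LINK_BYTES_PER_S * 1e3

frame_ms = 1e3 / FRAME_HZ
for bpp in (3, 4):                  # 225 KB and 300 KB framebuffers
    t = burst_ms(320, 240, bpp)
    print(f"{320 * 240 * bpp / 1024:.0f} KB frame: {t:.2f} ms burst, "
          f"{t / frame_ms:.1%} of the frame period")
```

Under these assumptions the link is busy for well under a millisecond per frame, leaving the remainder of each frame period free for audio, asset uploads, and other bulk payloads.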

FireStorm → DeMon and FireStorm → Pulse (return paths, planned): FireStorm has no native MIPI TX hardcell; the GW5AST silicon hardens MIPI only on the RX side. To send data back to each ESP32-P4's MIPI CSI input, the system uses an external LVDS-to-MIPI CSI bridge between FireStorm's LVDS TX pins and the supervisor's CSI lanes, one bridge per supervisor, so DeMon and Pulse each gain an independent return channel. The bridge is the Lontium LT9211 (a MIPI/LVDS/TTL ↔ MIPI/LVDS/TTL converter: it takes LVDS in and emits MIPI CSI-2 at up to 1.8 Gbps/lane, comfortably above the 1.5 Gbps target, and is from the same vendor family as the dev-board LT8912B). The Toshiba TC358748 is not a drop-in alternative here: it is a parallel (24-bit) ↔ CSI-2 bridge, not LVDS, and caps at 1 Gbps/lane, so it would only fit if FireStorm drove a parallel RGB bus instead of LVDS, at lower bandwidth. The target lane rate is ~1.5 Gbps per lane (matching the ESP32-P4 D-PHY ceiling and the supervisor → FireStorm MIPI RX paths) over 2 lanes: 3 Gbps total, ~375 MB/s peak per bridge. The return paths are asynchronous to the supervisor → FireStorm MIPI links: each runs at its own cadence, not vsync-locked, so FireStorm can stream back whenever it has data ready without waiting for a frame boundary. The DeMon return carries composited video readback for capture / streaming / recording, FireStorm chipset telemetry, audio buffers from the FireStorm mixer (e.g., for OTA stream encoding via the ESP32-C5), and arbitrary bulk transfers from FireStorm DDR3 into AntOS memory. The Pulse return carries mixer output back for resampling / recording / sample capture into the sequencer, plus telemetry and bulk transfers.

Pi Zero 2W ↔ FireStorm: The Pi Zero 2W (optional accelerator) connects to the FPGA via its expansion interface. The exact protocol is TBD — likely a parallel data bus driven by the Pi's GPIO or a high-speed SPI/QSPI link. The Pi is treated as an opt-in coprocessor: FireStorm can request work (e.g., decode this frame of video, run this ML inference, emulate this fast 68040 instruction), and the Pi returns results into shared FireStorm memory.

DeMon ↔ FireStorm / FPGA flash: DeMon's QSPI FRAM interface gives it register-level access to FireStorm for system control, monitoring, and configuration. DeMon's JTAG connection gives it programming and debug access to the FPGA bitstream and — via the FPGA's internal JTAG chain — the FPGA flash. DeMon can reprogram the entire FPGA contents (bitstream + flash) from one supervisor chip over JTAG.

MIPI as a Data Fabric, Not Video Transport

The MIPI links between ESP32-P4 chips and FireStorm are treated as general-purpose high-bandwidth pipes, not as video transport in any meaningful sense. The endpoints — ESP32-P4 MIPI TX/CSI peripherals and FireStorm's MIPI RX hardcells — move bytes from one side to the other regardless of what those bytes represent. A "frame" is just a chunk of memory transferred under D-PHY framing rules; the receiver decides what to do with the contents.

In practice this means the inbound links from DeMon and Pulse carry interleaved payload types — composited UI layers, sprite/tilemap data, audio buffers (SID, SAM, AMY, sample playback), command lists for the FireStorm blitter, asset uploads, OTA payloads, debug captures. The transmitter packs a header indicating type, the receiver dispatches. Real video frames and audio buffers and arbitrary bulk data all share the same physical link at D-PHY speeds (~375 MB/s per 2-lane direction at 1.5 Gbps lanes), with QSPI FRAM reserved for low-latency single-register writes where MIPI's burst-oriented nature would add too much latency.
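
The "header indicating type, receiver dispatches" scheme can be sketched as a simple type-length-payload stream. The header layout (16-bit type, 16-bit flags, 32-bit length, little-endian) and the type codes below are hypothetical, chosen only to make the example concrete; the actual wire format is not specified in this document.

```python
import struct

# Hypothetical header: payload type, flags, payload length in bytes.
HEADER = struct.Struct("<HHI")
TYPE_UI_LAYER, TYPE_AUDIO, TYPE_BLIT_CMDS = 1, 2, 3   # invented type codes

def pack(ptype: int, payload: bytes, flags: int = 0) -> bytes:
    """Prefix a payload with its header."""
    return HEADER.pack(ptype, flags, len(payload)) + payload

def dispatch(buffer: bytes, handlers: dict) -> None:
    """Walk a received buffer, handing each payload to its type's handler.

    Trailing bytes shorter than a header are treated as padding.
    """
    offset = 0
    while offset + HEADER.size <= len(buffer):
        ptype, _flags, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        payload = buffer[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        handlers[ptype](payload)
        offset += length

received = []
handlers = {
    TYPE_UI_LAYER: lambda p: received.append(("ui", p)),
    TYPE_AUDIO: lambda p: received.append(("audio", p)),
}
frame = pack(TYPE_AUDIO, b"\x01\x02") + pack(TYPE_UI_LAYER, b"px")
dispatch(frame, handlers)
print(received)
```

An explicit length field lets the receiver skip payload types it does not understand, which matters on a link that is expected to gain new payload types over time.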

The return paths (FireStorm → DeMon and FireStorm → Pulse, each via its own LVDS-to-MIPI bridge) inherit the same framing: even though the bridge chip thinks it's converting a video signal, FireStorm can pack whatever data it wants into the "scanlines" it emits, and the receiving supervisor parses the CSI buffer by header type. The bridge cannot tell whether the bytes are real pixels or arbitrary payload and doesn't need to.

This framing turns the bidirectional MIPI pair into a clean asymmetric peer link between FireStorm and each ESP32-P4: frame-locked at ~375 MB/s into FireStorm for vsync-aligned UI + audio flow, and asynchronous at ~375 MB/s back to the supervisor for capture / telemetry / readback. With both supervisors active, combined throughput approaches 750 MB/s in each direction between FireStorm and the ESP32-P4 chips: substantially higher than any other inter-chip link in the system, and within an order of magnitude of the FireStorm DDR3's own 6.4 GB/s peak.

The AntOS text and videotext layer is one specific consumer of this fabric on the DeMon side — see the AntOS Text Architecture document for the cell format, font system, three logical display modes (Terminal, System, Videotext), and the VT100/ANSI parser that produces the rendered pixel layer. At 640×360 RGB888 / 60 Hz the text layer occupies ~40 MB/s, only ~10% of one MIPI direction's bandwidth, leaving the rest free for the other data types that share the link.

Memory Map Summary

Memory / Bus	Capacity	Bandwidth	Ant64S	Ant64	Ant64C	Accessible to
BSRAM (on-chip)	~765KB	~3.4GB/s/block	✓	✓	✓	FireStorm EE (Xstack/Xctx/scratchpad), chipset (line buffers, palette, copper, audio voice state)
Wide-mode SRAM	~4.5MB	~900MB/s	✓	✓	✓	FireStorm EE instruction fetch (wide mode only)
FPGA flash	varies	read-only	✓	✓	✓	All FireStorm subsystems; DeMon via JTAG. Shares the third (boot) bus with the PSRAM (separate CS)
FPGA PSRAM	16MB	Octal DDR (3.3 V, 133 MHz, ~266 MB/s)	✓	✓	✓	Simulated-system ROM / cartridge / disk images — independent bus, parallel with SRAM & DDR3; also native-mode scratch. Shares the flash's lower IO + clock, separate CS
FireStorm DDR3	512MB / 1GB / 2GB	DDR3-800; 6.4 GB/s peak (64-bit)	✓	✓	✓	FireStorm EE code/data; blitter; audio DSP
FRAM (Pulse)	window	QSPI speed	✓	✓	✓	Pulse ↔ FireStorm mapped window
FRAM (DeMon)	window	QSPI speed	✓	✓	✓	DeMon ↔ FireStorm mapped window
FRAM (accelerator)	window	QSPI speed	✓	✓	✓	Pi Zero 2W ↔ FireStorm mapped window
MIPI RX #0 (from DeMon)	—	3 Gbps (2-lane)	✓	✓	✓	DeMon → FireStorm AntOS UI overlay + bulk data in
MIPI RX #1 (from Pulse)	—	3 Gbps (2-lane)	✓	✓	✓	Pulse → FireStorm sequencer UI + bulk audio data
MIPI CSI (FireStorm → DeMon, via LVDS-to-MIPI bridge)	—	3 Gbps (2-lane) target	✓	✓	✓	FireStorm → DeMon return path: video capture, telemetry, audio/OTA readback, bulk data into AntOS memory
MIPI CSI (FireStorm → Pulse, via LVDS-to-MIPI bridge)	—	3 Gbps (2-lane) target	✓	✓	✓	FireStorm → Pulse return path: mixer-output resampling / recording / sample capture, telemetry, bulk data
Pi expansion bus	—	TBD	✓ (optional)	✓ (optional)	✓	Pi Zero 2W ↔ FireStorm accelerator data

A Note on the Previous Architecture

Earlier Ant64 prototypes used a separate RISC-V SoC as the main CPU, with FireStorm acting as a chipset accessed over QSPI. That arrangement has been superseded: FireStorm now hosts the CPU directly. The benefits of the integration:

No bus crossing between CPU and chipset — the CPU's writes hit chipset registers in the same FPGA clock cycle.
Shared scratchpad BSRAM — the same on-chip memory holds CPU stacks (Xstack), task contexts (Xctx), and chipset structures (line buffers, palette, copper).
Unified timing domain — no skew between CPU dispatch and chipset response.
Simpler memory map — no separate "CPU DRAM" and "chipset DRAM"; one DDR3 with arbitrated access.
Lower BOM cost — one large chip instead of a large chip + a substantial RISC-V SoC.

The trade-off is a slightly more demanding FPGA design and lower theoretical CPU peak performance than a hardened RISC-V SoC could deliver. In practice the integration wins because Ant64's workloads are dominated by CPU↔chipset interaction patterns where the previous separate-CPU-plus-chipset architecture spent most of its time crossing buses.