FireStorm™ Execution Engine
The FireStorm Execution Engine is not a RISC-V, ARM, or 68k CPU. It is a hardware-scheduled task engine with a compact, expressive ISA designed specifically for real-time multimedia workloads.
The Execution Engine is a deterministic, multi-core execution engine embedded inside the FireStorm FPGA. It exists to run graphics, audio, copper, blitter, and system micro-tasks faster and more predictably than a general-purpose CPU.

The FireStorm EE Advantage
FireStorm EE code is always relocatable. No absolute jumps, no relocation tables, no GOT/PLT, no veneer thunks, no "linker magic". Position-independent code is a core philosophy.
- Instructions are always one word
- Using Tri-Ops™, the EE can execute up to 3 operations per cycle/instruction
- EE cores are closer to four synchronised DSPs than a traditional SMP CPU
Core Architectural Differences
| Feature | 68K | ARM32 | FireStorm EE |
| --- | --- | --- | --- |
| Data registers | 8 (D0–D7) | 16 GPRs | 32 (Ant64S: 2× D0–D15) / 64 (Ant64/C: 2× D0–D31) |
| Address registers | 8 (A0–A7) | Same as data | 32 (Ant64S: 2× A0–A15) / 64 (Ant64/C: 2× A0–A31) |
| Register pairing | Implicit (mul/div) | Awkward | Explicit (.q) everywhere |
| Register blitting | No | No | Any data register |
| Program counter | Exception, can misalign | Exception, can misalign | Offset in opcode words, no misalignment |
| Stack register | Fixed A7 | Fixed SP | Hardware SP, plus any A-reg can be a stack |
| Base register | - | - | Points to the base of task data memory |
The FireStorm EE's separate data and address register files are very intentionally 68k-inspired, and it matters.
Instruction Set Expressiveness
| Capability | 68K | ARM32 | FireStorm EE |
| --- | --- | --- | --- |
| Conditional execution | Limited | Good (pre-Thumb) | Full CC field on 99% of the base ISA |
| Operand order | source → dest | dest ← source | 68k-style source → dest |
| Addressing modes | Rich | Minimal | Rich plus RISC three/four operands |
| Address vs data regs | Yes | No | Yes |
| Code addressing | Mixed | Mixed | All relative |
| Relocatable code | Optional / manual | Toolchain-dependent | Always |
| PC-relative branches | Yes | Yes | Native |
| Absolute jumps | Common | Common | Not allowed |
| Jump tables | Absolute / PC-rel | Absolute / PC-rel | Relative tables |
| Stack dependency | Strong | Strong | Optional / register based |
| Self-contained tasks | Partial | Rare | Yes |
| Copy and execute code blocks | Risky | Cache-sensitive | Trivial |
| Post/pre inc addressing | Yes | Yes (pre/post-indexed) | Yes |
| Scaled indexed addressing | Yes | Limited | Yes |
| DBRA loop | Yes | No | Yes (dbra, dbraz) |
| MOVEM | Yes | Yes (LDM/STM) | Yes (alt ISA) |
| XCHG / SWAP | Limited | No | Yes |
| Native fixed-point support | Partial (manual) | Partial (DSP extensions) | Yes (architectural) |
| Fixed-point MAC | No (68881 / DSP needed) | Only with DSP extensions | Yes |
| Saturation and rounding | No | Optional (SIMD/DSP) | Built-in |
| Deterministic latency | Mostly | No (pipeline variance) | Yes |
| Rotate + shift for scaling | Multi-cycle | Variable | Single-cycle barrel |
| Fixed-point friendly ISA | Accidental | Secondary | Designed for it |
| Audio-safe math | Risky | Cache-dependent | Yes |
| 64-bit ops | Awkward | Limited | Native via .q |
The FireStorm EE deliberately feels familiar to 68k programmers, but executes more like a modern RISC core.
Example: Memory Copy Loop
| 68K | ARM32 | FireStorm EE |
| --- | --- | --- |
| move.l #500-1,d0 | ldr r0, =src | move.l #500,d0 |
| lea src,a0 | ldr r1, =dst | move.l src,a0 |
| lea dst,a1 | mov r2,#500 | move.l dst,a1 |
| loop: move.l (a0)+,(a1)+ | loop: ldr r3,[r0],#4 | copy.l (a0)+,(a1)+, d0 |
| dbra d0,loop | str r3,[r1],#4 | (starts DMA copy in background) |
| | subs r2,r2,#1 | |
| | bne loop | |
The FireStorm EE version improves on 68k clarity, needs one fewer instruction (no loop branch), executes the copy in the background, and runs on a modern FPGA clock with no bus arbitration penalty.
Memory and Hardware Integration
| 68K | ARM32 | FireStorm EE |
| --- | --- | --- |
| Unified address space | Harvard-ish | Lives inside the FPGA |
| Self-modifying code possible | Heavy cache management | Zero-cost access to hardware |
| Hard to cache safely | Self-modifying code expensive | Can be started by hardware (Copper, etc.) |
| Memory-mapped hardware | Everything goes through the CPU | Can launch hardware (blits, audio) |
| CPU must wait on everything | Peripherals are slow MMIO | Code and data are logically separate |
| Copper/blitter are separate but dumb | DMA exists but is OS-heavy | Code memory is word-addressed |
| | | Writes to code space are rare and controlled |
| | | Prefetch window can be safely invalidated |
| | | Safe background prefetching |
This is not a CPU talking to hardware. This is hardware talking to hardware, with the FireStorm EE as the glue logic.
Execution Model (the hidden superpower)
| Feature | 68K | ARM32 | FireStorm EE |
| --- | --- | --- | --- |
| Cores | 1 | 1–4 | 2 (Ant64S) / 4 (Ant64/C) |
| Instruction fetch | Raw memory | Cache/predictor | BRAM prefetch window |
| Backward loop behaviour | Decent | Predictor dependent | Explicit "behind window" |
| Context switch cache penalty | Huge | Huge | Usually none |
| Hardware scheduling + cache stickiness | No | No | Yes |
| Task switch | Slow | Slow | ~2 cycles (reg/state swap) |
| Scheduler | OS | OS | Hardware |
| Idle cores | Waste | Waste | Repurposed instantly |
| Reaper reclaimer | No | No | Redundant tasks/jobs removed |
| Repeat without fetch | No | No | Up to 16 instructions in length |
FireStorm EE cores are:
- Not tied to copper, blitter, or audio
- General-purpose, with fast context switching
- Schedulable by hardware events
Register Blitting
Register Blitting is a FireStorm EE feature you won't find on other CPUs. Instead of forcing every small effect through a separate graphics unit, data registers themselves can be attributed with store-time behaviour — swizzling formats, masking writes, XOR or alpha-controlled drawing, and read-modify-write accumulation.
Combined with large register files and tight counted loops, this lets the CPU handle fine-grained graphics and effects directly, while the hardware blitter focuses on bulk work. It's a deliberate design for games, demos, and creative computing.
Example: Alpha-Controlled Masked Merge
The slot 0 store swizzles ARGB→BGRA, then does an RMW merge keyed on source alpha:
- If A==0: new = old
- Else: new = (old & ~MASK) | (swz(src) & MASK), where MASK selects which bytes/bits are affected when A!=0
- Example: MASK = 0x00FFFFFF updates RGB and keeps the destination top byte
| | 68K | ARM32 | FireStorm EE |
| --- | --- | --- | --- |
| Performance | 150–180 cycles/pixel | 15–30 cycles/pixel | Typical 6–12, best case 3 cycles/pixel |
68K:
loop:
move.l (a0)+, d0
cmp.l #$00FFFFFF, d0 ; alpha is the top byte (bits 31:24)
bls.s skip ; A == 0: leave the destination untouched
move.l d0, d1
and.l #$FF00FF00, d1
move.l d0, d2
and.l #$00FF0000, d2
lsr.l #16, d2
move.l d0, d3
and.l #$000000FF, d3
lsl.l #16, d3
or.l d2, d1
or.l d3, d1
move.l (a1), d4
and.l #$00FFFFFF, d1
and.l #$FF000000, d4
or.l d1, d4
move.l d4, (a1)
skip:
addq.l #4, a1
dbra d7, loop
ARM32:
loop:
ldr r3, [r0], #4
movs r4, r3, lsr #24
beq transparent
and r5, r3, #0xFF00FF00
and r6, r3, #0x00FF0000
lsr r6, r6, #16
and r7, r3, #0x000000FF
lsl r7, r7, #16
orr r3, r5, r6
orr r3, r3, r7
ldr r4, [r1]
and r3, r3, #0x00FFFFFF
bic r4, r4, #0x00FFFFFF
orr r4, r4, r3
str r4, [r1]
transparent:
add r1, r1, #4
subs r2, r2, #1
bne loop
FireStorm EE:
rbsetilo #lo16(CFG_SWZ_ARGB_TO_BGRA | CFG_RMW_ALPHAKEY), slot0
rbsetihi #hi16(CFG_SWZ_ARGB_TO_BGRA | CFG_RMW_ALPHAKEY), slot0
rbsetmlo #$FFFF, slot0
rbsetmhi #$00FF, slot0
rbsel slot0, D0
rbenable #$01, d0
loop:
move (A0)+, D0
move D0, (A1)+
dbra D2, loop
rbenable #0, d0
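The store-time behaviour that the slot performs on each write can be modelled in a few lines of Python. This is only a sketch of the rule stated above, not the hardware implementation: it assumes pixels are packed as 0xAARRGGBB in a 32-bit word and that the ARGB→BGRA swizzle is a plain byte reversal. The function names `swz_argb_to_bgra` and `alphakey_store` are hypothetical.

```python
def swz_argb_to_bgra(px: int) -> int:
    """Reverse the four bytes of a 32-bit pixel (assumed 0xAARRGGBB -> 0xBBGGRRAA)."""
    return int.from_bytes((px & 0xFFFFFFFF).to_bytes(4, "big"), "little")


def alphakey_store(old: int, src: int, mask: int) -> int:
    """Model of the alpha-keyed masked merge applied when a pixel is stored.

    old  -- value currently in memory
    src  -- value being stored (alpha assumed to be the top byte)
    mask -- bits of the destination that may be replaced when alpha != 0
    """
    alpha = (src >> 24) & 0xFF
    if alpha == 0:
        return old  # transparent pixel: destination is kept
    return (old & ~mask & 0xFFFFFFFF) | (swz_argb_to_bgra(src) & mask)


# Opaque source pixel: low three bytes replaced, destination top byte kept.
merged = alphakey_store(0x11223344, 0xFFAABBCC, 0x00FFFFFF)
# Fully transparent source pixel: destination unchanged.
kept = alphakey_store(0x11223344, 0x00AABBCC, 0x00FFFFFF)
```

With MASK = 0x00FFFFFF the three low bytes come from the swizzled source while the destination's top byte survives, which is the case the example list describes.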
Tri-Ops™: Parallelism Without the Pain
Most CPUs chase performance with wider pipelines and complex schedulers. Ant64 takes a different path.
Tri-Ops™ allow three simple operations to execute together in a single cycle, using a compact, deterministic encoding that's easy for hardware — and for humans. No speculative execution. No hidden stalls. Just more work done per cycle.
Tri-Ops excel at pointer arithmetic, loop control, and bit-level work — keeping hot loops small, readable, and fast.
Forwarding Model
Each slot runs conceptually in parallel, but with an explicit forwarding path from earlier slots to later slots:
S0 → S1 (Slot1 can see Slot0's written value)
S0 → S2 (Slot2 can see Slot0's written value)
S1 → S2 (Slot2 can see Slot1's written value)
Tri-Ops Examples
; Example 1 — D0 &= (D0 - 1)
mov d0, d3 : dec d3 : and d3, d0
; D3 = D0
; D3 = D3 - 1
; D0 = D0 & D3
; Example 2 — 64-bit add: (D1:D0) += (D3:D2)
add d2, d0 : adc d3, d1 : nop
; Slot0: D0 = D0 + D2 (sets C)
; Slot1: D1 = D1 + D3 + C
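The forwarding rules can be checked with a small Python model. It is a sketch under stated assumptions rather than a description of the real datapath: slots are evaluated in order so that later slots see earlier results, only slot 0 updates the carry flag, and only the carry-out of additions is modelled. The function `run_triop` and its tuple format are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def run_triop(regs, slots):
    """Run up to three (op, src, dst) slots against a copy of regs.

    In-order evaluation gives the S0->S1, S0->S2 and S1->S2 forwarding.
    Only slot 0 writes the carry flag in this model.
    """
    r = dict(regs)
    carry = 0
    for index, (op, src, dst) in enumerate(slots):
        if op == "nop":
            continue
        if op == "mov":
            result = r[src]
        elif op == "dec":
            result = r[dst] - 1
        elif op == "and":
            result = r[dst] & r[src]
        elif op == "add":
            result = r[dst] + r[src]
        elif op == "adc":
            result = r[dst] + r[src] + carry
        else:
            raise ValueError(f"unsupported op: {op}")
        if index == 0:
            carry = 1 if result > MASK32 else 0
        r[dst] = result & MASK32
    return r


# Example 1: D0 &= (D0 - 1) clears the lowest set bit of D0.
ex1 = run_triop({"d0": 0xB0, "d3": 0},
                [("mov", "d0", "d3"), ("dec", None, "d3"), ("and", "d3", "d0")])

# Example 2: (D1:D0) += (D3:D2), with the carry forwarded from slot 0 to slot 1.
ex2 = run_triop({"d0": 0xFFFFFFFF, "d1": 1, "d2": 1, "d3": 0},
                [("add", "d2", "d0"), ("adc", "d3", "d1"), ("nop", None, None)])
```

In example 1 the third slot only produces the right answer because it reads the D3 value written by the two earlier slots, which is exactly what the forwarding paths provide.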
Instruction Timing
| Instruction class | 68K | ARM32 | FireStorm EE |
| --- | --- | --- | --- |
| ALU ops | 2–8 cycles | 1 cycle | 1 cycle |
| Shift/rotate | Variable | 1–3 cycles | 1 (barrel) |
| Mul | 38+ cycles | 3–5 cycles | 1–3 cycles (non-blocking) |
| Div | Up to 158 cycles | 8–21 cycles | 16–32 cycles (non-blocking) |
| Branch | Expensive | Pipeline flush | Cheap (no predictor) |
FireStorm EE trades speculation, caches, and out-of-order madness for predictability and bandwidth — which is exactly what demo code wants.
Architectural Overview
| Feature | Ant64S | Ant64 / Ant64C |
| --- | --- | --- |
| EE cores | 2 | 4 |
| Pipeline depth | 3 stages | 4 stages |
| Instruction width | 32 bits | 36 bits |
| Registers | 64 (2× 16D + 2× 16A) | 128 (2× 32D + 2× 32A) |
| Alternate ISA | ✔ | ✔ |
| Tri-Ops™ | ✔ | ✔ |
| Fixed point | ✔ | ✔ |
| Float assist | - | ✔ |
Each EE core:
- Executes one task at a time
- Has private register state
- Can be scheduled or descheduled in ~2 cycles
- Is returned to the scheduler via ENDTASK
Execution Model
The FireStorm EE uses a hardware task scheduler.
Tasks are launched by: Copper · Blitter · Audio engine · FireStorm control logic
Tasks run until they execute ENDTASK or are pre-empted by the scheduler.
Context switching swaps the entire register file + flags in hardware — no software save/restore.
Syntax Rules
- Operand order: source, destination
- Only round brackets
- No square brackets (reserved)
- No PC-relative data
MOVE D0, D1
ADD #4, A0
MOVE (A0)+, D0
MOVE D1, -(A1)
Register Model
Ant64S — Base ISA — 64 registers (2× 32)
- Data registers: D0–D15 (register blitting capable)
- Address registers: A0–A15
- Both banks can be swapped with an alternate bank
Ant64 / Ant64C — Base ISA — 128 registers (2× 64)
- Data registers: D0–D31 (register blitting capable)
- Address registers: A0–A31
- Both banks can be swapped with an alternate bank
Base ISA semantics are identical between Ant64S and Ant64/C. Ant64/C simply exposes more registers through the V3–V0 extension bits.
Flags — BIRDXNZVC
| Flag | Name | Description |
| --- | --- | --- |
| B | Blitting | Register blitting effects enabled |
| I | Interrupts | Interrupts enabled |
| R | Reserved | - |
| D | Division | Division by zero |
| X | eXtended | Sticky carry |
| N | Negative | Result is negative |
| Z | Zero | Result is zero |
| V | oVerflow | Arithmetic overflow |
| C | Carry | Carry out |
Base ISA Instruction Format
Ant64/Ant64C adds 4 extension bits [35:32]:
| Name | Bits | Meaning |
| --- | --- | --- |
| V3 | [35:35] | Extend RD or variant 3 (Ant64/C only) |
| V2 | [34:34] | Extend RA or variant 2 (Ant64/C only) |
| V1 | [33:33] | Extend RB or variant 1 (Ant64/C only) |
| V0 | [32:32] | Extend RC or variant 0 (Ant64/C only) |
| Alt | [31:31] | Alternative ISA |
| OPC | [30:27] | Opcode group (0–15) |
| SUB | [26:22] | Sub-op within group (0–31) |
| SZ | [21:20] | Size: 00=8, 01=16, 10=32, 11=variant/64-bit |
| CC | [19:16] | Condition code |
| RD | [15:12] | Destination register index |
| RA | [11:8] | Register A |
| RB | [7:4] | Register B / EA2b |
| RC | [3:0] | Register C / EA2c |
Effective register index:
RD_eff = (V3<<4) | RD
RA_eff = (V2<<4) | RA
RB_eff = (V1<<4) | RB
RC_eff = (V0<<4) | RC
Variant-bit rule: If a sub-op does not use one of RD/RA/RB/RC, its corresponding V bit becomes a free variant bit (Ant64/C only). Ant64S has no such variants.
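As a worked illustration of the field layout and the effective-index formulas, the following Python sketch extracts fields from an instruction word held as an integer. The bit positions come from the format table; the helper names `field` and `effective_regs` and the sample word are hypothetical, and the sample is not claimed to be a meaningful instruction.

```python
# Field positions as (high bit, low bit), taken from the format table.
FIELDS = {
    "V3": (35, 35), "V2": (34, 34), "V1": (33, 33), "V0": (32, 32),
    "ALT": (31, 31), "OPC": (30, 27), "SUB": (26, 22), "SZ": (21, 20),
    "CC": (19, 16), "RD": (15, 12), "RA": (11, 8), "RB": (7, 4), "RC": (3, 0),
}


def field(word: int, name: str) -> int:
    """Extract one named field from a 32-bit or 36-bit instruction word."""
    hi, lo = FIELDS[name]
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def effective_regs(word: int) -> dict:
    """Apply Rx_eff = (Vn << 4) | Rx for each of the four register fields."""
    pairs = {"RD": "V3", "RA": "V2", "RB": "V1", "RC": "V0"}
    return {reg: (field(word, v) << 4) | field(word, reg)
            for reg, v in pairs.items()}


# Sample word: V3 and V0 set, OPC=2, SUB=1, RD=5, RA=3, RB=0, RC=2.
sample = (1 << 35) | (1 << 32) | (2 << 27) | (1 << 22) | (5 << 12) | (3 << 8) | 2
regs = effective_regs(sample)
```

On a 32-bit Ant64S word the V bits read as zero, so the same helper returns the plain 4-bit indices.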
Immediate Encoding Table
✔ = part of immediate; otherwise column shows what that field is used for
| # | Name | Bits | SZ | CC | RD | RA | RB | RC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | NA | 0 | SZ | CC | RD | RA | RB | RC |
| 1 | IMM2 | 2 | ✔ | CC | RD | RA | RB | RC |
| 2 | IMM4 | 4 | SZ | CC | RD | RA | RB | ✔ |
| 3 | IMM4b | 4 | SZ | ✔ | RD | RA | RB | RC |
| 4 | IMM6 | 6 | ✔ | CC | RD | RA | RB | ✔ |
| 5 | IMM6b | 6 | ✔ | ✔ | RD | RA | RB | RC |
| 6 | IMM8 | 8 | SZ | CC | RD | RA | ✔ | ✔ |
| 7 | IMM8b | 8 | SZ | ✔ | RD | RA | RB | ✔ |
| 8 | IMM10 | 10 | ✔ | CC | RD | RA | ✔ | ✔ |
| 9 | IMM10b | 10 | ✔ | ✔ | RD | RA | RB | ✔ |
| 10 | IMM12 | 12 | SZ | CC | RD | ✔ | ✔ | ✔ |
| 11 | IMM12b | 12 | SZ | ✔ | RD | RA | ✔ | ✔ |
| 12 | IMM14 | 14 | ✔ | CC | RD | ✔ | ✔ | ✔ |
| 13 | IMM14b | 14 | ✔ | ✔ | RD | RA | ✔ | ✔ |
| 14 | IMM16 | 16 | SZ | CC | ✔ | ✔ | ✔ | ✔ |
| 15 | IMM16b | 16 | SZ | ✔ | RD | ✔ | ✔ | ✔ |
| 16 | IMM18 | 18 | ✔ | CC | ✔ | ✔ | ✔ | ✔ |
| 17 | IMM18b | 18 | ✔ | ✔ | RD | ✔ | ✔ | ✔ |
| 18 | IMM20 | 20 | SZ | ✔ | ✔ | ✔ | ✔ | ✔ |
| 19 | IMM22 | 22 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
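Each immediate's width is the sum of the widths of the fields it absorbs. The Python sketch below derives the widths from the ✔ columns, assuming the field sizes given in the instruction format table (SZ is 2 bits, CC/RD/RA/RB/RC are 4 bits each). The names `FIELD_BITS`, `IMM_FIELDS` and `imm_width` are hypothetical.

```python
# Field widths in bits, taken from the base ISA instruction format.
FIELD_BITS = {"SZ": 2, "CC": 4, "RD": 4, "RA": 4, "RB": 4, "RC": 4}

# Fields marked with a tick for each immediate form in the table.
IMM_FIELDS = {
    "IMM2": ("SZ",),
    "IMM4": ("RC",),
    "IMM4b": ("CC",),
    "IMM6": ("SZ", "RC"),
    "IMM6b": ("SZ", "CC"),
    "IMM8": ("RB", "RC"),
    "IMM8b": ("CC", "RC"),
    "IMM10": ("SZ", "RB", "RC"),
    "IMM10b": ("SZ", "CC", "RC"),
    "IMM12": ("RA", "RB", "RC"),
    "IMM12b": ("CC", "RB", "RC"),
    "IMM14": ("SZ", "RA", "RB", "RC"),
    "IMM14b": ("SZ", "CC", "RB", "RC"),
    "IMM16": ("RD", "RA", "RB", "RC"),
    "IMM16b": ("CC", "RA", "RB", "RC"),
    "IMM18": ("SZ", "RD", "RA", "RB", "RC"),
    "IMM18b": ("SZ", "CC", "RA", "RB", "RC"),
    "IMM20": ("CC", "RD", "RA", "RB", "RC"),
    "IMM22": ("SZ", "CC", "RD", "RA", "RB", "RC"),
}


def imm_width(name: str) -> int:
    """Total immediate width: the sum of the absorbed fields' widths."""
    return sum(FIELD_BITS[f] for f in IMM_FIELDS[name])


widths = {name: imm_width(name) for name in IMM_FIELDS}
```

The "b" variants trade the CC field for immediate bits, which is why instructions using them are listed as "No CC" in the instruction tables.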
Addressing Modes
EA4 — 4-bit Effective Address
| EA4 | EA | Meaning |
| --- | --- | --- |
| 0 | data Rn | Data register direct |
| 1 | (Rn) | Address register indirect |
| 2 | (Rn)+ | Address register indirect with postincrement |
| 3 | -(Rn) | Address register indirect with predecrement |
| 4 | addr Rn | Address register direct |
| 5 | (Rn+) | Indirect next, doesn't change register |
| 6 | (-Rn) | Indirect last, doesn't change register |
| 7 | ((Rn)) | Address register indirect indirect |
| 8 | data Rn | Alt data register direct |
| 9 | addr Rn | Alt address register direct |
| 10 | (Rn) | Alt address register indirect |
| 11 | (Rn)+ | Alt address register indirect with postincrement |
| 12 | -(Rn) | Alt address register indirect with predecrement |
| 13 | (Rn+) | Alt indirect next |
| 14 | (-Rn) | Alt indirect last |
| 15 | ((Rn)) | Alt address register indirect indirect |
EA2a — 2-bit Effective Address (2× fit in bits 0–3)
| EA2a | EA | Meaning |
| --- | --- | --- |
| 0 | data Rn | Data register direct |
| 1 | (Rn) | Address register indirect |
| 2 | (Rn)+ | Address register indirect with postincrement |
| 3 | -(Rn) | Address register indirect with predecrement |
EA2b — Scale replaces RB (bit 0–2 = scale, bit 3 = +)
| EA2b | EA | Meaning |
| --- | --- | --- |
| 0 | (addr RA, #IMM12b), data RD | Address indirect with 12-bit displacement to data |
| 1 | (addr RA, data RC<<scale)[+], data RD | Address indirect with index [,scale] [,postincrement] to data |
| 2 | data RD, (addr RA, #IMM12b) | Data to address indirect with 12-bit displacement |
| 3 | data RD, (addr RA, data RC<<scale)[+] | Data to address indirect with index [,scale] [,postincrement] |
EA2c — Scale replaces CC (bit 0–2 = scale, bit 3 = +)
| EA2c | EA | Meaning |
| --- | --- | --- |
| 0 | (addr RA, #IMM8b) | Address indirect with 8-bit displacement |
| 1 | (addr RA, data RC<<scale)[+] | Address indirect with index [,scale] [,postincrement] |
| 2 | ((addr RA), #IMM8b) | Address indirect indirect with 8-bit displacement |
| 3 | ((addr RA), data RC<<scale)[+] | Address indirect indirect with index [,scale] [,postincrement] |
MAC Options
| IMAC | Option | Meaning |
| --- | --- | --- |
| Bit 0 | RND | Rounding |
| Bit 1 | SAT | Saturation |
| Bit 2 | RSVD | Reserved |
| Bit 3 | RSVD | Reserved |
Conditional Branch Types
| | | | |
| --- | --- | --- | --- |
| bra / bt | beq | bne | bhi |
| bls | bcc / bhs | bcs / blo | bpl |
| bmi | bvc | bvs | bge |
| blt | bgt | ble | bf |
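The sixteen mnemonics match the classic 68k condition set. Assuming they also keep the 68k flag semantics (the text above does not state this explicitly), each condition reduces to a small expression over N, Z, V and C, sketched here in Python. The function name `condition_true` is hypothetical, and no particular CC field encoding is implied.

```python
def condition_true(name: str, *, n: int = 0, z: int = 0, v: int = 0, c: int = 0) -> bool:
    """Evaluate a branch condition from the N, Z, V, C flags.

    Assumes 68k-style meanings; 'name' is the mnemonic without the
    leading 'b' (for example 'hi' for bhi, 't' for bra/bt).
    """
    table = {
        "t": True,                  # bra / bt: always
        "f": False,                 # bf: never
        "eq": z,                    # equal
        "ne": not z,                # not equal
        "hi": not c and not z,      # unsigned higher
        "ls": c or z,               # unsigned lower or same
        "cc": not c, "hs": not c,   # carry clear / unsigned higher or same
        "cs": c, "lo": c,           # carry set / unsigned lower
        "pl": not n,                # plus
        "mi": n,                    # minus
        "vc": not v,                # overflow clear
        "vs": v,                    # overflow set
        "ge": n == v,               # signed greater or equal
        "lt": n != v,               # signed less than
        "gt": not z and n == v,     # signed greater than
        "le": z or n != v,          # signed less or equal
    }
    return bool(table[name])


# After a compare that produced N=1, V=0 (signed "less than"):
taken_lt = condition_true("lt", n=1)
taken_ge = condition_true("ge", n=1)
```

Because most base ISA instructions carry a CC field, the same predicate would gate ordinary instructions as well as branches.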
Reserved Vector Table
- Words 0–31: Interrupt vectors. Flags and PC pushed to hardware SP. Word 0 = BREAK, rest are user interrupts. Return with RTI.
- Words 32–63: "LineA" OS functions. 13-bit immediate loaded into D0. Return with RTS.
- Words 64–127: "LineF" user functions. 12-bit immediate loaded into D0. Return with RTS.
64-bit operand pairs use the "next in bank" rule — second reg is next within the 16-reg bank (wraps).
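The "next in bank" rule amounts to a modular increment inside a 16-register bank. A minimal Python sketch, assuming registers are numbered linearly and each run of 16 indices forms one bank (so indices 16–31 on Ant64/C are treated as a second bank); the function name `pair_partner` is hypothetical:

```python
def pair_partner(reg: int, bank_size: int = 16) -> int:
    """Return the second register of a 64-bit pair that starts at reg.

    The partner is the next register in the same bank, wrapping from the
    last register of the bank back to its first.
    """
    bank_base = reg - (reg % bank_size)
    return bank_base + (reg + 1) % bank_size


partners = [pair_partner(r) for r in (0, 14, 15, 20, 31)]
```

Under this reading a .q operation on D15 pairs it with D0 rather than spilling into a different bank.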
Base ISA Instruction Set
Group 0 — System
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 0 | 0 | (Register shuffle 1) | One register | Bits 6–0 = minor #. Bit 7 = RD pair present |
| 0 | Minor 0 | END | [RD] | End task |
| 0 | Minor 1 | YIELD | [RD] | Voluntarily yield. RD pair = MS to yield (positive) or frames to wait (negative) |
| 0 | Minor 2 | PAUSE | RD | Pause job ID. Bit 7 = unpause |
| 0 | Minor 3 | WAIT | [RD] | Wait for remaining copies/tasks. If RD provided, check ID in RD |
| 0 | Minor 4 | STATUS | [RD] | Check job ID in RD, set/clear Z if not complete |
| 0 | Minor 5 | INTERRUPT | [RD], #IMM6 | Trigger interrupt [RD] + #IMM6. INTERRUPT #0 = BREAK |
| 0 | Minor 5 | VBLANK | [RD], #IMM6 | Yield to next ([RD] + #IMM6) vblank |
| 0 | Minor 5 | HBLANK | [RD], #IMM6 | Yield to next ([RD] + #IMM6) hblank |
| 0 | Minor 32 | MOVE | RD, SP, #IMM6 | Move addr RD + signed #IMM6 to SP |
| 0 | Minor 33 | MOVE | SP, RD, #IMM6 | Move SP + signed #IMM6 to addr RD |
| 0 | Minor 34 | MOVE | RD, PC, #IMM6 | Move addr RD + signed #IMM6 to PC |
| 0 | Minor 35 | MOVE | PC, RD, #IMM6 | Move PC + signed #IMM6 to addr RD |
| 0 | Minor 36 | MOVE | RD, F | Move data register RD to Flags |
| 0 | Minor 37 | MOVE | F, RD | Move Flags to data register RD |
| 0 | Minor 38 | PUSH | F, (SP/RD)+ | Push flags to address RD or SP |
| 0 | Minor 39 | POP | -(SP/RD), F | Pop flags from address RD or SP |
| 0 | Minor 63 | BANK | ADDR, DATA | Bit 7 = Address bank Hi, Bit 6 = Data bank Hi |
| 0 | Minor 64 | SWAP | RD | Swap hi/lo parts of register. Bit 7 = A or D |
| 0 | 1 | (Register shuffle 2) | Two registers | Bits 3–0 = minor # |
| 0 | Minor 0 | SETPRI | #imm8 | Set task priority |
| 0 | Minor 1 | REPEAT | RA, #imm4 | Repeat #imm4 instructions RA times. CC to exit early |
| 0 | Minor 8 | XCHG | RA, RD | Exchange registers |
| 0 | Minor 9 | FINDNEXT | RA, RD | Find next |
| 0 | 2 | START | RD, #IMM8 | Start task at code address RC, priority #imm8. Returns job # in RD (0 = fail) |
| 0 | 3 | PREFETCH | #imm18 | Set prefetch forward to #imm18+1 words |
| 0 | 4 | REPEAT | #imm14, #imm4 | Repeat #imm4 instructions #imm14 times. 0 = max |
| 0 | 5 | NEG.sz | [X], EAMOD(RD) | Negate. X = bit 0 |
| 0 | 6 | EXT.sz | EAMOD(RD) | Sign extend |
| 0 | 7 | ZERO | RA, RB, RD | Zero up to 3 registers. SZ bits enable 2nd/3rd operand |
| 0 | 10 | LINEA | #IMM5, #IMM13 | BSR/JSR to 32+#IMM5, D0 = #IMM13 |
| 0 | 11 | ZERO | RA, RB, RD | Zero up to 3 registers (no SZ) |
| 0 | 12 | INC | RA, RB, RD | Increment up to 3 registers |
| 0 | 13 | DEC | RA, RB, RD | Decrement up to 3 registers |
| 0 | 14 | ROTATE | RA, RB, RD | Rotate 3 regs: temp=RD, RD=RB, RB=RA, RA=temp |
| 0 | 15 | LINEF | #IMM6, #IMM12 | BSR/JSR to 64+#IMM6, D0 = #IMM12 |
| 0 | 16 | MIN.sz | RA, RB, RD | RD = min(RA, RB) |
| 0 | 17 | MAX.sz | RA, RB, RD | RD = max(RA, RB) |
| 0 | 18 | CLAMP.sz | RA, RB, RD | Clamp RD: RA = min, RB = max |
| 0 | 19 | SIGN.sz | RA, RB, RD | RD = sign(RA, RB) |
| 0 | 20 | MOVE | #IMM22, SP | Move immediate to stack pointer |
Group 1 — Load / Store / Immediate
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 1 | 0 | LD.sz | EA2b[0] | Address indirect with 12-bit displacement to data |
| 1 | 1 | LD.sz | EA2b[1] | Address indirect with index [,scale] [,postincrement] to data |
| 1 | 2 | LD.sz | EA2c[0] | Address indirect with 8-bit displacement to data |
| 1 | 3 | LD.sz | EA2c[1] | Address indirect with index [,scale] [,postincrement] to data |
| 1 | 4 | LD.sz | EA4[RA], data RB, data RD | Do EA4[RA] to RB then repeat to RD |
| 1 | 5 | LD.sz | (base, #IMM12), data RD | Load from base + #imm12 |
| 1 | 6 | LD | (base, #IMM14), data RD | Load from base + #imm14 |
| 1 | 7 | LD | (base, #IMM18b), data RD | Load from base + #imm18b |
| 1 | 12 | MOVE.sz | #IMM12, data RD | Move signed #IMM12 into RD |
| 1 | 13 | MOVE | #IMM14, data RD | Move signed #IMM14 into RD |
| 1 | 14 | MOVE | #IMM18b, data RD | Move signed #IMM18b into RD |
| 1 | 15 | COPY.sz | EA4(RA), EA4(RD), RC | No CC. Copy from EA4(RA) to EA4(RD), RC times |
| 1 | 16 | ST.sz | EA2b[2] | Data to address indirect with 12-bit displacement |
| 1 | 17 | ST.sz | EA2b[3] | Data to address indirect with index [,scale] [,postincrement] |
| 1 | 18 | ST.sz | EA2c[2] | Data to address indirect with 8-bit displacement |
| 1 | 19 | ST.sz | EA2c[3] | Data to address indirect with index [,scale] [,postincrement] |
| 1 | 20 | ST.sz | data RB, data RD, EA4[RA] | Store RB to EA4[RA] then repeat with RD |
| 1 | 21 | ST.sz | data RD, (base, #IMM12) | Store to base + #imm12 |
| 1 | 22 | ST | data RD, (base, #IMM14) | Store to base + #imm14 |
| 1 | 23 | ST | data RD, (base, #IMM18b) | Store to base + #imm18b |
| 1 | 27 | MOVE.sz | EA2c, EA4[RD] | No CC. EA to EA |
| 1 | 28 | MOVE.sz | EA4[RD], EA2c | No CC. EA to EA |
| 1 | 29 | MOVE.sz | EA4(RA), EA4(RD) | Move EA4(RA) to EA4(RD) |
| 1 | 30 | MOVE.sz | EA2b, EA2b | 2nd EA2b uses RD rather than RA |
| 1 | 31 | COPY.sz | EA2a(RA), EA2a(RD), data RB | Copy RB units from EA2a(RA) to EA2a(RD) |
Any memory read/write that doesn't use a 3rd register AND still has the CC field can set V0 for "strict" mode. Assembler syntax: ! after opcode, e.g. st! D0,(A0). Ignored on Ant64S.
Group 2 — ALU
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 2 | 0 | ADD.sz | EA2a(RA), data RB, EA2a(RD) | Load from EA2a(RA), add RB, result to EA2a(RD) |
| 2 | 1 | ADD.sz | data RA, #imm8, data RD | Load RA, add #imm8, result in RD |
| 2 | 2 | ADD.sz | #imm12, data RD | Add #imm12 to RD |
| 2 | 3 | ADD.sz | #imm16, data RD | No CC. Add #imm16 to RD |
| 2 | 4 | ADD.sz | EA2b | |
| 2 | 8 | ADC.sz | EA2a(RA), data RB, EA2a(RD) | Add with carry |
| 2 | 9 | ADC.sz | data RA, #imm8, data RD | |
| 2 | 10 | ADC.sz | #imm12, data RD | |
| 2 | 11 | ADC.sz | #imm16, data RD | No CC |
| 2 | 12 | ADC.sz | EA2b | |
| 2 | 16 | SUB.sz | EA2a(RA), data RB, EA2a(RD) | Subtract |
| 2 | 17 | SUB.sz | data RA, #imm8, data RD | |
| 2 | 18 | SUB.sz | #imm12, data RD | |
| 2 | 19 | SUB.sz | #imm16, data RD | No CC |
| 2 | 20 | SUB.sz | EA2b | |
| 2 | 24 | SBC.sz | EA2a(RA), data RB, EA2a(RD) | Subtract with carry |
| 2 | 25 | SBC.sz | data RA, #imm8, data RD | |
| 2 | 26 | SBC.sz | #imm12, data RD | |
| 2 | 27 | SBC.sz | #imm16, data RD | No CC |
| 2 | 28 | SBC.sz | EA2b | |
Group 3 — Mul / Div / MAC / Fixed Point
| OPC | SUB | Name | Operands | Notes |
| --- | --- | --- | --- | --- |
| 3 | 0–2 | MULS.sz | EA2a / #imm8 / #imm12 | Signed multiply |
| 3 | 4–6 | MULU.sz | EA2a / #imm8 / #imm12 | Unsigned multiply |
| 3 | 8–10 | DIVS.sz | EA2a / #imm8 / #imm12 | Signed divide |
| 3 | 12–14 | DIVU.sz | EA2a / #imm8 / #imm12 | Unsigned divide |
| 3 | 16–18 | MACS.sz | EA2a / #imm8 / #imm12 | Signed MAC (uses MACMOD) |
| 3 | 20–22 | MACU.sz | EA2a / #imm8 / #imm12 | Unsigned MAC (uses MACMOD) |
| 3 | 24–26 | REMU.sz | EA2a / #imm8 / #imm12 | Unsigned remainder |
| 3 | 28–30 | REMS.sz | EA2a / #imm8 / #imm12 | Signed remainder |
Group 4 — Shifts / Rotates
| OPC | SUB | Name | Operands | Notes |
| --- | --- | --- | --- | --- |
| 4 | 0 | LSL.sz | EA2a(RA), RB count, EA2a(RD) | Also ASL |
| 4 | 1 | LSL.sz | data RA, #imm8, data RD | Also ASL |
| 4 | 4 | ROL.sz | EA2a(RA), RB count, EA2a(RD) | |
| 4 | 5 | ROL.sz | data RA, #imm8, data RD | |
| 4 | 8 | ROXL.sz | EA2a(RA), RB count, EA2a(RD) | Uses X as in/out bit |
| 4 | 9 | ROXL.sz | data RA, #imm8, data RD | Uses X as in/out bit |
| 4 | 12 | LSR.sz | EA2a(RA), RB count, EA2a(RD) | |
| 4 | 13 | LSR.sz | data RA, #imm8, data RD | |
| 4 | 16 | ROR.sz | EA2a(RA), RB count, EA2a(RD) | |
| 4 | 17 | ROR.sz | data RA, #imm8, data RD | |
| 4 | 20 | ROXR.sz | EA2a(RA), RB count, EA2a(RD) | Uses X as in/out bit |
| 4 | 21 | ROXR.sz | data RA, #imm8, data RD | Uses X as in/out bit |
| 4 | 24 | ASR.sz | EA2a(RA), RB count, EA2a(RD) | |
| 4 | 25 | ASR.sz | data RA, #imm8, data RD | |
Group 5 — Logic
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 5 | 0 | AND.sz | EA2a(RA), data RB, EA2a(RD) | |
| 5 | 1 | AND | #imm12, RD | No SZ. V0/V1 extend to #imm14 |
| 5 | 2 | AND | #imm16, RD | No SZ, no CC |
| 5 | 4 | NOT.sz | EA2a(RA), data RB, EA2a(RD) | |
| 5 | 5 | NOT | #imm12, RD | No SZ |
| 5 | 6 | NOT | #imm16, RD | No SZ, no CC |
| 5 | 8 | OR.sz | EA2a(RA), data RB, EA2a(RD) | |
| 5 | 9 | OR | #imm12, RD | No SZ |
| 5 | 10 | OR | #imm16, RD | No SZ, no CC |
| 5 | 12 | EOR.sz | EA2a(RA), data RB, EA2a(RD) | |
| 5 | 13 | EOR | #imm12, RD | No SZ |
| 5 | 14 | EOR | #imm16, RD | No SZ, no CC |
| 5 | 20 | SHLADD | #imm4, data RB, RA, RD | RD = RA + (RB << #imm5). Shift 0 = 32 |
| 5 | 21 | SHLSUB | #imm4, data RB, RA, RD | RD = RA - (RB << #imm5) |
| 5 | 24 | CMP.sz | EA2a(RA), RB, data RD | Load from EA2a(RA), shift by RB, CMP data RD |
| 5 | 25 | CMP.sz | #imm16, RD | No CC |
| 5 | 26 | CMP | #imm17, RD | No SZ, no CC |
Group 6 — Move / Copy
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 6 | 0 | MOVE.sz | (addr RA)+, (addr RD)+, #imm8 | Copy #imm8 units from (RA)+ to (RD)+ |
| 6 | 1 | MOVE.sz | EA2a(RA), RB, EA2a(RD) | Load to EA2a, 4-bit signed shift using RB bits |
| 6 | 2 | MOVE.sz | EA2a(RA), lsl #0-3, EA2a(RD) | 2-bit shift left on data |
| 6 | 3 | MOVE.sz | (addr RA)+, (addr RD)+, #imm12 | No CC. Copy #imm12 units |
| 6 | 4 | MOVE | EA2a(RA), shift, EA2a(RD) | No SZ. 6-bit signed shift using RB+SZ |
| 6 | 16 | LDLO | [Z], #imm16b, RD | No CC. Sign/zero extend to 32. Bit 21 = RD is A or D |
| 6 | 17 | LDHI | [Z], #imm16b, RD | No CC. Write high 16 into Dn |
| 6 | 18 | COPY.sz | (addr RA)[+], (addr RD)[+], #imm, [WAIT] | No CC. DMA copy. Bit 7 = wait, bit 6 = fixed value |
| 6 | 19 | COPY.sz | (addr RA)[+], (addr RD)[+], RB, [WAIT] | DMA copy. RD = job ID (0 = fail), Z set on fail |
Group 7 — Bit Test / Clear
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 7 | 0 | BTST.sz | #IMM8\|data RA, EA4(RD) | Bit test |
| 7 | 8 | BSET.sz | #IMM8\|data RA, EA4(RD) | Bit test and set |
| 7 | 16 | BCLR.sz | #IMM8\|data RA, EA4(RD) | Bit test and clear |
| 7 | 24 | BCHG.sz | #IMM8\|data RA, EA4(RD) | Bit test and change |
| 7 | 28 | TST.sz | EA4(RD) | Test from register |
| 7 | 29 | TSTF | #IMM16 | Flags RDXNZVC bit test |
| 7 | 30 | SETF | #IMM16 | Flags RDXNZVC bit set |
| 7 | 31 | CLRF | #IMM16 | Flags RDXNZVC bit clear |
Group 8 — Language Assist
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 8 | 0 | SCAN | RA, RB, (RD)+, #imm5 | Scan characters. Pattern = 5 bits (alpha, number, whitespace, null, eol…) |
| 8 | 1 | PARSENUM.sz | (RA)+, RB, RD, #imm2 base, #imm2 type | Parse number at (RA). Base = binary/octal/decimal/hex |
Group 9 — Branching / Flow
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 9 | 0 | SYS | #imm14, addr RD | Set PC to #imm14 |
| 9 | 1 | SYS | data RA, addr RD | Set PC to contents of RA |
| 9 | 2 | BRA | rel #IMM18 | Branch relative |
| 9 | 3 | BRA | rel #IMM22 | No CC. Branch relative |
| 9 | 4 | BRA | (addr RD, #IMM14) | Branch to (RD) + signed #IMM14 |
| 9 | 5 | BRA | (addr RD, #IMM18b) | No CC. Branch to contents of return address |
| 9 | 6 | BRA | (rel #IMM14, addr RD) | Branch to #IMM14 + RD |
| 9 | 7 | BRA | (rel #IMM18b, addr RD) | No CC. Branch to #IMM18b + RD |
| 9 | 8 | BSR | rel #IMM14, addr RD | Branch-with-link. PC → RD |
| 9 | 9 | BSR | rel #IMM18b, addr RD | Branch-with-link. PC → RD |
| 9 | 9 | JSR | rel #IMM22 | JSR. PC → -(SP) |
| 9 | 10 | DBRA | data RD, rel #IMM14 | Decrement and branch if Dn < 0 |
| 9 | 11 | DBRAZ | data RD, rel #IMM14 | Decrement and branch if Dn != 0 |
Group 10 — Register Blitter System / Control
| OPC | SUB | Name | Operands | Description |
| --- | --- | --- | --- | --- |
| 10 | 0 | RBENABLE | #IMM8, RD | Bit 0 = store enable, 1 = load enable, 2 = inverse on load |
| 10 | 1 | RBSEL | slot, Dn | Bind Dn to slot |
| 10 | 2 | RBSETILO | #IMM16, slot | No CC. Set slot control low to #IMM16, high bits zeroed |
| 10 | 3 | RBSETIHI | #IMM16, slot | No CC. Set slot control high word, leave low as-is |
| 10 | 4 | RBSETMLO | #IMM16, slot | No CC. Set slot mask low to #IMM16, high bits zeroed |
| 10 | 5 | RBSETMHI | #IMM16, slot | No CC. Set slot mask high word, leave low as-is |
| 10 | 6 | RBSET | RA, RD, slot | Set slot mask to RA, control to RD |
Register Blitter Control Word
| Bits | Field | Values / Description |
| --- | --- | --- |
| 31:28 | SWZ_EXT | 0 = none; 1–15 reserved |
| 27:24 | FORMAT | RGB565, ARGB1555, etc. (metadata) |
| 23:16 | ALPHA_THR | 0–255. Write if A > thr. Default 0 = A==0 transparent |
| 15:12 | RMW_OP | See table below |
| 11 | RMW_EN | 0 = no RMW; 1 = enable RMW read |
| 10 | ALPHA_SRC | 0 = source pixel A byte; 1 = reserved |
| 9:8 | SWZ | 0 = none; 1 = ARGB→BGRA; 2 = BGRA→ARGB; 3 = reserved |
| 7:6 | WIDTH | 0 = 32-bit; 1 = 16-bit; 2 = 8-bit; 3 = reserved |
| 5 | AFFECTS_A | 0 = leave dest alpha alone; 1 = allow alpha update |
| 4:0 | Misc | See below |
RMW_OP Values
| Value | Operation | Description |
| --- | --- | --- |
| 0 | NONE | Plain store (may still swizzle) |
| 1 | MASKMERGE | new = (old & ~MASK) \| (src & MASK) |
| 2 | XOR | new = old ^ src |
| 3 | OR | new = old \| src |
| 4 | ADD | new = old + src (wrap) |
| 5 | SUB | new = old - src (wrap) |
| 6 | ALPHAKEY_MERGE | If A==0 keep old, else masked-merge |
| 7 | ALPHAMASK_MERGE | Use A as per-byte mask level |
| 8–15 | - | Reserved |
Misc Bits 4:0
| Bit | Description |
| --- | --- |
| 0 | Strict alignment required |
| 1 | Forbid MMIO regions |
| 2 | Treat A==0 as transparent |
| 3 | Invert alphakey (A==0 = write, else keep) |
| 4 | Reserved for saturation |
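To show how a control word is assembled from these fields and then split for the RBSETILO / RBSETIHI pair, here is a minimal Python sketch. The bit positions follow the control word table; the helper names are hypothetical, and whether the CFG_SWZ_ARGB_TO_BGRA and CFG_RMW_ALPHAKEY constants used in the earlier listing expand to exactly these values is an assumption.

```python
# Control word fields as (low bit, width), from the control word table.
CONTROL_FIELDS = {
    "swz_ext": (28, 4), "format": (24, 4), "alpha_thr": (16, 8),
    "rmw_op": (12, 4), "rmw_en": (11, 1), "alpha_src": (10, 1),
    "swz": (8, 2), "width": (6, 2), "affects_a": (5, 1), "misc": (0, 5),
}

RMW_ALPHAKEY_MERGE = 6   # RMW_OP value from the RMW_OP table
SWZ_ARGB_TO_BGRA = 1     # SWZ value from the control word table


def pack_control(**values: int) -> int:
    """Pack named field values into a 32-bit control word; unset fields are 0."""
    word = 0
    for name, value in values.items():
        low, width = CONTROL_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def lo16(word: int) -> int:
    return word & 0xFFFF


def hi16(word: int) -> int:
    return (word >> 16) & 0xFFFF


# Swizzle ARGB->BGRA with an alpha-keyed read-modify-write merge.
cfg = pack_control(swz=SWZ_ARGB_TO_BGRA, rmw_op=RMW_ALPHAKEY_MERGE, rmw_en=1)
# Same, but only write pixels whose alpha exceeds 0x80.
cfg_thr = pack_control(swz=SWZ_ARGB_TO_BGRA, rmw_op=RMW_ALPHAKEY_MERGE,
                       rmw_en=1, alpha_thr=0x80)
```

Since RBSETILO zeroes the high half and RBSETIHI leaves the low half alone, loading lo16 first and hi16 second yields the full word.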
Alternate ISA: TriOps
Ant64/Ant64C adds 4 extension bits [35:32]:
| Name | Bits | Meaning |
| --- | --- | --- |
| !FLAGS | [35:35] | Don't set flags for slot 0 (Ant64/C only) |
| EXT-S0 | [34:34] | Extend SRC0 (Ant64/C only) |
| EXT-S1 | [33:33] | Extend SRC1 (Ant64/C only) |
| EXT-S2 | [32:32] | Extend SRC2 (Ant64/C only) |
| 1 | [31:31] | Alternative ISA |
| 1 | [30:30] | TriOps |
| SLOT0 | [29:20] | Slot 0 (sets flags by default) |
| SLOT1 | [19:10] | Slot 1 |
| SLOT2 | [9:0] | Slot 2 |
Slot Decoding
| Name | Bits | Meaning |
| --- | --- | --- |
| OPC | [9:6] | Opcode |
| SRC | [5:4] | Data register 0–3 |
| AD | [3:3] | Address or Data for DST |
| DST | [2:0] | Data/Address register 0–7 |
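The slot layout can be exercised with a small encode/decode pair in Python. This is a sketch, not an assembler: it assumes AD = 0 selects a data register (the polarity is not stated above) and that a META slot carries its meta opcode in the SRC field. All function names are hypothetical.

```python
def encode_slot(opc: int, src: int, dst: int, ad: int = 0) -> int:
    """Pack one 10-bit slot: OPC[9:6], SRC[5:4], AD[3], DST[2:0]."""
    return ((opc & 0xF) << 6) | ((src & 0x3) << 4) | ((ad & 1) << 3) | (dst & 0x7)


def decode_slot(slot: int) -> dict:
    return {"opc": (slot >> 6) & 0xF, "src": (slot >> 4) & 0x3,
            "ad": (slot >> 3) & 1, "dst": slot & 0x7}


def encode_triop(slot0: int, slot1: int, slot2: int) -> int:
    """Build a 32-bit TriOps word: bits 31 and 30 set, then three slots."""
    return (0b11 << 30) | (slot0 << 20) | (slot1 << 10) | slot2


def decode_triop(word: int) -> list:
    if (word >> 30) & 0b11 != 0b11:
        raise ValueError("not a TriOps word")
    return [decode_slot((word >> shift) & 0x3FF) for shift in (20, 10, 0)]


MOV, AND, META, META_DEC = 1, 4, 0, 2

# mov d0, d3 : dec d3 : and d3, d0
word = encode_triop(encode_slot(MOV, 0, 3),
                    encode_slot(META, META_DEC, 3),
                    encode_slot(AND, 3, 0))
slots = decode_triop(word)
```

The 2-bit SRC field is why slot sources are limited to D0–D3 on the 32-bit encoding, with the EXT-S bits widening that range on Ant64/C.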
TriOps Slot Opcodes
| OPC | Name | Operands | Description |
| --- | --- | --- | --- |
| 0 | META | - | See Meta table |
| 1 | MOV | SRC, DST | dst = src |
| 2 | ADD | SRC, DST | dst = dst + src |
| 3 | SUB | SRC, DST | dst = dst - src |
| 4 | AND | SRC, DST | dst = dst & src |
| 5 | OR | SRC, DST | dst = dst \| src |
| 6 | XOR | SRC, DST | dst = dst ^ src |
| 7 | NOT | SRC, DST | dst = ~src |
| 8 | SHIFT | SRC, DST | Logical shift: << positive, >> negative |
| 9 | CTZ | SRC, DST | Count trailing zeros |
| 10 | CLZ | SRC, DST | Count leading zeros |
| 11 | ADC | SRC, DST | dst = dst + src + c |
| 12 | SBC | SRC, DST | dst = dst - src - (1-c) |
| 13 | MIN | SRC, DST | dst = min(src, dst) |
| 14 | MAX | SRC, DST | dst = max(src, dst) |
| 15 | CMP | SRC, DST | flags = dst - src (dst unchanged) |
Meta Opcodes (SRC used = 4/8 opcodes)
| OPC | Name | Operands | Description |
| --- | --- | --- | --- |
| 0 | META 2 | - | See Meta 2 |
| 1 | INC | DST | dst += 1 |
| 2 | DEC | DST | dst -= 1 |
| 3 | ADD4 | DST | dst += 4 |
| 4 | SUB4 | DST | dst -= 4 (36-bit only) |
| 5 | ADD8 | DST | dst += 8 (36-bit only) |
| 6 | SUB8 | DST | dst -= 8 (36-bit only) |
| 7 | CLR | DST | dst = 0 (36-bit only) |
Meta 2 Opcodes (A/D + DST used = 16 opcodes)
| OPC | Name | Description |
| --- | --- | --- |
| 0 | NOP | No operation |
| 1 | CLRF | Clear flags |
| 2 | SETC | Set carry |
| 3 | CLRC | Clear carry |
| 4 | CLV | Clear overflow |
| 5 | CLRNZ | Clear N and Z |
| 6 | EI | Enable interrupts |
| 7 | DI | Disable interrupts |
| 8 | RTS | (SP)+ copies to PC |
| 9 | RTI | Restore flags and PC from (SP)+ |
| 10–15 | - | Reserved |
Alternate ISA: Broadcast Ops
Ant64/Ant64C adds 4 extension bits [35:32]:
| Name | Bits | Meaning |
| --- | --- | --- |
| SRC-AD | [35:35] | Src is A/D (Ant64/C only) |
| SRC-HI | [34:34] | Extend source register (Ant64/C only) |
| BANK | [33:33] | Use alt register banks (Ant64/C only) |
| HI | [32:32] | Dest is registers 16–31 (Ant64/C only) |
| 1 | [31:31] | Alternative ISA |
| 0 | [30:30] | Not TriOps |
| MODE | [29:28] | Mode 0–3. Broadcast Ops = 0 |
| DST-AD | [27:27] | Dest is A/D |
| OPC | [26:20] | Opcode (0–127) |
| SRC | [19:16] | Source register |
| MASK | [15:0] | Dest register mask |
Broadcast Opcodes
| OPC | Name | Operands | Description |
| --- | --- | --- | --- |
| 0 | RESERVED | SRC, DST-LIST | Reserved |
| 1 | MOV | SRC, DST-LIST | dst = src |
| 2 | ADD | SRC, DST-LIST | dst = dst + src |
| 3 | SUB | SRC, DST-LIST | dst = dst - src |
| 4 | AND | SRC, DST-LIST | dst = dst & src |
| 5 | OR | SRC, DST-LIST | dst = dst \| src |
| 6 | XOR | SRC, DST-LIST | dst = dst ^ src |
| 7 | NOT | DST-LIST | dst = ~dst |
| 8 | CLR | SRC, DST-LIST | |
| 9 | SET | SRC, DST-LIST | |
| 10 | NEG | SRC, DST-LIST | |
| 11 | PUSH | DST-LIST, -(SRC) | (inverts SRC-AD flag) |
| 12 | POP | (SRC)+, DST-LIST | (inverts SRC-AD flag) |
| 13 | MIN | SRC, DST-LIST | dst = min(src, dst) |
| 14 | MAX | SRC, DST-LIST | dst = max(src, dst) |
| 15 | PUSH | DST-LIST, -(SRC) | BANK flag = bit 16, both A and D = bit 17 |
| 16 | POP | (SP)+, DST-LIST | BANK flag = bit 16, both A and D = bit 17 |
| 17 | MUL | SRC, DST-LIST | dst = dst * src |
| 18 | MULHIU | SRC, DST-LIST | dst = high(dst * src) unsigned |
| 19 | MULHIS | SRC, DST-LIST | dst = high(dst * src) signed |
| 20 | ABS | DST-LIST | dst = abs(dst) |
| 21 | SHIFT | SRC, DST-LIST | Shift dst. Signed src: negative = right |
| 22 | ROTATE | SRC, DST-LIST | Rotate dst. Signed src: negative = right |
| 23 | CMPEQ | SRC, DST-LIST | dst = (dst == src) ? -1 : 0 |
| 24 | CMPLT | SRC, DST-LIST | dst = (dst < src) ? -1 : 0 |
| 25 | CMPGT | SRC, DST-LIST | dst = (dst > src) ? -1 : 0 |
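A broadcast op applies one source value to every register selected by the 16-bit mask. The Python sketch below models a few of the opcodes on a list standing in for the register file. It assumes mask bit i selects register i (the bit order is not stated above) and treats values as unsigned 32-bit; the names `BROADCAST_OPS` and `broadcast` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# A few broadcast opcodes as (dst, src) -> new dst.
BROADCAST_OPS = {
    "mov": lambda dst, src: src,
    "add": lambda dst, src: dst + src,
    "sub": lambda dst, src: dst - src,
    "and": lambda dst, src: dst & src,
    "or": lambda dst, src: dst | src,
    "xor": lambda dst, src: dst ^ src,
    "cmpeq": lambda dst, src: -1 if dst == src else 0,
}


def broadcast(regs, op, src, mask):
    """Return a new register list with op applied wherever the mask bit is set."""
    fn = BROADCAST_OPS[op]
    return [fn(value, src) & MASK32 if (mask >> i) & 1 else value
            for i, value in enumerate(regs)]


# Add 10 to registers 0 and 2 only.
added = broadcast([1, 2, 3, 4], "add", 10, 0b0101)
# Compare all four registers against 5; matches become all-ones.
flags = broadcast([5, 5, 7, 0], "cmpeq", 5, 0b1111)
```

The all-ones results from CMPEQ can be fed straight into a broadcast AND to zero the non-matching registers, which is the usual reason for a -1 / 0 result convention.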