Note / 注記: This presentation organizes observations from publicly available sources and local repository clones only. It does not assert the contents of private issues or internal decision-making processes. / 本資料は、公開一次資料およびローカル clone から観測可能な範囲を整理したものであり、非公開 issue や社内意思決定の内容を断定するものではない。

Backend-Dependent Behavior on Legacy HBM GPUs

Advanced Evidence Edition: ROCm/HIP vs Vulkan on Vega/gfx900 — Within a Single Tested Setup

Akira Ito | AETS (Akatsuki Enterprise Technology Solutions) | aets-giken@hiroshima-aktk.com

IEICE GNW-68 | Kyushu Sangyo University | March 9, 2026

1

Background & Motivation

Scope is explicitly this environment + these runs. No general claim about Vega across all setups is made.
2

Problem Statement

Backend path isolation — not hardware condemnation — is the diagnostic question.
3

Evidence from Investigation: gfx900 Gate Matrix

Layer | Current handling of gfx900 | Blocks / Allows
Official matrix (ROCm 7.2) | gfx900 not listed in GPU target list | Blocks (official scope)
CMake filter (ollama) | CMakeLists.txt:127 default regex ^gfx(94[012]|101[02]|1030|110[012]|120[01])$ — gfx900 excluded; manual AMDGPU_TARGETS=gfx900 override possible | Blocks default / Allows manual
rocBLAS artifacts | Kernels.so-000-gfx900.hsaco found in /usr/lib/ollama/rocblas/library/ | Allows (this env)
ROCm 7.2 package | MIOpen Perf DB: gfx900 = 169 K lines (gfx1100: none); rocBLAS: 128 files (gfx1100: 96); firmware: 16× vega10 blobs | Shipped (exceeds RDNA 3)
Runner init validation | Journal: "Device 0: AMD Radeon RX Vega, gfx900:xnack- (0x900), VMM: no, Wave Size: 64" — recognized as valid agent | Allows (this env)
Source macros | common.cuh:65 GGML_CUDA_CC_VEGA=0x1000900; __gfx900__ dp4a path retained (software emulation via inline asm) | Allows
Runtime execution | In the matched runs, ROCm/HIP (override-assisted): all tested num_gpu values OK; Vulkan: SIGSEGV for num_gpu >= 1 | Backend-dependent (tested scope)

Source: work_log/investigations/2026-03-07_gfx900_gate_matrix.md | ollama/CMakeLists.txt:127 | ggml-cuda/common.cuh:65

Six of seven examined layers retained paths allowing execution in this environment. Official non-listing ≠ immediate impossibility — but scope is restricted to this setup.
4

CMake Default Filter: Exact Source Evidence

Source: ollama/CMakeLists.txt:121-128

check_language(HIP)
if(CMAKE_HIP_COMPILER)
    if(NOT AMDGPU_TARGETS)
        find_package(hip REQUIRED)
        list(FILTER AMDGPU_TARGETS INCLUDE REGEX
            "^gfx(94[012]|101[02]|1030|110[012]|120[01])$")
    endif()
    if(AMDGPU_TARGETS)
        add_subdirectory(.../ggml-hip)

Regex group | Targets included | Architecture
94[012] | gfx940, gfx941, gfx942 | CDNA3 (MI300)
101[02] | gfx1010, gfx1012 | RDNA1 (RX 5000)
1030 | gfx1030 | RDNA2 (RX 6000)
110[012] | gfx1100, gfx1101, gfx1102 | RDNA3 (RX 7000)
120[01] | gfx1200, gfx1201 | RDNA4 (RX 9000)
(absent) | gfx900 — NOT included | GCN5 (Vega 56/64)
gfx900 is excluded from ollama's default HIP build. The installed libggml-hip.so required an explicit AMDGPU_TARGETS override or pre-compiled artifact.
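The filter's effect can be reproduced outside CMake. A minimal Python sketch using the regex quoted above (CMake's `list(FILTER ... INCLUDE REGEX ...)` keeps only matching entries; for this anchored pattern, Python's `re.fullmatch` behaves the same way):

```python
import re

# Default HIP target filter quoted from ollama/CMakeLists.txt:127.
TARGET_FILTER = re.compile(r"^gfx(94[012]|101[02]|1030|110[012]|120[01])$")

def passes_default_filter(target: str) -> bool:
    """True if a gfx target survives ollama's default HIP build filter."""
    return TARGET_FILTER.fullmatch(target) is not None

print(passes_default_filter("gfx1100"))  # True  (RDNA3)
print(passes_default_filter("gfx900"))   # False (Vega/GCN5 is filtered out)
```

Running this over any candidate target list shows at a glance which architectures the default build retains.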
5

Code Trace: num_gpu Semantics (Exact Source Lines)

Stage | Code location | Exact content / interpretation
Python client | ollama/_types.py:104-110 | class Options: num_gpu: Optional[int] = None — raw int, no semantic annotation
API call | ollama/_client.py:281-305 | Passed as options=options in the /api/generate POST body
Go type | api/types.go:604 | NumGPU int `json:"num_gpu,omitempty"` in the Runner struct
Default value | api/types.go:1075 | NumGPU: -1, // -1 here indicates that NumGPU should be set dynamically
CLI help text | cmd/interactive.go:112 | "/set parameter num_gpu <int> The number of layers to send to the GPU"
Server | llm/server.go:992, 1063-1076 | assignLayers(..., s.options.NumGPU, ...); requestedLayers upper-bounded by len(layers)
Runner | runner/llamarunner/runner.go:906-924 | numGPU += len(layers.Layers); NumGpuLayers: numGPU
Bridge | llama/llama.go:264-267 | cparams.n_gpu_layers = C.int(params.NumGpuLayers)
llama.cpp API | include/llama.h:289 | int32_t n_gpu_layers; // number of layers to store in VRAM, a negative value means all layers
num_gpu = offloaded layer count, not GPU device count. Journal confirms: "offloaded 2/25 layers to GPU" when num_gpu=2. Do not treat num_gpu>=2 as multi-GPU evidence.
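The traced semantics condense into a small model. This is a deliberate simplification of the server.go / llama.h behavior above, not the actual implementation:

```python
def resolve_offload_layers(num_gpu: int, total_layers: int) -> int:
    """Simplified model of num_gpu resolution along the traced path:
    a negative value offloads all layers (llama.h: 'a negative value
    means all layers'); otherwise the request is capped at the model's
    layer count (server.go upper-bounds requestedLayers by len(layers))."""
    if num_gpu < 0:
        return total_layers
    return min(num_gpu, total_layers)

# The journal reports 25 layers for qwen3.5:2b ("offloaded 2/25 layers").
for requested in (0, 1, 2, -1, 100):
    print(requested, "->", resolve_offload_layers(requested, 25))
```

Note that `resolve_offload_layers(2, 25)` yields 2 offloaded layers on one GPU, matching the journal line and ruling out any multi-GPU reading of num_gpu=2.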
6

HIP Backend: gfx900 Architecture Constants & dp4a Emulation

Constant / Macro | Value | Comment (ggml-cuda/common.cuh)
GGML_CUDA_CC_GCN4 | 0x1000803 | Tonga/Fiji/Polaris — minimum for fast fp16 (line 64)
GGML_CUDA_CC_VEGA | 0x1000900 | Vega 56/64 — minimum for fp16 dual issue (line 65)
GGML_CUDA_CC_VEGA20 | 0x1000906 | MI50/Radeon VII — minimum for dp4a hardware (line 66)
GGML_CUDA_CC_IS_GCN(cc) | OFFSET_AMD < cc < CDNA1 | gfx900 falls in the GCN range, NOT CDNA (line 86)

dp4a dispatch in ggml_cuda_dp4a() (common.cuh:666–704):

Condition | Implementation | Cost
CDNA / RDNA2 / __gfx906__ | __builtin_amdgcn_sdot4(a, b, c, false) | 1 instruction (hardware)
RDNA3 / RDNA4 | __builtin_amdgcn_sudot4(...) | 1 instruction (hardware)
RDNA1 or __gfx900__ | 6-insn inline asm: v_mul_i32_i24 ×4 + v_add3_u32 ×2 | Software emulation (lines 672–685)

V_DOT2_F32_F16_AVAILABLE (common.cuh:715) also excludes gfx900 — no hardware fp16 dot product instruction.

gfx900 executes in the HIP backend but uses software dp4a emulation — a documented architectural limitation, not a bug.
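The emulated operation itself is simple to state: a dot product over four packed signed 8-bit lanes, accumulated into an int32. A reference sketch of the semantics (not the inline-asm implementation in common.cuh):

```python
import struct

def dp4a_ref(a: int, b: int, c: int) -> int:
    """Reference semantics of signed dp4a: unpack each 32-bit word into
    four signed 8-bit lanes, multiply pairwise, and accumulate into c.
    On gfx906+ this is one hardware instruction; on gfx900 ggml emulates
    it in software (v_mul_i32_i24 x4 + v_add3_u32 x2)."""
    a_lanes = struct.unpack("4b", struct.pack("<i", a))
    b_lanes = struct.unpack("4b", struct.pack("<i", b))
    return c + sum(x * y for x, y in zip(a_lanes, b_lanes))

# Example: lanes (1,2,3,4) . (4,3,2,1) + 10 = 4 + 6 + 6 + 4 + 10 = 30
a = struct.unpack("<i", struct.pack("4b", 1, 2, 3, 4))[0]
b = struct.unpack("<i", struct.pack("4b", 4, 3, 2, 1))[0]
print(dp4a_ref(a, b, 10))  # 30
```

The six-instruction emulation computes exactly this result; the cost difference is throughput, not correctness, which is why the slide calls it a documented architectural limitation rather than a bug.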
7

Engineering Challenges

The three independent variables — support policy, backend choice, and override configuration — must be held separate in any causal claim.
8

Evidence-First Investigation Strategy

Layer | Intervention | Effect
L1: Distribution / build | Inspect support matrix, presets, target filters, installed artifacts | Locate where gfx900 is blocked/allowed at the build layer
L2: API semantics | Trace num_gpu from client → server → runner → llama.cpp | Establish the correct meaning: offloaded layers, not device count
L3: Runtime comparison | Matched ROCm (:11435) vs Vulkan (:11434) — same hardware, same model, same workload | Isolate the backend-specific failure in these runs
L4: Evidence capture | result.json, journal, backend_probe, rocm-smi, ollama ps | Reproducible claims with run_id and file-path linkage

Principle: falsifiable diagnosis over optimistic one-off success.

Each layer answers a distinct question. Collapsing them leads to misattribution.
9

Experimental Setup

Component | Specification
GPU | AMD Radeon RX Vega 56 (gfx900), 8 GB HBM2
OS / kernel | EndeavourOS, kernel 6.12.74-1-lts
ROCm endpoint | Ollama 0.17.5 (03-07) / 0.17.6 (03-08) via :11435, libggml-hip.so
Vulkan endpoint | Ollama 0.17.4 via :11434, libggml-vulkan.so
Model / workload | qwen3.5:2b, NUM_PREDICT=512, num_gpu = 0, 1, 2, -1
ROCm override | HSA_OVERRIDE_GFX_VERSION=9.0.0 (ROCm service only; no override applied to the Vulkan service)
Override note: The ROCm result was obtained under an override-assisted service configuration (HSA_OVERRIDE_GFX_VERSION=9.0.0). This supports conditional executability in this setup — not official support restoration. The Vulkan service ran without any override.
Two services, same hardware. Matched one-epoch runs were repeated across two days; ROCm result is override-dependent, Vulkan result is not.
10

Matched Results — Reproduced Across Two Days

num_gpu | ROCm (:11435) run_20260307_012643 [override-assisted] | Vulkan (:11434) run_20260307_013050 [no override]
0 | OK (46.689s, eval_count=512) | OK (45.002s, eval_count=512)
1 | OK (48.742s) | HTTP 500 + SIGSEGV (2.672s)
2 | OK (47.748s) | HTTP 500 + SIGSEGV (7.640s)
-1 | OK (44.265s) | HTTP 500 + SIGSEGV (9.816s)
Easy reading guide (non-specialist)
  • num_gpu means how many "thinking blocks" are sent to GPU, not how many GPU cards exist.
  • OK means the answer finished normally. HTTP 500 + SIGSEGV means the program crashed during calculation.
  • When num_gpu=0, both backends work. When num_gpu is 1 or more, only Vulkan crashes in this qwen test.
  • Seconds (s) are total time; in error rows this is mostly "time until crash".

Matched rerun on 2026-03-08: ROCm run_20260308_201305 (0.17.6, override-assisted) again passed all tested values; Vulkan run_20260308_201641 again failed for num_gpu=1/2/-1 with nearly identical crash times (2.70 / 7.64 / 9.81s).

Observation (in these matched qwen3.5:2b runs): Vulkan failure is conditional on num_gpu >= 1, not universal. ROCm/HIP passed all tested values under override-assisted configuration on both days.

The ROCm/HIP vs Vulkan difference for qwen3.5:2b is no longer a one-off matched run; the same pattern was reproduced on 2026-03-07 and 2026-03-08.
11

Run-Level Data Snapshot

Run | num_gpu | status | elapsed_sec | eval_duration (ns) | response_chars
run_20260307_012643 (ROCm) | 0 | ok | 46.689 | 42016540155 | 0
run_20260307_012643 (ROCm) | 1 | ok | 48.742 | 43853670072 | 1809
run_20260307_012643 (ROCm) | 2 | ok | 47.748 | 42307474936 | 1951
run_20260307_012643 (ROCm) | -1 | ok | 44.265 | 42084598410 | 1849
run_20260307_013050 (Vulkan) | 0 | ok | 45.002 | 41778665097 | 0
run_20260307_013050 (Vulkan) | 1/2/-1 | error | 2.672 / 7.640 / 9.816
run_20260308_201305 (ROCm) | 0/1/2/-1 | all ok | 47.98 / 49.18 / 48.30 / 45.17 | 512 each | normal completion
run_20260308_201641 (Vulkan) | 0 | ok | 45.53 | 512 | normal completion
run_20260308_201641 (Vulkan) | 1/2/-1 | error | 2.70 / 7.64 / 9.81
run_20260308_201924 (Vulkan) | 0 × 5 epochs | 5/5 ok | 50.03 / 42.46 / 43.18 / 42.67 / 43.07 | 512 each | no crash
What each column tells us (simple words)
  • Run: one experiment batch ID (like a notebook page number).
  • num_gpu: how many model layers are moved to GPU.
  • status: success or failure.
  • elapsed_sec: wall-clock seconds until finish (or until crash).
  • eval_duration(ns): model calculation time only; bigger number means slower compute.
  • response_chars: output text length. Zero does not always mean crash.

Data source: vega_work_log/*/result.json | 16 matched-record rows extracted from the full log set.

Raw numbers now include one matched rerun and a 5-epoch Vulkan num_gpu=0 stability check. Error elapsed time reflects time-to-crash, not inference time.
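The per-num_gpu status pivot used throughout this deck can be built with a few lines. The record shape below is assumed from the snapshot columns (the actual result.json schema in vega_work_log/ may differ); the sample values are the observed Vulkan run:

```python
import json

# Hypothetical record shape, populated with observed values from
# run_20260307_013050 (Vulkan).
records = [
    {"run": "run_20260307_013050", "num_gpu": 0, "status": "ok", "elapsed_sec": 45.002},
    {"run": "run_20260307_013050", "num_gpu": 1, "status": "error", "elapsed_sec": 2.672},
    {"run": "run_20260307_013050", "num_gpu": 2, "status": "error", "elapsed_sec": 7.640},
    {"run": "run_20260307_013050", "num_gpu": -1, "status": "error", "elapsed_sec": 9.816},
]

def status_by_num_gpu(rows):
    """Pivot num_gpu -> list of statuses: the shape that exposes the
    'fails only when offloading' pattern."""
    pivot = {}
    for r in rows:
        pivot.setdefault(r["num_gpu"], []).append(r["status"])
    return pivot

print(json.dumps(status_by_num_gpu(records), indent=2))
```

The same pivot, run over all 119 records, is what analyze_runs.py aggregates into analysis_summary.md.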
12

Vulkan Failure Timeline (Journal Evidence)

Timestamp (JST) | Event | Meaning
01:31:37 | load_backend: libggml-vulkan.so loaded; "int dot: 0 | matrix cores: none" | Backend init completed; hardware feature caps confirmed
01:31:38 | offloaded 1/25 layers to GPU (Vulkan0); compute graph Vulkan0: 257.1 MiB | Model load and GPU allocation completed normally
01:31:39 | llama runner started in 1.72 seconds | Runner startup succeeded
01:31:39 | SIGSEGV: segmentation violation (PC=0x7efbc5b801b5, addr=0x3a8000) | Runner process crash — first inference batch, not during load
01:31:39 | ggml_backend_sched_graph_compute_async → computeBatch (runner.go:716) | Crash in the compute-graph scheduling path, via cgo
01:31:39 | server.go: post predict … EOF | API side sees runner termination as EOF → HTTP 500
01:31:47 / 01:31:57 | Same sequence repeats for num_gpu=2 and -1 | Consistent pattern within these runs for all GPU-offload cases
What each column tells us (simple words)
  • Timestamp: when each step happened.
  • Event: what the system said or did at that moment.
  • Meaning: why that step matters for our diagnosis.
  • The key story is: load finished first, then the crash happened when real calculation started.

Source: vega_work_log/run_20260307_013050/ollama_journal_since_start.txt

Crash occurs at first compute, not at load — consistent with a compute-path issue, not an allocation or init issue.
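The load-then-crash ordering can be checked mechanically rather than by eye. A minimal sketch over journal lines (the marker strings are taken from the excerpts above; the helper name is hypothetical):

```python
def crash_after_load(journal_lines):
    """Return True when the first SIGSEGV appears only after the model
    reported a successful layer offload -- the pattern observed above,
    which points at the compute path rather than load/allocation."""
    load_seen = False
    for line in journal_lines:
        if "offloaded" in line and "layers to GPU" in line:
            load_seen = True
        if "SIGSEGV" in line:
            return load_seen
    return False

sample = [
    "load_backend: libggml-vulkan.so loaded",
    "offloaded 1/25 layers to GPU (Vulkan0)",
    "llama runner started in 1.72 seconds",
    "SIGSEGV: segmentation violation",
]
print(crash_after_load(sample))  # True
```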
13

Vulkan Crash: Full Stack Trace (Journal, 01:31:39 JST)

From: vega_work_log/run_20260307_013050/ollama_journal_since_start.txt — immediately after runner reported ready

SIGSEGV: segmentation violation
PC=0x7efbc5b801b5 m=41 sigcode=1 addr=0x3a8000
signal arrived during cgo execution

goroutine 853 [syscall]:
runtime.cgocall(0x55e54a0e2710, 0xc0000bdaa0)
    runtime/cgocall.go:167
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(...)
    _cgo_gotypes.go:979
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify(...)
    ml/backend/ggml/ggml.go:825 +0x1b2
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(...)
    runner/ollamarunner/runner.go:716 +0x862
github.com/ollama/ollama/runner/ollamarunner.(*Server).run.gowrap1()
    runner/ollamarunner/runner.go:459
How to read this block (simple words)
  • SIGSEGV means the program touched memory in a bad way and crashed.
  • cgo execution means the Go program was inside C/C++ backend code at the moment of failure.
  • computeBatch means this happened during real model calculation, not while just loading files.
  • This block does not tell us the exact Vulkan API call yet; it tells us which part of the software stack was active when it crashed.

Crash point: C function ggml_backend_sched_graph_compute_async called via cgo. Go runtime receives SIGSEGV and terminates the runner. The exact Vulkan API call within that C function is not resolved from this trace alone.

Stack trace is factual. Root cause within ggml_backend_sched_graph_compute_async (C/Vulkan side) remains unresolved — this is a stated limitation.
14

Targeted Follow-Up Tests: Model Dependence & Crash Timing

Test | Run / condition | Observed result | Interpretation
Model swap | run_20260308_212345, Vulkan, MODEL=phi4-mini:latest, NUM_GPU=0,1 | num_gpu=0: ok (18.82s, 318 tok); num_gpu=1: ok (31.55s, 512 tok) | The offload crash is not universal on gfx900 Vulkan; it is model/workload-dependent
Token-count control | run_20260308_212254, Vulkan, qwen3.5:2b, NUM_PREDICT=128, NUM_GPU=1 | error at 2.858s (vs mean 2.653s at NUM_PREDICT=512) | Crash timing is largely independent of requested output length
Timing statistics | 3 qwen3.5:2b Vulkan runs, num_gpu >= 1 | num_gpu=1: 2.653s ±0.059s; 2: 7.630s ±0.014s; -1: 9.819s ±0.016s | Failure timing is highly deterministic, not obviously a random race
What each column tells us (simple words)
  • Test: what extra question we asked.
  • Run / Condition: which model and settings we used.
  • Observed result: what actually happened.
  • Interpretation: the simplest lesson we can take from that result.
  • The big lesson here is: the crash is not universal, and it is not mainly caused by asking for more output tokens.

Source: run_20260308_212345, run_20260308_212254, and analysis_summary.md derived from 15 runs / 119 records in vega_work_log/.

Refined claim: qwen3.5:2b Q8_0 reproducibly crashes on Vulkan offload in this setup, but phi4-mini does not, and the crash timing is largely token-count independent. The failure is backend-sensitive and model-dependent.
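The ± figures above are plain mean and sample standard deviation over per-run crash times. A sketch of that computation — the three sample values here are hypothetical, since only the aggregates (e.g. 2.653s ±0.059s over 3 runs) are reported:

```python
from statistics import mean, stdev

def timing_stats(samples):
    """Mean and sample standard deviation, the form behind the +/- figures."""
    return mean(samples), stdev(samples)

# Hypothetical per-run crash times for num_gpu=1; real raw values live in
# the vega_work_log result.json files.
m, s = timing_stats([2.60, 2.65, 2.71])
print(f"{m:.3f}s +/- {s:.3f}s")
```

A sub-0.1s spread across runs is what justifies calling the failure timing deterministic rather than a race.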
15

Vulkan: gfx900 AMD_GCN Classification & Feature Impact

Code location | Content | Impact on gfx900
ggml-vulkan.cpp:250-252 | enum vk_device_architecture { OTHER, AMD_GCN, AMD_RDNA1, AMD_RDNA2, AMD_RDNA3 } | gfx900 → AMD_GCN branch
ggml-vulkan.cpp:296-297 | if (maxSubgroupSize==64 && minSubgroupSize==64) return AMD_GCN; | GCN is wave64-only → classified AMD_GCN
ggml-vulkan.cpp:3964 | use_subgroups = subgroup_arithmetic && arch != AMD_GCN | Subgroup arithmetic disabled for gfx900
ggml-vulkan.cpp:3241-3243 | AMD_GCN + open-source driver → altered warptile: {256,64,64,32,16,16,2,2,2,1,16} | Matrix-multiply tile parameters differ from RDNA
ggml-vulkan.cpp:3953-3957 | AMD_GCN: rm_stdq=2, rm_kq=4, rm_stdq_int=4 | Different row-reduce parameters

Journal evidence (01:31:37):

ggml_vulkan: 0 = AMD Radeon RX Vega (RADV VEGA10) (radv) |
  uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 |
  shared memory: 65536 | int dot: 0 | matrix cores: none

int dot: 0 = VK_KHR_shader_integer_dot_product not hardware-accelerated. matrix cores: none = no cooperative matrix. Both are consistent with source-level AMD_GCN code paths.

Vulkan correctly identifies gfx900 as AMD_GCN and applies GCN-specific code paths. Whether those paths are crash-safe for this GPU is what these runs expose.
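The classification rule at ggml-vulkan.cpp:296-297 reduces to a single wave-size check. A condensed sketch (the real function inspects further device properties; the second return label is a placeholder, not a ggml identifier):

```python
def classify_amd_arch(min_subgroup: int, max_subgroup: int) -> str:
    """Condensed sketch of the rule quoted above: a device that can only
    report subgroup size 64 (GCN wave64) is classified AMD_GCN. RDNA parts
    expose wave32/wave64 and so fall through."""
    if min_subgroup == 64 and max_subgroup == 64:
        return "AMD_GCN"
    return "OTHER_OR_RDNA"

print(classify_amd_arch(64, 64))  # gfx900: wave64-only -> AMD_GCN
print(classify_amd_arch(32, 64))  # RDNA-style wave32/64 -> not GCN
```

The journal's "warp size: 64" line is exactly the property this rule keys on, which is why classification itself is not in doubt — only the safety of the AMD_GCN code paths is.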
16

Deep Code Tracking Summary (11-Point Trace)

Point | File / line | What it establishes
1–2 | ollama/_types.py:104-110, _client.py:281-305 | num_gpu is a load-time option forwarded as-is to /api/generate; no GPU-count semantics in the client
3–4 | ollama/api/types.go:600-608, 1071-1076 | NumGPU belongs to model-load options; the default -1 means a dynamic policy, not "use all GPUs"
5 | ollama/cmd/interactive.go:112 | The CLI defines num_gpu explicitly as "the number of layers to send to the GPU"
6–7 | ollama/llm/server.go:992, 1063-1076 | NumGPU is passed as requestedLayers and capped at len(layers) — confirms layer semantics
8 | runner/llamarunner/runner.go:906-924 | NumGpuLayers is accumulated from per-device layer counts — not a device count
9–11 | llama/llama.go:264-267, include/llama.h:289 | n_gpu_layers: "number of layers to store in VRAM; a negative value means all layers"
Eleven trace points across nine distinct source locations converge on the same conclusion: num_gpu controls layer offload count, not GPU device selection.
17

Failure Diagnosis

Hypothesis | Expected evidence | Observed (in these runs)
Vega is generally unusable | Both backends fail on the same GPU | Contradicted: ROCm/HIP succeeded in these runs
num_gpu means GPU count | Failure implies multi-GPU entry | Contradicted: the code trace establishes layer-count semantics
Model-/workload-dependent Vulkan compute-path issue on AMD_GCN | qwen3.5:2b fails at the first compute batch via cgo, while phi4-mini completes at num_gpu=1 | Consistent with the observed evidence

Four-Tier Verdict (Based on Collected Evidence)

Tier | Statement
1. Official support | gfx900 is not officially supported in ROCm 7.2 or ollama's default build targets.
2. Conditional executability | In this environment, with HSA_OVERRIDE_GFX_VERSION=9.0.0 and pre-existing artifacts, ROCm/HIP ran successfully across all tested num_gpu values on two matched-run days.
3. Shipped artifacts | The ROCm 7.2 package ships gfx900 Perf DB (169 K lines), 128 rocBLAS pre-compiled kernels, and 16 firmware blobs — exceeding gfx1100 (RDNA 3) and gfx1030 (RDNA 2) on multiple metrics. This implies active build-pipeline inclusion.
4. Backend-dependent stability | Under matched qwen3.5:2b conditions across two days, Vulkan SIGSEGV'd for num_gpu >= 1 while ROCm/HIP did not; Vulkan remained stable at num_gpu=0, including a 5-epoch run. A phi4-mini follow-up completed at num_gpu=1, indicating model dependence.

rocBLAS Pre-compiled Files

gfx942 242
gfx906 156
gfx900 128
gfx1100 96
gfx1030 88

MIOpen Perf DB (top variant, line counts)

gfx942 470 K
gfx906 235 K
gfx900 169 K
gfx1030 111 K
gfx1100 0
The four tiers are independent claims. Tier 2 does not contradict Tier 1; Tier 3 shows that vendor packaging goes beyond code residue; Tier 4 is supported by two matched qwen runs, a 5-epoch Vulkan num_gpu=0 control, a phi4-mini counterexample, and deterministic crash-timing statistics.
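The per-architecture artifact counts above can be regenerated from an installed tree. A minimal sketch (helper name hypothetical; the directory path is the one observed on slide 4):

```python
from pathlib import Path

def count_arch_files(library_dir: str, arch: str) -> int:
    """Count pre-compiled rocBLAS artifacts that name a given gfx arch,
    e.g. Kernels.so-000-gfx900.hsaco under the rocblas library directory."""
    return sum(1 for p in Path(library_dir).iterdir() if arch in p.name)

# Example against the path observed in this environment (adjust as needed):
# print(count_arch_files("/usr/lib/ollama/rocblas/library", "gfx900"))  # 128 here
```

Regenerating the counts locally keeps the Tier 3 claim auditable rather than taken on faith.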
18

Reproducibility Package

Artifact | Path | Role
Primary metrics | vega_work_log/run_*/result.json | Status, latency, error_type per num_gpu case
Crash-phase logs | vega_work_log/run_*/ollama_journal_since_start.txt | Full stack trace and EOF timing
Backend identity | vega_work_log/run_*/backend_probe.txt | Confirms ROCm vs Vulkan service path and device recognition
Aggregate analysis | analyze_runs.py + analysis_summary.md | Full-log aggregation over 15 runs / 119 records; crash-timing statistics and pivots
Code semantics trace | work_log/investigations/2026-03-07_numgpu_semantics_trace.md | Line-level evidence for num_gpu semantics
Gate analysis | work_log/investigations/2026-03-07_gfx900_gate_matrix.md | Layered block/allow model for gfx900

Replay command: OLLAMA_HOST=http://127.0.0.1:11435|11434 EPOCHS=1 NUM_PREDICT=512 NUM_GPU=0,1,2,-1 python vega-loop_qwen_rocm.py

Stability check: OLLAMA_HOST=http://127.0.0.1:11434 EPOCHS=5 NUM_GPU=0 NUM_PREDICT=512 python vega-loop_qwen.py

Every claim in this deck is linked to a run_id + file path. The matched rerun, the Vulkan num_gpu=0 control, the targeted phi4-mini / NUM_PREDICT follow-ups, and the aggregate analysis are all reproducible from the current files.
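The request shape the replay loop sends to each endpoint can be sketched without a running server. The payload fields follow the public Ollama /api/generate options; the builder function and prompt text are illustrative, not the actual script contents:

```python
import json

def build_sweep(model: str, num_predict: int, num_gpu_values):
    """Build one /api/generate payload per num_gpu value -- the request
    shape sent to each endpoint (:11435 ROCm, :11434 Vulkan).
    stream=False makes each request return a single JSON object."""
    return [
        {
            "model": model,
            "prompt": "benchmark prompt",
            "stream": False,
            "options": {"num_predict": num_predict, "num_gpu": n},
        }
        for n in num_gpu_values
    ]

for payload in build_sweep("qwen3.5:2b", 512, [0, 1, 2, -1]):
    print(json.dumps(payload))
```

POSTing each payload to both ports and recording status plus timing reproduces the matched-run tables on slides 11–12.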
19

Limitations & Future Work

Limitations:

  • Backend and Ollama version are partially confounded: the Vulkan endpoint ran 0.17.4 while the ROCm endpoint ran 0.17.5/0.17.6.
  • The ROCm result depends on HSA_OVERRIDE_GFX_VERSION=9.0.0; no same-version override-off comparison exists yet.
  • The root cause inside ggml_backend_sched_graph_compute_async (C/Vulkan side) remains unresolved from the stack trace alone.
  • Evidence covers one hardware unit and two models; the qwen3.5:2b vs phi4-mini difference is observed but not explained.

Future work: Additional models and quantization formats; matched Ollama version pair; same-version override-off vs override-on comparison; deeper Vulkan trace (RADV validation layers); broader multi-epoch stress under GPU-offload settings.

The new rerun removes the strongest one-off criticism, but the center of gravity has shifted: the key unresolved variable is now model/workload specificity, alongside version mismatch and override dependence.
20

Takeaway & Key Messages

Not officially supported

gfx900 is absent from the ROCm 7.2 support matrix and ollama's default HIP build targets. This is a factual statement about vendor policy, not execution capability.

Conditionally executable

In this environment, under override-assisted configuration, ROCm/HIP ran across all tested num_gpu values on both matched-run days. Five of the six examined gate layers retained executable paths.

Shipped beyond code residue

ROCm 7.2 ships gfx900 Perf DB (169 K lines), 128 rocBLAS kernels, and 16 firmware blobs — exceeding RDNA 3/2 on multiple metrics. This implies active build-pipeline inclusion, not passive code survival.

Backend-dependent stability

For qwen3.5:2b across two matched runs, Vulkan SIGSEGV'd for num_gpu>=1 while ROCm/HIP did not; Vulkan stayed stable at num_gpu=0 for 5/5 epochs. A phi4-mini follow-up passed at num_gpu=1, so the failure is not universal to gfx900 Vulkan.

Based on the collected evidence: "not officially supported" and "not executable in any configuration" are distinct statements. The shipped-artifacts finding adds a fourth dimension: vendor packaging actively includes gfx900 in ways that surpass some supported architectures.

本資料が主張しないこと / What this presentation does not claim

  • AMD の社内意思決定過程を断定するものではない / It does not assert AMD's internal decision-making processes.
  • llvm-project-private#389 の内容を推定で補完するものではない / It does not fill in the content of the private issue llvm-project-private#389 by inference.
  • 本資料の事例が ROCm 全体の一般法則として確定しているとするものではない / It does not present this single case as an established general rule for ROCm as a whole.
  • AMD の support policy 全体を完全に代表するものではない / It does not claim to fully represent AMD's support policy as a whole.
  • AMD または特定個人への批判を意図するものではない / It does not intend criticism of AMD or any specific individual.