How far GPU inference gets on MI25 / gfx900

A case study of how the GPU inference path was observed on MI25 / gfx900, based on local runs with ROCm 7.2 and an Ollama source build.

MI25 · gfx900 · ROCm 7.2 · Ollama source build · local observation

In this setup, a GPU inference path was observed on MI25 / gfx900. The dominant reliability risk appears to have been the integrity of backend-library placement rather than the intrinsic capability of the GPU.

Caution: This page summarizes observations from a single local setup. It should not be read as a general claim about all MI25 cards or all gfx900 deployments.

Fact: Confirmed environment

First, pin down the conditions under which these observations were made.

Item	Observed value
GPU	Radeon Instinct MI25 (gfx900)
Runtime	ROCm 7.2
Serving path	Ollama user service (source build)
Key runtime path	`OLLAMA_LIBRARY_PATH=/home/limonene/ROCm-project/ollama-src/build/lib/ollama`

Fact: What was failing before

The earlier problem was not that MI25 simply could not run inference; rather, the setup tended to fall back to CPU after restarts.

Observed failure mode

A CPU path sometimes appeared after a restart
Journals sometimes showed `library=cpu` and `GPULayers:[]`
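A minimal sketch of how a single run could be labeled from its journal, assuming only the two fields quoted above matter. The regular expressions, the `classify_journal` name, and the GPU/CPU/UNSURE rule are hypothetical illustrations, not the script used for the tinyllama A/B runs; the rule is deliberately conservative and returns UNSURE on missing or mixed signals.

```python
import re
from typing import Iterable

# Hypothetical patterns modeled on the fragments quoted above
# ("library=cpu", "library=ROCm", "GPULayers:[]", "GPULayers:49").
# Real log lines carry more fields; only these two are inspected here.
LIBRARY_RE = re.compile(r"library=(\w+)")
GPU_LAYERS_RE = re.compile(r"GPULayers:(\[\]|\d+)")


def classify_journal(lines: Iterable[str]) -> str:
    """Label one run as "GPU", "CPU", or "UNSURE" from its journal lines."""
    libraries = set()
    layer_counts = []
    for line in lines:
        libraries.update(lib.lower() for lib in LIBRARY_RE.findall(line))
        for value in GPU_LAYERS_RE.findall(line):
            # "[]" means no layers were offloaded to the GPU.
            layer_counts.append(0 if value == "[]" else int(value))

    gpu_evidence = "rocm" in libraries and any(n > 0 for n in layer_counts)
    cpu_evidence = "cpu" in libraries or (
        bool(layer_counts) and all(n == 0 for n in layer_counts)
    )

    if gpu_evidence and not cpu_evidence:
        return "GPU"
    if cpu_evidence and not gpu_evidence:
        return "CPU"
    # No evidence, or conflicting evidence: leave the call to a human.
    return "UNSURE"


if __name__ == "__main__":
    print(classify_journal(["... library=ROCm compute=gfx900 ...", "... GPULayers:49 ..."]))
    print(classify_journal(["... library=cpu ...", "... GPULayers:[] ..."]))
```

Returning UNSURE instead of forcing a label keeps ambiguous phases visible, so they can be rerun rather than silently counted either way.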

Candidate causes

Backend directories were referenced, but the actual runtime files in them were missing or inconsistent
In that state, runtime-library integrity appeared more decisive than the GPU itself

Fact: What was changed

The fix centered on making sure the MI25-targeted backend libraries were actually present and consistent in the runtime path.

Fixes applied

Rebuilt the backend libraries with `build-ollama-gfx900.sh`
Confirmed that `libggml-hip.so` and related files were present in the runtime path
Added preflight checks so the setup fails early when backend files are missing
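The preflight idea can be sketched in Python as follows, assuming the backend files live somewhere under `OLLAMA_LIBRARY_PATH` (possibly in a subdirectory). Only `libggml-hip.so` is named on this page, so `REQUIRED_BACKEND_FILES` is a hypothetical list to extend for your own build, and `missing_backend_files` is an illustrative name, not part of `build-ollama-gfx900.sh`.

```python
import os
import sys
from pathlib import Path

# Hypothetical list: only libggml-hip.so is named on this page.
# Extend it with whatever your own build places in the runtime path.
REQUIRED_BACKEND_FILES = ("libggml-hip.so",)


def missing_backend_files(lib_dir: Path, required=REQUIRED_BACKEND_FILES) -> list:
    """Return the required file names that have no non-empty copy under lib_dir."""
    if not lib_dir.is_dir():
        return list(required)
    missing = []
    for name in required:
        # Search recursively, since backends may sit in a subdirectory.
        # A zero-byte file is treated as missing (e.g. an interrupted copy).
        found = any(p.is_file() and p.stat().st_size > 0 for p in lib_dir.rglob(name))
        if not found:
            missing.append(name)
    return missing


def main() -> int:
    lib_dir = os.environ.get("OLLAMA_LIBRARY_PATH")
    if not lib_dir:
        print("preflight: OLLAMA_LIBRARY_PATH is not set", file=sys.stderr)
        return 1
    missing = missing_backend_files(Path(lib_dir))
    if missing:
        print(f"preflight: missing in {lib_dir}: {', '.join(missing)}", file=sys.stderr)
        return 1
    print(f"preflight: backend files found under {lib_dir}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Run it before starting the user service: a non-zero exit status stops the setup before the server can come up on the CPU path described above.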

Fact: Evidence collected

This page highlights two representative observations: the tinyllama A/B runs and a deepseek-r1:14b run.

Case	Observation	Immediate reading
tinyllama A/B	8-case matrix, 16 phases in total: `GPU=15`, `UNSURE=1`. The UNSURE phase leaned toward GPU on a later rerun.	At least in this setup, a reproducible GPU path appeared in the large majority of phases.
`deepseek-r1:14b`	`done=true`, `done_reason=length`. Journals showed `library=ROCm`, `compute=gfx900`, and `GPULayers:49`. `rocm-smi` showed GPU use up to 99%, power up to 217 W, and about 58% VRAM use.	At least for this run, there is a fairly clear observation of GPU offload.
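The phase counts in the tinyllama row reduce to a simple tally. This sketch assumes a TSV with a header row and a column named `path` holding `GPU`, `CPU`, or `UNSURE` per phase; the real layout of `tinyllama_path_index_*.tsv` is not shown on this page, so the column name and the function names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def tally_path_labels(tsv: TextIO, column: str = "path") -> Counter:
    """Count per-phase path labels in a tab-separated file with a header row."""
    reader = csv.DictReader(tsv, delimiter="\t")
    return Counter(row[column].strip().upper() for row in reader)


def gpu_share(counts: Counter) -> float:
    """Fraction of phases labeled GPU (0.0 for an empty tally)."""
    total = sum(counts.values())
    return counts["GPU"] / total if total else 0.0


if __name__ == "__main__":
    # Synthetic input reproducing only the totals reported above (15 GPU, 1 UNSURE);
    # case and phase values are placeholders, not the real matrix.
    rows = ["case\tphase\tpath"]
    rows += [f"c{i // 2}\t{i % 2}\tGPU" for i in range(15)]
    rows.append("c7\t1\tUNSURE")
    counts = tally_path_labels(io.StringIO("\n".join(rows) + "\n"))
    print(dict(counts), f"GPU share = {gpu_share(counts):.2%}")
```

With the reported counts, the GPU share is 15 / 16 = 93.75% of phases.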

Interpretation: What these facts suggest

This section keeps interpretation close to the observed evidence and avoids claims stronger than the setup can support.

Main implications

The main reliability risk appears more closely tied to backend-deployment integrity than to the intrinsic capability of MI25
Within the observed range, the results do not support the reading that “MI25 necessarily falls back to CPU”

Performance note

deepseek-r1:14b: about 14.20 s for 140 eval tokens, an estimated 9.9 tokens/sec
Representative tinyllama case: about 1.56 s for 96 eval tokens, an estimated 61.4 tokens/sec
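These rates follow Ollama's documented convention for the final `/api/generate` response: `eval_count` is the number of generated tokens and `eval_duration` is in nanoseconds, so tokens/sec = `eval_count / eval_duration * 1e9`. A minimal sketch, assuming the saved JSON (such as the `deepseek14b_generate_*.json` file) holds a single non-streamed response object; `tokens_per_second` and `rate_from_file` are illustrative names.

```python
import json
from pathlib import Path


def tokens_per_second(response: dict) -> float:
    """Decode rate from a final Ollama /api/generate response object."""
    count = response["eval_count"]           # generated tokens
    duration_ns = response["eval_duration"]  # nanoseconds
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return count / duration_ns * 1e9


def rate_from_file(path: Path) -> float:
    # Assumes the file holds one JSON object (a non-streamed response),
    # not newline-delimited chunks from a streamed call.
    with path.open(encoding="utf-8") as f:
        return tokens_per_second(json.load(f))


if __name__ == "__main__":
    # deepseek-r1:14b figures from this page, taking 14.20 s as eval_duration.
    sample = {"eval_count": 140, "eval_duration": 14_200_000_000}
    print(f"{tokens_per_second(sample):.1f} tokens/sec")
```

Taking 14.20 s as the eval duration, 140 / 14.20 is about 9.86, which rounds to the 9.9 tokens/sec noted above.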

Open Question / Limitation

This page also states explicitly what it does not yet claim.

What remains open

It has not been verified whether the same pattern reproduces on other MI25 systems
Whether these results generalize to gfx900 as a whole is a separate question
This page is not intended as a one-to-one reading of ROCm's official support policy

Evidence set used here

MI25_gfx900_inference-success-summary_20260320.md
MI25_logging-and-benchmark-notes.md
deepseek14b_generate_20260320_212146.json
deepseek14b_journal_20260320_212146.log
deepseek14b_rocm_smi_20260320_212146.log
tinyllama_path_index_20260320_195741.tsv
tinyllama_path_index_20260320_200424.tsv

Related repository: AETS-MAGI/ROCm-MI25-build
