gfx900 はなぜ今も動くのか
— ROCm における Capability-Based 設計の構造分析

Why Does gfx900 Still Run?
— Structural Analysis of Capability-Based Design in ROCm

AMD Vega (gfx900) が ROCm の公式推奨対象から外れた後も、なぜ複数のライブラリ経路で計算を実行できるのか。コードベース調査・実機検証・git 履歴調査に基づく、工学的構造分析を記録する。

An engineering investigation into why AMD Vega (gfx900) can still execute computation through multiple library paths in the ROCm stack, even after being removed from official support lists — based on source code tracing, runtime verification, and git provenance analysis.

AMD Radeon RX Vega 64 / gfx900 | ROCm Stack (MIOpen · rocBLAS · Tensile · CK) | 2026-03-15

現時点で最も堅い知見:
gfx900 が今も通れるのは、ROCm の設計が capability-based のフォールバック構造を持っているからである。この構造は gfx900 に限らず「能力が低い世代」全般に対して自然に働く。これは設計の寛容性であり、意図的か否かにかかわらず結果として後方互換が成立している。

Current strongest finding:
gfx900 remains functional because the ROCm stack employs a capability-based fallback architecture. This architecture naturally accommodates any generation lacking specific capabilities — not just gfx900. Regardless of whether this was intentional, backward compatibility is effectively maintained as a structural consequence of the design.

§1. コードベース観測（検証済み事実）

§1. Codebase Observations (Verified Facts)

以下はすべてソースコード調査で根拠を確認した観測事項である。仮説ではない。

All items below are grounded in source code inspection and are factual observations, not hypotheses.

gfx900 の実行経路は3層構造で残存している code_verified

gfx900 execution paths persist across three layers code_verified

層	内容	具体例
維持 (Build)	ビルドシステムに gfx900 ターゲットが残存	rocBLAS `CMakeLists` に ROCm 5.6–7.1 で `gfx900` が継続
管理 (Selection)	Solver / kernel 選択で capability 判定が機能	MIOpen の `IsApplicable` フィルタ、`IsXdlopsSupport` ガード
補充 (Fallback)	高速経路が使えないとき汎用経路へ落ちる	CK `dot4` 不在時の逐次積和、rocBLAS の XF32→FP32 フォールバック

Layer	Description	Example
Build	gfx900 target remains in build systems	rocBLAS `CMakeLists`: `gfx900` listed in ROCm 5.6–7.1
Selection	Capability-based solver/kernel selection	MIOpen `IsApplicable` filters, `IsXdlopsSupport` guard
Fallback	Generic paths activated when fast paths are unavailable	CK sequential multiply-add when `dot4` is absent; rocBLAS XF32→FP32 fallback

flowchart LR
    REQ["Request"]
    L1["L1: Build\nrocBLAS · Tensile\ngfx900 target kept"]
    L2N["Selection ✗\nMLIR · XDLops\nblocked / gfx908+"]
    L2Y["Selection ✓\nWinograd · ASM v4r1\ngfx900 permitted"]
    L3["L3: Fallback\nhipBLAS→FP32\ndot4→seq · Naive"]
    REQ --> L1 --> L2N
    L1 --> L2Y --> L3
    style L2N fill:#ffdddd,stroke:#cc4444
    style L2Y fill:#ddffdd,stroke:#44aa44

個別のコード根拠 code_verified

Specific code evidence code_verified

MIOpen: gfx900 向けの ASM implicit GEMM v4r1 dynamic 経路が残存
MIOpen: ASM implicit GEMM v4r1 dynamic path for gfx900 remains
MIOpen: MLIR iGEMM は gfx900 を明示除外し、別経路（ASM / Winograd）へ落ちる
MIOpen: MLIR iGEMM explicitly excludes gfx900; falls back to ASM / Winograd paths
MIOpen: XDLops 系は共通ガード IsXdlopsSupport で gfx900 を弾く
MIOpen: XDLops solvers blocked by common guard IsXdlopsSupport
MIOpen: Winograd / 旧 ASM 系にも gfx900 生存経路あり
MIOpen: Winograd and legacy ASM solvers have live gfx900 paths
rocBLAS: hipBLASLt → Tensile、XF32 → FP32 の二段フォールバック設計
rocBLAS: Two-stage fallback: hipBLASLt → Tensile, XF32 → FP32
Tensile: gfx900 向け lazy catalog / fallback code object の設計概念が存在
Tensile: Lazy catalog and fallback code object design for gfx900 exists
Tensile: AsmCaps で ISA (9,0,0) の dot4 capability は False
Tensile: AsmCaps reports dot4 capability as False for ISA (9,0,0)
CK: dot4 不在時の逐次積和フォールバック（gfx900 専用ではなく汎用設計）
CK: Sequential multiply-add fallback when dot4 is absent (generic, not gfx900-specific)
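
The dot4 observations in the list above can be illustrated with a minimal Python sketch. It is a model of the idea, not CK or Tensile code: the table `HAS_DOT4` and the functions `dot4_packed`, `dot4_sequential`, `path_for`, and `inner_product4` are hypothetical names. Only the `(9, 0, 0)` entry reflects the AsmCaps observation recorded here; the `(9, 0, 6)` entry is an assumption added for contrast.

```python
from typing import Sequence, Tuple

Isa = Tuple[int, int, int]

# Capability table keyed by ISA version. (9, 0, 0) -> False mirrors the
# AsmCaps observation in this document; (9, 0, 6) -> True is an illustrative
# assumption, not a value read from Tensile.
HAS_DOT4 = {
    (9, 0, 0): False,
    (9, 0, 6): True,
}


def dot4_packed(a: Sequence[int], b: Sequence[int], acc: int) -> int:
    """Stand-in for one fused 4-element dot-product instruction."""
    return acc + sum(x * y for x, y in zip(a, b))


def dot4_sequential(a: Sequence[int], b: Sequence[int], acc: int) -> int:
    """Generic fallback: four separate multiply-add steps."""
    for x, y in zip(a, b):
        acc += x * y
    return acc


def path_for(isa: Isa) -> str:
    """Pick a path from the capability flag, never from the chip name."""
    return "dot4" if HAS_DOT4.get(isa, False) else "sequential"


def inner_product4(isa: Isa, a: Sequence[int], b: Sequence[int], acc: int = 0) -> int:
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected two 4-element vectors")
    if path_for(isa) == "dot4":
        return dot4_packed(a, b, acc)
    return dot4_sequential(a, b, acc)
```

Both paths return the same value; only the instruction mix differs. An ISA that is missing from the table also lands on the sequential path, which is the sense in which the fallback is generic rather than gfx900-specific.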

§2. 実機検証の結果

§2. Runtime Verification Results

Vega 64 実機上で MIOpen の solver 選択動作と DLOPS/XDLOPS 系の挙動を検証した結果。

Results from solver selection testing and DLOPS/XDLOPS behavior verification on a Vega 64.

FP32 solver の自然選択 runtime_verified

Natural FP32 solver selection runtime_verified

FP32 畳み込みにおいて、以下の solver が Vega 64 上で自然選択されることを確認した:

The following solvers were naturally selected on Vega 64 for FP32 convolution:

ConvBinWinograd3x3U — Winograd 系
ConvBinWinograd3x3U — Winograd family
ConvAsm1x1U — 旧 ASM 系
ConvAsm1x1U — Legacy ASM family
ConvHipImplicitGemmV4R1Fwd — implicit GEMM v4r1
ConvHipImplicitGemmV4R1Fwd — Implicit GEMM v4r1

DLOPS / XDLOPS 系の失敗モード分類 runtime_verified

DLOPS / XDLOPS failure mode taxonomy runtime_verified

solver family ごとに到達点が異なり、失敗モードは以下の3類型に分かれた:

Each solver family fails at a different stage. Three distinct failure modes were identified:

類型	症状	例
Applicability 棄却	`not applicable`, `rc=0x3`	`ConvCkIgemmFwdV6r1DlopsNchw` (全ケース)
Build / tuning 失敗	`MIIR_INVALID_PARAM` / `Perf Db: record not found` / `Code object build failed`	`ConvMlirIgemmFwd`, `ConvHipImplicitGemmForwardV4R5Xdlops`
Runtime abort	`std::vector::operator[]` assertion, `EXIT=134`	`ConvHipImplicitGemmFwdXdlops`

Mode	Symptom	Example
Applicability reject	`not applicable`, `rc=0x3`	`ConvCkIgemmFwdV6r1DlopsNchw` (all cases)
Build / tuning failure	`MIIR_INVALID_PARAM` / `Perf Db: record not found` / `Code object build failed`	`ConvMlirIgemmFwd`, `ConvHipImplicitGemmForwardV4R5Xdlops`
Runtime abort	`std::vector::operator[]` assertion, `EXIT=134`	`ConvHipImplicitGemmFwdXdlops`
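
The taxonomy above can be applied mechanically to captured logs. The following is a minimal sketch under stated assumptions: the symptom strings are the ones listed in the table, while `FailureMode`, `classify_failure`, and the precedence order (latest pipeline stage checked first) are hypothetical choices made for illustration, not part of MIOpen.

```python
from enum import Enum


class FailureMode(Enum):
    APPLICABILITY_REJECT = "applicability reject"
    BUILD_OR_TUNING_FAILURE = "build / tuning failure"
    RUNTIME_ABORT = "runtime abort"
    UNCLASSIFIED = "unclassified"


# Symptom substrings taken from the table above. The ordering is an
# assumption: a log that reached a later stage is classified by that stage.
_SIGNATURES = (
    (FailureMode.RUNTIME_ABORT,
     ("std::vector::operator[]", "EXIT=134")),
    (FailureMode.BUILD_OR_TUNING_FAILURE,
     ("MIIR_INVALID_PARAM", "Perf Db: record not found", "Code object build failed")),
    (FailureMode.APPLICABILITY_REJECT,
     ("not applicable", "rc=0x3")),
)


def classify_failure(log_text: str) -> FailureMode:
    """Return the first failure mode whose symptom appears in the log text."""
    for mode, needles in _SIGNATURES:
        if any(needle in log_text for needle in needles):
            return mode
    return FailureMode.UNCLASSIFIED
```

For example, a log containing `not applicable` and `rc=0x3` is classified as an applicability reject, and text that matches none of the symptoms is reported as unclassified rather than being forced into a category.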

同一 3×3 問題で dtype を変えると、FP16 → ConvOclDirectFwd、BFP16 → GemmFwdRest が選択され、solver finder が dtype ごとに別経路を自然選択することも確認された。

Changing dtype on the same 3×3 problem yielded FP16 → ConvOclDirectFwd and BFP16 → GemmFwdRest, confirming per-dtype path differentiation by the solver finder.

§3. 仮説群

§3. Hypotheses

以下はコード根拠に基づく仮説であり、断定ではない。確度を明示して記載する。

The following are hypotheses informed by code evidence, not assertions. Confidence levels are explicitly noted.

仮説 A:「表のサポート」と「設計上のサポート」は別の概念である hypothesis (code evidence)

Hypothesis A: "Official support" and "design-level support" are distinct concepts hypothesis (code evidence)

「サポート終了」とは、公式推奨リスト・QA 対象・優先修正対象から外れたことを意味する。しかしそれは、ソフトウェア設計上の実行経路が消滅することとは別の話である。

"End of support" means removal from official recommendation lists, QA targets, and priority fix queues. It does not necessarily mean the elimination of software execution paths.

層	定義	gfx900 の現状
表のサポート	公式推奨・QA対象・ベンチ対象・優先修正	弱い（公式リスト外）
設計上のサポート	抽象化・capability 判定・fallback・backend 切替	まだ残っている
運用上のサポート	バグ報告が通るか、CI にいるか	確認中
生存可能性	明示的に守られていないが、設計のため動ける	成立している可能性が高い

Layer	Definition	gfx900 status
Official support	Listed in official docs, QA-tested, benchmark-targeted, prioritized for fixes	Weak (off-list)
Design-level support	Abstractions, capability checks, fallback, backend switching	Still present
Operational support	Bug reports accepted, present in CI	Under investigation
Survivability	Not explicitly maintained but structurally functional	Likely viable

仮説 B: gfx900 の生存は Capability-Based 設計の自然な帰結である hypothesis (code evidence)

Hypothesis B: gfx900 survival is a natural consequence of capability-based design hypothesis (code evidence)

gfx900 が今も通れるのは「放置された残骸」ではなく、ROCm のライブラリ設計が後方互換・多段フォールバックを強く意識しているため、その設計の自然な結果として gfx900 が通れる経路が残っている、という読み方ができる。

Rather than being a neglected remnant, gfx900's continued functionality can be read as a natural consequence of ROCm's library design, which is built around backward compatibility and multi-stage fallback.

MIOpen の solver finder は候補列挙 + IsApplicable フィルタ方式であり、「ある世代が使えない経路は落として次へ」という汎用設計
MIOpen's solver finder uses candidate enumeration + IsApplicable filtering: a generic design that drops unsupported paths and moves to the next
dot4 不在時の逐次積和は gfx900 専用ではなく、dot4 capability が立たない世代全般向けの汎用互換レイヤ
The sequential multiply-add fallback when dot4 is absent is not gfx900-specific — it is a generic compatibility layer for any generation without dot4
gfx900 は GCN 世代であり rDNA でも cDNA でもない。アーキテクチャファミリーを超えてこの構造が機能している事実は、設計が capability ベースであることの独立した根拠
gfx900 is GCN — neither rDNA nor cDNA. The fact that this design works across architecture families is independent evidence of a capability-based approach
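
The "candidate enumeration + IsApplicable filtering" pattern described above can be sketched in a few lines of Python. MIOpen itself is C++ and its real applicability checks involve many more conditions, so this is a simplified model: `Device`, `Solver`, `CANDIDATES`, `find_solvers`, and the solver names are all hypothetical, and treating list order as preference order is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Device:
    name: str
    has_xdlops: bool


@dataclass(frozen=True)
class Solver:
    name: str
    is_applicable: Callable[[Device], bool]


# Hypothetical candidate list, fastest first. Each solver carries its own
# applicability predicate; the finder itself knows nothing about gfx900.
CANDIDATES: Tuple[Solver, ...] = (
    Solver("xdlops_igemm", lambda d: d.has_xdlops),       # capability guard
    Solver("mlir_igemm", lambda d: d.name != "gfx900"),   # explicit exclusion
    Solver("winograd", lambda d: True),
    Solver("naive", lambda d: True),
)


def find_solvers(dev: Device, candidates: Tuple[Solver, ...] = CANDIDATES) -> List[str]:
    """Enumerate candidates and keep those whose predicate accepts the device."""
    return [s.name for s in candidates if s.is_applicable(dev)]
```

With `Device("gfx900", has_xdlops=False)` the first two candidates drop out and the remaining ones are still returned. Nothing in `find_solvers` special-cases an architecture; the surviving paths fall out of the per-solver predicates, which is the structural point Hypothesis B rests on.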

仮説 C: 保守主体は層ごとに異なる hypothesis (partial evidence)

Hypothesis C: Maintenance ownership differs by layer hypothesis (partial evidence)

ここで重要なのは、「AMD かコミュニティか」を一語で決めるより、投入主体・維持主体・運用主体・修正可能主体を分けることです。

The key is not to force a single answer such as “AMD” or “community,” but to separate insertion ownership, maintenance ownership, operational ownership, and repair-capable ownership.

主体	問い
投入主体	その分岐やコードを最初に入れたのは誰か
維持主体	その後も壊れないように残し続けているのは誰か
運用主体	見つけて、動かして、回避策を共有しているのは誰か
修正可能主体	その層の問題を現実に直せるのは誰か

Role	Question
Insertion owner	Who introduced the branch or code?
Maintenance owner	Who keeps it from breaking over time?
Operational owner	Who discovers, runs, and shares workarounds?
Repair-capable owner	Who can realistically fix the issue at that layer?

flowchart LR
    P1["MLIR iGEMM\n(disabled)"]
    P3["Winograd FP32\n(active)"]
    P2["ASM v4r1\n(legacy)"]
    P6["Tensile fallback\n(community-patched)"]
    P1 --> I1["AMD(M)\nZhuoran Yin 2021-12"]
    P3 --> I3["ExtC\nTamazov 2017+"]
    P2 --> I2["AMD(C)\ncarlushuang 2020"]
    P6 --> I6["AMD(C)+ExtC\n2022–2024"]
    I3 --> M3["AMD staff\npatches 2021–25"]
    I2 --> M2["deletion-cost\nresidual"]
    I6 --> M6["ext PR merged\nthen reverted #1862→#1879"]
    style I1 fill:#ddeeff,stroke:#3366aa
    style I3 fill:#eeffee,stroke:#44aa44
    style I2 fill:#eeffee,stroke:#44aa44
    style M3 fill:#ddeeff,stroke:#3366aa
    style M2 fill:#f5f5f5,stroke:#999
    style M6 fill:#fff8cc,stroke:#aaaa44

投入主体について強く言えること: MLIR の gfx900 除外は AMD 社員 Zhuoran Yin の commit 2407d2f であり、private issue 参照も AMD 側にある
What is strong for insertion ownership: the MLIR gfx900 exclusion was commit 2407d2f by AMD engineer Zhuoran Yin, with a private AMD-side issue reference
まだ弱いこと: そこからそのまま ROCm 全体の維持主体や運用主体まで一般化すること
What remains weak: generalizing from that insertion point to overall maintenance ownership or operational ownership across ROCm
より妥当な読み: AMD 起点の重要分岐 + capability/fallback 設計の残存 + コミュニティによる発見・運用・知見共有の重なり
More defensible reading: an overlap of AMD-origin critical branches, surviving capability/fallback design, and community discovery/operation/knowledge-sharing

補足: この論点は gfx900 を超えて ROCm 一般の設計思想に広がるため、別ページ What gfx900 Reveals About ROCm に整理した。

Note: Because this question extends beyond gfx900 into ROCm-wide design, a separate page summarizes it: What gfx900 Reveals About ROCm.

仮説 D: rDNA/cDNA 分離は再統合前の「繋ぎ」だった可能性 speculative

Hypothesis D: rDNA/cDNA split may have been a transitional phase speculative

AMD は rDNA（ゲーム向け）と cDNA（計算向け）を別系統で開発してきたが、2024年に UDNA による統合方向が示されている。もし毎回ぜんぶ切り捨てる設計なら、アーキテクチャが増えるたびにコンパイラ・ランタイム・ライブラリの保守コストが際限なく膨れる。ROCm のような大規模 OSS スタックが持続するには、どこかに共通化の芯が必要であり、結果として後方互換や抽象化の筋を残しておくのが合理的である。

AMD has developed rDNA (gaming) and cDNA (compute) as separate lines, but indicated a UDNA unification direction in 2024. If each new architecture completely discarded the old, compiler/runtime/library maintenance costs would grow without bound. A large-scale OSS stack like ROCm requires a shared abstraction core — making it rational to preserve backward compatibility and abstraction as a structural investment.

注意: この仮説は現時点で最も推測的である。UDNA の正体は未確定であり、git 履歴で「統合を見越した設計意図」が読めてから語るべき話である。現在のコードに「そう読めるだけの構造がある」ことは観測できるが、意図の断定は避ける。

Caution: This is the most speculative hypothesis. The nature of UDNA is unconfirmed, and claims about intentional unification design should be deferred until git history evidence is available. The code structure is consistent with this reading, but intent cannot be asserted.

仮説 E:「意図は未確定、構造は観測できる」 hypothesis (code evidence)

Hypothesis E: "Intent is undetermined; structure is observable" hypothesis (code evidence)

UDNA の正体そのものはまだ断言できない。しかし将来の再統合コストを下げるためには、後方互換や抽象化の筋をどこかに残しておくのが合理的である。現在のコードベース調査は、少なくとも「そう読めるだけの構造が存在する」ことをかなり強く支持している。したがって意図的設計だったか副産物だったかは保留しつつも、結果論として再統合しやすい形に寄っている可能性は高い。

The nature of UDNA cannot yet be determined. However, preserving backward-compatible abstractions is rational for reducing future re-integration costs. The codebase investigation strongly supports the existence of structures consistent with this reading. Whether this was intentional design or an emergent property remains open, but the result has likely converged toward a re-integration-friendly form.

補足: 見落としやすい第三の可能性 — 「積極的に消していない」

Addendum: An easily overlooked third possibility — "Not actively removed"

「なぜ残っているか」の答えとして、「設計の自然な帰結」と「コミュニティの保守」の二択で考えがちだが、実はもう一つある。コスト-便益的に「消すコストのほうが高いから残っている」という状態である。これは「積極的に維持している」とも「積極的に切り捨てている」とも違い、git blame で「誰も最近触っていない」という結果が出れば支持される。

When asking "why does it remain?", it is tempting to choose between "natural consequence of design" and "community maintenance". But there is a third state: it remains because the cost of removing it exceeds the cost of leaving it. This differs from both active maintenance and active deprecation, and would be supported by git blame results showing no recent modifications.

§4. 出所調査（Git 履歴分析）

§4. Provenance Investigation (Git History Analysis)

gfx900 の MLIR 除外がいつ・誰により・なぜ投入されたかの追跡結果。

Tracking who introduced the gfx900 MLIR exclusion, when, and why.

MLIR iGEMM の gfx900 除外: Issue #389 の正体 code_verified (git blame)

MLIR iGEMM gfx900 exclusion: The nature of Issue #389 code_verified (git blame)

ConvMlirIgemmFwd::IsApplicable() における gfx900 明示除外について、git blame で provenance を確定した。

Provenance of the explicit gfx900 exclusion in ConvMlirIgemmFwd::IsApplicable() was established via git blame.

項目	内容
コミット	`2407d2f556c7`
著者	Zhuoran Yin (`zhuoryin@amd.com`) — AMD 社員
日時	2021-12-22
PR	MIOpen `#1328`: "[MLIR] Disable gfx900 from non-xdlops solver"
参照 issue	`ROCm/llvm-project-private/issues/389` — AMD 社内非公開リポジトリ
影響範囲	fwd / bwd / wrw の全3ファイルに同一パターンを一括投入

Item	Detail
Commit	`2407d2f556c7`
Author	Zhuoran Yin (`zhuoryin@amd.com`) — AMD employee
Date	2021-12-22
PR	MIOpen `#1328`: "[MLIR] Disable gfx900 from non-xdlops solver"
Referenced issue	`ROCm/llvm-project-private/issues/389` — AMD internal private repo
Scope	Same exclusion pattern applied to all 3 files (fwd / bwd / wrw)

構造の特徴: IsMlirSupportedHardware() には gfx900 が「対応ハード」として gfx906/908/90a/942 と並列にリストされている。しかし ConvMlirIgemmFwd::IsApplicable() が個別に gfx900 を除外している。つまり「MLIR 対応と表明しつつ、特定アーチで例外除外」という二層構造であり、設計意図としての残存と個別バグ回避が共存している。

Structural note: IsMlirSupportedHardware() lists gfx900 alongside gfx906/908/90a/942 as "supported hardware", yet ConvMlirIgemmFwd::IsApplicable() individually excludes gfx900. This two-layer structure (support declared at the hardware level, with an architecture-specific exception at the solver level) indicates that design-level inclusion and individual bug workarounds coexist.
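
The two-layer structure can be made concrete with a small Python model. It is a sketch only: the architecture set is taken from the note above, while the function names, the order of the checks, and the reduction of IsApplicable() to two conditions are simplifying assumptions (the real C++ function evaluates many more).

```python
# Layer 1: hardware-level declaration, as listed in the structural note.
MLIR_SUPPORTED_HARDWARE = frozenset(
    {"gfx900", "gfx906", "gfx908", "gfx90a", "gfx942"}
)


def is_mlir_supported_hardware(arch: str) -> bool:
    """Model of the hardware-level support list."""
    return arch in MLIR_SUPPORTED_HARDWARE


def conv_mlir_igemm_fwd_is_applicable(arch: str) -> bool:
    """Model of the solver-level check, reduced to the two relevant layers."""
    if not is_mlir_supported_hardware(arch):
        return False
    # Layer 2: solver-level exception. gfx900 passes layer 1 but is
    # rejected here, which is the pattern introduced by commit 2407d2f.
    if arch == "gfx900":
        return False
    return True
```

In this model gfx900 is the only architecture for which the two layers disagree: it is "supported hardware" and yet never applicable for this solver.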

"Disable" の語感はバグ回避寄りだが、private issue の本文が非公開のため設計判断かバグ回避かの確定はできない。さらに 2026-03-14 の強制実行追試では、-S により IsApplicable() をバイパスすると CompileSolution → GetInvoker まで進み、Perf Db: record not found → boost::optional::get() assert に落ちることも確認された。したがって、現時点で確定しているのは「後段経路のどこかが未整備である」ことであり、単一原因を LLVM バックエンドだけに還元することはできない。

The word "Disable" suggests a bug workaround, but the private issue body is inaccessible, so whether this was a design decision or bug avoidance cannot be confirmed. A 2026-03-14 forced-run follow-up also showed that once -S bypasses IsApplicable(), execution can reach CompileSolution → GetInvoker and then fail with Perf Db: record not found → boost::optional::get() assertion. What is firmly established is therefore that some downstream path remains incomplete; the public evidence does not justify reducing everything to the LLVM backend alone.

追補（2026-03-15）: MiirIsConfigApplicable() の先にある Miir 実装は public な ROCm/rocMLIR（rocmlir-lib.cpp）で追跡可能。miirLowerTuningParams() は Applicability モードで pipeline を実行し、失敗時に MIIR_BUILD_FAILURE を返す。よって未解決なのは Miir 実装の不可視性ではなく、private issue #389 が示す 2021 年当時の判断根拠である。

Addendum (2026-03-15): The Miir implementation behind MiirIsConfigApplicable() is publicly traceable in ROCm/rocMLIR (rocmlir-lib.cpp). miirLowerTuningParams() runs the pipeline in applicability mode and returns MIIR_BUILD_FAILURE on failure. Therefore, the unresolved part is no longer Miir visibility itself, but the 2021 decision context hidden behind private issue #389.

仮説への影響

Impact on hypotheses

仮説 A を具体化: IsMlirSupportedHardware リストへの gfx900 残存は「設計意図としてのサポート表明」であり、solver レベルの skip は別レイヤの問題
Hypothesis A refined: gfx900's presence in IsMlirSupportedHardware constitutes a design-level support declaration; solver-level skip is a separate-layer concern
仮説 B を強化: MLIR 以外の経路（ASM / Winograd）は引き続き生存しており、capability-based な solver 選択が機能している
Hypothesis B strengthened: Non-MLIR paths (ASM / Winograd) remain live, confirming capability-based solver selection works
仮説 A/B をさらに補強: solver レベルの除外と、その後段にある tuning / Perf DB 欠落は別レイヤの問題であり、「サポート」を多層に分けて読む必要がある
Hypotheses A/B further reinforced: solver-level exclusion and downstream tuning / Perf DB absence are separate layers, reinforcing the need to read "support" as a multi-layer concept
仮説 C を部分確定: MLIR の gfx900 除外は AMD 社員が投入した修正であり、コミュニティパッチではない
Hypothesis C partially confirmed: The MLIR gfx900 exclusion was an AMD employee's commit, not a community patch

§5. 「サポート」の再定義

§5. Redefining "Support"

Vega/gfx900 は「完全非対応」というより、「主要サポート対象から外れたが、なお複数の実行経路とフォールバック経路が残っている世代」と捉えるのが適切である。

Vega/gfx900 is better understood as deprecated rather than strictly non-functional: while no longer positioned as a primary support target, multiple execution and fallback paths remain present in the ROCm software stack.

§6. 判定マトリクス（現時点の総括）

§6. Assessment Matrix (Current Summary)

問い	現在の答え	確度
gfx900 の経路はコードに残っているか	残っている	code_verified
それは設計上自然に残っているか	そう読める	hypothesis + code evidence
主要分岐の投入主体は誰か	少なくとも一部は AMD 側。MLIR `gfx900` 除外は AMD 社員 commit	history_verified
維持主体は誰か	未確定。積極維持・削除コスト由来残存・部分的コミュニティ支援の寄与は未分離	hypothesis
運用主体は誰か	コミュニティが発見・運用・回避策共有を担う可能性が高い	hypothesis
修正可能主体は誰か	層ごとに異なる。userspace は外部修正余地があり、backend 根本対応は到達困難な可能性	hypothesis
MLIR iGEMM 除外の投入者は誰か	AMD 社員 Zhuoran Yin	code_verified
MLIR 除外の根拠 issue は公開か	非公開（`llvm-project-private`）	code_verified
除外の性質（設計判断 vs バグ回避）	"Disable" はバグ回避寄り。確定不可	hypothesis
経路はコミュニティ実装で維持されているか	部分的に Yes。MLIR 除外は AMD 投入だが、Tensile の gfx900 残存性は外部 contributor PR（#1595/#1862）で補修	history_verified
実際に実機で選ばれているか	FP32 solver 選択は確認済み	runtime_verified
UDNA / 再統合との関係	意図は未確定、構造は観測できる	speculative

Question	Current answer	Confidence
Do gfx900 execution paths remain in code?	Yes	code_verified
Are they a natural consequence of design?	Consistent reading	hypothesis + code evidence
Who owns key insertion points?	At least some are AMD-side. The MLIR `gfx900` exclusion is an AMD employee commit	history_verified
Who owns long-term maintenance?	Unresolved. Active maintenance, deletion-cost survival, and partial community support are not yet separated	hypothesis
Who owns operation in practice?	The community likely carries discovery, operation, and workaround sharing	hypothesis
Who can realistically repair issues?	Depends on the layer: some userspace issues are reachable, backend root fixes may not be	hypothesis
Who introduced MLIR iGEMM exclusion?	Zhuoran Yin (AMD)	code_verified
Is the root issue public?	No (`llvm-project-private`)	code_verified
Nature of exclusion (design vs bug workaround)?	"Disable" leans bug-workaround. Cannot confirm	hypothesis
Are paths community-maintained?	Partially yes. MLIR exclusion is AMD-authored, while Tensile gfx900 survivability was reinforced by external-contributor PRs (#1595/#1862)	history_verified
Actually selected at runtime?	FP32 solver selection confirmed	runtime_verified
Relation to UDNA / re-integration?	Intent undetermined; structure observable	speculative

§7. 思考過程の記録

§7. Thinking Process Log

以下は調査過程で生じたブレインストーミング・仮説形成・各種コメントの記録である。これは陰謀論ではなく、工学的問いの思考過程であり、調査によってどのように解消・修正されていったかを含めて記載する。結論から読みたい方は §1–§6 のみで十分である。

Below is a record of brainstorming, hypothesis formation, and review comments generated during the investigation. This is not conspiracy theorizing; it is the thinking process behind an engineering question, including how ideas were resolved or corrected through investigation. Readers interested only in conclusions can stop at §1–§6.

初期の問い立て — 何が分からなかったのか

出発点は素朴な疑問だった:

Vega/gfx900 はなぜ今も生きているのか。それは情けで放置されているのか、設計上自然に残っているのか。

この問いには暗黙の前提がある。「サポート終了 = 動かなくなる」という思い込みである。この前提自体を疑うところから調査は始まった。結果として、「動く」と「サポートされている」の間には広いグレーゾーンがあることが判明し、仮説 A（表のサポートと設計上のサポートは別）が生まれた。

Initial framing — What was unknown

The starting point was a naive question:

Why does Vega/gfx900 still work? Is it neglected out of mercy, or naturally preserved by design?

This question contained an implicit assumption: "end of support = stops working". Questioning this assumption was the beginning of the investigation. The result was the discovery of a wide gray zone between "functional" and "supported", which led to Hypothesis A (official support and design-level support are distinct).

仮説の発展と修正 — どう考えが変わったか

段階 1: 「延命されている」仮説
最初は「誰かが gfx900 を特別扱いで延命しているのでは」と考えた。しかしコード調査の結果、dot4 不在時のフォールバックが gfx900 専用ではなく汎用設計であることが判明し、この仮説は棄却された。

段階 2: 「設計の副産物」仮説
MIOpen の solver finder が候補列挙 + IsApplicable フィルタ方式であることを確認した時点で、仮説 B（capability-based 設計の自然な帰結）が形成された。これは最も強いコード根拠を持つ仮説として現在も維持されている。

段階 3: 「UDNA への布石」仮説
rDNA/cDNA 分離と UDNA 統合方向の話が出た時点で、仮説 D（繋ぎとしての分離）が生まれた。しかし議論と調査を深める過程で「UDNA というゴールが見えている今だからそう見えているだけで、証拠を積む前に語りすぎるとロマンで終わる」という批判を受け、仮説 E（意図は未確定、構造は観測できる）に着地した。

段階 4: 「第三の可能性」の発見
「設計の自然な帰結 vs コミュニティの保守」の二項対立で考えていたが、議論の中で「AMD が積極的に消していない」という第三の状態の存在を指摘された。これはコスト-便益的な判断によるものであり、より地に足のついた説明として有力。

Hypothesis evolution — How thinking changed

Stage 1: "Life support" hypothesis
Initially, the assumption was that someone was giving gfx900 special treatment. Code investigation revealed that the dot4 fallback is generic, not gfx900-specific, and this hypothesis was rejected.

Stage 2: "Design byproduct" hypothesis
After confirming MIOpen's solver finder uses candidate enumeration + IsApplicable filtering, Hypothesis B (natural consequence of capability-based design) was formed. This remains the hypothesis with the strongest code evidence.

Stage 3: "UDNA foresight" hypothesis
The rDNA/cDNA split combined with UDNA unification signals gave rise to Hypothesis D (split as transitional). Peer review criticism — "it only looks that way because UDNA is now visible; without evidence this is just romance" — led to the more conservative Hypothesis E (intent undetermined, structure observable).

Stage 4: Discovery of the "third possibility"
The binary framing of "design consequence vs community maintenance" was challenged by peer review, which identified a third state: "AMD hasn't actively removed it". This cost-benefit explanation is potentially the most grounded.

動的検証の試行錯誤 — DLOPS/XDLOPS 系で何が起きたか

solver の静的コード調査だけでは不十分と考え、Vega 64 実機上で solver を強制実行する検証に移った。ここで想定外の現象が次々と発生した:

ConvMlirIgemmFwd 強制実行 → MIIR_INVALID_PARAM / rc=0x7
ConvCkIgemmFwdV6r1DlopsNchw 7ケース → 全件 not applicable / rc=0x3
-s 1 + C/K 極値 + stride 差 + NHWC/NCHW の8ケース → 同じく全件 not applicable
ConvHipImplicitGemmFwdXdlops → CompileSolution/GetInvoker まで進むが assertion abort (EXIT=134)
ConvHipImplicitGemmForwardV4R5Xdlops → xdlops kernel compile 失敗（intrin_mfma_* 不在）

この経験から得られた教訓: DLOPS/XDLOPS 系は「候補名が存在する」ことと「当該 problem で成立する」ことが完全に分離している。solver family ごとに「到達点（applicability 判定の前後）」が異なるため、失敗モード分類（§2）の枠組みが生まれた。

さらに 2026-03-14 の追試では、ConvMlirIgemmFwd 強制実行が CompileSolution / GetInvoker まで進み、Perf Db: record not found → boost::optional::get() assert に落ちることも確認した。つまり強制実行は通常ガードの外側にある未整備な downstream path を露出させる。

Dynamic verification trials — What happened with DLOPS/XDLOPS

Static code analysis alone was deemed insufficient, so runtime solver forcing on Vega 64 was attempted. Unexpected failures emerged in sequence:

ConvMlirIgemmFwd forced → MIIR_INVALID_PARAM / rc=0x7
ConvCkIgemmFwdV6r1DlopsNchw in 7 configs → all not applicable / rc=0x3
-s 1 + extreme C/K + stride variations + NHWC/NCHW (8 cases) → all not applicable
ConvHipImplicitGemmFwdXdlops → reached CompileSolution/GetInvoker but assertion abort (EXIT=134)
ConvHipImplicitGemmForwardV4R5Xdlops → xdlops kernel compile failure (missing intrin_mfma_*)

Key lesson: "Solver name exists" and "solver applies to a given problem" are completely separate. Each solver family fails at a different stage, leading to the failure mode taxonomy in §2.

A 2026-03-14 follow-up further showed that forced ConvMlirIgemmFwd can reach CompileSolution / GetInvoker and then die at Perf Db: record not found → boost::optional::get(). Forced execution therefore exposes unfinished downstream paths outside the normal guard.

議論中のコメントからの修正 — 何が指摘されたか

最も評価された点:

「dot4 不在フォールバックは gfx900 専用ではなく、capability が立たない世代全般向けの汎用レイヤ」という観察 → capability-based 設計の証拠として最も強い
「維持・管理・補充の3層」整理 → 「コードが残ってる」と「意図的に維持されている」の間を埋める枠組みとして機能

慎重にすべきと指摘された点:

仮説 D（rDNA/cDNA 分離は繋ぎ）は「最もロマン寄り」→ git 履歴で設計意図が読めてから語るべき
「CUDA への秘策をコードに忍ばせた」方向の話は仮説 E の整理通り「意図は未確定」で止めるのが正しい

新たな視点:

gfx900 が GCN であり rDNA でも cDNA でもないという事実自体が、「アーキテクチャファミリーを問わずに capability で分岐する設計になっている」ことの独立した根拠 → 仮説 B を補強

Corrections from peer review — What was flagged

Most positively received:

"dot4 fallback is not gfx900-specific but a generic layer for any generation without the capability" → strongest evidence for capability-based design
"Build / Selection / Fallback three-layer model" → effectively bridges "code remains" and "intentionally maintained"

Flagged for caution:

Hypothesis D (rDNA/cDNA split as transitional) → "most romantic" — should be deferred until git history shows design intent
Any narrative about "secret weapons hidden in code against CUDA" → Hypothesis E's framing ("intent undetermined") is the correct stopping point

New perspectives added:

The fact that gfx900 is GCN (neither rDNA nor cDNA) is itself independent evidence that the design uses capability-based branching regardless of architecture family → strengthens Hypothesis B

rocMLIR 静的結線の基準点

動的失敗シグネチャを読むための静的結線の基準点を以下に固定した:

solver.cpp: solver 登録点
fin_interface.cpp: 強制指定 ID（80/114/128）
mlir_build.cpp: MIIR_INVALID_PARAM 変換点
hipoc_program.cpp: .mlir 分岐と Code object build failed throw 点

これにより、前節の動的失敗シグネチャを3層（applicability reject / build 失敗 / runtime abort）に分けて読む前提が整った。

rocMLIR static trace baseline

Static trace baseline was established to interpret dynamic failure signatures:

solver.cpp: solver registration point
fin_interface.cpp: forced solver IDs (80/114/128)
mlir_build.cpp: MIIR_INVALID_PARAM translation point
hipoc_program.cpp: .mlir branch and Code object build failed throw point

This allowed the dynamic failure signatures to be read across three layers: applicability reject / build failure / runtime abort.

§8. 今後の調査課題

§8. Remaining Investigation Items

完了済み

Completed

✅ MIOpen PR #1328 のレビュー実データ取得（reviews: APPROVED x2、review thread comments: 0、issue comments: 2）
✅ MIOpen PR #1328 review data collected (reviews: APPROVED x2, review-thread comments: 0, issue comments: 2)
✅ MiirIsConfigApplicable の実装到達（public ROCm/rocMLIR）: miirLowerTuningParams は Applicability pipeline 実行、失敗時 MIIR_BUILD_FAILURE
✅ Reached the implementation behind MiirIsConfigApplicable (public ROCm/rocMLIR): miirLowerTuningParams runs applicability pipeline and returns MIIR_BUILD_FAILURE on failure
✅ FP32 fallback 確認（Vega 64 で ConvBinWinograd3x3U / ConvAsm1x1U / ConvHipImplicitGemmV4R1Fwd を自然選択で確認）
✅ FP32 fallback confirmed (Vega 64 naturally selected ConvBinWinograd3x3U / ConvAsm1x1U / ConvHipImplicitGemmV4R1Fwd)
✅ HSACO 逆アセンブル手順の確立
✅ HSACO disassembly procedure established
✅ v_dot4_* の有無確認手順の確立
✅ v_dot4_* presence verification procedure established
✅ git blame で gfx900 関連行の出所を確認（コミット 2407d2f、PR #1328）
✅ git blame provenance confirmed (commit 2407d2f, PR #1328)
✅ #389 の参照先が llvm-project-private（AMD private）であることを確定
✅ Confirmed #389 references llvm-project-private (AMD private repository)
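
The v_dot4_* presence check listed above is not spelled out in this document. One plausible form, assuming the HSACO has already been disassembled to text by some external tool, is a simple mnemonic scan. The function name `find_dot4_mnemonics` and both sample listings are hypothetical and hand-written for illustration; they are not captured output.

```python
import re
from typing import List

# Matches any mnemonic that starts with v_dot4_ (for example v_dot4_i32_i8).
_DOT4_RE = re.compile(r"\bv_dot4_\w+")


def find_dot4_mnemonics(disassembly: str) -> List[str]:
    """Return the distinct v_dot4_* mnemonics found in a disassembly listing."""
    return sorted(set(_DOT4_RE.findall(disassembly)))


# Hand-written sample listings, not real tool output.
SAMPLE_WITHOUT_DOT4 = """
v_mul_lo_u32 v2, v0, v1
v_add_u32_e32 v3, v2, v3
"""

SAMPLE_WITH_DOT4 = """
v_dot4_i32_i8 v0, v1, v2, v0
v_add_u32_e32 v3, v2, v3
"""
```

An empty result for a gfx900 code object would be consistent with the sequential multiply-add fallback described in §1, although a scan like this only shows what a given binary contains, not why.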

未完了（優先度順）

Remaining (by priority)

🔸 公開 llvm-project で gfx900 / MLIR 関連の commit / issue を再探索
🔸 Search public llvm-project for gfx900 / MLIR related commits and issues
🔸 MIIR_BUILD_FAILURE を誘発する最小再現ケースを抽出し、MIOpen 側の失敗シグネチャ（rc=0x7）と対応づける
🔸 Extract a minimal reproducer for MIIR_BUILD_FAILURE and map it to MIOpen failure signatures (rc=0x7)
🔸 他の gfx900 関連コミットへの AMD 社員 / 外部貢献の分類拡張
🔸 Extend AMD employee vs external contributor classification to other gfx900 commits
🔸 Provenance map の作成
🔸 Create provenance map
🔹 INT8 非 naive solver の自然選択確認（現状: 全形状で ConvDirectNaiveConvFwd のみ）
🔹 INT8 non-naive solver natural selection verification (current: all shapes use ConvDirectNaiveConvFwd only)
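
For the pending provenance map, one possible record shape is sketched below. The single entry uses only values established in §4; the class name `ProvenanceRecord`, its field names, and the helper `group_by_affiliation` are hypothetical, and further records would need to come from the remaining git history work.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ProvenanceRecord:
    path: str         # execution path or branch being traced
    commit: str
    author: str
    affiliation: str  # e.g. "AMD" or "external"
    date: str         # ISO date of the commit
    pr: str


# The only entry verified in this document (see §4).
RECORDS = [
    ProvenanceRecord(
        path="MLIR iGEMM gfx900 exclusion",
        commit="2407d2f556c7",
        author="Zhuoran Yin",
        affiliation="AMD",
        date="2021-12-22",
        pr="MIOpen #1328",
    ),
]


def group_by_affiliation(records: Iterable[ProvenanceRecord]) -> Dict[str, List[str]]:
    """Group traced paths by the affiliation of whoever introduced them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        groups[record.affiliation].append(record.path)
    return dict(groups)
```

Keeping insertion data in a flat record like this leaves room to add separate maintenance and operation fields later, in line with the ownership split in Hypothesis C.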

本ページの立場: この文書は AMD の設計判断を批判するものではない。ROCm の capability-based 設計がなぜ旧世代を包含しうるかの構造分析であり、AMD エンジニアリングの設計合理性に対する技術的関心、そして何よりAMDとコミュニティの多大なる貢献へのリスペクトに基づいている。すべての仮説は反証可能な形で記載しており、新しい証拠により修正される。

Position of this document: This document is not intended as a critique of AMD’s design decisions. Rather, it is a structural analysis of how ROCm’s capability-based design can continue to encompass legacy generations, motivated by technical interest in the rationality of AMD’s engineering choices and, above all, by respect for the substantial contributions of both AMD and the community. All hypotheses are stated in a falsifiable form and remain subject to revision in light of new evidence.
