ROCm Ecosystem Map

A visual map of the repositories, library stack, and hardware support that make up ROCm, organized as diagrams and cards. It also shows where gfx900 (Vega / RX Vega) sits in the ecosystem.

1. Repository topology

ROCm is developed and distributed across many GitHub repositories. Since roughly 2023–2024, these have been consolidating into umbrella repos such as TheRock and rocm-libraries.

    You do not need to memorise every name here. The number of names can feel overwhelming, but at first it is enough to follow four layers: apps at the top, compute libraries in the middle, the runtime and driver below them, and the GPU itself at the bottom.
  

flowchart TD
    MANIFEST["ROCm manifest\namdgpu/rocm-release"] --> THEROCK["TheRock\nmonorepo build system"]
    MANIFEST --> RSYS["rocm-systems\nruntime + driver"]
    MANIFEST --> RLIBS["rocm-libraries\nmath + inference"]
    MANIFEST --> COMPILER["llvm-project\n(AMD fork)\nHIP / LLVM backend"]
    THEROCK --> MIOPEN["MIOpen\nDNN kernel library"]
    THEROCK --> ROCBLAS["rocBLAS\nBLAS for GPU"]
    THEROCK --> CK["composable_kernel\ntile-based kernels"]
    RLIBS --> MIOPEN
    RLIBS --> ROCBLAS
    RLIBS --> TENSILE["Tensile\nGEMM code-gen"]
    RSYS --> ROCR["ROCr\nruntime / HSA"]
    RSYS --> HIP_RT["HIP runtime\nhip/include"]
    ROCBLAS --> TENSILE
    MIOPEN --> CK
    MIOPEN --> ROCMLIR["rocMLIR\nMLIR-based iGEMM"]
    COMPILER --> HIP_RT
    HIP_RT --> ROCR
    ROCR --> GPU["GPU hardware\ngfx900 / 906 / 908 / ..."]
    style THEROCK fill:#EDE7F6,stroke:#5B4B8A
    style RLIBS fill:#EDE7F6,stroke:#5B4B8A
    style RSYS fill:#EDE7F6,stroke:#5B4B8A
    style MANIFEST fill:#EDE7F6,stroke:#5B4B8A
    style COMPILER fill:#E8F5E9,stroke:#2E7D32
    style MIOPEN fill:#E3F2FD,stroke:#1565C0
    style ROCBLAS fill:#E3F2FD,stroke:#1565C0
    style CK fill:#E3F2FD,stroke:#1565C0
    style TENSILE fill:#E3F2FD,stroke:#1565C0
    style ROCMLIR fill:#E3F2FD,stroke:#1565C0
    style ROCR fill:#FFF3E0,stroke:#E65100
    style HIP_RT fill:#FFF3E0,stroke:#E65100
    style GPU fill:#FFEBEE,stroke:#B71C1C

Arrows indicate dependency or inclusion relationships (build dependencies or submodules).
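
The arrows in the figure can also be written down as a plain adjacency map, which makes it easy to ask what a given repository pulls in. The sketch below is only an illustration: the node names are copied from the figure, while `DEPENDS_ON` and `reachable` are hypothetical names and not part of any ROCm tool.

```python
from collections import deque

# Edges copied from the repository topology figure
# (an arrow means "depends on or includes").
DEPENDS_ON = {
    "ROCm manifest": ["TheRock", "rocm-systems", "rocm-libraries", "llvm-project"],
    "TheRock": ["MIOpen", "rocBLAS", "composable_kernel"],
    "rocm-libraries": ["MIOpen", "rocBLAS", "Tensile"],
    "rocm-systems": ["ROCr", "HIP runtime"],
    "rocBLAS": ["Tensile"],
    "MIOpen": ["composable_kernel", "rocMLIR"],
    "llvm-project": ["HIP runtime"],
    "HIP runtime": ["ROCr"],
    "ROCr": ["GPU hardware"],
}


def reachable(start):
    """Return every node reachable from `start` by following arrows (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in DEPENDS_ON.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable("MIOpen")))        # ['composable_kernel', 'rocMLIR']
print(sorted(reachable("llvm-project")))  # ['GPU hardware', 'HIP runtime', 'ROCr']
```

As drawn, the figure has no arrow from the library nodes down to the HIP runtime, so in this map only the compiler and rocm-systems branches reach the GPU node.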

Umbrella repo

TheRock

AMD's monorepo-style build system. It builds rocBLAS, MIOpen, HIP, and other components in one pass. Development picked up in earnest from 2023 onward.

Umbrella repo

rocm-libraries

An umbrella repo that consolidates release management for the math libraries (rocBLAS, MIOpen, rocFFT, etc.).

Inference lib

MIOpen

Handles the convolutions and matrix computations common in AI workloads. It provides DNN kernels (convolution, GEMM, etc.) through a solver registry: each solver is one candidate way to do the computation, and MIOpen picks a fast-looking one from those applicable to the current GPU.

Math library

rocBLAS

A BLAS implementation for GPUs. It handles operations such as GEMM (multiplying large matrices) and relies on Tensile to generate assembly kernels for each GPU generation.

Code generation

Tensile

A GEMM kernel code-generation tool, in effect a kernel factory for rocBLAS. It produces assembly kernels optimized for each GPU generation, which are bundled into rocBLAS.

Kernel library

composable_kernel (CK)

A tile-based GPU kernel library built from generic templates. Kernels are composed from small compute blocks, and MIOpen, among others, uses them as solver backends.

Compiler

llvm-project (AMD fork)

The compiler infrastructure that translates HIP → LLVM IR → AMD GCN/RDNA/CDNA assembly.

Runtime

ROCr / HIP runtime

The reception desk that sends work to the GPU. It handles memory allocation, kernel launch, and GPU detection, connecting the software above it to the driver.

2. Software stack execution flow

Shows which layers a computation passes through on its way from user code (e.g. PyTorch) down to GPU execution.

    How to read this diagram: a solver is one candidate way to do the computation, and a fallback is the backup route taken when the fast path is unavailable. HSA can be thought of simply as the foundation for handing work to the GPU in order.
  

flowchart TD
    USER["User code\ne.g. PyTorch / TensorFlow"]
    USER --> FW["Framework layer\nPyTorch ATen / TF XLA"]
    FW --> MIOPEN_SYM["MIOpen API\nconv / GEMM / pooling"]
    FW --> ROCBLAS_SYM["rocBLAS API\nSGEMM / DGEMM"]
    MIOPEN_SYM --> SOLVER["Solver registry\nIsApplicable() per GPU"]
    SOLVER --> MLIR_PATH["rocMLIR path\nMLIR iGEMM"]
    SOLVER --> ASM_PATH["ASM path\nWinograd / v4r1"]
    SOLVER --> NAIVE_PATH["Naive fallback\nfor unsupported GPUs"]
    ROCBLAS_SYM --> TENSILE_RT["Tensile kernel\nper-GPU HSACO"]
    ROCBLAS_SYM --> CK_RT["composable_kernel\ntile backend"]
    MLIR_PATH --> HIP_API["HIP API\nhipLaunchKernel"]
    ASM_PATH --> HIP_API
    NAIVE_PATH --> HIP_API
    TENSILE_RT --> HIP_API
    CK_RT --> HIP_API
    HIP_API --> ROCR_RT["ROCr runtime\nHSA queue / signals"]
    ROCR_RT --> DRIVER["amdgpu kernel driver"]
    DRIVER --> GPU_HW["GPU hardware"]
    style USER fill:#E8F5E9,stroke:#2E7D32
    style FW fill:#E8F5E9,stroke:#2E7D32
    style MIOPEN_SYM fill:#E3F2FD,stroke:#1565C0
    style ROCBLAS_SYM fill:#E3F2FD,stroke:#1565C0
    style SOLVER fill:#EDE7F6,stroke:#5B4B8A
    style MLIR_PATH fill:#FFEBEE,stroke:#B71C1C
    style ASM_PATH fill:#E3F2FD,stroke:#1565C0
    style NAIVE_PATH fill:#FFF3E0,stroke:#E65100
    style TENSILE_RT fill:#E3F2FD,stroke:#1565C0
    style CK_RT fill:#E3F2FD,stroke:#1565C0
    style HIP_API fill:#FFF3E0,stroke:#E65100
    style ROCR_RT fill:#FFF3E0,stroke:#E65100
    style DRIVER fill:#FFEBEE,stroke:#B71C1C
    style GPU_HW fill:#FFEBEE,stroke:#B71C1C

The solver registry first checks which solvers are applicable on the current GPU, then selects a kernel path from among them.
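
The check-then-select behaviour of the solver registry can be sketched in a few lines. This is a simplified model, not MIOpen's implementation: the `Solver` class, the feature strings `"mlir"` and `"asm"`, and the fixed priority order are all assumptions made for illustration. The only idea taken from the page is that each solver answers an `IsApplicable()`-style question and that a naive fallback accepts any GPU.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Solver:
    name: str
    # Hypothetical stand-in for IsApplicable(): receives the GPU's feature set.
    is_applicable: Callable[[FrozenSet[str]], bool]


# Ordered from most to least preferred (an assumption of this sketch);
# the last entry accepts every GPU, so selection cannot fail.
REGISTRY: List[Solver] = [
    Solver("mlir_igemm", lambda caps: "mlir" in caps),
    Solver("asm_winograd", lambda caps: "asm" in caps),
    Solver("naive", lambda caps: True),
]


def pick_solver(caps: FrozenSet[str]) -> str:
    """Return the name of the first solver whose applicability check passes."""
    for solver in REGISTRY:
        if solver.is_applicable(caps):
            return solver.name
    raise LookupError("no applicable solver")


print(pick_solver(frozenset({"asm"})))  # asm_winograd
print(pick_solver(frozenset()))         # naive
```

Because the naive entry always applies, removing a fast path for one GPU changes which solver is chosen rather than whether one is found.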

3. Where gfx900 sits in the ecosystem

gfx900 (the Vega / RX Vega generation) is handled differently at each layer of the ROCm stack.

    Key point: gfx900 is not an officially supported GPU, yet the capability-based fallback design in the codebase keeps many paths alive. Rather than disappearing all at once, support fades gradually as the newer, faster routes are lost one by one.
  

%%{init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#E3F2FD', 'primaryBorderColor': '#1565C0' } } }%%
flowchart LR
    G_HIP["HIP / compile\ngfx900 target"]
    G_ROCR["ROCr runtime\nkernel launch"]
    G_MIOPEN["MIOpen solvers\ncapability check"]
    G_ROCBLAS["rocBLAS\ngeneric GEMM"]
    G_PERF["Perf DB\n169K entries"]
    G_HIP --> OK1["Compile OK"]
    G_ROCR --> OK2["Runtime OK"]
    G_MIOPEN --> BLOCKED["MLIR iGEMM\nBLOCKED 2407d2f"]
    G_MIOPEN --> OK3["ASM v4r1/Winograd\npass"]
    G_MIOPEN --> PARTIAL["INT8 naive\nfallback only"]
    G_ROCBLAS --> OK4["GEMM OK 128 files"]
    G_PERF --> OK5["Perf DB OK ROCm 7.2"]
    style BLOCKED fill:#FFEBEE,stroke:#B71C1C,color:#B71C1C
    style PARTIAL fill:#FFF3E0,stroke:#E65100,color:#E65100
    style OK1 fill:#E8F5E9,stroke:#2E7D32,color:#2E7D32
    style OK2 fill:#E8F5E9,stroke:#2E7D32,color:#2E7D32
    style OK3 fill:#E8F5E9,stroke:#2E7D32,color:#2E7D32
    style OK4 fill:#E8F5E9,stroke:#2E7D32,color:#2E7D32
    style OK5 fill:#E8F5E9,stroke:#2E7D32,color:#2E7D32

Green = working, Red = explicitly excluded, Orange = partial/fallback only

Layer	Component	gfx900 status	Evidence
Compiler	LLVM / HIP	✓ Works	gfx900 arch definition present
Runtime	ROCr / HIP RT	✓ Works	HSA queue mechanism is arch-agnostic
Convolution	MIOpen MLIR iGEMM	✗ Excluded	Explicitly disabled in commit 2407d2f
Convolution	MIOpen ASM v4r1 / Winograd	✓ Works	Passes IsApplicable() check
Convolution	MIOpen INT8	△ Naive only	No dot4 instruction → naive fallback
Matrix ops	rocBLAS GEMM	✓ Works	128 gfx900-specific files present
Perf tuning	Perf DB	✓ Included	169,182 entries in ROCm 7.2
xdlops matrix	rocBLAS / MIOpen xdlops	✗ N/A	xdlops instructions require gfx908 or later

4. GPU generations and ROCm support scope

A map of the GPU generations ROCm targets, alongside the instruction set additions each one brought. The kernel paths available change from generation to generation.

%%{init: { 'theme': 'base' } }%%
timeline
    title GPU generations and key instruction sets
    section GCN era (legacy)
        gfx900 (Vega10) : 2017, no xdlops, no dot4 : ASM / Winograd paths only
        gfx906 (Vega20) : 2018, adds dot4 (INT8) : Radeon VII / Instinct MI50
    section CDNA era (data center)
        gfx908 (Arcturus) : 2020, adds xdlops matrix : Instinct MI100
        gfx90a (Aldebaran) : 2021, xdlops + FP64 matrix : Instinct MI200
        gfx942 (Antares) : 2024, xdlops gen4 : Instinct MI300
    section RDNA era (gaming / consumer)
        gfx1030 (Navi21) : 2020, new ISA branch : Radeon RX 6000 series
        gfx1100 (Navi31) : 2022, RDNA 3 : Radeon RX 7000 series

gfx900 is the oldest target shown in this timeline. With neither xdlops nor dot4, it cannot use the fast paths built for later generations.
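
The GCN and CDNA rows of the timeline amount to a small feature table. The sketch below records only what the timeline says each target adds, and assumes that features carry over to later targets in that list; that carry-over, along with the names `ADDED`, `features`, and `int8_path`, is an assumption of this example rather than something taken from ROCm. The RDNA targets are left out because the timeline describes them as a separate ISA branch.

```python
# What each target ADDS, in timeline order (gfx9 / CDNA line only).
ADDED = [
    ("gfx900", set()),        # no dot4, no xdlops
    ("gfx906", {"dot4"}),     # adds dot4 (INT8)
    ("gfx908", {"xdlops"}),   # adds xdlops matrix instructions
]


def features(arch):
    """Accumulate features up to and including `arch` (assumes carry-over)."""
    acc = set()
    for name, added in ADDED:
        acc |= added
        if name == arch:
            return acc
    raise KeyError(arch)


def int8_path(arch):
    """Mirror the status table: without dot4, INT8 takes the naive fallback."""
    return "dot4" if "dot4" in features(arch) else "naive fallback"


for arch in ("gfx900", "gfx906", "gfx908"):
    print(arch, sorted(features(arch)), int8_path(arch))
```

Read this way, gfx900 starts with an empty feature set, which is why the table in section 3 lists it as naive-only for INT8 and not applicable for xdlops.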

5. Why gfx900 has not disappeared

Code for gfx900 persists after official support has ended because of two things in combination: ROCm's capability-based design and the cost of actively deleting code.

flowchart LR
    CB["Capability-based design\nIsApplicable() runtime check\nno hard blocklist"]
    COST["Deletion cost\nbreaks fallback chains\nregression risk"]
    COMM["Community maintenance\nWinograd / ASM v4r1\nauthors active"]
    PERF_DB_NODE["Perf DB entries\nshipped per release\nno active removal"]
    CB & COST & COMM & PERF_DB_NODE --> RESULT["gfx900 keeps working\nvia fallback paths"]
    style CB fill:#E3F2FD,stroke:#1565C0
    style COST fill:#FFF3E0,stroke:#E65100
    style COMM fill:#E8F5E9,stroke:#2E7D32
    style PERF_DB_NODE fill:#EDE7F6,stroke:#5B4B8A
    style RESULT fill:#FFEBEE,stroke:#B71C1C

Four overlapping factors leave gfx900 in a staged retreat: not every path survives, but usable ones remain.

    Summary: in ROCm, "officially supported" and "actually works" do not always align. gfx900 is outside official support, yet capability-based branching, the fallback mechanism, and the cost of active removal keep functional paths alive in the codebase. This is not neglect; it is a side-effect of the design principles. In ROCm terms, what matters is less the name of a GPU than what that GPU can do.
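
The difference between asking for a GPU by name and asking what it can do can be shown with two tiny checks. Everything here is illustrative: `OFFICIALLY_SUPPORTED` is a made-up list for the example, not AMD's support matrix, and the function names are hypothetical.

```python
# Hypothetical allowlist, for illustration only (not AMD's support matrix).
OFFICIALLY_SUPPORTED = {"gfx908", "gfx90a", "gfx942"}


def allowed_by_name(arch):
    """Name-based gate: a GPU must be listed explicitly to get in."""
    return arch in OFFICIALLY_SUPPORTED


def allowed_by_capability(required, available):
    """Capability-based gate: any GPU that has the required features gets in."""
    return set(required) <= set(available)


gfx900_features = set()  # per the timeline: no dot4, no xdlops

print(allowed_by_name("gfx900"))                           # False
print(allowed_by_capability(set(), gfx900_features))       # True: a kernel with no special needs
print(allowed_by_capability({"xdlops"}, gfx900_features))  # False: a kernel that needs xdlops
```

Under a name-based gate gfx900 would be shut out entirely, whereas under a capability-based gate it loses only the kernels whose requirements it cannot meet.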
  

vega-hbmx-pages | ROCm Ecosystem Map | 2026-03-15