Visual Guide to the Inference Flow

Series: Visual Guide (イメージでわかる)

A gentle, factory-style explanation of how input is computed on inside a model and finally becomes an answer during inference.

1. What Is "Inference"?

Inference is the process of feeding new data into an already-trained AI model and getting an answer back.

AI has two major phases: training and inference. Training is like reading a textbook hundreds of times to build up knowledge. Inference is like using that knowledge to answer the questions on a test.

Typing a question into ChatGPT and getting a reply, or giving an image-generation AI a prompt and getting a picture back: both are inference.

🏭 Analogy: Inference is like a factory line. You feed in raw materials (input data), they pass through several processing stations, and a finished product (the output, or answer) comes out at the end.

2. How Inference Flows

Step 1: Prepare the Input

Convert text or images into numbers the model can read (tokens, tensors). This is the stage where raw materials are shaped so they are easy to process.
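To make the input-preparation step concrete, here is a minimal sketch of turning text into token IDs. The vocabulary `VOCAB` and the `encode` helper are hypothetical names invented for illustration; real tokenizers use much larger vocabularies and usually split text into subword pieces rather than whole words.

```python
# Toy vocabulary: a hypothetical word-to-ID table, not a real tokenizer.
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "down": 4}


def encode(text):
    """Map each lowercased, whitespace-separated word to its integer ID."""
    unknown_id = VOCAB["<unk>"]
    return [VOCAB.get(word, unknown_id) for word in text.lower().split()]


token_ids = encode("The cat sat")
print(token_ids)  # [1, 2, 3]
```

Words that are missing from the table fall back to the `<unk>` ID, so the model always receives numbers, never raw text.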

Step 2: Matrix Multiplication (rocBLAS's Job)

Multiply the input data by weight matrices. This is the core work of inference, repeated many times in every layer of the model, and it is where the most time goes.
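The multiplication described above can be sketched in plain Python. This is only an illustration of the arithmetic, with made-up values for the input `x` and the weights `w`; it is not how rocBLAS is called, and rocBLAS performs the same kind of operation on the GPU with heavily optimized kernels.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


x = [[1.0, 2.0]]            # one input row with 2 features
w = [[0.5, -1.0, 0.0],
     [1.5, 2.0, 1.0]]       # weight matrix: 2 inputs -> 3 outputs
print(matmul(x, w))         # [[3.5, 3.0, 2.0]]
```

Each output value is a sum of products over the shared inner dimension, which is why the cost grows quickly as layers get wider.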

Step 3: Convolution & Activation (MIOpen's Job)

Image models use convolution to extract patterns. Whatever the model type, an "activation function" then adjusts the intermediate results.
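As a rough illustration of convolution followed by an activation, the sketch below slides a small kernel over a 1D signal and then applies ReLU. The 1D shape, the kernel values, and ReLU as the activation are all assumptions chosen to keep the example short; image models work on 2D data, and this says nothing about how MIOpen implements either operation.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1) and sum the products.

    Like most deep-learning frameworks, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Activation: keep positive values, replace negatives with zero."""
    return [max(0.0, v) for v in values]


signal = [1.0, 3.0, 2.0, 5.0]
edge_kernel = [1.0, -1.0]    # responds when a value is larger than its right neighbour
features = conv1d(signal, edge_kernel)
print(features)              # [-2.0, 1.0, -3.0]
print(relu(features))        # [0.0, 1.0, 0.0]
```

The kernel picks out one local pattern (here, a drop from 3.0 to 2.0), and the activation keeps only the positions where that pattern was found.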

Step 4: Repeat Across Layers

Repeat Steps 2 and 3 once for each layer in the model. Large models have anywhere from tens to hundreds of layers.
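The repetition across layers amounts to a loop in which each layer's output becomes the next layer's input. The sketch below assumes a tiny two-layer network with invented weights and uses ReLU as the activation; `dense_relu` is a hypothetical helper, not a library function.

```python
def dense_relu(x, weights):
    """One layer: multiply a vector by a weight matrix, then apply ReLU."""
    cols = len(weights[0])
    out = [sum(x[p] * weights[p][j] for p in range(len(x))) for j in range(cols)]
    return [max(0.0, v) for v in out]


# Two layers with made-up weights; real models hold many more, far larger ones.
layers = [
    [[1.0, -1.0], [0.5, 2.0]],
    [[2.0, 0.0], [-1.0, 1.0]],
]

activations = [1.0, 2.0]           # the prepared input
for weights in layers:             # same two operations, once per layer
    activations = dense_relu(activations, weights)
print(activations)                 # [1.0, 3.0]
```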

Step 5: Return the Output

Convert the final layer's result into text (the next word) or an image (pixels) and return it. This is "shipping the finished product."
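For a text model, one common way to turn the final layer's numbers into a word is to convert them to probabilities with softmax and pick the most likely entry. The sketch below assumes that approach (greedy selection); the word table `ID_TO_WORD` and the `logits` values are invented for illustration, and real systems often sample from the probabilities instead of always taking the top one.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


ID_TO_WORD = ["<unk>", "the", "cat", "sat", "down"]   # hypothetical vocabulary
logits = [0.1, 0.3, 0.2, 2.5, 1.0]                    # made-up final-layer scores

probs = softmax(logits)
next_id = max(range(len(probs)), key=probs.__getitem__)  # index of the highest probability
print(ID_TO_WORD[next_id])                               # sat
```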

💡 Key point: Step 2 (matrix multiplication) accounts for most of the computation time in inference. That is why the speed of rocBLAS so strongly affects the speed of the whole model.
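A back-of-the-envelope operation count suggests why matrix multiplication dominates. The sketch below compares the arithmetic in one fully connected layer with the work done by an elementwise activation on its output. The layer size of 4096 is an assumed example, and the count covers arithmetic only; measured time also depends on memory traffic and on the hardware.

```python
def layer_op_counts(batch, in_features, out_features):
    """Rough operation counts for one dense layer and its activation."""
    # Each output value needs in_features multiplies and about as many adds.
    matmul_ops = 2 * batch * in_features * out_features
    # An elementwise activation touches each output value once.
    activation_ops = batch * out_features
    return matmul_ops, activation_ops


matmul_ops, activation_ops = layer_op_counts(batch=1, in_features=4096, out_features=4096)
print(matmul_ops)                    # 33554432
print(activation_ops)                # 4096
print(matmul_ops // activation_ops)  # 8192
```

Under these assumptions the multiplication does thousands of times more arithmetic than the activation, which fits the point that the matrix step sets the pace.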

3. Who Does What

rocBLAS → Handles Step 2. The specialist that performs matrix multiplication at very high speed.

MIOpen → Handles Step 3. The convolution and activation expert. For each GPU, it picks the fastest available method (a "solver" is one way of carrying out the computation).

HIP Runtime → The manager of the whole factory. It works behind the scenes, allocating memory on the GPU and sending it computation commands.

When you run inference on gfx900, some of the solvers used in Steps 2 and 3 are available and others are not. When a solver is unavailable, a fallback path, that is, a somewhat slower backup route, takes over.
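The idea of choosing a solver and falling back can be sketched as a simple lookup. Everything in this example is hypothetical: the solver names, the speed numbers, and the lists of supported targets are invented to show the selection logic, and they do not reflect the real rocBLAS or MIOpen solver tables or APIs.

```python
# Hypothetical solver table: names, speeds, and supported targets are made up.
SOLVERS = [
    {"name": "fast_specialized", "supports": {"gfx90a", "gfx942"}, "relative_speed": 3.0},
    {"name": "generic_fallback", "supports": None, "relative_speed": 1.0},  # None = any target
]


def pick_solver(target):
    """Return the fastest solver usable on the given GPU target."""
    usable = [s for s in SOLVERS if s["supports"] is None or target in s["supports"]]
    return max(usable, key=lambda s: s["relative_speed"])["name"]


print(pick_solver("gfx90a"))  # fast_specialized
print(pick_solver("gfx900"))  # generic_fallback
```

Because the generic entry accepts any target, the computation still runs on a GPU the specialized solver does not cover, only more slowly.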

📖 Read next:
Training vs Inference: compare the two modes side by side
Visual Guide to the Training Flow: the counterpart to inference, showing how a model grows
Visual Linear Algebra: dive deeper into matrix operations
Visual Convolution for Beginners: understand convolution in detail
Visual Deep Learning: detailed page with a solver availability table
