Azure OpenAI reasoning models

2025-06-20

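Azure OpenAI o-series models are designed to tackle reasoning and problem-solving tasks with greater focus and capability. These models spend more time processing and understanding the user's request, which makes them considerably stronger than previous iterations in areas such as science, coding, and math.

Key capabilities of the o-series models:

Complex code generation: Generates algorithms and handles advanced coding tasks to support developers.
Advanced problem solving: Well suited to comprehensive brainstorming sessions and multifaceted challenges.
Complex document comparison: Well suited to analyzing contracts, case files, or legal documents to identify subtle differences.
Instruction following and workflow management: Particularly effective for managing workflows that require shorter contexts.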

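Availability

Region availability

Model	Region	Limited access
`o3-pro`	East US 2 and Sweden Central (Global Standard)	Request access: o3 limited access model application. If you already have `o3` access, no request is required for `o3-pro`.
`codex-mini`	East US 2 and Sweden Central (Global Standard)	No access request needed.
`o4-mini`	Model availability	No access request needed to use the core capabilities of this model. Request access: o4-mini reasoning summary feature
`o3`	Model availability	Request access: o3 limited access model application
`o3-mini`	Model availability	Access is no longer restricted for this model.
`o1`	Model availability	Access is no longer restricted for this model.
`o1-preview`	Model availability	This model is only available to customers who were granted access as part of the original limited access release. Access to `o1-preview` is not currently being expanded.
`o1-mini`	Model availability	No access request needed for Global Standard deployments. Standard (regional) deployments are currently only available to customers who were previously granted access as part of the `o1-preview` release.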

API and feature support

Feature	codex-mini, 2025-05-16	o3-pro, 2025-06-10	o4-mini, 2025-04-16	o3, 2025-04-16	o3-mini, 2025-01-31	o1, 2024-12-17	o1-preview, 2024-09-12	o1-mini, 2024-09-12
API version	`2025-04-01-preview` and v1 preview	`2025-04-01-preview` and v1 preview	`2025-04-01-preview`	`2025-04-01-preview`	`2024-12-01-preview` or later; `2025-03-01-preview` (recommended)	`2024-12-01-preview` or later; `2025-03-01-preview` (recommended)	`2024-09-01-preview` or later; `2025-03-01-preview` (recommended)	`2024-09-01-preview` or later; `2025-03-01-preview` (recommended)
Developer messages	✅	✅	✅	✅	✅	✅	-	-
Structured outputs	✅	✅	✅	✅	✅	✅	-	-
Context window	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 128,000 Output: 32,768	Input: 128,000 Output: 65,536
Reasoning effort	✅	✅	✅	✅	✅	✅	-	-
Image input	✅	✅	✅	✅	-	✅	-	-
Chat Completions API	-	-	✅	✅	✅	✅	✅	✅
Responses API	✅	✅	✅	✅	-	-	-	-
Functions/tools	✅	✅	✅	✅	✅	✅	-	-
Parallel tool calls	-	-	-	-	-	-	-	-
`max_completion_tokens`¹	✅	✅	✅	✅	✅	✅	✅	✅
System messages ²	✅	✅	✅	✅	✅	✅	-	-
Reasoning summary ³	✅	-	✅	✅	-	-	-	-
Streaming ⁴	✅	-	✅	✅	✅	-	-	-

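¹ Reasoning models only work with the max_completion_tokens parameter.

² The latest o-series models support system messages to make migration easier. When you use a system message with o4-mini, o3, o3-mini, or o1, it is treated as a developer message. Do not use both a developer message and a system message in the same API request.

³ For o3 and o4-mini, chain-of-thought reasoning summaries are available through limited access only.

⁴ Streaming for o3 is available through limited access only.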
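The rule in footnote ² can be checked on the client before a request is sent. The following is a minimal sketch rather than an SDK feature: `normalize_messages` is a hypothetical helper name, and it only makes sense for models that the table above marks as supporting developer messages.

```python
from typing import Dict, List

Message = Dict[str, str]


def normalize_messages(messages: List[Message]) -> List[Message]:
    """Return a copy of `messages` with system roles rewritten to developer.

    Hypothetical helper, not part of the OpenAI SDK. Raises ValueError when
    the input mixes system and developer messages, which footnote 2 advises
    against.
    """
    roles = {m["role"] for m in messages}
    if "system" in roles and "developer" in roles:
        raise ValueError("Use either a system or a developer message, not both.")
    return [
        {**m, "role": "developer"} if m["role"] == "system" else dict(m)
        for m in messages
    ]


if __name__ == "__main__":
    chat = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ]
    print(normalize_messages(chat))
```

The service already treats a system message as a developer message for these models, so the rewrite is cosmetic; the useful part is the early error when both roles appear in one request.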

Note

To avoid timeouts, background mode is recommended for o3-pro.
o3-pro does not currently support image generation.
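A background request returns before the model has finished, so the caller has to poll for the result. The sketch below shows one way to do that under stated assumptions: `wait_for_response` is a hypothetical helper, the pending status names `queued` and `in_progress` are assumed, and the retrieval function is passed in so the loop does not depend on a particular SDK version.

```python
import time
from typing import Any, Callable

# Assumed non-terminal statuses of a background response.
PENDING_STATUSES = {"queued", "in_progress"}


def wait_for_response(
    retrieve: Callable[[str], Any],
    response_id: str,
    poll_seconds: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `retrieve(response_id)` until its `.status` is no longer pending.

    Hypothetical helper. Raises TimeoutError after `max_polls` attempts.
    """
    for _ in range(max_polls):
        response = retrieve(response_id)
        if response.status not in PENDING_STATUSES:
            return response
        sleep(poll_seconds)
    raise TimeoutError(f"Response {response_id} still pending after {max_polls} polls")


# Possible usage with an AzureOpenAI client, assuming an SDK version whose
# Responses API accepts background=True:
#   job = client.responses.create(model="o3-pro", input="...", background=True)
#   final = wait_for_response(client.responses.retrieve, job.id)
```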

Not supported

The following parameters are currently not supported with reasoning models:

temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs, logit_bias, max_tokens
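When existing code is pointed at a reasoning model, these parameters have to be removed from the request. The sketch below is one way to do that; `to_reasoning_kwargs` is a hypothetical helper, and carrying `max_tokens` over to `max_completion_tokens` is a design choice of this sketch, not documented service behavior.

```python
from typing import Any, Dict

# Parameters listed above as unsupported by reasoning models.
UNSUPPORTED_PARAMS = frozenset({
    "temperature", "top_p", "presence_penalty", "frequency_penalty",
    "logprobs", "top_logprobs", "logit_bias", "max_tokens",
})


def to_reasoning_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `kwargs` without the unsupported parameters.

    A `max_tokens` value is reused as `max_completion_tokens` unless the
    caller already set `max_completion_tokens` explicitly.
    """
    cleaned = {k: v for k, v in kwargs.items() if k not in UNSUPPORTED_PARAMS}
    if "max_tokens" in kwargs:
        cleaned.setdefault("max_completion_tokens", kwargs["max_tokens"])
    return cleaned


if __name__ == "__main__":
    legacy = {"model": "o1", "temperature": 0.2, "max_tokens": 500}
    print(to_reasoning_kwargs(legacy))
    # {'model': 'o1', 'max_completion_tokens': 500}
```

In the sample response in this article, the hidden reasoning tokens are counted inside the completion total, so a limit tuned for `max_tokens` on another model may be too small once it is reused this way.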

Usage

These models don't currently support the same set of parameters as other models that use the Chat Completions API.

You need to upgrade your OpenAI client library to access the latest parameters.

pip install openai --upgrade

If you're new to using Microsoft Entra ID for authentication, see "How to configure Azure OpenAI in Azure AI Foundry Models with Microsoft Entra ID authentication."

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  azure_ad_token_provider=token_provider,
  api_version="2025-03-01-preview"
)

response = client.chat.completions.create(
    model="o1-new", # replace with the model deployment name of your o1-preview, or o1-mini model
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000

)

print(response.model_dump_json(indent=2))

You might need to upgrade your version of the OpenAI Python library to take advantage of new parameters such as max_completion_tokens.

pip install openai --upgrade

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version="2025-03-01-preview"
)

response = client.chat.completions.create(
    model="o1-new", # replace with the model deployment name of your o1 deployment.
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000

)

print(response.model_dump_json(indent=2))

using Azure.AI.OpenAI;
using Azure.AI.OpenAI.Chat;
using Azure.Identity;
using OpenAI.Chat;

AzureOpenAIClient openAIClient = new(
    new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/"),
    new DefaultAzureCredential());
ChatClient chatClient = openAIClient.GetChatClient("o3-mini"); //model deployment name

ChatCompletionOptions options = new ChatCompletionOptions
{
    MaxOutputTokenCount = 100000
};

#pragma warning disable AOAI001 //currently required to use MaxOutputTokenCount

options.SetNewMaxCompletionTokensPropertyEnabled(true);

ChatCompletion completion = chatClient.CompleteChat(
    [

        new UserChatMessage("Testing 1,2,3")
    ],
    options); // Pass the options to the CompleteChat method

Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");

Python output:

{
  "id": "chatcmpl-AEj7pKFoiTqDPHuxOcirA9KIvf3yz",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Writing your first Python API is an exciting step in developing software that can communicate with other applications. An API (Application Programming Interface) allows different software systems to interact with each other, enabling data exchange and functionality sharing. Here are the steps you should consider when creating your first Python API...truncated for brevity.",
        "refusal": null,
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1728073417,
  "model": "o1-2024-12-17",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_503a95a7d8",
  "usage": {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {
      "audio_tokens": null,
      "reasoning_tokens": 448
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0
    }
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "custom_blocklists": {
          "filtered": false
        },
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}

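Reasoning effort

Note

Reasoning models return reasoning_tokens as part of completion_tokens_details in the model response. These are hidden tokens that are not returned as part of the message response content, but the model uses them to generate the final answer to your request. 2024-12-01-preview adds a new parameter, reasoning_effort, which can be set to low, medium, or high with the latest o1 model. The higher the effort setting, the longer the model spends processing the request, which generally results in a larger number of reasoning_tokens.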
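To see how much of a completion went to hidden reasoning, the usage block can be split into its two parts. This is a minimal sketch: `split_completion_tokens` is a hypothetical helper, and it assumes that completion_tokens already includes reasoning_tokens, which matches the sample output above (20 prompt tokens + 1843 completion tokens = 1863 total).

```python
from typing import Any, Dict


def split_completion_tokens(usage: Dict[str, Any]) -> Dict[str, int]:
    """Split completion tokens into hidden reasoning and visible output.

    Hypothetical helper. Assumes `completion_tokens` includes
    `reasoning_tokens`; a missing or null details block counts as zero.
    """
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    return {"reasoning": reasoning, "visible": usage["completion_tokens"] - reasoning}


if __name__ == "__main__":
    # Usage values copied from the sample Python output in this article.
    usage = {
        "completion_tokens": 1843,
        "prompt_tokens": 20,
        "total_tokens": 1863,
        "completion_tokens_details": {"audio_tokens": None, "reasoning_tokens": 448},
    }
    print(split_completion_tokens(usage))  # {'reasoning': 448, 'visible': 1395}
```

With a live response object, the same dictionary should be obtainable as `response.model_dump()["usage"]`.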

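Developer messages

Functionally, developer messages ("role": "developer") are the same as system messages.

Adding a developer message to the previous code example looks like this:

You need to upgrade your OpenAI client library to access the latest parameters.

pip install openai --upgrade

If you're new to using Microsoft Entra ID for authentication, see "How to configure Azure OpenAI with Microsoft Entra ID authentication."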

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  azure_ad_token_provider=token_provider,
  api_version="2025-03-01-preview"
)

response = client.chat.completions.create(
    model="o1-new", # replace with the model deployment name of your o1-preview, or o1-mini model
    messages=[
        {"role": "developer","content": "You are a helpful assistant."}, # optional equivalent to a system message for reasoning models 
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000,
    reasoning_effort = "medium" # low, medium, or high

)

print(response.model_dump_json(indent=2))

You might need to upgrade your version of the OpenAI Python library to take advantage of new parameters such as max_completion_tokens.

pip install openai --upgrade

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
  api_version="2025-03-01-preview"
)

response = client.chat.completions.create(
    model="o1-new", # replace with the model deployment name of your o1 deployment.
    messages=[
        {"role": "developer","content": "You are a helpful assistant."}, # optional equivalent to a system message for reasoning models 
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000,
    reasoning_effort = "medium" # low, medium, or high
)

print(response.model_dump_json(indent=2))

using Azure.AI.OpenAI;
using Azure.AI.OpenAI.Chat;
using Azure.Identity;
using OpenAI.Chat;

AzureOpenAIClient openAIClient = new(
    new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/"),
    new DefaultAzureCredential());
ChatClient chatClient = openAIClient.GetChatClient("o3-mini"); //model deployment name

ChatCompletionOptions options = new ChatCompletionOptions
{
    ReasoningEffortLevel = ChatReasoningEffortLevel.Low,
    MaxOutputTokenCount = 100000
};

#pragma warning disable AOAI001 //currently required to use MaxOutputTokenCount

options.SetNewMaxCompletionTokensPropertyEnabled(true);

ChatCompletion completion = chatClient.CompleteChat(
    [
        new DeveloperChatMessage("You are a helpful assistant."),
        new UserChatMessage("Testing 1,2,3")
    ],
    options); // Pass the options to the CompleteChat method

Console.WriteLine($"{completion.Role}: {completion.Content[0].Text}");

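Reasoning summary

When you use the latest o3 and o4-mini models, you can use the reasoning summary parameter to receive summaries of the model's chain-of-thought reasoning. This parameter can be set to auto, concise, or detailed. You must request access to use this feature.

Note

Even when the feature is enabled, a reasoning summary is not generated for every step or request. This is expected behavior.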
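Because a summary can be absent, code that reads summaries should tolerate an empty result. The sketch below collects the summary text from a response that has been converted to a dictionary; `reasoning_summaries` is a hypothetical helper, and the field names (`output`, `type`, `summary`, `summary_text`, `text`) follow the sample response shown in this section.

```python
from typing import Any, Dict, List


def reasoning_summaries(response: Dict[str, Any]) -> List[str]:
    """Collect reasoning summary texts from a Responses API result dict.

    Hypothetical helper. Returns an empty list when no summary was
    generated, which the note above describes as expected behavior.
    """
    texts: List[str] = []
    for item in response.get("output") or []:
        if item.get("type") != "reasoning":
            continue  # skip the assistant message and other output items
        for part in item.get("summary") or []:
            if part.get("type") == "summary_text":
                texts.append(part["text"])
    return texts


if __name__ == "__main__":
    sample = {
        "output": [
            {"type": "reasoning", "summary": [
                {"type": "summary_text", "text": "**Summarizing neural text degeneration**"},
            ]},
            {"type": "message", "content": []},
        ]
    }
    print(reasoning_summaries(sample))
```

With a live response object, `reasoning_summaries(response.model_dump())` should produce the same structure as input.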


You need to upgrade your OpenAI client library to access the latest parameters.

pip install openai --upgrade

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  azure_ad_token_provider=token_provider,
  api_version="preview"
)

response = client.responses.create(
    input="Tell me about the curious case of neural text degeneration",
    model="o4-mini", # replace with model deployment name
    reasoning={
        "effort": "medium",
        "summary": "detailed" # auto, concise, or detailed (currently only supported with o4-mini and o3)
    }
)

print(response.model_dump_json(indent=2))

curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses?api-version=preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
 -d '{
     "model": "o4-mini",
     "input": "Tell me about the curious case of neural text degeneration",
     "reasoning": {"summary": "detailed"}
    }'

{
  "id": "resp_68007e26b2cc8190b83361014f3a78c50ae9b88522c3ad24",
  "created_at": 1744862758.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "o4-mini",
  "object": "response",
  "output": [
    {
      "id": "rs_68007e2773bc8190b5b8089949bfe13a0ae9b88522c3ad24",
      "summary": [
        {
          "text": "**Summarizing neural text degeneration**\n\nThe user's asking about \"The Curious Case of Neural Text Degeneration,\" a paper by Ari Holtzman et al. from 2020. It explains how certain decoding strategies produce repetitive and dull text. In contrast, methods like nucleus sampling yield more coherent and diverse outputs. The authors introduce metrics like surprisal and distinct-n for evaluation and suggest that maximum likelihood decoding often favors generic continuations, leading to loops and repetitive patterns in longer texts. They promote sampling from truncated distributions for improved text quality.",
          "type": "summary_text"
        },
        {
          "text": "**Explaining nucleus sampling**\n\nThe authors propose nucleus sampling, which captures a specified mass of the predictive distribution, improving metrics such as coherence and diversity. They identify a \"sudden drop\" phenomenon in token probabilities, where a few tokens dominate, leading to a long tail. By truncating this at a cumulative probability threshold, they aim to enhance text quality compared to top-k sampling. Their evaluations include human assessments, showing better results in terms of BLEU scores and distinct-n measures. Overall, they highlight how decoding strategies influence quality and recommend adaptive techniques for improved outcomes.",
          "type": "summary_text"
        }
      ],
      "type": "reasoning",
      "status": null
    },
    {
      "id": "msg_68007e35c44881908cb4651b8e9972300ae9b88522c3ad24",
      "content": [
        {
          "annotations": [],
          "text": "Researchers first became aware that neural language models, when used to generate long stretches of text with standard “maximum‐likelihood” decoding (greedy search, beam search, etc.), often produce bland, repetitive or looping output. The 2020 paper “The Curious Case of Neural Text Degeneration” (Holtzman et al.) analyzes this failure mode and proposes a simple fix—nucleus (top‑p) sampling—that dramatically improves output quality.\n\n1. The Problem: Degeneration  \n   • With greedy or beam search, models tend to pick very high‑probability tokens over and over, leading to loops (“the the the…”) or generic, dull continuations.  \n   • Even sampling with a fixed top‑k (e.g. always sample from the 40 most likely tokens) can be suboptimal: if the model’s probability mass is skewed, k may be too small (overly repetitive) or too large (introducing incoherence).\n\n2. Why It Happens: Distributional Peakedness  \n   • At each time step the model’s predicted next‐token distribution often has one or two very high‑probability tokens, then a long tail of low‑probability tokens.  \n   • Maximum‐likelihood decoding zeroes in on the peak, collapsing diversity.  \n   • Uniform sampling over a large k allows low‑probability “wild” tokens, harming coherence.\n\n3. The Fix: Nucleus (Top‑p) Sampling  \n   • Rather than fixing k, dynamically truncate the distribution to the smallest set of tokens whose cumulative probability ≥ p (e.g. p=0.9).  \n   • Then renormalize and sample from that “nucleus.”  \n   • This keeps only the “plausible” mass and discards the improbable tail, adapting to each context.\n\n4. Empirical Findings  \n   • Automatic metrics (distinct‑n, repetition rates) and human evaluations show nucleus sampling yields more diverse, coherent, on‑topic text than greedy/beam or fixed top‑k.  \n   • It also outperforms simple temperature scaling (raising logits to 1/T) because it adapts to changes in the distribution’s shape.\n\n5. Takeaways for Practitioners  \n   • Don’t default to beam search for open-ended generation—its high likelihood doesn’t mean high quality.  \n   • Use nucleus sampling (p between 0.8 and 0.95) for a balance of diversity and coherence.  \n   • Monitor repetition and distinct‑n scores if you need automatic sanity checks.\n\nIn short, “neural text degeneration” is the tendency of likelihood‐maximizing decoders to produce dull or looping text. By recognizing that the shape of the model’s probability distribution varies wildly from step to step, nucleus sampling provides an elegant, adaptive way to maintain both coherence and diversity in generated text.",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "max_output_tokens": null,
  "previous_response_id": null,
  "reasoning": {
    "effort": "medium",
    "generate_summary": null,
    "summary": "detailed"
  },
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "truncation": "disabled",
  "usage": {
    "input_tokens": 16,
    "output_tokens": 974,
    "output_tokens_details": {
      "reasoning_tokens": 384
    },
    "total_tokens": 990,
    "input_tokens_details": {
      "cached_tokens": 0
    }
  },
  "user": null,
  "store": true
}

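Markdown output

By default, the o3-mini and o1 models do not attempt to produce output that includes Markdown formatting. A common case where this behavior is undesirable is when you want the model to output code inside a Markdown code block. When the model generates output without Markdown formatting, you lose features such as syntax highlighting and copyable code blocks in interactive playground experiences. To override this new default behavior and encourage Markdown in model responses, add the string Formatting re-enabled to the beginning of your developer message.

Adding Formatting re-enabled to the beginning of your developer message does not guarantee that the model's response includes Markdown formatting; it only increases the likelihood. Internal testing found that Formatting re-enabled on its own is less effective with the o3-mini model than with the o1 model.

To improve the performance of Formatting re-enabled, you can further extend the beginning of the developer message, which often produces the desired output. Rather than adding only Formatting re-enabled to the beginning of your developer message, you can experiment with a more descriptive initial instruction, as in the following examples: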

Formatting re-enabled - please enclose code blocks with appropriate markdown tags.
Formatting re-enabled - code output should be wrapped in markdown.

Depending on your expected output, you might need to customize your initial developer message further to target your specific use case.
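The prefixing described above is easy to apply consistently with a small function. The following is a minimal sketch: `with_markdown_hint` is a hypothetical helper, and joining the prefix and the rest of the message with a newline is an assumption of this sketch, since the text only requires that the string appear at the beginning.

```python
PREFIX = "Formatting re-enabled"


def with_markdown_hint(developer_message: str, instruction: str = "") -> str:
    """Put the Formatting re-enabled string at the start of a developer message.

    Hypothetical helper. `instruction` is an optional, more descriptive hint
    appended to the prefix, as in the examples above.
    """
    if developer_message.startswith(PREFIX):
        return developer_message  # prefix already present, leave unchanged
    header = f"{PREFIX} - {instruction}" if instruction else PREFIX
    return f"{header}\n{developer_message}" if developer_message else header


if __name__ == "__main__":
    message = with_markdown_hint(
        "You are a helpful assistant.",
        "code output should be wrapped in markdown.",
    )
    print(message)
```

The returned string can be passed as the content of the "developer" role message in the earlier examples. As noted above, this raises the likelihood of Markdown output but does not guarantee it.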
