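Azure OpenAI reasoning models

Azure OpenAI reasoning models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, which makes them exceptionally strong in areas such as science, coding, and math compared to previous iterations.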

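Key capabilities of reasoning models:

Complex code generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
Advanced problem solving: Ideal for comprehensive brainstorming sessions and for addressing multifaceted challenges.
Complex document comparison: Well suited to analyzing contracts, case files, or legal documents to identify subtle differences.
Instruction following and workflow management: Particularly effective for managing workflows that require shorter contexts.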

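Availability

Region availability

Model	Region	Limited access
`gpt-5-pro`	East US2 & Sweden Central (Global Standard)	Request access: Limited access model application. If you already have access to a limited access model, no request is required.
`gpt-5-codex`	East US2 & Sweden Central (Global Standard)	Request access: Limited access model application. If you already have access to a limited access model, no request is required.
`gpt-5`	Model availability	Request access: Limited access model application. If you already have access to a limited access model, no request is required.
`gpt-5-mini`	Model availability	No access request needed.
`gpt-5-nano`	Model availability	No access request needed.
`o3-pro`	East US2 & Sweden Central (Global Standard)	Request access: Limited access model application. If you already have access to a limited access model, no request is required.
`codex-mini`	East US2 & Sweden Central (Global Standard)	No access request needed.
`o4-mini`	Model availability	No access request is needed to use the core capabilities of this model. Request access: o4-mini reasoning summary feature
`o3`	Model availability	Request access: Limited access model application
`o3-mini`	Model availability	Access is no longer restricted for this model.
`o1`	Model availability	Access is no longer restricted for this model.
`o1-mini`	Model availability	No access request is needed for Global Standard deployments. Standard (regional) deployments are currently only available to customers who were previously granted access as part of the `o1-preview` release.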

API & feature support

Feature	gpt-5-pro, 2025-10-06	gpt-5-codex, 2025-09-11	gpt-5, 2025-08-07	gpt-5-mini, 2025-08-07	gpt-5-nano, 2025-08-07
API version	v1	v1	v1	v1	v1
Developer messages	✅	✅	✅	✅	✅
Structured outputs	✅	✅	✅	✅	✅
Context window	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000	400,000 Input: 272,000 Output: 128,000
Reasoning effort	- ⁴	✅	✅	✅	✅
Image input	✅	✅	✅	✅	✅
Chat Completions API	-	-	✅	✅	✅
Responses API	✅	✅	✅	✅	✅
Functions/Tools	✅	✅	✅	✅	✅
Parallel tool calls ¹	-	✅	✅	✅	✅
`max_completion_tokens` ²	-	-	✅	✅	✅
System messages ³	✅	✅	✅	✅	✅
Reasoning summary	✅	✅	✅	✅	✅
Streaming	-	✅	✅	✅	✅

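¹ Parallel tool calls are not supported when reasoning_effort is set to minimal.

² Reasoning models only work with the max_completion_tokens parameter when using the Chat Completions API. Use max_output_tokens with the Responses API.

³ The latest reasoning models support system messages to make migration easier. Do not use both a developer message and a system message in the same API request.

⁴ gpt-5-pro only supports reasoning_effort high. This is the default value even when it is not explicitly passed to the model.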
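
Footnotes ² and ³ can be turned into small client-side checks that run before a request is sent. The following is a minimal sketch only; the helper names `token_limit_param` and `check_roles` and the API labels `"chat_completions"` and `"responses"` are hypothetical and are not part of any SDK.

```python
def token_limit_param(api: str) -> str:
    """Return the output-token limit parameter name for the given API.

    The labels "chat_completions" and "responses" are illustrative only.
    """
    names = {
        "chat_completions": "max_completion_tokens",
        "responses": "max_output_tokens",
    }
    if api not in names:
        raise ValueError(f"unknown API: {api!r}")
    return names[api]


def check_roles(messages: list[dict]) -> None:
    """Raise if one request mixes system and developer messages."""
    roles = {message["role"] for message in messages}
    if {"system", "developer"} <= roles:
        raise ValueError("use either a system or a developer message, not both")


# Example: validate a message list and pick the right limit parameter.
messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
check_roles(messages)
limit_kwargs = {token_limit_param("chat_completions"): 5000}
```

Here `limit_kwargs` can be merged into the keyword arguments of a Chat Completions call, so the parameter name is chosen in one place.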

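New GPT-5 reasoning features

Feature	Description
`reasoning_effort`	`minimal` is now supported with GPT-5 series reasoning models.* Options: `minimal`, `low`, `medium`, `high`
`verbosity`	A new parameter that provides more granular control over how concise the model's output is. Options: `low`, `medium`, `high`.
`preamble`	GPT-5 series reasoning models can spend extra time "thinking" before executing a function/tool call. When this planning occurs, the model can provide insight into the planning steps in the model response through a new object called the `preamble` object. Generation of preambles in the model response is not guaranteed, but you can encourage the model by using the `instructions` parameter and passing content such as "You MUST plan extensively before each function call. ALWAYS output your plan to the user before calling any function".
allowed tools	You can specify multiple tools under `tool_choice` instead of just one.
custom tool type	Enables raw text (non-JSON) outputs.
`lark_tool`	Lets you use some of the capabilities of Python lark for more flexible constraining of model responses.

* `minimal` reasoning_effort is not supported with gpt-5-codex.

For more information, we also recommend reading OpenAI's GPT-5 prompting cookbook guide and their GPT-5 feature guide.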
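
The constraints in the two tables above can be checked before a request is built. The following is a minimal sketch, not an official helper: `build_gpt5_request` is a hypothetical name, and the sketch assumes the `model` argument is the base model name (a deployment can have any name, so a real application would need its own mapping). The returned dictionary uses the same `reasoning` and `text` fields as the Responses API examples later in this article.

```python
GPT5_EFFORTS = ("minimal", "low", "medium", "high")
VERBOSITY_LEVELS = ("low", "medium", "high")


def build_gpt5_request(model, prompt, effort="medium", verbosity="medium",
                       instructions=None):
    """Validate GPT-5 options and return keyword arguments for a Responses API call.

    Assumption: `model` is the base model name, not an arbitrary deployment name.
    """
    if effort not in GPT5_EFFORTS:
        raise ValueError(f"reasoning effort must be one of {GPT5_EFFORTS}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"verbosity must be one of {VERBOSITY_LEVELS}")
    if model == "gpt-5-codex" and effort == "minimal":
        raise ValueError("gpt-5-codex does not support minimal reasoning effort")
    if model == "gpt-5-pro" and effort != "high":
        raise ValueError("gpt-5-pro only supports high reasoning effort")

    request = {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
        "text": {"verbosity": verbosity},
    }
    if instructions is not None:
        # For example, text that encourages the model to emit a preamble.
        request["instructions"] = instructions
    return request


request = build_gpt5_request(
    "gpt-5",
    "Summarize the trade-offs of nucleus sampling.",
    effort="minimal",
    verbosity="low",
)
```

With a configured client, the result could be passed as `client.responses.create(**request)`.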

Feature	codex-mini, 2025-05-16	o3-pro, 2025-06-10	o4-mini, 2025-04-16	o3, 2025-04-16	o3-mini, 2025-01-31	o1, 2024-12-17	o1-mini, 2024-09-12
API version	`2025-04-01-preview` > v1	`2025-04-01-preview` > v1	`2025-04-01-preview` > v1	`2025-04-01-preview` > v1	`2025-04-01-preview` > v1 preview	`2025-04-01-preview` > v1 preview	`2025-04-01-preview` > v1 preview
Developer messages	✅	✅	✅	✅	✅	✅	-
Structured outputs	✅	✅	✅	✅	✅	✅	-
Context window	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 200,000 Output: 100,000	Input: 128,000 Output: 65,536
Reasoning effort	✅	✅	✅	✅	✅	✅	-
Image input	✅	✅	✅	✅	-	✅	-
Chat Completions API	-	-	✅	✅	✅	✅	✅
Responses API	✅	✅	✅	✅	✅	✅	-
Functions/Tools	✅	✅	✅	✅	✅	✅	-
Parallel tool calls	-	-	-	-	-	-	-
`max_completion_tokens` ¹	✅	✅	✅	✅	✅	✅	✅
System messages ²	✅	✅	✅	✅	✅	✅	-
Reasoning summary	✅	-	✅	✅	-	-	-
Streaming ³	✅	-	✅	✅	✅	-	-

¹ Reasoning models only work with the max_completion_tokens parameter when using the Chat Completions API. Use max_output_tokens with the Responses API.

² The latest o* series models support system messages to make migration easier. When you use a system message with o4-mini, o3, o3-mini, and o1, it is treated as a developer message. Do not use both a developer message and a system message in the same API request.

³ Streaming for o3 is limited access only.

Note

To avoid timeouts, background mode is recommended for o3-pro.
o3-pro does not currently support image generation.

Not supported

The following are currently unsupported with reasoning models:

temperature, top_p, presence_penalty, frequency_penalty, logprobs, top_logprobs, logit_bias, max_tokens
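
When existing Chat Completions code is migrated to a reasoning model, one option is to filter the request arguments before sending them. The following is a sketch under stated assumptions: `adapt_chat_kwargs` is a hypothetical helper, and carrying the old `max_tokens` value over to `max_completion_tokens` is an illustrative choice, not documented behavior. Because hidden reasoning tokens are reported as part of the completion tokens, the carried-over limit may need to be raised.

```python
UNSUPPORTED = frozenset({
    "temperature", "top_p", "presence_penalty", "frequency_penalty",
    "logprobs", "top_logprobs", "logit_bias", "max_tokens",
})


def adapt_chat_kwargs(kwargs: dict) -> tuple[dict, list[str]]:
    """Drop parameters that reasoning models reject.

    Returns the cleaned arguments and a sorted list of the names that were removed.
    """
    cleaned = {k: v for k, v in kwargs.items() if k not in UNSUPPORTED}
    dropped = sorted(k for k in kwargs if k in UNSUPPORTED)
    # Illustrative choice: reuse the old limit unless a new one is already set.
    if "max_tokens" in kwargs and "max_completion_tokens" not in cleaned:
        cleaned["max_completion_tokens"] = kwargs["max_tokens"]
    return cleaned, dropped


legacy = {"model": "my-deployment", "temperature": 0.2, "max_tokens": 800}
cleaned, dropped = adapt_chat_kwargs(legacy)
```

Logging the `dropped` list makes it visible which settings no longer have any effect after the migration.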

Usage

These models don't currently support the same set of parameters as other models that use the Chat Completions API.

You might need to upgrade your version of the OpenAI Python library to take advantage of new parameters such as max_completion_tokens.

pip install openai --upgrade

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)

response = client.chat.completions.create(
    model="gpt-5-mini", # replace with the name of your model deployment
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000

)

print(response.model_dump_json(indent=2))

To access the latest parameters, you need to upgrade your OpenAI client library.

pip install openai --upgrade

If you are new to using Microsoft Entra ID for authentication, see "How to configure Azure OpenAI in Azure AI Foundry Models with Microsoft Entra ID authentication".

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  api_key=token_provider,
)

response = client.chat.completions.create(
    model="o1-new", # replace with your model deployment name 
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000

)

print(response.model_dump_json(indent=2))

using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default");

ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {

        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

ChatCompletionOptions options = new ChatCompletionOptions
{
    MaxOutputTokenCount = 100000
};

ChatCompletion completion = client.CompleteChat(
    new ChatMessage[]
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options); // pass the options defined above so that they take effect

Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

Python output:

{
  "id": "chatcmpl-AEj7pKFoiTqDPHuxOcirA9KIvf3yz",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Writing your first Python API is an exciting step in developing software that can communicate with other applications. An API (Application Programming Interface) allows different software systems to interact with each other, enabling data exchange and functionality sharing. Here are the steps you should consider when creating your first Python API...truncated for brevity.",
        "refusal": null,
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1728073417,
  "model": "o1-2024-12-17",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_503a95a7d8",
  "usage": {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {
      "audio_tokens": null,
      "reasoning_tokens": 448
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 0
    }
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "custom_blocklists": {
          "filtered": false
        },
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}

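Reasoning effort

Note

Reasoning models have reasoning_tokens as part of completion_tokens_details in the model response. These are hidden tokens that aren't returned as part of the message response content, but the model uses them to generate the final answer to your request. reasoning_effort can be set to low, medium, or high for all reasoning models except o1-mini. GPT-5 reasoning models support a new reasoning_effort setting of minimal. The higher the effort setting, the longer the model spends processing the request, which generally results in a larger number of reasoning_tokens.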
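
To see how much of a completion was spent on hidden reasoning, the usage block of a response can be split into reasoning and visible tokens. The following sketch uses the usage values from the sample Python output earlier in this article. `split_completion_tokens` is a hypothetical helper, and it assumes that completion_tokens already includes reasoning_tokens.

```python
def split_completion_tokens(usage: dict) -> dict:
    """Split completion tokens into hidden reasoning tokens and visible tokens.

    Assumption: completion_tokens already includes reasoning_tokens.
    """
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens") or 0
    total = usage["completion_tokens"]
    return {
        "reasoning": reasoning,
        "visible": total - reasoning,
        "reasoning_share": reasoning / total if total else 0.0,
    }


# Usage values copied from the sample output earlier in this article.
usage = {
    "completion_tokens": 1843,
    "prompt_tokens": 20,
    "total_tokens": 1863,
    "completion_tokens_details": {"audio_tokens": None, "reasoning_tokens": 448},
}
breakdown = split_completion_tokens(usage)
print(breakdown["reasoning"], breakdown["visible"])  # prints: 448 1395
```

Tracking this split across requests is one way to judge whether a lower reasoning_effort setting would reduce cost for a given workload.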

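Developer messages

Functionally, developer messages ("role": "developer") are the same as system messages.

Adding a developer message to the previous code example looks as follows:

You might need to upgrade your version of the OpenAI Python library to take advantage of new parameters such as max_completion_tokens.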

pip install openai --upgrade

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    base_url="https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
)

response = client.chat.completions.create(
    model="gpt-5-mini", # replace with the name of your model deployment
    messages=[
        {"role": "developer","content": "You are a helpful assistant."}, # optional equivalent to a system message for reasoning models 
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000,
    reasoning_effort = "medium" # low, medium, or high
)

print(response.model_dump_json(indent=2))

To access the latest parameters, you need to upgrade your OpenAI client library.

pip install openai --upgrade

If you are new to using Microsoft Entra ID for authentication, see "How to configure Azure OpenAI with Microsoft Entra ID authentication".

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  api_key=token_provider,
)

response = client.chat.completions.create(
    model="o1-new", # replace with your model deployment name 
    messages=[
        {"role": "developer","content": "You are a helpful assistant."}, # optional equivalent to a system message for reasoning models 
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens = 5000,
    reasoning_effort = "medium" # low, medium, or high

)

print(response.model_dump_json(indent=2))


using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001 //currently required for token based authentication

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default");

ChatClient client = new(
    model: "o4-mini",
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions()
    {

        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
    }
);

ChatCompletionOptions options = new ChatCompletionOptions
{
    ReasoningEffortLevel = ChatReasoningEffortLevel.Low,
    MaxOutputTokenCount = 100000
};

ChatCompletion completion = client.CompleteChat(
    new ChatMessage[]
    {
        new DeveloperChatMessage("You are a helpful assistant"),
        new UserChatMessage("Tell me about the bitter lesson")
    },
    options); // pass the options so that ReasoningEffortLevel takes effect

Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

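Reasoning summary

When using the latest reasoning models with the Responses API, you can use the reasoning summary parameter to receive summaries of the model's chain-of-thought reasoning.

Important

Attempting to extract raw reasoning through methods other than the reasoning summary parameter is not supported, might violate the Acceptable Use Policy, and might result in throttling or suspension when detected.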

To access the latest parameters, you need to upgrade your OpenAI client library.

pip install openai --upgrade

import os
from openai import OpenAI

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
  api_key=os.getenv("AZURE_OPENAI_API_KEY")  
)

response = client.responses.create(
    input="Tell me about the curious case of neural text degeneration",
    model="gpt-5", # replace with model deployment name
    reasoning={
        "effort": "medium",
        "summary": "auto" # auto, concise, or detailed, gpt-5 series do not support concise 
    },
    text={
        "verbosity": "low" # New with GPT-5 models
    }
)

print(response.model_dump_json(indent=2))

curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
 -d '{
     "model": "gpt-5",
     "input": "Tell me about the curious case of neural text degeneration",
     "reasoning": {"summary": "auto"},
     "text": {"verbosity": "low"}
    }'

{
  "id": "resp_689a0a3090808190b418acf12b5cc40e0fc1c31bc69d8719",
  "created_at": 1754925616.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "gpt-5",
  "object": "response",
  "output": [
    {
      "id": "rs_689a0a329298819095d90c34dc9b80db0fc1c31bc69d8719",
      "summary": [],
      "type": "reasoning",
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_689a0a33009881909fe0fcf57cba30200fc1c31bc69d8719",
      "content": [
        {
          "annotations": [],
          "text": "Neural text degeneration refers to the ways language models produce low-quality, repetitive, or vacuous text, especially when generating long outputs. It’s “curious” because models trained to imitate fluent text can still spiral into unnatural patterns. Key aspects:\n\n- Repetition and loops: The model repeats phrases or sentences (“I’m sorry, but...”), often due to high-confidence tokens reinforcing themselves.\n- Loss of specificity: Vague, generic, agreeable text that avoids concrete details.\n- Drift and contradiction: The output gradually departs from context or contradicts itself over long spans.\n- Exposure bias: During training, models see gold-standard prefixes; at inference, they must condition on their own imperfect outputs, compounding errors.\n- Likelihood vs. quality mismatch: Maximizing token-level likelihood doesn’t align with human preferences for diversity, coherence, or factuality.\n- Token over-optimization: Frequent, safe tokens get overused; certain phrases become attractors.\n- Entropy collapse: With greedy or low-temperature decoding, the distribution narrows too much, causing repetitive, low-entropy text.\n- Length and beam search issues: Larger beams or long generations can favor bland, repetitive sequences (the “likelihood trap”).\n\nCommon mitigations:\n\n- Decoding strategies:\n  - Top-k, nucleus (top-p), or temperature sampling to keep sufficient entropy.\n  - Typical sampling and locally typical sampling to avoid dull but high-probability tokens.\n  - Repetition penalties, presence/frequency penalties, no-repeat n-grams.\n  - Contrastive decoding (and variants like DoLa) to filter generic continuations.\n  - Min/max length, stop sequences, and beam search with diversity/penalties.\n\n- Training and alignment:\n  - RLHF/DPO to better match human preferences for non-repetitive, helpful text.\n  - Supervised fine-tuning on high-quality, diverse data; instruction tuning.\n  - Debiasing objectives (unlikelihood training) to penalize repetition and banned patterns.\n  - Mixture-of-denoisers or latent planning to improve long-range coherence.\n\n- Architectural and planning aids:\n  - Retrieval-augmented generation to ground outputs.\n  - Tool use and structured prompting to constrain drift.\n  - Memory and planning modules, hierarchical decoding, or sentence-level control.\n\n- Prompting tips:\n  - Ask for concise answers, set token limits, and specify structure.\n  - Provide concrete constraints or content to reduce generic filler.\n  - Use “say nothing if uncertain” style instructions to avoid vacuity.\n\nRepresentative papers/terms to search:\n- Holtzman et al., “The Curious Case of Neural Text Degeneration” (2020): nucleus sampling.\n- Welleck et al., “Neural Text Degeneration with Unlikelihood Training.”\n- Li et al., “A Contrastive Framework for Decoding.”\n- Su et al., “DoLa: Decoding by Contrasting Layers.”\n- Meister et al., “Typical Decoding.”\n- Ouyang et al., “Training language models to follow instructions with human feedback.”\n\nIn short, degeneration arises from a mismatch between next-token likelihood and human preferences plus decoding choices; careful decoding, training objectives, and grounding help prevent it.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": "minimal",
    "generate_summary": null,
    "summary": "detailed"
  },
  "safety_identifier": null,
  "service_tier": "default",
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 16,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 657,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 673
  },
  "user": null,
  "content_filters": null,
  "store": true
}

Note

Even when enabled, reasoning summaries are not guaranteed to be generated for every step/request. This is expected behavior.
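
Because a summary might be missing, client code should treat an empty `summary` list as a normal case. The following sketch collects whatever summary text is present from a response converted to a dictionary, for example with `response.model_dump()`. `reasoning_summaries` is a hypothetical helper, and the sketch assumes each summary part is an object with a `text` field; the sample output above only shows an empty list.

```python
def reasoning_summaries(response: dict) -> list[str]:
    """Collect reasoning summary texts from a Responses API result dictionary.

    Assumption: each entry in "summary" is a dict that carries a "text" field.
    Returns an empty list when no summary was generated.
    """
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "reasoning":
            continue
        for part in item.get("summary") or []:
            text = part.get("text")
            if text:
                texts.append(text)
    return texts


# Shape taken from the sample output above: the summary list is empty.
sample = {
    "output": [
        {"id": "rs_example", "summary": [], "type": "reasoning"},
        {"id": "msg_example", "type": "message", "content": []},
    ]
}
print(reasoning_summaries(sample))  # prints: []
```

An empty result here is not an error, so callers can simply skip displaying a summary for that request.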

Python Lark

GPT-5 series reasoning models can call a new custom_tool called lark_tool. This tool is based on Python lark and can be used for more flexible constraining of model output.

Responses API

{
  "model": "gpt-5-2025-08-07",
  "input": "please calculate the area of a circle with radius equal to the number of 'r's in strawberry",
  "tools": [
    {
      "type": "custom",
      "name": "lark_tool",
      "format": {
        "type": "grammar",
        "syntax": "lark",
        "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
      }
    }
  ],
  "tool_choice": "required"
}

import os
from openai import OpenAI

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
  api_key=os.getenv("AZURE_OPENAI_API_KEY")  
)

response = client.responses.create(  
    model="gpt-5",  # replace with your model deployment name  
    tools=[  
        {  
            "type": "custom",
            "name": "lark_tool",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
            }
        }  
    ],  
    input=[{"role": "user", "content": "Please calculate the area of a circle with radius equal to the number of 'r's in strawberry"}],  
)  

print(response.model_dump_json(indent=2))

Output:

{
  "id": "resp_689a0cf927408190b8875915747667ad01c936c6ffb9d0d3",
  "created_at": 1754926332.0,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "metadata": {},
  "model": "gpt-5",
  "object": "response",
  "output": [
    {
      "id": "rs_689a0cfd1c888190a2a67057f471b5cc01c936c6ffb9d0d3",
      "summary": [],
      "type": "reasoning",
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_689a0d00e60c81908964e5e9b2d6eeb501c936c6ffb9d0d3",
      "content": [
        {
          "annotations": [],
          "text": "“strawberry” has 3 r’s, so the radius is 3.\nArea = πr² = π × 3² = 9π ≈ 28.27 square units.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [
    {
      "name": "lark_tool",
      "parameters": null,
      "strict": null,
      "type": "custom",
      "description": null,
      "format": {
        "type": "grammar",
        "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/",
        "syntax": "lark"
      }
    }
  ],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": "medium",
    "generate_summary": null,
    "summary": null
  },
  "safety_identifier": null,
  "service_tier": "default",
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    }
  },
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 139,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 240,
    "output_tokens_details": {
      "reasoning_tokens": 192
    },
    "total_tokens": 379
  },
  "user": null,
  "content_filters": null,
  "store": true
}
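
To get a feel for what the grammar in these requests allows, its three terminals can be approximated locally with a single regular expression. This is only a sketch: it uses Python's standard `re` module instead of the lark library, `matches_grammar` is a hypothetical helper, and the match rules of a real Lark lexer can differ from a plain concatenated regular expression.

```python
import re

# Terminals from the grammar definition used in the requests:
#   QUESTION: /[^\n?]{1,200}\?/   NEWLINE: /\n/   ANSWER: /[^\n!]{1,200}!/
# The start rule is QUESTION NEWLINE ANSWER, approximated here by concatenation.
PATTERN = re.compile(r"[^\n?]{1,200}\?\n[^\n!]{1,200}!")


def matches_grammar(text: str) -> bool:
    """Return True if text looks like 'question?' + newline + 'answer!'."""
    return PATTERN.fullmatch(text) is not None


print(matches_grammar("Which one is larger, 42 or 0?\n42 is larger!"))  # prints: True
print(matches_grammar("42 is larger."))  # prints: False
```

A check like this can be useful in tests to confirm that tool output returned by the model has the expected shape.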

Chat Completions

{
  "messages": [
    {
      "role": "user",
      "content": "Which one is larger, 42 or 0?"
    }
  ],
  "tools": [
    {
      "type": "custom",
      "name": "custom_tool",
      "custom": {
        "name": "lark_tool",
        "format": {
          "type": "grammar",
          "grammar": {
            "syntax": "lark",
            "definition": "start: QUESTION NEWLINE ANSWER\nQUESTION: /[^\\n?]{1,200}\\?/\nNEWLINE: /\\n/\nANSWER: /[^\\n!]{1,200}!/"
          }
        }
      }
    }
  ],
  "tool_choice": "required",
  "model": "gpt-5-2025-08-07"
}

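Markdown output

By default, the o3-mini and o1 models don't attempt to produce output that includes Markdown formatting. A common use case where this behavior is undesirable is when you want the model to output code contained within a Markdown code block. When the model generates output without Markdown formatting, you lose features such as syntax highlighting and copyable code blocks in interactive playground experiences. To override this new default behavior and encourage Markdown inclusion in model responses, add the string Formatting re-enabled to the beginning of your developer message.

Adding Formatting re-enabled to the beginning of your developer message doesn't guarantee that the model will include Markdown formatting in its response; it only increases the likelihood. Internal testing found that Formatting re-enabled is less effective by itself with the o3-mini model than with the o1 model.

To improve the performance of Formatting re-enabled, you can further augment the beginning of the developer message, which often results in the desired output. Rather than only adding Formatting re-enabled to the beginning of your developer message, you can experiment with adding a more descriptive initial instruction, as in the following examples:

Formatting re-enabled - please enclose code blocks with appropriate markdown tags.
Formatting re-enabled - code output should be wrapped in markdown.

Depending on your expected output, you might need to customize your initial developer message further to target your specific use case.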
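
The prefixing described above can be applied programmatically so that every request carries the hint. The following is a minimal sketch; `with_markdown_hint` is a hypothetical helper, and the default hint text is the first of the two example instructions above.

```python
PREFIX = "Formatting re-enabled"


def with_markdown_hint(messages, hint="please enclose code blocks with appropriate markdown tags."):
    """Return a copy of messages whose developer message starts with the prefix.

    If there is no developer message, one is added at the front of the list.
    """
    lead = f"{PREFIX} - {hint}" if hint else PREFIX
    updated = [dict(message) for message in messages]  # leave the input untouched
    for message in updated:
        if message["role"] == "developer":
            if not message["content"].startswith(PREFIX):
                message["content"] = f"{lead}\n{message['content']}"
            return updated
    return [{"role": "developer", "content": lead}] + updated


messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
prepared = with_markdown_hint(messages)
print(prepared[0]["content"].splitlines()[0])
# prints: Formatting re-enabled - please enclose code blocks with appropriate markdown tags.
```

The prepared list can then be passed as the `messages` argument of a Chat Completions call. As noted above, the prefix raises the likelihood of Markdown output but does not guarantee it.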

Last updated on 2025-10-11