The Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, speech-in, speech-out conversational interactions.
You can use the Realtime API via WebRTC, SIP, or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebRTC.
In most cases, use the WebRTC API for real-time audio streaming. The WebRTC API is a web standard that enables real-time communication (RTC) between browsers and mobile applications. Here are some reasons why WebRTC is preferred for real-time audio streaming:
- Lower latency: WebRTC is designed to minimize delay, making it better suited for audio and video communication, where low latency is essential for maintaining quality and synchronization.
- Media handling: WebRTC has built-in support for audio and video codecs, providing optimized handling of media streams.
- Error correction: WebRTC includes mechanisms for handling packet loss and jitter, which are essential for maintaining audio stream quality over unpredictable networks.
- Peer-to-peer communication: WebRTC enables direct communication between clients, reducing the need for a central server to relay audio data, which can further reduce latency.
Use the Realtime API via WebSockets if you need to:
- Stream audio data from a server to a client.
- Send and receive data in real time between a client and a server.
WebSockets aren't recommended for real-time audio streaming because they have higher latency than WebRTC.
Supported models
You can access the GPT real-time models for global deployments in the East US 2 and Sweden Central regions.
- gpt-4o-mini-realtime-preview (2024-12-17)
- gpt-4o-realtime-preview (2024-12-17)
- gpt-realtime (version 2025-08-28)
- gpt-realtime-mini (version 2025-10-06)
You should use API version 2025-08-28 in the URL for the Realtime API. The API version is included in the sessions URL.
For more information about supported models, see the models and versions documentation.
Important
GA protocol for WebRTC.
You can still use the beta protocol, but we recommend that you start with the GA protocol. If you're an existing customer, plan to migrate to the GA protocol.
This article describes how to use WebRTC with the GA protocol. We preserve the legacy protocol documentation here.
Prerequisites
Before you can use GPT real-time audio, you need:
- An Azure subscription – create one for free.
- An Azure OpenAI resource created in a supported region. For more information, see Create a resource and deploy a model with Azure OpenAI.
- A deployment of the gpt-4o-realtime-preview, gpt-4o-mini-realtime-preview, gpt-realtime, or gpt-realtime-mini model in a supported region, as described in the supported models section in this article. You can deploy the model from the Foundry model catalog or from your project in the Microsoft Foundry portal.
Set up WebRTC
To use WebRTC, you need two pieces of code:
- Your web browser application
- A service from which your web browser can retrieve an ephemeral token
More options:
- You can use the same service that retrieves the ephemeral token to proxy the web browser's session negotiation via the Session Description Protocol (SDP). This scenario is more secure because the web browser doesn't have access to the ephemeral token.
- You can filter the messages that go to the web browser by using a query parameter.
- You can create an observer WebSocket connection to listen to or record the session.
Steps
Step 1: Set up the service to acquire an ephemeral token
The key to generating an ephemeral token is the REST API, using:
url = https://{your azure resource}.openai.azure.com/openai/v1/realtime/client_secrets
Use this URL with an API key or a Microsoft Entra ID token. This request retrieves an ephemeral token and sets up the session configuration that you want the web browser to use, including the prompt instructions and the output voice.
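For example, here's a minimal sketch of that request using API key authentication. The URL, session payload, and the value field in the response mirror the full service sample later in this section; the AZURE_OPENAI_API_KEY environment variable and the api-key header are assumptions for illustration only.
# Minimal sketch: request an ephemeral token with API key authentication.
# The full sample below uses Microsoft Entra ID (DefaultAzureCredential) instead.
import os
import requests

url = f"https://{os.environ['AZURE_RESOURCE']}.openai.azure.com/openai/v1/realtime/client_secrets"
session_config = {
    "session": {
        "type": "realtime",
        "model": "<your model deployment name>",
        "instructions": "You are a helpful assistant.",
        "audio": {"output": {"voice": "marin"}},
    }
}

response = requests.post(
    url,
    headers={
        "api-key": os.environ["AZURE_OPENAI_API_KEY"],  # assumption: API key auth header
        "Content-Type": "application/json",
    },
    json=session_config,
    timeout=30,
)
response.raise_for_status()
ephemeral_token = response.json()["value"]  # the ephemeral token for the browser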
Here's example Python code for a token service. The web browser can call this service by using the /token endpoint to retrieve an ephemeral token. This sample code uses DefaultAzureCredential to authenticate to the Realtime API when generating ephemeral tokens.
from flask import Flask, jsonify
import os
import requests
import time
import threading
from azure.identity import DefaultAzureCredential

app = Flask(__name__)

# Session configuration
session_config = {
    "session": {
        "type": "realtime",
        "model": "<your model deployment name>",
        "instructions": "You are a helpful assistant.",
        "audio": {
            "output": {
                "voice": "marin",
            },
        },
    },
}

# Get configuration from environment variables
azure_resource = os.getenv('AZURE_RESOURCE')  # e.g., 'your-azure-resource'

# Token caching variables
cached_token = None
token_expiry = 0
token_lock = threading.Lock()

def get_bearer_token(resource_scope: str) -> str:
    """Get a bearer token using DefaultAzureCredential with caching."""
    global cached_token, token_expiry

    current_time = time.time()

    # Check if we have a valid cached token (with 5 minute buffer before expiry)
    with token_lock:
        if cached_token and current_time < (token_expiry - 300):
            return cached_token

    # Get a new token
    try:
        credential = DefaultAzureCredential()
        token = credential.get_token(resource_scope)

        with token_lock:
            cached_token = token.token
            token_expiry = token.expires_on

        print(f"Acquired new bearer token, expires at: {time.ctime(token_expiry)}")
        return cached_token
    except Exception as e:
        print(f"Failed to acquire bearer token: {e}")
        raise

@app.route('/token', methods=['GET'])
def get_token():
    """
    An endpoint which returns the contents of a REST API request to the protected endpoint.
    Uses DefaultAzureCredential for authentication with token caching.
    """
    try:
        # Get bearer token using DefaultAzureCredential
        bearer_token = get_bearer_token("https://cognitiveservices.azure.com/.default")

        # Construct the Azure OpenAI endpoint URL
        url = f"https://{azure_resource}.openai.azure.com/openai/v1/realtime/client_secrets"

        headers = {
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        }

        # Make the request to Azure OpenAI
        response = requests.post(
            url,
            headers=headers,
            json=session_config,
            timeout=30
        )

        # Check if the request was successful
        if response.status_code != 200:
            print(f"Request failed with status {response.status_code}: {response.reason}")
            print(f"Response headers: {dict(response.headers)}")
            print(f"Response content: {response.text}")
            response.raise_for_status()

        # Parse the JSON response and extract the ephemeral token
        data = response.json()
        ephemeral_token = data.get('value', '')

        if not ephemeral_token:
            print(f"No ephemeral token found in response: {data}")
            return jsonify({"error": "No ephemeral token available"}), 500

        # Return the ephemeral token as JSON
        return jsonify({"token": ephemeral_token})

    except requests.exceptions.RequestException as e:
        print(f"Token generation error: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"Response status: {e.response.status_code}")
            print(f"Response reason: {e.response.reason}")
            print(f"Response content: {e.response.text}")
        return jsonify({"error": "Failed to generate token"}), 500
    except Exception as e:
        print(f"Unexpected error: {e}")
        return jsonify({"error": "Failed to generate token"}), 500

if __name__ == '__main__':
    if not azure_resource:
        print("Error: AZURE_RESOURCE environment variable is required")
        exit(1)

    print(f"Starting token service for Azure resource: {azure_resource}")
    print("Using DefaultAzureCredential for authentication")
    print("Production mode - use gunicorn to run this service:")
    port = int(os.getenv('PORT', 5000))
    print(f"  gunicorn -w 4 -b 0.0.0.0:{port} --timeout 30 token-service:app")
Step 2: Set up the browser application
The browser application calls your token service to get the token and then starts a WebRTC connection with the Realtime API. To start the WebRTC connection, use the following URL with the ephemeral token for authentication.
https://<your azure resource>.openai.azure.com/openai/v1/realtime/calls
Once connected, the browser application sends text over the data channel and audio over the media channel. Here's a sample HTML document to get you started.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Azure OpenAI Realtime Session</title>
</head>
<body>
<h1>Azure OpenAI Realtime Session</h1>
<button onclick="StartSession()">Start Session</button>
<!-- Log container for API messages -->
<div id="logContainer"></div>
<script>
const AZURE_RESOURCE = "<your azure resource>"
const WEBRTC_URL= `https://${AZURE_RESOURCE}.openai.azure.com/openai/v1/realtime/calls?webrtcfilter=on`
async function StartSession() {
try {
// Call our token service to get the ephemeral key
const tokenResponse = await fetch("/token");
if (!tokenResponse.ok) {
throw new Error(`Token service request failed: ${tokenResponse.status}`);
}
const tokenData = await tokenResponse.json();
const ephemeralKey = tokenData.token;
console.log("Ephemeral key received from token service");
// Mask the ephemeral key in the log message.
logMessage("Ephemeral Key Received from Token Service: " + "***");
// Set up the WebRTC connection using the ephemeral key.
init(ephemeralKey);
} catch (error) {
console.error("Error fetching ephemeral key:", error);
logMessage("Error fetching ephemeral key: " + error.message);
}
}
async function init(ephemeralKey) {
logMessage("🚀 Starting WebRTC initialization...");
let peerConnection = new RTCPeerConnection();
logMessage("✅ RTCPeerConnection created");
// Set up to play remote audio from the model.
const audioElement = document.createElement('audio');
audioElement.autoplay = true;
document.body.appendChild(audioElement);
logMessage("🔊 Audio element created and added to page");
peerConnection.ontrack = (event) => {
logMessage("🎵 Remote track received! Type: " + event.track.kind);
logMessage("📊 Number of streams: " + event.streams.length);
if (event.streams.length > 0) {
audioElement.srcObject = event.streams[0];
logMessage("✅ Audio stream assigned to audio element");
// Add event listeners to audio element for debugging
audioElement.onloadstart = () => logMessage("🔄 Audio loading started");
audioElement.oncanplay = () => logMessage("▶️ Audio can start playing");
audioElement.onplay = () => logMessage("🎵 Audio playback started");
audioElement.onerror = (e) => logMessage("❌ Audio error: " + e.message);
} else {
logMessage("⚠️ No streams in track event");
}
};
// Set up data channel for sending and receiving events
logMessage("🎤 Requesting microphone access...");
try {
const clientMedia = await navigator.mediaDevices.getUserMedia({ audio: true });
logMessage("✅ Microphone access granted");
const audioTrack = clientMedia.getAudioTracks()[0];
logMessage("🎤 Audio track obtained: " + audioTrack.label);
peerConnection.addTrack(audioTrack);
logMessage("✅ Audio track added to peer connection");
} catch (error) {
logMessage("❌ Failed to get microphone access: " + error.message);
return;
}
const dataChannel = peerConnection.createDataChannel('realtime-channel');
logMessage("📡 Data channel created");
dataChannel.addEventListener('open', () => {
logMessage('✅ Data channel is open - ready to send messages');
// Send client events to start the conversation
logMessage("📝 Preparing to send text input message...");
const event = {
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [
{
type: "input_text",
text: "hello there! Can you give me some vacation options?",
},
],
},
};
logMessage("📤 Sending conversation.item.create event...");
logMessage("💬 Text content: " + event.item.content[0].text);
try {
dataChannel.send(JSON.stringify(event));
logMessage("✅ Text input sent successfully!");
// Now send response.create to trigger the AI response
const responseEvent = {
type: "response.create"
};
logMessage("📤 Sending response.create event to trigger AI response...");
dataChannel.send(JSON.stringify(responseEvent));
logMessage("✅ Response.create sent successfully!");
} catch (error) {
logMessage("❌ Failed to send text input: " + error.message);
}
});

dataChannel.addEventListener('message', (event) => {
const realtimeEvent = JSON.parse(event.data);
console.log(realtimeEvent);
logMessage("Received server event: " + JSON.stringify(realtimeEvent, null, 2));
if (realtimeEvent.type === "session.update") {
const instructions = realtimeEvent.session.instructions;
logMessage("Instructions: " + instructions);
} else if (realtimeEvent.type === "session.error") {
logMessage("Error: " + realtimeEvent.error.message);
} else if (realtimeEvent.type === "session.end") {
logMessage("Session ended.");
}
});
dataChannel.addEventListener('close', () => {
logMessage('Data channel is closed');
});
// Start the session using the Session Description Protocol (SDP)
logMessage("🤝 Creating WebRTC offer...");
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
logMessage("✅ Local description set");
logMessage("📡 Sending SDP offer to: " + WEBRTC_URL);
const sdpResponse = await fetch(`${WEBRTC_URL}`, {
method: "POST",
body: offer.sdp,
headers: {
Authorization: `Bearer ${ephemeralKey}`,
"Content-Type": "application/sdp",
},
});
logMessage("📥 Received SDP response, status: " + sdpResponse.status);
if (!sdpResponse.ok) {
logMessage("❌ SDP exchange failed: " + sdpResponse.statusText);
return;
}
const answerSdp = await sdpResponse.text();
logMessage("✅ Got SDP answer, length: " + answerSdp.length + " chars");
const answer = { type: "answer", sdp: answerSdp };
await peerConnection.setRemoteDescription(answer);
logMessage("✅ Remote description set - WebRTC connection should be establishing...");
// Add connection state logging
peerConnection.onconnectionstatechange = () => {
logMessage("🔗 Connection state: " + peerConnection.connectionState);
};
peerConnection.oniceconnectionstatechange = () => {
logMessage("🧊 ICE connection state: " + peerConnection.iceConnectionState);
};

const button = document.createElement('button');
button.innerText = 'Close Session';
button.onclick = stopSession;
document.body.appendChild(button);
function stopSession() {
if (dataChannel) dataChannel.close();
if (peerConnection) peerConnection.close();
peerConnection = null;
logMessage("Session closed.");
}
}
function logMessage(message) {
const logContainer = document.getElementById("logContainer");
const p = document.createElement("p");
p.textContent = message;
logContainer.appendChild(p);
}
</script>
</body>
</html>
In this example, we use the webrtcfilter=on query parameter. This query parameter limits the data channel messages sent to the browser, to keep your prompt instructions private. When the filter is on, only the following messages are returned to the browser on the data channel:
- input_audio_buffer.speech_started
- input_audio_buffer.speech_stopped
- output_audio_buffer.started
- output_audio_buffer.stopped
- conversation.item.input_audio_transcription.completed
- conversation.item.added
- conversation.item.created
- response.output_text.delta
- response.output_text.done
- response.output_audio_transcript.delta
- response.output_audio_transcript.done
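Note that the sample page requests the token with a relative fetch("/token") call, so it needs to be served from the same origin as the token service (or you need to enable CORS). As a minimal sketch, assuming you save the page as index.html next to the Flask app, you could serve it from the token service itself:
# Hypothetical addition to the token service: serve the sample page from the
# same origin so the page's relative fetch("/token") call works without CORS.
from flask import send_from_directory

@app.route('/')
def index():
    # Assumes the HTML shown above is saved as index.html alongside this file.
    return send_from_directory('.', 'index.html')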
Step 3 (optional): Create a WebSocket observer/controller
If you proxy the session negotiation through your service application, you can parse the returned Location header and use it to create a WebSocket connection to the WebRTC call. This connection can record the WebRTC call and even control it by issuing session.update events and other commands directly.
Here's an updated version of the token_service shown earlier, now with a /connect endpoint that you can use to get the temporary token and negotiate the session start. It also includes a WebSocket connection that listens to the WebRTC session.
from flask import Flask, jsonify, request
#from flask_cors import CORS
import os
import requests
import time
import threading
import asyncio
import json
import websockets
from azure.identity import DefaultAzureCredential

app = Flask(__name__)
# CORS(app) # Enable CORS for all routes when running locally for testing

# Session configuration
session_config = {
    "session": {
        "type": "realtime",
        "model": "<YOUR MODEL DEPLOYMENT NAME>",
        "instructions": "You are a helpful assistant.",
        "audio": {
            "output": {
                "voice": "marin",
            },
        },
    },
}

# Get configuration from environment variables
azure_resource = os.getenv('AZURE_RESOURCE')  # e.g., 'your-azure-resource'

# Token caching variables
cached_token = None
token_expiry = 0
token_lock = threading.Lock()

def get_bearer_token(resource_scope: str) -> str:
    """Get a bearer token using DefaultAzureCredential with caching."""
    global cached_token, token_expiry

    current_time = time.time()

    # Check if we have a valid cached token (with 5 minute buffer before expiry)
    with token_lock:
        if cached_token and current_time < (token_expiry - 300):
            return cached_token

    # Get a new token
    try:
        credential = DefaultAzureCredential()
        token = credential.get_token(resource_scope)

        with token_lock:
            cached_token = token.token
            token_expiry = token.expires_on

        print(f"Acquired new bearer token, expires at: {time.ctime(token_expiry)}")
        return cached_token
    except Exception as e:
        print(f"Failed to acquire bearer token: {e}")
        raise

def get_ephemeral_token():
    """
    Generate an ephemeral token from Azure OpenAI.

    Returns:
        str: The ephemeral token

    Raises:
        Exception: If token generation fails
    """
    # Get bearer token using DefaultAzureCredential
    bearer_token = get_bearer_token("https://cognitiveservices.azure.com/.default")

    # Construct the Azure OpenAI endpoint URL
    url = f"https://{azure_resource}.openai.azure.com/openai/v1/realtime/client_secrets"

    headers = {
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
    }

    # Make the request to Azure OpenAI
    response = requests.post(
        url,
        headers=headers,
        json=session_config,
        timeout=30
    )

    # Check if the request was successful
    if response.status_code != 200:
        print(f"Request failed with status {response.status_code}: {response.reason}")
        print(f"Response headers: {dict(response.headers)}")
        print(f"Response content: {response.text}")
        response.raise_for_status()

    # Parse the JSON response and extract the ephemeral token
    data = response.json()
    ephemeral_token = data.get('value', '')

    if not ephemeral_token:
        print(f"No ephemeral token found in response: {data}")
        raise Exception("No ephemeral token available")

    return ephemeral_token

def perform_sdp_negotiation(ephemeral_token, sdp_offer):
    """
    Perform SDP negotiation with the Azure OpenAI Realtime API.

    Args:
        ephemeral_token (str): The ephemeral token for authentication
        sdp_offer (str): The SDP offer to send

    Returns:
        tuple: (sdp_answer, location_header) - The SDP answer from the server and Location header for WebSocket

    Raises:
        Exception: If SDP negotiation fails
    """
    # Construct the realtime calls endpoint URL
    realtime_url = f"https://{azure_resource}.openai.azure.com/openai/v1/realtime/calls"

    headers = {
        'Authorization': f'Bearer {ephemeral_token}',
        'Content-Type': 'application/sdp'  # Azure OpenAI expects application/sdp, not form data
    }

    print(f"Sending SDP offer to: {realtime_url}")

    # Send the SDP offer as raw body data (not form data)
    response = requests.post(realtime_url, data=sdp_offer, headers=headers, timeout=30)

    if response.status_code == 201:  # The calls endpoint returns 201 Created on success
        sdp_answer = response.text
        location_header = response.headers.get('Location', '')

        print(f"Received SDP answer: {sdp_answer[:100]}...")
        if location_header:
            print(f"Captured Location header: {location_header}")
        else:
            print("Warning: No Location header found in response")

        return sdp_answer, location_header
    else:
        error_msg = f"SDP negotiation failed: {response.status_code} - {response.text}"
        print(error_msg)
        raise Exception(error_msg)

@app.route('/token', methods=['GET'])
def get_token():
    """
    An endpoint which returns an ephemeral token for the Azure OpenAI Realtime API.
    Uses DefaultAzureCredential for authentication with token caching.
    """
    try:
        ephemeral_token = get_ephemeral_token()
        return jsonify({
            "token": ephemeral_token,
            "endpoint": f"https://{azure_resource}.openai.azure.com",
            "deployment": "gpt-4o-realtime-preview"
        })
    except requests.exceptions.RequestException as e:
        print(f"Token generation error: {e}")
        if hasattr(e, 'response') and e.response is not None:
            print(f"Response status: {e.response.status_code}")
            print(f"Response reason: {e.response.reason}")
            print(f"Response content: {e.response.text}")
        return jsonify({"error": "Failed to generate token"}), 500
    except Exception as e:
        print(f"Unexpected error: {e}")
        return jsonify({"error": "Failed to generate token"}), 500

async def connect_websocket(location_header, bearer_token=None, api_key=None):
    """
    Connect to the WebSocket endpoint using the Location header and log the
    events observed on the call.

    Args:
        location_header (str): The Location header from the SDP negotiation response
        bearer_token (str, optional): Bearer token for authentication
        api_key (str, optional): API key for authentication (fallback)

    Returns:
        None: Just logs messages, doesn't store them
    """
    # Extract call_id from the Location header
    # Example: /v1/realtime/calls/rtc_abc123 -> rtc_abc123
    call_id = location_header.split('/')[-1]
    print(f"Extracted call_id: {call_id}")

    # Construct WebSocket URL: wss://<resource>.openai.azure.com/openai/v1/realtime?call_id=<call_id>
    ws_url = f"wss://{azure_resource}.openai.azure.com/openai/v1/realtime?call_id={call_id}"
    print(f"Connecting to WebSocket: {ws_url}")

    message_count = 0

    try:
        # WebSocket headers - use proper authentication
        headers = {}
        if bearer_token is not None:
            print("Using Bearer token for WebSocket authentication")
            headers["Authorization"] = f"Bearer {bearer_token}"
        elif api_key is not None:
            print("Using API key for WebSocket authentication")
            headers["api-key"] = api_key
        else:
            print("Warning: No authentication provided for WebSocket")

        async with websockets.connect(ws_url, additional_headers=headers) as websocket:
            print("WebSocket connection established")

            # Listen for messages
            try:
                async for message in websocket:
                    try:
                        # Parse JSON message
                        json_data = json.loads(message)
                        msg_type = json_data.get('type', 'unknown')
                        message_count += 1
                        print(f"WebSocket [{message_count}]: {msg_type}")

                        # Handle specific message types with additional details
                        if msg_type == 'response.done':
                            session_status = json_data['response'].get('status', 'unknown')
                            session_details = json_data['response'].get('details', 'No details provided')
                            print(f"  -> Response status: {session_status}, Details: {session_details}")
                            # Continue listening instead of breaking
                        elif msg_type == 'session.created':
                            session_id = json_data.get('session', {}).get('id', 'unknown')
                            print(f"  -> Session created: {session_id}")
                        elif msg_type == 'error':
                            error_message = json_data.get('error', {}).get('message', 'No error message')
                            print(f"  -> Error: {error_message}")
                    except json.JSONDecodeError:
                        message_count += 1
                        print(f"WebSocket [{message_count}]: Non-JSON message: {message[:100]}...")
                    except Exception as e:
                        print(f"Error processing WebSocket message: {e}")
            except websockets.exceptions.ConnectionClosed:
                print(f"WebSocket connection closed by remote (processed {message_count} messages)")
            except Exception as e:
                print(f"WebSocket message loop error: {e}")
    except Exception as e:
        print(f"WebSocket connection error: {e}")

    print(f"WebSocket monitoring completed. Total messages processed: {message_count}")

def start_websocket_background(location_header, bearer_token):
    """
    Start WebSocket connection in background thread to monitor/record the call.
    """
    def run_websocket():
        try:
            print(f"Starting background WebSocket monitoring for: {location_header}")

            # Create new event loop for this thread
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            try:
                # Run the WebSocket connection (just logs, doesn't return messages)
                loop.run_until_complete(
                    connect_websocket(location_header, bearer_token)
                )
                print("Background WebSocket monitoring completed.")
            except Exception as e:
                print(f"Background WebSocket error: {e}")
            finally:
                loop.close()
        except Exception as e:
            print(f"Failed to start background WebSocket: {e}")

    # Start the WebSocket in a background thread
    websocket_thread = threading.Thread(target=run_websocket, daemon=True)
    websocket_thread.start()
    print("Background WebSocket thread started")

@app.route('/connect', methods=['POST'])
def connect_and_negotiate():
    """
    Get a token and perform SDP negotiation.
    Expects multipart form data with an 'sdp' field containing the SDP offer.
    Returns the SDP answer as plain text.
    Automatically starts a WebSocket connection in the background to monitor/record the call.
    """
    try:
        # Get the SDP offer from multipart form data
        if 'sdp' not in request.form:
            return jsonify({"error": "Missing 'sdp' field in multipart form data"}), 400

        sdp_offer = request.form['sdp']
        print(f"Received SDP offer: {sdp_offer[:100]}...")

        # Get ephemeral token using shared function
        ephemeral_token = get_ephemeral_token()
        print(f"Got ephemeral token for SDP negotiation: {ephemeral_token[:20]}...")

        # Perform SDP negotiation using shared function
        sdp_answer, location_header = perform_sdp_negotiation(ephemeral_token, sdp_offer)

        # Create response headers
        response_headers = {'Content-Type': 'application/sdp'}

        # If we have a Location header, start a WebSocket connection in the background to monitor/record the call
        if location_header:
            try:
                # Get a bearer token for WebSocket authentication
                bearer_token = get_bearer_token("https://cognitiveservices.azure.com/.default")
                start_websocket_background(location_header, bearer_token)
            except Exception as e:
                print(f"Failed to start background WebSocket monitoring: {e}")
                # Don't fail the main request if WebSocket setup fails

        # Return the SDP answer as plain text
        return sdp_answer, 201, response_headers

    except Exception as e:
        error_msg = f"Error in SDP negotiation: {e}"
        print(error_msg)
        return jsonify({"error": error_msg}), 500

if __name__ == '__main__':
    if not azure_resource:
        print("Error: AZURE_RESOURCE environment variable is required")
        exit(1)

    print(f"Starting token service for Azure resource: {azure_resource}")
    print("Using DefaultAzureCredential for authentication")
    port = int(os.getenv('PORT', 5000))
    print(f"  gunicorn -w 4 -b 0.0.0.0:{port} --timeout 30 token-service:app")
The associated browser changes are shown here.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Azure OpenAI Realtime Session - Connect Endpoint</title>
</head>
<body>
<h1>Azure OpenAI Realtime Session - Using /connect Endpoint</h1>
<button onclick="StartSession()">Start Session</button>
<!-- Log container for API messages -->
<div id="logContainer"></div>
<script>
const AZURE_RESOURCE = "YOUR AZURE RESOURCE NAME"
async function StartSession() {
try {
logMessage("🚀 Starting session with /connect endpoint...");
// Set up the WebRTC connection first
const peerConnection = new RTCPeerConnection();
logMessage("✅ RTCPeerConnection created");
// Get microphone access and add audio track BEFORE creating offer
logMessage("🎤 Requesting microphone access...");
try {
const clientMedia = await navigator.mediaDevices.getUserMedia({ audio: true });
logMessage("✅ Microphone access granted");
const audioTrack = clientMedia.getAudioTracks()[0];
logMessage("🎤 Audio track obtained: " + audioTrack.label);
peerConnection.addTrack(audioTrack);
logMessage("✅ Audio track added to peer connection");
} catch (error) {
logMessage("❌ Failed to get microphone access: " + error.message);
return;
}
// Set up audio playback
const audioElement = document.createElement('audio');
audioElement.autoplay = true;
document.body.appendChild(audioElement);
logMessage("🔊 Audio element created and added to page");
peerConnection.ontrack = (event) => {
logMessage("🎵 Remote track received! Type: " + event.track.kind);
logMessage("📊 Number of streams: " + event.streams.length);
if (event.streams.length > 0) {
audioElement.srcObject = event.streams[0];
logMessage("✅ Audio stream assigned to audio element");
// Add event listeners to audio element for debugging
audioElement.onloadstart = () => logMessage("🔄 Audio loading started");
audioElement.oncanplay = () => logMessage("▶️ Audio can start playing");
audioElement.onplay = () => logMessage("🎵 Audio playback started");
audioElement.onerror = (e) => logMessage("❌ Audio error: " + e.message);
} else {
logMessage("⚠️ No streams in track event");
}
};
// Set up data channel BEFORE SDP exchange
const dataChannel = peerConnection.createDataChannel('realtime-channel');
logMessage("📡 Data channel created");
dataChannel.addEventListener('open', () => {
logMessage('✅ Data channel is open - ready to send messages');
// Send client events to start the conversation
logMessage("📝 Preparing to send text input message...");
const event = {
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [
{
type: "input_text",
text: "hello there! Can you give me some vacation options?",
},
],
},
};
logMessage("📤 Sending conversation.item.create event...");
logMessage("💬 Text content: " + event.item.content[0].text);
try {
dataChannel.send(JSON.stringify(event));
logMessage("✅ Text input sent successfully!");
// Now send response.create to trigger the AI response
const responseEvent = {
type: "response.create"
};
logMessage("📤 Sending response.create event to trigger AI response...");
dataChannel.send(JSON.stringify(responseEvent));
logMessage("✅ Response.create sent successfully!");
} catch (error) {
logMessage("❌ Failed to send text input: " + error.message);
}
});
dataChannel.addEventListener('message', (event) => {
const realtimeEvent = JSON.parse(event.data);
console.log(realtimeEvent);
logMessage("📥 Received server event: " + realtimeEvent.type);
// Log more detail for important events
if (realtimeEvent.type === "error") {
logMessage("❌ Error: " + realtimeEvent.error.message);
} else if (realtimeEvent.type === "session.created") {
logMessage("🎉 Session created successfully");
} else if (realtimeEvent.type === "response.output_audio_transcript.done") {
logMessage("📝 AI transcript complete: " + (realtimeEvent.transcript || ""));
} else if (realtimeEvent.type === "response.done") {
logMessage("✅ Response completed");
}
});
dataChannel.addEventListener('close', () => {
logMessage('❌ Data channel is closed');
});
dataChannel.addEventListener('error', (error) => {
logMessage('❌ Data channel error: ' + error);
});
// Add connection state logging
peerConnection.onconnectionstatechange = () => {
logMessage("🔗 Connection state: " + peerConnection.connectionState);
};
peerConnection.oniceconnectionstatechange = () => {
logMessage("🧊 ICE connection state: " + peerConnection.iceConnectionState);
};
// Create offer AFTER setting up data channel
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
logMessage("🤝 WebRTC offer created with audio track");
// Prepare multipart form data for /connect endpoint
const formData = new FormData();
formData.append('sdp', offer.sdp);
logMessage("📤 Sending SDP via multipart form to /connect endpoint...");
// Call our /connect endpoint with multipart form data
const connectResponse = await fetch("/connect", {
method: "POST",
body: formData // FormData automatically sets correct Content-Type
});
if (!connectResponse.ok) {
throw new Error(`Connect service request failed: ${connectResponse.status}`);
}
// Get the SDP answer directly as text (not JSON)
const answerSdp = await connectResponse.text();
logMessage("✅ Got SDP answer from /connect endpoint, length: " + answerSdp.length + " chars");
// Set up the WebRTC connection using the SDP answer
const answer = { type: "answer", sdp: answerSdp };
await peerConnection.setRemoteDescription(answer);
logMessage("✅ Remote description set");
// Add close session button
const button = document.createElement('button');
button.innerText = 'Close Session';
button.onclick = () => stopSession(dataChannel, peerConnection);
document.body.appendChild(button);
logMessage("🔴 Close session button added");
function stopSession(dataChannel, peerConnection) {
if (dataChannel) dataChannel.close();
if (peerConnection) peerConnection.close();
logMessage("Session closed.");
}
} catch (error) {
console.error("Error in StartSession:", error);
logMessage("Error in StartSession: " + error.message);
}
}
function logMessage(message) {
const logContainer = document.getElementById("logContainer");
const p = document.createElement("p");
p.textContent = message;
logContainer.appendChild(p);
}
async function init(peerConnection) {
logMessage("� Continuing WebRTC setup with existing peer connection...");
// Set up to play remote audio from the model.
const audioElement = document.createElement('audio');
audioElement.autoplay = true;
document.body.appendChild(audioElement);
logMessage("🔊 Audio element created and added to page");
peerConnection.ontrack = (event) => {
logMessage("🎵 Remote track received! Type: " + event.track.kind);
logMessage("📊 Number of streams: " + event.streams.length);
if (event.streams.length > 0) {
audioElement.srcObject = event.streams[0];
logMessage("✅ Audio stream assigned to audio element");
// Add event listeners to audio element for debugging
audioElement.onloadstart = () => logMessage("🔄 Audio loading started");
audioElement.oncanplay = () => logMessage("▶️ Audio can start playing");
audioElement.onplay = () => logMessage("🎵 Audio playback started");
audioElement.onerror = (e) => logMessage("❌ Audio error: " + e.message);
} else {
logMessage("⚠️ No streams in track event");
}
};
const dataChannel = peerConnection.createDataChannel('realtime-channel');
logMessage("📡 Data channel created");
dataChannel.addEventListener('open', () => {
logMessage('✅ Data channel is open - ready to send messages');
// Send client events to start the conversation
logMessage("📝 Preparing to send text input message...");
const event = {
type: "conversation.item.create",
item: {
type: "message",
role: "user",
content: [
{
type: "input_text",
text: "hello there! Can you give me some vacation options?",
},
],
},
};
logMessage("📤 Sending conversation.item.create event...");
logMessage("💬 Text content: " + event.item.content[0].text);
try {
dataChannel.send(JSON.stringify(event));
logMessage("✅ Text input sent successfully!");
// Now send response.create to trigger the AI response
const responseEvent = {
type: "response.create"
};
logMessage("📤 Sending response.create event to trigger AI response...");
dataChannel.send(JSON.stringify(responseEvent));
logMessage("✅ Response.create sent successfully!");
} catch (error) {
logMessage("❌ Failed to send text input: " + error.message);
}
});
dataChannel.addEventListener('close', () => {
logMessage('❌ Data channel is closed');
});
dataChannel.addEventListener('error', (error) => {
logMessage('❌ Data channel error: ' + error);
});

// Add connection state logging
peerConnection.onconnectionstatechange = () => {
logMessage("� Connection state: " + peerConnection.connectionState);
};
peerConnection.oniceconnectionstatechange = () => {
logMessage("🧊 ICE connection state: " + peerConnection.iceConnectionState);
};
// Add close session button
const button = document.createElement('button');
button.innerText = 'Close Session';
button.onclick = () => stopSession(dataChannel, peerConnection);
document.body.appendChild(button);
logMessage("🔴 Close session button added");
function stopSession(dataChannel, peerConnection) {
if (dataChannel) dataChannel.close();
if (peerConnection) peerConnection.close();
logMessage("Session closed.");
}
}
</script>
</body>
</html>
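The connect_websocket helper in the service above only listens to and logs the call. Because the same WebSocket connection also accepts client events, you can use it to control the call, for example by changing the session instructions mid-call. Here's a minimal, hedged sketch that reuses the call_id URL and bearer-token authentication shown above; the exact fields you can send in session.update are described in the Realtime API reference.
# Hedged sketch: control an in-progress call from the observer WebSocket by
# sending a session.update client event. Reuses the URL scheme and bearer-token
# authentication from the service above; call_id comes from the Location header.
import asyncio
import json
import websockets

async def update_session_instructions(azure_resource, call_id, bearer_token, new_instructions):
    ws_url = f"wss://{azure_resource}.openai.azure.com/openai/v1/realtime?call_id={call_id}"
    headers = {"Authorization": f"Bearer {bearer_token}"}

    async with websockets.connect(ws_url, additional_headers=headers) as websocket:
        # session.update is a Realtime API client event; the payload shape
        # mirrors the session_config used when the session was created.
        event = {
            "type": "session.update",
            "session": {
                "type": "realtime",
                "instructions": new_instructions,
            },
        }
        await websocket.send(json.dumps(event))

        # Wait for the server to acknowledge with a session.updated event.
        async for message in websocket:
            if json.loads(message).get("type") == "session.updated":
                print("Session instructions updated")
                break

# Example usage (placeholders):
# asyncio.run(update_session_instructions(
#     "your-azure-resource", "rtc_abc123", bearer_token, "Answer only in French."))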
Related content
- Try the real-time audio quickstart
- See the Realtime API reference
- Learn more about Azure OpenAI quotas and limits