Azure.AI.VoiceLive Namespace

Classes

AnimationOptions

Configuration for animation outputs including blendshapes and visemes metadata.

AssistantMessageItem

An assistant message item within a conversation.

AudioEchoCancellation

Echo cancellation configuration for server-side audio processing.

AudioInputTranscriptionOptions

Configuration for input audio transcription.

AudioNoiseReduction

Configuration for input audio noise reduction.

AvatarConfiguration

Configuration for avatar streaming and behavior during the session.

AzureAIVoiceLiveContext

Context class that is populated by System.ClientModel.SourceGeneration. For more information, see https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/core/System.ClientModel/src/docs/ModelReaderWriterContext.md

AzureCustomVoice

Azure custom voice configuration.

AzurePersonalVoice

Azure personal voice configuration.

AzureSemanticEouDetection

Azure semantic end-of-utterance detection (default).

AzureSemanticEouDetectionEn

Azure semantic end-of-utterance detection (English).

AzureSemanticEouDetectionMultilingual

Azure semantic end-of-utterance detection (multilingual).

AzureSemanticVadTurnDetection

Azure semantic VAD-based turn detection.

AzureSemanticVadTurnDetectionEn

Azure semantic VAD-based turn detection (English).

AzureSemanticVadTurnDetectionMultilingual

Azure semantic VAD-based turn detection (multilingual).

AzureStandardVoice

Azure standard voice configuration.

AzureVoice

Base for Azure voice configurations. Please note this is the abstract base class. The derived classes available for instantiation are: AzureCustomVoice, AzureStandardVoice, and AzurePersonalVoice.

CachedTokenDetails

Details of cached token usage.

ConversationRequestItem

Base for any response item; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: MessageItem, FunctionCallItem, and FunctionCallOutputItem.

EouDetection

Top-level union for end-of-utterance (EOU) semantic detection configuration. Please note this is the abstract base class. The derived classes available for instantiation are: AzureSemanticEouDetection, AzureSemanticEouDetectionEn, and AzureSemanticEouDetectionMultilingual.

FunctionCallItem

A function call item within a conversation.

FunctionCallOutputItem

A function call output item within a conversation.

IceServer

ICE server configuration for WebRTC connection negotiation.

InputAudioContentPart

Input audio content part.

InputTextContentPart

Input text content part.

InputTokenDetails

Details of input token usage.

LogProbProperties

A single log probability entry for a token.

MaxResponseOutputTokensOption

MessageContentPart

Base for any message content part; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: InputTextContentPart, InputAudioContentPart, and OutputTextContentPart.

MessageItem

A message item within a conversation.

NoTurnDetection

Disables turn detection.

OpenAIVoice

OpenAI voice configuration with explicit type field.

This provides a unified interface for OpenAI voices, complementing the existing string-based OAIVoice for backward compatibility.

OutputTextContentPart

Output text content part.

OutputTokenDetails

Details of output token usage.

RequestAudioContentPart

An audio content part for a request.

RequestTextContentPart

A text content part for a request.

ResponseAudioContentPart

An audio content part for a response.

ResponseCancelledDetails

Details for a cancelled response.

ResponseFailedDetails

Details for a failed response.

ResponseFunctionCallItem

A function call item within a conversation.

ResponseFunctionCallOutputItem

A function call output item within a conversation.

ResponseIncompleteDetails

Details for an incomplete response.

ResponseStatusDetails

Base for all non-success response details. Please note this is the abstract base class. The derived classes available for instantiation are: ResponseCancelledDetails, ResponseIncompleteDetails, and ResponseFailedDetails.

ResponseTextContentPart

A text content part for a response.

ResponseTokenStatistics

Overall usage statistics for a response.

ServerVadTurnDetection

Server-side VAD-based turn detection.

SessionResponse

The response resource.

SessionResponseItem

Base for any response item; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: SessionResponseMessageItem, ResponseFunctionCallItem, and ResponseFunctionCallOutputItem.

SessionResponseMessageItem

Base type for message item within a conversation.

SessionUpdate

A VoiceLive server event. Please note this is the abstract base class. The derived classes available for instantiation are: SessionUpdateError, SessionUpdateSessionCreated, SessionUpdateSessionUpdated, SessionUpdateAvatarConnecting, SessionUpdateInputAudioBufferCommitted, SessionUpdateInputAudioBufferCleared, SessionUpdateInputAudioBufferSpeechStarted, SessionUpdateInputAudioBufferSpeechStopped, SessionUpdateConversationItemCreated, SessionUpdateConversationItemInputAudioTranscriptionCompleted, SessionUpdateConversationItemInputAudioTranscriptionFailed, SessionUpdateConversationItemTruncated, SessionUpdateConversationItemDeleted, SessionUpdateResponseCreated, SessionUpdateResponseDone, SessionUpdateResponseOutputItemAdded, SessionUpdateResponseOutputItemDone, SessionUpdateResponseContentPartAdded, SessionUpdateResponseContentPartDone, SessionUpdateResponseTextDelta, SessionUpdateResponseTextDone, SessionUpdateResponseAudioTranscriptDelta, SessionUpdateResponseAudioTranscriptDone, SessionUpdateResponseAudioDelta, SessionUpdateResponseAudioDone, SessionUpdateResponseAnimationBlendshapeDelta, SessionUpdateResponseAnimationBlendshapeDone, SessionUpdateResponseAudioTimestampDelta, SessionUpdateResponseAudioTimestampDone, SessionUpdateResponseAnimationVisemeDelta, SessionUpdateResponseAnimationVisemeDone, SessionUpdateConversationItemInputAudioTranscriptionDelta, SessionUpdateConversationItemRetrieved, SessionUpdateResponseFunctionCallArgumentsDelta, and SessionUpdateResponseFunctionCallArgumentsDone.

SessionUpdateAvatarConnecting

Sent when the server is in the process of establishing an avatar media connection and provides its SDP answer.

SessionUpdateConversationItemCreated

Returned when a conversation item is created. There are several scenarios that produce this event:

  • The server is generating a Response, which, if successful, produces one or two Items of type message (role assistant) or type function_call.
  • The input audio buffer has been committed, either by the client or the server (in server_vad mode). The server will take the content of the input audio buffer and add it to a new user message Item.
  • The client has sent a conversation.item.create event to add a new Item to the Conversation.

SessionUpdateConversationItemDeleted

Returned when an item in the conversation is deleted by the client with a conversation.item.delete event. This event is used to synchronize the server's understanding of the conversation history with the client's view.
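
The synchronization between the server's conversation history and the client's view can be sketched at the wire level. This is a minimal Python illustration, not the .NET types listed on this page. It assumes server events arrive as JSON text, that a conversation.item.created payload carries the new item in an `item` field with an `id`, and that a conversation.item.deleted payload carries an `item_id`. Those field names and the `ConversationMirror` class are assumptions made for illustration.

```python
import json


class ConversationMirror:
    """Hypothetical client-side copy of the conversation, keyed by item ID."""

    def __init__(self):
        # A dict preserves insertion order, so items stay in arrival order.
        self.items = {}

    def handle(self, raw_event: str) -> None:
        event = json.loads(raw_event)
        kind = event.get("type")
        if kind == "conversation.item.created":
            item = event["item"]  # assumed payload field
            self.items[item["id"]] = item
        elif kind == "conversation.item.deleted":
            # assumed payload field; ignore IDs we never saw
            self.items.pop(event["item_id"], None)
```

Feeding every server event through `handle` keeps `items` aligned with the server's history; events of other types are ignored.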

SessionUpdateConversationItemInputAudioTranscriptionCompleted

This event is the output of audio transcription for user audio written to the input audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in server_vad mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events. VoiceLive API models accept audio natively, so input transcription is a separate process that runs on a separate ASR (Automatic Speech Recognition) model. The transcript may diverge somewhat from the model's interpretation and should be treated as a rough guide.
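
Because the transcript can arrive before or after the Response events, client code should not depend on ordering. The following is a minimal wire-level sketch in Python, not the .NET types on this page. The event type string and the `item_id` and `transcript` field names are assumptions, and `TranscriptStore` is a hypothetical helper.

```python
import json

# Assumed wire-level event type for a completed input transcription.
COMPLETED = "conversation.item.input_audio_transcription.completed"


class TranscriptStore:
    """Hypothetical store that records transcripts whenever they arrive."""

    def __init__(self):
        self.transcripts = {}

    def handle(self, raw_event: str) -> None:
        event = json.loads(raw_event)
        if event.get("type") == COMPLETED:
            # assumed payload fields: item_id, transcript
            self.transcripts[event["item_id"]] = event["transcript"]

    def transcript_for(self, item_id: str):
        """Return the transcript if it has arrived, else None."""
        return self.transcripts.get(item_id)
```

Callers look up a transcript by item ID when they need it and treat `None` as "not yet available" rather than as an error.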

SessionUpdateConversationItemInputAudioTranscriptionDelta

Returned when the text value of an input audio transcription content part is updated.

SessionUpdateConversationItemInputAudioTranscriptionFailed

Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other error events so that the client can identify the related Item.

SessionUpdateConversationItemRetrieved

Returned when a conversation item is retrieved with conversation.item.retrieve.

SessionUpdateConversationItemTruncated

Returned when an earlier assistant audio message item is truncated by the client with a conversation.item.truncate event.

SessionUpdateError

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session stays open. Implementers should monitor and log error messages by default.
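
A minimal sketch of logging error events while continuing to process the session, written in Python at the wire level rather than against the .NET types on this page. It assumes error events have the type string "error" and carry an `error` object with `code` and `message` fields. Those names, the logger name, and the `drain` function are assumptions made for illustration.

```python
import json
import logging

logger = logging.getLogger("voicelive.sketch")  # hypothetical logger name


def drain(raw_events, on_event):
    """Process events in order; log error events and keep going.

    Returns the number of error events seen.
    """
    errors = 0
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("type") == "error":  # assumed event type string
            details = event.get("error", {})  # assumed payload field
            logger.warning(
                "server error: code=%s message=%s",
                details.get("code"),
                details.get("message"),
            )
            errors += 1
            continue  # the session stays open, so continue with later events
        on_event(event)
    return errors
```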

SessionUpdateErrorDetails

Details of the error.

SessionUpdateInputAudioBufferCleared

Returned when the input audio buffer is cleared by the client with an input_audio_buffer.clear event.

SessionUpdateInputAudioBufferCommitted

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that will be created, thus a conversation.item.created event will also be sent to the client.
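
The link between a commit and the conversation.item.created event that follows it can be sketched as below. This is a wire-level Python illustration, not the .NET types on this page. The item_id property comes from the description above. The event type string "input_audio_buffer.committed", the `item` field with an `id`, and the `CommitTracker` class are assumptions.

```python
import json


class CommitTracker:
    """Hypothetical helper that matches commits to the items they create."""

    def __init__(self):
        self.pending = set()        # item IDs announced by a commit
        self.user_audio_items = []  # item IDs confirmed by a created event

    def handle(self, raw_event: str) -> None:
        event = json.loads(raw_event)
        kind = event.get("type")
        if kind == "input_audio_buffer.committed":  # assumed type string
            self.pending.add(event["item_id"])
        elif kind == "conversation.item.created":
            item_id = event["item"]["id"]  # assumed payload field
            if item_id in self.pending:
                self.pending.discard(item_id)
                self.user_audio_items.append(item_id)
```

Items created for other reasons, such as a Response in progress, never appear in `pending` and are left alone.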

SessionUpdateInputAudioBufferSpeechStarted

Returned in server VAD mode when speech is detected in the input audio buffer.

SessionUpdateInputAudioBufferSpeechStopped

Returned in server VAD mode when the end of speech is detected in the input audio buffer.

SessionUpdateResponseAnimationBlendshapeDelta

Represents a delta update of blendshape animation frames for a specific output of a response.

SessionUpdateResponseAnimationBlendshapeDone

Indicates the completion of blendshape animation processing for a specific output of a response.

SessionUpdateResponseAnimationVisemeDelta

Represents a viseme ID delta update for animation based on audio.

SessionUpdateResponseAnimationVisemeDone

Indicates completion of viseme animation delivery for a response.

SessionUpdateResponseAudioDelta

Returned when the model-generated audio is updated.

SessionUpdateResponseAudioDone

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseAudioTimestampDelta

Represents a word-level audio timestamp delta for a response.

SessionUpdateResponseAudioTimestampDone

Indicates completion of audio timestamp delivery for a response.

SessionUpdateResponseAudioTranscriptDelta

Returned when the model-generated transcription of audio output is updated.

SessionUpdateResponseAudioTranscriptDone

Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseContentPartAdded

Returned when a new content part is added to an assistant message item during response generation.

SessionUpdateResponseContentPartDone

Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseCreated

Returned when a new Response is created. The first event of response creation, where the response is in an initial state of in_progress.

SessionUpdateResponseDone

Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the response.done event will include all output Items in the Response but will omit the raw audio data.
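
Because the Response object in response.done omits the raw audio, a client that needs the audio has to collect it from the delta events as they stream. The following is a minimal wire-level Python sketch, not the .NET types on this page. It assumes audio deltas arrive as "response.audio.delta" events with a base64 `delta` and a `response_id`, and that response.done carries the Response under `response` with an `id`. All of those names and the `AudioCollector` class are assumptions.

```python
import base64
import json


class AudioCollector:
    """Hypothetical helper that accumulates streamed audio per response."""

    def __init__(self):
        self._chunks = {}

    def handle(self, raw_event: str):
        """Return the collected audio bytes (possibly empty) when a
        response finishes; return None for every other event."""
        event = json.loads(raw_event)
        kind = event.get("type")
        if kind == "response.audio.delta":  # assumed type string
            chunk = base64.b64decode(event["delta"])  # assumed base64 field
            self._chunks.setdefault(event["response_id"], []).append(chunk)
            return None
        if kind == "response.done":
            response_id = event["response"]["id"]  # assumed payload field
            return b"".join(self._chunks.pop(response_id, []))
        return None
```

Since response.done is always emitted, whatever the final state, it is a safe point to release the buffered chunks.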

SessionUpdateResponseFunctionCallArgumentsDelta

Returned when the model-generated function call arguments are updated.

SessionUpdateResponseFunctionCallArgumentsDone

Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseOutputItemAdded

Returned when a new Item is created during Response generation.

SessionUpdateResponseOutputItemDone

Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateResponseTextDelta

Returned when the text value of a "text" content part is updated.

SessionUpdateResponseTextDone

Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.

SessionUpdateSessionCreated

Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.

SessionUpdateSessionUpdated

Returned when a session is updated with a session.update event, unless there is an error.

SystemMessageItem

A system message item within a conversation.

ToolChoiceOption

Represents constraints placed on tool calls made by the model.

TurnDetection

Top-level union for turn detection configuration. Please note this is the abstract base class. The derived classes available for instantiation are: ServerVadTurnDetection, AzureSemanticVadTurnDetection, AzureSemanticVadTurnDetectionEn, and AzureSemanticVadTurnDetectionMultilingual.

UserMessageItem

A user message item within a conversation.

VideoBackground

Defines a video background, either a solid color or an image URL (mutually exclusive).

VideoCrop

Defines a video crop rectangle.

VideoParams

Video streaming parameters for avatar.

VideoResolution

Resolution of the video feed in pixels.

VoiceLiveClient

Client for the Azure VoiceLive service.

VoiceLiveClientOptions

Client options for VoiceLiveClient.

VoiceLiveContentPart

Base for any content part; discriminated by type. Please note this is the abstract base class. The derived classes available for instantiation are: RequestTextContentPart, RequestAudioContentPart, ResponseTextContentPart, and ResponseAudioContentPart.

VoiceLiveErrorDetails

Error object returned in case of API failure.

VoiceLiveFunctionDefinition

The definition of a function tool as used by the VoiceLive endpoint.

VoiceLiveModelFactory

A factory class for creating instances of the models for mocking.

VoiceLiveResponse

The response resource.

VoiceLiveSession

Represents a WebSocket-based session for real-time voice communication with the Azure VoiceLive service.

VoiceLiveSessionOptions

Session configuration options for a VoiceLive session request.

VoiceLiveSessionResponse

Base for session configuration in the response.

VoiceLiveToolDefinition

The base representation of a VoiceLive tool definition. Please note this is the abstract base class. The derived class available for instantiation is: VoiceLiveFunctionDefinition.

VoiceProvider

Base interface for the different voice types supported by the VoiceLive service.

Structs

AnimationOutputType

Specifies the types of animation data to output.

AudioInputTranscriptionOptionsModel

AudioNoiseReductionType

AudioTimestampType

Output timestamp types supported in audio response content.

EouThresholdLevel

Threshold level settings for Azure semantic end-of-utterance detection.

InputAudioFormat

Input audio format types supported.

InteractionModality

Supported modalities for the session.

ItemParamStatus

Indicates the processing status of an item or parameter.

OAIVoice

Supported OpenAI voice names (string enum).

OutputAudioFormat

Output audio format types supported.

PersonalVoiceModels

PersonalVoice models.

ResponseCancelledDetailsReason

ResponseIncompleteDetailsReason

ResponseMessageRole

SessionResponseItemStatus

Indicates the processing status of a response item.

SessionResponseStatus

Terminal status of a response.

ToolChoiceLiteral

The available set of mode-level, string literal tool_choice options for the VoiceLive endpoint.

Enums

SessionUpdateModality

VoiceLiveClientOptions.ServiceVersion

The version of the service to use.