Azure.AI.VoiceLive Namespace

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Classes

AnimationOptions	Configuration for animation outputs including blendshapes and visemes metadata.
AssistantMessageItem	The AssistantMessageItem.
AudioEchoCancellation	Echo cancellation configuration for server-side audio processing.
AudioInputTranscriptionOptions	Configuration for input audio transcription.
AudioNoiseReduction	Configuration for input audio noise reduction.
AvatarConfiguration	Configuration for avatar streaming and behavior during the session.
AzureAIVoiceLiveContext	Context class that is populated by System.ClientModel.SourceGeneration. For more information, see https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/core/System.ClientModel/src/docs/ModelReaderWriterContext.md
AzureCustomVoice	Azure custom voice configuration.
AzurePersonalVoice	Azure personal voice configuration.
AzureSemanticEouDetection	Azure semantic end-of-utterance detection (default).
AzureSemanticEouDetectionEn	Azure semantic end-of-utterance detection (English).
AzureSemanticEouDetectionMultilingual	Azure semantic end-of-utterance detection (multilingual).
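AzureSemanticVadTurnDetection	Turn detection based on Azure semantic voice activity detection (VAD).
AzureSemanticVadTurnDetectionEn	Turn detection based on Azure semantic voice activity detection (VAD), English variant.
AzureSemanticVadTurnDetectionMultilingual	Turn detection based on Azure semantic voice activity detection (VAD), multilingual variant.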
AzureStandardVoice	Azure standard voice configuration.
AzureVoice	Base for Azure voice configurations. Please note this is the abstract base class. The derived classes available for instantiation are: AzureCustomVoice, AzureStandardVoice, and AzurePersonalVoice.
CachedTokenDetails	Details of cached token usage.
ConversationRequestItem	Base for any conversation request item; discriminated by `type`. Please note this is the abstract base class. The derived classes available for instantiation are: MessageItem, FunctionCallItem, and FunctionCallOutputItem.
EouDetection	Top-level union for end-of-utterance (EOU) semantic detection configuration. Please note this is the abstract base class. The derived classes available for instantiation are: AzureSemanticEouDetection, AzureSemanticEouDetectionEn, and AzureSemanticEouDetectionMultilingual.
FunctionCallItem	A function call item within a conversation.
FunctionCallOutputItem	A function call output item within a conversation.
IceServer	ICE server configuration for WebRTC connection negotiation.
InputAudioContentPart	Input audio content part.
InputTextContentPart	Input text content part.
InputTokenDetails	Details of input token usage.
LogProbProperties	A single log probability entry for a token.
MaxResponseOutputTokensOption
MessageContentPart	Base for any message content part; discriminated by `type`. Please note this is the abstract base class. The derived classes available for instantiation are: InputTextContentPart, InputAudioContentPart, and OutputTextContentPart.
MessageItem	A message item within a conversation.
NoTurnDetection	Disables turn detection.
OpenAIVoice	OpenAI voice configuration with explicit type field. This provides a unified interface for OpenAI voices, complementing the existing string-based OAIVoice for backward compatibility.
OutputTextContentPart	Output text content part.
OutputTokenDetails	Details of output token usage.
RequestAudioContentPart	An audio content part for a request.
RequestTextContentPart	A text content part for a request.
ResponseAudioContentPart	An audio content part for a response.
ResponseCancelledDetails	Details for a cancelled response.
ResponseFailedDetails	Details for a failed response.
ResponseFunctionCallItem	A function call item within a conversation.
ResponseFunctionCallOutputItem	A function call output item within a conversation.
ResponseIncompleteDetails	Details for an incomplete response.
ResponseStatusDetails	Base for all non-success response details. Please note this is the abstract base class. The derived classes available for instantiation are: ResponseCancelledDetails, ResponseIncompleteDetails, and ResponseFailedDetails.
ResponseTextContentPart	A text content part for a response.
ResponseTokenStatistics	Overall usage statistics for a response.
ServerVadTurnDetection	Base model for VAD-based turn detection.
SessionResponse	The response resource.
SessionResponseItem	Base for any response item; discriminated by `type`. Please note this is the abstract base class. The derived classes available for instantiation are: SessionResponseMessageItem, ResponseFunctionCallItem, and ResponseFunctionCallOutputItem.
SessionResponseMessageItem	Base type for message item within a conversation.
SessionUpdate	A voicelive server event. Please note this is the abstract base class. The derived classes available for instantiation are: SessionUpdateError, SessionUpdateSessionCreated, SessionUpdateSessionUpdated, SessionUpdateAvatarConnecting, SessionUpdateInputAudioBufferCommitted, SessionUpdateInputAudioBufferCleared, SessionUpdateInputAudioBufferSpeechStarted, SessionUpdateInputAudioBufferSpeechStopped, SessionUpdateConversationItemCreated, SessionUpdateConversationItemInputAudioTranscriptionCompleted, SessionUpdateConversationItemInputAudioTranscriptionFailed, SessionUpdateConversationItemTruncated, SessionUpdateConversationItemDeleted, SessionUpdateResponseCreated, SessionUpdateResponseDone, SessionUpdateResponseOutputItemAdded, SessionUpdateResponseOutputItemDone, SessionUpdateResponseContentPartAdded, SessionUpdateResponseContentPartDone, SessionUpdateResponseTextDelta, SessionUpdateResponseTextDone, SessionUpdateResponseAudioTranscriptDelta, SessionUpdateResponseAudioTranscriptDone, SessionUpdateResponseAudioDelta, SessionUpdateResponseAudioDone, SessionUpdateResponseAnimationBlendshapeDelta, SessionUpdateResponseAnimationBlendshapeDone, SessionUpdateResponseAudioTimestampDelta, SessionUpdateResponseAudioTimestampDone, SessionUpdateResponseAnimationVisemeDelta, SessionUpdateResponseAnimationVisemeDone, SessionUpdateConversationItemInputAudioTranscriptionDelta, SessionUpdateConversationItemRetrieved, SessionUpdateResponseFunctionCallArgumentsDelta, and SessionUpdateResponseFunctionCallArgumentsDone.
SessionUpdateAvatarConnecting	Sent when the server is in the process of establishing an avatar media connection and provides its SDP answer.
SessionUpdateConversationItemCreated	Returned when a conversation item is created. Several scenarios produce this event: (1) The server is generating a Response, which if successful will produce either one or two Items, which will be of type `message` (role `assistant`) or type `function_call`. (2) The input audio buffer has been committed, either by the client or the server (in `server_vad` mode); the server will take the content of the input audio buffer and add it to a new user message Item. (3) The client has sent a `conversation.item.create` event to add a new Item to the Conversation.
SessionUpdateConversationItemDeleted	Returned when an item in the conversation is deleted by the client with a `conversation.item.delete` event. This event is used to synchronize the server's understanding of the conversation history with the client's view.
SessionUpdateConversationItemInputAudioTranscriptionCompleted	This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in `server_vad` mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events. VoiceLive API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model. The transcript may diverge somewhat from the model's interpretation, and should be treated as a rough guide.
SessionUpdateConversationItemInputAudioTranscriptionDelta	Returned when the text value of an input audio transcription content part is updated.
SessionUpdateConversationItemInputAudioTranscriptionFailed	Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other `error` events so that the client can identify the related Item.
SessionUpdateConversationItemRetrieved	Returned when a conversation item is retrieved with `conversation.item.retrieve`.
SessionUpdateConversationItemTruncated	Returned when an earlier assistant audio message item is truncated by the client with a `conversation.item.truncate` event.
SessionUpdateError	Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open. Implementers are advised to monitor and log error messages by default.
SessionUpdateErrorDetails	Details of the error.
SessionUpdateInputAudioBufferCleared	Returned when the input audio buffer is cleared by the client with an `input_audio_buffer.clear` event.
SessionUpdateInputAudioBufferCommitted	Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The `item_id` property is the ID of the user message item that will be created, thus a `conversation.item.created` event will also be sent to the client.
SessionUpdateInputAudioBufferSpeechStarted	The SessionUpdateInputAudioBufferSpeechStarted.
SessionUpdateInputAudioBufferSpeechStopped	The SessionUpdateInputAudioBufferSpeechStopped.
SessionUpdateResponseAnimationBlendshapeDelta	Represents a delta update of blendshape animation frames for a specific output of a response.
SessionUpdateResponseAnimationBlendshapeDone	Indicates the completion of blendshape animation processing for a specific output of a response.
SessionUpdateResponseAnimationVisemeDelta	Represents a viseme ID delta update for animation based on audio.
SessionUpdateResponseAnimationVisemeDone	Indicates completion of viseme animation delivery for a response.
SessionUpdateResponseAudioDelta	Returned when the model-generated audio is updated.
SessionUpdateResponseAudioDone	Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.
SessionUpdateResponseAudioTimestampDelta	Represents a word-level audio timestamp delta for a response.
SessionUpdateResponseAudioTimestampDone	Indicates completion of audio timestamp delivery for a response.
SessionUpdateResponseAudioTranscriptDelta	Returned when the model-generated transcription of audio output is updated.
SessionUpdateResponseAudioTranscriptDone	Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
SessionUpdateResponseContentPartAdded	Returned when a new content part is added to an assistant message item during response generation.
SessionUpdateResponseContentPartDone	Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.
SessionUpdateResponseCreated	Returned when a new Response is created. The first event of response creation, where the response is in an initial state of `in_progress`.
SessionUpdateResponseDone	Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the `response.done` event will include all output Items in the Response but will omit the raw audio data.
SessionUpdateResponseFunctionCallArgumentsDelta	Returned when the model-generated function call arguments are updated.
SessionUpdateResponseFunctionCallArgumentsDone	Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
SessionUpdateResponseOutputItemAdded	Returned when a new Item is created during Response generation.
SessionUpdateResponseOutputItemDone	Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
SessionUpdateResponseTextDelta	Returned when the text value of a "text" content part is updated.
SessionUpdateResponseTextDone	Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
SessionUpdateSessionCreated	Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.
SessionUpdateSessionUpdated	Returned when a session is updated with a `session.update` event, unless there is an error.
SystemMessageItem	The SystemMessageItem.
ToolChoiceOption	Represents constraints placed on tool calls made by the model.
TurnDetection	Top-level union for turn detection configuration. Please note this is the abstract base class. The derived classes available for instantiation are: ServerVadTurnDetection, AzureSemanticVadTurnDetection, AzureSemanticVadTurnDetectionEn, and AzureSemanticVadTurnDetectionMultilingual.
UserMessageItem	The UserMessageItem.
VideoBackground	Defines a video background, either a solid color or an image URL (mutually exclusive).
VideoCrop	Defines a video crop rectangle.
VideoParams	Video streaming parameters for avatar.
VideoResolution	Resolution of the video feed in pixels.
VoiceLiveClient	The VoiceLiveClient.
VoiceLiveClientOptions	Client options for VoiceLiveClient.
VoiceLiveContentPart	Base for any content part; discriminated by `type`. Please note this is the abstract base class. The derived classes available for instantiation are: RequestTextContentPart, RequestAudioContentPart, ResponseTextContentPart, and ResponseAudioContentPart.
VoiceLiveErrorDetails	Error object returned in case of API failure.
VoiceLiveFunctionDefinition	The definition of a function tool as used by the voicelive endpoint.
VoiceLiveModelFactory	A factory class for creating instances of the models for mocking.
VoiceLiveResponse	The response resource.
VoiceLiveSession	Represents a WebSocket-based session for real-time voice communication with the Azure VoiceLive service.
VoiceLiveSessionOptions	The VoiceLiveRequestSession.
VoiceLiveSessionResponse	Base for session configuration in the response.
VoiceLiveToolDefinition	The base representation of a voicelive tool definition. Please note this is the abstract base class. The derived classes available for instantiation are: VoiceLiveFunctionDefinition.
VoiceProvider	Base interface for the different voice types supported by the VoiceLive service.
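
Several of the base classes above (for example SessionUpdate) are described as discriminated by `type`. The following is a minimal sketch of that idea at the JSON level, using only System.Text.Json. It does not use the Azure.AI.VoiceLive types: `Describe` is a hypothetical helper, the JSON payload shapes are assumptions, and only the event names `conversation.item.created`, `response.done`, and `error` are taken from the descriptions in the table above.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of Azure.AI.VoiceLive): reads the `type`
// discriminator of a raw server event and returns a short description.
static string Describe(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;

    // A missing or non-string `type` falls through to the default arm.
    string type =
        root.TryGetProperty("type", out JsonElement t) && t.ValueKind == JsonValueKind.String
            ? t.GetString()!
            : "<missing>";

    return type switch
    {
        "conversation.item.created" => "item created",
        "response.done" => "response done",
        "error" => "error",
        _ => "unhandled: " + type,
    };
}

// Assumed payloads: only the `type` field matters for this sketch.
Console.WriteLine(Describe("{\"type\":\"response.done\"}"));   // prints "response done"
Console.WriteLine(Describe("{\"type\":\"something.else\"}"));  // prints "unhandled: something.else"
```

The default arm matters here: because this namespace is prerelease, new event types may appear, and a dispatcher that ignores unknown discriminators degrades more gracefully than one that throws.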

Structs

AnimationOutputType	Specifies the types of animation data to output.
AudioInputTranscriptionOptionsModel
AudioNoiseReductionType
AudioTimestampType	Output timestamp types supported in audio response content.
EouThresholdLevel	Threshold level settings for Azure semantic end-of-utterance detection.
InputAudioFormat	Input audio format types supported.
InteractionModality	Supported modalities for the session.
ItemParamStatus	Indicates the processing status of an item or parameter.
OAIVoice	Supported OpenAI voice names (string enum).
OutputAudioFormat	Output audio format types supported.
PersonalVoiceModels	PersonalVoice models.
ResponseCancelledDetailsReason
ResponseIncompleteDetailsReason
ResponseMessageRole
SessionResponseItemStatus	Indicates the processing status of a response item.
SessionResponseStatus	Terminal status of a response.
ToolChoiceLiteral	The available set of mode-level, string literal tool_choice options for the voicelive endpoint.

Enums

SessionUpdateModality
VoiceLiveClientOptions.ServiceVersion	The version of the service to use.
