Microsoft AI updates 2024 - Q1

Within this blog, I want to give an overview of all the features in Q1 2024 that became Generally Available, entered Technical Preview, or reached End of Support at Microsoft. This information can be found on the Microsoft Azure AI Blog.

Features that are now supported by Microsoft (GA):

  • [Generally available] On Your Data is now Generally Available in Azure OpenAI Service
    Microsoft is thrilled to announce that the much-anticipated Azure OpenAI Service On Your Data is now generally available! This groundbreaking feature empowers you to leverage the power of OpenAI models, such as GPT-4, and incorporates the advanced capabilities of the RAG (Retrieval Augmented Generation) model, directly on your data with enterprise-grade security on Azure. This capability transforms the way you connect with, interact with, and ground your data, delivering greater accuracy and speed through a user-friendly conversational experience. You can rapidly create personalized copilots with your data to enhance user comprehension, expedite task completion, and aid decision-making. Because the OpenAI models run directly on your data, the requirement for extensive training is eliminated. Leveraging advanced AI capabilities such as GPT-4 allows you to streamline communication, enhance customer service, and increase productivity throughout your organization. Below is the range of capabilities available at general availability (a minimal API sketch follows this list):
    • Enhanced Security for Enterprise: Access your private Azure resources on both Azure OpenAI Studio and APIs with private endpoints and VPN enabled for Azure AI Search, Azure OpenAI, and Azure Blob Storage (preview). With document-level access control, responses can be generated based on the documents a user can access.
    • Expanded Data Sources: Connect to your data from Azure AI Search and Azure Cosmos DB for MongoDB vCore, as well as the data sources in preview, including Azure Blob Storage, local files, URL/web address, etc. Stay tuned for more additions to the general availability list.
    • Incorporate RAG (Retrieval-Augmented Generation) for Intelligent Information Retrieval: Utilize the advanced capabilities of the RAG model to enhance the quality and relevance of information retrieval, enabling more context-aware and precise responses in your conversational experiences.
    • Customizable Responses and Parameters: Tailor your chat experience by limiting responses to your data and adjusting custom parameters, such as the strictness and the number of documents retrieved.
    • Azure AI Vector and Hybrid Search: Achieve more precise data retrieval with vector or hybrid search from Azure AI Search, refining the insights you gather from your data.
    • Semantic ranker enabled as a Default: Enjoy re-ranked, prioritized results with semantic ranker from Azure AI Search, enabled by default.
    • OpenAI Models Availability: Get access to OpenAI GPT-35-Turbo, GPT-35-Turbo-16k, GPT-4, GPT-4-32k models.
    • Private Endpoints & VPNs.
    • Document-level Access Control: Boost security by limiting access to documents based on Microsoft Entra ID when generating responses.
    • Effortless and Swift Deployment: Seamless and rapid deployment to a web application or a copilot in the Copilot Studio (preview).
    • Search Filter (API): Customize your searches and add context by applying the retrieval augmented generation (RAG) model to specific parts of your data via the API.
    • Updated SDK: Streamline integration with your systems using our improved SDK, harnessing the power of Azure OpenAI On Your Data.
    • Supported File Types: .txt, .pdf, .docx, .pptx, .md, .html
      Click here to learn more.
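To make the capability list above concrete, here is a minimal sketch (not an official sample) of querying a chat deployment "on your data" with the openai Python package, grounded on an Azure AI Search index. The endpoint, key, deployment, and index names are placeholders, and the data-source field names and api-version should be checked against the current On Your Data documentation.

```python
# Minimal sketch (not an official sample): querying an Azure OpenAI chat
# deployment "on your data" with the openai Python package, grounded on an
# Azure AI Search index. All endpoints, keys, and names are placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumed GA api-version; check the docs
)

response = client.chat.completions.create(
    model="gpt-4",  # the name of your Azure OpenAI chat deployment
    messages=[{"role": "user", "content": "Summarize our returns policy."}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_AI_SEARCH_ENDPOINT"],
                    "index_name": "my-docs-index",
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_AI_SEARCH_KEY"],
                    },
                    # Optional tuning knobs corresponding to the custom
                    # parameters mentioned above.
                    "strictness": 3,
                    "top_n_documents": 5,
                },
            }
        ]
    },
)
print(response.choices[0].message.content)
```

The optional strictness and top_n_documents values correspond to the customizable parameters listed above; omitting them falls back to the service defaults.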
  • [Generally available] Fine-tuning: New model support, new capabilities, and lower prices
    Since we announced Azure OpenAI Service fine-tuning for OpenAI’s Babbage-002, Davinci-002 and GPT-35-Turbo on October 16, 2023, we’ve enabled AI builders to create custom models. Today we’re releasing fine-tuning support for OpenAI’s GPT-35-Turbo 1106, a next-generation GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Fine-tuning with GPT-35-Turbo 1106 supports a 16k context length in training data, allowing you to fine-tune with longer messages and generate longer and more coherent texts. In addition, we are introducing two new features to enable you to create more complex custom models and easily update them. First, we are launching support for fine-tuning with function calling, which enables you to teach your custom model when to make function calls and improves the accuracy and consistency of the responses. Second, we are launching support for continuous fine-tuning, which allows you to train a previously fine-tuned model with new data without losing the previous knowledge and performance of the model. This lets you add additional training data to an existing custom model without starting from scratch and lets you experiment more iteratively. Besides new model support and features, we are making it more affordable for you to train and host your fine-tuned models on Azure OpenAI Service, including decreasing the cost of training and hosting GPT-35-Turbo by 50%. Click here to learn more.
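As a rough illustration of the workflow (a sketch under assumptions, not an official sample), the snippet below uploads a chat-formatted JSONL training file and starts a fine-tuning job for GPT-35-Turbo 1106 with the openai Python package. The file name, endpoint, and api-version are placeholders; for continuous fine-tuning you would pass a previously fine-tuned model name instead of the base model.

```python
# Minimal sketch (assumptions: file name, endpoint, and api-version are
# placeholders). Uploads a chat-formatted JSONL file and starts a
# fine-tuning job for GPT-35-Turbo 1106 with the openai Python package.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

# Training examples may include function-call messages so the custom model
# learns when to call functions.
training_file = client.files.create(
    file=open("train.jsonl", "rb"), purpose="fine-tune"
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    # For continuous fine-tuning, pass the name of a previously fine-tuned
    # model here instead of the base model.
    model="gpt-35-turbo-1106",
)
print(job.id, job.status)
```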
  • [Generally available] Azure AI Speech launches new zero-shot TTS models for Personal Voice
    Today Microsoft is thrilled to announce that Azure AI Speech Service has upgraded its Personal Voice feature with new zero-shot TTS (text-to-speech) models. Compared to the initial model, these new models improve the naturalness of synthesized voices and better resemble the speech characteristics of the voice in the prompt. The Personal Voice capability in Azure AI Speech Service allows customers to create personalized synthetic voices for their users based on their unique speech characteristics. With Personal Voice, users can have AI replicate their voice in a few seconds by providing just a short speech sample as the audio prompt, and then use it to generate speech in any of the 100 supported languages. This feature can be used for various use cases, such as personalizing the voice experience for a chatbot, or dubbing video content in different languages with the actor’s native voice. Click here to learn more. To get started, register your use case here and apply for access.
  • [Generally available] AI Central
    AI Central lets you build configurable, extensible pipelines allowing you to govern and observe access to your AI services. AI Central is an extensible smart reverse proxy for Azure OpenAI and OpenAI services. Out of the box it provides the following (a consumer-side usage sketch follows this list):
    • Consumer local rate limiting;
    • Endpoint local rate limiting and circuit breakers;
    • Randomized endpoint selection from a cluster of AI services;
    • Prioritized endpoint selector from a priority cluster, to a fallback cluster;
    • Bulkhead to hold and throttle load to a cluster of servers;
    • Consumer Entra JWT auth (using Microsoft.Identity) with Role Authorisation;
    • Consumer Entra JWT pass-thru;
    • Client Key auth;
    • Prompt / Token usage logging to Azure Monitor (including Streaming Endpoints);
    • Open Telemetry metrics;
      The GitHub repository has some good examples: https://github.com/microsoft/AICentral for a quick start, and https://github.com/microsoft/AICentral/blob/main/docs/configuration.md for more complex examples. Click here to learn more.
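Because AI Central is a reverse proxy, consumer code typically only changes where it points. The sketch below is illustrative, assuming a deployed AI Central pipeline that exposes the Azure OpenAI-compatible API with client-key authentication; the proxy URL, key, and deployment name are placeholders.

```python
# Illustrative sketch: the client only swaps the Azure OpenAI endpoint for the
# AI Central proxy endpoint. The proxy URL, client key, and deployment name
# below are placeholders for whatever your pipeline configuration exposes.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-aicentral-proxy.example.com",  # AI Central pipeline endpoint (placeholder)
    api_key=os.environ["AICENTRAL_CLIENT_KEY"],  # client-key auth; Entra JWT auth is also supported
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # deployment name routed by the proxy
    messages=[{"role": "user", "content": "Hello through AI Central!"}],
)
print(response.choices[0].message.content)
```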

Features that are not yet supported by Microsoft (GA):

  • [Public Preview] Azure AI Translator announces Synchronous Document Translation
    This new synchronous operation allows users to translate a document in real time into a target language. Document translation enables users to translate complex documents in a variety of file formats, including Text, HTML, Word, Excel, PowerPoint, and Outlook messages, whilst preserving the source document’s format and layout. The service autodetects the language of the text in the source document if it is unknown to the user. In addition, the user can optionally send a glossary of terms in the request to apply when translating the document. Click here to learn more.
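A rough sketch of a synchronous translation call is shown below. The route, api-version, and parameter names here are assumptions for illustration only; consult the Azure AI Translator documentation for the exact contract.

```python
# Rough sketch only: the route and api-version below are assumptions for
# illustration. Sends a Word document for synchronous translation to German
# and writes the translated result, preserving format and layout.
import os

import requests

endpoint = os.environ["TRANSLATOR_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
key = os.environ["TRANSLATOR_KEY"]

url = f"{endpoint}/translator/document:translate"  # assumed synchronous route
params = {"targetLanguage": "de", "api-version": "2023-11-01-preview"}  # assumed api-version
headers = {"Ocp-Apim-Subscription-Key": key}

with open("contract.docx", "rb") as f:
    files = {
        "document": (
            "contract.docx",
            f,
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        )
    }
    resp = requests.post(url, params=params, headers=headers, files=files)

resp.raise_for_status()
with open("contract.de.docx", "wb") as out:
    out.write(resp.content)
```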
  • [Public Preview] Native Document support for PII Redaction and Summarization
    Customers exploring AI capabilities on documents have to go through pre- and post-processing efforts. For tasks such as Personally Identifiable Information (PII) redaction, summarization and more, they have to provide documents as input data, often cracking them open, formatting the content, and then recreating the document. This is time-consuming, expensive, and inconvenient. To alleviate this challenge, we are delighted to announce the availability of native document support in Azure AI Language. This is available in public preview with limited access (apply for access; an illustrative request sketch follows this list):
    • PII Redaction - This enhancement allows for the identification, categorization, and redaction of sensitive information directly from complex documents. It streamlines data privacy and compliance, and reinforces security, offering users tangible benefits.
    • REST API - PII Redaction. The REST API for PII redaction requires parameters ‘source location’ (URL of the input document’s location), ‘target location’ (URL of the target container’s location) and ‘language’ (the language of content in the source document).
    • REST API - Summarization. The REST API for summarization requires parameters ‘source location’ (URL of the input document’s location), ‘target location’ (URL of the target container’s location) and ‘kind’ (type of summarization required).
      Click here to learn more.
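The sketch below illustrates the kind of REST request described above for native-document PII redaction. The route, api-version, and payload field names are assumptions for illustration of the 'source location', 'target location', and 'language' parameters; the linked documentation defines the exact contract.

```python
# Rough sketch only: the route, api-version, and payload field names are
# assumptions for illustration of the 'source location', 'target location',
# and 'language' parameters described above.
import os

import requests

endpoint = os.environ["LANGUAGE_ENDPOINT"]
key = os.environ["LANGUAGE_KEY"]

payload = {
    "analysisInput": {
        "documents": [
            {
                "id": "doc-1",
                "language": "en",  # language of the source document
                "source": {"location": "https://<storage>/input/report.docx?<sas-token>"},
                "target": {"location": "https://<storage>/output?<sas-token>"},
            }
        ]
    },
    "tasks": [{"kind": "PiiEntityRecognition"}],  # or a summarization task with its 'kind'
}

resp = requests.post(
    f"{endpoint}/language/analyze-documents/jobs",  # assumed route
    params={"api-version": "2023-11-15-preview"},   # assumed api-version
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json=payload,
)
resp.raise_for_status()
print("Job accepted; poll:", resp.headers.get("operation-location"))
```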
  • [Public Preview] OpenAI text-to-speech voices on Azure OpenAI Service and Azure AI Speech
    At OpenAI DevDay on November 6th 2023, OpenAI announced a new text-to-speech (TTS) model that offers 6 preset voices to choose from, in their standard format as well as their respective high-definition (HD) equivalents. Today, we are excited to announce that we are bringing those models in preview to Azure. Developers can now access OpenAI's TTS voices through the Azure OpenAI and Azure AI Speech services. Each of the 6 voices has its own personality and style. The standard voice models are optimized for real-time use cases, and the HD equivalents are optimized for quality. These new TTS voices augment capabilities, such as building custom voices and avatars, already available in Azure AI and allow customers to build entirely new experiences across customer support, training videos, live-streaming and more. This capability allows developers to give human-like voices to chatbots, audiobook or article narration, translation across multiple languages, content creation for games, and offers much-needed assistance to the visually impaired. The new voices will support a wide range of languages from Afrikaans to Welsh, and the service can cater to diverse linguistic needs. For a complete list of supported languages, please follow this link. In addition to making these voices available in Azure OpenAI Service, customers will also find them in Azure AI Speech, with added support for Speech Synthesis Markup Language (SSML) and the SDK. Click here to learn more.
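A minimal sketch of generating audio with one of the preview voices through an Azure OpenAI TTS deployment is shown below; the deployment name, api-version, and voice are placeholders.

```python
# Minimal sketch: synthesizing speech with one of the six preview voices
# through an Azure OpenAI TTS deployment. The deployment name, api-version,
# and voice are placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",  # assumed preview api-version
)

speech = client.audio.speech.create(
    model="tts-1",    # your TTS deployment name (tts-1-hd for the HD variant)
    voice="alloy",    # one of the six preset voices
    input="Welcome to the quarterly Azure AI update.",
)

with open("welcome.mp3", "wb") as f:
    f.write(speech.content)  # MP3 audio by default
```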
  • [Public Preview] New models and model updates
    The following models and model updates are coming this month to Azure OpenAI Service. You can review the latest model availability here.
    • Updated GPT-4 Turbo preview and GPT-3.5 Turbo models: We are rolling out an updated GPT-4 Turbo preview model, gpt-4-0125-preview, with improvements in tasks such as code generation and reduced cases of “laziness” where the model doesn’t complete a task. The new model fixes a bug impacting non-English UTF-8 generations. Post-launch, we’ll begin updating Azure OpenAI deployments that use GPT-4 version 1106-preview to use version 0125-preview. The update will start two weeks after the launch date and complete within a week. Because version 0125-preview offers improved capabilities, customers may notice some changes in model behavior and compatibility after the upgrade. GPT-4-0125-preview is now live in East US, North Central US, and South Central US. Pricing for gpt-4-0125-preview will be the same as pricing for gpt-4-1106-preview. In addition to the updated GPT-4 Turbo, we will also be launching gpt-3.5-turbo-0125, a new GPT-3.5 Turbo model with improved pricing and higher accuracy at responding in various formats. We will reduce input prices for the new model by 50% to $0.0005 /1K tokens and output prices by 25% to $0.0015 /1K tokens.
    • New Text-to-Speech (TTS) models: Microsoft’s new text-to-speech model generates human-quality speech from text in six preset voices, each with its own personality and style. The two model variants include tts-1, the standard voices model variant, which is optimized for real-time use cases, and tts-1-hd, the high-definition (HD) equivalent, which is optimized for quality. These new voices augment capabilities, such as building custom voices and avatars, already available in Azure AI and enable customers to build entirely new experiences across customer support, training videos, live-streaming and more. Developers can now access these voices through both services, Azure OpenAI Service and Azure AI Speech.
    • A new generation of embeddings models with lower pricing: Azure OpenAI Service customers have been incorporating embeddings models in their applications to personalize, recommend and search content. We are excited to announce a new generation of embeddings models that are significantly more capable and meet a variety of customer needs. These models will be available later this month.
      • text-embedding-3-small is a new smaller and highly efficient embeddings model that provides stronger performance compared to its predecessor, text-embedding-ada-002. Given its efficiency, pricing for this model is $0.00002 per 1k tokens, a 5x price reduction compared to that of text-embedding-ada-002. We are not deprecating text-embedding-ada-002, so you can continue using the previous-generation model if needed.
      • text-embedding-3-large is our new best-performing embeddings model that creates embeddings with up to 3072 dimensions. This large embeddings model is priced at $0.00013 / 1k tokens. Both embeddings models offer native support for shortening embeddings (i.e., removing numbers from the end of the sequence) without the embedding losing its concept-representing properties. This allows you to make trade-offs between the performance and cost of using embeddings (see the sketch below). Click here to learn more.
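The sketch below illustrates requesting a shortened embedding from the new models via the openai Python package; the deployment name and api-version are placeholders, and the dimensions parameter requests the shortened vector.

```python
# Minimal sketch: requesting a shortened embedding from text-embedding-3-large.
# The deployment name and api-version are placeholders; the dimensions
# parameter trims the vector without losing its concept-representing properties.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

resp = client.embeddings.create(
    model="text-embedding-3-large",  # your embeddings deployment name
    input=["Azure AI quarterly update"],
    dimensions=1024,  # optional: shorten from the native 3072 dimensions
)
print(len(resp.data[0].embedding))  # -> 1024
```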
  • [Public Preview] Assistants API
    We are excited to announce that Assistants, a new feature in Azure OpenAI Service, is now available in public preview. The Assistants API makes it simple for developers to create high-quality copilot-like experiences within their own applications. Previously, building custom AI assistants required heavy lifting even for experienced developers. While the chat completions API is lightweight and powerful, it is inherently stateless, which means that developers had to manage conversation state and chat threads, tool integrations, retrieval documents and indexes, and execute code manually. The Assistants API, as the stateful evolution of the chat completions API, provides a solution for these challenges. Building customizable, purpose-built AI that can sift through data, suggest solutions, and automate tasks just got easier. The Assistants API supports persistent and infinitely long threads. This means that as a developer you no longer need to develop thread state management systems and work around a model’s context window constraints. Once you create a Thread, you can simply append new messages to it as users respond. Assistants can access files in several formats, either while creating an Assistant or as part of Threads. Assistants can also access multiple tools in parallel, as needed. These tools include:
    • Code Interpreter: This Azure OpenAI Service-hosted tool lets you write and run Python code in a sandboxed environment. Use cases include solving challenging code and math problems iteratively, performing advanced data analysis over user-added files in multiple formats and generating data visualization like charts and graphs.
    • Function calling: You can describe functions of your app or external APIs to your Assistant and have the model intelligently decide when to invoke those functions and incorporate the function response in its messages. Support for new features, including an improved knowledge retrieval tool, is coming soon. The Assistants API is built on the same capabilities that power OpenAI’s GPT product and offers unparalleled flexibility for creating a wide range of copilot-like applications. Use cases span a wide range: AI-powered product recommender, sales analyst app, coding assistant, employee Q&A chatbot, and more. Start building in the no-code Assistants playground or get started with the API (a minimal flow is sketched below). Click here to learn more. To get started, click here.
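The sketch below shows a minimal stateful Assistants flow with the openai Python package: create an assistant with the Code Interpreter tool, open a persistent thread, append a user message, and start a run. The deployment name and api-version are placeholders.

```python
# Minimal sketch of a stateful Assistants flow: create an assistant with the
# Code Interpreter tool, open a persistent thread, append a user message, and
# start a run. The deployment name and api-version are placeholders.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview",
)

assistant = client.beta.assistants.create(
    name="data-helper",
    instructions="Answer questions and run Python when calculations are needed.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4",  # your chat deployment name
)

thread = client.beta.threads.create()  # persistent, infinitely long thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Plot a histogram of these values: 3, 7, 7, 2, 9, 4",
)

run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
print(run.id, run.status)  # poll until complete, then read the thread's messages
```

Because the thread holds the conversation state, follow-up turns only append new messages and start new runs; there is no need to resend prior context.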
  • [Public Preview] Azure AI Video Indexer's Preview of Prompt-Ready API
    Azure AI Video Indexer has a new algorithm that translates the multi-modality content understanding into an LLM’s prompt-ready format, capturing the important details and insights in the video, which then can be used as-is to create LLM prompts for various tasks, including video summarization or search. Microsoft's newest algorithm is based on advanced AI models developed in Azure AI Video Indexer. It effectively integrates all three modalities – visual, audio and text – based on the main insights from Azure AI Video Indexer, processes them and transforms them into an LLM’s prompt-ready format. The method consists of the following steps:
    • Extracting multi-modal insights: the insights of the video are created to allow for full video understanding. However, passing all the insights of the video and its transcript as a prompt to an LLM is problematic: first, because of the prompt size; second, it is simply too much information, and Microsoft needs to provide the main insights and separations to the LLM in order to get good results. Therefore, Microsoft extracts the essence from each insight. For example, small OCR results are eliminated, visual labels are filtered, and more.
    • Insights’ “tags”: To give the LLM more context that eases video understanding and the combination of all the insights, Microsoft creates “tags” that guide the LLM on each insight’s role within the content. These tags include labels such as [OCR], [Transcript], [Audio effects] and more.
    • Chaptering to sections: Microsoft splits the video and its insights into coherent chapters that fit both the essence of the video and the prompt size. Scene segmentation, which is based on visual content, is used; other modalities, such as audio and text, can further divide the scenes into smaller sections to work within the limitations of LLMs. Each section fits a prompt size and contains the content of the video at that time, including the transcript, audio events (such as clapping, a dog barking, etc.), and visual content (objects in the video, celebrities, and more). Each part of the video is consolidated, and the matching insights are used to create each section. Microsoft determines the length of the sections, ensuring they are not too long to use as prompts, and not too short for effective and meaningful content. An illustrative sketch of this tagging-and-chaptering idea follows below.
      Click here to learn more.
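To illustrate the tagging and chaptering idea described above (this is an illustrative sketch, not Microsoft's implementation), the snippet below packs tagged per-scene insights into prompt-sized sections.

```python
# Illustrative sketch (not Microsoft's implementation): combine tagged
# per-scene insights into prompt-ready sections that stay under a size budget,
# in the spirit of the tagging and chaptering steps described above.
from dataclasses import dataclass


@dataclass
class SceneInsights:
    transcript: str
    ocr: list[str]
    audio_effects: list[str]
    visual_labels: list[str]


def scene_to_tagged_text(scene: SceneInsights) -> str:
    """Prefix each modality with a tag so the LLM knows its role."""
    parts = [f"[Transcript] {scene.transcript}"]
    if scene.ocr:
        parts.append(f"[OCR] {'; '.join(scene.ocr)}")
    if scene.audio_effects:
        parts.append(f"[Audio effects] {', '.join(scene.audio_effects)}")
    if scene.visual_labels:
        parts.append(f"[Visual labels] {', '.join(scene.visual_labels)}")
    return "\n".join(parts)


def chapter_into_sections(scenes: list[SceneInsights], max_chars: int = 4000) -> list[str]:
    """Greedily pack tagged scenes into sections that fit a prompt-size budget."""
    sections: list[str] = []
    current = ""
    for scene in scenes:
        text = scene_to_tagged_text(scene)
        if current and len(current) + len(text) > max_chars:
            sections.append(current)
            current = ""
        current = f"{current}\n\n{text}".strip()
    if current:
        sections.append(current)
    return sections
```

Each returned section can then be used as-is as the grounding portion of an LLM prompt for tasks such as summarization or search.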