Microsoft AI updates 2024 - Q3

Within this blog, I want to give an overview of all the features that reached General Availability, entered Technical Preview, or were announced for End of Support by Microsoft in Q3 2024. This information can be found on the Microsoft Azure AI Blog and in What's new in Azure OpenAI Service.


Features that are now supported by Microsoft (GA):

  • [Generally available] GPT-4o mini regional availability
    For standard and global standard deployment in the East US and Sweden Central regions.
    For global batch deployment in East US, Sweden Central, and West US regions.

  • [Generally available] Global batch deployments are now available
    The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with a 24-hour target turnaround, at 50% less cost than global standard. With batch processing, rather than sending one request at a time, you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota, avoiding any disruption of your online workloads. Key use cases include the following (a short sketch of submitting a batch job follows the list):

    • Large-Scale Data Processing: Quickly analyze extensive datasets in parallel;
    • Content Generation: Create large volumes of text, such as product descriptions or articles;
    • Document Review and Summarization: Automate the review and summarization of lengthy documents;
    • Customer Support Automation: Handle numerous queries simultaneously for faster responses;
    • Data Extraction and Analysis: Extract and analyze information from vast amounts of unstructured data;
    • Natural Language Processing (NLP) Tasks: Perform tasks like sentiment analysis or translation on large datasets;
    • Marketing and Personalization: Generate personalized content and recommendations at scale.
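
    The sketch below shows roughly how a batch job is submitted with the openai Python package: requests are written to a JSONL file, the file is uploaded with purpose "batch", and a batch targeting the chat completions endpoint is enqueued. The deployment name, API version, endpoint, and key are placeholders; check the Batch documentation for the values that apply to your resource.

    ```python
    import json
    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",
        api_key="<your-key>",
        api_version="2024-07-01-preview",  # assumption: a batch-capable API version
    )

    # Each JSONL line is one request; "model" is the name of your global batch deployment.
    requests = [
        {
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": "gpt-4o-mini-batch",  # hypothetical deployment name
                "messages": [{"role": "user", "content": f"Summarize document {i}."}],
            },
        }
        for i in range(3)
    ]
    with open("batch_input.jsonl", "w") as f:
        f.write("\n".join(json.dumps(r) for r in requests))

    # Upload the file and enqueue the batch; results land within the 24-hour target window
    # and draw on the separate enqueued token quota.
    batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/chat/completions",
        completion_window="24h",
    )
    print(batch.id, batch.status)
    ```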
  • [Generally available] New Responsible AI default content filtering policy
    Azure OpenAI Service includes a content filtering system that works alongside core models, including DALL-E image generation models. This system works by running both the prompt and completion through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Variations in API configurations and application design might affect completions and thus filtering behavior. The new default content filtering policy DefaultV2 delivers the latest safety and security mitigations for the GPT model series (text), including:

    • Prompt Shields for jailbreak attacks on user prompts (filter);
    • Protected material detection for text (filter) on model completions;
    • Protected material detection for code (annotate) on model completions.

    While there are no changes to content filters for existing resources and deployments (default or custom content filtering configurations remain unchanged), new resources and GPT deployments will automatically inherit the new content filtering policy DefaultV2. Customers have the option to switch between safety defaults and create custom content filtering configurations. Refer to our Default safety policy documentation for more information. Click here to learn more about the "Default content safety policy".
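
    As a rough illustration of how the policy surfaces at the API level (a minimal sketch assuming the openai Python package and a hypothetical deployment name): a filtered completion is reported through the choice's finish_reason, while a blocked prompt, for example a jailbreak caught by Prompt Shields, typically comes back as an HTTP 400 error.

    ```python
    import os
    from openai import AzureOpenAI, BadRequestError

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",  # assumption
    )

    try:
        response = client.chat.completions.create(
            model="gpt-4o",  # hypothetical deployment name
            messages=[{"role": "user", "content": "Summarize our release notes."}],
        )
        choice = response.choices[0]
        if choice.finish_reason == "content_filter":
            # The completion was cut short because an output filter triggered.
            print("Completion filtered by the content filtering policy.")
        else:
            print(choice.message.content)
    except BadRequestError as err:
        # A filtered prompt generally surfaces as an HTTP 400 with a content_filter error code.
        print("Prompt rejected:", err)
    ```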

  • [Generally available] GPT-4o mini model available for deployment
    GPT-4o mini is the latest Azure OpenAI model, first announced on July 18, 2024. "GPT-4o mini allows customers to deliver stunning applications at a lower cost with blazing speed. GPT-4o mini is significantly smarter than GPT-3.5 Turbo—scoring 82% on Measuring Massive Multitask Language Understanding (MMLU) compared to 70%—and is more than 60% cheaper. The model delivers an expanded 128K context window and integrates the improved multilingual capabilities of GPT-4o, bringing greater quality to languages from around the world." The model is currently available for both standard and global standard deployment in the East US region. For information on model quota, consult the quota and limits page, and for the latest info on model availability refer to the models page.
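
    A minimal call against a GPT-4o mini deployment might look like the sketch below; the endpoint, key, and the deployment name "gpt-4o-mini" are placeholders for your own resource.

    ```python
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. an East US resource
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",  # assumption
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # your deployment name
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Explain what a 128K context window means."},
        ],
    )
    print(response.choices[0].message.content)
    ```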

  • [Generally available] Expansion of regions available for global standard deployments of gpt-4o
    GPT-4o is now available for global standard deployments in:

    • australiaeast;
    • brazilsouth;
    • canadaeast;
    • eastus;
    • eastus2;
    • francecentral;
    • germanywestcentral;
    • japaneast;
    • koreacentral;
    • northcentralus;
    • norwayeast;
    • polandcentral;
    • southafricanorth;
    • southcentralus;
    • southindia;
    • swedencentral;
    • switzerlandnorth;
    • uksouth;
    • westeurope;
    • westus;
    • westus3.

Features that are not yet supported by Microsoft (GA):

  • [Public Preview] Expanded GenAI Gateway capabilities in Azure API Management
    Microsoft is excited to announce new enhancements to its GenAI Gateway capabilities, specifically designed for large language model (LLM) use cases. Building on the initial release in May 2024, Microsoft is introducing new policies to support a wider range of LLMs via the Azure AI Model Inference API. These new policies offer the same robust functionality as the initial offerings but are now compatible with a broader array of models available in Azure AI Studio.
    Key Highlights of the New GenAI Policies:
    • LLM Token Limit Policy (Preview): This policy allows you to define and enforce token limits for interactions with large language models, helping manage resource usage and control costs. It automatically blocks requests that exceed the set token limit, preventing overuse and ensuring fair usage across applications.
    • LLM Emit Token Metric Policy (Preview): Gain detailed insights into token consumption with this policy, which emits metrics in real-time. It provides valuable information on token usage patterns, aiding in cost management by allowing you to attribute costs to different teams, departments, or applications.
    • LLM Semantic Caching Policy (Preview): Designed to enhance efficiency and reduce costs, this policy caches responses based on the semantic content of prompts. By reducing redundant model inferences and lowering latency, it optimizes resource utilization and speeds up response times for frequently requested queries.
      These enhancements ensure efficient, cost-effective, and powerful LLM usage, allowing you to take full advantage of the models available in Azure AI. With seamless integration and enhanced monitoring capabilities, Azure API Management continues to empower your intelligent applications with advanced AI functionalities. Click here to learn more. Start exploring these new policies today and elevate your application development with Azure API Management!
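
    The token-limit, token-metric, and semantic-caching policies are configured on the gateway itself as API Management policy definitions, so application code mainly changes by calling the gateway instead of the Azure OpenAI endpoint directly. The sketch below is only illustrative: the gateway URL, API path, and subscription-key header depend entirely on how the API is imported into your API Management instance.

    ```python
    import os
    from openai import AzureOpenAI

    # Hypothetical APIM gateway URL; auth here uses the default APIM subscription-key header.
    client = AzureOpenAI(
        azure_endpoint="https://my-apim-instance.azure-api.net/my-openai-api",
        api_key="unused",  # the gateway authenticates via the header below
        api_version="2024-06-01",
        default_headers={"Ocp-Apim-Subscription-Key": os.environ["APIM_SUBSCRIPTION_KEY"]},
    )

    # Requests now pass through the gateway, where token limits, token metrics,
    # and semantic caching are applied before reaching the backing deployment.
    response = client.chat.completions.create(
        model="gpt-4o",  # deployment name behind the gateway
        messages=[{"role": "user", "content": "Hello through the gateway."}],
    )
    print(response.choices[0].message.content)
    ```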
  • [Public Preview] Latest GPT-4o model available in the early access playground (preview)
    On August 6, 2024, OpenAI announced the latest version of their flagship GPT-4o model, version 2024-08-06. It has all the capabilities of the previous version as well as an "enhanced ability to support complex structured outputs"; in addition, "Max output tokens have been increased from 4,096 to 16,384".
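
    Structured outputs let you constrain the 2024-08-06 model to a JSON Schema. Below is a small sketch with the openai Python package; the deployment name and API version are assumptions for illustration.

    ```python
    import json
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-08-01-preview",  # assumption: a version that supports structured outputs
    )

    # Ask a 2024-08-06 deployment for output that must conform to a JSON Schema.
    response = client.chat.completions.create(
        model="gpt-4o",  # a deployment of the 2024-08-06 version
        messages=[{"role": "user", "content": "Extract the event: Dinner with Sam on Friday at 7pm."}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "calendar_event",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "day": {"type": "string"},
                        "time": {"type": "string"},
                    },
                    "required": ["title", "day", "time"],
                    "additionalProperties": False,
                },
            },
        },
    )
    print(json.loads(response.choices[0].message.content))
    ```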
  • [Public Preview] GPT-4o mini is now available for fine-tuning
    GPT-4o mini fine-tuning is now available in public preview in Sweden Central and in North Central US. Click here to learn more.
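    A fine-tuning job for GPT-4o mini can be started roughly as sketched below, assuming an Azure OpenAI resource in one of the preview regions; the API version and base-model identifier shown are illustrative and should be checked against the fine-tuning documentation.

    ```python
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # Sweden Central or North Central US
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-05-01-preview",  # assumption
    )

    # Upload chat-formatted JSONL training data.
    training_file = client.files.create(
        file=open("training_data.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Start the job; take the exact base-model identifier from the models page.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4o-mini-2024-07-18",  # assumed base-model name
    )
    print(job.id, job.status)
    ```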
  • [Public Preview] Assistants File Search tool is now billed
    The file search tool for Assistants now has additional charges for usage. File Search augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. OpenAI automatically parses and chunks your documents, creates and stores the embeddings, and uses both vector and keyword search to retrieve relevant content to answer user queries. Click here to learn more. See the pricing page for more information.
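
    In outline, File Search is wired up by creating a vector store for your documents and attaching it to an assistant with the file_search tool. The sketch below uses the openai Python package's beta Assistants namespaces as they existed around this time (they may differ across SDK versions); the deployment name and file are placeholders.

    ```python
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-05-01-preview",  # assumption: an Assistants-capable API version
    )

    # Put the documents the assistant should search into a vector store.
    vector_store = client.beta.vector_stores.create(name="product-docs")
    uploaded = client.files.create(file=open("product_manual.pdf", "rb"), purpose="assistants")
    client.beta.vector_stores.files.create(vector_store_id=vector_store.id, file_id=uploaded.id)

    # Create an assistant that can use File Search over that store.
    assistant = client.beta.assistants.create(
        model="gpt-4o-mini",  # your deployment name
        instructions="Answer questions using the attached product documentation.",
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
    )
    print(assistant.id)
    ```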

Features that are retired

  • [Retired] Intent Recognition, an Azure AI Speech feature, will be retired on September 30th, 2025
    Azure AI Services is restructuring its Speech service features, prioritizing new innovations in line with the introduction of large language models. This shift has changed the direction of speech interactions for Voice Commanding scenarios. For these reasons, Intent Recognition will be retired on September 30th, 2025. The Intent Recognition (IR) capability in the Speech service will be supported until September 30th, 2025; however, the Intent Recognition API will be removed from new versions of the Speech SDK after November 30th, 2024.
    Required action
    Customers can use Speech Recognition together with the Conversational Language Understanding (CLU) API directly to classify intents and entities of recognized utterances.
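    In outline, that migration means doing speech-to-text with the Speech SDK and then sending the recognized text to a CLU project for intent and entity classification. A rough sketch using the azure-cognitiveservices-speech and azure-ai-language-conversations packages; the keys, endpoints, and CLU project/deployment names are placeholders.

    ```python
    import azure.cognitiveservices.speech as speechsdk
    from azure.ai.language.conversations import ConversationAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    # 1) Speech to text with the Speech service (default microphone input).
    speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<speech-region>")
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    utterance = recognizer.recognize_once().text

    # 2) Classify intents and entities of the recognized utterance with CLU.
    clu_client = ConversationAnalysisClient("<clu-endpoint>", AzureKeyCredential("<clu-key>"))
    response = clu_client.analyze_conversation(
        task={
            "kind": "Conversation",
            "analysisInput": {
                "conversationItem": {"id": "1", "participantId": "user", "text": utterance}
            },
            "parameters": {"projectName": "<clu-project>", "deploymentName": "<clu-deployment>"},
        }
    )
    print(response["result"]["prediction"]["topIntent"])
    ```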
  • [Retired] Azure AI Speaker Recognition
    Beginning on September 30, 2025, Azure AI Speaker Recognition will be retired, and your applications will no longer be able to access its APIs. To continue providing speaker verification and/or speaker identification capabilities in your applications, Microsoft asks that you migrate to other solutions available in the market and encourages you to explore Nuance Gatekeeper as one alternative. Microsoft recommends evaluating and migrating to another solution as early as possible to allow sufficient time for transitioning enrolled profiles and tuning your applications based on the new solution's accuracy and performance characteristics.
  • [Retired] Azure AI Vision APIs to be retired
    Azure AI Vision is retiring the Image Analysis 4.0 Custom Image Classification, Custom Object Detection, Product Recognition, and Segment (Background Removal) Preview APIs, as well as the Spatial Analysis Edge container. On January 10th, 2025, the Image Analysis 4.0 Custom Image Classification, Custom Object Detection, Product Recognition, and Segment (Background Removal) Preview APIs will be retired and all requests to those APIs will fail. On March 30th, 2025, the Spatial Analysis Edge container will be retired and all requests to the service will fail.
    Required action
    To avoid interruption for the Custom Image Classification and Custom Object Detection APIs, transition to the Azure AI Custom Vision service by January 10th, 2025. To avoid interruption for the Background Removal API and the Spatial Analysis Edge container, prepare now for their retirement by seeking alternative services.
  • [Retired] Azure AI Document Intelligence GA API v2.1 is retiring on 15 September 2027
    On 15 September 2027, the Document Intelligence GA API v2.1 will be retired and all requests to the service with that API version will fail. To ensure your models continue to operate seamlessly, you’ll need to migrate to a newer GA API version of Document Intelligence by that date. If you’re using custom models, we recommend re-training them with a newer GA API. The newer APIs offer improved model quality, extended capabilities, and additional document type coverage.
    Required action
    To avoid service disruptions, migrate to a newer GA API version of Document Intelligence by 15 September 2027.
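    As an illustration of what moving to a newer GA API can look like in practice, the azure-ai-formrecognizer Python SDK targets the v3.x GA REST APIs rather than v2.1. A minimal sketch is shown below; the endpoint, key, and document URL are placeholders, and custom v2.1 models should be re-trained against a newer GA API before being used this way.

    ```python
    from azure.ai.formrecognizer import DocumentAnalysisClient  # pip install azure-ai-formrecognizer
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(
        endpoint="https://<your-resource>.cognitiveservices.azure.com/",
        credential=AzureKeyCredential("<your-key>"),
    )

    # Analyze a document with a prebuilt model via a newer GA API version.
    poller = client.begin_analyze_document_from_url(
        "prebuilt-layout",
        "https://example.com/sample-invoice.pdf",  # hypothetical document URL
    )
    result = poller.result()

    for page in result.pages:
        for line in page.lines:
            print(line.content)
    ```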