Microsoft AI updates 2024 - Q4

Within this blog, I want to give an overview of all the feature in Q4 2024 that becomes available in General Availability, Technical Preview or End of Support by Microsoft. This information can be found at Microsoft Azure AI Blog and/or What's new in Azure OpenAI Service.


Features that are now supported by Microsoft (GA):

  • [General available] - o1 reasoning model released for limited access
    The latest o1 model is now available for API access and model deployment. Registration is required, and access will be granted based on Microsoft's eligibility criteria. Customers who previously applied and received access to o1-preview, don't need to reapply as they are automatically on the wait-list for the latest model. Click here to learn more.
  • [General available] - o1 Region availability
    o1 (Version: 2024-12-17) is now available in East US2 (Global Standard) and Sweden Central (Global Standard).
  • [General available] - GPT-4o 2024-11-20
    gpt-4o-2024-11-20 is now available for global standard deployment in East US, East US 2, North Central US, South Central US, West US, West US 3, Sweden Central.
  • [General available] - NEW data zone provisioned deployment type
    Data zone provisioned deployments are available in the same Azure OpenAI resource as all other Azure OpenAI deployment types but allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. Data zone provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure infrastructure within Microsoft specified data zones. Data zone provisioned deployments are supported on gpt-4o-2024-08-06, gpt-4o-2024-05-13, gpt-4o-mini-2024-07-18, gpt-4o-2024-08-06, gpt-4o-2024-05-13, and gpt-4o-mini-2024-07-18 models. Click here to learn more.
  • [General available] - Vision Fine-tuning GA
    Vision fine-tuning with GPT-4o (2024-08-06) in now Generally Available (GA). Vision fine-tuning allows you to add images to your JSONL training data. Just as you can send one or many image inputs to chat completions, you can include those same message types within your training data. Images can be provided either as URLs or as base64 encoded images. Click here to learn more.
  • [General available] - NEW AI abuse monitoring
    Microsoft is introducing new forms of abuse monitoring that leverage LLMs to improve efficiency of detection of potentially abusive use of the Azure OpenAI service and to enable abuse monitoring without the need for human review of prompts and completions. Prompts and completions that are flagged through content classification and/or identified to be part of a potentially abusive pattern of use are subjected to an additional review process to help confirm the system’s analysis and inform actioning decisions. Our abuse monitoring systems have been expanded to enable review by LLM by default and by humans when necessary and appropriate. Click here to learn more.
  • [General available] - Global batch support updates
    Global batch now supports GPT-4o (2024-08-06). Click here to learn more.
  • [General available] - Global Batch GA
    The Azure OpenAI Batch API is designed to handle large-scale and high-volume processing tasks efficiently. Process asynchronous groups of requests with separate quota, with 24-hour target turnaround, at 50% less cost than global standard. With batch processing, rather than send one request at a time you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota avoiding any disruption of your online workloads. Key use cases include:
    • Large-Scale Data Processing: Quickly analyze extensive datasets in parallel;
    • Content Generation: Create large volumes of text, such as product descriptions or articles;
    • Document Review and Summarization: Automate the review and summarization of lengthy documents;
    • Customer Support Automation: Handle numerous queries simultaneously for faster responses;
    • Data Extraction and Analysis: Extract and analyze information from vast amounts of unstructured data;
    • Natural Language Processing (NLP) Tasks: Perform tasks like sentiment analysis or translation on large datasets;
    • Marketing and Personalization: Generate personalized content and recommendations at scale.
      Click here to learn more.

Features are not yet supported by Microsoft (GA)

  • [Public Preview] - Preference fine-tuning (preview)
    Direct preference optimization (DPO) is a new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike reinforcement learning from human feedback (RLHF), DPO does not require fitting a reward model and uses simpler data (binary preferences) for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. DPO is especially useful in scenarios where subjective elements like tone, style, or specific content preferences are important. Click here to learn more.
  • [Public Preview] - Stored completions & distillation
    Stored completions allow you to capture the conversation history from chat completions sessions to use as datasets for evaluations and fine-tuning.
  • [Public Preview] - o1-preview and o1-mini models limited access
    The o1-preview and o1-mini models are now available for API access and model deployment. Registration is required, and access will be granted based on Microsoft's eligibility criteria. Customers who were already approved and have access to the model through the early access playground don't need to apply again, you'll automatically be granted API access. Once access has been granted, you'll need to create a deployment for each model. API support: Support for the o1 series models was added in API version 2024-09-01-preview. The max_tokens parameter has been deprecated and replaced with the new max_completion_tokens parameter. o1 series models will only work with the max_completion_tokens parameter.
  • [Public Preview] - New GPT-4o Realtime API for speech and audio public preview
    Azure OpenAI GPT-4o audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT-4o audio realtime API is designed to handle real-time, low-latency conversational interactions, making it a great fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators. The gpt-4o-realtime-preview model is available for global deployments in East US 2 and Sweden Central regions. Click here to learn more.

Features that are retired

  • [Retired] - none