
Deploying Azure OpenAI via Bicep: Key Considerations & Lab Setup

· 15 min read
Craig Dempsey
Cloud DevOps Engineer @ Digital Reflections

Introduction

Azure OpenAI enables organizations to leverage powerful AI models such as GPT-4o, o3-mini, and Whisper for a variety of use cases, from chatbots to code generation and beyond. However, deploying Azure OpenAI via Infrastructure as Code (IaC) using Bicep requires careful planning to ensure a scalable, cost-effective, and secure deployment.

In this article, we’ll set up a simple lab environment to explore the key concepts behind deploying Azure OpenAI with Bicep. This will help us gain a deeper understanding of the service’s deployment types, regional availability, quota limitations, and model options. By the end, we’ll have a working Azure OpenAI deployment with a few models to experiment with, helping us plan more advanced deployments in the future.

What We’ll Cover

Before diving into deployment, we’ll first explore some key areas that are important to understand when setting up Azure OpenAI:

  • Azure OpenAI Deployment Types – Understanding different deployment models.
  • Regional Availability – Where Azure OpenAI is available and why it matters.
  • Azure OpenAI Models – Overview of supported models and their capabilities.
  • Quota Management – Understanding capacity limits and how to request increases.

Once we’ve covered these fundamentals, we’ll walk through a step-by-step Bicep deployment, explaining each component and providing working code that you can use to deploy Azure OpenAI in your own environment.

Azure OpenAI Deployment Types

Azure OpenAI provides flexibility in how and where your AI models are deployed. There are two main deployment types: Standard and Provisioned. Within these, there are different data processing location options that impact latency, quota limits, and regional availability.

Standard Deployments

Standard deployments are the default option, making them ideal for setting up a lab environment. For production deployments, however, careful planning is required to align the deployment with business requirements: privacy, data sovereignty, availability, latency, scalability, and cost.

Standard deployments are further divided into:

  • Global Standard – The recommended starting point, this deployment type routes traffic dynamically across Azure’s global infrastructure, ensuring high availability and quick access to new models.

  • DataZone Standard – Routes traffic within a Microsoft-defined data zone, ensuring processing happens within a specific region or country grouping.

  • Azure Geography Standard – Ensures that all processing remains within the specific Azure geography where the resource is created.

Provisioned Deployments

Provisioned deployments offer dedicated capacity and are ideal for workloads requiring low latency and predictable performance. These are further divided into:

  • Global Provisioned-Managed – Offers dedicated capacity while still leveraging Azure’s global infrastructure.

  • Azure Geography Provisioned-Managed – Ensures dedicated capacity within a specific Azure geography.

Choosing the Right Deployment Type

| Deployment Type | Best For | Key Benefits | Considerations |
| --- | --- | --- | --- |
| Global Standard | General workloads | Fast access to models; easy setup; cheap pay-as-you-go token usage; great for lab environments | Higher latency variation at scale; data sovereignty and privacy requirements |
| DataZone Standard | Regional processing compliance | Regional traffic routing; traffic stays within regional boundaries | Limited to Microsoft-defined data zones |
| Azure Geography Standard | Regulatory compliance | Processing within a specific geography | Lower scalability; could impact availability and performance; smaller pool of compute to draw from |
| Global Provisioned-Managed | Large-scale AI workloads | Dedicated resources; lower latency; great for production workloads | Much higher cost; provisioning time |
| Azure Geography Provisioned-Managed | Strict regulatory environments | Guaranteed regional processing; may meet a customer's data sovereignty and privacy requirements; great for production workloads where availability and performance must be guaranteed | Most expensive option |

Understanding these deployment types helps ensure that your AI workloads meet performance, compliance, and cost requirements.

ℹ️ More information can be found in the 🔗 Azure OpenAI deployment types section of the Microsoft Learn documentation.
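
In the Bicep we write later, the deployment type is selected per model deployment through the SKU name. As a rough sketch (these SKU names reflect the Cognitive Services deployment API as documented at the time of writing; check what your target model and region actually support):

// The deployment type maps to sku.name on each model deployment:
//   'GlobalStandard'           -> Global Standard
//   'DataZoneStandard'         -> DataZone Standard
//   'Standard'                 -> Azure Geography Standard
//   'GlobalProvisionedManaged' -> Global Provisioned-Managed
//   'ProvisionedManaged'       -> Azure Geography Provisioned-Managed
sku: {
  name: 'GlobalStandard'
  capacity: 10
}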

Supported Azure OpenAI Regions

Azure OpenAI is currently available in select regions. However, while the service itself may be available, the availability of specific models varies significantly by region. Before deploying, ensure your chosen region supports the required models.

ℹ️ You can check the latest supported regions and model availability on Microsoft’s official documentation - 🔗 Global Standard Model Availability.

Commonly Supported Regions

  • East US
  • South Central US
  • West Europe
  • France Central
  • Sweden Central
  • UK South
  • Japan East
  • Australia East

New Model Preview Regions

Newer model previews are often tested in specific regions before wider availability. If you're setting up a lab or testing environment, consider deploying in these regions for early access:

  • East US 2
  • Sweden Central
  • North Central US

Impact of Choosing Different Regions

By understanding regional availability, you can ensure that your Azure OpenAI deployment meets both performance and compliance needs.

| Factor | Impact |
| --- | --- |
| Latency | Choosing a region closer to your users improves response times. |
| Quota Limits | Some regions may have lower quotas or require quota increase requests. |
| Compliance | Certain regions comply with specific regulatory requirements (e.g., GDPR for EU regions). |
| Model Availability | Some models might be available first in global regions like East US before rolling out elsewhere. |

ℹ️ Stay updated on Microsoft’s documentation as new regions are added frequently.

Quota Considerations

Azure OpenAI imposes quota limits on resource usage, which can affect scalability and availability. Understanding these limits and planning accordingly is crucial for ensuring smooth deployment and operation.

Types of Quotas

| Quota Type | Description |
| --- | --- |
| Request Rate Limits | Limits on the number of requests per minute/hour based on the selected model and region. |
| Token Limits | Restrictions on the number of tokens processed per request and per day. |
| Model-Specific Limits | Certain models may have stricter usage limits compared to others. |
| Regional Quotas | Availability and quotas can differ across regions, requiring careful selection. |

For a detailed breakdown of quota limits per model and region, refer to 🔗 Azure OpenAI Service quotas and limits.

ℹ️ You can view your quota in the portal via Azure AI Foundry.

Viewing Quotas via the Portal

Failure to check your available quota may result in an error like the one below 😅.

Quota Error
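
A concrete way to avoid that error: for Standard and Global Standard deployments, the capacity value on a model deployment's SKU is expressed in units of 1,000 tokens per minute (TPM), and the combined capacity of a model's deployments in a region must fit within your quota. A minimal sketch of what that means in the Bicep we write later:

// capacity is in units of 1,000 TPM for (Global) Standard deployments,
// so this requests roughly 450,000 tokens per minute and consumes
// 450 units of this region's gpt-4o GlobalStandard quota.
sku: {
  name: 'GlobalStandard'
  capacity: 450
}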

Overview of Key Models

Below is a breakdown of the key AI models and their deployments across different Azure regions, along with their strengths and best use cases. As I am in Australia, I target Australia East where I can, falling back to East US 2 or Sweden Central where needed to preview new models for testing. Here is what I have deployed for testing purposes in my lab.

| Model | Region | Strengths | Best Use Cases |
| --- | --- | --- | --- |
| GPT-4o | Australia East, East US 2 | High performance; balanced cost | Chatbots; content creation; coding assistance |
| o3-mini | Request access via waiting list | Enhanced reasoning abilities; cost-effective; lightweight | Complex problem solving in science, math, and coding |
| o1-mini | East US 2 | Lightweight model with good performance; advanced reasoning | Small-scale AI workloads; cost-sensitive applications |
| GPT-4o Mini Audio Preview | East US 2 | Low-latency speech input and output for interactive conversations | AI-powered transcription; voice assistants; most suited to pre-recorded audio processing |
| GPT-4o Realtime Preview | East US 2 | Low-latency speech input and output for real-time interactive conversations | Fast real-time AI responses; voice assistants |
| GPT-4o Mini Realtime Preview | East US 2 | Cost-effective; low-latency speech input and output for real-time interactive conversations | Fast real-time AI responses; voice assistants |
| DALL-E 3 | Australia East | Image generation | AI-powered design; creative applications |
| Text-Embedding-3 Large | East US 2 | Optimized for embeddings | Semantic search; NLP tasks |
| Whisper | East US 2 | Speech-to-text | Transcription; accessibility solutions |
| TTS (Text-to-Speech) | Sweden Central | High-quality text-to-speech conversion | AI-generated voice applications |
| TTS-HD | Sweden Central | High-definition text-to-speech | High-quality voice generation applications; offline audio generation |

This list showcases how different models are deployed across various Azure regions to take advantage of regional availability and optimized capabilities.

ℹ️ The documentation at Microsoft Learn has a comprehensive list of 🔗 Azure OpenAI Service Models.

Choosing the Right Model

When selecting a model, it's important to balance performance, cost, and latency. Here are key considerations:

  • Performance vs. Cost – GPT-4o offers high performance but at a higher cost, while o3-mini is more budget-friendly.
  • Latency Requirements – If real-time interaction is crucial, GPT-4o Realtime models are the best choice.
  • Scalability – For high-volume applications, consider quota limits and regional availability.

Deploying an Azure OpenAI model

To ensure a structured approach, we will break the deployment down into logical steps.

Deployment Architecture Overview

Before jumping into the Bicep code, let's visualize our deployment with a simple Mermaid diagram.
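
A minimal sketch of the flow as a Mermaid flowchart (node labels are illustrative):

flowchart TD
    P["Regional parameter files (ae / eus2 / swec .bicepparam)"] --> M["main.bicep (subscription scope)"]
    M --> RG["Resource group (AVM module)"]
    RG --> AI["Azure AI Services account (AVM module)"]
    AI --> D["Model deployments (gpt-4o, dall-e-3, tts, ...)"]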

Bicep Deployment File

To begin, we first create the Bicep deployment file. This serves as the foundation for defining our Azure infrastructure in a repeatable and automated way.

  • The deployment is scoped at the subscription level, which allows us to create a resource group first, followed by all necessary Azure components within it.

  • This approach ensures that we have a repeatable pattern that can be extended for future deployments.

  • The deployment leverages Azure Verified Modules (AVM), which simplifies our work by using pre-built, tested modules instead of manually writing infrastructure definitions from scratch.

  • This approach enables consistency, security, and scalability without having to reinvent the wheel.

  • The deployment file is referenced by Bicep parameter files, with each parameter file representing a specific region and the models that will be deployed in that region.

ℹ️ The networkAcls section in the Azure OpenAI deployment can be customized to restrict access to an allowed list of IP addresses for enhanced security.

targetScope = 'subscription'

@description('Optional. Location for all resources.')
param location string = deployment().location

@description('Required. Name of the resource group to create.')
param resourceGroupName string

@description('Required. Tags for the resources.')
param tags object

@description('Required. Configuration object for the Azure AI Services account.')
param azureAIServiceConfig object

// Create the resource group first; everything else deploys into it.
module resourceGroup 'br/public:avm/res/resources/resource-group:0.4.0' = {
  name: '${uniqueString(deployment().name, location)}-resourceGroup'
  params: {
    location: location
    name: resourceGroupName
    tags: tags
  }
}

// Azure AI Services account plus its model deployments, via the AVM module.
module aiService 'br/public:avm/res/cognitive-services/account:0.9.2' = {
  name: 'aiService-${azureAIServiceConfig.name}'
  scope: az.resourceGroup(resourceGroupName)
  params: {
    kind: azureAIServiceConfig.kind
    name: azureAIServiceConfig.name
    sku: azureAIServiceConfig.sku.name
    disableLocalAuth: azureAIServiceConfig.disableLocalAuth
    deployments: [for deployment in azureAIServiceConfig.deployments: {
      model: deployment.model
      name: deployment.name
      sku: deployment.sku
      versionUpgradeOption: deployment.versionUpgradeOption
      raiPolicyName: deployment.raiPolicyName
    }]
    location: location
    publicNetworkAccess: azureAIServiceConfig.publicNetworkAccess
    networkAcls: azureAIServiceConfig.networkAcls
  }
  dependsOn: [
    resourceGroup
  ]
}
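
Before wiring up the parameter files, it's worth confirming the template compiles. Assuming you have the Bicep CLI installed (or the copy bundled with the Azure CLI):

# Compile the template to ARM JSON; authoring errors surface here.
bicep build main.bicep

# Equivalent check using the Azure CLI's bundled Bicep tooling.
az bicep build --file main.bicep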

Regional Deployments

Australia East Parameter File Example

The following Bicep parameter file defines the Australia East deployment. This region is preferable for users based in Australia as it provides low latency and optimal performance.

using 'main.bicep'

param resourceGroupName = 'rg-test-ae-ai-01'
param location = 'australiaeast'
param tags = {
  owner: 'A User'
}
param azureAIServiceConfig = {
  name: 'ais-test-ae-ai-01'
  sku: {
    name: 'S0'
  }
  kind: 'AIServices'
  location: location
  disableLocalAuth: false
  publicNetworkAccess: 'Enabled'
  networkAcls: {
    defaultAction: 'Deny'
    ipRules: [
      {
        value: '<your allowed IP addresses>'
      }
    ]
  }
  deployments: [
    {
      model: {
        format: 'OpenAI'
        name: 'gpt-4o'
        version: '2024-05-13'
      }
      name: 'gpt-4o'
      sku: {
        capacity: 34
        name: 'GlobalStandard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'dall-e-3'
        version: '3.0'
      }
      name: 'dall-e-3'
      sku: {
        name: 'Standard'
        capacity: 1
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
  ]
}

Deploying Australia East Models

ℹ️ Here is an example of using the Azure Deployment Stacks PowerShell command to deploy the models.

New-AzSubscriptionDeploymentStack `
  -Name 'aeAiDeploymentExample' `
  -Location 'australiaeast' `
  -TemplateParameterFile 'ae.bicepparam' `
  -Description 'An example Australia East Azure OpenAI deployment' `
  -ActionOnUnmanage 'DeleteAll' `
  -DenySettingsMode 'None' `
  -Verbose
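
Once the stack completes, you can inspect its state and the resources it manages. A quick check, assuming the deployment stacks cmdlets in your installed Az.Resources version:

# Show the stack, its provisioning state, and its managed resources.
Get-AzSubscriptionDeploymentStack -Name 'aeAiDeploymentExample'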

East US 2 Parameter File Example

Since some preview models are not available in Australia East, we also deploy an instance in East US 2 for testing a wider range of models.

using 'main.bicep'

param resourceGroupName = 'rg-test-eus2-ai-01'
param location = 'eastus2'
param tags = {
  owner: 'A User'
}

param azureAIServiceConfig = {
  name: 'ais-test-eus2-01'
  sku: {
    name: 'S0'
  }
  kind: 'AIServices'
  location: location
  disableLocalAuth: false
  publicNetworkAccess: 'Enabled'
  networkAcls: {
    defaultAction: 'Deny'
    ipRules: [
      {
        value: '<your allowed IP addresses>'
      }
    ]
  }
  deployments: [
    {
      model: {
        format: 'OpenAI'
        name: 'gpt-4o'
        version: '2024-11-20'
      }
      name: 'gpt-4o'
      sku: {
        capacity: 450
        name: 'GlobalStandard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'gpt-4o-mini-audio-preview'
        version: '2024-12-17'
      }
      name: 'gpt-4o-mini-audio-preview'
      sku: {
        capacity: 30
        name: 'GlobalStandard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'gpt-4o-realtime-preview'
        version: '2024-12-17'
      }
      name: 'gpt-4o-realtime-preview'
      sku: {
        capacity: 6
        name: 'GlobalStandard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'gpt-4o-mini-realtime-preview'
        version: '2024-12-17'
      }
      name: 'gpt-4o-mini-realtime-preview'
      sku: {
        capacity: 6
        name: 'GlobalStandard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'o1-mini'
        version: '2024-09-12'
      }
      name: 'o1-mini'
      sku: {
        capacity: 5
        name: 'GlobalStandard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'text-embedding-3-large'
        version: '1'
      }
      name: 'text-embedding-3-large'
      sku: {
        capacity: 350
        name: 'Standard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'whisper'
        version: '001'
      }
      name: 'whisper'
      sku: {
        capacity: 3
        name: 'Standard'
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.DefaultV2'
    }
  ]
}


Deploying East US 2 Models

ℹ️ Here is an example of using the Azure Deployment Stacks PowerShell command to deploy the models.

New-AzSubscriptionDeploymentStack `
  -Name 'eus2AiDeploymentExample' `
  -Location 'eastus2' `
  -TemplateParameterFile 'eus2.bicepparam' `
  -Description 'An example East US 2 Azure OpenAI deployment' `
  -ActionOnUnmanage 'DeleteAll' `
  -DenySettingsMode 'None' `
  -Verbose
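
To confirm the individual model deployments landed, you can list them on the account. A sketch using the Az.CognitiveServices module (names match the parameter file above; cmdlet availability can vary by module version):

# List the model deployments on the East US 2 account.
Get-AzCognitiveServicesAccountDeployment `
  -ResourceGroupName 'rg-test-eus2-ai-01' `
  -AccountName 'ais-test-eus2-01'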

Sweden Central Parameter File Example

The Sweden Central region is used for deploying Text-to-Speech (TTS) and TTS-HD models, as they are only available in select regions.


using 'main.bicep'

param resourceGroupName = 'rg-test-swec-ai-01'
param location = 'swedencentral'
param tags = {
  owner: 'A User'
}

param azureAIServiceConfig = {
  name: 'ais-test-swec-ai-01'
  sku: {
    name: 'S0'
  }
  kind: 'AIServices'
  location: location
  disableLocalAuth: false
  publicNetworkAccess: 'Enabled'
  networkAcls: {
    defaultAction: 'Deny'
    ipRules: [
      {
        value: '<your allowed IP addresses>'
      }
    ]
  }
  deployments: [
    {
      model: {
        format: 'OpenAI'
        name: 'tts'
        version: '001'
      }
      name: 'tts'
      sku: {
        name: 'Standard'
        capacity: 3
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.Default'
    }
    {
      model: {
        format: 'OpenAI'
        name: 'tts-hd'
        version: '001'
      }
      name: 'tts-hd'
      sku: {
        name: 'Standard'
        capacity: 3
      }
      versionUpgradeOption: 'OnceNewDefaultVersionAvailable'
      raiPolicyName: 'Microsoft.Default'
    }
  ]
}

Deploying Sweden Central Models

ℹ️ Here is an example of using the Azure Deployment Stacks PowerShell command to deploy the models.

New-AzSubscriptionDeploymentStack `
  -Name 'swecAiDeploymentExample' `
  -Location 'swedencentral' `
  -TemplateParameterFile 'swec.bicepparam' `
  -Description 'An example Sweden Central Azure OpenAI deployment' `
  -ActionOnUnmanage 'DeleteAll' `
  -DenySettingsMode 'None' `
  -Verbose
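
Because each region is managed as a deployment stack, tearing a lab down is a single command per stack. A sketch (older Az.Resources versions expose -DeleteAll switches instead of -ActionOnUnmanage):

# Remove the stack and delete every resource it manages,
# mirroring the ActionOnUnmanage setting used at deploy time.
Remove-AzSubscriptionDeploymentStack `
  -Name 'swecAiDeploymentExample' `
  -ActionOnUnmanage 'DeleteAll'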

Final Thoughts

Deploying Azure OpenAI via Bicep provides a structured, repeatable, and scalable way to manage AI workloads in the cloud. By breaking the deployment into logical steps, we ensure that infrastructure is modular, secure, and aligned with best practices.

The deployments demonstrated above are primarily for lab environments and learning purposes. Transitioning this into a production-ready deployment requires deeper considerations, including security, compliance, and operational resilience.

Key Takeaways

  • Infrastructure as Code (IaC) – Using Bicep ensures that deployments are automated, version-controlled, and repeatable.
  • Regional Considerations – Selecting the right Azure region is critical for latency, compliance, and model availability.
  • Scalability & Security – Leveraging Azure Verified Modules (AVM) simplifies infrastructure while ensuring security and performance.
  • Flexible Model Deployments – Deploying in multiple regions provides access to the latest models while optimizing for cost and availability.
  • Deployment Stacks for Cleanup – Using deployment stacks allows for simplified management and clean deprovisioning of resources.

Next Steps

While this guide focused on a lab setup, moving to a production-ready deployment involves additional planning and refinement. Here are some key areas to consider:

  • Monitoring & Logging – Integrate Azure Monitor and Log Analytics to track usage, performance, and potential issues.
  • Cost Optimization – Evaluate quota limits, reserved capacity options, and pricing tiers to optimize costs for long-term usage.
  • Security & Compliance – Ensure data sovereignty, private networking, access controls, and role-based access management (RBAC) align with organizational policies.

With this structured approach, you can confidently deploy and manage Azure OpenAI, ensuring scalability, security, and cost efficiency while preparing for real-world applications.