Azure Traffic Manager: Global DNS-Based Load Balancing and Routing
When engineering global cloud applications, distributing user traffic across multiple regions is vital to minimizing latency and ensuring high availability. Azure Front Door handles traffic at the HTTP/HTTPS application layer using Anycast, and Azure Load Balancer manages TCP/UDP traffic at the transport layer. Cross-region traffic can also be steered one step earlier, at the foundational layer of the internet: the Domain Name System (DNS).
Azure Traffic Manager is a DNS-based traffic load balancer that enables you to distribute traffic optimally to services across global Azure regions, while providing high availability and responsiveness.
Introduction
What is Azure Traffic Manager?
Azure Traffic Manager operates at the DNS layer. It uses DNS responses to direct client requests to the most appropriate service endpoint based on a selected traffic-routing method.
Client Browser ──( 1. DNS Query: app.contoso.com )──> [ Azure Traffic Manager ]
Client Browser <──( 2. DNS Reply: Best Regional IP )── [ Azure Traffic Manager ]
Client Browser ──( 3. Direct Connection to App )────> Regional Endpoint (e.g., East US)
Because Traffic Manager is DNS-based, it does not see or proxy any traffic passing between the client and the application endpoint. The client connects directly to the endpoint returned by Traffic Manager. Consequently, it is completely agnostic to application protocols and can handle traffic for any internet-facing service.
Create a Traffic Manager Profile
Creating a profile involves provisioning a global Traffic Manager resource container in Azure. During deployment, you define a globally unique relative DNS name (e.g., my-app.trafficmanager.net). This profile specifies the routing method that governs how incoming DNS queries are answered.
⚠️ Important Note!
- DNS Caching & TTL Effects: Because Traffic Manager operates strictly via DNS, its answers are cached by client operating systems and recursive DNS resolvers for the record's Time-to-Live (TTL), which is set on the profile. If an endpoint fails, clients may keep trying to connect to it until their cached record expires (typically 30 to 300 seconds, depending on the configured TTL).
- Public Endpoints Only: Traffic Manager can only route to endpoints that have a publicly reachable IP address or Fully Qualified Domain Name (FQDN). It cannot natively route or load balance strictly internal, non-internet-facing private VNet resources without public exposure.
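The TTL effect described above can be illustrated with a toy resolver cache. This is only a sketch: the `TtlCache` class, the 60-second TTL, and the documentation-range IP addresses are assumptions made for illustration, not part of Traffic Manager.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    address: str
    expires_at: float  # time (in seconds) at which the cached record must be discarded


class TtlCache:
    """Toy resolver cache: reuses an answer until its TTL runs out."""

    def __init__(self, lookup, ttl_seconds):
        self.lookup = lookup            # callable returning the current "best" address
        self.ttl_seconds = ttl_seconds
        self.cached = None

    def resolve(self, now):
        # Ask the authoritative source again only once the cached record has expired.
        if self.cached is None or now >= self.cached.expires_at:
            self.cached = CachedAnswer(self.lookup(), now + self.ttl_seconds)
        return self.cached.address


# Hypothetical scenario: the authoritative answer changes shortly after the first lookup.
state = {"answer": "203.0.113.10"}
cache = TtlCache(lambda: state["answer"], ttl_seconds=60)

first = cache.resolve(now=0)        # fresh lookup, cached until t=60
state["answer"] = "203.0.113.20"    # failover happens upstream
stale = cache.resolve(now=30)       # still inside the TTL: the old address is reused
fresh = cache.resolve(now=60)       # TTL expired: the new address is fetched
print(first, stale, fresh)
```

The middle lookup still returns the failed address, which is why a shorter TTL speeds up failover at the cost of more DNS queries.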
Traffic Manager Routing Methods
Traffic Manager supports six distinct routing methods that dictate how DNS queries are evaluated and resolved.
1. Priority Routing
Priority routing is used when you want to designate a primary service endpoint for all traffic, while maintaining one or more backup endpoints. Traffic Manager sends 100% of the traffic to the highest-priority healthy endpoint. It fails over to the next priority endpoint only if the primary endpoint is marked unhealthy by the monitoring system.
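The selection rule can be sketched in a few lines. This is an illustrative model rather than the service's implementation: the dictionary layout and the `pick_by_priority` name are assumptions, and a lower number is taken to mean a higher priority.

```python
def pick_by_priority(endpoints):
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e["healthy"]]
    # In this sketch, priority 1 is preferred over priority 2, and so on.
    return min(healthy, key=lambda e: e["priority"], default=None)


endpoints = [
    {"name": "primary", "priority": 1, "healthy": True},
    {"name": "backup", "priority": 2, "healthy": True},
]
before_failure = pick_by_priority(endpoints)["name"]

endpoints[0]["healthy"] = False   # monitoring marks the primary as unhealthy
after_failure = pick_by_priority(endpoints)["name"]

print(before_failure, after_failure)
```

What happens when every endpoint is unhealthy is outside the scope of this sketch, which simply returns `None`.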
2. Weighted Routing
Weighted routing allows you to distribute traffic evenly or in predefined proportions across a set of endpoints. You assign each endpoint a weight from 1 to 1000. For each DNS query, Traffic Manager picks an available endpoint at random, with a probability proportional to its weight (e.g., two endpoints with equal weights of 50 and 50 each receive about half of the queries). This is ideal for gradual code rollouts or capacity testing.
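A minimal sketch of weight-proportional selection follows, assuming a simple list of endpoint dictionaries. The `pick_weighted` helper and the endpoint names are hypothetical; a seeded generator is used only to make the example repeatable.

```python
import random
from collections import Counter


def pick_weighted(endpoints, rng):
    """Pick one healthy endpoint with probability proportional to its weight."""
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        return None
    weights = [e["weight"] for e in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]


endpoints = [
    {"name": "stable", "weight": 90, "healthy": True},
    {"name": "canary", "weight": 10, "healthy": True},
]

rng = random.Random(0)  # seeded so repeated runs give the same counts
counts = Counter(pick_weighted(endpoints, rng)["name"] for _ in range(10_000))
print(counts)  # roughly nine "stable" answers for every "canary" answer
```

Because each DNS answer is then cached for the TTL, the split observed in real user traffic only approximates the configured weights.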
3. Performance Routing
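Performance routing is designed to give users the lowest possible network latency. When Traffic Manager receives a DNS query, it looks up the source IP address of that query in an internally maintained Internet Latency Table and returns the healthy endpoint in the Azure region with the lowest latency for that address. Because the query normally arrives from the client's recursive DNS resolver, it is the resolver's IP address, not the client's own, that is evaluated. "Closest" here means lowest network latency, not shortest geographic distance.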
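The lookup can be pictured as a minimum over a latency table. The sketch below is an assumption-laden model: the table keys, the millisecond values, and the `pick_lowest_latency` helper are invented for illustration and do not reflect the real table's structure.

```python
def pick_lowest_latency(resolver_location, endpoints, latency_ms):
    """Return the healthy endpoint whose region has the lowest recorded latency."""
    candidates = [
        e for e in endpoints
        if e["healthy"] and (resolver_location, e["region"]) in latency_ms
    ]
    return min(
        candidates,
        key=lambda e: latency_ms[(resolver_location, e["region"])],
        default=None,
    )


# Hypothetical latency table: (resolver location, Azure region) -> milliseconds.
latency_ms = {
    ("paris-resolver", "westeurope"): 18,
    ("paris-resolver", "eastus"): 92,
}
endpoints = [
    {"name": "app-eu", "region": "westeurope", "healthy": True},
    {"name": "app-us", "region": "eastus", "healthy": True},
]

best = pick_lowest_latency("paris-resolver", endpoints, latency_ms)["name"]
endpoints[0]["healthy"] = False
fallback = pick_lowest_latency("paris-resolver", endpoints, latency_ms)["name"]
print(best, fallback)
```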
4. Geographic Routing
Geographic routing directs users to specific endpoints based on the geographic location from which their DNS query originates (mapped by continent, country, or state/province). It is commonly used to meet data sovereignty and compliance requirements (such as GDPR), to localize content, and to enforce regional geofencing mandates.
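One way to picture the mapping is a most-specific-match lookup. This is a sketch under assumptions: the region codes, the `pick_by_geography` helper, and the choice to return `None` when nothing matches are illustrative only.

```python
def pick_by_geography(location, region_map):
    """Return the endpoint mapped to the most specific matching region.

    `location` lists the query's regions from most to least specific,
    e.g. state/province, then country, then continent.
    """
    for region in location:
        if region in region_map:
            return region_map[region]
    return None  # no mapping covers this location


# Hypothetical region codes and endpoint names.
region_map = {
    "EUROPE": "app-eu",
    "US": "app-us",
    "US-CA": "app-us-west",
}

california = pick_by_geography(["US-CA", "US", "NORTH-AMERICA"], region_map)
texas = pick_by_geography(["US-TX", "US", "NORTH-AMERICA"], region_map)
germany = pick_by_geography(["DE", "EUROPE"], region_map)
brazil = pick_by_geography(["BR", "SOUTH-AMERICA"], region_map)
print(california, texas, germany, brazil)
```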
5. Subnet Routing
Subnet routing maps specific sets of client IP address ranges (defined via CIDR blocks) to designated endpoints. When a request originates from an IP address within a defined corporate subnet range, Traffic Manager directs them to an explicit endpoint. This is commonly used to route internal corporate traffic to a specific testing environment or specialized cloud tier.
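The CIDR matching step can be sketched with Python's standard `ipaddress` module. The subnet map, the endpoint names, and the `pick_by_subnet` helper are assumptions for illustration.

```python
import ipaddress


def pick_by_subnet(source_ip, subnet_map, fallback=None):
    """Return the endpoint whose CIDR block contains source_ip, else fallback."""
    address = ipaddress.ip_address(source_ip)
    for cidr, endpoint in subnet_map.items():
        # Membership is simply False when the IP versions differ (IPv4 vs IPv6).
        if address in ipaddress.ip_network(cidr):
            return endpoint
    return fallback


# Hypothetical mapping of corporate ranges to a test environment.
subnet_map = {
    "198.51.100.0/24": "staging-endpoint",
    "2001:db8::/32": "staging-endpoint",
}

corporate = pick_by_subnet("198.51.100.42", subnet_map, fallback="production-endpoint")
public = pick_by_subnet("203.0.113.7", subnet_map, fallback="production-endpoint")
print(corporate, public)
```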
6. Multi-Value Routing
Multi-value routing configures Traffic Manager to return multiple healthy endpoints (up to a configurable maximum of 8 IPv4/IPv6 addresses) in a single DNS response. It applies to profiles whose endpoints are external endpoints specified as IP addresses. If a client receives multiple addresses and the first one fails to respond, the client can retry against the others immediately, without waiting for a new DNS resolution cycle.
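Both halves of this behavior, the multi-address answer and the client-side fallback, can be sketched as follows. The helper names, the `try_connect` callback, and the addresses are hypothetical; a real client would attempt an actual network connection.

```python
def multivalue_answer(endpoints, max_return):
    """Return the addresses of up to max_return healthy endpoints."""
    healthy = [e["address"] for e in endpoints if e["healthy"]]
    return healthy[:max_return]


def connect_with_fallback(addresses, try_connect):
    """Try each address in order and return the first one that accepts a connection."""
    for address in addresses:
        if try_connect(address):
            return address
    return None


endpoints = [
    {"address": "203.0.113.10", "healthy": True},
    {"address": "203.0.113.11", "healthy": False},  # excluded by health checks
    {"address": "203.0.113.12", "healthy": True},
    {"address": "203.0.113.13", "healthy": True},
]
answer = multivalue_answer(endpoints, max_return=2)

# Simulate the first returned address failing just after the DNS answer was issued.
reachable = {"203.0.113.12"}
chosen = connect_with_fallback(answer, lambda address: address in reachable)
print(answer, chosen)
```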
Traffic Manager Configurations
Alias Records
To connect your custom domain apex (e.g., contoso.com) to a Traffic Manager profile, standard DNS poses a challenge: a CNAME record cannot be created at the zone apex. With Azure DNS, you can instead create an Alias Record at the zone apex that points directly to your Traffic Manager profile, enabling root-domain traffic routing.
Endpoint Types
Traffic Manager categorizes target destinations into three distinct endpoint structural types:
- Azure Endpoints: Targets services hosted within Azure, such as App Service web apps and public IP address resources (for example, those attached to Load Balancers or Application Gateways).
- External Endpoints: Targets any public FQDN or IP address hosted outside of Azure, including on-premises datacenters or third-party cloud providers.
- Nested Endpoints: Targets another Traffic Manager profile, enabling advanced multi-layered routing hierarchies.
Endpoint Monitoring
Traffic Manager features an integrated monitoring system that continuously tests endpoint availability. It sends synthetic HTTP, HTTPS, or TCP probes to a configured port and, for HTTP/HTTPS, a path (e.g., /health). A probe fails if the endpoint does not return the expected response within the probe timeout. Once the number of consecutive failures exceeds the configured tolerated number of failures, the endpoint is declared unhealthy and is left out of new DNS responses. Clients that already hold a cached answer are unaffected until their TTL expires.
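The consecutive-failure rule can be modeled as a small state machine. This is a sketch, not the service's logic: the `EndpointMonitor` class is hypothetical, and it assumes that a single successful probe restores the endpoint to healthy.

```python
class EndpointMonitor:
    """Tracks probe results and flags an endpoint after too many consecutive failures."""

    def __init__(self, tolerated_failures):
        self.tolerated_failures = tolerated_failures
        self.consecutive_failures = 0
        self.healthy = True

    def record_probe(self, succeeded):
        if succeeded:
            # Assumption: one good probe resets the counter and restores health.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures > self.tolerated_failures:
                self.healthy = False
        return self.healthy


monitor = EndpointMonitor(tolerated_failures=3)
# Three tolerated failures, a fourth that trips the threshold, then a recovery.
history = [monitor.record_probe(ok) for ok in [False, False, False, False, True]]
print(history)
```

With three tolerated failures, the endpoint stays in rotation for the first three failed probes and is removed on the fourth.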
Real User Measurements & Traffic View
- Real User Measurements: A feature that allows you to embed a small JavaScript code fragment into your web application. This script measures the network latency from your actual end-users back to Azure datacenters, continually feeding real-world telemetry into Traffic Manager's Performance routing latency table.
- Traffic View: Provides high-fidelity insights into user traffic patterns. It generates visual heatmaps and analytics displaying exactly where your users are located geographically, their data volume trends, and the average network latency they experience when accessing your endpoints.
Traffic Manager Advanced Architectures
Traffic Manager Nested Profiles
Nested profiles allow you to combine multiple routing methods within a single architecture. For example, you can create a top-level profile that uses Geographic Routing to send users to Europe or North America, and nest a lower-level profile within Europe that uses Weighted Routing to conduct a canary deployment.
[ Top-Level Profile: Geographic Routing ]
 ├──> Europe ───> [ Nested Profile: Weighted Routing ]
 │                   ├───> Endpoint A (Weight: 90)
 │                   └───> Endpoint B (Weight: 10)
 └──> North America ───> Endpoint C
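The hierarchy above can be modeled as a small tree that is walked recursively until a plain endpoint is reached. The data layout and the `resolve` function are assumptions made for this sketch; health checks are left out to keep it short.

```python
import random


def resolve(node, continent, rng):
    """Walk a profile tree until a plain endpoint name (a string) is reached."""
    if isinstance(node, str):
        return node
    if node["method"] == "geographic":
        # Follow the branch mapped to the caller's continent.
        return resolve(node["regions"][continent], continent, rng)
    if node["method"] == "weighted":
        children = [child for child, _ in node["children"]]
        weights = [weight for _, weight in node["children"]]
        return resolve(rng.choices(children, weights=weights, k=1)[0], continent, rng)
    raise ValueError(f"unsupported routing method: {node['method']}")


europe = {"method": "weighted", "children": [("Endpoint A", 90), ("Endpoint B", 10)]}
top_level = {
    "method": "geographic",
    "regions": {"Europe": europe, "North America": "Endpoint C"},
}

rng = random.Random(42)  # seeded only to make the example repeatable
american = resolve(top_level, "North America", rng)
european = resolve(top_level, "Europe", rng)
print(american, european)
```

A North American query always lands on Endpoint C, while a European query is split between Endpoint A and Endpoint B according to the nested profile's weights.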
Combine Load Balancing Services
For enterprise resilience, Azure recommends combining Traffic Manager with other load-balancing services. A common architecture uses Traffic Manager globally at the DNS layer to distribute traffic across regions, and applies an Azure Application Gateway or Azure Load Balancer regionally within each datacenter to manage local Layer 7 or Layer 4 distribution.
Disaster Recovery Using Traffic Manager
By leveraging Priority Routing, you can architect automated disaster recovery (DR). Set your primary data center as Priority 1 and your DR site as Priority 2. If the primary region suffers an infrastructure outage, Traffic Manager's health probes will detect the failure and automatically update DNS responses to point to the DR site, ensuring business continuity.
Blue-Green Deployments Using Traffic Manager
Using Weighted Routing, you can perform zero-downtime blue-green deployments. Run your production app on the "Blue" endpoint with a weight of 1000 and deploy the new code version to the "Green" endpoint, keeping Green disabled in the profile until it is ready (weights range from 1 to 1000, so a weight of 0 is not available). Then enable Green and gradually shift the weights (e.g., 900 Blue / 100 Green) to route a small percentage of live users to the new version, validating stability before cutting over completely.
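The share of queries each side receives follows directly from the weights. A small sketch, assuming the 1 to 1000 weight range stated earlier; the `traffic_shares` helper and the rollout steps are hypothetical.

```python
def traffic_shares(weights):
    """Convert endpoint weights into the expected fraction of DNS queries for each."""
    for name, weight in weights.items():
        if not 1 <= weight <= 1000:
            raise ValueError(f"weight for {name!r} must be between 1 and 1000")
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical rollout steps, shifting traffic from Blue to Green.
for blue, green in [(900, 100), (500, 500), (100, 900)]:
    shares = traffic_shares({"blue": blue, "green": green})
    print(f"blue={blue} green={green} -> green receives about {shares['green']:.0%} of queries")
```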
Monitoring & Observability
Diagnostic Settings
To capture long-term telemetry for compliance and audits, you must configure Diagnostic Settings. This lets you stream Traffic Manager operational logs continuously into an Azure Storage Account, an Event Hub pipeline, or an active Log Analytics Workspace.
Metrics & Alerts
- Metrics: Traffic Manager publishes metrics to Azure Monitor. Key metrics to track include:
  - Endpoint Status by Endpoint (reports 1 while an endpoint is up and 0 while it is down).
  - Queries by Endpoint Returned (counts the DNS queries answered with each individual endpoint).
- Alerts: You can define Azure Monitor alert rules on these metrics. For example, if an endpoint's status drops from 1 to 0, the rule's action group can send an email, post a Slack notification through a webhook, or start an Azure Automation runbook to spin up replacement compute infrastructure.
Log Analytics Workspace
By channeling Traffic Manager's resource logs (endpoint probe health events) into a Log Analytics Workspace, network teams can write Kusto Query Language (KQL) queries to analyze endpoint history, for example to track exactly when an endpoint dropped offline. If the profile's metrics are also sent to the workspace, you can audit the historical distribution of DNS queries across endpoints. Geographic breakdowns of query volume come from Traffic View rather than from these logs.
Verifying Traffic Manager Settings & Performance Measurements
To ensure your routing methods and health probes are behaving correctly, you can use standard command-line diagnostic utilities like nslookup or dig:
```bash
# Verify DNS resolution and check which endpoint is currently being returned
nslookup my-app.trafficmanager.net

# For Multi-Value profiles, verify that multiple IPs are being returned simultaneously
dig my-app.trafficmanager.net
```
