Orchestrating Multi-Agent Solutions with Semantic Kernel
In the real world, complex problems aren't solved by one person. A DevOps crisis involves a monitoring expert, a root cause analyst, and a deployment specialist. We apply this same "Team" logic to AI using the Semantic Kernel AgentGroupChat.
1. Why Multi-Agent?
Single-agent systems often suffer from "context fatigue" when tasked with too many contradictory instructions. By breaking a problem down into specialized roles, we achieve:
- Higher Precision: Each agent has a narrow, hyper-focused system prompt.
- Separation of Concerns: One agent can be the "Thinker" (Incident Manager) while the other is the "Doer" (DevOps Assistant).
- Robust Workflows: Agents can check each other's work, reducing hallucinations in critical infrastructure.
The DevOps Example:
- Monitoring Agent: Detects the anomaly.
- Root Cause Agent: Identifies why it happened.
- Deployment Agent: Rolls back the change.
- Reporting Agent: Notifies the stakeholders.
2. Managing the Conversation: Selection & Termination
In a group chat, you need a "Moderator." Semantic Kernel provides two critical strategies to manage the flow:
A. Selection Strategy
This determines who speaks next.
- SequentialSelectionStrategy: A simple "round-robin" or fixed-order approach.
- KernelFunctionSelectionStrategy: Uses an LLM to dynamically decide which agent is best suited to answer the current state of the chat.
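The difference between fixed-order and state-dependent selection can be illustrated without the library. The following is a minimal, framework-free sketch: `Message`, `round_robin`, `role_based`, and the agent names are hypothetical and used only for illustration; they are not Semantic Kernel APIs, and the rule-based function merely stands in for a decision that `KernelFunctionSelectionStrategy` would delegate to an LLM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """A simplified stand-in for a chat message (hypothetical type)."""
    author: str
    text: str


def round_robin(agents: List[str], turn: int) -> str:
    """Fixed-order selection: cycle through the agents by turn index."""
    return agents[turn % len(agents)]


def role_based(history: List[Message]) -> str:
    """State-dependent selection: pick the next speaker from the last message."""
    last = history[-1]
    # The analyst speaks after the user or after the executor reports back.
    if last.author in ("user", "executor"):
        return "analyst"
    # Otherwise the analyst has just spoken, so the executor acts on it.
    return "executor"


agents = ["analyst", "executor"]
fixed_order = [round_robin(agents, t) for t in range(4)]

history = [Message("user", "Please analyze this log.")]
first = role_based(history)
history.append(Message(first, "Restart service ServiceX"))
second = role_based(history)
```

Round-robin ignores the conversation entirely, whereas the state-dependent version inspects the history, which is the same idea the lab's custom strategy relies on.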
B. Termination Strategy
This determines when the job is done. Without a termination strategy, agents might loop forever. We define a specific condition (e.g., the word "RESOLVED" or "NO ACTION NEEDED") to stop the cycle.
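As a rough illustration of the idea, the sketch below combines a keyword check with an iteration cap, so the loop ends even if the keyword never appears. It is plain Python under stated assumptions: `should_terminate`, `run_chat`, and the scripted replies are hypothetical helpers, not part of Semantic Kernel.

```python
from typing import Callable, List


def should_terminate(last_message: str, keywords: List[str]) -> bool:
    """Return True when the last message contains any stop keyword."""
    lowered = last_message.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def run_chat(
    next_reply: Callable[[int], str],
    keywords: List[str],
    max_turns: int = 10,
) -> List[str]:
    """Collect replies until a stop keyword appears or max_turns is reached."""
    transcript: List[str] = []
    for turn in range(max_turns):
        reply = next_reply(turn)
        transcript.append(reply)
        if should_terminate(reply, keywords):
            break
    return transcript


# Scripted replies stand in for real agent responses.
scripted = [
    "Restart service ServiceX",
    "Service ServiceX restarted.",
    "No action needed",
]
transcript = run_chat(
    lambda turn: scripted[turn % len(scripted)],
    ["RESOLVED", "NO ACTION NEEDED"],
)

# Without a matching keyword, only the iteration cap stops the loop.
looping = run_chat(lambda turn: "Still failing", ["RESOLVED"], max_turns=5)
```

The iteration cap plays the same protective role as the `maximum_iterations` argument used later in the lab.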
Lab Section: Building a Self-Healing DevOps Multi-Agent System
In this lab, we will build a two-agent system: an Incident Manager that analyzes logs and a DevOps Assistant that executes fixes.
1. Project Prerequisites
Create a requirements.txt containing the packages listed below, and a .env file holding your connection settings:
python-dotenv
semantic-kernel[azure]
azure-identity
File Structure:
- /logs/: Created automatically by the script.
- /sample_logs/: Contains log1.log, log2.log, log3.log, and log4.log.
- .env: Contains your connection strings.
- requirements.txt: Lists python-dotenv, semantic-kernel, and azure-identity.
2. Implementation (agent_chat.py)
This lab demonstrates how to subclass Selection and Termination strategies to create a controlled loop between two specialized agents.
import asyncio
import os
import textwrap
import shutil
from datetime import datetime
from pathlib import Path
from azure.identity.aio import DefaultAzureCredential
from semantic_kernel.agents import AgentGroupChat, AzureAIAgent, AzureAIAgentSettings
from semantic_kernel.agents.strategies import TerminationStrategy, SequentialSelectionStrategy
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.utils.author_role import AuthorRole
from semantic_kernel.functions.kernel_function_decorator import kernel_function
# --- Role Definitions and Instructions ---
INCIDENT_MANAGER = "INCIDENT_MANAGER"
INCIDENT_MANAGER_INSTRUCTIONS = """
Analyze the given log file or the response from the devops assistant.
Recommend which one of the following actions should be taken:
- Restart service {service_name}
- Rollback transaction
- Redeploy resource {resource_name}
- Increase quota
If there are no issues or if the issue has already been resolved, respond with "INCIDENT_MANAGER > No action needed"
If none of the options resolve the issue, respond with "Escalate issue."
RULES:
- Do not perform any corrective actions yourself.
- Read the log file on every turn.
- Prepend your response with this text: "INCIDENT_MANAGER > {logfilepath} | "
- Only respond with the corrective action instructions.
"""
DEVOPS_ASSISTANT = "DEVOPS_ASSISTANT"
DEVOPS_ASSISTANT_INSTRUCTIONS = """
Read the instructions from the INCIDENT_MANAGER and apply the appropriate resolution function.
Return the response as "{function_response}"
If the instructions indicate there are no issues or actions needed,
take no action and respond with "No action needed."
RULES:
- Use the instructions provided.
- Do not read any log files yourself.
- Prepend your response with this text: "DEVOPS_ASSISTANT > "
"""
# --- Custom Strategies ---
class SelectionStrategy(SequentialSelectionStrategy):
    """Determines which agent should take the next turn."""

    async def select_agent(self, agents, history):
        # The Incident Manager goes after the User or the DevOps Assistant
        if history[-1].name == DEVOPS_ASSISTANT or history[-1].role == AuthorRole.USER:
            return next((agent for agent in agents if agent.name == INCIDENT_MANAGER), None)
        # Otherwise, it is the DevOps Assistant's turn
        return next((agent for agent in agents if agent.name == DEVOPS_ASSISTANT), None)


class ApprovalTerminationStrategy(TerminationStrategy):
    """Determines when the conversation should end."""

    async def should_agent_terminate(self, agent, history):
        # End chat if the agent indicates no action is needed
        return "no action needed" in history[-1].content.lower()
# --- Plugins ---
class LogFilePlugin:
    """Plugin to allow agents to read log files."""

    @kernel_function(description="Reads the contents of a log file.")
    def read_log(self, file_path: str) -> str:
        with open(file_path, 'r') as f:
            return f.read()


class DevopsPlugin:
    """Plugin to simulate DevOps corrective actions."""

    @kernel_function(description="Restarts a service.")
    def restart_service(self, service_name: str) -> str:
        return f"{{Service {service_name} restarted successfully.}}"

    @kernel_function(description="Rolls back a transaction.")
    def rollback_transaction(self) -> str:
        return "{Transaction rolled back successfully.}"

    @kernel_function(description="Redeploys a resource.")
    def redeploy_resource(self, resource_name: str) -> str:
        return f"{{Resource {resource_name} redeployed successfully.}}"

    @kernel_function(description="Increases quota.")
    def increase_quota(self) -> str:
        return "{Quota increased successfully.}"
# --- Main Logic ---
async def main():
    # Clear the console
    os.system('cls' if os.name == 'nt' else 'clear')
    # Setup file paths
    print("Getting log files...\n")
    script_dir = Path(__file__).parent
    src_path = script_dir / "sample_logs"
    file_path = script_dir / "logs"
    shutil.copytree(src_path, file_path, dirs_exist_ok=True)
    # Get Azure AI Settings
    ai_agent_settings = AzureAIAgentSettings()
    async with (
        DefaultAzureCredential(exclude_environment_credential=True, exclude_managed_identity_credential=True) as creds,
        AzureAIAgent.create_client(credential=creds) as client,
    ):
        # Create Incident Manager Agent
        incident_agent_definition = await client.agents.create_agent(
            model=ai_agent_settings.model_deployment_name,
            name=INCIDENT_MANAGER,
            instructions=INCIDENT_MANAGER_INSTRUCTIONS
        )
        agent_incident = AzureAIAgent(
            client=client,
            definition=incident_agent_definition,
            plugins=[LogFilePlugin()]
        )
        # Create DevOps Assistant Agent
        devops_agent_definition = await client.agents.create_agent(
            model=ai_agent_settings.model_deployment_name,
            name=DEVOPS_ASSISTANT,
            instructions=DEVOPS_ASSISTANT_INSTRUCTIONS
        )
        agent_devops = AzureAIAgent(
            client=client,
            definition=devops_agent_definition,
            plugins=[DevopsPlugin()]
        )
        # Add agents to a group chat with custom strategies
        chat = AgentGroupChat(
            agents=[agent_incident, agent_devops],
            termination_strategy=ApprovalTerminationStrategy(
                agents=[agent_incident],
                maximum_iterations=10,
                automatic_reset=True
            ),
            selection_strategy=SelectionStrategy(agents=[agent_incident, agent_devops])
        )
        # Process log files in the directory
        for filename in os.listdir(file_path):
            current_log = file_path / filename
            logfile_msg = ChatMessageContent(
                role=AuthorRole.USER,
                content=f"USER > {current_log} | Please analyze this log."
            )
            print(f"Ready to process log file: {filename}\n")
            await asyncio.sleep(2)  # Buffer to reduce TPM/rate limits
            # Append the current log file message to the chat
            await chat.add_chat_message(logfile_msg)
            # Invoke the chat and display outputs
            async for response in chat.invoke():
                print(f"{response.name} > {response.content}")
            print("\n" + "="*50 + "\n")
if __name__ == "__main__":
    asyncio.run(main())
3. Deployment Steps
Step 1: Virtual Environment
python -m venv my-env
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\my-env\Scripts\activate
Step 2: Install and Run
pip install -r requirements.txt
python agent_chat.py
4. Key Learnings for the AI-102 Exam
- AgentGroupChat Hierarchy: Understand that the AgentGroupChat object is the container that manages the shared history (Thread) across multiple agents.
- Custom Selection Logic: You must know how to override select_agent. This allows for complex business logic, like "Agent B only speaks if Agent A mentions a specific keyword."
- State Management: In a multi-agent setup, the "History" is the source of truth. Each agent reads the entire history of the chat to understand what has already been attempted by its peers.
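The keyword rule mentioned above can be sketched in plain Python to show how a selection decision is derived purely from the shared history. This is an illustrative assumption rather than library code: `History`, `select_next`, the agent names, and the "escalate" keyword are all hypothetical.

```python
from typing import List, Tuple

# Each entry is (author, text): a simplified stand-in for chat messages.
History = List[Tuple[str, str]]


def select_next(history: History, keyword: str = "escalate") -> str:
    """Agent B speaks only if Agent A's latest message mentions the keyword."""
    # Walk the shared history backwards to find Agent A's most recent message.
    for author, text in reversed(history):
        if author == "AGENT_A":
            return "AGENT_B" if keyword in text.lower() else "AGENT_A"
    # Agent A has not spoken yet, so it opens the conversation.
    return "AGENT_A"


opening = select_next([("USER", "Please analyze this log.")])
gated = select_next([
    ("USER", "Please analyze this log."),
    ("AGENT_A", "Cannot fix this. Escalate issue."),
])
not_gated = select_next([("AGENT_A", "Restart service ServiceX")])
```

In Semantic Kernel the same decision would live inside an overridden `select_agent`, which receives the agents and the history, as in the lab's `SelectionStrategy` class.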
Lab Highlights
- Sequential Logic: The system is designed for the INCIDENT_MANAGER to read logs first via LogFilePlugin, followed by the DEVOPS_ASSISTANT executing a command via DevopsPlugin.
- Termination: The chat terminates when "No action needed" is detected in the message content, signifying the incident is resolved.
- Automation: As seen in the final two screenshots, the agent successfully identifies a restart for ServiceX in log1.log and a rollback in log2.log, confirming the multi-agent logic works as expected.
