Orchestrating Multi-Agent Solutions with Semantic Kernel

In the real world, complex problems aren't solved by one person. A DevOps crisis involves a monitoring expert, a root cause analyst, and a deployment specialist. We apply this same "Team" logic to AI using the Semantic Kernel AgentGroupChat.

1. Why Multi-Agent?

Single-agent systems often suffer from "context fatigue" when tasked with too many contradictory instructions. By breaking a problem down into specialized roles, we achieve:

Higher Precision: Each agent has a narrow, hyper-focused system prompt.
Separation of Concerns: One agent can be the "Thinker" (Incident Manager) while the other is the "Doer" (DevOps Assistant).
Robust Workflows: Agents can check each other's work, reducing hallucinations in critical infrastructure.

The DevOps Example:

Monitoring Agent: Detects the anomaly.
Root Cause Agent: Identifies why it happened.
Deployment Agent: Rolls back the change.
Reporting Agent: Notifies the stakeholders.

2. Managing the Conversation: Selection & Termination

In a group chat, you need a "Moderator." Semantic Kernel provides two critical strategies to manage the flow:

A. Selection Strategy

This determines who speaks next.

SequentialSelectionStrategy: A simple "round-robin" or fixed-order approach.
KernelFunctionSelectionStrategy: Uses an LLM to dynamically decide which agent is best suited to respond, given the current state of the chat.
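
To make the fixed-order idea concrete, here is a minimal sketch that does not use Semantic Kernel at all. The class name `RoundRobinSelector` and the agent names are hypothetical; the sketch only illustrates the round-robin turn-taking described for SequentialSelectionStrategy, not the library's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RoundRobinSelector:
    """Hypothetical stand-in for a fixed-order selection strategy."""
    agents: list[str]
    _index: int = field(default=0, init=False)

    def select_next(self) -> str:
        # Pick the agent at the current position, then advance and wrap around.
        agent = self.agents[self._index % len(self.agents)]
        self._index += 1
        return agent


selector = RoundRobinSelector(["MONITORING", "ROOT_CAUSE", "DEPLOYMENT"])
turns = [selector.select_next() for _ in range(5)]
print(turns)
```

After the last agent has spoken, the selector wraps back to the first one. An LLM-driven strategy would replace the index arithmetic with a model call that reads the chat history.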

B. Termination Strategy

This determines when the job is done. Without a termination strategy, agents might loop forever. We define a specific stop condition (e.g., the phrase "RESOLVED" or "NO ACTION NEEDED" appearing in the latest message) to end the cycle.
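
The stop condition can be sketched as a plain function, independent of any framework. The function name, the default stop phrases, and the turn budget below are illustrative assumptions. The point is that a phrase check is usually paired with a hard cap, so the loop ends even if no agent ever emits the phrase.

```python
def should_terminate(
    last_message: str,
    turn: int,
    stop_phrases: tuple[str, ...] = ("resolved", "no action needed"),
    max_turns: int = 10,
) -> bool:
    """Return True when a stop phrase appears or the turn budget is spent."""
    text = last_message.lower()
    if any(phrase in text for phrase in stop_phrases):
        return True
    # Safety net: stop even if no agent ever emits a stop phrase.
    return turn >= max_turns


print(should_terminate("Restart service ServiceX", turn=1))
print(should_terminate("INCIDENT_MANAGER > No action needed", turn=2))
print(should_terminate("Still investigating", turn=10))
```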

Lab Section: Building a Self-Healing DevOps Multi-Agent System

In this lab, we will build a two-agent system: an Incident Manager that analyzes logs and a DevOps Assistant that executes fixes.

1. Project Prerequisites

Create a requirements.txt with the packages listed below, and a .env file that holds your Azure AI connection settings:

Plaintext

python-dotenv
semantic-kernel[azure]
azure-identity

File Structure:

/logs/: (Created automatically by script)
/sample_logs/: Contains log1.log, log2.log, log3.log, and log4.log.
.env: Containing your connection strings.
requirements.txt: python-dotenv, semantic-kernel[azure], azure-identity.
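
Before running the lab, it can help to confirm that the settings from .env are actually present in the environment. The sketch below is a hypothetical preflight check using only the standard library. The variable names are assumptions based on an `AZURE_AI_AGENT_` prefix and may differ between Semantic Kernel releases (some releases expect a project connection string rather than an endpoint), so substitute the names your setup uses.

```python
import os
from collections.abc import Mapping

# ASSUMPTION: illustrative variable names; use the ones your Semantic Kernel
# version and Azure AI project actually expect.
REQUIRED_VARS = ("AZURE_AI_AGENT_ENDPOINT", "AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME")


def missing_settings(
    required: tuple[str, ...] = REQUIRED_VARS,
    environ: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

The check reads the process environment only, so it reports values as missing unless .env has already been loaded (for example by python-dotenv) or the variables have been exported in the shell.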

2. Implementation (`agent_chat.py`)

This lab demonstrates how to subclass Selection and Termination strategies to create a controlled loop between two specialized agents.

Python
import asyncio
import os
import shutil
from pathlib import Path

from azure.identity.aio import DefaultAzureCredential
from semantic_kernel.agents import AgentGroupChat, AzureAIAgent, AzureAIAgentSettings
from semantic_kernel.agents.strategies import TerminationStrategy, SequentialSelectionStrategy
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.utils.author_role import AuthorRole
from semantic_kernel.functions.kernel_function_decorator import kernel_function

# --- Role Definitions and Instructions ---
INCIDENT_MANAGER = "INCIDENT_MANAGER"
INCIDENT_MANAGER_INSTRUCTIONS = """
Analyze the given log file or the response from the devops assistant.
Recommend which one of the following actions should be taken:

- Restart service {service_name}
- Rollback transaction
- Redeploy resource {resource_name}
- Increase quota

If there are no issues or if the issue has already been resolved, respond with "INCIDENT_MANAGER > No action needed"
If none of the options resolve the issue, respond with "Escalate issue."

RULES:
- Do not perform any corrective actions yourself.
- Read the log file on every turn.
- Prepend your response with this text: "INCIDENT_MANAGER > {logfilepath} | "
- Only respond with the corrective action instructions.
"""

DEVOPS_ASSISTANT = "DEVOPS_ASSISTANT"
DEVOPS_ASSISTANT_INSTRUCTIONS = """
Read the instructions from the INCIDENT_MANAGER and apply the appropriate resolution function.
Return the response as "{function_response}"
If the instructions indicate there are no issues or actions needed, 
take no action and respond with "No action needed."

RULES:
- Use the instructions provided.
- Do not read any log files yourself.
- Prepend your response with this text: "DEVOPS_ASSISTANT > "
"""

# --- Custom Strategies ---

class SelectionStrategy(SequentialSelectionStrategy):
    """Determines which agent should take the next turn."""
    async def select_agent(self, agents, history):
        # The Incident Manager goes after the User or the DevOps Assistant
        if history[-1].name == DEVOPS_ASSISTANT or history[-1].role == AuthorRole.USER:
            return next((agent for agent in agents if agent.name == INCIDENT_MANAGER), None)
        
        # Otherwise, it is the DevOps Assistant's turn
        return next((agent for agent in agents if agent.name == DEVOPS_ASSISTANT), None)

class ApprovalTerminationStrategy(TerminationStrategy):
    """Determines when the conversation should end."""
    async def should_agent_terminate(self, agent, history):
        # End chat if the agent indicates no action is needed
        return "no action needed" in history[-1].content.lower()

# --- Plugins ---

class LogFilePlugin:
    """Plugin to allow agents to read log files."""
    @kernel_function(description="Reads the contents of a log file.")
    def read_log(self, file_path: str) -> str:
        # Read the whole log as text; UTF-8 is assumed for the sample logs
        with open(file_path, 'r', encoding='utf-8') as f:
            return f.read()

class DevopsPlugin:
    """Plugin to simulate DevOps corrective actions."""
    @kernel_function(description="Restarts a service.")
    def restart_service(self, service_name: str) -> str:
        return f"{{Service {service_name} restarted successfully.}}"

    @kernel_function(description="Rolls back a transaction.")
    def rollback_transaction(self) -> str:
        return "{Transaction rolled back successfully.}"

    @kernel_function(description="Redeploys a resource.")
    def redeploy_resource(self, resource_name: str) -> str:
        return f"{{Resource {resource_name} redeployed successfully.}}"

    @kernel_function(description="Increases quota.")
    def increase_quota(self) -> str:
        return "{Quota increased successfully.}"

# --- Main Logic ---

async def main():
    # Clear the console
    os.system('cls' if os.name == 'nt' else 'clear')

    # Setup file paths
    print("Getting log files...\n")
    script_dir = Path(__file__).parent
    src_path = script_dir / "sample_logs"
    file_path = script_dir / "logs"
    shutil.copytree(src_path, file_path, dirs_exist_ok=True)

    # Get Azure AI Settings
    ai_agent_settings = AzureAIAgentSettings()

    async with (
        DefaultAzureCredential(exclude_environment_credential=True, exclude_managed_identity_credential=True) as creds,
        AzureAIAgent.create_client(credential=creds) as client,
    ):
        # Create Incident Manager Agent
        incident_agent_definition = await client.agents.create_agent(
            model=ai_agent_settings.model_deployment_name,
            name=INCIDENT_MANAGER,
            instructions=INCIDENT_MANAGER_INSTRUCTIONS
        )
        agent_incident = AzureAIAgent(
            client=client,
            definition=incident_agent_definition,
            plugins=[LogFilePlugin()]
        )

        # Create DevOps Assistant Agent
        devops_agent_definition = await client.agents.create_agent(
            model=ai_agent_settings.model_deployment_name,
            name=DEVOPS_ASSISTANT,
            instructions=DEVOPS_ASSISTANT_INSTRUCTIONS
        )
        agent_devops = AzureAIAgent(
            client=client,
            definition=devops_agent_definition,
            plugins=[DevopsPlugin()]
        )

        # Add agents to a group chat with custom strategies
        chat = AgentGroupChat(
            agents=[agent_incident, agent_devops],
            termination_strategy=ApprovalTerminationStrategy(
                agents=[agent_incident],
                maximum_iterations=10,
                automatic_reset=True
            ),
            selection_strategy=SelectionStrategy(agents=[agent_incident, agent_devops])
        )

        # Process log files in the directory in a stable, sorted order
        for filename in sorted(os.listdir(file_path)):
            current_log = file_path / filename
            logfile_msg = ChatMessageContent(
                role=AuthorRole.USER, 
                content=f"USER > {current_log} | Please analyze this log."
            )
            
            print(f"Ready to process log file: {filename}\n")
            await asyncio.sleep(2)  # Short pause to help stay under tokens-per-minute (TPM) rate limits
            
            # Append the current log file message to the chat
            await chat.add_chat_message(logfile_msg)

            # Invoke the chat and display outputs. Each agent is instructed to
            # prefix its reply with its own name, so print the content as-is.
            async for response in chat.invoke():
                print(response.content)
            
            print("\n" + "="*50 + "\n")

if __name__ == "__main__":
    asyncio.run(main())

3. Deployment Steps

Step 1: Virtual Environment

PowerShell
python -m venv my-env
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\my-env\Scripts\activate

Step 2: Install and Run

PowerShell

pip install -r requirements.txt
python agent_chat.py

4. Key Learnings for the AI-102 Exam

AgentGroupChat Hierarchy: Understand that the AgentGroupChat object is the container that manages the shared history (Thread) across multiple agents.
Custom Selection Logic: You must know how to override select_agent. This allows for complex business logic, like "Agent B only speaks if Agent A mentions a specific keyword."
State Management: In a multi-agent setup, the "History" is the source of truth. Each agent reads the entire history of the chat to understand what has already been attempted by its peers.
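
The "Agent B only speaks if Agent A mentions a specific keyword" rule can be sketched without the library. Everything here (`Message`, `select_next`, the agent names, and the keyword) is hypothetical and only shows the shape of the decision that an overridden select_agent would make from the chat history.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical, simplified chat message: who spoke and what they said."""
    name: str
    content: str


def select_next(history: list[Message], keyword: str = "rollback") -> str:
    """Pick AGENT_B only when AGENT_A's latest message mentions the keyword."""
    last = history[-1]
    if last.name == "AGENT_A" and keyword in last.content.lower():
        return "AGENT_B"
    # In every other case AGENT_A keeps (or takes) the turn.
    return "AGENT_A"


history = [Message("USER", "Please analyze this log.")]
print(select_next(history))
history.append(Message("AGENT_A", "Recommend: Rollback transaction"))
print(select_next(history))
```

Because the decision depends only on the history, the rule can be unit-tested with hand-written messages before it is wired into a real strategy class.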

Lab Highlights
Sequential Logic: The system is designed for the INCIDENT_MANAGER to read logs first via LogFilePlugin, followed by the DEVOPS_ASSISTANT executing a command via DevopsPlugin.
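Termination: The chat terminates when the INCIDENT_MANAGER responds with "No action needed" (the check is case-insensitive and runs only on that agent's messages), signifying the incident is resolved. A maximum_iterations limit of 10 stops the loop if that phrase never appears.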
Automation: As seen in the final two screenshots, the agent successfully identifies a restart for ServiceX in log1.log and a rollback in log2.log, confirming the multi-agent logic works as expected.
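
The control flow of the lab can be dry-run offline with scripted replies in place of model output. This is a simplified simulation under stated assumptions: `run_chat` and the reply texts are hypothetical, and only the turn-taking rule and the manager-only termination check mirror the lab code.

```python
INCIDENT_MANAGER = "INCIDENT_MANAGER"
DEVOPS_ASSISTANT = "DEVOPS_ASSISTANT"


def next_speaker(last_name: str, last_role: str) -> str:
    # Same rule as the lab: the manager follows the user or the assistant.
    if last_role == "user" or last_name == DEVOPS_ASSISTANT:
        return INCIDENT_MANAGER
    return DEVOPS_ASSISTANT


def run_chat(replies: dict[str, list[str]], max_iterations: int = 10) -> list[tuple[str, str]]:
    """Play scripted replies through the select/terminate loop."""
    queues = {name: iter(lines) for name, lines in replies.items()}
    history = [("USER", "Please analyze this log.")]
    last_role = "user"
    for _ in range(max_iterations):
        speaker = next_speaker(history[-1][0], last_role)
        reply = next(queues[speaker])
        history.append((speaker, reply))
        last_role = "assistant"
        # Termination is evaluated only after the manager speaks.
        if speaker == INCIDENT_MANAGER and "no action needed" in reply.lower():
            break
    return history


# Hypothetical scripted replies standing in for model output.
transcript = run_chat({
    INCIDENT_MANAGER: ["Restart service ServiceX", "No action needed"],
    DEVOPS_ASSISTANT: ["Service ServiceX restarted successfully."],
})
for name, text in transcript:
    print(f"{name} > {text}")
```

The manager speaks first, the assistant acts, and the manager's second turn ends the chat, which is the same three-step rhythm the lab produces for a log that needs one fix.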
