Building Multi-Agent AI System(MAS) on AWS
Guide a building a multi-agent AI system using Amazon Bedrock LLMAgent

David is a seasoned cloud solutions architect/engineer with several years of experience architecting and building resilient, reliable and highly available systems in the cloud.
Why Multi-Agent Systems?
A single AI model may excel in specific tasks but struggle with multifaceted conversations. Multi-agent systems:
Divide responsibilities (e.g., summarization, recommendations, or data extraction).
Improve efficiency by delegating tasks to specialized agents.
Provide modularity and scalability.
This article will explore how to build one of these MA systems using the recently released multi-agent orchestrator.
Multi-Agent Orchestrator
The Multi-Agent Orchestrator is a flexible framework for managing multiple AI agents and handling complex conversations. It intelligently routes queries and maintains context across interactions. This makes it easy to interact with familiar AWS services and foundation models seamlessly.
How it works?
The Multi-Agent Orchestrator comes with pre-built components that allow you to easily interact with Amazon Bedrock capabilities. The orchestrator is broken down into different components while providing a lot of flexibility in implementation.
Implementation
We will be building AgentFix, an Incident Remediation System to help with the automated and prompt resolution of software issues without human intervention in the loop(self-healing).
Core Components of our Orchestrator
Orchestrator
Functions as the primary coordinator for all other modules. Oversees the exchange of data between the Classifier, Agents, Storage, and Retrievers. Interprets user inputs and directs the creation of suitable responses. Manages errors and implements fallback procedures.
Classifier
Evaluates user queries, agent profiles, and previous conversation history. Selects the best-matched agent for each specific request. Custom Classifiers: Build entirely new classifiers for targeted tasks or specialized domains.
# Initialize the orchestrator
custom_bedrock_classifier = BedrockClassifier(BedrockClassifierOptions(
model_id='anthropic.claude-3-haiku-20240307-v1:0',
client=bedrock_runtime_client,
inference_config={
'maxTokens': 500,
'temperature': 0.7,
'topP': 0.9
}
))
# Initialize the orchestrator with some options
orchestrator = MultiAgentOrchestrator(options=OrchestratorConfig(
LOG_AGENT_CHAT=True,
LOG_CLASSIFIER_CHAT=True,
LOG_CLASSIFIER_RAW_OUTPUT=True,
LOG_CLASSIFIER_OUTPUT=True,
LOG_EXECUTION_TIMES=True,
MAX_RETRIES=3,
USE_DEFAULT_AGENT_IF_NONE_IDENTIFIED=True,
MAX_MESSAGE_PAIRS_PER_AGENT=10,
),
classifier=custom_bedrock_classifier
)
Agents
Prebuilt agents can be customized and enhanced to fit particular needs.
from multi_agent_orchestrator.agents import (BedrockLLMAgent, BedrockLLMAgentOptions)
from multi_agent_orchestrator.agents import ChainAgent, ChainAgentOptions
def issue_classifier():
agent = BedrockLLMAgent(BedrockLLMAgentOptions(
name='Application Debugging Agent',
description='Specializes in classifying application logs.',
model_id='anthropic.claude-3-haiku-20240307-v1:0',
streaming=True,
inference_config={
'maxTokens': 1000,
'temperature': 0.7,
'topP': 0.9,
},
custom_system_prompt={
"template": """You are an expert in log analysis, able to categorize log data across multiple systems to allow easier issue resolution..
Core Competencies:
1. DevOps
2. Software Architecture
3. Metrics, Traces and Logging
4. Application Monitoring
5. Applciation Log Classifier
When classifying logs, organise them with the following headings:
- Issue:
- Root Cause:
- Actions to be Taken:
"""
}
))
return agent
def remediator():
agent = BedrockLLMAgent(BedrockLLMAgentOptions(
name='Application Remediator Agent',
description='Specializes in providing remedition actions for debugging specific application error logs.',
model_id='anthropic.claude-3-haiku-20240307-v1:0',
streaming=True,
inference_config={
'maxTokens': 1000,
'temperature': 0.7,
'topP': 0.9,
},
custom_system_prompt={
"template": """You are a site reliability engineer with expertise in providing remedition actions and resolving complex application error from the application logs..
Core Competencies:
1. Programming Languages
2. Software Architecture
3. Best Practices
4. Performance Optimization
When providing resolution for issues:
- Point out what the exact issue is
- From the logs in the past 15 to 30 minutes find out the root cause
- Explore actions to be taken to resolve the issue
- Also include exact commands to run for each step in a linux environment if needed
- Break down the resolution into actionable steps based on specific issue"""
}
))
return agent
def chain_agent():
agent = ChainAgent(ChainAgentOptions(
name='FixerChainAgent',
description='A simple chain of multiple agents',
agents=[issue_classifier(), remediator()]
))
return agent
Conversation Storage
Keeps a record of past interactions with in-memory storage. Operates on two distinct layers: context for the Classifier and context for the Agents.
Retrievers
Enhance the functionality of LLM-based agents by delivering relevant context and data. Optimize performance by retrieving needed information dynamically instead of relying entirely on pre-trained data. Retrieval can be from Bedrock Knowledge base or an entirely custom-built one.

Steps:
We first need to set up our local environment to use AWS CLI. For the required permissions, we can create roles manually or with SSO. For larger teams managing multiple AWS Accounts with an Organisation, I recommend using AWS TEAM to manage access to your AWS environment.
Once this is done, run the following commands to confirm that everything looks good:
# Show aws cli version aws --version # Get cli caller identity aws sts get-caller-identityAmazon Bedrock: Access to foundation models from AWS partners. On the AWS console, you can request access to any of the models provided, we will be using the Anthropic Clause model
anthropic.claude-3-haiku-20240307-v1:0for this demo.To get started with AgentFix, follow these steps to install the required packages:
Clone the repository:
git clone https://github.com/baldcodr/agentfix.git cd agentfixCreate and activate a virtual environment (optional but recommended):
virtualenv -p python3.13 venv source venv/bin/activate # On Windows use `venv\Scripts\activate`Install the required packages:
pip install -r requirements.txt
Running Locally
To run the project locally, follow these steps:
Ensure that you have completed the installation steps above.
Run the application:
python main.pyFollow the prompts in the terminal to interact with the Multi-Agent system by inputting a sample log data as below.:
{"source": "app_log", "error_code": 500, "message": "Database timeout"}
Closing Remark
Multi-agent systems are revolutionizing how we approach self-healing software solutions. By combining the autonomy, collaboration, and intelligence of MAS, organizations can achieve unparalleled system resilience and reliability.
Call to Action:
Begin by implementing monitoring and anomaly detection with MAS concepts.
Explore AWS tools to start building your self-healing architecture today!





