Building an Agent Reasoning Loop
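What should you do when a user asks a complex question made up of several steps, or a vague question that needs clarification? This is where an agent reasoning loop comes in. Instead of answering with a single call, an agent can reason over multiple tools across multiple steps.
In agentic RAG, you build a reasoning loop by running the reasoning step repeatedly until the agent has enough information to answer.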
The procedure is as follows.
1. Set up the query tools.
2. Set up the function-calling agent.
3. Run the agent to execute the task step by step.
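The three steps above can be sketched without any LLM or LlamaIndex dependency. The following is a minimal illustration under stated assumptions, not the LlamaIndex implementation: `summary_tool`, `vector_tool`, and `scripted_policy` are hypothetical stand-ins, and the scripted policy takes the place of the LLM's decision about which tool to call next.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical stand-ins for real query tools: each maps a string to a string.
def summary_tool(text: str) -> str:
    return f"[summary] {text}"


def vector_tool(text: str) -> str:
    return f"[passage] {text}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "summary_tool": summary_tool,
    "vector_tool": vector_tool,
}


def scripted_policy(
    question: str, observations: List[str]
) -> Optional[Tuple[str, str]]:
    """Hypothetical replacement for the LLM's choice of the next tool call.

    Returns (tool_name, tool_input), or None once enough has been gathered.
    """
    plan = [
        ("summary_tool", "agent roles"),
        ("vector_tool", "how agents communicate"),
    ]
    if len(observations) < len(plan):
        return plan[len(observations)]
    return None


def reasoning_loop(question: str, max_steps: int = 5) -> str:
    """Pick a tool, call it, record the result, and repeat until done."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = scripted_policy(question, observations)
        if decision is None:  # the policy has nothing more to ask
            break
        tool_name, tool_input = decision
        observations.append(TOOLS[tool_name](tool_input))
    return " | ".join(observations)


answer = reasoning_loop("What are the agent roles, and how do they communicate?")
print(answer)
```

The `max_steps` cap is a design choice in this sketch: it keeps the loop from running forever if the policy never signals that it is finished.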
import os
import nest_asyncio
from dotenv import load_dotenv, find_dotenv
# The .env file holds one entry per line in this format (without the leading "#"):
# API_KEYNAME=AStringThatIsTheLongAPIKeyFromSomeService
def load_env():
    _ = load_dotenv(find_dotenv())


def get_openai_api_key():
    load_env()
    openai_api_key = os.getenv("OPENAI_API_KEY")
    return openai_api_key


OPENAI_API_KEY = get_openai_api_key()
nest_asyncio.apply()
!mkdir dataset
!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O 'dataset/metagpt.pdf'
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from typing import List, Optional, Tuple


def get_doc_tools(
    file_path: str,
    name: str,
) -> Tuple[FunctionTool, QueryEngineTool]:
    """Get vector query and summary query tools from a document."""
    # Load the document and split it into nodes.
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)
    vector_index = VectorStoreIndex(nodes)

    def vector_query(
        query: str,
        page_numbers: Optional[List[str]] = None
    ) -> str:
        """Use to answer questions over the MetaGPT paper.

        Useful if you have specific questions over the MetaGPT paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.

        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.

        """
        # Build one metadata filter per requested page; an empty list means no filter.
        page_numbers = page_numbers or []
        metadata_dicts = [
            {"key": "page_label", "value": p} for p in page_numbers
        ]

        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(
                metadata_dicts,
                condition=FilterCondition.OR
            )
        )
        response = query_engine.query(query)
        return response

    # Expose the function above as a tool the agent can call.
    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )

    # The summary tool reads every node and summarizes the whole document.
    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=(
            "Use ONLY IF you want to get a holistic summary of MetaGPT. "
            "Do NOT use if you have specific questions over MetaGPT."
        ),
    )

    return vector_query_tool, summary_tool
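The page filter inside `vector_query` combines one `page_label` filter per requested page with an OR condition, and falls back to searching all pages when `page_numbers` is left as None. The sketch below mimics that behavior on plain dictionaries so the semantics are easy to check. It is an illustration only: `NODES` and `filter_by_pages` are hypothetical names, and the real filtering happens inside the vector store.

```python
from typing import Dict, List, Optional

# Hypothetical miniature "nodes": just a page label and some text.
NODES: List[Dict[str, str]] = [
    {"page_label": "1", "text": "Abstract and introduction"},
    {"page_label": "2", "text": "Related work"},
    {"page_label": "5", "text": "Communication protocol"},
]


def filter_by_pages(
    nodes: List[Dict[str, str]],
    page_numbers: Optional[List[str]] = None,
) -> List[Dict[str, str]]:
    """Keep nodes whose page_label matches ANY requested page (OR semantics).

    With no pages requested, every node is kept.
    """
    page_numbers = page_numbers or []
    if not page_numbers:
        return list(nodes)
    wanted = set(page_numbers)
    return [node for node in nodes if node["page_label"] in wanted]


all_pages = filter_by_pages(NODES)
some_pages = filter_by_pages(NODES, ["2", "5"])
print(len(all_pages), [node["page_label"] for node in some_pages])
```

Note that the page labels are strings, not integers, which matches the `Optional[List[str]]` annotation on `page_numbers`.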
vector_tool, summary_tool = get_doc_tools("dataset/metagpt.pdf", "metagpt")
from llama_index.llms.openai import OpenAI
llm = OpenAI(
    model="gpt-3.5-turbo",
    temperature=0
)
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then explain how they communicate with each other."
)
Added user message to memory: Tell me about the agent roles in MetaGPT, and then explain how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "Agent role in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT encompass a diverse set of responsibilities tailored to specific expertise within the collaborative framework. These roles include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for creating a detailed Product Requirement Document (PRD) that outlines goals, user stories, and competitive analysis. The Architect translates these requirements into system design components, while the Project Manager distributes tasks and oversees the project's progress. Engineers execute classes and functions based on the requirements, and QA Engineers formulate test cases to ensure code quality. This structured division of labor allows for efficient collaboration among agents with diverse skills to address complex tasks within the MetaGPT framework.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "Communication among agents in MetaGPT"}
=== Function Output ===
Communication among agents in MetaGPT is structured and relies on a publish-subscribe mechanism with a shared message pool. Agents communicate through structured messages, documents, and diagrams rather than dialogue, ensuring that all necessary information is included and irrelevant content is avoided. This structured communication approach enhances role communication efficiency within the multi-agent system, allowing agents to exchange information effectively and collaborate efficiently. Additionally, a subscription mechanism is utilized to manage information overload, where agents select information based on their role profiles to receive only task-related information, promoting focused communication and preventing distractions from irrelevant details.
=== LLM Response ===
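- The agent roles in MetaGPT carry a range of responsibilities, each tailored to specific expertise within the collaborative framework. The roles include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager writes a detailed Product Requirement Document (PRD) that outlines goals, user stories, and competitive analysis. The Architect translates these requirements into system design components, and the Project Manager distributes tasks and oversees project progress. Engineers execute classes and functions based on the requirements, and QA Engineers write test cases to ensure code quality. This division of labor lets agents with diverse skills collaborate efficiently to handle complex tasks within the MetaGPT framework.
- Communication among agents in MetaGPT is structured and relies on a publish-subscribe mechanism over a shared message pool. Agents communicate through structured messages, documents, and diagrams rather than dialogue, which ensures that all necessary information is included and irrelevant content is avoided. This structured approach improves communication efficiency between roles in the multi-agent system, letting agents exchange information effectively and collaborate efficiently. In addition, a subscription mechanism is used to manage information overload: agents receive only task-related information based on their role profiles, which keeps communication focused and prevents distraction from irrelevant details.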
print(response.source_nodes[0].get_content(metadata_mode="all"))
page_label: 1
file_name: metagpt.pdf
file_path: dataset/metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-05-11
last_modified_date: 2024-05-11
Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, however, are complicated through logic inconsistencies due to
cascading hallucinations caused by naively chaining LLMs. Here we introduce
MetaGPT, an innovative meta-programming framework incorporating efficient
human workflows into LLM-based multi-agent collaborations. MetaGPT en-
codes Standardized Operating Procedures (SOPs) into prompt sequences for more
streamlined workflows, thus allowing agents with human-like domain expertise
to verify intermediate results and reduce errors. MetaGPT utilizes an assembly
line paradigm to assign diverse roles to various agents, efficiently breaking down
complex tasks into subtasks involving many agents working together. On col-
laborative software engineering benchmarks, MetaGPT generates more coherent
solutions than previous chat-based multi-agent systems. Our project can be found
at https://github.com/geekan/MetaGPT.
1 I NTRODUCTION
Autonomous agents utilizing Large Language Models (LLMs) offer promising opportunities to en-
hance and replicate human workflows. In real-world applications, however, existing systems (Park
et al., 2023; Zhuge et al., 2023; Cai et al., 2023; Wang et al., 2023c; Li et al., 2023; Du et al., 2023;
Liang et al., 2023; Hao et al., 2023) tend to oversimplify the complexities. They struggle to achieve
effective, coherent, and accurate problem-solving processes, particularly when there is a need for
meaningful collaborative interaction (Chen et al., 2024; Zhang et al., 2023; Dong et al., 2023; Zhou
et al., 2023; Qian et al., 2023).
Through extensive collaborative practice, humans have developed widely accepted Standardized
Operating Procedures (SOPs) across various domains (Belbin, 2012; Manifesto, 2001; DeMarco &
Lister, 2013). These SOPs play a critical role in supporting task decomposition and effective coor-
dination. Furthermore, SOPs outline the responsibilities of each team member, while establishing
standards for intermediate outputs. Well-defined SOPs improve the consistent and accurate exe-
cution of tasks that align with defined roles and quality standards (Belbin, 2012; Manifesto, 2001;
DeMarco & Lister, 2013; Wooldridge & Jennings, 1998). For instance, in a software company,
Product Managers analyze competition and user needs to create Product Requirements Documents
(PRDs) using a standardized structure, to guide the developmental process.
Inspired by such ideas, we design a promising GPT -based Meta -Programming framework called
MetaGPT that significantly benefits from SOPs. Unlike other works (Li et al., 2023; Qian et al.,
2023), MetaGPT requires agents to generate structured outputs, such as high-quality requirements
∗These authors contributed equally to this work.
†Chenglin Wu (alexanderwu@fuzhi.ai) is the corresponding author, affiliated with DeepWisdom.
1
response = agent.chat(
    "Tell me about the evaluation datasets used."
)
Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation datasets used"}
=== Function Output ===
The evaluation datasets used in the research include HumanEval, MBPP, and SoftwareDev.
=== LLM Response ===
The evaluation datasets used in the research include HumanEval, MBPP, and SoftwareDev.
response = agent.chat("Tell me the results for one of the above datasets.")
Added user message to memory: Tell me the results for one of the above datasets.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "Results of HumanEval dataset in MetaGPT"}
=== Function Output ===
The results of the HumanEval dataset in MetaGPT are yet to be explored.
=== LLM Response ===
The results of the HumanEval dataset in MetaGPT have not been explored yet.
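The follow-up question above only makes sense if the agent remembers the previous turn: "one of the above datasets" was resolved to HumanEval in the tool call. A rough sketch of that idea is shown below. `TinyChatMemory` is a hypothetical class written for illustration and says nothing about how LlamaIndex stores its chat history internally.

```python
from typing import List, Tuple


class TinyChatMemory:
    """Hypothetical illustration of conversation memory across chat turns."""

    def __init__(self) -> None:
        self.history: List[Tuple[str, str]] = []  # (role, content) pairs

    def chat(self, user_message: str) -> str:
        self.history.append(("user", user_message))
        # A real agent would pass the whole history to the LLM. Here we only
        # report how many earlier user turns are available as context.
        earlier_turns = sum(
            1 for role, _ in self.history[:-1] if role == "user"
        )
        reply = f"answering with {earlier_turns} earlier user turn(s) as context"
        self.history.append(("assistant", reply))
        return reply


memory = TinyChatMemory()
first = memory.chat("Tell me about the evaluation datasets used.")
second = memory.chat("Tell me the results for one of the above datasets.")
print(first)
print(second)
```

Because the second call sees the first exchange in `history`, a reference such as "the above datasets" has something to point at.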
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then explain how they communicate with each other."
)
step_output = agent.run_step(task.task_id)
Added user message to memory: Tell me about the agent roles in MetaGPT, and then explain how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "Agent role in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT encompass a variety of responsibilities crucial to the software development process. These roles include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for creating a detailed Product Requirement Document (PRD) that outlines project goals and user stories. The Architect then translates these requirements into system design components. The Project Manager handles task distribution, while Engineers execute the code based on the design. The QA Engineer formulates test cases to ensure the quality of the code. Together, these agents collaborate to transform abstract requirements into detailed software designs and implementations, ensuring the efficiency and quality of the software solutions produced within the MetaGPT framework.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "Communication among agents in MetaGPT"}
=== Function Output ===
Communication among agents in MetaGPT is structured and efficient, facilitated through a shared message pool and subscription mechanism. Agents publish structured messages in the shared pool and can subscribe to relevant information based on their profiles. This approach aims to streamline interactions, enhance collaboration, and ensure that each agent understands their role within the software development framework. The communication process involves the Product Manager creating a detailed Product Requirement Document (PRD), which is then passed on to the Architect for system design. The Project Manager breaks down tasks and assigns them to Engineers, who develop the code based on the provided structure. Finally, the QA Engineer reviews the code, generates unit tests, and ensures high-quality software. This structured communication flow ensures effective contribution from each agent to the development process.
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)
Num completed for task 7bda22ec-0a28-43e9-8e06-96693129a8ff: 1
The agent roles in MetaGPT encompass a variety of responsibilities crucial to the software development process. These roles include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for creating a detailed Product Requirement Document (PRD) that outlines project goals and user stories. The Architect then translates these requirements into system design components. The Project Manager handles task distribution, while Engineers execute the code based on the design. The QA Engineer formulates test cases to ensure the quality of the code. Together, these agents collaborate to transform abstract requirements into detailed software designs and implementations, ensuring the efficiency and quality of the software solutions produced within the MetaGPT framework.
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]
Num upcoming steps for task 7bda22ec-0a28-43e9-8e06-96693129a8ff: 1
TaskStep(task_id='7bda22ec-0a28-43e9-8e06-96693129a8ff', step_id='6715703d-67ea-4f1a-9684-8a185ca3cb6c', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)
step_output = agent.run_step(
    task.task_id, input="How do the agents share information?"
)
Added user message to memory: How do the agents share information?
=== Calling Function ===
Calling function: vector_tool_metagpt with args: {"query": "Agent information sharing in MetaGPT", "page_numbers": ["5"]}
=== Function Output ===
Agent information sharing in MetaGPT involves structured communication interfaces. The software development process in MetaGPT emphasizes the significant dependence on Standard Operating Procedures (SOPs). The communication protocol in MetaGPT utilizes structured communication interfaces to facilitate information sharing among agents.
step_output = agent.run_step(task.task_id)
print(step_output.is_last)
=== LLM Response ===
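Agents in MetaGPT use structured communication interfaces to share information. This plays an important role in a software development process that relies heavily on Standard Operating Procedures (SOPs). MetaGPT's communication protocol uses structured communication interfaces to facilitate information sharing among agents.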
True
response = agent.finalize_response(task.task_id)
print(str(response))
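assistant: Agents in MetaGPT use structured communication interfaces to share information. This plays an important role in a software development process that relies heavily on Standard Operating Procedures (SOPs). MetaGPT's communication protocol uses structured communication interfaces to facilitate information sharing among agents.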
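The walkthrough above calls `run_step` by hand, checks `is_last`, and then finalizes the response. The same pattern can be wrapped in a loop that keeps stepping until the last step is reached. The sketch below shows only the control flow, using a hypothetical `StubRunner` with scripted outputs. Unlike the real `AgentRunner`, the stub takes no `task_id` and calls no LLM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepOutput:
    output: str
    is_last: bool


class StubRunner:
    """Hypothetical stand-in that mimics only the run/finalize shape used above."""

    def __init__(self, scripted_outputs: List[str]) -> None:
        self._pending = list(scripted_outputs)
        self._completed: List[StepOutput] = []

    def run_step(self) -> StepOutput:
        text = self._pending.pop(0)
        # The step is the last one when nothing else is queued.
        step = StepOutput(output=text, is_last=not self._pending)
        self._completed.append(step)
        return step

    def finalize_response(self) -> str:
        return self._completed[-1].output


runner = StubRunner(["tool call: summary", "tool call: vector", "final answer"])

# Run the first step, then keep stepping until the runner reports the last one.
step = runner.run_step()
steps_run = 1
while not step.is_last:
    step = runner.run_step()
    steps_run += 1

final = runner.finalize_response()
print(steps_run, final)
```

Stepping manually, as the walkthrough does, has one advantage over a plain loop: new user input can be injected between steps, which is what the `input=` argument to `run_step` was used for above.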