LangChain: How an Agent works
Overview
You may have come across Agents in the LangChain documentation, but how they actually work can still be unclear. This post delves into the mechanisms behind agents to give readers a comprehensive understanding of this potent tool.
[Figure: a basic overview of the components related to agents]
The diagram above provides a basic overview of the components related to agents.
Concepts
Agent
As outlined in the documentation, the Agent encompasses the following abstractions:
- AgentAction: Represents the subsequent action to be taken, comprising a tool and tool_input.
- AgentFinish: The final result from the agent, which contains the final agent output in return_values.
- Intermediate Steps: Denotes preceding agent actions and their corresponding outputs, organized as a list of tuples, List[Tuple[AgentAction, Any]].
This structure indicates that multiple agent actions may be executed for a query if necessary, with intermediate action executions being stored in the intermediate steps.
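To make these abstractions concrete, here is a minimal sketch (the values are invented for illustration) that constructs them directly from langchain_core.agents:
from langchain_core.agents import AgentAction, AgentFinish

# One hop of the loop: the agent requests a tool call...
action = AgentAction(
    tool="google-search",
    tool_input="Japan prime minister",
    log="Thought: I should search.\nAction: google-search\nAction Input: Japan prime minister",
)
# ...the (action, observation) pair becomes one intermediate step...
intermediate_steps = [(action, "Fumio Kishida is the prime minister of Japan.")]

# ...and eventually the agent emits an AgentFinish carrying the final output.
finish = AgentFinish(
    return_values={"output": "The prime minister of Japan is Fumio Kishida."},
    log="Final Answer: The prime minister of Japan is Fumio Kishida.",
)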
AgentExecutor
The AgentExecutor is tasked with running the Agent until the final output is attained: it asks the Agent for the next action, executes the returned action, and repeats this process until a conclusive answer is generated for the given input.
Agent Example
Let’s delve into a simple example to illustrate the process!
from langchain_openai import OpenAI
import langchain
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent, Tool
from langchain_community.utilities import GoogleSearchAPIWrapper
# use google as a tool
google = GoogleSearchAPIWrapper()
def top5_results(query):
    return google.results(query, 5)

TOOL_GOOGLE = Tool(
    name="google-search",
    description="Search Google for recent results.",
    func=top5_results,
)
tools = [TOOL_GOOGLE]
# prompt
prompt = hub.pull("hwchase17/react-chat")
# llm
llm = OpenAI(temperature=0)
# agent
agent = create_react_agent(
    llm=llm,
    tools=tools,
    prompt=prompt,
)
In this example, we’ve prepared a single tool for Google search. To use the Google search functionality, both GOOGLE_CSE_ID and GOOGLE_API_KEY must be configured.
For details on setting this up, please refer to the documentation at Google Search Integration.
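As a quick sketch, assuming you provide the credentials via environment variables (the values below are placeholders), the setup looks like this:
import os

# Placeholder credentials - replace with your own before running the example.
os.environ["GOOGLE_CSE_ID"] = "your-custom-search-engine-id"
os.environ["GOOGLE_API_KEY"] = "your-google-api-key"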
Subsequently, we initialize an agent by combining the language model (llm), the prepared tool, and the specified prompt using the create_react_agent function.
Let’s delve deeper into the agent’s composition:
The agent is classified as a RunnableSequence.
>>> type(agent)
<class 'langchain_core.runnables.base.RunnableSequence'>
It’s constructed with RunnableAssign, PromptTemplate, RunnableBinding, and ReActSingleInputOutputParser, structured in the form of LangChain Expression Language (LCEL). (The long prompt template is elided here; it is printed in full later.)
>>> agent
RunnableAssign(mapper={
agent_scratchpad: RunnableLambda(lambda x: format_log_to_str(x['intermediate_steps']))
})
| PromptTemplate(input_variables=['agent_scratchpad', 'chat_history', 'input'], partial_variables={'tools': 'google-search: Search Google for recent results.', 'tool_names': 'google-search'}, metadata={'lc_hub_owner': 'hwchase17', 'lc_hub_repo': 'react-chat', 'lc_hub_commit_hash': 'xxxxxx'}, template='Assistant is a large language model trained by OpenAI. ...')
| RunnableBinding(bound=OpenAI(client=<openai.resources.completions.Completions object at 0x1138bcd70>, async_client=<openai.resources.completions.AsyncCompletions object at 0x1138bebd0>, temperature=0.0, openai_api_key=SecretStr('**********'), openai_organization='org-xxxxxx', openai_proxy=''), kwargs={'stop': ['\nObservation']})
| ReActSingleInputOutputParser()
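You can confirm this composition yourself by iterating over agent.steps; a small sketch (expected output in the comments):
for step in agent.steps:
    print(type(step).__name__)
# RunnableAssign
# PromptTemplate
# RunnableBinding
# ReActSingleInputOutputParser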
The first step is RunnableAssign, responsible for assigning key-value pairs to the Dict[str, Any] input. In this instance, the key is agent_scratchpad, and the value is a RunnableLambda that transforms the intermediate_steps into a string:
>>> agent.steps[0]
RunnableAssign(mapper={
agent_scratchpad: RunnableLambda(lambda x: format_log_to_str(x['intermediate_steps']))
})
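To see what that lambda produces, here is a small sketch (with an invented intermediate step) that calls format_log_to_str directly:
from langchain.agents.format_scratchpad import format_log_to_str
from langchain_core.agents import AgentAction

steps = [
    (
        AgentAction(
            tool="google-search",
            tool_input="Japan prime minister",
            log="Thought: Do I need to use a tool? Yes\nAction: google-search\nAction Input: Japan prime minister",
        ),
        "Fumio Kishida",
    )
]
# Each action's log plus its observation is flattened into one string,
# ready to be spliced into the {agent_scratchpad} slot of the prompt.
print(format_log_to_str(steps))
# Thought: Do I need to use a tool? Yes
# Action: google-search
# Action Input: Japan prime minister
# Observation: Fumio Kishida
# Thought: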
Let’s continue with the agent’s steps.
The second step is the prompt obtained from https://smith.langchain.com/hub:
>>> agent.steps[1]
PromptTemplate(
input_variables=['agent_scratchpad', 'chat_history', 'input'],
partial_variables={
'tools': 'google-search: Search Google for recent results.',
'tool_names': 'google-search'
},
metadata={
'lc_hub_owner': 'hwchase17',
'lc_hub_repo': 'react-chat',
'lc_hub_commit_hash': '3ecd5f710db438a9cf3773c57d6ac8951eefd2cd9a9b2a0026a65a0893b86a6e'},
template='Assistant is a large language model trained by OpenAI. ...'
)
Here, the input_variables (agent_scratchpad, chat_history, and input) are filled in when the input is provided. As seen in the previous step, agent_scratchpad is generated from the intermediate_steps and contains the history of agent actions and their corresponding outputs.
The exact template for the prompt is as follows:
>>> print(agent.steps[1].template)
Assistant is a large language model trained by OpenAI.
Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.
Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.
Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.
TOOLS:
------
Assistant has access to the following tools:
{tools}
To use a tool, please use the following format:
```
Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
```
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:
```
Thought: Do I need to use a tool? No
Final Answer: [your response here]
```
Begin!
Previous conversation history:
{chat_history}
New input: {input}
{agent_scratchpad}
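Because tools and tool_names are already bound as partial_variables, only the remaining three variables are needed to render the prompt. As a small sketch (with hypothetical values), you can render it directly:
# tools/tool_names are already filled in via partial_variables, so only the
# remaining input variables are required to render the final prompt string.
rendered = agent.steps[1].format(
    input="Who is Japan's prime minister?",
    chat_history="",
    agent_scratchpad="",
)
print(rendered)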
The third step involves a RunnableBinding, which wraps a runnable with additional functionality such as streaming, batching, and async support. This step simply binds the runnables above to the OpenAI client along with extra kwargs:
>>> agent.steps[2]
RunnableBinding(
bound=OpenAI(
client=<openai.resources.completions.Completions object at 0x1138bcd70>,
async_client=<openai.resources.completions.AsyncCompletions object at 0x1138bebd0>,
temperature=0.0,
openai_api_key=SecretStr('**********'),
openai_organization='org-xxxx', openai_proxy=''
),
kwargs={'stop': ['\nObservation']}
)
The additional kwarg stop tells the model to stop generating as soon as it is about to emit the given string, truncating the output there. For example, a model bound with stop=['-'] cuts the completion off at the first hyphen:
>>> from langchain_community.chat_models import ChatOpenAI
>>> model = ChatOpenAI()
>>> model.invoke('Say "Parrot-MAGIC"')
AIMessage(content='Parrot-MAGIC', response_metadata={'finish_reason': 'stop', 'logprobs': None})
>>> runnable_binding = model.bind(stop=['-'])
>>> runnable_binding.invoke('Say "Parrot-MAGIC"')
AIMessage(content='Parrot', response_metadata={'finish_reason': 'stop', 'logprobs': None})
In the agent's case, stop=['\nObservation'] prevents the model from generating a hallucinated Observation line of its own; the real observation is supplied by the tool.
The final step is the ReActSingleInputOutputParser():
>>> agent.steps[3]
ReActSingleInputOutputParser()
This output parser is responsible for parsing the output from the language model and returning either an AgentAction or an AgentFinish. The examples below make its behavior straightforward to understand:
>>> agent.steps[3].parse("Thought: agent thought here Action: search Action Input: what is the temperature in SF?")
AgentAction(tool='search', tool_input='what is the temperature in SF?', log='Thought: agent thought here Action: search Action Input: what is the temperature in SF?')
>>> agent.steps[3].parse("Thought: agent thought here Final Answer: The temperature is 100 degrees")
AgentFinish(return_values={'output': 'The temperature is 100 degrees'}, log='Thought: agent thought here Final Answer: The temperature is 100 degrees')
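If the model's output matches neither format, the parser raises an OutputParserException; this is why AgentExecutor exposes a handle_parsing_errors option, which we will see below. A small sketch with an invented malformed output:
from langchain_core.exceptions import OutputParserException

try:
    agent.steps[3].parse("I forgot to follow the ReAct format")
except OutputParserException as e:
    # The parser reports that it could not parse the LLM output.
    print(e)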
Now that we have a clearer understanding of what’s inside an agent, let’s run it. The PromptTemplate declared input_variables=['agent_scratchpad', 'chat_history', 'input'], but recall that agent_scratchpad is derived from intermediate_steps by the RunnableAssign step, so the invoke call takes input, chat_history, and intermediate_steps:
agent.invoke(
    {
        "input": "Who is Japan's prime minister?",
        "intermediate_steps": [],
        "chat_history": [],
    }
)
Now the result would be something like this:
AgentAction(tool='google-search', tool_input='Japan prime minister', log='\nThought: Do I need to use a tool? Yes\nAction: google-search\nAction Input: Japan prime minister')
This output indicates that we need to use the google-search tool with the input ‘Japan prime minister’ to obtain the final answer.
In summary, the Agent determines the action to take based on the input and the previous steps.
Now, we require another component, the AgentExecutor, to run the Agent. It invokes the tool chosen by the Agent’s result (AgentAction) until the final answer (AgentFinish) is obtained.
Let’s explore how the AgentExecutor functions in the subsequent section.
AgentExecutor
The AgentExecutor is responsible for orchestrating the execution of the agent. It calls the agent, executes the actions returned by the agent, and then calls the agent again with the result of the executed actions. This process continues until the final answer is received from the agent.
The following pseudocode illustrates this logic:
next_action = agent.get_action(...)
while next_action != AgentFinish:
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action
Now, let’s integrate an AgentExecutor into the example agent we previously examined:
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=False,
)
AgentExecutor inherits from Chain, which implements the invoke function. The main part looks like this:
class Chain(RunnableSerializable[Dict[str, Any], Dict[str, Any]], ABC):
    ...
    def invoke(
        self,
        ...
    ):
        ...
        try:
            self._validate_inputs(inputs)
            outputs = (
                self._call(inputs, run_manager=run_manager)
                if new_arg_supported
                else self._call(inputs)
            )
            final_outputs: Dict[str, Any] = self.prep_outputs(
                inputs, outputs, return_only_outputs
            )
        ...
        return final_outputs
self._call is invoked with the inputs, and its outputs are returned after being post-processed by prep_outputs. The real logic is implemented in AgentExecutor's _call:
def _call(
    self,
    inputs: Dict[str, str],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    """Run text through and get agent response."""
    # Construct a mapping of tool name to tool for easy lookup
    name_to_tool_map = {tool.name: tool for tool in self.tools}
    # We construct a mapping from each tool to a color, used for logging.
    color_mapping = get_color_mapping(
        [tool.name for tool in self.tools], excluded_colors=["green", "red"]
    )
    intermediate_steps: List[Tuple[AgentAction, str]] = []
    # Let's start tracking the number of iterations and time elapsed
    iterations = 0
    time_elapsed = 0.0
    start_time = time.time()
    # We now enter the agent loop (until it returns something).
    while self._should_continue(iterations, time_elapsed):
        next_step_output = self._take_next_step(
            name_to_tool_map,
            color_mapping,
            inputs,
            intermediate_steps,
            run_manager=run_manager,
        )
        if isinstance(next_step_output, AgentFinish):
            return self._return(
                next_step_output, intermediate_steps, run_manager=run_manager
            )

        intermediate_steps.extend(next_step_output)
        if len(next_step_output) == 1:
            next_step_action = next_step_output[0]
            # See if tool should return directly
            tool_return = self._get_tool_return(next_step_action)
            if tool_return is not None:
                return self._return(
                    tool_return, intermediate_steps, run_manager=run_manager
                )
        iterations += 1
        time_elapsed = time.time() - start_time
    output = self.agent.return_stopped_response(
        self.early_stopping_method, intermediate_steps, **inputs
    )
    return self._return(output, intermediate_steps, run_manager=run_manager)
- The while loop runs as long as self._should_continue(iterations, time_elapsed) evaluates to True. This condition ensures that the agent continues to execute until either the maximum number of iterations is reached or the execution time exceeds the limit.
- Inside the loop, _take_next_step is called to obtain the next step from the agent. It receives the mapping of tool names to tools, the color mapping for logging, the current inputs, the intermediate steps, and the run manager.
- The output of _take_next_step can be either an AgentFinish or a list of (AgentAction, observation) tuples. If an AgentFinish is obtained, indicating that the final answer has been reached, _return is called to return the final output along with the intermediate steps.
- If the output is a List[Tuple[AgentAction, str]], it is appended to the intermediate_steps list.
- If next_step_output consists of a single AgentAction, _get_tool_return is called to check whether the tool should return directly. If so, the tool's return value is used as the final output, and the intermediate_steps are included in the return.
- The loop continues to iterate, incrementing iterations and updating time_elapsed.
- Once the loop exits without an AgentFinish, the output is generated by the agent's return_stopped_response method, which handles cases where execution stopped due to reaching the maximum number of iterations or exceeding the time limit. The final output is returned along with the intermediate steps.
This walkthrough illustrates how the AgentExecutor orchestrates the execution of the agent, iteratively calling actions until a final answer is obtained. It also handles scenarios such as early-stopping conditions and direct tool returns to keep the agent's logic running smoothly.
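To make the control flow concrete, here is a condensed, hypothetical re-implementation of that loop (callbacks, color mapping, time limits, and error handling omitted); it is a sketch of the idea, not the actual LangChain code:
from langchain_core.agents import AgentFinish

def run_agent_loop(agent, tools, inputs, max_iterations=15):
    """Minimal version of the AgentExecutor loop sketched above."""
    name_to_tool_map = {tool.name: tool for tool in tools}
    intermediate_steps = []
    for _ in range(max_iterations):
        # Ask the agent (the LCEL chain) for the next step.
        result = agent.invoke({**inputs, "intermediate_steps": intermediate_steps})
        if isinstance(result, AgentFinish):
            return result.return_values
        # Otherwise it's an AgentAction: run the tool and record the observation.
        observation = name_to_tool_map[result.tool].run(result.tool_input)
        intermediate_steps.append((result, observation))
    raise RuntimeError("Stopped after reaching max_iterations")

# Usage, mirroring the example below:
# run_agent_loop(agent, tools, {"input": "Who is Japan's prime minister?", "chat_history": []})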
Let’s invoke the AgentExecutor!
agent_executor.invoke({"input": "Who is Japan's prime minister?", 'chat_history': []})
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: google-search
Action Input: Japan prime minister[{'title': 'Prime Minister of Japan - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Prime_Minister_of_Japan', 'snippet': 'The prime minister also serves as the commander-in-chief of the Japan Self Defence Forces and is a sitting member of either house of the National Diet (\xa0...'}, {'title': "Prime Minister's Office of Japan", 'link': 'https://japan.kantei.go.jp/', 'snippet': 'Press Conference by Prime Minister Kishida on His Visit to Fukushima Prefecture and Other Matters · #DisasterResponse · #GreatEastJapanQuake · #Reconstruction\xa0...'}, {'title': "Contact us | Opinions and Impressions | Prime Minister's Office ...", 'link': 'https://www.kantei.go.jp/foreign/forms/comment_ssl.html', 'snippet': 'Note. Feel free to submit your message in Japanese or English. The message is limited to a maximum of 2,000 characters (including line breaks and spaces).'}, {'title': "Previous Prime Ministers | Prime Minister's Office of Japan", 'link': 'https://japan.kantei.go.jp/past_cabinet/', 'snippet': "Previous Prime Ministers - Prime Minister's Office of Japan."}, {'title': 'Congressional leaders invite prime minister of Japan to address ...', 'link': 'https://apnews.com/article/congress-japan-prime-minister-kishida-ec22e62cdee0d258ccce0a88335e964d', 'snippet': 'Mar 4, 2024 ... Congressional leaders have invited the prime minister of Japan, Fumio Kishida, to address a joint meeting of Congress on April 11.'}]Do I need to use a tool? No
Final Answer: The current prime minister of Japan is Fumio Kishida.
> Finished chain.
{'input': "Who is Japan's prime minister?", 'chat_history': [], 'output': 'The current prime minister of Japan is Fumio Kishida.'}
In this simple case, the first call to the agent returns an AgentAction with tool google-search and tool_input “Japan prime minister”. The tool then returned the following text:
[{'title': 'Prime Minister of Japan - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Prime_Minister_of_Japan', 'snippet': 'The prime minister also serves as the commander-in-chief of the Japan Self Defence Forces and is a sitting member of either house of the National Diet (\xa0...'}, {'title': "Prime Minister's Office of Japan", 'link': 'https://japan.kantei.go.jp/', 'snippet': 'Press Conference by Prime Minister Kishida on His Visit to Fukushima Prefecture and Other Matters · #DisasterResponse · #GreatEastJapanQuake · #Reconstruction\xa0...'}, {'title': "Contact us | Opinions and Impressions | Prime Minister's Office ...", 'link': 'https://www.kantei.go.jp/foreign/forms/comment_ssl.html', 'snippet': 'Note. Feel free to submit your message in Japanese or English. The message is limited to a maximum of 2,000 characters (including line breaks and spaces).'}, {'title': "Previous Prime Ministers | Prime Minister's Office of Japan", 'link': 'https://japan.kantei.go.jp/past_cabinet/', 'snippet': "Previous Prime Ministers - Prime Minister's Office of Japan."}, {'title': 'Congressional leaders invite prime minister of Japan to address ...', 'link': 'https://apnews.com/article/congress-japan-prime-minister-kishida-ec22e62cdee0d258ccce0a88335e964d', 'snippet': 'Mar 4, 2024 ... Congressional leaders have invited the prime minister of Japan, Fumio Kishida, to address a joint meeting of Congress on April 11.'}]
This observation is then added to intermediate_steps, and in the next loop the agent is called with the following prompt and returns the final answer.
[llm/start] [1:chain:AgentExecutor > 10:chain:RunnableSequence > 15:llm:OpenAI] Entering LLM run with input:
{
"prompts": [
"Assistant is a large language model trained by OpenAI.\n\nAssistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.\n\nAssistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.\n\nOverall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.\n\nTOOLS:\n------\n\nAssistant has access to the following tools:\n\ngoogle-search: Search Google for recent results.\n\nTo use a tool, please use the following format:\n\n```\nThought: Do I need to use a tool? Yes\nAction: the action to take, should be one of [google-search]\nAction Input: the input to the action\nObservation: the result of the action\n```\n\nWhen you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:\n\n```\nThought: Do I need to use a tool? No\nFinal Answer: [your response here]\n```\n\nBegin!\n\nPrevious conversation history:\n[]\n\nNew input: Who is Japan's prime minister?\n\nThought: Do I need to use a tool? Yes\nAction: google-search\nAction Input: Japan prime minister\nObservation: [{'title': 'Prime Minister of Japan - Wikipedia', 'link': 'https://en.wikipedia.org/wiki/Prime_Minister_of_Japan', 'snippet': 'The prime minister also serves as the commander-in-chief of the Japan Self Defence Forces and is a sitting member of either house of the National Diet (\\xa0...'}, {'title': \"Prime Minister's Office of Japan\", 'link': 'https://japan.kantei.go.jp/', 'snippet': 'Press Conference by Prime Minister Kishida on His Visit to Fukushima Prefecture and Other Matters · #DisasterResponse · #GreatEastJapanQuake · #Reconstruction\\xa0...'}, {'title': \"Contact us | Opinions and Impressions | Prime Minister's Office ...\", 'link': 'https://www.kantei.go.jp/foreign/forms/comment_ssl.html', 'snippet': 'Note. Feel free to submit your message in Japanese or English. The message is limited to a maximum of 2,000 characters (including line breaks and spaces).'}, {'title': \"Previous Prime Ministers | Prime Minister's Office of Japan\", 'link': 'https://japan.kantei.go.jp/past_cabinet/', 'snippet': \"Previous Prime Ministers - Prime Minister's Office of Japan.\"}, {'title': 'Congressional leaders invite prime minister of Japan to address ...', 'link': 'https://apnews.com/article/congress-japan-prime-minister-kishida-ec22e62cdee0d258ccce0a88335e964d', 'snippet': 'Mar 4, 2024 ... Congressional leaders have invited the prime minister of Japan, Fumio Kishida, to address a joint meeting of Congress on April 11.'}]\nThought:"
]
}
[llm/end] [1:chain:AgentExecutor > 10:chain:RunnableSequence > 15:llm:OpenAI] [831ms] Exiting LLM run with output:
{
"generations": [
[
{
"text": "Do I need to use a tool? No\nFinal Answer: The current prime minister of Japan is Fumio Kishida.",
"generation_info": {
"finish_reason": "stop",
"logprobs": null
},
"type": "Generation"
}
]
],
"llm_output": null,
"run": null
}
The output parser parses this output into an AgentFinish, and the AgentExecutor returns the final answer:
[chain/end] [1:chain:AgentExecutor] [2.37s] Exiting Chain run with output:
{
"output": "The current prime minister of Japan is Fumio Kishida."
}
{'input': "Who is Japan's prime minister?", 'chat_history': [], 'output': 'The current prime minister of Japan is Fumio Kishida.'}
You can inspect all the interactions among the LLM, the agent, and the output parser by setting langchain.debug = True.
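For example, a quick sketch:
import langchain

langchain.debug = True  # print every chain, LLM, and parser event
agent_executor.invoke({"input": "Who is Japan's prime minister?", "chat_history": []})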
Summary
In this post, we delved into how LangChain’s Agent operates. By examining each component in detail, we gained insight into the mechanisms that power Agents and into how they can be customized effectively.
We began by exploring the core concepts underlying Agents, including AgentAction, AgentFinish, and intermediate steps, providing a foundation for understanding their functionality. Next, we examined the AgentExecutor, which orchestrates the execution of Agents, calling actions and handling outputs until a final answer is obtained.
Further, we scrutinized the logic behind the AgentExecutor's _call method, breaking down its iterative process of action execution and output parsing. Through a detailed walkthrough of an invocation example, we saw how the components interact, culminating in the retrieval of the desired information.
By navigating through these details, readers are equipped with the knowledge to tailor Agents to their specific needs, harnessing the full potential of this powerful tool. With a deeper understanding of LangChain’s Agent, users can explore advanced customization options and leverage its capabilities to address diverse tasks and scenarios effectively.