One of the most common requests we've heard is better functionality and documentation for creating custom agents. This has always been a bit tricky, because in our minds it's still very unclear what an "agent" actually is, and therefore what the "right" abstractions for it may be. Recently, we've felt some of the abstractions starting to come together, so we did a big push across both our Python and TypeScript modules to better enforce and document them. Please see below for links to those technical docs, followed by a description of the abstractions we've introduced and future directions.
TL;DR: we've introduced a BaseSingleActionAgent as the highest level abstraction for an agent that can be used in our current AgentExecutor. We've added a more practical LLMSingleActionAgent that implements this interface in a simple and extensible way (PromptTemplate + LLM + OutputParser).
BaseSingleActionAgent
The most basic abstraction we've introduced is a BaseSingleActionAgent. As the name suggests, we don't consider this a base abstraction for all agents. Rather, we consider it the base abstraction for a family of agents that predict a single action at a time.
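To make "a single action at a time" concrete, here is a minimal Python sketch of what such an interface could look like: on every call, the agent returns exactly one thing, either an action to take (AgentAction) or a final answer (AgentFinish). This is not LangChain's actual class; the names SingleActionAgentSketch, EchoAgent, and plan, along with the dataclass fields, are hypothetical stand-ins chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical stand-ins for the two kinds of response an agent can return.
@dataclass
class AgentAction:
    action: str        # which tool to use
    action_input: str  # the input to pass to that tool


@dataclass
class AgentFinish:
    output: str  # final message for the user


# Each intermediate step pairs an action with the observation it produced.
Step = Tuple[AgentAction, str]


class SingleActionAgentSketch(ABC):
    """Hypothetical interface: given the steps so far, decide exactly ONE next move."""

    @abstractmethod
    def plan(self, intermediate_steps: List[Step], **inputs: str) -> Union[AgentAction, AgentFinish]:
        ...


class EchoAgent(SingleActionAgentSketch):
    """Toy agent: calls an 'echo' tool once, then finishes with what it observed."""

    def plan(self, intermediate_steps: List[Step], **inputs: str) -> Union[AgentAction, AgentFinish]:
        if not intermediate_steps:
            return AgentAction(action="echo", action_input=inputs["input"])
        _, observation = intermediate_steps[-1]
        return AgentFinish(output=observation)
```

Anything that fits this one-response-per-call shape can be driven by the same loop, which is what makes it a useful base for a whole family of agents.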
A SingleActionAgent is used in our current AgentExecutor. This AgentExecutor can largely be thought of as a loop that:
- Passes user input and any previous steps to the Agent
- If the Agent returns an AgentFinish, returns that directly to the user
- If the Agent returns an AgentAction, uses it to call a tool and get an Observation
- Repeats, passing the AgentAction and Observation back to the Agent, until an AgentFinish is emitted
AgentAction is a response that consists of an action and an action_input: action specifies which tool to use, and action_input is the input to that tool.
AgentFinish
is a response that contains the final message to be sent back to the user. This should be used to end an agent run.
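The executor loop and the two response types described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the real AgentExecutor: run_agent, the toy uppercase_then_finish agent, and the max_iterations guard (added so a misbehaving agent cannot loop forever) are all hypothetical, and AgentFinish here carries a single output string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class AgentAction:
    action: str        # which tool to use
    action_input: str  # the input to that tool


@dataclass
class AgentFinish:
    output: str  # final message to send back to the user


Step = Tuple[AgentAction, str]  # (action taken, observation it produced)
# Hypothetical: here an agent is any callable from (previous steps, user input) to one response.
Agent = Callable[[List[Step], str], Union[AgentAction, AgentFinish]]


def run_agent(agent: Agent, tools: Dict[str, Callable[[str], str]],
              user_input: str, max_iterations: int = 10) -> str:
    steps: List[Step] = []
    for _ in range(max_iterations):
        # Pass the user input and any previous steps to the agent.
        response = agent(steps, user_input)
        # An AgentFinish ends the run; its message goes straight back to the user.
        if isinstance(response, AgentFinish):
            return response.output
        # An AgentAction names a tool; call it to get an observation.
        observation = tools[response.action](response.action_input)
        # Record the (action, observation) pair and go around the loop again.
        steps.append((response, observation))
    raise RuntimeError("agent did not finish within max_iterations")


def uppercase_then_finish(steps: List[Step], user_input: str) -> Union[AgentAction, AgentFinish]:
    """Toy agent: uppercase the input with a tool, then finish with that observation."""
    if not steps:
        return AgentAction(action="upper", action_input=user_input)
    return AgentFinish(output=steps[-1][1])


print(run_agent(uppercase_then_finish, {"upper": str.upper}, "hello agents"))  # prints HELLO AGENTS
```

Note that the executor never interprets the model's text itself; it only reacts to which of the two response types the agent hands back.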
If you are interested in this level of customizability, check out this walkthrough. For most use cases, however, we would recommend using the abstraction below.
LLMSingleActionAgent
Another class we've introduced is the LLMSingleActionAgent. This is a concrete implementation of the BaseSingleActionAgent, but it is highly modular and therefore highly customizable.
The LLMSingleActionAgent consists of four parts:
- PromptTemplate: This is the prompt template that can be used to instruct the language model on what to do
- LLM: This is the language model that powers the agent
- stop sequence: Instructs the LLM to stop generating as soon as this string is found
- OutputParser: This determines how to parse the output of an LLM into an AgentAction or AgentFinish object
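The stop sequence matters because a model prompted in a Thought/Action/Observation style can keep generating past the action and write a made-up Observation of its own, instead of waiting for the real tool result. Many model APIs accept stop sequences directly; the hypothetical helper apply_stop below shows the equivalent effect applied client-side. The prompt format and the "\nObservation:" stop string are assumptions for illustration.

```python
from typing import List


def apply_stop(text: str, stop: List[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence (the sequence itself is dropped)."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# A hypothetical raw completion in which the model starts inventing its own observation.
raw = "Thought: I should look this up.\nAction: search\nAction Input: weather in SF\nObservation: 72F and sunny"
print(apply_stop(raw, ["\nObservation:"]))
```

Everything from the invented observation onward is discarded, so the parser only ever sees the action the model actually chose.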
The logic for combining these is:
- Use the PromptTemplate to turn the input variables (including user input and any previous AgentAction, Observation pairs) into a prompt
- Pass the prompt to the LLM, with a specific stop sequence
- Parse the output of the LLM into an AgentAction or AgentFinish object
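Putting those three steps together, a minimal sketch of how the four parts could be composed is below. Every name in it (LLMSingleActionAgentSketch, template, fake_llm, parser) is hypothetical, and fake_llm is a scripted stand-in for a real language model; it is meant to show the data flow, not the library's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class AgentAction:
    action: str        # which tool to use
    action_input: str  # input for that tool


@dataclass
class AgentFinish:
    output: str  # final message for the user


Step = Tuple[AgentAction, str]  # (action taken, observation returned)


@dataclass
class LLMSingleActionAgentSketch:
    """Hypothetical composition of the four parts: template, LLM, stop sequence, parser."""
    prompt_template: Callable[[str, List[Step]], str]
    llm: Callable[[str, List[str]], str]  # (prompt, stop sequences) -> completion
    stop: List[str]
    output_parser: Callable[[str], Union[AgentAction, AgentFinish]]

    def plan(self, intermediate_steps: List[Step], user_input: str) -> Union[AgentAction, AgentFinish]:
        # 1. Render user input plus previous (action, observation) pairs into a prompt.
        prompt = self.prompt_template(user_input, intermediate_steps)
        # 2. Call the LLM, telling it where to stop generating.
        completion = self.llm(prompt, self.stop)
        # 3. Parse the completion into an AgentAction or AgentFinish.
        return self.output_parser(completion)


def template(user_input: str, steps: List[Step]) -> str:
    # Format each previous step as Action / Action Input / Observation lines.
    scratchpad = "".join(
        f"Action: {a.action}\nAction Input: {a.action_input}\nObservation: {obs}\n"
        for a, obs in steps
    )
    return f"Question: {user_input}\n{scratchpad}"


def fake_llm(prompt: str, stop: List[str]) -> str:
    # Scripted stand-in for a real model: call a tool first, answer once an observation exists.
    if "Observation:" in prompt:
        text = "Final Answer: LangChain has agents."
    else:
        text = "Action: search\nAction Input: LangChain agents\nObservation: (made up)"
    for s in stop:
        text = text.split(s, 1)[0]  # honor the stop sequences
    return text


def parser(text: str) -> Union[AgentAction, AgentFinish]:
    if text.startswith("Final Answer:"):
        return AgentFinish(output=text[len("Final Answer:"):].strip())
    action_line, input_line = text.strip().split("\n")[:2]
    return AgentAction(
        action=action_line[len("Action:"):].strip(),
        action_input=input_line[len("Action Input:"):].strip(),
    )


agent = LLMSingleActionAgentSketch(template, fake_llm, ["\nObservation:"], parser)
first = agent.plan([], "What is LangChain?")
print(first)  # AgentAction(action='search', action_input='LangChain agents')
print(agent.plan([(first, "a framework")], "What is LangChain?"))  # AgentFinish(output='LangChain has agents.')
```

Because each part is just a swappable callable, changing the prompt means replacing template and using a local model means replacing llm. The toy parser here is deliberately brittle: any completion that doesn't match the expected two-line shape raises an error.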
These abstractions can be used to customize your agent in a lot of ways. For example:
- Want to give your agent some personality? Use the PromptTemplate!
- Want to format the previous AgentAction, Observation pairs in a specific way? Use the PromptTemplate!
- Want to use a custom or local model? Write a custom LLM wrapper and pass that in as the LLM!
- Is the output parsing too brittle, or do you want to handle errors in a different way? Use a custom OutputParser!
(We're highlighting the last one because it's probably the one we've heard the most.)
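As an example of a more forgiving OutputParser, the sketch below accepts case-insensitive labels, extra whitespace, and quoted values, and handles unparseable output without raising. The regexes, the lenient_parse name, the Action / Action Input / Final Answer format, and the fallback policy are all assumptions for illustration, not LangChain's built-in parser.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class AgentAction:
    action: str
    action_input: str


@dataclass
class AgentFinish:
    output: str


# Tolerant patterns: case-insensitive labels and flexible whitespace around them.
FINAL_RE = re.compile(r"final\s+answer\s*:\s*(.*)", re.IGNORECASE | re.DOTALL)
ACTION_RE = re.compile(
    r"action\s*:\s*(.*?)\s*\n\s*action\s+input\s*:\s*(.*)", re.IGNORECASE | re.DOTALL
)


def lenient_parse(text: str) -> Union[AgentAction, AgentFinish]:
    final = FINAL_RE.search(text)
    if final:
        return AgentFinish(output=final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        # Strip surrounding whitespace and any double quotes the model added.
        tool = action.group(1).strip().strip('"')
        tool_input = action.group(2).strip().strip('"')
        return AgentAction(action=tool, action_input=tool_input)
    # Hypothetical error policy: instead of raising, end the run with an explanatory message.
    return AgentFinish(output=f"Sorry, I could not parse the model output: {text.strip()!r}")


print(lenient_parse('ACTION:  search \n  Action Input: "weather in SF"'))
```

Two design choices are worth calling out: if a completion contains both a final answer and an action, this sketch lets the final answer win; and on a parse failure it ends the run with an apologetic message. A reasonable alternative would be to return an AgentAction pointing at a (hypothetical) error-reporting tool, so the agent sees the problem as an Observation and can try again.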
We imagine this being the most practically useful abstraction. Please see the documentation links at the beginning of this post for concrete Python/TypeScript guides to getting started.
Future Directions
We hope these abstractions have clarified some of our thinking around agents, and that they open up places where the community can contribute. In particular:
We are very excited about other examples of SingleActionAgents, like:
- Using embeddings to do tool selection before calling an LLM
- Using a ConstitutionalChain instead of an LLMChain to improve reliability
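The post only names the embedding idea, so the following is a sketch under loud assumptions: a toy bag-of-words vector stands in for a real embedding model, and the names embed, cosine, and select_tools are hypothetical. The idea is to rank tools by how similar their descriptions are to the user's query and only describe the top k tools in the prompt, which keeps the prompt short when there are many tools.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Missing keys in a Counter read as 0, so only shared words contribute.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_tools(query: str, tool_descriptions: Dict[str, str], k: int = 2) -> List[str]:
    """Rank tools by similarity between the query and each tool's description; keep the top k."""
    q = embed(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(q, embed(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:k]


tools = {
    "weather": "get the current weather forecast for a city",
    "calculator": "evaluate a math expression and return the number",
    "search": "search the web for recent news articles",
}
print(select_tools("what is the weather in paris", tools, k=1))  # ['weather']
```

With a real embedding model, the tool description vectors could be computed once up front and only the query embedded per request.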
We are also excited about other types of agents (which will require new AgentExecutors), like:
- Multi-action agents
- Plan-execute agents
If any of those sound interesting, we are always willing to work with folks to implement their ideas! The best way is probably to do some initial work and open an RFC pull request, and we're happy to go from there :)