Autonomous Agents & Agent Simulations

Over the past two weeks, there has been a massive increase in using LLMs in an agentic manner. Specifically, projects like AutoGPT, BabyAGI, CAMEL, and Generative Agents have popped up. The LangChain community has now implemented some parts of all of those projects in the LangChain framework. While researching and implementing these projects, we’ve tried to best understand what the differences between them are and what the novel features of each are. This blog is an explanation of what we’ve learned.

Note: this is a fairly technical blog. It assumes some familiarity with LangChain and these related projects. If you are not familiar with these projects, it may be helpful to read a more introductory piece (like this fantastic one by Sophia Yang).

TL;DR:

The “autonomous agents” projects (BabyAGI, AutoGPT) are largely novel in their long-term objectives, which necessitate new types of planning techniques and a different use of memory.
The “agent simulation” projects (CAMEL, Generative Agents) are largely novel for their simulation environments and long-term memory that reflects and adapts based on events.

We also discuss what parts of each project we’ve replicated in the LangChain framework, and why we chose those parts. Implementing these in the LangChain framework has the benefits of:

Allowing easy switching between LLM providers
Allowing easy switching of VectorStore providers (or, alternative retrieval methods)
Allowing connectivity to LangChain’s collection of tools
Allowing connectivity to the LangChain ecosystem in general

Background

First, let’s start with some background context. What are “agents” and why are they important? For this discussion, we will use LangChain nomenclature, although it’s worth noting that this field is so new there’s no super standard terminology.

Agents generally refer to the idea of using a language model as a reasoning engine and connecting it to two key components: tools and memory.

Tools help connect the LLM to other sources of data or computation. Examples of tools include search engines, APIs, and other datastores. Tools are useful because LLMs only have knowledge of what they were trained on. This knowledge can quickly get out-of-date. In order to overcome this limitations, tools can fetch up-to-date data and insert it as context into the prompt. Tools can also be used to take actions (e.g. run code, modify files, etc), and the outcome of that action can then be observed by the LLM and factored into their decision on what to do next.

Memory helps the agent remember previous interactions. These interactions can either be with other entities (humans or other agents) or with tools. These memories can either be short term (e.g. a list of the previous 5 tool usages) or long term (tool usages from the past that seem most similar to the current situation).

Within LangChain, we refer to an “Agent” as the LLM that decides what actions to take; “Tools” as the actions an Agent can take; “Memory” the act of pulling in previous events, and an AgentExecutor as the logic for running an Agent in a while-loop until some stopping criteria is met.

The stereotypical LangChain Agent is based on the Reasoning and Acting (ReAct) framework proposed by Yao et all in November of 2022. This approach is characterized by the following algorithm:

User gives an agent a task
Thought: The agent “thinks” about what to do
Action/Action Input: The agent decides what action to take (aka what tool to use) and what the input to that tool should be
Observation: The output of the tool
Repeat steps 2-4 until the Agent “thinks” it is done

When discussing other implementations and frameworks we will compare them to this algorithm.

AutoGPT

Links:

What is novel about this project?

The main differences between the AutoGPT project and traditional LangChain agents can be attributed to different objectives. In AutoGPT, the goals are often more open ended and long running. This means that AutoGPT has a different AgentExecutor and different way of doing memory (both of which are more optimized for long running tasks). Previously, memory of agents in LangChain had two forms:

Memory of agent steps: this was done by keeping a list of intermediate agent steps relevant for that task, and passing the full list to the LLM calls
Memory of system: this remembered the final inputs and outputs (but forgot the intermediate agent steps)

Because AutoGPT is more long running, passing the full list of agent steps to the LLM call is no longer feasible. Instead, AutoGPT added a retrieval-based memory over the intermediate agent steps. Under the hood, this retrieval-based memory is doing doing semantic search over embeddings, using a VectorStore. Note that LangChain has this type of retrieval-based memory, but it was previously applied to user-agent interactions, not agent-tool interations.

How did we incorporate this into LangChain?

We added a version of this to langchain.experimental - a place where we are putting more experimental and newer code while we figure out the proper abstractions. Specifically, we’ve implemented the prompt templating logic used, as well as while loop used to run the agent. We’ve made it compatible with LangChain LLM wrappers, LangChain VectorStores, and LangChain tools.

We’ve also created this notebook showing how to use it.

BabyAGI

Links:

What is novel about this project?

The BabyAGI project differs from traditional LangChain Agents in the following regards:

Similar to AutoGPT, it applies retrieval-based memory to intermediate agent-tool steps.
It has separate planning and execution steps, where it plans a sequence of actions all at once (rather than just the next one)

Similar to AutoGPT, BabyAGI is designed for more long running tasks, which lead to both of these differences.

Let’s expand on the second point, since that is one of the more important and substantial differences. In the traditional LangChain Agent framework (and the AutoGPT framework), the agent thinks one step ahead at a time. For a given state of the world it think about what its next immediate action should be, and then does that action.

BabyAGI differs in that it explicitly plans out a sequence of actions. It then executes on the first one, and then uses the result of that to do another planning step and update it’s task list. Our intuition is that this enables it to execute better on more complex and involved tasks, by using the planning steps essentially as a state tracking system.. We’ve observed (anecdotally) that for tasks that require many steps, the traditional LangChain Agent can sometimes forget its original objective after a few steps, so planning all the steps ahead of time could be beneficial.

How did we incorporate this into LangChain?

Similar to AutoGPT, we added this to langchain.experimental. Specifically, we’ve implemented the prompt templating logic used, as well as while loop used to run the agent. We’ve made it compatible with LangChain LLM wrappers, LangChain vectorstores, and LangChain tools.

Camel

Links:

What is novel about this project?

The main novelty in this project comes from taking two agents, each with their own personality, and having them chat with each other. In this sense there are two novel components: the idea of having two agents interact with each other in a collaborative manner, and the specific simulation environment.

The idea of two agents interacting is not entirely new. Given the modular nature of LangChain, we have long been proponents of having agents use other agents as tools. However, what is novel about this type of interaction is that the two agents are poised as equals - in previous LangChain implementations there has always been one agent which calls the other as a tool, in a "stacking" approach. This idea of putting both agents on equal footing, rather than having one use the other as a tool strikes a chord of being particularly interesting to see evolving behavior emerge.

Note that these agents can have different tools available to them and could be specialized around that. For example, you could have one agent that is armed with tools needed for coding, another with tools needed for interacting with linear, etc. So it is still possible to achieve a "stacking" effect (where you have different agents responsible for different things).

The second novel component was the particular simulation environment. This is a two sided conversation, and is not terribly complex but still the first implementation of this in a research setting we have seen.

How did we incorporate this into LangChain?

We added a notebook, largely reflecting the simulation environment (having two agents chatting with each other). We may look into making this simulation environment more available off-the-shelf in the future.

Generative Agents

Links:

What is novel about this project?

There are two novel (and fairly complex) aspects to this project. The first is the simulation environment, which consists of 25 different agents. This seems fairly specific, and very complex, so we did not dive into this too much. The other aspect that is novel is the long-term memory they created for these agents.

We did a deep dive on this earlier this week. The agents’ memory is made up of:

Importance reflection steps, to give each observation an importance score. This score can be used in retrieval down the line to fetch particularly important memories and ignore basic ones
Reflection steps, to “pause” and think about what generalizations the agent has learned. These reflections can then be retrieved alongside normal memories. This reflection step can serve to condense information and observe patterns in recent memories
A retriever that combines recency, relevancy to the situation, and importance. This can allow for surfacing of memories that similar to the situation at hand, happened a short while ago, and particularly important. All of these seem to be attributes that naturally reflect how we as human “retrieve” memories

All of these memory components are fairly novel, and extremely exciting to us.

How did we incorporate this into LangChain?

The retriever logic seemed generalizable so we added it as a TimeWeightedVectorStoreRetriever.

We added a notebook showing off how to use the reflection steps + the new retriever to replicate part of the setup the paper described.

The simulation environment seemed complex and not super generalizable so we did not do any thing there.

Conclusion

All of these projects rightfully garnered a lot of attention. We view them as two separate categories:

Autonomous Agents, which have improved planning abilities
Agent Simulations, which have novel simulation environments and complex, evolving memory

We’re excited to have started implementing parts of these projects in the LangChain ecosystem, and look forward to seeing how the community uses these, adds to these, and combines these 🙂

Background

AutoGPT

BabyAGI

Camel

Generative Agents

Conclusion

Tags

Join our newsletter