The Agents Newsletter #9 - Why Multi-hop Reasoning Is Harder To Get Right
Hello agent enthusiasts. Thanks for tuning into this 9th issue of The Agents Newsletter. If this is your first edition, I’d highly recommend reading the first two issues, which will give you some of the basics of what agents are and how they work. In today’s issue, we’re going to discuss why agents that require more than one reasoning step are harder to build.
Let’s start by reviewing how an autonomous agent reasons under the hood.
Step-based reasoning
If you recall, the underlying process an agent takes when responding to requests is actually quite simple - decide on a tool, execute the tool, analyze the results, and decide what to do next. In simpler agents, that last part, deciding what to do next, is pretty straightforward. It’s just a response to the user. If you’ve ever used ChatGPT with web browsing, you’ve already used a single-hop agent.
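To make that loop concrete, here’s a rough Python sketch of a single-hop agent. The llm callable and the tools dictionary are placeholders I’m assuming for illustration, not any particular framework or API.

```python
from typing import Callable

def single_hop_agent(user_request: str,
                     tools: dict[str, Callable[[str], str]],
                     llm: Callable[[str], str]) -> str:
    # Decide on a tool: ask the model to name one of the available tools.
    tool_name = llm(f"Pick one tool from {list(tools)} for: {user_request}").strip()
    # Execute the tool (a real agent would also have the model extract arguments).
    result = tools.get(tool_name, lambda _: "unknown tool")(user_request)
    # Analyze the result and respond to the user. No further iteration happens.
    return llm(f"Answer '{user_request}' using this tool output: {result}")
```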
More sophisticated agents, and agents that are tasked with more complicated workflows, may require running additional tools, synthesizing and analyzing data, or taking a number of other actions, all of which require them to iterate on the process above rather than simply responding to the user with the results. The demos you’ve seen from Claude and OpenAI’s “computer use” agents are examples of these sorts of multi-hop agents (though they don’t use “tools” in the same sense I’ve been describing; rather, the actions they take center on moving and clicking the mouse and typing input).
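By contrast, a multi-hop agent wraps that same decide-execute-analyze cycle in a loop and only stops when the model signals that it’s finished. Here’s a minimal sketch, again using placeholder helpers rather than any specific library:

```python
from typing import Callable

def multi_hop_agent(user_request: str,
                    tools: dict[str, Callable[[str], str]],
                    llm: Callable[[str], str],
                    max_steps: int = 10) -> str:
    # Keep a running scratchpad of everything the agent has done so far.
    scratchpad = f"User request: {user_request}"
    for _ in range(max_steps):  # hard cap so the loop can't run forever
        decision = llm(f"{scratchpad}\nNext action? Reply 'DONE: <answer>' "
                       f"or '<tool name>: <input>' using tools {list(tools)}")
        if decision.startswith("DONE:"):  # the model decides when it's finished
            return decision.removeprefix("DONE:").strip()
        tool_name, _, tool_input = decision.partition(":")
        result = tools.get(tool_name.strip(), lambda _: "unknown tool")(tool_input.strip())
        scratchpad += f"\nAction: {decision}\nResult: {result}"
    return "Stopped after hitting the step limit."
```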
While the difference between these two might seem minor, it leads to a host of complications that dramatically change the way multi-hop agents run under the hood. These agents require many more sub-components, which increases the complexity of the underlying code and slows down how quickly they can respond.
Here’s why.
Multi-hop reasoning complexity
There’s a lot more that needs to happen when an agent is responsible for executing a series of tasks, rather than a single task.
For one, it needs to know which tasks to execute and when it’s done executing, neither of which is well-defined, and both of which require a lot more reasoning. In contrast, a single-hop agent only needs to select the most likely task for the user’s request, run it once, and respond to the user, regardless of the results.
Additionally, many multi-hop reasoning agents implement planning, breaking down their execution into multiple steps and sub-steps. Again, single-hop agents don’t require this, since all they really need to do is select one function at a time.
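One common shape for that planning step, sketched with the same placeholder helpers as above: ask the model for a numbered plan up front, run each step as a tool call, then synthesize the results. This is just one possible structure, not a prescribed implementation.

```python
from typing import Callable

def plan_and_execute(user_request: str,
                     tools: dict[str, Callable[[str], str]],
                     llm: Callable[[str], str]) -> str:
    # Ask the model for a short numbered plan before taking any action.
    plan = llm(f"Break this request into a numbered list of steps, one tool "
               f"call per step, using tools {list(tools)}: {user_request}")
    results = []
    for step in plan.splitlines():
        if not step.strip():
            continue
        # Turn each planned step into a concrete tool call.
        decision = llm(f"Step: {step}\nWhich tool and input? Reply '<tool>: <input>'")
        tool_name, _, tool_input = decision.partition(":")
        results.append(tools.get(tool_name.strip(), lambda _: "skipped")(tool_input.strip()))
    # Synthesize the per-step results into a single answer for the user.
    return llm(f"Combine these step results into one answer for '{user_request}': {results}")
```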
Multi-hop agents can also go haywire, falling into infinite loops. While this happens less often today, given how far LLMs have come, it can still happen, so developers who create multi-hop agents need to implement checks like time-based timeouts or LLM fallbacks to account for these issues.
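A simple version of those checks, an iteration cap plus a wall-clock timeout wrapped around the reasoning loop, might look like this. It’s just a sketch; run_one_step is a stand-in for whatever one pass of your agent’s loop actually does.

```python
import time
from typing import Callable, Optional

def guarded_loop(run_one_step: Callable[[], Optional[str]],
                 max_steps: int = 15,
                 max_seconds: float = 60.0) -> str:
    # run_one_step performs one reasoning step and returns a final answer,
    # or None if the agent wants to keep going. The guards stop runaway loops.
    deadline = time.monotonic() + max_seconds
    for _ in range(max_steps):              # iteration cap
        if time.monotonic() > deadline:     # time-based timeout
            return "Stopped: time limit reached."
        answer = run_one_step()
        if answer is not None:
            return answer
    return "Stopped: step limit reached."
```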
Because of these, and other similar issues, multi-hop agent developers need to implement guardrails and checks (which I covered in my last issue). These checks can lead to increased response times and token counts, increasing the monetary cost of these solutions, which could also limit the number of customers that use them.
These issues will likely limit the use of multi-hop agents for the foreseeable future, while leading to increased adoption of micro-agents and single-hop agents within existing tools, or as the basis for new mini-SaaS apps.
Adoption
Many companies will likely implement single-hop agents into existing applications, adding agentic capabilities to specific parts of their applications that were previously complicated, required some small amount of reasoning, or couldn’t be fully automated. By doing so, they can make their existing applications more user-friendly and more automated, with less friction.
Separately, we’ll likely start to see new SaaS apps that don’t require full-fledged multi-hop reasoning and can function entirely on single-hop reasoning. Nobi’s AI shopping assistant is currently an example of this, though we may expand its capabilities in the future. We’ll likely see similar products created: ones that respond to a user by completing a single task, wait for the user’s next message, complete another single task, and so on. It’s very likely that “agentic” functionality will become a core building block of many applications.
I still strongly believe that, in the long run, we’ll all have general-purpose agents that can think and reason and act much like people can, but I also think we’re still a long ways off. For now, I’m excited to build agents in general (though, I will say, it’s been a lot easier and faster to build Nobi’s single-hop agent than Locusive’s multi-hop agent).
That’s all for today. If you’ve been finding these issues helpful, I’d love to hear from you. Even better, if you think they really suck, I’d also like to hear why. I’m a startup guy, so I’m used to hearing “constructive criticism” and using it to make things better. The worst thing, though, is shouting into the void, so if you’ve got something to say, let me hear it!
Finally, if someone forwarded you this issue and you’re interested in subscribing to this biweekly newsletter, feel free to subscribe here.
-Shanif