DIY Deep Research Agents

“Deep Research” is one of the breakout agentic use cases of 2025 (the other being Coding Assistants). All of the frontier labs have integrated these capabilities into their AI products. Deep Research systems are truly agentic in the sense that they plan, act and reflect upon their work independently until they determine that a task has been properly completed.

OpenAI describes its Deep Research agent as being trained to “plan and execute a multi-step trajectory to find the data it needs, backtracking and reacting to real-time information where necessary.” (openai.com)

Here, a “multi-step trajectory” means the model will take your query and internally figure out what to search for, which search results to read and what follow-up queries to ask. It iterates this process until it has gathered enough information. There’s a name for this pattern in agentic AI jargon: ReAct (“Reasoning and Acting”), which encapsulates the iterative Think → Act → Reflect → Think → … process.
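
To make that loop concrete, here is a minimal sketch of a ReAct-style loop in Python. The `llm` and `web_search` callables are hypothetical stand-ins for a model client and a search tool, not any particular vendor’s API:

```python
# Minimal ReAct-style loop: Think -> Act -> Reflect, repeated until done.
# `llm` and `web_search` are hypothetical stand-ins, not a vendor API.
def react_research(question: str, llm, web_search, max_steps: int = 10) -> str:
    notes: list[str] = []
    for _ in range(max_steps):
        # Think: decide the next action, given everything learned so far.
        decision = llm(
            f"Question: {question}\nNotes so far: {notes}\n"
            "Reply with either 'SEARCH: <query>' or 'ANSWER: <final answer>'."
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        # Act: run the search the model asked for.
        results = web_search(decision.removeprefix("SEARCH:").strip())
        # Reflect: record what the results add before the next iteration.
        notes.append(llm(f"Summarise what these results tell us about "
                         f"'{question}': {results}"))
    return llm(f"Give a best-effort answer to '{question}' from: {notes}")
```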

Claude’s Research mode goes a step further by using a multi-agent pattern. Here, a supervisor agent develops a research plan before spawning “sub-agents” (i.e. new calls to itself) to execute the plan in parallel. This is the key reason why Claude’s Research mode tends to report having scanned far more sources than the OpenAI or Gemini research modes do.
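
A crude way to picture the supervisor/sub-agent split is below; `llm` and `research` are again hypothetical stand-ins, and a real implementation would add shared context and deduplication of findings:

```python
from concurrent.futures import ThreadPoolExecutor

def supervised_research(question: str, llm, research) -> str:
    # Supervisor: decompose the question into independent sub-tasks.
    plan = llm(f"Break '{question}' into 3-5 independent research "
               "sub-tasks, one per line.").splitlines()
    # Sub-agents: each sub-task is a fresh agent call, run in parallel.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(research, plan))
    # Supervisor: synthesise the parallel findings into one report.
    return llm(f"Write a report answering '{question}' from: {findings}")
```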

So far this year, I’ve run well over a hundred deep research tasks and built three custom Deep Research agents for clients. I’ve now got a pretty good feel for the capabilities of the leading labs’ agents and have compared them in practice with my own (which are designed for narrow, specific workflows).

My conclusion: in the right circumstances, “rolling your own” agent - rather than using a generalist research agent - is the superior choice.

What does “rolling your own” involve? Well, first off, it’s not as onerous as it sounds. If you’re considering doing this, you likely have a repeatable research task in mind. That, in turn, means you probably understand it quite well. In such situations you don’t really need the agent to plan its research on your behalf: you can guide an LLM quite explicitly using your own methodology.

So, what’s the recipe?

DIY Method

A simple approach to building your own research agent is as follows:

  1. Write down your research workflow. (This is the hardest bit.)

  2. Decompose it into a sequence of sub-tasks and write good collateral (prompts, structured input/output templates) to help an LLM perform each step.

  3. Identify the “research tools” that the agent will need. “Web search” is an obvious one but there are many other potential sources of relevant content: databases, social media, news monitoring services, to name but a few.

  4. Using a framework like LangGraph as the “connective tissue” of your agent, code your workflow and deploy it. (A minimal sketch follows this list.)
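
As an illustration of Step 4, here is a minimal sketch assuming LangGraph’s `StateGraph` API, with placeholder node bodies standing in for the prompts and tools from Steps 2 and 3:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ResearchState(TypedDict):
    query: str
    sources: list[str]
    report: str

def gather(state: ResearchState) -> dict:
    # Step 3's "research tools" plug in here (web search, databases, ...).
    return {"sources": [f"result for {state['query']}"]}

def write_report(state: ResearchState) -> dict:
    # Step 2's prompts and output templates drive this LLM call.
    return {"report": f"Report based on {len(state['sources'])} sources"}

graph = StateGraph(ResearchState)
graph.add_node("gather", gather)
graph.add_node("write_report", write_report)
graph.add_edge(START, "gather")
graph.add_edge("gather", "write_report")
graph.add_edge("write_report", END)
app = graph.compile()

print(app.invoke({"query": "example topic", "sources": [], "report": ""}))
```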

Once Step 1 is done, the rest should take a single person a matter of weeks.

DIY Decision Factors

Still, several weeks of investment is more than nothing. Better to use something crafted by the experts at Anthropic, if you can. So when might you consider rolling your own research agent? Reflecting on my work this year, I’ve boiled the decision down to eight key factors:

Quantity of sources

Even Claude’s multi-agent Research mode will max out at a few hundred sources. Don’t forget that the model is querying a search index. Sometimes, one needs to run an awfully large number of searches and grab many pages of results to be assured of finding all the relevant information. I wrote one recently that typically needs to consider 3,000 sources each time it runs before it can reasonably conclude it has researched everything available. (Part of the challenge was that it was very difficult to specify search queries for this task, so the system had to depend heavily on successive iterations of filtering, reviewing and search generation.) Hand-crafting your retrieval, parsing and filtering process lets you set an upper bound on searches and source retrieval that meets your needs.
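
A sketch of what that hand-crafted loop can look like is below, with hypothetical `llm` and `search` callables (`search` is assumed to return dicts with a `title` field); the explicit limits are the point:

```python
def bounded_research(topic: str, llm, search, max_searches: int = 200,
                     max_sources: int = 3000) -> list[dict]:
    kept: list[dict] = []
    queue = [topic]
    searches_run = 0
    while queue and searches_run < max_searches and len(kept) < max_sources:
        results = search(queue.pop(0))
        searches_run += 1
        # Filter: keep only the results judged relevant.
        kept += [r for r in results
                 if llm(f"Is '{r['title']}' relevant to {topic}? yes/no") == "yes"]
        # Review: generate follow-up queries for the gaps that remain.
        queue += llm(f"Given {len(kept)} sources on {topic}, suggest "
                     "follow-up search queries, one per line.").splitlines()
    return kept
```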

Diversity of sources

What if your agent needs to look on social platforms, business systems, subscription services or online databases? Each of these can be thought of as a separate “search tool” to sit alongside the standard “web search tool” used by the research agent. You can get some coverage with a deep research agent (e.g. if a source you need is exposed as an MCP server, Claude Research can use it) - if not, you’ll need to set the “tools” up yourself, and that entails rolling your own agent.
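
Setting the tools up yourself can be as simple as putting each source behind a common interface; everything below is illustrative:

```python
# Hypothetical per-source "search tools" behind one interface. Each stub
# would wrap the relevant API, scraper or database client in a real agent.
def search_web(query: str) -> list[str]:
    return []  # placeholder: wrap a web-search API here

def search_news(query: str) -> list[str]:
    return []  # placeholder: wrap a news-monitoring service here

def search_internal_db(query: str) -> list[str]:
    return []  # placeholder: wrap a business system or database here

SEARCH_TOOLS = {"web": search_web, "news": search_news,
                "internal": search_internal_db}

def search_everywhere(query: str) -> dict[str, list[str]]:
    # Fan the same query out across every configured source.
    return {name: tool(query) for name, tool in SEARCH_TOOLS.items()}
```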

Structured Outputs

Sometimes you don’t want a research report. Sometimes what you want is a set of outputs that you can tease apart, store in a database, load into an application or send for further processing. Output specifications can get fairly complex. I’ve had some success getting o3 to produce something other than a standard research report, but you’re working against the agent’s post-training when you attempt this, and the results are unreliable. As for returning a fully-structured, nested schema of results - I’ve had no luck. Rolling your own research agent allows you to carefully construct a chain of steps that yields an arbitrarily complicated output structure, suitable for whatever downstream use you intend to make of the results.
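
For example, defining the nested schema up front (here with pydantic; the field names are invented for illustration) lets each step of the chain fill in its part, with validation catching anything malformed:

```python
from pydantic import BaseModel

class Source(BaseModel):
    url: str
    relevance: float  # 0..1 score assigned during filtering

class Topic(BaseModel):
    name: str
    summary: str
    sources: list[Source]

class ResearchOutput(BaseModel):
    query: str
    topics: list[Topic]

# Assembled from several LLM calls rather than one free-text report.
raw = {"query": "example", "topics": [
    {"name": "t1", "summary": "...",
     "sources": [{"url": "https://example.com", "relevance": 0.9}]}]}
validated = ResearchOutput.model_validate(raw)  # raises if the shape drifts
```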

Source Specificity

Some research agents I’ve built have a fixed list of sources from which to pull data. On one occasion, even the required language and locale settings had to be changed for different searches. Source lists can be quite large - one agent I built used a list of a couple of hundred. To ensure comprehensiveness, it is far more reliable to break the data acquisition and filtering down into small steps and run them in parallel. Relying on an LLM to guarantee that it searches every source on a long list will not work. Far better to do it yourself - and to have the benefit of logging output that tells you what succeeded and what did not.
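
A sketch of that fan-out, with a hypothetical `acquire` step and standard-library logging, so every source on the list is attempted and accounted for:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("acquisition")

def acquire(source: str) -> list[str]:
    return []  # placeholder: fetch-and-filter for a single source

def acquire_all(sources: list[str]) -> dict[str, list[str]]:
    results: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(acquire, s): s for s in sources}
        for future in as_completed(futures):
            src = futures[future]
            try:
                results[src] = future.result()
                log.info("ok: %s (%d items)", src, len(results[src]))
            except Exception as exc:
                log.warning("failed: %s (%s)", src, exc)
    return results  # every source was attempted; the log shows how each fared
```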

Traceability

Whilst we’re on the subject of logging: not every search that a research agent makes will succeed. Certain sites block AI models from reading and parsing their content. When you hand-roll a research agent, you can trace exactly which sources were reached, parsed and filtered. You can also develop failover procedures for when your primary parsing method fails. With a deep research model, it’s difficult to know which sources were blocked or which requests failed.
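
One way to get both the trace and the failover, sketched with hypothetical fetcher functions:

```python
# Try each fetch method in turn (e.g. direct HTTP, then a headless browser),
# recording every attempt so you can see exactly why a source was missed.
def fetch_with_failover(url: str, fetchers: list, trace: list[dict]) -> str | None:
    for fetcher in fetchers:
        try:
            text = fetcher(url)
            trace.append({"url": url, "method": fetcher.__name__, "ok": True})
            return text
        except Exception as exc:
            trace.append({"url": url, "method": fetcher.__name__,
                          "ok": False, "error": str(exc)})
    return None  # every method failed; the trace records why
```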

Quantification

I once had to create an agent that clustered results by topic and then counted the volume of sources in each topic as part of its output. The only reliable way to group, count and segment results once they number more than a few dozen is via code that you control.
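
The labelling itself can stay with the model while the arithmetic stays in your code; `assign_topic` below is a hypothetical LLM call that labels a single source:

```python
from collections import Counter

def quantify(sources: list[str], assign_topic) -> Counter:
    # One cheap LLM call per source, then exact counting in plain Python.
    return Counter(assign_topic(source) for source in sources)
```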

Fine-tuning

The big one. Something I repeatedly run into is the difficulty of getting an agent to determine which results are relevant to the task at hand and which are not - particularly if the agent is researching a fast-moving topic like current affairs. This is one area where I still fine-tune models, rather than relying only on prompt optimisation. Another reason to fine-tune is to train a model to mimic the style or focus of an information extraction task. Breaking a research task down so that sources are handled individually means you can typically use a smaller model, which, in turn, means that fine-tuning on human demonstrations is a viable option.
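
As a sketch of that last point: human relevance judgments can be turned into a fine-tuning file in the common JSONL chat format (the exact format varies by provider, so treat this as an assumption to check against your vendor’s docs):

```python
import json

# Hypothetical human demonstrations of the relevance judgment.
demonstrations = [
    {"source": "Headline about topic X", "label": "relevant"},
    {"source": "Unrelated press release", "label": "irrelevant"},
]

with open("train.jsonl", "w") as f:
    for demo in demonstrations:
        record = {"messages": [
            {"role": "system", "content": "Classify the source as relevant "
                                          "or irrelevant to topic X."},
            {"role": "user", "content": demo["source"]},
            {"role": "assistant", "content": demo["label"]},
        ]}
        f.write(json.dumps(record) + "\n")
```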

Bespoke Methodology

All research agents will build plans and then execute them. Most of this is hidden from you, the end user. (Claude Code is an example of an agent that will show you its plan progressing.) If your research task is replicating a complex, real-world business process, then you will likely (if you’ve done your business analysis properly) have a much better plan than the model will come up with. You can try to instruct it to follow your plan (I’ve had mixed results with this) but as your process becomes more complicated - and especially when it includes some kind of branching structure - you’re much better off rolling your own agent, where you can be sure it will faithfully follow the required methodology.
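
Extending the earlier LangGraph sketch, branching is encoded with conditional edges, so the routing rule is your code rather than the model’s hidden plan; the node names and rule below are illustrative:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    text: str
    doc_type: str
    result: str

def triage(state: State) -> dict:
    # Placeholder classifier; in a real agent this would be an LLM call.
    return {"doc_type": "filing" if "annual report" in state["text"] else "other"}

def route(state: State) -> str:
    return "financial" if state["doc_type"] == "filing" else "general"

def financial(state: State) -> dict:
    return {"result": "ran the financial-filing branch"}

def general(state: State) -> dict:
    return {"result": "ran the general branch"}

g = StateGraph(State)
g.add_node("triage", triage)
g.add_node("financial", financial)
g.add_node("general", general)
g.add_edge(START, "triage")
g.add_conditional_edges("triage", route,
                        {"financial": "financial", "general": "general"})
g.add_edge("financial", END)
g.add_edge("general", END)
app = g.compile()
```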

In conclusion

Of all the factors above, the last one, “bespoke methodology”, matters most. The golden rule with agents in 2025 seems to be: if you have deep knowledge of your task, replicate it within your agent. This will be far more successful than hoping the agent will reason its way to an effective approach.

Given the modest investment required to build a sophisticated agent application (think four weeks to prototype, four to eight more to productionise) and the tremendous time-savings they provide, bespoke deep research is an accessible use case for many organisations. Give it a go!
