Open Deep Wide Research: A General-Purpose Agent Collaboration Architecture for Large-Scale Information Gathering

October 26, 2025 · Ollie @puppyone

Abstract

A novel AI research paradigm automates high-breadth information gathering tasks (such as horizontal research across hundreds of entities) by assigning a dedicated cloud virtual machine to each user session, within which multiple general-purpose agents execute subtasks in parallel. This architecture relies on a Turing-complete execution environment and a role-agnostic multi-agent collaboration mechanism, offering high flexibility. However, it still faces engineering challenges in latency control, resource scheduling, and cost predictability.

Problem Background

Traditional Retrieval-Augmented Generation (RAG) systems typically follow a linear flow: User Input → Retrieval → Generation. While effective for single-point Q&A, this design is significantly limited when faced with tasks requiring multi-round validation, structured comparison, or exploration across numerous heterogeneous sources (e.g., "Analyze the post-graduation career paths of PhDs from the computer science departments of the world's top 50 universities"). The main bottlenecks include:

  • Lack of proactive exploration and task decomposition capabilities in the retrieval phase.
  • Inability to dynamically plan or backtrack during the generation phase.
  • An overall process that is non-interruptible and non-extensible, making it difficult to support long-running tasks.

To overcome these limitations, the new generation of systems models large-scale research tasks as a distributed agent collaboration problem.

Method Overview

The core design is to assign a dedicated cloud virtual machine (VM) to each user session. This VM provides a full operating system, network access, and an execution environment, forming a Turing-complete sandbox. Within this sandbox, the system dynamically launches multiple sub-agents. Each is a fully functional, general-purpose instance (rather than having a predefined role like "Researcher" or "Validator") with the following capabilities:

  • Independently initiate HTTP requests or call external APIs.
  • Execute scripts to parse unstructured data from web pages, PDFs, tables, etc.
  • Call built-in toolchains (e.g., headless browsers, document extractors).
  • Exchange intermediate results with other sub-agents.
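The role-agnostic design above can be sketched as follows: every sub-agent draws on the same injected tool registry and writes intermediate results to a shared blackboard, so any agent can pick up any capability. The class, tool names, and blackboard keys are illustrative, not part of the described system.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class SubAgent:
    """A role-agnostic sub-agent: its capabilities come from an injected
    tool registry rather than a predefined role (hypothetical sketch)."""
    name: str
    tools: dict[str, Callable[..., Any]]
    blackboard: dict[str, Any]  # shared store for exchanging results

    def run(self, tool: str, *args: Any) -> Any:
        result = self.tools[tool](*args)
        # Publish the intermediate result so other sub-agents can consume it.
        self.blackboard[f"{self.name}:{tool}"] = result
        return result

# Any agent can use any tool; there is no "Researcher" vs. "Validator" split.
shared: dict[str, Any] = {}
tools = {"fetch": lambda url: f"<html from {url}>", "parse": str.upper}
a = SubAgent("agent-1", tools, shared)
b = SubAgent("agent-2", tools, shared)
a.run("fetch", "https://example.com")
b.run("parse", shared["agent-1:fetch"])  # b consumes a's intermediate result
```

Because the agents differ only in which tools they happen to invoke, the same class serves every subtask in the plan.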

Task decomposition is dynamically generated by a main controller. For example, to "research the generative AI tool ecosystem," the system might automatically break it down into:

  1. Obtain a list of tools from multiple platforms (GitHub, Product Hunt, official aggregator pages).
  2. For each tool, concurrently scrape documentation, version history, and user reviews.
  3. Extract key metrics (e.g., open-source status, API support, pricing model).
  4. Align entities and output a structured comparison table.

Since all sub-agents share the same execution environment and possess general-purpose capabilities, the task logic is not constrained by predefined roles, significantly enhancing generalization.
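The four-step decomposition above can be sketched as a controller that emits a dependency-annotated subtask list: one discovery task per platform, one scrape task per discovered tool, then extraction and alignment steps that depend on everything before them. The `Subtask` structure and the `decompose` heuristic are hypothetical, minimal stand-ins for the dynamic planner.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    task_id: str
    action: str
    depends_on: list[str] = field(default_factory=list)

def decompose(goal: str, sources: list[str], tools: list[str]) -> list[Subtask]:
    """Hypothetical decomposition for the 'tool ecosystem' example."""
    # 1. One discovery task per platform.
    plan = [Subtask(f"list:{s}", f"fetch tool list from {s}") for s in sources]
    # 2. Concurrent scraping, gated on discovery.
    plan += [Subtask(f"scrape:{t}", f"scrape docs/reviews for {t}",
                     depends_on=[f"list:{s}" for s in sources]) for t in tools]
    # 3. Metric extraction, gated on all scrapes.
    plan.append(Subtask("extract", "extract key metrics",
                        depends_on=[f"scrape:{t}" for t in tools]))
    # 4. Entity alignment and the final comparison table.
    plan.append(Subtask("align", "entity alignment + comparison table",
                        depends_on=["extract"]))
    return plan

plan = decompose("generative AI tool ecosystem",
                 sources=["GitHub", "Product Hunt"], tools=["toolA", "toolB"])
```

In the real system the plan would be generated by the main controller at run time; the point here is only the shape of the output: a flat list of subtasks whose dependencies form a DAG.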

Key Technical Details

1. Virtual Machines as Execution Units

  • Each session has exclusive use of a lightweight Linux VM (possibly based on micro-virtualization technology like Firecracker).
  • Pre-installed with common runtimes (Python, Node.js), parsing libraries (BeautifulSoup, PyPDF2), and browser automation tools.
  • Network egress is rotated through a proxy pool to reduce the risk of being blocked by anti-scraping measures.
  • All operations are performed in an isolated environment, ensuring security and data boundaries.
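The egress-rotation point can be illustrated with a minimal round-robin proxy pool. The class and proxy URLs are invented for illustration; a production pool would also track proxy health and ban rates.

```python
import itertools

class ProxyPool:
    """Round-robin selection of an egress proxy per outbound request
    (minimal sketch; health checks and ban tracking omitted)."""
    def __init__(self, proxies: list[str]):
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._cycle)

pool = ProxyPool(["http://p1:8080", "http://p2:8080"])
# With the `requests` library one would pass
# proxies={"https": pool.next_proxy()} on each call.
picks = [pool.next_proxy() for _ in range(3)]
```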

2. Multi-Agent Communication and Scheduling

  • Sub-agents exchange data via shared memory or a lightweight message broker (like Redis Pub/Sub).
  • Intermediate results are persisted in a structured format (e.g., JSON or JSON-LD) to facilitate subsequent aggregation and validation.
  • The main controller maintains a task dependency graph (DAG), supporting dynamic scheduling, failure retries, and result caching.
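The controller behavior described above can be sketched as a tiny DAG executor with failure retries and result caching. This is a sequential sketch under simplifying assumptions (the real controller would run ready tasks in parallel); the task names and lambdas are illustrative.

```python
from typing import Callable

def run_dag(tasks: dict[str, tuple[list[str], Callable]],
            max_retries: int = 2) -> dict[str, object]:
    """Execute a task DAG: each entry maps task id -> (dependencies, fn).
    Failed tasks are retried; results are cached so reruns are free."""
    cache: dict[str, object] = {}

    def run(tid: str) -> object:
        if tid in cache:                      # result caching
            return cache[tid]
        deps, fn = tasks[tid]
        inputs = [run(d) for d in deps]       # resolve dependencies first
        for attempt in range(max_retries + 1):
            try:
                cache[tid] = fn(*inputs)
                break
            except Exception:
                if attempt == max_retries:    # retries exhausted
                    raise
        return cache[tid]

    for tid in tasks:
        run(tid)
    return cache

results = run_dag({
    "discover": ([], lambda: ["acme", "globex"]),
    "collect": (["discover"], lambda names: {n: f"data({n})" for n in names}),
    "report": (["collect"], lambda data: len(data)),
})
```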

3. Data Processing Pipeline

Take "Fortune 500 company analysis" as an example:

  • Discovery Phase: Call search engines or public databases to get a list of companies.
  • Collection Phase: Each sub-agent is responsible for several companies, scraping official websites, annual report PDFs, and press releases.
  • Parsing Phase: Use rule-based matching, OCR, or multimodal models to extract key fields (e.g., revenue, employee count, CEO).
  • Alignment Phase: Perform entity resolution based on a unified identifier (like a stock ticker) to build a standardized knowledge table.

This process is highly I/O-intensive, placing high demands on the VM's concurrent processing capabilities and network bandwidth.
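The alignment phase can be sketched as a merge keyed on the shared identifier: records for the same ticker from different sources collapse into one row of the knowledge table. The source data below is illustrative, and the first-value-wins merge rule is one simple choice among several.

```python
def align_by_ticker(sources: list[list[dict]]) -> dict[str, dict]:
    """Entity resolution on a unified identifier (stock ticker):
    records sharing a ticker are merged into one standardized row;
    later sources fill in fields earlier ones lack."""
    table: dict[str, dict] = {}
    for records in sources:
        for rec in records:
            row = table.setdefault(rec["ticker"], {})
            for key, value in rec.items():
                row.setdefault(key, value)  # first value seen wins
    return table

# Illustrative partial records from two collection channels.
website = [{"ticker": "AAPL", "ceo": "Tim Cook"}]
annual_report = [{"ticker": "AAPL", "revenue_usd_b": 383},
                 {"ticker": "MSFT", "revenue_usd_b": 212}]
table = align_by_ticker([website, annual_report])
```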

Limitations and Scalability Challenges

Current Limitations

  • Uncontrollable Response Time: Task completion time is determined by the slowest subtask, with no mechanism for timeouts, circuit breaking, or returning partial results.
  • Non-Transparent Resource Costs: No resource consumption model is provided based on task scale, making it difficult for users to predict expenses.
  • Single-Node Scaling Bottleneck: All sub-agents run on the same VM, and contention for CPU/memory can lead to performance jitter.
  • Strong Dependency on the Public Internet: Cannot directly access private knowledge bases or internal data sources.
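The first limitation, completion time pinned to the slowest subtask, is exactly what a deadline-plus-partial-results mechanism would address. A minimal sketch with `asyncio` (task names and delays are invented for illustration) shows the idea: wait up to a deadline, cancel stragglers, and return whatever finished.

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)          # stand-in for a slow scrape
    return f"result:{name}"

async def run_with_deadline(timeout: float) -> dict[str, str]:
    """Apply a global deadline and return completed results only,
    instead of blocking on the slowest subtask."""
    tasks = {name: asyncio.create_task(fetch(name, delay))
             for name, delay in [("fast", 0.01), ("slow", 5.0)]}
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)
    for t in pending:
        t.cancel()                      # circuit-break the stragglers
    return {name: t.result() for name, t in tasks.items() if t in done}

partial = asyncio.run(run_with_deadline(timeout=0.1))
```

The same pattern generalizes to circuit breaking: a subtask that repeatedly times out can be dropped from the plan rather than retried forever.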

Large-Scale Deployment Challenges

  • Cold-Start Latency: VM creation and initialization typically take anywhere from a few seconds to tens of seconds, degrading the user experience.
  • Concurrent Scheduling Overhead: When a large number of subtasks run simultaneously, process management and communication can become bottlenecks.
  • Storage Costs: If intermediate results are not cleaned up promptly, a large amount of temporary data will accumulate.
  • Security and Compliance: A sandbox that dynamically executes arbitrary code requires strict auditing, especially in enterprise environments.

Improvement Directions

  • Introduce depth-breadth control parameters: Allow users to explicitly limit the maximum parallelism (breadth) and number of reasoning steps (depth).
  • Adopt a layered execution strategy: Prioritize high-value subtasks, while low-priority tasks can be downgraded or skipped.
  • Support hybrid data source access: Combine public web scraping with private vector database retrieval.
  • Provide a cost estimation API: Predict resource consumption for the current configuration based on historical task statistics.
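The first and last directions above can be combined into a single user-facing configuration object: explicit breadth and depth limits plus a coarse cost estimate derived from them. Everything here is hypothetical, including the field names and the per-step cost figure, which stands in for a value learned from historical task statistics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResearchBudget:
    """Hypothetical user-facing knobs: breadth caps parallel subtasks,
    depth caps reasoning/tool steps per subtask."""
    breadth: int                       # max concurrent subtasks
    depth: int                         # max steps per subtask
    cost_per_step_usd: float = 0.002   # illustrative historical average

    def max_cost(self, n_entities: int) -> float:
        """Worst-case spend for a horizontal task over n entities."""
        return n_entities * self.depth * self.cost_per_step_usd

budget = ResearchBudget(breadth=8, depth=20)
estimate = budget.max_cost(n_entities=500)
```

A cost-estimation API would return such an estimate before the task runs, letting the user shrink breadth or depth until the predicted spend is acceptable.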

If you are looking for a production-ready, self-hostable Agentic RAG solution with fine-grained control, puppyone offers an out-of-the-box implementation path. Built on the MCP protocol, puppyone supports dynamic adjustment of depth and breadth, multi-model backend switching, and seamless integration with private knowledge bases, making it suitable for a variety of scenarios from customer service Q&A to enterprise-level intelligent analysis. Visit https://www.puppyone.ai/ to learn how to deploy your own controllable research agent in minutes.

FAQ

Q1: What is the fundamental difference between this architecture and traditional multi-agent systems?
A: Traditional systems rely on predefined roles (e.g., "Planner," "Executor"), whereas in this architecture, all sub-agents are general-purpose instances that can autonomously decide their course of action. This makes the task structure more flexible and enhances generalization capabilities.

Q2: Can a similar system be deployed on-premises or in a private cloud?
A: Yes, but you would need to handle virtualization scheduling, network proxying, sandbox security, and task coordination yourself. A lightweight alternative is to use containers (like Docker) instead of full VMs and implement agent communication via a message queue.

Q3: What are the main performance bottlenecks in high-concurrency scenarios?
A: The main bottlenecks include VM cold-start latency, the throughput of the subtask scheduler, and the serialization overhead of inter-agent communication. Optimization techniques include using a pre-warmed pool, asynchronous task queues, and caching/reusing intermediate results.
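The pre-warmed pool mentioned in the answer can be sketched as follows: `acquire` hands out an already-booted instance when one is available, avoiding the cold-start wait, and the pool is refilled afterward. The class is a hypothetical sketch; refilling here is synchronous for clarity, where a real pool would refill in the background.

```python
import collections
from typing import Callable

class WarmPool:
    """Pre-warmed VM pool: serve boot requests from warm capacity
    when possible, falling back to a cold start when drained."""
    def __init__(self, boot: Callable[[], object], size: int):
        self._boot = boot                      # cold-start function
        self._warm = collections.deque(boot() for _ in range(size))

    def acquire(self) -> object:
        if self._warm:
            vm = self._warm.popleft()          # warm path: no boot wait
            self._warm.append(self._boot())    # refill (would be async)
            return vm
        return self._boot()                    # fallback: cold start

boots: list[int] = []
def boot() -> str:
    boots.append(1)                            # count cold starts
    return f"vm-{len(boots)}"

pool = WarmPool(boot, size=2)
first = pool.acquire()   # served from the warm pool, not a fresh boot
```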