NEWS · March 2025

Leading the World: BIGAI's TongAgents Tops Multiple International Agent Benchmarks

The Beijing Institute for General Artificial Intelligence (BIGAI) has announced that its proprietary TongAgents system has achieved breakthrough results across multiple international agent benchmarks, demonstrating the system's robust generalization capabilities and engineering reliability in complex, real-world task scenarios.

Benchmark Results

Top Rankings Across Four International Benchmarks

Terminal-Bench 2.0 — Terminal Environment Engineering Execution

In engineering execution and coding tasks within terminal environments, TongAgents ranked 2nd globally overall and 1st globally among comparable model-based systems.

AssistantBench — Long-Horizon Web Interaction

In long-horizon web interaction tasks involving customer service instructions, TongAgents achieved a global 1st place ranking.

TauBench2 — Multi-Turn Dialogue and Tool Use

In rule-constrained multi-turn dialogue and tool invocation tasks, TongAgents ranked 1st globally across the retail, airline, and telecommunications domains combined.

Mind2Web2 — Deep Research and Information Retrieval

In deep research and information retrieval tasks on the open web, TongAgents placed in the global top 3.

Figure 1. Terminal-Bench 2.0: Evaluation results for engineering execution and coding tasks in terminal environments

Figure 2. AssistantBench: Evaluation results for long-horizon web interaction tasks with customer service instructions

Figure 3. TauBench2: Evaluation results for rule-constrained multi-turn dialogue and tool invocation tasks

Figure 4. Mind2Web2: Evaluation results for deep research and information retrieval tasks on the open web

System Design

Hierarchical Cognitive Architecture for Multi-Agent Collaboration

TongAgents decouples task planning, execution, and verification into three collaborative layers that form a closed loop, ensuring the system consistently progresses toward its objectives.

Planning Hub — Planner

Responsible for decomposing tasks and formulating adaptive plans. Unlike static planners, the TongAgents Planner features real-time feedback regulation — dynamically adjusting its plan queue and downstream strategy based on Executor reports. This design, which shields the Planner from low-level execution details, enables it to maintain strategic direction across complex, long-horizon tasks without losing focus after dozens of steps.
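One way to picture this feedback regulation is a plan queue that the Planner edits whenever an Executor reports back. The sketch below is illustrative only: the Report and Planner classes, the run helper, and the "diagnose, then retry" policy are assumptions made for this example, not TongAgents' actual interfaces.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Report:
    """Summary an Executor hands back (hypothetical shape)."""
    subtask: str
    ok: bool
    note: str = ""


@dataclass
class Planner:
    """Holds a queue of pending sub-tasks and revises it from reports."""
    queue: deque = field(default_factory=deque)
    done: list = field(default_factory=list)

    def plan(self, subtasks):
        self.queue = deque(subtasks)

    def next_subtask(self):
        return self.queue.popleft() if self.queue else None

    def on_report(self, report):
        if report.ok:
            self.done.append(report.subtask)
        else:
            # Assumed policy: on failure, put a diagnostic step and a retry
            # at the front of the queue, ahead of the remaining plan.
            self.queue.appendleft(f"retry: {report.subtask}")
            self.queue.appendleft(f"diagnose: {report.subtask}")


def run(planner, execute, max_steps=20):
    """Drive the loop; `execute` maps a sub-task string to a Report."""
    steps = 0
    while steps < max_steps and (task := planner.next_subtask()) is not None:
        planner.on_report(execute(task))
        steps += 1
    return planner.done


# Usage: a fake Executor whose "build" sub-task fails the first time.
failed_once = set()


def fake_execute(task):
    if task == "build" and task not in failed_once:
        failed_once.add(task)
        return Report(task, ok=False, note="compiler error")
    return Report(task, ok=True)


planner = Planner()
planner.plan(["fetch sources", "build", "run tests"])
completed = run(planner, fake_execute)
```

Note that the Planner here sees only the short Report, never the Executor's raw tool output, which mirrors the separation from low-level execution details described above.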

Execution Engine — Executor

Focused on completing sub-tasks assigned by the Planner. Each sub-task is handled by a dedicated Executor whose lifecycle consists of three phases: execution, reporting, and Q&A. Executors are equipped with command execution, multimodal LLM invocation, REPL-style interactive terminals, and other environment-aware tools. The system supports parallel tool invocation, streaming segmented output for long-running tools, and asynchronous completion notifications — significantly reducing interaction rounds. Executors can also query other agents in the team, enabling cross-agent experience reuse.
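Parallel tool invocation can be sketched with Python's standard asyncio library. This is a minimal illustration, assuming tools are exposed as coroutines; run_tool and invoke_parallel are hypothetical names, and the sleep call merely stands in for real work.

```python
import asyncio


async def run_tool(name: str, delay: float) -> str:
    """Stand-in for a real tool call; it sleeps instead of doing work."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def invoke_parallel(calls):
    """Start every tool call at once and return results in request order."""
    tasks = [asyncio.create_task(run_tool(name, delay)) for name, delay in calls]
    return await asyncio.gather(*tasks)


# Three calls issued in a single round instead of three sequential rounds.
results = asyncio.run(
    invoke_parallel([("grep", 0.02), ("pytest", 0.03), ("ls", 0.01)])
)
```

Because the calls overlap, the round takes roughly as long as the slowest tool rather than the sum of all three, which is one plausible reading of how parallel invocation reduces interaction rounds.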

Acceptance Testing — Verifier

An independent black-box verification layer that does not rely on execution history. It examines results from multiple perspectives to identify potential issues, ensuring the accuracy and robustness of delivered outcomes.
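As a rough illustration of black-box verification, the sketch below judges a delivered result using only the task input and the output, with no access to how the result was produced. The toy task (return the sorted unique values of a list) and the names Check and verify are assumptions chosen for this example.

```python
from typing import Callable, NamedTuple


class Check(NamedTuple):
    name: str
    passed: bool


def verify(task_input: list[int], delivered: list[int]) -> list[Check]:
    """Judge the delivered result only; no execution log is consulted."""
    checks: list[tuple[str, Callable[[], bool]]] = [
        ("is sorted", lambda: all(a <= b for a, b in zip(delivered, delivered[1:]))),
        ("no duplicates", lambda: len(delivered) == len(set(delivered))),
        ("same values as input", lambda: set(delivered) == set(task_input)),
    ]
    return [Check(name, fn()) for name, fn in checks]


good = verify([3, 1, 3, 2], [1, 2, 3])
bad = verify([3, 1, 3, 2], [1, 3, 3])
failed = [c.name for c in bad if not c.passed]
```

Each check looks at the result from a different angle, so a single flawed output can fail more than one of them.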

Engineering Breakthroughs

Structured Context Management and Data-Driven Optimization

Structured Context Management

Executors handling different sub-tasks operate in isolated contexts. Because sub-tasks are inherently simpler than the overall task and each Executor is subject to hard limits on step count and token budget, its context size stays within predetermined bounds. The Planner–Executor layered design enables elastic scaling of the overall task context. Context is not fully shared among the Planner, Executor, and Verifier: only key information is retained, and agents fill in gaps through a Q&A mechanism as needed. This guards against the hallucinations and performance degradation that excessive context can cause.
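The idea of isolated, budgeted contexts can be sketched as follows. This is an illustration under stated assumptions: ExecutorContext and BudgetExceeded are hypothetical names, and tokens are approximated by a whitespace word count rather than a real tokenizer.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a message would break the step or token limit."""


@dataclass
class ExecutorContext:
    max_steps: int
    max_tokens: int
    messages: list = field(default_factory=list)
    steps: int = 0
    tokens: int = 0

    def add(self, message: str) -> None:
        cost = len(message.split())  # crude token estimate (assumption)
        if self.steps + 1 > self.max_steps or self.tokens + cost > self.max_tokens:
            raise BudgetExceeded(f"limit reached at step {self.steps}")
        self.messages.append(message)
        self.steps += 1
        self.tokens += cost

    def summary(self) -> str:
        """Only this short summary would cross over to the Planner."""
        return self.messages[-1] if self.messages else ""


# Two Executors, two separate contexts: nothing leaks from one to the other.
ctx_a = ExecutorContext(max_steps=3, max_tokens=10)
ctx_b = ExecutorContext(max_steps=3, max_tokens=10)

ctx_a.add("ls src")
ctx_a.add("found three python files")

try:
    ctx_a.add("a very long tool output that will not fit")
    overflowed = False
except BudgetExceeded:
    overflowed = True
```

In this sketch the oversized message is rejected outright; a real system might instead truncate or summarize it, a choice the article does not specify.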

Full-Chain Trajectory Tracking

The system records critical data at every step: token consumption, latency, tool parameters, and return values, preserving complete trajectories. This design satisfies scientific reproducibility requirements, enables data-driven iterative optimization of agents, and provides robust support for post-mortem analysis and fault diagnosis.
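A minimal sketch of per-step trajectory recording is shown below. The Step and Trajectory classes and the JSON Lines export are assumptions for illustration; token counts are passed in by the caller because the article does not say how they are measured.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    tool: str
    params: dict
    result: str
    tokens: int
    latency_s: float


@dataclass
class Trajectory:
    steps: list = field(default_factory=list)

    def record(self, tool, params, fn, tokens=0):
        """Call fn(**params), timing it and logging inputs and output."""
        start = time.perf_counter()
        result = fn(**params)
        latency = time.perf_counter() - start
        self.steps.append(Step(tool, params, repr(result), tokens, latency))
        return result

    def total_tokens(self):
        return sum(s.tokens for s in self.steps)

    def to_jsonl(self):
        """One JSON object per step, suitable for replay or later analysis."""
        return "\n".join(json.dumps(asdict(s)) for s in self.steps)


traj = Trajectory()
total = traj.record("add", {"a": 2, "b": 3}, lambda a, b: a + b, tokens=12)
traj.record("upper", {"text": "ok"}, lambda text: text.upper(), tokens=8)
lines = traj.to_jsonl().splitlines()
```

Keeping every step as a flat record makes it straightforward to aggregate cost (total_tokens) or to inspect a single failing call during post-mortem analysis.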

Comprehensive Perception and Environment Interaction

Beyond standard file I/O and command execution, the system equips agents with multimodal understanding and deep retrieval capabilities. TongAgents integrates deep search and structured extraction technologies, optimizing the parsing of dynamically loaded web content so that agents can "see" web pages as humans do — accurately capturing pop-ups, dynamic charts, and visually critical information. The system also supports clicking buttons, scrolling pages, filling forms, and performing spatial reasoning and navigation on maps.

Fault Tolerance and Self-Healing

In real terminal environments, errors and stalls are the norm. TongAgents implements a multi-layered fault tolerance framework featuring automatic background suspension on command timeout, streaming segmented output, and asynchronous completion notifications — helping agents promptly detect and correct error states caused by internal or external factors.
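Background suspension on timeout can be approximated with Python's standard subprocess module: when a command exceeds its time limit, the process is left running and its handle is returned so the result can be collected later. The function names and the dictionary layout below are assumptions for this sketch, not TongAgents' implementation.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run a command; on timeout, leave it running in the background."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return {"status": "finished", "returncode": proc.returncode,
                "output": out, "proc": None}
    except subprocess.TimeoutExpired:
        # The child is not killed; keep the handle so it can be collected later.
        return {"status": "backgrounded", "returncode": None,
                "output": "", "proc": proc}


def collect(job, timeout_s=None):
    """Wait for a backgrounded job and return its final output."""
    proc = job["proc"]
    out, _ = proc.communicate(timeout=timeout_s)
    return {"status": "finished", "returncode": proc.returncode,
            "output": out, "proc": None}


quick = run_with_timeout([sys.executable, "-c", "print('hi')"], timeout_s=10)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(1); print('late')"],
    timeout_s=0.2,
)
final = collect(slow, timeout_s=10)
```

The agent is free to do other work between run_with_timeout and collect, so a stalled command does not block the whole task.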

Significance

From General Benchmarks to Real-World Industry Tasks

These benchmark results mark only the beginning of TongAgents' journey toward real-world deployment. BIGAI will continue to advance agent technologies and drive their large-scale adoption across software engineering, industrial operations, scientific research, and other vertical domains.

About Us

TongAgents

TongAgents is a proprietary agent framework developed by the Beijing Institute for General Artificial Intelligence (BIGAI). It supports task planning, tool invocation, learning and reasoning, and multi-agent collaborative scheduling, providing a comprehensive standardized toolchain covering the full agent lifecycle — from design, training, and debugging through to production deployment.

The platform supports building and deploying agents across diverse modalities, lowering the barrier to entry for developers and enterprises with varying levels of technical expertise. TongAgents integrates BIGAI's value alignment and neuro-symbolic-logic fusion algorithms to build trustworthy, interpretable, and evolvable agents.

For government and enterprise clients, the TongAgents platform has been deployed across critical sectors including legal, finance, education, energy, and transportation, delivering measurable cost reductions and efficiency gains in real-world business scenarios — advancing agent technology from "usable" to "reliable."

Resources

TongAgents Website · Documentation