Expand benchmarks section in README.md by Homy-Xu · Pull Request #2 · OpenDataBox/awesome-agent-memory

Homy-Xu · 2026-07-05T03:28:49Z

Added new benchmarks and metrics for evaluating agent memory systems, including accuracy, recall, robustness, and efficiency. Updated references and added descriptions for various benchmarks.

Beliefuture · 2026-07-05T03:31:33Z

-   Junhao Zheng, Xidi Cai, Qiuke Li, et al. *arXiv 2025*. [[Paper](https://arxiv.org/abs/2505.11942)]
-   - Evaluates sequential procedural skill transfer across structurally related database, operating system, and knowledge graph tasks (1,396 tasks sharing atomic skills)
+2. **LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks** ![](https://img.shields.io/badge/-503_QA-lightgrey) ![](https://img.shields.io/badge/-single_doc_QA-blue) ![](https://img.shields.io/badge/-multi_doc_QA-blue) ![](https://img.shields.io/badge/-long_context-purple) ![](https://img.shields.io/badge/-structured_data-orange)
+   *Yushi Bai, Shangqing Tu, Jiajie Zhang, et al. ACL 2025 / arXiv 2024.* [[Paper](https://arxiv.org/abs/2412.15204)] [[Dataset](https://huggingface.co/datasets/zai-org/LongBench-v2)] [[Github](https://github.com/THUDM/LongBench)]


Only retain ACL 2025 (drop "arXiv 2024" from the venue line).

Commit df76905: Expand benchmarks section in README.md

Beliefuture reviewed Jul 5, 2026

Homy-Xu wants to merge 1 commit into OpenDataBox:main from Homy-Xu:patch-1