Challenge requirements: ASSIGNMENT.md.
Std's channels have been replaced with tokio channels. The benchmark data with std channels has been kept for reference.
cargo r # runs the app (--release to run the release version of the binary)
cargo t # tests the app (--release to test the release version of the binary)
cargo bench # runs the criterion benchmark at `./benches`
./load/vegeta.sh # runs the load test with vegeta. It requires both vegeta (https://github.com/tsenart/vegeta/releases) and jq (https://jqlang.org/) to be installed. Configurable parameters can be found in the script
k6 run load/k6.js # runs the load test with k6. It requires k6 to be installed: https://github.com/grafana/k6/releases. Configurable parameters can be found in the script

This repo has three branches: main, skiplist, and arc_mutex. Each branch proposes a different implementation of the ordered data structure and the concurrency pattern. All use the axum crate to implement REST-compatible APIs and leverage tokio as the async runtime.
As there isn't much core code, I decided to keep it all in lib.rs, which also guarantees compatibility with criterion (the benchmarks link against the library target).
The ordered data structure has been implemented with BTreeSet or BinaryHeap, depending on the branch. Since the standard library's (std) BinaryHeap allows for duplicated items, the implementations using the BinaryHeap make use of a side HashSet to prevent insertions of duplicated transactions.
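As a rough illustration (the names here are my own, not necessarily the repo's), the pattern boils down to checking a `HashSet` of ids before pushing into the heap:

```rust
use std::collections::{BinaryHeap, HashSet};

/// Generic sketch: an ordered pool plus a side set of already-seen ids.
struct DedupHeap<T: Ord> {
    heap: BinaryHeap<T>,   // max-heap keeping items ordered by priority
    seen: HashSet<String>, // ids of transactions already in the heap
}

impl<T: Ord> DedupHeap<T> {
    fn new() -> Self {
        Self { heap: BinaryHeap::new(), seen: HashSet::new() }
    }

    /// Inserts only if `id` has not been seen before; returns whether it was added.
    fn insert(&mut self, id: String, item: T) -> bool {
        if self.seen.insert(id) {
            self.heap.push(item);
            true
        } else {
            false
        }
    }
}
```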
- `main` uses `tokio::sync::mpsc::channel` with the BinaryHeap + HashSet. `channel` allows for bounded concurrency (channel capacity set to 10,000). Multiple producers ingest incoming requests, but only one consumer modifies the data structure
- `arc_mutex` allows any incoming request to modify the data structure, but it uses std's `Arc` and `Mutex` to synchronize the access. The default implementation uses the BinaryHeap, but in the same branch there's a version using the BTreeSet (which can be swapped in `main.rs` by changing `AppState` to `AppStateBTreeSet`). Both versions live in the same branch to allow running and comparing the criterion benchmarks
- `skiplist` uses `SkipSet` from the external crate `crossbeam-skiplist`, a thread-safe version of std's BTreeSet. Every thread is allowed to modify the data structure thanks to the nature of the crate: no synchronization mechanism is required
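To illustrate that last point, a minimal standalone sketch (not the repo's code) of a `SkipSet` shared across tasks without any lock:

```rust
use std::sync::Arc;
use crossbeam_skiplist::SkipSet;

#[tokio::main]
async fn main() {
    // The SkipSet can be mutated concurrently through a shared reference:
    // no Mutex or other synchronization primitive is needed.
    let pool: Arc<SkipSet<u64>> = Arc::new(SkipSet::new());

    let handles: Vec<_> = (0..4u64)
        .map(|task_id| {
            let pool = Arc::clone(&pool);
            tokio::spawn(async move {
                for n in 0..1_000u64 {
                    pool.insert(task_id * 1_000 + n);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.await.unwrap();
    }
    println!("pool size: {}", pool.len());
}
```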
Transaction ordering is the same for all implementations: higher gas price first, then earlier timestamp. If both gas price and timestamp are equal between two transactions, the transaction id acts as a tie-breaker.

After exploring potential solutions based on the requirements, I took an iterative approach. I started with the one I was most familiar with (Arc Mutex) to have a working baseline, then progressed to less familiar options (SkipSet and mpsc), benchmarking each for proper evaluation.
I discarded RwLock immediately as this is a write-heavy scenario. Tokio's Mutex was not considered as it only adds overhead when the code doesn't need to hold locks across await points (sources: the tokio Mutex rust docs and the 100 exercises to learn Rust).
The BTreeSet contains `Reverse<Transaction>` instead of `Transaction` because it lives in the same branch as the BinaryHeap, which is a max-heap, so the ordering must be reversed. I benchmarked the version with and without `Reverse` and they perform on par, so no overhead is added there. In a real-world application with a single data structure there would be no need for `Reverse`.
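For concreteness, a sketch (type and field names are mine, not necessarily the repo's) of an ordering that satisfies the rule above, with the max-heap popping the best transaction first and the `Reverse` wrapper making the BTreeSet iterate in the same order:

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BTreeSet;

#[derive(PartialEq, Eq)]
struct Transaction {
    id: String,
    gas_price: u64,
    timestamp: u64, // e.g. milliseconds since epoch
}

impl Ord for Transaction {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher gas price wins; on a tie, the earlier timestamp wins;
        // on a further tie, the id acts as the tie-breaker (direction is arbitrary).
        self.gas_price
            .cmp(&other.gas_price)
            .then_with(|| other.timestamp.cmp(&self.timestamp))
            .then_with(|| other.id.cmp(&self.id))
    }
}

impl PartialOrd for Transaction {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// BinaryHeap<Transaction> pops the best transaction first (max-heap).
// For the BTreeSet living in the same branch, wrapping in Reverse makes
// iteration start from the best transaction as well.
type OrderedPool = BTreeSet<Reverse<Transaction>>;
```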
I then went for the SkipSet solution: as it doesn't require synchronized access, it is by far the easiest code to read and write.
I then went for a popular option in multithreaded scenarios: message passing. I chose std::mpsc bounded channels. This was my first time implementing it in Rust but, luckily, coming from Go's channels, I was already familiar with the idea.
I didn't try crossbeam's channels as I saw that the crate was copied into std's mpsc a couple of years ago. Although, according to the first comment of this Reddit post, the two libraries might have diverged since. It would be interesting to explore that too.
May 19 update
After a few days and some more reading, I noticed that I was blocking the entire thread in both the stats function and the consumer (one calls `std::thread::sleep` and the other calls `receiver.recv()`). I decided to try tokio's channels and also to replace `send` with `try_send` in the API endpoints (when the channel is full, the endpoint returns immediately with `try_send`). When tokio's channels are waiting to send/receive, instead of blocking the whole thread, the task's execution is suspended so another task can be executed.
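A minimal sketch of that pattern (a standalone example, not the repo's exact code): a bounded tokio channel with a single consumer task and a producer that uses `try_send` so it never waits on a full buffer.

```rust
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;

struct Transaction {
    id: String,
    gas_price: u64,
}

#[tokio::main]
async fn main() {
    // Bounded channel, capacity 10,000 as in the main branch.
    let (tx, mut rx) = mpsc::channel::<Transaction>(10_000);

    // Single consumer owning the ordered pool; recv().await suspends the
    // task instead of blocking the whole worker thread.
    let consumer = tokio::spawn(async move {
        while let Some(t) = rx.recv().await {
            // insert `t` into the ordered data structure here
            println!("ingesting tx {} (gas price {})", t.id, t.gas_price);
        }
    });

    // Producer side (what a submit endpoint would do): try_send returns
    // immediately when the buffer is full instead of waiting.
    let t = Transaction { id: "0xabc".into(), gas_price: 42 };
    match tx.try_send(t) {
        Ok(()) => println!("accepted"),
        Err(TrySendError::Full(_)) => println!("channel full, rejecting"),
        Err(TrySendError::Closed(_)) => println!("consumer gone"),
    }

    drop(tx); // closing all senders lets the consumer loop terminate
    consumer.await.unwrap();
}
```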
I also tried std channels with tokio's spawn_blocking, and performance was similar to tokio's channels.
Performance did improve significantly.
How well did the solutions perform? Check the Performance section.
Lives in the `data` folder. All data was gathered with the release version of the binary on a machine equipped with a Ryzen 9 7900 (12 cores/24 threads) and 16 GB of RAM, running in WSL Debian under Windows 11 24H2.
The benchmarks mainly focus on the submit-transaction scenario. I assume that in the real world there would be many users sending txs, while only the block builder removes transactions when it's time to build a new block.
The content of the files can be guessed from the names of the folders and the files themselves. The flamegraph plots had to be taken with a low vegeta load (50 req/s, 10 seconds) because flamegraph crashed every time it tried to generate the plot under a high load.
SUMMARY.md only shows text data of the criterion benchmarks. If you want to see the HTML reports, check the other folders.
Memory usage was tracked with bytehound under a high vegeta load (500k req/s, 10 seconds).
vegeta was used first for the end-to-end tests. As I was sometimes getting inconsistent results with it, I decided to also run the tests with k6; results with k6 were consistent. I am not sure whether the vegeta issue is due to WSL or to my machine specifically, but I couldn't test on other computers as I only have one.
While end-to-end benchmarks with k6 and vegeta roughly give similar results, the criterion benchmarks are clear: mpsc channels win. However, memory usage is higher during insertions, and when the load stops, the memory usage is still slightly higher than the rest of the solutions.
May 19 update
While k6's results are roughly similar across solutions, the vegeta and criterion numbers are clear: tokio's channels win. However, memory usage is the highest with tokio's channels, if only by a slight margin.
To my surprise, Arc Mutex with BinaryHeap + HashSet uses less memory than Arc Mutex with BTreeSet. I guess the internals of the BTreeSet add more overhead to handle duplicates.
- Async version using tokio -> axum
- Performance profiling output (e.g., with criterion.rs, flamegraph, or perf) -> criterion and flamegraph
- Visual output of transaction queue state over time (even just CLI stats) -> pool size printed every 3 seconds (see the sketch below)
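For reference, the kind of background task that prints such stats might look like this (a sketch with a hypothetical shared counter, not the repo's exact code):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

// Hypothetical shared counter standing in for the pool size; the real app
// would read the length of its ordered data structure instead.
async fn stats_task(pool_size: Arc<AtomicUsize>) {
    let mut interval = tokio::time::interval(Duration::from_secs(3));
    loop {
        // tick() suspends the task instead of blocking the worker thread
        // (unlike std::thread::sleep).
        interval.tick().await;
        println!("pool size: {}", pool_size.load(Ordering::Relaxed));
    }
}
```

It would be spawned once at startup, e.g. `tokio::spawn(stats_task(Arc::clone(&pool_size)))`.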
Here is a brief explanation of how I would implement the other two bonus points.
I think consistency is important here: if the block builder retrieves some transactions (txs) and, shortly after, higher-priority ones arrive in the mempool, then when it's time to delete txs, the block builder only wants to delete those that have actually been included in a block.
I considered three possible API shapes:

- `fetch(n: usize) -> Vec<Transaction>` and `drain(txs: Vec<Transaction>)` -> easiest solution for the mempool, least user-friendly for the block builder as it has to resubmit the entire array of txs
- `fetch(n: usize) -> Vec<Transaction>` and `drain(ids: Vec<String>)` -> a bit more user-friendly for the block builder but the worst solution in terms of time complexity, as the code has to scan the whole data structure
- `fetch(n: usize) -> Vec<Transaction>` and `drain(n: usize)` -> when the block builder calls `fetch`, the mempool would mark those txs as "to be deleted" via another HashSet. When the block builder then calls `drain`, the code would get the first transaction from the queue, check if its id exists in the HashSet and, if so, remove it from both data structures. The whole process repeats `n` times (see the sketch below).

I understand such a HashSet adds memory overhead, but I think it is acceptable: if an id is a 64-byte hash, then 1M "to-be-deleted" hashes only add 64 MB of memory usage.
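A sketch of that third option, under my own simplifying assumptions (a BinaryHeap ordered by gas price only, plain `String` ids):

```rust
use std::collections::{BinaryHeap, HashSet};

// Simplified transaction: ordered by gas price only for this sketch.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Transaction {
    gas_price: u64,
    id: String,
}

struct Mempool {
    queue: BinaryHeap<Transaction>,
    pending_delete: HashSet<String>, // ids marked by fetch(), awaiting drain()
}

impl Mempool {
    /// Returns the up-to-n best transactions and marks their ids as "to be deleted".
    fn fetch(&mut self, n: usize) -> Vec<Transaction> {
        let mut best = Vec::with_capacity(n);
        for _ in 0..n {
            match self.queue.pop() {
                Some(tx) => best.push(tx),
                None => break,
            }
        }
        // Push them back: they stay in the pool until drain() confirms the deletion.
        for tx in &best {
            self.pending_delete.insert(tx.id.clone());
            self.queue.push(tx.clone());
        }
        best
    }

    /// Removes up to n transactions from the top of the queue, but only those
    /// previously marked by fetch().
    fn drain(&mut self, n: usize) {
        for _ in 0..n {
            // Check the id of the best transaction without removing it yet.
            let is_marked = match self.queue.peek() {
                Some(tx) => self.pending_delete.contains(&tx.id),
                None => false,
            };
            if !is_marked {
                // Queue is empty or its top is a newer, unmarked tx: stop here;
                // any remaining marked ids become stale entries.
                break;
            }
            if let Some(tx) = self.queue.pop() {
                self.pending_delete.remove(&tx.id);
            }
        }
    }
}
```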
If new higher-priority txs come in, some low-priority ones might become stale in the HashSet. In such a scenario, the HashSet might be replaced with a BTreeSet, with txs ordered by time-to-live and a background task periodically evicting expired entries. This is something that Redis does in a more sophisticated way with its "active expiration policy" (if you're curious, check my implementation here).
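A hedged sketch of that idea (again with names of my own): pending deletions keyed by an expiry instant, with a background tokio task evicting whatever has expired.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Pending deletions keyed by expiry time; the String is the transaction id.
type PendingDeletes = Arc<Mutex<BTreeSet<(Instant, String)>>>;

async fn evict_stale(pending: PendingDeletes) {
    let mut interval = tokio::time::interval(Duration::from_secs(1));
    loop {
        interval.tick().await;
        let now = Instant::now();
        let mut set = pending.lock().unwrap();
        // BTreeSet iterates in ascending order, so expired entries sit at the front.
        while let Some(first) = set.first().cloned() {
            if first.0 <= now {
                set.remove(&first);
            } else {
                break;
            }
        }
        // The lock guard is dropped here, before the next await point.
    }
}
```

It would be spawned once at startup with `tokio::spawn(evict_stale(pending.clone()))`.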
- feature flags choosing the desired function
- macro running before the server and generating the implementation code depending on a configuration parameter
- Sustained-load benchmark with requests sent over the network
- Mixed benchmark of concurrent submit and drain
- `log` is more appropriate in a production environment
- More pool stats? e.g. real-time insertions and deletions, but in that case I'd definitely add the log crate and hide those logs behind a trace or debug level
- Explore other data structures and concurrency options: AVL tree, crossbeam::bounded, crossbeam-queue, mpmc with SkipSet or Arc Mutex BTreeSet, actix-web, sharded skiplist/binary heap
- Definitely a configurable (and maybe dynamic) channel capacity, as well as a configurable stats-printing interval