Challenge requirements: ASSIGNMENT.md.
Std's channels have been replaced with tokio channels. The benchmark data with std channels has been kept for reference.
cargo r # runs the app (--release to run the release version of the binary)
cargo t # tests the app (--release to test the release version of the binary)
cargo bench # runs the criterion benchmark at `./benches`
./load/vegeta.sh # runs the load test with vegeta. It requires both vegeta (https://github.com/tsenart/vegeta/releases) and jq (https://jqlang.org/) to be installed. Configurable parameters can be found in the script
k6 run load/k6.js # runs the load test with k6. It requires k6 to be installed: https://github.com/grafana/k6/releases. Configurable parameters can be found in the script

This repo has three branches: main, skiplist, and arc_mutex. Each branch proposes a different implementation of the ordered data structure and the concurrency pattern. All use the axum crate to implement REST-compatible APIs and leverage tokio as the async runtime.
As there isn't much core code, I decided to keep it all in lib.rs, which also guarantees compatibility with criterion (the benchmarks link against the library target).
The ordered data structure has been implemented with BTreeSet or BinaryHeap, depending on the branch. Since the standard library's (std) BinaryHeap allows for duplicated items, the implementations using the BinaryHeap make use of a side HashSet to prevent insertions of duplicated transactions.
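As a rough illustration (the names here are my own, not necessarily the repo's), the pattern boils down to checking a `HashSet` of ids before pushing into the heap:

```rust
use std::collections::{BinaryHeap, HashSet};

/// Generic sketch: an ordered pool plus a side set of already-seen ids.
struct DedupHeap<T: Ord> {
    heap: BinaryHeap<T>,   // max-heap keeping items ordered by priority
    seen: HashSet<String>, // ids of transactions already in the heap
}

impl<T: Ord> DedupHeap<T> {
    fn new() -> Self {
        Self { heap: BinaryHeap::new(), seen: HashSet::new() }
    }

    /// Inserts only if `id` has not been seen before; returns whether it was added.
    fn insert(&mut self, id: String, item: T) -> bool {
        if self.seen.insert(id) {
            self.heap.push(item);
            true
        } else {
            false
        }
    }
}
```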
- `main` uses `tokio::sync::mpsc::channel` with the BinaryHeap + HashSet. `channel` allows for bounded concurrency (channel capacity set to 10,000). Multiple producers ingest incoming requests, but only one consumer modifies the data structure
- `arc_mutex` allows any incoming request to modify the data structure, but it uses std's `Arc` and `Mutex` to synchronize the access. The default implementation uses the BinaryHeap, but in the same branch there's a version using the BTreeSet (which can be swapped in `main.rs` by changing `AppState` to `AppStateBTreeSet`). Both versions live in the same branch to allow running and comparing the criterion benchmarks
- `skiplist` uses `SkipSet` from the external crate `crossbeam-skiplist`, a thread-safe version of std's BTreeSet. Every thread is allowed to modify the data structure thanks to the nature of the crate: no synchronization mechanism is required
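To illustrate that last point, a minimal standalone sketch (not the repo's code) of a `SkipSet` shared across tasks without any lock:

```rust
use std::sync::Arc;
use crossbeam_skiplist::SkipSet;

#[tokio::main]
async fn main() {
    // The SkipSet can be mutated concurrently through a shared reference:
    // no Mutex or other synchronization primitive is needed.
    let pool: Arc<SkipSet<u64>> = Arc::new(SkipSet::new());

    let handles: Vec<_> = (0..4u64)
        .map(|task_id| {
            let pool = Arc::clone(&pool);
            tokio::spawn(async move {
                for n in 0..1_000u64 {
                    pool.insert(task_id * 1_000 + n);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.await.unwrap();
    }
    println!("pool size: {}", pool.len());
}
```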
Transaction ordering is the same for all implementations: higher gas price first, then earlier timestamp. If both gas price and timestamp are equal between two transactions, the transaction id acts as a tie-breaker.

After exploring potential solutions based on the requirements, I took an iterative approach. I started with the one I was most familiar with (Arc Mutex) to have a working baseline, then progressed to less familiar options (SkipSet and mpsc), benchmarking each for proper evaluation.
I discarded RwLock immediately as this is a write-heavy scenario. Tokio's Mutex was not considered as it only adds overhead when the code doesn't need to hold locks across await points (sources: the tokio Mutex rust docs and the 100 exercises to learn Rust).
The BTreeSet contains `Reverse<Transaction>` instead of `Transaction` because it lives in the same branch as the BinaryHeap, which is a max-heap, so the ordering must be reversed. I benchmarked the version with and without `Reverse` and they perform on par, so no overhead is added there. In a real-world application with a single data structure there would be no need for `Reverse`.
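For concreteness, a sketch (type and field names are mine, not necessarily the repo's) of an ordering that satisfies the rule above, with the max-heap popping the best transaction first and the `Reverse` wrapper making the BTreeSet iterate in the same order:

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BTreeSet;

#[derive(PartialEq, Eq)]
struct Transaction {
    id: String,
    gas_price: u64,
    timestamp: u64, // e.g. milliseconds since epoch
}

impl Ord for Transaction {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher gas price wins; on a tie, the earlier timestamp wins;
        // on a further tie, the id acts as the tie-breaker (direction is arbitrary).
        self.gas_price
            .cmp(&other.gas_price)
            .then_with(|| other.timestamp.cmp(&self.timestamp))
            .then_with(|| other.id.cmp(&self.id))
    }
}

impl PartialOrd for Transaction {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// BinaryHeap<Transaction> pops the best transaction first (max-heap).
// For the BTreeSet living in the same branch, wrapping in Reverse makes
// iteration start from the best transaction as well.
type OrderedPool = BTreeSet<Reverse<Transaction>>;
```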
I then went for the SkipSet solution: as it doesn't require synchronized access, it is by far the easiest code to read and write.
I then went for a popular option in multithreaded scenarios: message passing. I chose std::mpsc bounded channels. This was my first time implementing it in Rust but, luckily, coming from Go's channels, I was already familiar with the idea.
I didn't try crossbeam's channels as I saw that the crate was copied into std's mpsc a couple of years ago. Although, according to the first comment of this Reddit post, the two libraries might have diverged since. It would be interesting to explore that too.
May 19 update
After a few days and some more reading, I noticed that I was blocking the entire thread in both the stats function and the consumer (one calls `std::thread::sleep` and the other calls `receiver.recv()`). I decided to try tokio's channels and also to replace `send` with `try_send` in the API endpoints (when the channel is full, the endpoint returns immediately with `try_send`). When tokio's channels are waiting to send/receive, instead of blocking the whole thread, the task's execution is suspended so another task can be executed.
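A minimal sketch of that pattern (a standalone example, not the repo's exact code): a bounded tokio channel with a single consumer task and a producer that uses `try_send` so it never waits on a full buffer.

```rust
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;

struct Transaction {
    id: String,
    gas_price: u64,
}

#[tokio::main]
async fn main() {
    // Bounded channel, capacity 10,000 as in the main branch.
    let (tx, mut rx) = mpsc::channel::<Transaction>(10_000);

    // Single consumer owning the ordered pool; recv().await suspends the
    // task instead of blocking the whole worker thread.
    let consumer = tokio::spawn(async move {
        while let Some(t) = rx.recv().await {
            // insert `t` into the ordered data structure here
            println!("ingesting tx {} (gas price {})", t.id, t.gas_price);
        }
    });

    // Producer side (what a submit endpoint would do): try_send returns
    // immediately when the buffer is full instead of waiting.
    let t = Transaction { id: "0xabc".into(), gas_price: 42 };
    match tx.try_send(t) {
        Ok(()) => println!("accepted"),
        Err(TrySendError::Full(_)) => println!("channel full, rejecting"),
        Err(TrySendError::Closed(_)) => println!("consumer gone"),
    }

    drop(tx); // closing all senders lets the consumer loop terminate
    consumer.await.unwrap();
}
```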
I also tried std channels with tokio's spawn_blocking, and performance was similar to tokio's channels.
Performance did improve significantly.
How well did the solutions perform? Check the Performance section.
Lives in the `data` folder. All data was gathered with the release version of the binary on a machine equipped with a Ryzen 9 7900 (12 cores/24 threads) and 16 GB of RAM, running in WSL Debian under Windows 11 24H2.
The benchmarks mainly focus on the submit-transaction scenario. I assume that in the real world there would be many users sending txs, while only the block builder removes transactions when it's time to build a new block.
The content of the files can be guessed from the names of the folders and the files themselves. The flamegraph plots had to be taken with a low vegeta load (50 req/s, 10 seconds) because flamegraph crashed every time it tried to generate the plot under a high load.
SUMMARY.md only shows text data of the criterion benchmarks. If you want to see the HTML reports, check the other folders.
Memory usage was tracked with bytehound under a high vegeta load (500k req/s, 10 seconds).
vegeta was used first for the end-to-end tests. As I was sometimes getting inconsistent results with it, I decided to also run the tests with k6; results with k6 were consistent. I am not sure whether the vegeta issue is due to WSL or to my machine specifically, but I couldn't test on other computers as I only have one.
While end-to-end benchmarks with k6 and vegeta roughly give similar results, the criterion benchmarks are clear: mpsc channels win. However, memory usage is higher during insertions, and when the load stops, the memory usage is still slightly higher than the rest of the solutions.
May 19 update
While k6's results are roughly similar across solutions, the vegeta and criterion numbers are clear: tokio's channels win. However, memory usage is the highest with tokio's channels, if only by a slight margin.
To my surprise, Arc Mutex with BinaryHeap + HashSet uses less memory than Arc Mutex with BTreeSet. I guess the internals of the BTreeSet add more overhead to handle duplicates.
- Async version using tokio -> axum
- Performance profiling output (e.g., with criterion.rs, flamegraph, or perf) -> criterion and flamegraph
- Visual output of transaction queue state over time (even just CLI stats) -> pool size printed every 3 seconds (see the sketch below)
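For reference, the kind of background task that prints such stats might look like this (a sketch with a hypothetical shared counter, not the repo's exact code):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

// Hypothetical shared counter standing in for the pool size; the real app
// would read the length of its ordered data structure instead.
async fn stats_task(pool_size: Arc<AtomicUsize>) {
    let mut interval = tokio::time::interval(Duration::from_secs(3));
    loop {
        // tick() suspends the task instead of blocking the worker thread
        // (unlike std::thread::sleep).
        interval.tick().await;
        println!("pool size: {}", pool_size.load(Ordering::Relaxed));
    }
}
```

It would be spawned once at startup, e.g. `tokio::spawn(stats_task(Arc::clone(&pool_size)))`.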
Here is a brief explanation of how I would implement the other two bonus points.
I think consistency is important here: if the block builder retrieves some transactions (txs) and, shortly after, higher-priority ones arrive in the mempool, then when it's time to delete txs, the block builder only wants to delete those that have actually been included in a block.
I considered three possible API shapes:

- `fetch(n: usize) -> Vec<Transaction>` and `drain(txs: Vec<Transaction>)` -> easiest solution for the mempool, least user-friendly for the block builder as it has to resubmit the entire array of txs
- `fetch(n: usize) -> Vec<Transaction>` and `drain(ids: Vec<String>)` -> a bit more user-friendly for the block builder but the worst solution in terms of time complexity, as the code has to scan the whole data structure
- `fetch(n: usize) -> Vec<Transaction>` and `drain(n: usize)` -> when the block builder calls `fetch`, the mempool would mark those txs as "to be deleted" via another HashSet. When the block builder then calls `drain`, the code would get the first transaction from the queue, check if its id exists in the HashSet and, if so, remove it from both data structures. The whole process repeats `n` times (see the sketch below).

I understand such a HashSet adds memory overhead, but I think it is acceptable: if an id is a 64-byte hash, then 1M "to-be-deleted" hashes only add 64 MB of memory usage.
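A sketch of that third option, under my own simplifying assumptions (a BinaryHeap ordered by gas price only, plain `String` ids):

```rust
use std::collections::{BinaryHeap, HashSet};

// Simplified transaction: ordered by gas price only for this sketch.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Transaction {
    gas_price: u64,
    id: String,
}

struct Mempool {
    queue: BinaryHeap<Transaction>,
    pending_delete: HashSet<String>, // ids marked by fetch(), awaiting drain()
}

impl Mempool {
    /// Returns the up-to-n best transactions and marks their ids as "to be deleted".
    fn fetch(&mut self, n: usize) -> Vec<Transaction> {
        let mut best = Vec::with_capacity(n);
        for _ in 0..n {
            match self.queue.pop() {
                Some(tx) => best.push(tx),
                None => break,
            }
        }
        // Push them back: they stay in the pool until drain() confirms the deletion.
        for tx in &best {
            self.pending_delete.insert(tx.id.clone());
            self.queue.push(tx.clone());
        }
        best
    }

    /// Removes up to n transactions from the top of the queue, but only those
    /// previously marked by fetch().
    fn drain(&mut self, n: usize) {
        for _ in 0..n {
            // Check the id of the best transaction without removing it yet.
            let is_marked = match self.queue.peek() {
                Some(tx) => self.pending_delete.contains(&tx.id),
                None => false,
            };
            if !is_marked {
                // Queue is empty or its top is a newer, unmarked tx: stop here;
                // any remaining marked ids become stale entries.
                break;
            }
            if let Some(tx) = self.queue.pop() {
                self.pending_delete.remove(&tx.id);
            }
        }
    }
}
```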
If new higher-priority txs come in, some low-priority ones might become stale in the HashSet. In such a scenario, the HashSet might be replaced with a BTreeSet, with txs ordered by time-to-live and a background task periodically evicting expired entries. This is something that Redis does in a more sophisticated way with its "active expiration policy" (if you're curious, check my implementation here).
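A hedged sketch of that idea (again with names of my own): pending deletions keyed by an expiry instant, with a background tokio task evicting whatever has expired.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Pending deletions keyed by expiry time; the String is the transaction id.
type PendingDeletes = Arc<Mutex<BTreeSet<(Instant, String)>>>;

async fn evict_stale(pending: PendingDeletes) {
    let mut interval = tokio::time::interval(Duration::from_secs(1));
    loop {
        interval.tick().await;
        let now = Instant::now();
        let mut set = pending.lock().unwrap();
        // BTreeSet iterates in ascending order, so expired entries sit at the front.
        while let Some(first) = set.first().cloned() {
            if first.0 <= now {
                set.remove(&first);
            } else {
                break;
            }
        }
        // The lock guard is dropped here, before the next await point.
    }
}
```

It would be spawned once at startup with `tokio::spawn(evict_stale(pending.clone()))`.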
- feature flags choosing the desired function
- macro running before the server and generating the implementation code depending on a configuration parameter
- Sustained-load benchmark with requests sent over the network
- Mixed benchmark of concurrent submit and drain
- `log` is more appropriate in a production environment
- More pool stats? e.g. real-time insertions and deletions, but in that case I'd definitely add the log crate and hide those logs behind a trace or debug level
- Explore other data structures and concurrency options: AVL tree, crossbeam::bounded, crossbeam-queue, mpmc with SkipSet or Arc Mutex BTreeSet, actix-web, sharded skiplist/binary heap
- Definitely a configurable (and maybe dynamic) channel capacity, as well as a configurable stats-printing interval