vello_hybrid: Use a staging belt to reuse allocations in wgpu backend #1532
Draft
bb20fec to 1a80d00
taj-p
reviewed
Mar 26, 2026
Contributor
Taking a look at the wgpu staging belt implementation, I'm a little concerned about its memory overhead. It feels a bit immature and seems to need some updates before it can be used in a production setting (or at least for our use case).
There is a degenerate case, as Raph alluded to. Steps:
1. Allocate a large buffer of size N. It gets its own allocation because it's larger than the max chunk size.
2. Next frame, allocate a buffer of size N+1. It gets a new allocation.
3. Repeat steps 1 and 2 to see unbounded memory growth.
My feeling is that the staging belt needs to be freed whenever we allocate a buffer larger than some value, to prevent the degenerate case. Or, at a minimum, we should avoid allocating so many buffers that could go largely unused.
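To make the concern concrete, here is a rough toy model of the growth pattern, not wgpu's actual `StagingBelt`. Everything in it (`ToyBelt`, `trim_oversized`, `simulate_peak_bytes`, the 1024-byte chunk size) is hypothetical, and it assumes a belt that reuses a free chunk only when that chunk is at least as large as the request, and that each frame's oversized request is one byte larger than the previous frame's.

```rust
/// Toy model of a staging belt that tracks chunks only by their size.
/// Hypothetical: this is not wgpu's `StagingBelt`.
struct ToyBelt {
    chunk_size: u64,
    free_chunks: Vec<u64>, // sizes of chunks available for reuse
    in_flight: Vec<u64>,   // sizes of chunks used during the current frame
}

impl ToyBelt {
    fn new(chunk_size: u64) -> Self {
        Self { chunk_size, free_chunks: Vec::new(), in_flight: Vec::new() }
    }

    /// Reuse the first free chunk that is large enough; otherwise create a
    /// new chunk of `max(chunk_size, size)` bytes.
    fn allocate(&mut self, size: u64) {
        if let Some(i) = self.free_chunks.iter().position(|&c| c >= size) {
            let chunk = self.free_chunks.swap_remove(i);
            self.in_flight.push(chunk);
        } else {
            self.in_flight.push(size.max(self.chunk_size));
        }
    }

    /// End of frame: every in-flight chunk becomes reusable again.
    fn recall(&mut self) {
        self.free_chunks.append(&mut self.in_flight);
    }

    /// Assumed mitigation: drop free chunks larger than the normal chunk size.
    fn trim_oversized(&mut self) {
        let limit = self.chunk_size;
        self.free_chunks.retain(|&c| c <= limit);
    }

    /// Bytes held by the belt, counting both free and in-flight chunks.
    fn total_bytes(&self) -> u64 {
        self.free_chunks.iter().sum::<u64>() + self.in_flight.iter().sum::<u64>()
    }
}

/// Run `frames` frames, each requesting one byte more than the last, and
/// return the peak number of bytes the belt held at any point.
fn simulate_peak_bytes(frames: u64, trim: bool) -> u64 {
    let mut belt = ToyBelt::new(1024);
    let mut peak = 0;
    for frame in 0..frames {
        belt.allocate(4096 + frame);
        peak = peak.max(belt.total_bytes());
        belt.recall();
        if trim {
            belt.trim_oversized();
        }
    }
    peak
}
```

In this model, three frames without trimming hold 4096 + 4097 + 4098 bytes at the peak, because no earlier chunk is ever large enough to be reused. With trimming after each frame, only the current 4098-byte chunk is held. Whether trimming like this is the right policy for the real backend is a separate question.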
Collaborator
Author
Fair enough! I'll think about whether I can come up with something.
From my understanding: By default, every time we call `write_texture`, wgpu will first copy our data into a newly allocated CPU-based staging buffer, and only then upload the data from the staging buffer to the GPU.

By using a staging belt, we can reuse the temporary buffers used by wgpu across frames, which eliminates unnecessary memory copies from the first frame onward.
The results are indeed much better, and the code (in my opinion) also becomes a lot simpler!
Before: For 1200 frames of GhostScript tiger, around 479ms are spent in `render`

After: For 1200 frames of GhostScript tiger, only 155ms are spent in `render`
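As a rough illustration of why reusing staging memory across frames helps, here is a toy sketch that only counts staging-buffer allocations. It does not use wgpu at all: `ToyUploader`, `upload_fresh`, `upload_reused`, and `count_allocations` are hypothetical names, and a plain `Vec<u8>` stands in for the CPU-side staging buffer.

```rust
/// Hypothetical stand-in for the CPU-side staging path; not a wgpu type.
struct ToyUploader {
    scratch: Vec<u8>,   // staging memory kept alive across frames
    allocations: usize, // number of staging buffers created so far
}

impl ToyUploader {
    fn new() -> Self {
        Self { scratch: Vec::new(), allocations: 0 }
    }

    /// Models the default path: a new staging buffer for every upload.
    fn upload_fresh(&mut self, data: &[u8]) -> usize {
        let staging = data.to_vec();
        self.allocations += 1;
        staging.len()
    }

    /// Models the belt-like path: reuse `scratch`, and count a new
    /// allocation only when the data does not fit its current capacity.
    fn upload_reused(&mut self, data: &[u8]) -> usize {
        if data.len() > self.scratch.capacity() {
            self.allocations += 1;
        }
        self.scratch.clear();
        self.scratch.extend_from_slice(data);
        self.scratch.len()
    }
}

/// Upload the same 16-byte payload once per frame and report how many
/// staging buffers were created in total.
fn count_allocations(frames: usize, reuse: bool) -> usize {
    let payload = [0u8; 16];
    let mut uploader = ToyUploader::new();
    for _ in 0..frames {
        if reuse {
            uploader.upload_reused(&payload);
        } else {
            uploader.upload_fresh(&payload);
        }
    }
    uploader.allocations
}
```

With a constant payload size, the fresh path creates one staging buffer per frame, while the reused path creates one in the first frame and none afterwards. The real timings above depend on much more than allocation counts, so this only sketches the mechanism.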