vello_hybrid: Use a staging belt to reuse allocations in wgpu backend#1532

Draft
LaurenzV wants to merge 1 commit into main from laurenz/staging
Conversation

@LaurenzV
Collaborator

From my understanding, every time we call `write_texture`, wgpu by default first copies our data into a newly allocated CPU-side staging buffer, and only then uploads the data from that staging buffer to the GPU.

By using a staging belt, we can reuse the temporary buffers used by wgpu across frames, which eliminates unnecessary memory copies from the first frame onward.
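
To make the reuse idea concrete, here is a minimal sketch in plain Rust that counts allocations with and without a recycling pool. It is only a toy model: `StagingBuffer`, `ReusePool`, and both helper functions are hypothetical names invented for illustration, and none of this is wgpu's actual API or implementation.

```rust
/// Toy stand-in for a CPU-side staging buffer (hypothetical, not a wgpu type).
struct StagingBuffer {
    bytes: Vec<u8>,
}

/// Hypothetical pool that recycles staging buffers across frames.
struct ReusePool {
    free: Vec<StagingBuffer>,
    allocations: usize,
}

impl ReusePool {
    fn new() -> Self {
        Self { free: Vec::new(), allocations: 0 }
    }

    /// Hand out a buffer of at least `size` bytes, reusing a free one if it fits.
    fn acquire(&mut self, size: usize) -> StagingBuffer {
        if let Some(i) = self.free.iter().position(|b| b.bytes.len() >= size) {
            return self.free.swap_remove(i);
        }
        self.allocations += 1;
        StagingBuffer { bytes: vec![0; size] }
    }

    /// Return a buffer once the (simulated) upload has finished.
    fn recycle(&mut self, buf: StagingBuffer) {
        self.free.push(buf);
    }
}

/// Simulate `frames` uploads of `size` bytes through the pool;
/// returns how many buffers had to be allocated.
fn allocations_with_reuse(frames: usize, size: usize) -> usize {
    let mut pool = ReusePool::new();
    for _ in 0..frames {
        let mut buf = pool.acquire(size);
        buf.bytes[..size].fill(1); // stand-in for copying texture data
        pool.recycle(buf);
    }
    pool.allocations
}

/// Without reuse, every upload allocates its own temporary buffer.
fn allocations_without_reuse(frames: usize, size: usize) -> usize {
    let mut allocations = 0;
    for _ in 0..frames {
        let buf = StagingBuffer { bytes: vec![1; size] };
        allocations += 1;
        drop(buf);
    }
    allocations
}
```

In this model, 1200 same-sized uploads need a single allocation with the pool and 1200 without it.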

The results are indeed much better, and the code (in my opinion) also becomes a lot simpler!

Before: For 1200 frames of the Ghostscript tiger, around 479 ms are spent in render

After: For 1200 frames of the Ghostscript tiger, only 155 ms are spent in render

@LaurenzV LaurenzV changed the title vello_hybrid: Use a staging buffer to reuse allocations vello_hybrid: Use a staging belt to reuse allocations Mar 24, 2026
@LaurenzV LaurenzV requested a review from taj-p March 24, 2026 16:32
@LaurenzV LaurenzV changed the title vello_hybrid: Use a staging belt to reuse allocations vello_hybrid: Use a staging belt to reuse allocations in wgpu backend Mar 24, 2026
@grebmeg grebmeg self-requested a review March 25, 2026 04:12
Contributor

@taj-p taj-p left a comment

Taking a look at the wgpu staging belt implementation, I'm a little concerned about its memory overhead. It feels a bit immature and would need some updates before being used in a production setting (or at least for our use case).

There is a degenerate case as Raph alluded to. Steps:

  1. Allocate a large buffer of size N. It gets its own allocation because it's larger than the max chunk size.
  2. Next frame, allocate a buffer of size N+1. It doesn't fit in the existing size-N allocation, so it gets a new one.
  3. Keep repeating steps 1 and 2 with ever-larger sizes to see unbounded memory growth.
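
The steps above can be played through in a small toy model. It assumes the policy as described here: a request larger than the chunk size gets a dedicated chunk of exactly that size, a free chunk is reused only if it is large enough, and nothing is ever freed. `ToyBelt` and `total_allocated` are hypothetical names, and this is a sketch of the described behavior rather than wgpu's actual code.

```rust
/// Toy model of the chunk policy described above (an assumption, not wgpu's code).
struct ToyBelt {
    chunk_size: u64,
    free_chunks: Vec<u64>, // sizes of chunks available for reuse
    total_allocated: u64,  // bytes ever allocated; nothing is freed in this model
}

impl ToyBelt {
    fn new(chunk_size: u64) -> Self {
        Self { chunk_size, free_chunks: Vec::new(), total_allocated: 0 }
    }

    /// Serve one upload of `size` bytes, then return its chunk to the free list
    /// (as if the frame had completed).
    fn upload(&mut self, size: u64) {
        let chunk = match self.free_chunks.iter().position(|&c| c >= size) {
            // A free chunk is big enough: reuse it.
            Some(i) => self.free_chunks.swap_remove(i),
            // Otherwise allocate a new chunk, at least `chunk_size` bytes.
            None => {
                let new_chunk = size.max(self.chunk_size);
                self.total_allocated += new_chunk;
                new_chunk
            }
        };
        self.free_chunks.push(chunk);
    }
}

/// Run `frames` frames where frame `i` uploads `start + i * growth` bytes;
/// returns the total number of bytes allocated by the toy belt.
fn total_allocated(chunk_size: u64, start: u64, growth: u64, frames: u64) -> u64 {
    let mut belt = ToyBelt::new(chunk_size);
    for frame in 0..frames {
        belt.upload(start + frame * growth);
    }
    belt.total_allocated
}
```

With a constant upload size the first chunk is reused every frame, but as soon as each frame asks for one byte more than the last, every frame allocates a fresh chunk and the total keeps climbing.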

My feeling is that the staging belt needs to be freed whenever we allocate a buffer larger than some threshold, to prevent the degenerate case. Or, at a minimum, we should avoid allocating so many buffers that could end up largely unused.
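
One possible shape for such a safeguard, sketched on a toy model: after each frame, drop any free chunk larger than the regular chunk size so oversized allocations are not retained. `TrimmingBelt`, `trim_oversized`, and `retained_after` are hypothetical names, the threshold choice is an assumption, and this is not an existing wgpu API.

```rust
/// Toy belt that only tracks chunk sizes (hypothetical, not wgpu's implementation).
struct TrimmingBelt {
    chunk_size: u64,
    free_chunks: Vec<u64>, // sizes of chunks currently retained for reuse
}

impl TrimmingBelt {
    fn new(chunk_size: u64) -> Self {
        Self { chunk_size, free_chunks: Vec::new() }
    }

    /// Serve one upload, reusing a large-enough free chunk or allocating a new one,
    /// then return the chunk to the free list.
    fn upload(&mut self, size: u64) {
        let chunk = match self.free_chunks.iter().position(|&c| c >= size) {
            Some(i) => self.free_chunks.swap_remove(i),
            None => size.max(self.chunk_size),
        };
        self.free_chunks.push(chunk);
    }

    /// Assumed mitigation: release every free chunk bigger than the regular chunk size.
    fn trim_oversized(&mut self) {
        let limit = self.chunk_size;
        self.free_chunks.retain(|&c| c <= limit);
    }

    /// Bytes still held by the belt.
    fn retained_bytes(&self) -> u64 {
        self.free_chunks.iter().sum()
    }
}

/// One small upload, then `frames` oversized uploads that grow by one byte per frame.
/// Returns the bytes retained at the end, with or without trimming after each frame.
fn retained_after(chunk_size: u64, start: u64, frames: u64, trim: bool) -> u64 {
    let mut belt = TrimmingBelt::new(chunk_size);
    belt.upload(chunk_size / 2); // fits in a regular chunk
    for frame in 0..frames {
        belt.upload(start + frame);
        if trim {
            belt.trim_oversized();
        }
    }
    belt.retained_bytes()
}
```

The trade-off in this sketch is that every oversized upload pays for a fresh allocation again, so a real fix would probably want a smarter retention policy than a hard cutoff.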

@LaurenzV
Collaborator Author

Fair enough! I'll think about whether I can come up with something.

@LaurenzV LaurenzV marked this pull request as draft March 26, 2026 19:35