[serve] deflake backpressure test#63005
Conversation
Signed-off-by: Kamil Kaczmarek <kamil@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request moves the wrap_queued_request context manager to wrap the entire retry loop in route_and_send_request to prevent metric oscillation during replica rejections. However, a review comment points out that this change introduces a regression in metrics accuracy because the counter now increments before request arguments are resolved. The reviewer suggests moving the argument resolution logic before the context manager to ensure that queue metrics accurately reflect pending work and do not include time spent waiting for upstream dependencies.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1c26d69. Configure here.
Signed-off-by: Kamil Kaczmarek <kamil@anyscale.com>

This pull request moves
wrap_queued_requestup one level toroute_and_send_requestin therouter.py. It is to ensure that thenum_queued_requestsremains incremented across all routing retries, preventing oscillation when a replica rejects a request.Details:
num_queued_requestsnow reflects the requests this router is actively trying to place.route_and_send_request, thewrap_queued_requestcontext now wraps the entire retry loop, so thenum_queued_requestsstays incremented for the full duration of all retries instead of being reset on each retry._resolve_request_arguments(along with the_objref_resolution_latency_msobservation andActorDiedError → upstream_crash_errorhandling) is now called inroute_and_send_requestbefore thewrap_queued_requestcontext is opened.