[Java] Avoid bind conflicts when removing and adding a subscription to the same channel. #1955
Merged
vyazelenko merged 1 commit into master on Mar 9, 2026
…o the same channel.

CHANGE TO SUBSCRIPTIONS
-----------------------

In 1.49, we started to align the socket opening with the C driver, by moving the socket opening onto the conductor thread, but we left socket closing on the receiver thread. Therefore, it was possible that a REMOVE_SUBSCRIPTION command was not "finished" by the time the conductor started to process the subsequent ADD_SUBSCRIPTION, which could result in a bind error.

In a recent commit, we started to align the Java driver with the C driver. It now opens and closes sockets within the conductor agent. As discussed in the previous commit, the flow now looks like this:

```
Client -> Conductor: Remove Subscription
Conductor -> Receiver: Stop using associated sockets
Receiver -> Conductor: I've stopped using the associated sockets
Conductor -> Receiver: Stop using associated sockets
Receiver -> Conductor: I've stopped using the associated sockets
Conductor: closes sockets
Conductor: closes status indicator
Conductor -> Client: operation completed
```

However, this change was not sufficient to prevent bind conflicts, as it was possible to see commands like ADD_SUBSCRIPTION interleaved. For example:

```
Client -> Conductor: Remove Subscription
Conductor -> Receiver: Stop using associated sockets
Client -> Conductor: Add Subscription
Conductor: bind exception
```

In the Java driver, we now set the endpoint's statusIndicator counter to CLOSING to indicate that the endpoint should not be reused. The driver also detects this state when adding a subscription and sends a RESOURCE_TEMPORARILY_UNAVAILABLE error back to the client. This matches the C driver behaviour.

```
Client -> Conductor: Remove Subscription
Conductor: Set endpoint status to CLOSING
Conductor -> Receiver: Stop using associated sockets
Client -> Conductor: Add Subscription
FAILURE when opening socket with same port
Conductor: Find endpoint with CLOSING status.
Conductor -> Client: Error RESOURCE_TEMPORARILY_UNAVAILABLE
```

There are now two ways to safely close and reopen a subscription for the same channel:

1. Wait for the Subscription's channel status indicator to disappear after closing, before reopening. N.B., this only works when the closed subscription is the sole user of the endpoint.
2. Catch `RegistrationException` when opening a subscription and retry on `errorCode == RESOURCE_TEMPORARILY_UNAVAILABLE`.

Ideally, it would be possible to hide this complexity from the user, but that would be a far more involved change.

---

CHANGE TO AERON CLUSTER CLIENTS
-------------------------------

One place where users are likely to run into this issue is when creating a new AeronCluster session, for example, after a session timeout. To improve usability of AeronCluster, we now handle RESOURCE_TEMPORARILY_UNAVAILABLE when creating the egress publication.

---

PUBLICATIONS NOT FIXED
----------------------

There are similar problems around publications in the Java media driver; however, these are harder to fix, because the endpoint enters the equivalent CLOSING state on a timer event, rather than as a result of a publication removal.

There are also complications when it comes to recreating a publication with the same explicit session-id, as one needs to consider the lifetime of the entries in the collection used to prevent session clashes.

---

OTHER TIDBITS
-------------

We discovered some issues/surprises on our journey:

1. The channel status indicator counter is created the first time an endpoint is created, using the registrationId of its initial resource. The endpoint could later be reused after that initial resource has been closed; therefore, it is not obviously correct to look up the channel status indicator by registrationId, as we do in some places. Instead, one should use Subscription#channelStatusId to get the relevant counter identifier.
2. There are still conflicts when removing and adding MDS destinations that map onto the same socket bind address.
3. The C driver uses different channel endpoint status indicator values.

Co-authored-by: Dmytro Vyazelenko <696855+vyazelenko@users.noreply.github.com>
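The second approach listed under CHANGE TO SUBSCRIPTIONS (catch `RegistrationException` and retry on `RESOURCE_TEMPORARILY_UNAVAILABLE`) could be sketched roughly as below. This is an illustration under stated assumptions, not the code from this PR: the class `RetryOnTemporarilyUnavailable`, the method `retry`, the attempt limit and the back-off value are all hypothetical. The helper is kept free of Aeron types so it compiles on its own; the commented wiring assumes the Aeron client API (`Aeron#addSubscription`, `RegistrationException#errorCode`, `ErrorCode.RESOURCE_TEMPORARILY_UNAVAILABLE`).

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of Aeron: retries an action while it fails with a
// retryable exception, parking between attempts, and gives up after maxAttempts.
final class RetryOnTemporarilyUnavailable
{
    static <T> T retry(
        final Supplier<T> action,
        final Predicate<RuntimeException> isRetryable,
        final int maxAttempts,
        final long backoffNanos)
    {
        if (maxAttempts < 1)
        {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }

        RuntimeException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return action.get();
            }
            catch (final RuntimeException ex)
            {
                if (!isRetryable.test(ex))
                {
                    throw ex; // any other failure is reported immediately
                }

                lastError = ex;
                if (attempt < maxAttempts)
                {
                    LockSupport.parkNanos(backoffNanos); // give the driver time to finish closing
                }
            }
        }

        throw lastError; // still unavailable after the final attempt
    }

    // Assumed wiring with Aeron on the classpath (not compiled here):
    //
    // Subscription subscription = retry(
    //     () -> aeron.addSubscription(channel, streamId),
    //     (ex) -> ex instanceof RegistrationException &&
    //         ((RegistrationException)ex).errorCode() == ErrorCode.RESOURCE_TEMPORARILY_UNAVAILABLE,
    //     10,
    //     TimeUnit.MILLISECONDS.toNanos(10));

    public static void main(final String[] args)
    {
        // Stand-in action: fails twice with a retryable error, then succeeds.
        final int[] calls = {0};
        final String result = retry(
            () ->
            {
                if (++calls[0] < 3)
                {
                    throw new IllegalStateException("temporarily unavailable");
                }
                return "subscribed";
            },
            (ex) -> ex instanceof IllegalStateException,
            5,
            1_000_000L);

        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Bounding the number of attempts keeps a caller from spinning forever if the endpoint never leaves the CLOSING state. Errors other than the retryable one propagate unchanged, so genuine configuration mistakes still surface on the first attempt.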
vyazelenko approved these changes on Mar 9, 2026
vyazelenko pushed a commit that referenced this pull request on Mar 11, 2026
…o the same channel. (#1955) (cherry picked from commit f08e152)