Optimize RTC exchange performance in global RIB#2881
Conversation
|
@fujita could you please take a look at it, if it is possible? |
What' VPN GRT? |
I meant global rib tables for VPN families. |
5819e58 to
8127e3a
Compare
|
@fujita could you take a look at it again, please? |
|
I made some changes: the approach that distinguished between VPN and non-VPN families was wrong. Now all families are could be use in the RTC scheme. And current PR:
|
1914fde to
177fee0
Compare
|
Hello, @fujita , |
|
Hello, @fujita , When a large number of RTs are loaded into GoBGP, the RTC filter becomes dramatically slow. I measured the performance on two GoBGPs running on the same PC. I loaded paths on one server and RTs on another, then established peering. For example, if we have 10 paths with the same RT:
The performance depends on the average number of paths per RT and the total number of RTs. I modified the filter process to use maps instead of nested loops. Now it works much faster, with O(1) complexity instead of O(nm). We are using this change in our environment and have not encountered any issues in this code so far. |
1d1c567 to
43e988a
Compare
a0c9a70 to
a2abdad
Compare
There was a problem hiding this comment.
Pull request overview
This pull request optimizes Route Target Constraint (RTC) handling in GoBGP by introducing fast lookup data structures. The changes reduce complexity from O(n*m) to approximately O(1) for RTC-related operations at the cost of additional memory usage.
Changes:
- Introduces RT hash-based indexing using uint64 keys for fast route target lookups
- Adds
vpnFamilyRTCMapfor global VPN tables androuteFamilyRTCMapfor adjacency RTC tables - Refactors RTC path propagation logic to leverage the new indexing structures
- Updates destination.Calculate to return withdrawn paths for proper RTC bookkeeping
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/pkg/table/rtc.go | New RTC handler interface and implementations for fast RT lookups |
| internal/pkg/table/rtc_test.go | Unit tests for RTC handler low-level operations |
| internal/pkg/table/table.go | Adds RTC handler to Table struct and integrates with path updates |
| internal/pkg/table/table_test.go | Tests for table creation with RTC and RT-based path retrieval |
| internal/pkg/table/path.go | Helper functions for RT-based path filtering and retrieval |
| internal/pkg/table/destination.go | Modifies Calculate to return withdrawn path for RTC unregistration |
| internal/pkg/table/destination_test.go | Updates tests for new Calculate signature |
| internal/pkg/table/adj.go | Integrates RTC handlers into adjacency RIB updates |
| internal/pkg/table/adj_test.go | Tests for adjacency RTC functionality |
| internal/pkg/table/table_manager.go | Manager-level methods for RT-based path retrieval across tables |
| internal/pkg/table/table_manager_test.go | Integration tests for table manager RTC operations |
| internal/pkg/table/policy.go | Updates function call to renamed ExtCommRouteTargetKey |
| pkg/server/server.go | Refactored RTC propagation logic using new fast lookup methods |
| pkg/server/peer.go | Optimized peer interest checking using RTC index |
| pkg/server/server_test.go | End-to-end test for RTC with multiple route targets |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if t.rtc != nil { | ||
| t.rtc.unregister(path, true) | ||
| } |
There was a problem hiding this comment.
When unregistering a withdrawn path from the RTC map, the old path should be passed instead of the withdrawal path for consistency. While both paths have the same NLRI and the current implementation works, passing 'old' is more semantically correct since it's the path that was originally registered.
| } | ||
| } | ||
| case <-t1.C: | ||
| t.Logf("no paths have been withdrawn") |
There was a problem hiding this comment.
The loop logic is incorrect. When the timer expires at line 2090, the code logs "no paths have been withdrawn" but doesn't set found to true or break out of the loop. This means the test will continue waiting indefinitely after the timeout. The timer case should set found=true to exit the loop, or there should be a break statement.
| t.Logf("no paths have been withdrawn") | |
| t.Logf("no paths have been withdrawn") | |
| found = true |
| // rtc is used for RTC rt saving. Every global vpn table has vpnPathsPart, | ||
| // every Adj RF_RTC_UC table has rtPathsPart, and other tables have not rtc instance. |
There was a problem hiding this comment.
The comment references type names "vpnPathsPart" and "rtPathsPart" which don't exist in the code. These should be "vpnFamilyRTCMap" and "routeFamilyRTCMap" respectively to match the actual type names.
cc3af40 to
efea3e0
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if val > 0 { | ||
| rtc.rts[rtHash]-- | ||
| } | ||
| if val <= 1 && deleteEmpty { |
There was a problem hiding this comment.
The decrement logic is incorrect. The condition if val > 0 decrements the counter, but then the condition if val <= 1 && deleteEmpty checks the original value val before the decrement. This means that when val equals 1, it will be decremented to 0, but then the check val <= 1 will be true (since val is still 1), causing the entry to be deleted. However, if val is 2, it will be decremented to 1, and val <= 1 will be false (since val is 2), so the entry won't be deleted even though the counter is now 1.
The correct logic should check the value AFTER decrementing, not before. It should be: if rtc.rts[rtHash] <= 0 && deleteEmpty { delete(rtc.rts, rtHash) }
| if val <= 1 && deleteEmpty { | |
| if rtc.rts[rtHash] <= 0 && deleteEmpty { |
| Family bgp.Family | ||
| destinations Destinations | ||
| logger *slog.Logger | ||
| // map[rt valsue]->(number of such rtc paths in the table). Only for adj table and RF_RTC_UC family. |
There was a problem hiding this comment.
Spelling error in the comment: "valsue" should be "value".
| // map[rt valsue]->(number of such rtc paths in the table). Only for adj table and RF_RTC_UC family. | |
| // map[rt value]->(number of such rtc paths in the table). Only for adj table and RF_RTC_UC family. |
Replace linear route scanning for RTC requests with O(1) lookups by adding a route target index to global RIB tables. The index maps route targets to containing paths and requesting peers for instant RTC response generation. Includes comprehensive test coverage.
|
Dear @fujita, Key changes:
|
|
Hi @fujita, gentle ping on this PR — it’s been a week since my last comment. When you have a moment, could you please take a look? Happy to clarify anything or adjust the approach. |
This is about complexity of RTC.
This PR proposes a new approach that potentially reduces the complexity to O(1) for filtering and retrieving VPN paths, assuming the map container is used effectively. However, this approach requires additional memory. Specifically, for each VPN GRT, it necessitates:
n*L*m*sizeof(*Path) , where L is the average number of extRTs in the Path, m - number of paths, n - number of known RTs +
n*sizeof(RouterID) - to store interested on RT peers +
n*sizeof(uint64) - for indexing RTs
Additionally, for each Adj RF_RTC_UC table, the memory requirement is approximately:
n*(sizeof(uint64)+sizeof(int))