fix(portForwarding): serialize concurrent PF rule creation per VIP to prevent duplicates#3345
fix(portForwarding): serialize concurrent PF rule creation per VIP to prevent duplicates#3345zstack-robot-1 wants to merge 15 commits into5.5.6from
Conversation
…isk offering Resolves: ZSTAC-74683 Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP call now times out after 30s instead of 5+ minutes, and automatically retries the next available MDS via tryNext mechanism. Resolves: ZSTAC-80595 Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report the VM as Stopped. Without this transition, the state machine throws an exception and the VM stays stuck in Destroying state forever. Resolves: ZSTAC-80620 Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
…pacity Resolves: ZSTAC-79709 Change-Id: I45a2133bbb8c51c25ae3549d59e588976192a08d
Walkthrough对若干子系统进行了小范围逻辑调整:端口转发创建改为基于 ChainTask 的同步链式处理并增加重叠检查;VM NIC、镜像根盘大小和实例状态、L3 网络空闲 IP 过滤、Ceph 容量释放策略及 ZBS HTTP 调用重试行为均做了局部修改。 Changes
序列图sequenceDiagram
actor Client
participant PortForwardingManager
participant ChainTask
participant Database
participant VIPService
Client->>PortForwardingManager: APICreatePortForwardingRuleMsg
PortForwardingManager->>ChainTask: 提交 ChainTask(key: vipUuid)
ChainTask->>Database: 查询同 VIP、同协议的现有规则
Database-->>ChainTask: 返回规则列表
alt 存在端口范围重叠
ChainTask->>PortForwardingManager: 发布错误事件
ChainTask-->>Client: 返回错误响应
else 无重叠
ChainTask->>VIPService: 获取 VIP 详情
VIPService-->>ChainTask: VIP 信息
ChainTask->>Database: 持久化端口转发规则
Database-->>ChainTask: 持久化成功
ChainTask->>PortForwardingManager: 发布成功事件
ChainTask-->>Client: 返回创建结果
end
预估代码审查工作量🎯 4 (复杂) | ⏱️ ~45 分钟 诗歌
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.40.5)plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.javaComment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java`:
- Around line 740-741: The FlowChain name in PortForwardingManagerImpl is
misspelled as "create-portforwading"; update the string passed to
flowChain.setName(...) (the FlowChain created via
FlowChainBuilder.newShareFlowChain() and stored in variable flowChain) to the
correct spelling "create-portforwarding" so the chain name is accurate.
| FlowChain flowChain = FlowChainBuilder.newShareFlowChain(); | ||
| flowChain.setName("create-portforwading"); |
There was a problem hiding this comment.
FlowChain 名称存在拼写错误。
"create-portforwading" 应为 "create-portforwarding"(缺少字母 'r')。
🐛 建议修复
FlowChain flowChain = FlowChainBuilder.newShareFlowChain();
-flowChain.setName("create-portforwading");
+flowChain.setName("create-portforwarding");📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| FlowChain flowChain = FlowChainBuilder.newShareFlowChain(); | |
| flowChain.setName("create-portforwading"); | |
| FlowChain flowChain = FlowChainBuilder.newShareFlowChain(); | |
| flowChain.setName("create-portforwarding"); |
🤖 Prompt for AI Agents
In
`@plugin/portForwarding/src/main/java/org/zstack/network/service/portforwarding/PortForwardingManagerImpl.java`
around lines 740 - 741, The FlowChain name in PortForwardingManagerImpl is
misspelled as "create-portforwading"; update the string passed to
flowChain.setName(...) (the FlowChain created via
FlowChainBuilder.newShareFlowChain() and stored in variable flowChain) to the
correct spelling "create-portforwarding" so the chain name is accurate.
Resolves: ZSTAC-82153 Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
…fterAddIpAddress to prevent NPE during rollback Resolves: ZSTAC-81741 Change-Id: I53bcf20a10306afc7b6172da294d347b74e6c41f
Resolves: ZSTAC-81182 Change-Id: Id1bb642154dc66ae9995dcc4d9fc00cdce9bcaf8
<fix>[vm]: add Destroying->Stopped state transition See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation See merge request zstackio/zstack!9155
<fix>[network]: filter reserved IPs in getFreeIp See merge request zstackio/zstack!9170
<fix>[ceph]: apply over-provisioning ratio when releasing snapshot capacity See merge request zstackio/zstack!9162
<fix>[network]: add null check for L3 network system tags in API interceptor See merge request zstackio/zstack!9169
Resolves: ZSTAC-77673 Change-Id: I7fa4cd853dddb1f9a7b7d90bd3731d2559ba6542
719563c to
252bdc7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java (1)
56-64:⚠️ Potential issue | 🟠 Major
UsedIpVO在 NIC 存在性校验之前已被更新,可能导致数据不一致Line 58 的 SQL 更新将
UsedIpVO.vmNicUuid设置为传入的vmNicUUid,但 NIC 存在性检查在 Lines 61-64 才执行。如果 NIC 不存在,UsedIpVO已经关联到一个无效的vmNicUuid,造成数据库状态不一致。建议将 null 检查移到 SQL 更新之前:
🐛 建议的修复方案
`@Override` public void afterAddIpAddress(String vmNicUUid, String usedIpUuid) { - /* update UsedIpVO */ - SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update(); - VmNicVO nic = Q.New(VmNicVO.class).eq(VmNicVO_.uuid, vmNicUUid).find(); if (nic == null) { logger.debug(String.format("VmNic[uuid:%s] not found, skip afterAddIpAddress", vmNicUUid)); return; } + + /* update UsedIpVO */ + SQL.New(UsedIpVO.class).eq(UsedIpVO_.uuid, usedIpUuid).set(UsedIpVO_.vmNicUuid, vmNicUUid).update(); UsedIpVO temp = null;
🤖 Fix all issues with AI agents
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java`:
- Around line 1078-1083: The removal logic may call NetworkUtils.isInRange with
mismatched IP versions (IPv4 vs IPv6) causing exceptions; update the block that
iterates self.getReservedIpRanges() and freeIpInventorys (the lambda using
freeIp and reservedIpRanges.stream) to first filter reserved ranges by the same
IP version as freeIp (use whatever helper/method the file uses elsewhere for
version checking, consistent with the pattern around lines ~486-487), then call
NetworkUtils.isInRange only for ranges with matching versions so ipv4/ipv6
conversions are not mixed and exceptions are avoided.
|
|
||
| Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges(); | ||
| if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) { | ||
| freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch( | ||
| r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp()))); | ||
| } |
There was a problem hiding this comment.
IP 版本不匹配可能导致异常
当前代码在比较 free IP 与 reserved IP range 时,未按 IP 版本过滤。若 free IP 是 IPv4 而 reserved range 是 IPv6(或相反),NetworkUtils.isInRange 内部会调用错误的转换方法(如 ipv4StringToLong 处理 IPv6 地址),可能抛出异常。
参考同文件第 486-487 行的已有模式,应先按 IP 版本过滤 reserved ranges:
🔧 建议修复
Set<ReservedIpRangeVO> reservedIpRanges = self.getReservedIpRanges();
if (reservedIpRanges != null && !reservedIpRanges.isEmpty()) {
- freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream().anyMatch(
- r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
+ freeIpInventorys.removeIf(freeIp -> reservedIpRanges.stream()
+ .filter(r -> r.getIpVersion() == freeIp.getIpVersion())
+ .anyMatch(r -> NetworkUtils.isInRange(freeIp.getIp(), r.getStartIp(), r.getEndIp())));
}🤖 Prompt for AI Agents
In `@network/src/main/java/org/zstack/network/l3/L3BasicNetwork.java` around lines
1078 - 1083, The removal logic may call NetworkUtils.isInRange with mismatched
IP versions (IPv4 vs IPv6) causing exceptions; update the block that iterates
self.getReservedIpRanges() and freeIpInventorys (the lambda using freeIp and
reservedIpRanges.stream) to first filter reserved ranges by the same IP version
as freeIp (use whatever helper/method the file uses elsewhere for version
checking, consistent with the pattern around lines ~486-487), then call
NetworkUtils.isInRange only for ranges with matching versions so ipv4/ipv6
conversions are not mixed and exceptions are avoided.
Problem\nConcurrent
APICreatePortForwardingRuleMsgrequests for the same VIP could create duplicate port forwarding rules with overlapping port ranges. The interceptor checks for port overlap before the handler, but without synchronization, two concurrent requests can both pass the check and persist duplicate rules.\n\nResolves: ZSTAC-77673\n\n## Root Cause\nThehandle(APICreatePortForwardingRuleMsg)method persists the PortForwardingRuleVO without any per-VIP synchronization. The existingPortForwardingApiInterceptorchecks for VIP port overlap, but the window between the interceptor check and handlerpersist()allows race conditions.\n\n## Fix\n1. Wrap the CREATE handler inthdf.chainSubmit(new ChainTask)with sync signatureportforwardingrule-vip-{vipUuid}— same pattern as the DELETE handler\n2. Re-check VIP port overlap inside the synchronized ChainTask before persist\n3. Addchain.next()at all async exit points (VIP acquire callbacks + FlowChain done/error)\n\nThis serializes all port forwarding rule creation per VIP, eliminating the race condition window.\n\n## Test Plan\n- Verify creating port forwarding rules normally still works\n- Verify concurrent creation of rules with overlapping ports on the same VIP returns error for the second request\n- Verify rules on different VIPs can still be created concurrentlysync from gitlab !9174