
<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition #3332

Closed
MatheMatrix wants to merge 11 commits into 5.5.6 from sync/ye.zou/fix/ZSTAC-77711

Conversation

@MatheMatrix
Owner

Summary

  • ZSTAC-77711: with dual MNs, the consistent hash rings diverge; messages get routed to the wrong MN and UI tasks hang
  • Root cause: methods such as nodeJoin/nodeLeft/iJoin and makeDestination are unsynchronized, so the heartbeat thread and the event thread modify nodeHash and nodes concurrently
  • Fix: add a synchronized lock to every method that reads or writes nodeHash/nodes; getManagementNodesInHashRing/getAllNodeInfo return defensive copies (a minimal sketch of this locking pattern follows this list)
  • Additionally fix the nodes.put() return-value bug in getNodeInfo
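
The sketch below illustrates the locking pattern under simplified assumptions. It is not the actual ResourceDestinationMakerImpl code: the hash function, field types, and node payload are placeholders, and only the method names and the nodeHash/nodes fields mirror the PR. The point is that every method touching nodeHash or nodes holds the same monitor, and readers receive defensive copies.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Illustrative only: not the real ResourceDestinationMakerImpl.
    class HashRingLockingSketch {
        private final SortedMap<Integer, String> nodeHash = new TreeMap<>(); // hash -> node UUID
        private final Map<String, String> nodes = new HashMap<>();           // node UUID -> info

        public synchronized void nodeJoin(String nodeUuid, String nodeInfo) {
            nodeHash.put(nodeUuid.hashCode(), nodeUuid);
            nodes.put(nodeUuid, nodeInfo);
        }

        public synchronized void nodeLeft(String nodeUuid) {
            nodeHash.remove(nodeUuid.hashCode());
            nodes.remove(nodeUuid);
        }

        public synchronized String makeDestination(String resourceUuid) {
            if (nodeHash.isEmpty()) {
                throw new IllegalStateException("no management node in the hash ring");
            }
            // walk clockwise: first node at or after the resource's hash, else wrap to the first node
            SortedMap<Integer, String> tail = nodeHash.tailMap(resourceUuid.hashCode());
            return tail.isEmpty() ? nodeHash.get(nodeHash.firstKey()) : tail.get(tail.firstKey());
        }

        // defensive copy so callers never iterate the live ring while nodeJoin/nodeLeft mutate it
        public synchronized List<String> getManagementNodesInHashRing() {
            return new ArrayList<>(nodeHash.values());
        }
    }

Holding a single monitor for both collections is what keeps nodeHash and nodes mutating atomically relative to each other, which is exactly the inconsistency described above.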

Files Changed

  • ResourceDestinationMakerImpl.java — synchronized lock on all methods

Resolves: ZSTAC-77711

sync from gitlab !9154

@coderabbitai

coderabbitai bot commented Feb 12, 2026

Walkthrough

Adds synchronization protection and two-round missing-node confirmation to the management-node hash ring, node-info access, and lifecycle handling; adds device-address sync calls at key VM lifecycle points; adjusts the VM root-disk allocation size calculation and adds a Destroying → Stopped state-machine transition; and adds an optional "try the next MDS" retry switch for ZBS HTTP calls.

Changes

Cohort / File(s) | Summary

Resource destination and node-info synchronization
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java
Makes every method that touches the node hash ring or the node cache synchronized (nodeJoin/nodeLeft/iAmDead/iJoin/makeDestination/isManagedByUs/getManagementNodesInHashRing/getNodeInfo/getAllNodeInfo/getManagementNodeCount/isNodeInCircle), returns collection copies, and backfills the cache from the DB when an entry is missing. Areas of concern: concurrency overhead, lock scope, and deadlock risk.

Lifecycle and heartbeat alignment
portal/src/main/java/org/zstack/portal/managementnode/ManagementNodeManagerImpl.java
Adds lifecycleLock to serialize lifecycle callbacks with heartbeat reconciliation, and a suspectedMissingFromDb set so a node is removed from the hash ring only after two rounds of confirmation; clears the suspect list on nodeJoin and compares DB state against the hash ring across two rounds during reconciliation. Areas of concern: event latency, lock contention, and consistency boundaries.

VM device-address info sync
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
Adds a private method syncVmDevicesAddressInfo(String vmUuid) and calls it on several VM lifecycle paths (start, InstantStart completion, reboot, recovery, restart from pause/start, etc.) to send a device-address sync request to the host and log the result. Areas of concern: handling of asynchronous message failures and performance.

Root-disk allocation size calculation
header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java
When rootDiskOffering is null, getRootDiskAllocateSize now returns the maximum of the image's virtual size and actual size (Math.max(virtualSize, actualSize)) instead of the virtual size alone.

New state-machine transition
header/src/main/java/org/zstack/header/vm/VmInstanceState.java
Adds a state-machine transition from Destroying to Stopped on receipt of a stopped event.

ZBS HTTP call retry switch
plugin/zbs/src/main/java/org/zstack/storage/zbs/ZbsStorageController.java
Adds a setTryNext(boolean) method to the inner HttpCaller<T> class to enable "try the next MDS" retries; the GetVolumeClients call path now uses this HttpCaller with tryNext enabled and may accumulate several errors before failing (a generic sketch of this retry pattern follows the table). Areas of concern: error-aggregation logic and retry semantics.
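
A generic sketch of the "try the next MDS" semantics, assuming a list of MDS addresses and one HTTP round trip per address. The real HttpCaller<T> in ZbsStorageController has a different signature and error types; only the setTryNext(boolean) name comes from the PR.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Function;

    // Sketch only: when tryNext is enabled, errors are aggregated and the next
    // MDS address is attempted; otherwise the caller fails on the first error.
    class TryNextCallerSketch<T> {
        private boolean tryNext = false;

        public TryNextCallerSketch<T> setTryNext(boolean tryNext) {
            this.tryNext = tryNext;
            return this;
        }

        public T call(List<String> mdsAddresses, Function<String, T> httpCall) {
            List<String> errors = new ArrayList<>();
            for (String mds : mdsAddresses) {
                try {
                    return httpCall.apply(mds);          // success on any MDS wins
                } catch (RuntimeException e) {
                    errors.add(String.format("mds[%s]: %s", mds, e.getMessage()));
                    if (!tryNext) {
                        break;                           // legacy behavior: fail fast
                    }
                    // tryNext enabled: keep the error and move on to the next MDS
                }
            }
            throw new RuntimeException("all MDS calls failed: " + errors);
        }
    }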

Sequence Diagram(s)

sequenceDiagram
    participant Heartbeat as HeartbeatReconciler
    participant Manager as ManagementNodeManagerImpl
    participant DB as Database
    participant HashRing as NodeHashRing

    Heartbeat->>Manager: start reconciliation()
    Manager->>Manager: synchronized(lifecycleLock)
    Manager->>HashRing: list nodes in hash ring
    loop for each node
        Manager->>DB: query node by uuid
        alt DB has node
            DB-->>Manager: node exists
            Manager->>HashRing: ensure node present
            Manager->>Manager: suspectedMissingFromDb.remove(node)
        else DB missing node
            DB-->>Manager: not found
            Manager->>Manager: mark node in suspectedMissingFromDb (first detection)
            alt still missing on next reconciliation
                Manager->>HashRing: nodeLeft(node)
                Manager->>Manager: suspectedMissingFromDb.remove(node)
            end
        end
    end
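A minimal sketch of the two-round confirmation shown in the diagram above. The names suspectedMissingFromDb, lifecycleLock, nodeJoin, and nodeLeft come from the PR; the NodeDb/HashRing interfaces here are placeholders, not ZStack APIs.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Sketch only: a node is removed from the hash ring only after being missing
    // from the DB in two consecutive reconciliation rounds.
    class TwoRoundReconcilerSketch {
        interface NodeDb { boolean exists(String nodeUuid); }
        interface HashRing { List<String> listNodes(); void nodeLeft(String nodeUuid); }

        private final Object lifecycleLock = new Object();
        private final Set<String> suspectedMissingFromDb = new HashSet<>();
        private final NodeDb db;
        private final HashRing ring;

        TwoRoundReconcilerSketch(NodeDb db, HashRing ring) {
            this.db = db;
            this.ring = ring;
        }

        void reconcile() {
            synchronized (lifecycleLock) {                         // serialize with lifecycle callbacks
                for (String nodeUuid : ring.listNodes()) {
                    if (db.exists(nodeUuid)) {
                        suspectedMissingFromDb.remove(nodeUuid);   // node is healthy again
                    } else if (suspectedMissingFromDb.remove(nodeUuid)) {
                        ring.nodeLeft(nodeUuid);                   // missing twice in a row: remove
                    } else {
                        suspectedMissingFromDb.add(nodeUuid);      // first miss: only mark as suspect
                    }
                }
            }
        }

        void onNodeJoin(String nodeUuid) {
            synchronized (lifecycleLock) {
                suspectedMissingFromDb.remove(nodeUuid);           // a joining node clears its suspicion
            }
        }
    }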

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 The lock taps softly at the edge of the ring,
I call the roll of the nodes one by one;
Only after two rounds of wind and rain do I let go,
Device addresses quietly go ask the host,
In the spring grass, the system is calm again. 🥕


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error Pull request title is 79 characters long, exceeding the maximum requirement of 72 characters. Shorten the title to 72 characters or less while maintaining clarity. Consider: '[core]: sync hash ring to prevent dual-MN race condition'
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The pull request description is related to the changeset and clearly describes the issue, root cause, and fixes applied to ResourceDestinationMakerImpl.java and related files.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into 5.5.6

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java


AlanJager and others added 6 commits February 12, 2026 12:02
…isk offering

Resolves: ZSTAC-74683

Change-Id: Id0339ed0221e92e506f60745cde972cc3ee6d9ae
When anti-split-brain check selects a disconnected MDS node, the HTTP
call now times out after 30s instead of 5+ minutes, and automatically
retries the next available MDS via tryNext mechanism.

Resolves: ZSTAC-80595

Change-Id: I1be80f1b70cad1606eb38d1f0078c8f2781e6941
When MN restarts during a destroy operation, the hypervisor may report
the VM as Stopped. Without this transition, the state machine throws
an exception and the VM stays stuck in Destroying state forever.

Resolves: ZSTAC-80620

Change-Id: I037edba70d145a44a88ce0d3573089182fedb162
Resolves: ZSTAC-82153

Change-Id: Ib51c2e21553277416d1a9444be55aca2aa4b2fc4
add syncVmDeviceInfo when vm is running

Resolves: ZSTAC-67275

Change-Id: I746e63786a677676676b6d6265657063666b7677
<fix>[compute]: add syncVmDeviceInfo when vm is running

See merge request zstackio/zstack!9197
@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/fix/ZSTAC-77711 branch from e8732a5 to 312bd83 Compare February 15, 2026 17:13

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)

80-93: ⚠️ Potential issue | 🔴 Critical

Fix the null return from getNodeInfo caused by using put's return value.

Map.put returns the previous value, so the current code assigns null to info and returns it, which is functionally wrong. Create the NodeInfo first, then put it into the map, and return the new object.

🔧 Suggested fix
         if (info == null) {
             ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
             if (vo == null) {
                 throw new ManagementNodeNotFoundException(nodeUuid);
             }

-            nodeHash.add(nodeUuid);
-            info = nodes.put(nodeUuid, new NodeInfo(vo));
+            info = new NodeInfo(vo);
+            nodeHash.add(nodeUuid);
+            nodes.put(nodeUuid, info);
         }

         return info;

gitlab and others added 5 commits February 16, 2026 00:50
<fix>[vm]: add Destroying->Stopped state transition

See merge request zstackio/zstack!9156
<fix>[i18n]: improve snapshot error message for unattached volume

See merge request zstackio/zstack!9192
<fix>[zbs]: reduce mds connect timeout and enable tryNext for volume clients

See merge request zstackio/zstack!9153
<fix>[vm]: use max of virtual and actual size for root disk allocation

See merge request zstackio/zstack!9155
…talling

In dual management node scenarios, concurrent modifications to the
consistent hash ring from heartbeat reconciliation and canonical event
callbacks can cause NodeHash/Nodes inconsistency, leading to message
routing failures and task timeouts.

Fix: (1) synchronized all ResourceDestinationMakerImpl methods to
ensure atomic nodeHash+nodes updates, (2) added lifecycleLock in
ManagementNodeManagerImpl to serialize heartbeat reconciliation with
event callbacks, (3) added two-round delayed confirmation before
removing nodes from hash ring to avoid race with NodeJoin events.

Resolves: ZSTAC-77711

Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/fix/ZSTAC-77711 branch from 312bd83 to 26b8b1a Compare February 16, 2026 02:22

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
header/src/main/java/org/zstack/header/vm/VmInstanceState.java (1)

166-172: ⚠️ Potential issue | 🟠 Major

Consider a compatibility switch for the new state transition (Line 166-172)
The new Destroying → Stopped transition changes existing state-machine behavior and may affect users who depend on the old behavior. Consider a version/configuration switch, or enabling it only for new installations and specific upgrade paths, so that existing behavior is not changed outright.

As per coding guidelines, "do not directly change behavior produced by existing code; even if it is a bug, users may depend on it. Consider one behavior for upgraded users and another for new installations, controlled by a switch."
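
A hypothetical illustration of the suggested switch; ZStack's actual global-config mechanism and VmInstanceState implementation are not shown here. The only point is that the Destroying → Stopped transition is registered when a compatibility flag allows it, so the old behavior is preserved otherwise.

    import java.util.EnumMap;
    import java.util.Map;

    // Sketch only: a transition table gated by a compatibility flag.
    class GatedTransitionSketch {
        enum State { Destroying, Stopped }
        enum Event { stopped }

        private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);

        GatedTransitionSketch(boolean allowDestroyingToStopped) {
            if (allowDestroyingToStopped) {
                transitions
                    .computeIfAbsent(State.Destroying, s -> new EnumMap<>(Event.class))
                    .put(Event.stopped, State.Stopped);
            }
        }

        State next(State current, Event event) {
            Map<Event, State> byEvent = transitions.get(current);
            if (byEvent == null || !byEvent.containsKey(event)) {
                throw new IllegalStateException(current + " cannot handle event " + event);
            }
            return byEvent.get(event);
        }
    }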

core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)

80-93: ⚠️ Potential issue | 🟠 Major

Fix the NullPointerException caused by misusing the return value of Map.put().

Line 89, info = nodes.put(nodeUuid, new NodeInfo(vo)), uses the return value of put(), but put() returns the previous value (null when the entry is cached for the first time) rather than the newly inserted object. So info is assigned null on the first call and the method returns null. Downstream chained calls such as getNodeInfo(nodeUuid).getNodeIP() in ManagementNodeManagerImpl and CloudBusImpl3 then throw a NullPointerException.

Store the newly created NodeInfo in its own variable and return that object:

Suggested fix
             nodeHash.add(nodeUuid);
-            info = nodes.put(nodeUuid, new NodeInfo(vo));
+            NodeInfo newInfo = new NodeInfo(vo);
+            nodes.put(nodeUuid, newInfo);
+            info = newInfo;
🧹 Nitpick comments (2)
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java (1)

8476-8496: Suggest caching hostUuid to avoid drift in async logging.
Re-reading self.getHostUuid() inside the async callback may not match the value used at send time, producing misleading logs. Cache it at the start of the method and reuse it.

♻️ Suggested change
 private void syncVmDevicesAddressInfo(String vmUuid) {
-    if (self.getHostUuid() == null) {
+    String hostUuid = self.getHostUuid();
+    if (hostUuid == null) {
         return;
     }
     SyncVmDeviceInfoMsg msg = new SyncVmDeviceInfoMsg();
     msg.setVmInstanceUuid(vmUuid);
-    msg.setHostUuid(self.getHostUuid());
-    bus.makeTargetServiceIdByResourceUuid(msg, HostConstant.SERVICE_ID, msg.getHostUuid());
+    msg.setHostUuid(hostUuid);
+    bus.makeTargetServiceIdByResourceUuid(msg, HostConstant.SERVICE_ID, hostUuid);
     bus.send(msg, new CloudBusCallBack(msg) {
         @Override
         public void run(MessageReply reply) {
             if (!reply.isSuccess()) {
                 logger.warn(String.format("Failed to sync vm device info for vm[uuid:%s], %s",
                     vmUuid, reply.getError()));
             } else {
                 logger.debug(String.format("Sent SyncVmDeviceInfoMsg for vm[uuid:%s] on host[uuid:%s]",
-                    vmUuid, self.getHostUuid()));
+                    vmUuid, hostUuid));
             }
         }
     });
 }
header/src/main/java/org/zstack/header/vm/VmInstanceSpec.java (1)

848-855: NullPointerException risk; defensive handling needed

When rootDiskOffering is null, the code directly calls this.getImageSpec().getInventory().getSize() and getActualSize(). There are two NPE risks:

  1. getInventory() returns null: the ImageSpec inventory field may be uninitialized, and getInventory() performs no null check
  2. Unboxing NPE: size and actualSize in ImageInventory are both Long (nullable); unboxing to a primitive throws an NPE

Suggested defensive checks:

♻️ Suggested improvement
 public long getRootDiskAllocateSize() {
     if (rootDiskOffering == null) {
+        ImageInventory inv = this.getImageSpec().getInventory();
+        if (inv == null) {
+            return 0L;
+        }
-        long virtualSize = this.getImageSpec().getInventory().getSize();
-        long actualSize = this.getImageSpec().getInventory().getActualSize();
+        long virtualSize = inv.getSize();
+        long actualSize = inv.getActualSize();
         return Math.max(virtualSize, actualSize);
     }
     return rootDiskOffering.getDiskSize();
 }

@MatheMatrix MatheMatrix deleted the sync/ye.zou/fix/ZSTAC-77711 branch February 16, 2026 13:25
