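When the disk heartbeat is lost, whichever node loses it is the node that fails; no selection is involved. If every node loses the disk heartbeat, all nodes fail.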

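The policy for deciding which node survives a heartbeat loss depends on the version. In 11g, the node with the lowest node number survives. In 12c, an internal algorithm computes a weight for each node in real time: the node with the higher weight survives and the one with the lower weight is evicted, so the outcome is not deterministic. The details of the weight calculation have not been published, but it mainly takes OS load, performance, and similar factors into account. (See Doc ID 1951726.1.)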

Doc ID 1951726.1: "In 11.2 or even older version, the lowest number node will survive when split brain takes place, however this has changed in 12.1.0.2 with the introduction of node weight. Started from 12.1.0.2, during split brain resolution, node with higher weight will survive."
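
The difference between the two policies can be sketched in a few lines of Python. This is only an illustration: the `Node` type, its `weight` field, and the tie-break on node number are assumptions made for the example, since how Oracle actually computes the weight is not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    number: int    # cluster node number
    weight: float  # hypothetical stand-in for Oracle's undisclosed runtime weight


def pick_survivor(nodes, weight_based):
    """Return the node that survives split-brain resolution.

    weight_based=False models 11.2 and older: the lowest node number survives.
    weight_based=True models 12.1.0.2 and later: the highest weight survives.
    Ties on weight fall back to the lowest node number (an assumption of
    this sketch, not something the article states).
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if weight_based:
        # Sort key: highest weight first, then lowest node number.
        return min(nodes, key=lambda n: (-n.weight, n.number))
    return min(nodes, key=lambda n: n.number)


nodes = [Node(number=1, weight=0.4), Node(number=2, weight=0.9)]
print(pick_survivor(nodes, weight_based=False).number)  # 11g-style policy
print(pick_survivor(nodes, weight_based=True).number)   # 12c-style policy
```

With these made-up weights the two policies disagree: the 11g-style rule keeps node 1, while the weight-based rule keeps node 2. Because real weights change with runtime conditions, the 12c survivor cannot be predicted in advance.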

The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node.

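With Oracle Clusterware alone, the default misscount in 11g is 30 seconds. When third-party clusterware is also in use, the default is 600 seconds, which leaves the third-party cluster software enough time to handle a split brain.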

The following shows the default misscount on a RAC + ServerSAN architecture:

$ crsctl get css misscount
CRS-4678: Successful get misscount 600 for Cluster Synchronization Services.
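
For monitoring scripts, the value can be pulled out of that message and compared with the two defaults mentioned in this article. The following is a minimal sketch that works on the output text only. The helper names `parse_misscount` and `describe` are hypothetical, and the regular expression assumes the message format shown above.

```python
import re

# Defaults quoted in this article for 11g.
ORACLE_ONLY_DEFAULT = 30          # Oracle Clusterware alone
VENDOR_CLUSTERWARE_DEFAULT = 600  # third-party clusterware also present

# Assumes the CRS-4678 message format shown in the output above.
_MISSCOUNT_RE = re.compile(r"CRS-4678: Successful get misscount (\d+) for")


def parse_misscount(output):
    """Extract the misscount value, in seconds, from crsctl output text."""
    match = _MISSCOUNT_RE.search(output)
    if match is None:
        raise ValueError("no CRS-4678 misscount line found")
    return int(match.group(1))


def describe(seconds):
    """Map a misscount value onto the defaults described in the article."""
    if seconds == ORACLE_ONLY_DEFAULT:
        return "default for Oracle Clusterware alone"
    if seconds == VENDOR_CLUSTERWARE_DEFAULT:
        return "default with third-party clusterware"
    return "non-default value"


sample = (
    "CRS-4678: Successful get misscount 600 "
    "for Cluster Synchronization Services."
)
seconds = parse_misscount(sample)
print(seconds, "->", describe(seconds))
```

Keeping the parsing separate from running `crsctl` means the check can be tested against saved output without access to a cluster.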

For Oracle Clusterware on its own, 30 seconds is already a fairly long time. When third-party clusterware is in use, Oracle's advice is: "Do not change default misscount values if you are running Vendor Clusterware along with Oracle Clusterware. The default values for misscount should not be changed when using vendor clusterware. Modifying misscount in this environment may cause clusterwide outages and potential corruptions." In that case the main concern is how quickly the third-party cluster software can respond.

-- By 许望 (RHCA, OCM, VCP)
Last modified: October 23, 2019, 11:13 AM