Update 5/24/2022: I’ve updated the DMV to reflect the various node and witness dynamic settings with a new column at the end of the DMV called number_of_current_votes. This is only in 2022 and will not be backported.
Background
I’ve now run across a few different instances where the monitoring for quorum was done via this DMV. On the surface, it seems like nothing would be wrong with using the “number_of_quorum_votes” column to check on the members of the cluster and see their voting status. However, this isn’t quite the case… you see there are various mechanisms that influence whether or not a member (or witness) has a vote and these continue to be expanded in each version of WSFC.
Big deal, what’s the problem?
When Failover Cluster Instances (FCI’s) were first implemented in the early versions of Windows when clustering was just getting started, there was an attribute that could be accessed via the WSFC API which would return the vote of the member. This worked well for the time and eventually WSFC continued to evolve and get immensely better!
Then, in 2012, came Availability Groups (AG’s) and a new host of DMVs. It was during this time that the DMV’s reflection of this data was chosen to continue to use the standard attribute for this, which is whether the member can have a vote. However, in the same year release of Windows Server, there was a new attribute introduced for a more dynamic approach to voting – this simplified administrators’ jobs of WSFC by automagically changing voting based on member and configuration availability.
What should I do instead?
Continue to ask WSFC what the actual truth is and stop using the DMV for quorum-based information. Additionally, there are various other member_types which are not reflected in the DMV.
Pictures are worth… 17 ½ words?
Here’s what the DMV says:
However, here is the real state in the cluster:
Can it be fixed?
Yes, it can. If it were to be fixed there would need to be another column added to give a better picture of the cluster and the states. Additionally due to earlier versions of Windows Server that SQL Server could technically run on includes versions where these changes are not actually reflected in WSFC as they are before the changes, it would add extra overhead to check for earlier versions – which means it wouldn’t make sense to make this change in any version of SQL Server that could be run on Windows Server 2008R2 or before… which is thankfully dwindling per day. If anything, the change could be made to 2019 and going forward.
Last thoughts…
DMV queries are not free. I understand that most DBAs do believe they are materialized from the Aether without any cost, but it is not free. In this case, it is even worse as querying DMVs that implemented the need to go outside of SQL Server and call WSFC APIs to gather information. The WSFC API may return quickly or may not, it also requires going preemptive for scheduling, and causes more WSFC API calls which might already be high due to other monitoring solutions that might be running.
It is something that is fine to query every once and a while, but I wouldn’t make a habit of doing this at regular short intervals.
1 thought on “Stop using Sys.dm_hadr_cluster_members for “quorum” information”
Comments are closed.