Replica Sets - Voting

Each replica set contains at most one primary node. The primary is the only member of the set that can accept write operations (inserts, updates, and deletes).

The primary is chosen by an election in which all reachable members vote.

Consensus Vote

For a node to be elected primary, it must receive a majority of the votes. This is a majority of all votes in the set, not just of the reachable members: in a 5-member set with 4 members down, a majority is still 3 votes (floor(5/2) + 1). Each member of the set has a single vote and knows the total number of votes in the set.

If no node can gather a majority, no primary can be elected and nothing can be written to the replica set, although reads from secondaries are still possible.
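The majority rule above is simple arithmetic. The sketch below is illustrative only (the function names are hypothetical, not part of MongoDB) and assumes every member holds exactly one vote, as in the default configuration.

```python
def majority(total_votes: int) -> int:
    """Votes needed to elect a primary: a strict majority of ALL votes in the set."""
    return total_votes // 2 + 1


def can_elect_primary(total_votes: int, reachable_votes: int) -> bool:
    """True if the votes a candidate can reach (its own included) form a majority."""
    return reachable_votes >= majority(total_votes)


# 5-member set with 4 members down: only 1 vote is reachable, but 3 are needed.
print(majority(5))              # 3
print(can_elect_primary(5, 1))  # False -> no primary, no writes
print(can_elect_primary(5, 3))  # True
```

Note that the denominator is always the configured total, so losing members never lowers the bar for an election.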

Arbiters

An arbiter is a member that votes but holds no data. Because it has no data, an arbiter can never become a primary or a secondary.

It is solely used for breaking ties in elections, so at most one arbiter is ever needed.
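To see why an arbiter breaks ties, consider a network partition that splits the set into two sides. This is a hypothetical sketch (the helper names are made up for illustration) that again assumes one vote per member, arbiters included.

```python
from typing import List


def majority(total_votes: int) -> int:
    """Strict majority of all votes in the set."""
    return total_votes // 2 + 1


def sides_able_to_elect(side_a: int, side_b: int) -> List[bool]:
    """For a network split into two sides (votes on each side), report which
    side, if either, still holds a majority of the set's total votes."""
    total = side_a + side_b
    return [side_a >= majority(total), side_b >= majority(total)]


# Four data-bearing members split 2/2: a tie, so neither side can elect a primary.
print(sides_able_to_elect(2, 2))      # [False, False]
# Same split plus one arbiter on side A: 3 of 5 votes, so side A can elect.
print(sides_able_to_elect(2 + 1, 2))  # [True, False]
```

An odd total number of votes guarantees that at most one side of any two-way split can hold a majority, which is why a single arbiter is enough.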

Reachable Node

Members of a set communicate regularly by sending a "heartbeat" to every other member of the set.

If node A fails to receive a heartbeat from node B, A assumes that B is unreachable. A keeps trying to re-establish contact, but until it succeeds it counts B as unreachable when determining whether a majority is reachable.
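A node's view of which peers are reachable can be sketched as a table of "last heartbeat seen" timestamps. This is a minimal illustration, not MongoDB's implementation: the `HeartbeatTracker` class and the `HEARTBEAT_TIMEOUT_SECS` value are hypothetical and chosen only for the example.

```python
import time
from typing import Dict, Iterable, List, Optional

# Hypothetical timeout: how long without a heartbeat before a peer is treated
# as unreachable. Illustrative value only, not MongoDB's actual setting.
HEARTBEAT_TIMEOUT_SECS = 10.0


class HeartbeatTracker:
    """Tracks the last heartbeat received from each peer (hypothetical helper)."""

    def __init__(self, peers: Iterable[str], timeout: float = HEARTBEAT_TIMEOUT_SECS):
        self.timeout = timeout
        # None means no heartbeat has been received yet.
        self.last_seen: Dict[str, Optional[float]] = {p: None for p in peers}

    def record_heartbeat(self, peer: str, now: Optional[float] = None) -> None:
        # A fresh heartbeat makes a previously unreachable peer reachable again.
        self.last_seen[peer] = time.monotonic() if now is None else now

    def is_reachable(self, peer: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(peer)
        return seen is not None and now - seen <= self.timeout

    def reachable_peers(self, now: Optional[float] = None) -> List[str]:
        # These are the peers counted when checking for a reachable majority.
        return [p for p in self.last_seen if self.is_reachable(p, now)]


tracker = HeartbeatTracker(["B", "C"])
tracker.record_heartbeat("B", now=100.0)
tracker.record_heartbeat("C", now=95.0)
print(tracker.reachable_peers(now=106.0))  # ['B']  (C has been silent for 11 s)
```

Unreachable peers are never dropped from the table, mirroring the way a node keeps trying to re-establish contact.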

Triggering an Election

An election is triggered when all of the following are true:

  • a node sees that the primary is not reachable
  • that node is not an arbiter
  • that node has priority greater than or equal to other eligible nodes in the set
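The three conditions above can be expressed as a single predicate. This sketch is hypothetical: the `Member` type and `should_trigger_election` function are made up for illustration, and it assumes the caller passes in the other eligible members (reachable, non-arbiter) to compare priorities against.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str
    priority: float
    arbiter: bool = False


def should_trigger_election(me: Member, primary_reachable: bool,
                            eligible_peers: List[Member]) -> bool:
    """Apply the three trigger conditions listed above (illustrative only)."""
    if primary_reachable:   # condition 1: the primary must be unreachable
        return False
    if me.arbiter:          # condition 2: arbiters never trigger elections
        return False
    # condition 3: priority >= every other eligible member's priority
    return all(me.priority >= peer.priority for peer in eligible_peers)


a = Member("a", priority=1)
b = Member("b", priority=2)
arb = Member("arb", priority=0, arbiter=True)
print(should_trigger_election(a, False, [b]))       # False: lower priority than b
print(should_trigger_election(b, False, [a]))       # True
print(should_trigger_election(arb, False, [a, b]))  # False: arbiter
```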

This means an election is triggered if the primary is shut down (mongod stopped, machine powered off, port blocked, ...). An election can also be triggered if the primary stops responding because of a network issue (DNS, internet connectivity, ...).

Changing votes

In general, do not change the number of votes:
  • Do not change vote weights in an attempt to create a "preferred master" – this will not work. Instead use priorities to achieve this.
  • In a two node replica set, it is far better to have an arbiter than to give one of the two members an extra vote.

By default, each member of a replica set has one vote. The votes field can be set to any non-negative integer, but it is strongly recommended that it be either 0 or 1.

The main reason to change a member's votes is to allow larger replica sets: each replica set is limited to 12 members in total, of which at most 7 can vote.

The number of votes is set per member in the replica set configuration. Do not change any member's votes unless your set has more than seven members.
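As an illustration, the sketch below builds a 9-member configuration shaped like a replica set config document (a `members` array whose entries carry `_id`, `host`, `priority`, and `votes`) and checks it against the limits stated on this page. The host names and the `check_config` helper are hypothetical. Seven members vote; the other two are non-voting (`votes: 0`, given priority 0 here as a conservative assumption), and a preferred primary is expressed through a higher priority rather than extra votes.

```python
from typing import List

MAX_MEMBERS = 12        # limits as stated on this page
MAX_VOTING_MEMBERS = 7


def check_config(config: dict) -> List[str]:
    """Return a list of problems with a replica set config (empty if none)."""
    problems = []
    members = config["members"]
    if len(members) > MAX_MEMBERS:
        problems.append(f"{len(members)} members exceeds the limit of {MAX_MEMBERS}")
    votes = [m.get("votes", 1) for m in members]  # each member defaults to one vote
    for m, v in zip(members, votes):
        if v not in (0, 1):
            problems.append(f"{m['host']}: votes={v}; use 0 or 1 and set priority instead")
    voting = sum(1 for v in votes if v > 0)
    if voting > MAX_VOTING_MEMBERS:
        problems.append(f"{voting} voting members exceeds the limit of {MAX_VOTING_MEMBERS}")
    return problems


config = {
    "_id": "rs0",
    "members": [
        {"_id": i, "host": f"db{i}.example.net:27017",
         # db0 is the preferred primary via priority, not via extra votes
         "priority": 2 if i == 0 else (1 if i < 7 else 0),
         "votes": 1 if i < 7 else 0}
        for i in range(9)
    ],
}
print(check_config(config))  # []
```

Giving db0 `votes: 2` instead would be flagged, which matches the advice above to use priorities for a "preferred master".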
