Data Center Awareness

Examples

One primary data center, one disaster recovery site

Multiple set members can be primary at the main data center. Have a member at a remote site that is never primary (at least, not without human intervention).

{ _id: 'myset',
  members: [
   { _id:0, host:'sf1', priority:1 },
   { _id:1, host:'sf2', priority:1 },
   { _id:2, host:'ny1', priority:0 }
  ]
}

Multi-site with local reads

The following example shows one set member in each of three data centers. At election time, any healthy update to date node, arbitrarily, can become primary. The others are then secondaries and can service queries locally if the client uses slaveOk mode.

{ _id: 'myset',
  members: [
   { _id:0, host:'sf1', priority:1 },
   { _id:1, host:'ny1', priority:1 },
   { _id:2, host:'uk1', priority:1 }
  ]
}

Refer to your driver's documentation for more information about read routing.

Confirming propagation of writes with getLastError

Calling [getLastError] (called "write concern" in some drivers) with w:"majority" (v2.0+) assures the write reaches a majority of the set before acknowledgement. For example, if you had a three-member replica set, calling db.runCommand({getLastError : 1, w : "majority"}) would make sure the last write was propagated to at least 2 servers.

Once a write reaches a majority of the set members, the cluster wide commit has occurred (see Replica Set Design Concepts).

Replicating from nearby members

In v2.0+, secondaries automatically sync data from members which are nearby. You can see the latencies that the mongod process is observing to its peers in the replSetGetStatus command's output. If nearby members are not healthy, more distant members will be used for syncing.

Example output, highlighting new ping time and sync target fields:

> rs.status()
{
        ...
        "syncingTo" : "B:27017",
        "members" : [
                {
                        "_id" : 0,
                        "name" : "A:27017",
                        ...
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "B:27017",
                        ...
                        "pingMs" : 14
                },
                {
                        "_id" : 2,
                        "name" : "C:27017",
                        ...
                        "pingMs" : 271
                }
        ],
        "ok" : 1
}

Tagging (version 2.0+)

Tagging gives you fine-grained control over where data is written. It is:

  • Customizable: you can express your architecture in terms of machines, racks, data centers, PDUs, continents, etc. (in any combination or level that is important to your application).
  • Developer/DBA-friendly: developers do not need to know about where servers are or changes in architecture.

Each member of a replica set can be tagged with one or more physical or logical locations, e.g., {"dc" : "ny", "rack" : "rk1", "ip" : "192.168", "server" : "192.168.4.11"}. Modes can be defined that combine these tags into targets for getLastError's w option.

For example, suppose we have 5 servers, A, B, C, D, and E. A and B are in New York, C and D are in San Francisco, and E is in the cloud somewhere.

Our replica set configuration might look like:

{
    _id : "someSet",
    members : [
        {_id : 0, host : "A", tags : {"dc": "ny"}},
        {_id : 1, host : "B", tags : {"dc": "ny"}},
        {_id : 2, host : "C", tags : {"dc": "sf"}},
        {_id : 3, host : "D", tags : {"dc": "sf"}},
        {_id : 4, host : "E", tags : {"dc": "cloud"}}
    ]
    settings : {
        getLastErrorModes : {
            veryImportant : {"dc" : 3},
            sortOfImportant : {"dc" : 2}
        }
    }
}

Now, when a developer calls getLastError, they can use any of the modes declared to ensure writes are propagated to the desired locations, e.g.:

> db.foo.insert({x:1})
> db.runCommand({getLastError : 1, w : "veryImportant"})

"veryImportant" makes sure that the write has made it to at least 3 tagged "regions", in this case, "ny", "sf", and "cloud". Once the write has been replicated to these regions, getLastError will return success. (For example, if the write was present on A, D, and E, that would be a success condition).

If we used "sortOfImportant" instead, getLastError would return success once the write had made it to two out of the three possible regions. Thus, A and C having the write or D and E having the write would both be "success." If C and D had the write, getLastError would continue waiting until a server in another region also had the write.

Below are some common examples and how you’d specify tags and w modes for them.

Server X should have a copy.

Suppose you want to be able to specify that your backup server (B) should have a copy of a write. Then you'd use the following tags:

Server Tags
B {"backup" : “B”}

To define a mode for "server B should have a copy," create the mode:

backedUp : {“backup” : 1}

You want one server with a "backup" tag to have the write.

So, your config would look like:

{
   _id : replSetName,
   members : [
       {
           “_id” : 0,
           “host” : B,
           “tags” : {"backup" : "B"}
       },
       ...
   ],
   settings : {
       getLastErrorModes : {
           backedUp : {backup : 1}
       }
   }
}

To use this mode in your application, you’d call getLastError with w set to backedUp:

> db.runCommand({getLastError : 1, w : "backedUp"})

In the following examples, we will skip the configuration and the usage for brevity. Tags are always added to a member’s configuration, modes are always added to getLastErrorModes.

Make n backups

Suppose you have three backup servers (B1, B2, B3) and you want at least two of them to have a copy. Then you’d give each of them a unique "backup" tag:

Server Tags
B1 {"backup" : "B1"}
B2 {"backup" : "B2"}
B3 {"backup" : "B3"}

Then you would create the mode:

backedUp : {"backup" : 2}

Make sure there are at least three copies of the data and it is present on at least two continents.

All of the rules up until now have only had one condition, but you can include as many and-conditions as you want. Suppose we have the following:

Server Tags
S1 {"continent" : "nAmerica", "copies" : "S1"}
S2 {"continent" : "nAmerica", "copies" : "S2"}
S3 {"continent" : "nAmerica", "copies" : "S3"}
S4 {"continent" : "Africa", "copies" : "S4"}
S5 {"continent" : "Asia", "copies" : "S5"}

Then create a mode like:

level : {copies : 3, continent : 2}

Note that modes can contain as many clauses as you need.

Make sure at least two servers across at least two racks in nyc have it.

This is a complication of our original example. The key concept here is that not all tags need to be present on all servers. For example, some servers below are tagged with "nyc", others are not.

Server Tags
S1 {"nycRack" : "rk1", "nyc" : "S1"}
S2 {"nycRack" : "rk2", "nyc" : "S2"}
S3 {"nycRack" : "rk2", "nyc" : "S3"}
S4 {"sfRack" : "rk1", "sf" : "S4”}
S5 {"sfRack" : "rk2", "sf" : S5"}

Now our rule would look like:

customerData : {“nycRack” : 2}

Notes

The examples above generally use hostnames (e.g., "nyc" : "S1"). This isn't required, it's just and convenient way to specify a server-unique tag. You could just as well use "foo", "bar", "baz" or "1", "2", "3", or any other identifiers.

Do not use "*" or "$" in tags, these characters are reserved for future use.

Follow @mongodb

MongoDB Pittsburgh - May 15
MongoNYC - May 23
MongoDB Paris - Jun 14
MongoDB UK - Jun 20
MongoDC - June 26


Labels

tagging tagging Delete
Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.

PLEASE POST QUESTIONS IN THE USER GROUPS FORUM. Post non-question comments and helpful hints here.

blog comments powered by Disqus