|
On the master mongod instance, the local database contains a collection, oplog.$main, which stores a high-level transaction log. The transaction log essentially describes all actions performed by the user, such as "insert this object into this collection." Note that the oplog is not a low-level redo log, so it does not record operations at the byte/disk level. The slave mongod instance polls the oplog.$main collection on the master. The actual query looks like this:
local.oplog.$main.find({ ts: { $gte: last_op_processed_time } }).sort({ $natural: 1 });
where 'local' is the master instance's local database. The oplog.$main collection is a capped collection, allowing the oldest data to be aged out automatically. See the Replication section of the Mongo Developers' Guide for more information.

OpTime

An OpTime is a 64-bit timestamp that we use to timestamp operations. These are stored as JavaScript Date datatypes but are not JavaScript Date objects. Implementation details can be found in the OpTime class in repl.h.

Applying OpTime Operations

Operations from the oplog are applied on the slave by re-executing the operation. Naturally, the log includes write operations only. Note that inserts are transformed into upserts to ensure consistency on repeated operations. For example, if the slave crashes, we won't know exactly which operations have been applied. So if we're left with operations 1, 2, 3, 4, and 5, and if we then apply 1, 2, 3, 2, 3, 4, 5, we should achieve the same results. This repeatability property is also used for the initial cloning of the replica.

Tailing

After applying operations, we want to wait a moment and then poll again for new data with our $gte operation. We want this operation to be fast, quickly skipping past old data we have already processed. However, we do not want to build an index on ts, as indexing can be somewhat expensive, and the oplog is write-heavy. Instead, we use a table scan in natural order, but use a tailable cursor to "remember" our position. Thus, we only scan once, and then when we poll again, we know where to begin.

Initiation

To create a new replica, we do the following:

t = now();
cloneDatabase();
end = now();
applyOperations(t..end);

cloneDatabase effectively exports/imports all the data in the database. Note the actual "image" we will get may or may not include data modifications in the time range (t..end). Thus, we apply all logged operations from that range when the cloning is complete. Because of our repeatability property, this is safe. See class Cloner for more information.
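The repeatability property can be illustrated with a small in-memory sketch. This is not the server's implementation and makes no MongoDB calls: the entry shape ({ ts, op, o }), the function names applyOp and applyOperations, and the treatment of updates as full-document replacements are all assumptions made for illustration. It replays a toy log cleanly, replays it again with an overlap (1, 2, 3, 2, 3, 4, 5), and applies the range t..end on top of a clone image that already includes some of those changes.

```javascript
// In-memory sketch only; every name and the entry shape are hypothetical.

// Apply one logged operation to a replica (a Map keyed by _id).
// Inserts are applied as upserts, so applying one twice is harmless.
// Updates are assumed to carry the full replacement document.
function applyOp(replica, entry) {
  if (entry.op === "i" || entry.op === "u") {
    replica.set(entry.o._id, entry.o); // upsert: create or overwrite
  } else if (entry.op === "d") {
    replica.delete(entry.o._id); // deleting a missing document is a no-op
  }
}

// Apply every logged operation with from <= ts <= to, in log order.
function applyOperations(replica, oplog, from, to) {
  for (const entry of oplog) {
    if (entry.ts >= from && entry.ts <= to) applyOp(replica, entry);
  }
}

// Toy log with integer timestamps standing in for OpTime values.
const oplog = [
  { ts: 1, op: "i", o: { _id: "a", n: 1 } },
  { ts: 2, op: "i", o: { _id: "b", n: 2 } },
  { ts: 3, op: "u", o: { _id: "a", n: 10 } },
  { ts: 4, op: "d", o: { _id: "b" } },
  { ts: 5, op: "i", o: { _id: "c", n: 3 } },
];

// Clean run: operations 1, 2, 3, 4, 5.
const clean = new Map();
applyOperations(clean, oplog, 1, 5);

// Crash and restart: operations 1, 2, 3, then 2, 3, 4, 5.
const replayed = new Map();
applyOperations(replayed, oplog, 1, 3);
applyOperations(replayed, oplog, 2, 5);

// Initiation with t = 3 and end = 5: the clone image happens to
// reflect operations through ts 4, then the range t..end is applied.
const cloned = new Map([["a", { _id: "a", n: 10 }]]);
applyOperations(cloned, oplog, 3, 5);

// Serialize a replica with keys in sorted order for comparison.
const snapshot = (replica) =>
  JSON.stringify(
    [...replica.entries()].sort(([x], [y]) => x.localeCompare(y))
  );

console.log(snapshot(clean) === snapshot(replayed));
console.log(snapshot(clean) === snapshot(cloned));
```

The sketch only holds because every operation in it sets or removes a whole document. An operation whose effect depends on the current value, such as an increment, would not be safe to replay in this form.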
|
