concepts
- config database - the top level database that stores information about servers and where things live.
- shard - either a single server or a replica pair.
- database - one top level namespace. a database can be partitioned or not
- chunk - a region of data from a particular collection. A chunk can be thought of as (collectionname,fieldname,lowvalue,highvalue). The range is inclusive on the low end and exclusive on the high end, i.e., [lowvalue,highvalue).
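The half-open range can be illustrated with a minimal sketch. The `Chunk` class, its field names, and the sample values below are hypothetical and only mirror the (collectionname,fieldname,lowvalue,highvalue) tuple described above; they are not taken from the actual implementation.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    """Hypothetical model of (collectionname, fieldname, lowvalue, highvalue)."""
    collection: str
    field: str
    low: int
    high: int

    def contains(self, value: int) -> bool:
        # Inclusive on the low end, exclusive on the high end: [low, high)
        return self.low <= value < self.high


# Sample chunk covering user_id values 100 through 199.
chunk = Chunk("users", "user_id", 100, 200)
in_low = chunk.contains(100)    # low bound belongs to the chunk
in_high = chunk.contains(200)   # high bound belongs to the next chunk
```

Because the high end is exclusive, adjacent chunks such as [100,200) and [200,300) never overlap and leave no gap between them.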
components and database collections
- config database
- config.servers - this contains all of the servers that the system has. These are logical servers. So for a replica pair, the entry would be 192.168.0.10,192.168.0.11
- config.databases - all of the databases known to the system. This contains the primary server for a database, and information about whether it is partitioned or not.
- config.shards - a list of all database shards. Each shard is a db pair, and each member of the pair runs a db process.
- config.homes - specifies which shard is home for a given client db.
- shard databases
- client.system.chunklocations - the home shard for a given client db contains a client.system.chunklocations collection. this collection lists where to find particular chunks; that is, it maps chunk->shard.
- mongos process
- "routes" requests to the proper dbs, and performs merges. there can be a couple per system, or 1 per client server.
- gets chunk locations from the client db's home shard. these are loaded lazily to avoid using too much memory.
- chunk information is cached by mongos. This information can be stale at a mongos (it is always up to date at the owning shard; you cannot migrate an item if the owning shard is down). If the cached information is stale, the shard contacted will tell mongos so, and mongos can then retry at the proper location.
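The lazy loading and stale-cache retry described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real mongos logic: `Shard`, `Router`, `StaleChunkError`, and the `home_locations` dict (standing in for client.system.chunklocations on the home shard) are all hypothetical names, and the sketch assumes mongos refreshes the location from the home shard after a shard reports that it no longer owns the chunk.

```python
class StaleChunkError(Exception):
    """Raised by a shard that is asked about a chunk it does not own."""


class Shard:
    def __init__(self, name):
        self.name = name
        self.owned = set()  # chunk ids this shard currently owns

    def query(self, chunk_id):
        if chunk_id not in self.owned:
            raise StaleChunkError(chunk_id)
        return f"{chunk_id}@{self.name}"


class Router:
    """Hypothetical stand-in for a mongos process."""

    def __init__(self, shards, home_locations):
        self.shards = shards                  # shard name -> Shard
        self.home_locations = home_locations  # authoritative chunk -> shard name
        self.cache = {}                       # lazily filled, may go stale

    def locate(self, chunk_id):
        # Load a location only when it is first needed.
        if chunk_id not in self.cache:
            self.cache[chunk_id] = self.home_locations[chunk_id]
        return self.cache[chunk_id]

    def query(self, chunk_id):
        try:
            return self.shards[self.locate(chunk_id)].query(chunk_id)
        except StaleChunkError:
            # The contacted shard said our cache is stale: drop it and retry.
            del self.cache[chunk_id]
            return self.shards[self.locate(chunk_id)].query(chunk_id)


a, b = Shard("a"), Shard("b")
a.owned.add("c1")
home = {"c1": "a"}
router = Router({"a": a, "b": b}, home)
first = router.query("c1")   # served by shard a, location now cached

# Simulate a migration of c1 from a to b behind the router's back.
a.owned.discard("c1")
b.owned.add("c1")
home["c1"] = "b"
second = router.query("c1")  # cache is stale, so the router retries on b
```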
db operations
- moveprimary - move a database's primary server
- migrate - migrate a chunk from one machine to another.
- lock and migrate
- the shard dbs coordinate with the home shard to atomically hand over ownership of the chunk (two-phase commit)
- split - split a chunk that is growing too large into two. because the two new chunks are on the same machine after the split, this is really just a metadata update and is very fast.
- reconfiguration operations
- add shard - dbgrid processes should lazily load information about a new (unknown) shard when they encounter it.
- retire shard - in background gradually migrate all chunks off
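To show why a split is only a metadata update, here is a small sketch. The `split_chunk` function and the dict-based chunk->shard map are assumptions made for illustration; the map loosely stands in for the chunk location data kept on the home shard.

```python
def split_chunk(locations, chunk, split_point):
    """Replace one [low, high) chunk with two chunks on the same shard.

    `locations` is a hypothetical chunk -> shard map keyed by (low, high).
    No documents are copied; only the map changes.
    """
    low, high = chunk
    if not low < split_point < high:
        raise ValueError("split point must fall strictly inside the chunk")
    shard = locations.pop(chunk)
    left, right = (low, split_point), (split_point, high)
    locations[left] = shard
    locations[right] = shard
    return left, right


locations = {(0, 100): "shard-a"}
left, right = split_chunk(locations, (0, 100), 40)
```

Both halves stay on the original shard, so the cost is independent of how much data the chunk holds. Moving one of the halves elsewhere would be a separate migrate.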
minimizing lock time
Migrating a 50MB chunk might take 5-10 seconds, which is too long for the chunk to stay locked.
We could perform the migrate much like Cloner works, where we copy the objects and then apply all operations that happened during copying. This way lock time is minimal.
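The copy-then-replay idea can be sketched as below. This is a single-process simulation under assumptions, not how Cloner is actually implemented: `migrate_chunk`, `apply_op`, and the `("set" | "delete", key, value)` operation format are hypothetical, and the writes that arrive during the copy are passed in as a list to keep the example deterministic.

```python
def apply_op(docs, op):
    """Apply one hypothetical write operation to a dict of documents."""
    kind, key, value = op
    if kind == "set":
        docs[key] = value
    elif kind == "delete":
        docs.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {kind}")


def migrate_chunk(source, concurrent_ops):
    # Phase 1 (no lock held): bulk copy a snapshot of the chunk.
    dest = dict(source)

    # Writes keep arriving at the source while the copy runs;
    # each one is applied there and recorded for later replay.
    oplog = []
    for op in concurrent_ops:
        apply_op(source, op)
        oplog.append(op)

    # Phase 2 (brief lock): replay only the recorded operations,
    # which is far less work than copying the whole chunk.
    for op in oplog:
        apply_op(dest, op)
    return dest


source = {1: "a", 2: "b"}
ops = [("set", 3, "c"), ("delete", 1, None), ("set", 2, "B")]
dest = migrate_chunk(source, ops)
```

The lock only needs to cover the replay phase, so its duration depends on the number of writes made during the copy rather than on the chunk size.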