Introduction to MongoDB
MongoDB is a document database that provides high performance, high availability, and easy scalability.
- Document Database
  - Documents (objects) map nicely to programming language data types.
  - Embedded documents and arrays reduce need for joins.
  - Dynamic schema makes polymorphism easier.
- High Performance
  - Embedding makes reads and writes fast.
  - Indexes can include keys from embedded documents and arrays.
  - Optional streaming writes (no acknowledgments).
- High Availability
  - Replicated servers with automatic master failover.
- Easy Scalability
  - Automatic sharding distributes collection data across machines.
  - Eventually-consistent reads can be distributed over replicated servers.
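To make the point about embedding and joins concrete, here is a minimal in-memory sketch using plain Python dicts and lists. It does not talk to a MongoDB server; the `customers`, `orders`, and `customer_doc` names and their contents are invented for illustration only.

```python
# Relational-style layout: two separate "tables" linked by a key.
customers = [{"id": 1, "name": "Ada"}]
orders = [
    {"customer_id": 1, "item": "keyboard"},
    {"customer_id": 1, "item": "monitor"},
]


def orders_for(customer_id):
    """Emulate a join: scan the orders table for rows matching the key."""
    return [o["item"] for o in orders if o["customer_id"] == customer_id]


# Document-style layout: the orders are embedded in the customer
# document as an array of sub-documents, so no second lookup is needed.
customer_doc = {
    "_id": 1,
    "name": "Ada",
    "orders": [{"item": "keyboard"}, {"item": "monitor"}],
}


def embedded_orders(doc):
    """Read the embedded array directly from the single document."""
    return [o["item"] for o in doc["orders"]]
```

Both functions return the same items; the difference is that the embedded form reads one document instead of combining two data sets.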
MongoDB Data Model
A MongoDB deployment hosts a number of databases. A database holds a set of collections. A collection holds a set of documents. A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data.
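As a rough illustration of dynamic schema, the sketch below models a collection as a Python list of dicts. This is only an in-memory stand-in, not a MongoDB API; the `people` collection and the `field_types` helper are hypothetical names chosen for the example.

```python
# A "collection" modeled as a list of "documents" (dicts).
# The documents do not share the same set of fields, and the common
# field "age" holds different types in different documents.
people = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": "unknown", "languages": ["COBOL"]},
    {"name": "Linus"},  # no "age" field at all
]


def field_types(collection, field):
    """Return the set of type names seen for `field` across documents."""
    return {type(doc[field]).__name__ for doc in collection if field in doc}
```

Calling `field_types(people, "age")` reports both `int` and `str`, which is exactly the situation a fixed relational column would not allow.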
MongoDB provides a set of query operators that define how the find() method selects documents from a collection. A query is expressed as a query specification document that combines exact equality matches with conditions built from query operators.
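The idea of a query specification document can be sketched with a toy matcher written in plain Python. This is not MongoDB's implementation and it ignores much of the real semantics (arrays, missing fields, cross-type comparison); it only handles exact equality, dotted paths into embedded documents, and a small subset of operators (`$gt`, `$gte`, `$lt`, `$lte`, `$in`). The `inventory` data and the helper names `get_path`, `matches`, and `find` are assumptions made for the example.

```python
import operator

# A small subset of comparison operators, mapped to Python functions.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, choices: value in choices,
}

_MISSING = object()  # sentinel for "field not present"


def get_path(doc, path):
    """Resolve a dotted path such as 'warehouse.city' in nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for path, condition in query.items():
        value = get_path(doc, path)
        if value is _MISSING:
            return False
        is_operator_doc = (
            isinstance(condition, dict)
            and condition
            and all(key.startswith("$") for key in condition)
        )
        if is_operator_doc:
            # Conditional match: every operator must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Exact equality match.
            return False
    return True


def find(collection, query):
    """Toy counterpart of find(): filter a list of dicts by `query`."""
    return [doc for doc in collection if matches(doc, query)]


inventory = [
    {"item": "pen", "qty": 50, "warehouse": {"city": "Oslo"}},
    {"item": "ink", "qty": 5, "warehouse": {"city": "Oslo"}},
    {"item": "pad", "qty": 20, "warehouse": {"city": "Lima"}},
]
```

For example, `find(inventory, {"warehouse.city": "Oslo", "qty": {"$gt": 10}})` combines an equality match on an embedded field with a conditional on `qty`, and an empty query `{}` selects every document.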
Although MongoDB supports “standalone” (single-instance) operation, production MongoDB deployments are typically distributed. Replica sets provide high-performance replication with automated failover, while sharded clusters make it possible to partition large data sets over many machines in a way that is transparent to users. MongoDB users combine replica sets and sharded clusters to provide high levels of redundancy for large data sets without exposing that complexity to applications.
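The partitioning idea behind a sharded cluster can be illustrated with a deliberately simplified sketch: documents are routed to a shard according to which range their shard key falls in. The split points, shard names, and the `route` and `distribute` helpers below are all hypothetical, and a real cluster manages and rebalances its ranges automatically rather than using a fixed table like this one.

```python
import bisect

# Hypothetical range boundaries on a string shard key:
#   key < "h"         -> shard0
#   "h" <= key < "p"  -> shard1
#   key >= "p"        -> shard2
SPLIT_POINTS = ["h", "p"]
SHARDS = ["shard0", "shard1", "shard2"]


def route(shard_key):
    """Pick the shard whose key range contains `shard_key`."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, shard_key)]


def distribute(documents, key_field):
    """Group documents by the shard their key field routes to."""
    placement = {shard: [] for shard in SHARDS}
    for doc in documents:
        placement[route(doc[key_field])].append(doc)
    return placement
```

An application would not see this routing step; it issues the same query either way, which is the sense in which partitioning is transparent.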
MongoDB Design Philosophy
MongoDB wasn’t designed in a lab. We built MongoDB from our own experiences building large scale, high availability, robust systems. We didn’t start from scratch, we really tried to figure out what was broken, and tackle that. So the way I think about MongoDB is that if you take MySql, and change the data model from relational to document based, you get a lot of great features: embedded docs for speed, manageability, agile development with schema-less databases, easier horizontal scalability because joins aren’t as important. There are lots of things that work great in relational databases: indexes, dynamic queries and updates to name a few, and we haven’t changed much there. For example, the way you design your indexes in MongoDB should be exactly the way you do it in MySql or Oracle, you just have the option of indexing an embedded field.
—Eliot Horowitz, MongoDB CTO and Co-founder
- New database technologies are needed to facilitate horizontal scaling of the data layer, easier development, and the ability to store order(s) of magnitude more data than was used in the past.
- A non-relational approach is the best path to database solutions which scale horizontally to many machines.
- It is unacceptable if these new technologies make writing applications harder. Writing code should be faster, easier, and more agile.
- The document data model (JSON/BSON) is easy to code to, easy to manage (dynamic schema), and yields excellent performance by grouping relevant data together internally.
- It is important to retain deep functionality to keep programming fast and simple. While some things must be left out, keep as much as possible: for example, secondary indexes, unique key constraints, atomic operations, and multi-document updates.
- Database technology should run anywhere: it should be available both for running on your own servers or VMs and as a pay-for-what-you-use cloud service.
Key MongoDB Features
MongoDB focuses on flexibility, power, speed, and ease of use:
- Flexibility
MongoDB stores data in JSON documents (which we serialize to BSON). JSON provides a rich data model that seamlessly maps to native programming language types, and the dynamic schema makes it easier to evolve your data model than with a system that enforces schemas, such as an RDBMS.
- Power
MongoDB provides a lot of the features of a traditional RDBMS such as secondary indexes, dynamic queries, sorting, rich updates, upserts (update if document exists, insert if it doesn’t), and easy aggregation. This gives you the breadth of functionality that you are used to from an RDBMS, with the flexibility and scaling capability that the non-relational model allows.
- Speed
By keeping related data together in documents, queries can be much faster than in a relational database where related data is separated into multiple tables and then needs to be joined later. MongoDB also makes it easy to scale out your database. Autosharding allows you to scale your cluster linearly by adding more machines. It is possible to increase capacity without any downtime, which is very important on the web, where load can increase suddenly and taking the website down for extended maintenance can cost your business large amounts of revenue.
- Ease of use
MongoDB works hard to be very easy to install, configure, maintain, and use. To this end, MongoDB provides few configuration options, and instead tries to automatically do the “right thing” whenever possible. This means that MongoDB works right out of the box, and you can dive right into developing your application, instead of spending a lot of time fine-tuning obscure database configurations.
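Of the features listed above, the upsert (update if the document exists, insert if it doesn’t) is perhaps the least familiar, so a small in-memory sketch may help. It operates on a plain Python list of dicts rather than a real collection; the `upsert` helper and the `counters` data are hypothetical and only mimic the behavior described.

```python
def upsert(collection, query, fields):
    """Update the first document whose fields equal `query`;
    if none exists, insert a new document built from both arguments.
    Returns "updated" or "inserted" to show which path was taken."""
    for doc in collection:
        if all(key in doc and doc[key] == value for key, value in query.items()):
            doc.update(fields)
            return "updated"
    collection.append({**query, **fields})
    return "inserted"


counters = []
first = upsert(counters, {"page": "home"}, {"hits": 1})   # no match yet
second = upsert(counters, {"page": "home"}, {"hits": 2})  # match exists
```

The caller does not need to check for existence first; a single operation covers both cases.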
MongoDB is a server process that runs on Linux, Windows and OS X. It can be run as either a 32-bit or a 64-bit application. We recommend running in 64-bit mode, since in 32-bit mode MongoDB is limited to a total data size of about 2 GB across all databases.
The MongoDB process listens on port 27017 by default (this can be changed at start time; see the mongod options for more information).
Clients connect to the MongoDB process, optionally authenticate themselves if security is turned on, and perform a sequence of actions, such as inserts, queries and updates.
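Clients usually describe where and how to connect with a `mongodb://` connection string. The sketch below only assembles such a string with the Python standard library; it does not open a connection. The `mongodb_uri` helper, the host name, and the credentials are assumptions for illustration, and the user name and password are percent-encoded so that reserved characters do not break the URI.

```python
from urllib.parse import quote_plus

DEFAULT_PORT = 27017  # the default port mentioned in the text


def mongodb_uri(host="localhost", port=DEFAULT_PORT, user=None, password=None):
    """Build a connection string, with credentials only if a user is given."""
    if user is None:
        return f"mongodb://{host}:{port}/"
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


# Without authentication (security turned off):
plain = mongodb_uri()
# With hypothetical credentials (security turned on):
secured = mongodb_uri(user="app", password="p@ss/word")
```

A driver would take a string like this and then perform the inserts, queries, and updates on the resulting connection.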
MongoDB stores its data in files (the default location is /data/db/) and uses memory-mapped files for efficient data management.
MongoDB can also be configured for data replication.
For more information on MongoDB administration, please see the administration guide.