What is MongoDB?

Introduction

Many developers start with relational databases, which have been around for decades and are the most familiar. Nowadays, though, programmers use NoSQL databases alongside (or instead of) traditional ones, because they perform better in specific scenarios.

MongoDB is one of the most widely used NoSQL (non-relational) databases, and you are bound to encounter it sooner or later. Let's explore what MongoDB is, when it is used, and what its features are.

Comparison

The structure of a relational database looks like a spreadsheet. Data is stored in tables made of rows and columns, where each row is a record and each column is a field of that record. Tables are linked through keys: a foreign key in one table refers to the primary key of another. For example, the User table has the columns ID and Name, and the Sport table has the columns User_id and Name. User_id in Sport refers to the ID column of User.

example4

In MongoDB, this example can be represented as follows, by merging the data from the two tables:

example3

The idea of MongoDB is to keep related data together in one place instead of spreading it across separate tables. In the example, three documents are shown, each with a different ID so that it can be identified, and the data is stored as key-value pairs. Keys correspond to column names in a relational database, and values correspond to the values stored in a record's fields.
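To make the mapping concrete, here is a small Python sketch that takes rows shaped like the User and Sport tables above and folds them into one document per user, with the sports embedded as an array. The sample names are invented for illustration, plain dicts stand in for MongoDB documents, and `to_documents` is a hypothetical helper, not part of any MongoDB API.

```python
# Hypothetical rows mirroring the User and Sport tables (sample data is invented).
users = [
    {"ID": 1, "Name": "Alice"},
    {"ID": 2, "Name": "Bob"},
]
sports = [
    {"User_id": 1, "Name": "Tennis"},
    {"User_id": 1, "Name": "Swimming"},
    {"User_id": 2, "Name": "Football"},
]


def to_documents(users, sports):
    """Embed each user's sports in the user document instead of joining tables."""
    docs = []
    for user in users:
        docs.append({
            "_id": user["ID"],
            "name": user["Name"],
            # The join Sport.User_id == User.ID becomes an embedded array.
            "sports": [s["Name"] for s in sports if s["User_id"] == user["ID"]],
        })
    return docs


documents = to_documents(users, sports)
for doc in documents:
    print(doc)
```

Reading a user together with their sports now touches one document, which is the main payoff of this layout.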

MongoDB's greatest strengths are that it scales easily and that it accepts data without requiring you to define its types in advance. As a result, records of the same model can have different sets of keys, and the same key can hold values of different types. A relational table with a fixed schema does not allow this.
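As an illustration, the three records below could all live in one MongoDB collection, while a relational table would reject the second and third. The sketch uses a plain Python list in place of a real collection, and `field_types` is a hypothetical helper that reports which keys appear and which value types each key holds.

```python
from collections import defaultdict

# A hypothetical "users" collection: same model, different keys and value types.
users = [
    {"_id": 1, "name": "Alice", "age": 31},
    {"_id": 2, "name": "Bob", "age": "thirty"},             # same key, different type
    {"_id": 3, "name": "Carol", "email": "c@example.com"},  # extra key, no "age"
]


def field_types(collection):
    """Map each field name to the set of Python type names seen for it."""
    seen = defaultdict(set)
    for doc in collection:
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


print(field_types(users))
```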

Terminology

Document

A document is the basic unit of data in MongoDB, which is why MongoDB is called a document database. Each document consists of key-value pairs, and you do not have to declare in advance how many pairs there will be or what type of data they will hold (numbers, text, arrays, etc.). Documents look like JavaScript Object Notation (JSON) objects, but MongoDB stores them in a binary-encoded variant called Binary JSON (BSON), which also supports extra types such as dates and binary data.
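To show what "binary JSON" means in practice, here is a minimal Python encoder for a tiny subset of the BSON specification: only UTF-8 strings (type 0x02) and 32-bit integers (type 0x10). It is an illustration, not a replacement for a real BSON library such as the one that ships with PyMongo. For `{"hello": "world"}` it produces the 22-byte sequence commonly used as the example in the BSON specification.

```python
import struct


def encode_cstring(s: str) -> bytes:
    # BSON cstring: UTF-8 bytes followed by a null terminator.
    return s.encode("utf-8") + b"\x00"


def encode_document(doc: dict) -> bytes:
    """Encode a flat dict of str/int values as BSON (sketch: types 0x02 and 0x10 only)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 byte length (including terminator), then the bytes.
            body += b"\x02" + encode_cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # 0x10 = 32-bit signed little-endian integer (out-of-range values raise struct.error).
            body += b"\x10" + encode_cstring(key) + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # Document = int32 total length (counting itself), the elements, a trailing 0x00.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(encoded)
```

Unlike JSON text, every document and string carries its length up front, so a reader can skip over fields without parsing them.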

_id

Every MongoDB document must have an _id field. It uniquely identifies the document within its collection and acts as its primary key. If you insert a document without an _id field, MongoDB automatically adds one. By default, its value is an ObjectId. See its description here https://www.mongodb.com/docs/manual/reference/method/ObjectId/
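The sketch below is a hypothetical Python generator that follows the ObjectId layout described in the linked documentation: a 4-byte timestamp in seconds since the Unix epoch, a 5-byte random value chosen once per process, and a 3-byte counter that starts at a random value. Real drivers generate ObjectIds for you; `new_object_id` and `timestamp_of` are illustrative names, not driver APIs.

```python
import os
import threading
import time

# Per-process state for the ObjectId layout:
# 4-byte timestamp | 5-byte random value | 3-byte counter.
_PROCESS_RANDOM = os.urandom(5)
_counter = int.from_bytes(os.urandom(3), "big")
_lock = threading.Lock()


def new_object_id() -> str:
    """Return a 24-character hex string laid out like a MongoDB ObjectId."""
    global _counter
    with _lock:
        _counter = (_counter + 1) % 0x1000000  # wrap within 3 bytes
        count = _counter
    raw = int(time.time()).to_bytes(4, "big") + _PROCESS_RANDOM + count.to_bytes(3, "big")
    return raw.hex()


def timestamp_of(object_id: str) -> int:
    """Read the creation time (seconds since the Unix epoch) from the first 4 bytes."""
    return int.from_bytes(bytes.fromhex(object_id)[:4], "big")


oid = new_object_id()
print(oid, timestamp_of(oid))
```

Because the timestamp comes first, ObjectIds created later generally sort after earlier ones, and the creation time can be recovered from the _id alone.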

Collection

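A collection is a group of documents. By default, a collection does not enforce a schema, although you can add schema validation rules if you need them. A collection is the equivalent of a table in a relational database.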

Database

A database in MongoDB holds a set of collections. Each database may contain zero or more collections.

The following table shows how relational database terminology corresponds to MongoDB terminology.

example7

Use cases

MongoDB is a good fit when your data may grow large and does not have the kind of relationships between records that SQL databases express with joins.

A few use cases might be:

A product catalog where each product can have different attributes, so products cannot share a single fixed set of columns.
An application that handles varying data formats and does not need to store data in a particular structure or hierarchy.
- For example, mobile or social networking applications, where data comes from multiple sources and grows rapidly.
Real-time data, such as periodic sensor readings
- Marine thermometers/current indicators
- Wind/humidity sensors
Logs
- Access/error logs
- Performance logs

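Trying to model heavily relational data in MongoDB, for example by splitting it across many collections that must be joined on every query, usually results in poor performance and is considered an anti-pattern.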

MongoDB Features

Document-oriented model

All data is stored in documents as fields (key-value pairs) instead of rows and columns, which makes the data much more flexible than in a relational database management system (RDBMS). Each document has a unique ID, and documents are grouped into collections.

Ad-hoc Queries

When we design a database schema, we cannot know in advance every query we will need to run. MongoDB supports ad-hoc queries, which is one of its most significant benefits. Ad-hoc queries are queries that were not anticipated while modeling the database; they are short-lived and created at runtime. For example, suppose a company monitors its average daily sales and one day finds that sales are lower than the day before. Administrators can then write a series of ad-hoc queries to identify the reasons for the decrease.
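As a sketch of the sales example, the Python code below evaluates a MongoDB-style filter against an in-memory list of documents. The operator names `$eq`, `$gt`, `$gte`, `$lt` and `$lte` are real MongoDB query operators, but `matches` and `find` are hypothetical helpers and the sales data is invented; in a real application you would pass the same filter document to your driver's `find` method.

```python
# Comparison operators named after MongoDB's query operators (only these are supported).
OPERATORS = {
    "$eq": lambda a, b: a == b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches(doc: dict, query: dict) -> bool:
    """Return True if doc satisfies every condition in a MongoDB-style filter."""
    for field, condition in query.items():
        if field not in doc:
            # Simplification: a missing field never matches in this sketch.
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # {"field": {"$lt": 1000}}: every operator must hold.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # {"field": value}: plain equality.
            return False
    return True


def find(collection: list, query: dict) -> list:
    return [doc for doc in collection if matches(doc, query)]


sales = [
    {"day": "2024-05-01", "store": "north", "total": 1200},
    {"day": "2024-05-02", "store": "north", "total": 700},
    {"day": "2024-05-02", "store": "south", "total": 1100},
]

# An ad-hoc question written at runtime: which stores sold less than 1000 on May 2?
low = find(sales, {"day": "2024-05-02", "total": {"$lt": 1000}})
print(low)
```

No schema change or predefined report was needed to ask this question; the filter is just data built when the question arises.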

Replication

Replication protects a database against server crashes, network interruptions, and even hardware failures, and helps it recover from them. It works by copying data from a primary database on one server to one or more databases on other servers.

MongoDB achieves replication by using replica sets. A replica set is a group of servers (members) that hold the same data: a single primary and one or more secondaries. The primary receives all writes, and the secondaries replicate its operations. If the primary becomes unavailable, for example because of a failure or maintenance, the remaining members elect one of the secondaries as the new primary. When the failed member recovers, it rejoins the replica set as a secondary.
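The following Python sketch models the failover behavior described above. It is a heavy simplification with hypothetical class and method names: MongoDB's real election protocol is Raft-like and also takes member priorities, election terms, and voting rules into account. The sketch keeps two ideas: a new primary can be elected only while a strict majority of members is reachable, and the most up-to-date healthy member wins.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool = True
    last_applied: int = 0  # position of the last replicated operation (simplified)


@dataclass
class ReplicaSet:
    members: list
    primary: str = ""

    def elect(self) -> str:
        """Elect the most up-to-date healthy member if a strict majority is up."""
        healthy = [m for m in self.members if m.healthy]
        if len(healthy) <= len(self.members) // 2:
            self.primary = ""  # no majority: no primary can be elected
            return self.primary
        # max() keeps the first member on ties.
        self.primary = max(healthy, key=lambda m: m.last_applied).name
        return self.primary

    def fail(self, name: str) -> str:
        """Mark a member as down and re-elect if it was the primary."""
        for m in self.members:
            if m.name == name:
                m.healthy = False
        if name == self.primary:
            self.elect()
        return self.primary

    def recover(self, name: str) -> None:
        """A recovered member rejoins as a secondary; the current primary stays."""
        for m in self.members:
            if m.name == name:
                m.healthy = True


rs = ReplicaSet([Member("a", last_applied=10), Member("b", last_applied=9), Member("c", last_applied=10)])
print(rs.elect())    # a  (ties go to the first member listed)
print(rs.fail("a"))  # c  (2 of 3 members still up: a majority)
rs.recover("a")
print(rs.primary)    # c  (a rejoins as a secondary)
```

The majority rule is why replica sets are usually deployed with an odd number of voting members.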

Load balancing

Replication also makes it possible to distribute client traffic across several servers to improve performance and reduce congestion. To do so, you can configure clients to read from secondaries instead of the primary. This reduces the load on the primary, which can then focus on write operations. Keep in mind that secondaries replicate asynchronously, so reads from them may return slightly stale data.
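A minimal Python sketch of the idea follows: writes always go to the primary, and reads rotate through the secondaries. `Router` and `route` are hypothetical names, and round-robin is a simplification chosen for clarity. Real drivers express this through read preference modes (`primary`, `primaryPreferred`, `secondary`, `secondaryPreferred`, `nearest`) and choose among eligible members based on latency.

```python
import itertools


class Router:
    """Send writes to the primary and spread reads across secondaries (round-robin)."""

    def __init__(self, primary: str, secondaries: list):
        self.primary = primary
        self._secondaries = itertools.cycle(secondaries) if secondaries else None

    def route(self, operation: str) -> str:
        if operation == "write" or self._secondaries is None:
            return self.primary  # all writes (and reads, if no secondaries) hit the primary
        return next(self._secondaries)  # reads rotate through the secondaries


router = Router("db-primary", ["db-secondary-1", "db-secondary-2"])
print([router.route(op) for op in ["read", "write", "read", "read"]])
```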

Schema-less database

MongoDB is a schema-less database: documents stored in a single collection can have different fields and different data types. MongoDB data modeling also favors keeping related data together in a single document, which reduces the number of relations between data, in contrast to the normalized tables of SQL.

These two properties give developers a lot of flexibility, which is one of MongoDB's main strengths.

Sharding

In MongoDB, sharding splits a data set and distributes it across multiple machines. Each machine (or group of machines) is called a shard and holds only a portion of the data. Splitting the data this way keeps each shard smaller, which can be crucial when a database is too large for a single server to handle.

As you add servers, MongoDB's balancer automatically redistributes data across the shards, spreading the load and increasing throughput. Sharding is usually combined with replication: in production, each shard is deployed as a replica set, so every document is stored on several members of its shard's replica set. This makes the setup fault tolerant at the expense of extra storage.

mongo-example
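MongoDB decides where a document lives based on its shard key, and with hashed sharding it routes by a hash of the shard key value. The Python sketch below shows that idea with hypothetical shard names and a simple hash-modulo rule; SHA-256 is used only because it is stable across processes, unlike Python's built-in `hash()` for strings. The real system differs in an important way: it assigns ranges of hash values (chunks) to shards and lets the balancer move those ranges, so adding a shard does not force nearly every document to move, as a plain modulo rule would.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def shard_for(shard_key_value: str) -> str:
    """Pick a shard by hashing the shard key value (simplified hashed sharding)."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


users = [{"_id": f"user{i}", "name": f"User {i}"} for i in range(9)]
placement = {}
for doc in users:
    placement.setdefault(shard_for(doc["_id"]), []).append(doc["_id"])
print(placement)
```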

Indexing

MongoDB uses indexes to find matching documents instead of examining every document in a collection one after the other (a collection scan). This reduces query time and improves performance. Indexes are defined at the collection level and can be created on any field or subfield (embedded field) of the documents in a collection.
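The difference between a collection scan and an index lookup can be sketched in Python. MongoDB's indexes are B-trees; here a sorted list searched with `bisect` stands in for one, and `SortedIndex` and `collection_scan` are hypothetical names. The indexed field's values must be mutually comparable for the sort to work.

```python
import bisect


class SortedIndex:
    """A single-field index: sorted (value, position) pairs searched by binary search."""

    def __init__(self, docs: list, field: str):
        self.docs = docs
        self.entries = sorted((doc[field], pos) for pos, doc in enumerate(docs) if field in doc)
        self.keys = [value for value, _ in self.entries]

    def find_eq(self, value) -> list:
        # Binary search locates the run of matching keys in O(log n).
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [self.docs[pos] for _, pos in self.entries[lo:hi]]


def collection_scan(docs: list, field: str, value) -> list:
    # Without an index, every document must be examined: O(n).
    return [doc for doc in docs if doc.get(field) == value]


docs = [{"_id": i, "age": i % 50} for i in range(1000)]
index = SortedIndex(docs, "age")
print(len(index.find_eq(42)))  # 20
```

Both functions return the same documents; the index simply avoids looking at the 980 that do not match, at the cost of extra storage and work on every write.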

Change streams

MongoDB's change streams are an abstraction over the oplog (operations log), a special collection in which replica set members record every operation that modifies data. Change streams watch for data changes in real time and emit events that you can subscribe to and act on.

Change streams serve a similar purpose to triggers in popular SQL databases, although the code that reacts to a change runs in your application rather than inside the database.
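The following Python sketch imitates the mechanism with an in-memory collection that appends every write to a log and notifies subscribed callbacks. It is a simplification: the real oplog is shared by the whole replica set, and with PyMongo you would iterate over the stream returned by `collection.watch()` instead of registering callbacks. The class and its `insert`, `delete` and `watch` methods are hypothetical; only the event field names (`operationType`, `documentKey`, `fullDocument`) follow MongoDB's change event format.

```python
from typing import Callable


class Collection:
    """An in-memory collection that logs every write and notifies watchers."""

    def __init__(self):
        self.docs = {}
        self.oplog = []       # ordered record of write operations (simplified)
        self._watchers = []   # callbacks subscribed to change events

    def watch(self, callback: Callable[[dict], None]) -> None:
        self._watchers.append(callback)

    def _emit(self, event: dict) -> None:
        self.oplog.append(event)
        for callback in self._watchers:
            callback(event)

    def insert(self, doc: dict) -> None:
        self.docs[doc["_id"]] = doc
        self._emit({"operationType": "insert", "documentKey": {"_id": doc["_id"]},
                    "fullDocument": dict(doc)})

    def delete(self, _id) -> None:
        # Only emit an event if a document was actually removed.
        if self.docs.pop(_id, None) is not None:
            self._emit({"operationType": "delete", "documentKey": {"_id": _id}})


orders = Collection()
seen = []
orders.watch(lambda event: seen.append(event["operationType"]))
orders.insert({"_id": 1, "item": "book"})
orders.delete(1)
print(seen)  # ['insert', 'delete']
```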

Conclusion

MongoDB is a flexible, schema-less document database that saves time by enabling rapid development. It performs well, is cost-effective, scales horizontally across multiple servers, and handles large amounts of loosely structured data with ease.

It is well suited for mobile applications, high-traffic applications, real-time applications, and big data analysis, which has made it very popular in the IT world. You now know the basics, so go ahead and build your next passion project with MongoDB. It may well help you stand out from the crowd when looking for a new job.

Q&A

: MongoDB is a NoSQL database that stores data in a document-oriented manner using key-value pairs. It's commonly chosen for applications with extensive data and no complex relationships, such as mobile apps, high-traffic systems, real-time apps, and comprehensive data analysis.
: Unlike relational databases with tables and rows, MongoDB uses flexible documents with key-value pairs. It supports ad-hoc queries, replication for data recovery, sharding for data distribution, and change streams for real-time monitoring.
: MongoDB features include a document-oriented model, ad-hoc queries, replication, load balancing, schema-less database structure, sharding for large datasets, indexing for query performance, and change streams for real-time data changes. These features suit applications needing flexibility, scalability, and real-time processing.