Monday, May 11, 2020

Introduction to MongoDB

MongoDB is a cross-platform, open-source, NoSQL database, used by many modern Node-based web applications to persist data.

MongoDB is a document-oriented database. This means that it doesn’t use tables and rows to store its data, but instead collections of JSON-like documents. These documents support embedded fields, so related data can be stored within them.

MongoDB is also a schema-less database. ‘Schema-less’ means that documents with different structures can be stored in the same collection.

Here’s an example of what a MongoDB document might look like:

{
    _id: ObjectId(3da252d3902a),
    type: "Tutorial",
    title: "An Introduction to MongoDB",
    author: "Pankaj_Kapoor",
    tags: [ "mongodb", "compass", "crud" ],
    categories: [
      {
        name: "javascript",
        description: "Tutorials on client-side and server-side JavaScript programming"
      },
      {
        name: "databases",
        description: "Tutorials on different kinds of databases and their management"
      },
    ],
    content: "MongoDB is a cross-platform, open-source, NoSQL database..."
}

As you can see, the document has a number of fields (type, title, etc.), which store values (“Tutorial”, “An Introduction to MongoDB”, etc.). These values can contain strings, numbers, arrays, arrays of sub-documents (for example, the categories field), geo-coordinates and more.

The _id field name is reserved for use as a primary key. Its value must be unique in the collection, it’s immutable, and it may be of any type other than an array.
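
The uniqueness rule can be illustrated with a small in-memory sketch. This is plain Python, not MongoDB itself: the TinyCollection class, its insert_one method and the DuplicateKey error are hypothetical names chosen for illustration, and the random hex string used as a default key is an assumption (MongoDB generates an ObjectId instead).

```python
import uuid


class DuplicateKey(Exception):
    """Raised when an _id is already present in the collection."""


class TinyCollection:
    """Hypothetical in-memory stand-in for a collection (not a MongoDB API)."""

    def __init__(self):
        self._docs = {}  # maps _id -> document

    def insert_one(self, doc):
        doc = dict(doc)  # copy so the caller's dict is left untouched
        # Assumption: a random hex string stands in for an auto-generated key.
        doc.setdefault("_id", uuid.uuid4().hex)
        if isinstance(doc["_id"], list):
            raise TypeError("_id may not be an array")
        if doc["_id"] in self._docs:
            raise DuplicateKey(doc["_id"])
        self._docs[doc["_id"]] = doc
        return doc["_id"]


articles = TinyCollection()
articles.insert_one({"_id": 1, "title": "An Introduction to MongoDB"})
try:
    articles.insert_one({"_id": 1, "title": "Another title"})
    duplicate_rejected = False
except DuplicateKey:
    duplicate_rejected = True
```

The second insert reuses the key 1, so the sketch rejects it, which is roughly how a duplicate _id behaves in a real collection.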

History of MongoDB

MongoDB was created in 2007 by Eliot Horowitz and Dwight Merriman, who had worked together at DoubleClick, after they ran into scalability issues with relational databases. The organization that developed MongoDB was originally known as 10gen.

In February 2009, they changed their business model and released MongoDB as an open-source project. The organization changed its name in 2013 and is now known as MongoDB Inc.

Important Features of MongoDB

  • Queries: It supports ad-hoc queries and document-based queries.
  • Index Support: Any field in the document can be indexed.
  • Replication: It supports replication through replica sets, in which a primary node accepts writes and secondary nodes maintain copies of the data. If the primary becomes unavailable, a secondary is automatically elected to take its place, which helps prevent database downtime.
  • Multiple Servers: The database can run over multiple servers. Data is duplicated to protect the system against hardware failure.
  • Auto-sharding: This process distributes data across multiple physical partitions called shards. Due to sharding, MongoDB has an automatic load balancing feature.
  • MapReduce: It supports MapReduce and flexible aggregation tools.
  • Failure Handling: MongoDB copes well with failures. Keeping multiple replicas increases data availability and protects against rack failures, multiple machine failures, data center failures, and even network partitions.
  • GridFS: Files of any size can be stored without complicating your stack. The GridFS feature divides files into smaller parts and stores them as separate documents.
  • Schema-less Database: It is a schema-less database written in C++.
  • Document-oriented Storage: It uses BSON format which is a JSON-like format.
  • Procedures: MongoDB uses JavaScript in place of stored procedures.
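
The GridFS item above can be sketched in a few lines. This is a plain-Python illustration of the chunking idea, not the driver's GridFS API: split_into_chunks and reassemble are hypothetical helper names, while the files_id, n and data field names and the 255 KiB default chunk size follow what GridFS uses as far as I know.

```python
CHUNK_SIZE = 255 * 1024  # assumed GridFS default chunk size (255 KiB)


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return chunk documents for `data`, numbered from 0 in field `n`."""
    return [
        {"files_id": file_id, "n": index, "data": data[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Join chunk payloads back together in order of their sequence number."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(600 * 1024)  # 600 KiB of zero bytes as a stand-in file
chunks = split_into_chunks("demo-file", payload)
```

Because each chunk is an ordinary document, a large file never has to fit into a single document; reading it back is a matter of sorting the chunks by n and concatenating them.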

Where do we use MongoDB?

MongoDB is preferred over RDBMS in the following scenarios:

·       Big Data: If you have a huge amount of data to store, consider MongoDB before an RDBMS. MongoDB has a built-in solution for partitioning and sharding your database.

·       Unstable Schema: Adding a new column to a large RDBMS table can be difficult, whereas MongoDB is schema-less. Adding a new field is easy and does not affect existing documents.

·       Distributed Data: Since multiple copies of the data are stored across different servers, data can be recovered quickly and safely even if there is a hardware failure.
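
To make the partitioning idea from the list above more concrete, here is a simplified sketch of spreading documents across shards by hashing a key. The shard names and the shard_for helper are hypothetical, and the modulo scheme is an assumption made for brevity: MongoDB's own hashed sharding assigns ranges of hashed values to shards rather than taking a remainder.

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the shard-key value (illustrative scheme only)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Count how many of 3000 made-up user ids land on each shard.
placement = Counter(shard_for(user_id) for user_id in range(3000))
```

The same key always maps to the same shard, and a good hash spreads the keys roughly evenly, which is what lets the load be balanced across servers.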

Language Support by MongoDB

MongoDB currently has official or community-supported drivers for most popular programming languages, including C, C++, C#, Java, Node.js, Perl, PHP, Python, Ruby, Scala, Go and Erlang.

Where to Use MongoDB?

·      Big Data

·      Content Management and Delivery

·      Mobile and Social Infrastructure

·      User Data Management

·      Data Hub 

Organizations that use MongoDB

Below are some of the large and notable organizations that use MongoDB as a database for their business applications.

  • Adobe
  • LinkedIn
  • McAfee
  • FourSquare
  • eBay
  • MetLife
  • SAP

 

MongoDB Architecture

MongoDB consists of a set of databases. Each database in turn consists of collections, and the data is stored in those collections. The figure below depicts the typical database structure in MongoDB.

Database in MongoDB

A database in MongoDB is a container for collections. We will learn how to create a new database, drop a database, and use an existing database in the coming lessons.

Collections in MongoDB

A collection is a set of MongoDB documents. These documents are equivalent to rows in an RDBMS table. Unlike RDBMS tables, however, collections in MongoDB are not bound to a fixed schema. Collections are a way of storing related data. Because MongoDB is schema-less, any type of document can be saved in a collection. A document can have a maximum size of 16 MB. A collection is physically created as soon as the first document is inserted into it.

Document in MongoDB

A document in MongoDB is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields.

Since MongoDB is a schema-less database, each collection can hold different types of objects. Every object in a collection is known as a document, which is represented in a JSON-like (JavaScript Object Notation) structure, that is, a list of key-value pairs. Data is stored and queried in BSON, a binary representation of JSON-like data.
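
To give a feel for what a binary representation of a document looks like, here is a minimal sketch of a BSON encoder. It handles only flat documents with string and 32-bit integer values, and encode_bson is a hypothetical helper name, not part of any driver; real BSON supports many more types.

```python
import struct


def encode_bson(doc):
    """Encode a flat dict of str and int32 values as BSON bytes."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # field names are C strings
        if isinstance(value, str):
            raw = value.encode("utf-8") + b"\x00"
            # Type 0x02: int32 byte length (including the NUL), then the bytes.
            body += b"\x02" + name + struct.pack("<i", len(raw)) + raw
        elif isinstance(value, int) and not isinstance(value, bool):
            # Type 0x10: 32-bit little-endian signed integer.
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type for field {key!r}")
    # Total length counts its own 4 bytes plus the trailing NUL terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_bson({"hello": "world"})
```

The document {"hello": "world"} comes out as 22 bytes: a length prefix, one typed element, and a terminating zero byte. The length prefixes are what allow a reader to skip over fields without parsing them.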


Fields

Fields (key-value pairs) are stored in documents, documents are stored in collections, and collections are stored in databases.

This is how a document looks in MongoDB. As you can see, it is similar to a row in an RDBMS; the main difference is that it is in a JSON-like format.


Table vs Collection

Here we will see how a table in a relational database looks in MongoDB. As you can see, columns are represented as key-value pairs (JSON format) and rows are represented as documents. MongoDB automatically adds a unique _id field (a 12-byte ObjectId by default) to every document that does not already have one; it serves as the primary key for the document.
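
The 12 bytes of an automatically generated _id are not arbitrary. As a rough sketch, assuming the layout of a 4-byte timestamp, a 5-byte random value and a 3-byte counter, a similar value could be built like this. The new_object_id function is a hypothetical name, and real drivers ship their own ObjectId type.

```python
import itertools
import os
import struct
import time

_RANDOM = os.urandom(5)  # 5-byte value generated once per process
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id():
    """Build a 12-byte ObjectId-like value: timestamp, random part, counter."""
    timestamp = struct.pack(">I", int(time.time()))  # 4 bytes, big-endian seconds
    counter = (next(_COUNTER) % 0x1000000).to_bytes(3, "big")  # 3 bytes, wraps
    return timestamp + _RANDOM + counter


first, second = new_object_id(), new_object_id()
```

Calling first.hex() gives the familiar 24-character hexadecimal form. Because the timestamp comes first, ids created later generally sort after ids created earlier.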


Another cool thing about MongoDB is that it supports a dynamic schema, which means one document in a collection can have 4 fields while another document has only 3. This is not possible within a single table in a relational database.
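
The 4-field versus 3-field case can be pictured with plain Python dictionaries standing in for documents. The names and values below are made up for illustration.

```python
people = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com", "city": "Pune"},
    {"_id": 2, "name": "Ben", "city": "Leeds"},  # no email field
]

# Each document carries its own set of field names.
field_sets = [sorted(doc) for doc in people]

# Reading code has to tolerate absent fields, e.g. by supplying a default.
emails = [doc.get("email", "unknown") for doc in people]
```

The flexibility has a cost: since nothing forces every document to have an email field, the code that reads the collection has to decide what a missing field means.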

Mapping Relational Databases to MongoDB

If you are coming from a relational database background, it might be difficult to relate RDBMS terms to MongoDB. In this guide, we will see the mapping between relational databases and MongoDB.


Collections in MongoDB are equivalent to tables in an RDBMS.
Documents in MongoDB are equivalent to rows in an RDBMS.
Fields in MongoDB are equivalent to columns in an RDBMS.
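
The mapping can be sketched by converting a small table into documents. The column names, the rows and the rows_to_documents helper are all hypothetical; the point is only that column names become field names, each row becomes one document, and the primary key becomes _id.

```python
columns = ["id", "name", "age"]  # hypothetical table columns
rows = [(1, "Asha", 31), (2, "Ben", 27)]  # hypothetical table rows


def rows_to_documents(columns, rows, key_column="id"):
    """Turn each row into a document, using `key_column` as the _id."""
    documents = []
    for row in rows:
        doc = dict(zip(columns, row))
        doc["_id"] = doc.pop(key_column)  # the primary key becomes _id
        documents.append(doc)
    return documents


students = rows_to_documents(columns, rows)
```

The resulting list plays the role of the collection, with one dictionary per former table row.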

Difference between MongoDB & SQL Database

Below are some of the key differences between SQL databases and MongoDB:

| SQL Database | NoSQL Database (MongoDB) |
|---|---|
| Relational database | Non-relational database |
| Supports SQL query language | Supports JSON query language |
| Table based | Collection based and key-value pair |
| Row based | Document based |
| Column based | Field based |
| Supports foreign keys | No support for foreign keys |
| Supports triggers | No support for triggers |
| Contains a predefined schema | Contains a dynamic schema |
| Not fit for hierarchical data storage | Best fit for hierarchical data storage |
| Vertically scalable (increasing RAM) | Horizontally scalable (adding more servers) |
| Emphasizes ACID properties (Atomicity, Consistency, Isolation and Durability) | Emphasizes the CAP theorem (Consistency, Availability and Partition tolerance) |