Skip to main content

Lab: MongoDB Basics

Environment Setup

Connect to Atlas Cluster:

mongosh "mongodb+srv://cluster0.css32tr.mongodb.net/myFirstDatabase" --apiVersion 1 --username admin

MongoDB CRUD

Run the below code on mongo console. It will insert 5 documents, which will serve as sample data for the the assignment.

use training
db.languages.insert({"name":"java","type":"object oriented"})
db.languages.insert({"name":"python","type":"general purpose"})
db.languages.insert({"name":"scala","type":"functional"})
db.languages.insert({"name":"c","type":"procedural"})
db.languages.insert({"name":"c++","type":"object oriented"})

Task 1: Insert an entry for 'Haskell' programming language which is of type 'functional

db.languages.insert({"name":"Haskell","type":"functional"})

Task 2: Query for all functional languages

db.languages.find({"type":"functional"})

Task 3: Add ‘Bjarne Stroustrup’ as creator for c++

db.languages.updateMany({"name":"c++"},{$set:{"creator":"Bjarne Stroustrup"}})

Task 4: Delete all functional programming languages

db.languages.remove({"type":"functional"})

Task 5: Disconnect from the mongodb server

exit

MongoDB Indexing

Task 1: Create a collection named bigdata

db.createCollection("bigdata")

Insert documents

  • Let us insert a lot of documents into the newly created collection.
  • This should take around 3 minutes, so please be patient.
  • The code given below will insert 200000 documents into the 'bigdata' collection.
  • Each document would have a field named account_no which is a simple auto increment number.
  • And a field named balance which is a randomly generated number, to simulate the bank balance for the account.

Copy the below code and paste it on the mongo client.

use training
for (i=1;i<=200000;i++){print(i);db.bigdata.insert({"account_no":i,"balance":Math.round(Math.random()*1000000)})}

Task 2: Verify that 200000 documents got inserted

db.bigdata.count()

Task 3: Measure the time taken by a query

Let us run a query and find out how much time it takes to complete.

Let us query for the details of account number 58982.

We will make use of the explain function to find the time taken to run the query in milliseconds.

db.bigdata.find({"account_no":58982}).explain("executionStats").executionStats.executionTimeMillis

Working with indexes

Before you create an index, choose the field you wish to create an index on. It is usually the field that you query most.

Run the below command to create an index on the field account_no.

db.bigdata.createIndex({"account_no":1})

Task 4: Get a list of indexes on the ‘bigdata’ collection.

db.bigdata.getIndexes()

Task 5: Find out how effective an index is

Let us query for the details of account number 69271:

db.bigdata.find({"account_no": 69271}).explain("executionStats").executionStats.executionTimeMillis

Task 6: Delete the index we created earlier

db.bigdata.dropIndex({"account_no":1})

Task 7: Create an index on the balance field

db.bigdata.createIndex({"balance":1})

Task 8: Query for documents with a balance of 10000 and record the time taken

db.bigdata.find({"balance":10000}).explain("executionStats").executionStats.executionTimeMillis

Task 9: Drop the index you have created

db.bigdata.dropIndex({"balance":1})

Task 10: Query for documents with a balance of 10000 and record the time taken, and compare it with the previously recorded time

db.bigdata.find({"balance": 10000}).explain("executionStats").executionStats.executionTimeMillis

Task 11: Disconnect from the mongodb server

exit

MongoDB Aggregation

Load sample data into the training database:

use training
db.marks.insert({"name":"Ramesh","subject":"maths","marks":87})
db.marks.insert({"name":"Ramesh","subject":"english","marks":59})
db.marks.insert({"name":"Ramesh","subject":"science","marks":77})
db.marks.insert({"name":"Rav","subject":"maths","marks":62})
db.marks.insert({"name":"Rav","subject":"english","marks":83})
db.marks.insert({"name":"Rav","subject":"science","marks":71})
db.marks.insert({"name":"Alison","subject":"maths","marks":84})
db.marks.insert({"name":"Alison","subject":"english","marks":82})
db.marks.insert({"name":"Alison","subject":"science","marks":86})
db.marks.insert({"name":"Steve","subject":"maths","marks":81})
db.marks.insert({"name":"Steve","subject":"english","marks":89})
db.marks.insert({"name":"Steve","subject":"science","marks":77})
db.marks.insert({"name":"Jan","subject":"english","marks":0,"reason":"absent"})

Assignment 1

Task 1: Limiting the rows in the output - print only 2 documents from the marks collection

use training
db.marks.aggregate([{"$limit":2}])

Task 2: Sorting based on a column - sorts the documents based on field marks in ascending/descending order

-- ascending
db.marks.aggregate([{"$sort":{"marks":1}}])

--descending
db.marks.aggregate([{"$sort":{"marks":-1}}])

Task 3: Sorting and limiting - create a two stage pipeline that answers the question What are the top 2 marks?

db.marks.aggregate([
{"$sort":{"marks":-1}},
{"$limit":2}
])

Task 4: Group by - prints the average marks across all subjects

db.marks.aggregate([
{
"$group":{
"_id":"$subject",
"average":{"$avg":"$marks"}
}
}
])

Task 5: Write the SQL equivalent of the above query

SELECT subject, average(marks)
FROM marks
GROUP BY subject

Task 6: Write a query to answer this question - Who are the top 2 students by average marks?

This involves:

  • finding the average marks per student.
  • sorting the output based on average marks in descending order.
  • limiting the output to two documents.
db.marks.aggregate([
{
"$group":{
"_id":"$name",
"average":{"$avg":"$marks"}
}
},
{
"$sort":{"average":-1}
},
{
"$limit":2
}
])

Assignment 2

Task 1: Find the total marks for each student across all subjects

db.marks.aggregate([
{
"$group":{"_id":"$name","total":{"$sum":"$marks"}}
}
])

Task 2: Find the maximum marks scored in each subject

db.marks.aggregate([
{
"$group":{"_id":"$subject","max_marks":{"$max":"$marks"}}
}
])

Task 3: Find the minimum marks scored by each student

db.marks.aggregate([
{
"$group":{"_id":"$name","min_marks":{"$min":"$marks"}}
}
])

Task 4: Find the top two subjects based on average marks

db.marks.aggregate([
{
"$group":{
"_id":"$subject",
"average":{"$avg":"$marks"}
}
},
{
"$sort":{"average":-1}
},
{
"$limit":2
}
])

Task 5: Disconnect from the mongodb server

exit