NoSQL databases are popular for working with big data. Since MongoDB is one of the leading NoSQL databases, you may well have to work with it as a Python developer. NoSQL databases such as MongoDB offer several benefits over SQL databases:
- There is no need to define the structure of documents before creating data, unlike SQL databases, where the structure of a table must be defined before data can be inserted.
- Each document can have its own structure.
- It can handle large volumes of structured, semi-structured, and unstructured data, whereas a SQL database expects data in a structured format. This makes MongoDB a good choice for big data.
- It supports associative arrays, similar to dictionaries in Python.
- MongoDB is simple to deploy, which makes it a good option.
MongoDB stores data in collections rather than in tables as an RDBMS does. Data records are stored as BSON documents, a binary representation of JSON documents. MongoDB documents are composed of field-and-value pairs, like the key-value pairs of a Python dictionary. Below is a sample JSON document image:
MongoDB is a document-oriented database designed for high performance. It uses a document-based query language that is often easier to pick up than SQL while remaining comparably robust.
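As a rough illustration of field-and-value pairs, the sketch below writes one document as a Python dictionary and prints it as JSON. The field names and values are borrowed from the resale example used later in this article; the variable name sample_document is only illustrative.

```python
import json

# Field names and values mirror the resale example used later in the article.
sample_document = {
    "name": "Mathew Haag",
    "country": "Ottawa",
    "resaleasset": ["Smart TV", "Car", "Bed"],
    "totalassetvalue": "$50000",
}

# A document is a set of field-and-value pairs, much like a Python dict.
for field, value in sample_document.items():
    print(f"{field}: {value!r}")

# The same data rendered as JSON text.
print(json.dumps(sample_document, indent=2))
```

Note that a value can itself be a list (as resaleasset is here) or another nested document.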
Getting started with Python MongoDB
MongoDB has a native Python driver, PyMongo, which is maintained by MongoDB and distributed as the pymongo library so that Python and MongoDB can work together smoothly.
1. Install the Python Driver:
The first step is to install the driver if it is not yet installed on the system. Installation can be done with the pip command (pip install pymongo) on the command prompt.
Now import the library (import pymongo) in your Python console and verify that the import runs without error. If it does, the installation was successful. If it raises an error, the installation has an issue that needs to be fixed.
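The installation check can also be scripted. The following is a minimal sketch that uses the standard library to look for the pymongo package without failing when it is missing; the wording of the printed messages is an assumption made for illustration.

```python
import importlib.util

# find_spec returns None when the package cannot be found on this interpreter.
spec = importlib.util.find_spec("pymongo")

if spec is None:
    print("pymongo is not installed; try: python -m pip install pymongo")
else:
    import pymongo
    print("pymongo is installed, version:", pymongo.version)
```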
2. Connecting to MongoDB from Python:
First, start the MongoDB server, typically by running the mongod command on the command prompt.
This running instance of MongoDB then has to be connected to Python using the pymongo library. A connection is created with MongoClient. There are two ways to create a client: pass a MongoDB URL, or pass the host and port number. Here is the syntax:
client = MongoClient('<MongoDB URL>')
client = MongoClient('<host>', <port_number>)
Here we initialize the client using localhost and the default port 27017.
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
OR
client = MongoClient('mongodb://localhost:27017/')
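Creating a MongoClient does not by itself prove that the server is reachable, because PyMongo connects lazily. The sketch below builds the URL form from a host and port and then forces a round trip with a ping command. The helper names build_mongo_uri and connect, and the 2-second timeout, are assumptions chosen for illustration.

```python
def build_mongo_uri(host="localhost", port=27017):
    """Return a connection URL of the form mongodb://host:port/."""
    return f"mongodb://{host}:{port}/"


def connect(uri, timeout_ms=2000):
    """Create a client and confirm that the server answers a ping."""
    # Imported here so that build_mongo_uri works even without the driver.
    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # MongoClient connects lazily; the ping forces a real round trip.
        client.admin.command("ping")
    except ConnectionFailure as exc:
        raise RuntimeError(f"MongoDB is not reachable at {uri}") from exc
    return client


print(build_mongo_uri())
# Usage, with a server running locally:
# client = connect(build_mongo_uri("localhost", 27017))
```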
3. Create a new or connect to the existing Database:
The syntax for creating or connecting to an existing database is:
database_object = client.name_of_the_database
Once the client is in place, we can access the database using the code below:
db = client.resalebusiness
If the resalebusiness database already exists, the client connects to it. If not, MongoDB creates it lazily, the first time data is written to it.
4. Accessing the Collection:
A collection stores a set of documents and is similar to a table in an RDBMS. It holds records in a dictionary-like format, as in Python. The syntax is as below:
collection_object = database_object.collection_name
resalecoll = db.resaleitems
Here resalecoll is the Python variable that refers to the collection, resaleitems is the name of the collection (the counterpart of a table), and db refers to the resalebusiness database from the previous step.
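Databases and collections can also be reached with dictionary-style access, which is convenient when the names are held in variables or are not valid Python identifiers. The sketch below wraps this in a hypothetical helper, get_collection; it assumes that client is a MongoClient instance.

```python
def get_collection(client, db_name="resalebusiness", coll_name="resaleitems"):
    """Return the collection client[db_name][coll_name].

    With a PyMongo client, client["resalebusiness"]["resaleitems"] refers to
    the same collection as client.resalebusiness.resaleitems.
    """
    return client[db_name][coll_name]


# Usage, with a server running locally:
# from pymongo import MongoClient
# resalecoll = get_collection(MongoClient("localhost", 27017))
```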
5. Inserting Data in collection:
MongoDB stores data in BSON/JSON format. So the next step is to create the records that we want to insert into the resaleitems collection. Here we have created 3 records for insertion.
rec_item1 = {
    'name': 'Mathew Haag',
    'country': 'Ottawa',
    'resaleasset': ['Smart TV', 'Car', 'Bed'],
    'totalassetvalue': '$50000'
}
rec_item2 = {
    'name': 'Peter Wank',
    'country': 'Ottawa',
    'resaleasset': ['Refrigerator', 'Car', 'Bed'],
    'totalassetvalue': '$30000'
}
rec_item3 = {
    'name': 'Samual Paul',
    'country': 'Toronto',
    'resaleasset': ['Oven', 'Car', 'Bed'],
    'totalassetvalue': '$10000'
}
# Now insert the records into the collection, one at a time.
# insert_one() replaces the older insert() method, which was removed in PyMongo 4.
result1 = resalecoll.insert_one(rec_item1)
result2 = resalecoll.insert_one(rec_item2)
result3 = resalecoll.insert_one(rec_item3)
Alternatively, all the records can be added with a single command:
result = resalecoll.insert_many([rec_item1, rec_item2, rec_item3])
print('Multiple records: {0}'.format(result.inserted_ids))
Multiple records: [
ObjectId('2222747dea542a13e9ec7ae7'),
ObjectId('2222747dea542a13e9ec7ae8'),
ObjectId('2222747dea542a13e9ec7ae9')
]
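An ObjectId is generated automatically when you insert a document that has no _id field. It is a 12-byte value that starts with a 4-byte Unix timestamp, followed by bytes that keep it unique: a per-process random value and an incrementing counter in current versions, or a machine identifier and process ID in older ones.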
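Because the first four bytes of an ObjectId hold a Unix timestamp, the creation time can be recovered from its hexadecimal form. The sketch below does this in plain Python for illustration; the function name object_id_timestamp is hypothetical. With PyMongo installed, the generation_time attribute of bson.ObjectId gives the same information.

```python
from datetime import datetime, timezone


def object_id_timestamp(object_id_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are a big-endian Unix timestamp.
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# One of the sample ids shown in the insert output above.
print(object_id_timestamp("2222747dea542a13e9ec7ae9"))
```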
6. Querying in MongoDB:
The find_one() method retrieves a single document. Suppose you want to retrieve a document from resaleitems whose country is 'Toronto'. The code snippet below does this:
toronto_items = resalecoll.find_one({'country': 'Toronto'})
print(toronto_items)
{ 'name': 'Samual Paul',
'country': 'Toronto',
'resaleasset': ['Oven', 'Car', 'Bed'],
'totalassetvalue': '$10000',
'_id': ObjectId('2222747dea542a13e9ec7ae9')
}
An ObjectId is attached to each record as '_id' when the record is inserted into the collection. This is the same ObjectId that was generated automatically and shown in the output when rec_item3 was inserted.
To fetch many documents, use the find() method. Let us say the requirement is to fetch the details of resale clients in the country 'Ottawa'. Below is sample code to fetch many records. The important thing to notice is that this method returns the result set as a cursor object, so we have to iterate over the cursor to see each document.
ottawa_items = resalecoll.find({'country': 'Ottawa'})
for items in ottawa_items:
    print(items)
This displays the 2 Ottawa records in the collection. If the requirement is just to count how many Ottawa records are in the collection, use the collection's count_documents() method (the older cursor count() method was removed in PyMongo 4):
ottawa_resale_count = resalecoll.count_documents({'country': 'Ottawa'})
print(ottawa_resale_count)
2
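The three query calls in this section can be gathered into one small helper. This is a minimal sketch: the function name summarize_country is hypothetical, and it assumes that collection is a PyMongo collection such as resalecoll. It uses count_documents() for the count, which is the collection-level method in current PyMongo releases.

```python
def summarize_country(collection, country):
    """Return (first matching document, names of all matches, match count)."""
    query = {"country": country}
    # find_one returns a single document, or None when nothing matches.
    first = collection.find_one(query)
    # find returns a cursor; iterating it yields one document at a time.
    names = [doc["name"] for doc in collection.find(query)]
    # count_documents counts the matching documents on the server.
    total = collection.count_documents(query)
    return first, names, total


# Usage, assuming resalecoll from the earlier steps:
# first, names, total = summarize_country(resalecoll, "Ottawa")
```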
7. Updating a record:
Just as there are insert_one and insert_many for inserting, there are functions to update your MongoDB data: update_one, update_many, and replace_one. The update_one method updates a single document that matches a query. For example, suppose that in the record set above you want to update the totalassetvalue of the client based in Toronto. The program below first sets up the client connection to the resalebusiness database and its resaleitems collection, which is referred to as resalecoll. It then finds the Toronto record using the find_one() method. Since '_id' is unique, it updates the totalassetvalue field with update_one() for that _id value. The $set operator provides the new value for the field.
from pprint import pprint
from pymongo import MongoClient
# Set up the MongoClient connection to your MongoDB database instance
client = MongoClient('localhost', 27017)
db = client.resalebusiness
resalecoll = db.resaleitems
# Find the Toronto record and show it before the update
record_one = resalecoll.find_one({'country': 'Toronto'})
print('Toronto record:')
pprint(record_one)
# Update totalassetvalue for that _id; $set supplies the new value
result = resalecoll.update_one({'_id': record_one['_id']}, {'$set': {'totalassetvalue': '$50000'}})
print('Number of documents modified : ' + str(result.modified_count))
# Fetch the document again to confirm the change
updated_document = resalecoll.find_one({'_id': record_one['_id']})
print('The updated document:')
pprint(updated_document)
{
'name': 'Samual Paul',
'country': 'Toronto',
'resaleasset': ['Oven', 'Car', 'Bed'],
'totalassetvalue': '$50000'
}
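The same $set pattern works with update_many when several documents should change at once. The following is a minimal sketch under stated assumptions: the helper name set_asset_value_for_country is hypothetical, and collection is assumed to be a PyMongo collection such as resalecoll.

```python
def set_asset_value_for_country(collection, country, new_value):
    """Set totalassetvalue on every document of a country.

    Returns (matched_count, modified_count) from the update result.
    """
    result = collection.update_many(
        {"country": country},                      # filter: which documents
        {"$set": {"totalassetvalue": new_value}},  # update: the new value
    )
    return result.matched_count, result.modified_count


# Usage, assuming resalecoll from the earlier steps:
# matched, modified = set_asset_value_for_country(resalecoll, "Ottawa", "$45000")
```

The two counts can differ: a document that already holds the new value is matched but not modified.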
Connecting MongoDB with Python is flexible and easy. Retrieving and updating unstructured data in MongoDB is simple with the Python driver, and the pymongo library keeps the code short and robust. The library offers many other methods for manipulating data that are worth exploring further. MongoDB is a good fit for big data, and Python is a popular, robust programming language, so the combination of the two is well worth learning.
To explore MongoDB in more detail, refer to the tutorial below:
MongoDB Basics Tutorial