How to Find Duplicate Documents in MongoDB – Basic Guide


In this guide, I will explain how to find duplicate documents in MongoDB.

MongoDB offers a wide range of operators and features to suit different requirements. Beyond basic CRUD (create, read, update, delete) operations, it provides tools such as the aggregation framework for filtering, grouping, and reshaping data. This flexibility is a large part of its popularity.

Besides querying, displaying, and updating data, MongoDB supports a large set of operators that are useful to beginners and experienced developers alike.

In this guide, I will use the aggregation pipeline and several of its operators to find duplicate documents in MongoDB. Let's get started!

How to Find Duplicate Documents in MongoDB

Let us get started by combining several operators in an aggregate() pipeline to find duplicate documents in MongoDB.

  • Start your MongoDB server and connect to it with the MongoDB shell.
  • List your databases, then switch to the one you want to work in:
> show dbs
> use dronesLand
  • First, let us look at the existing documents in the drones collection, which does not contain any duplicates yet. We will add duplicate documents afterwards.
> db.drones.find({}).pretty()

{
        "_id" : ObjectId("615c7c3683a7f705b8ecf0c6"),
        "utility" : [
                "Recreation"
        ],
        "onSale" : false,
        "name" : "Yuva Plum Flyer 75",
        "price" : 6770,
        "weight" : "2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7c7e83a7f705b8ecf0c7"),
        "utility" : [
                "Security"
        ],
        "onSale" : false,
        "name" : "Red Hawk Ares 9000",
        "price" : 3400,
        "weight" : "3 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7ce483a7f705b8ecf0c8"),
        "utility" : [
                "Monitoring or Inspection"
        ],
        "onSale" : false,
        "name" : "Cattani Mustang 65",
        "price" : 8900,
        "weight" : "5 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7d1b83a7f705b8ecf0c9"),
        "utility" : [
                "Photography"
        ],
        "onSale" : false,
        "name" : "Flovera Julia Cranberry Red",
        "price" : 7800,
        "weight" : "1.2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7dd383a7f705b8ecf0ca"),
        "utility" : [
                "Delivery"
        ],
        "onSale" : false,
        "name" : "X-fin Rafael 5442",
        "price" : 17000,
        "weight" : "50 kilograms",
        "__v" : 0
}
  • Next, we add copies of three of the drones. Each copy gets its own _id, so the new documents are identical to the originals apart from that field. Here is the collection with the duplicates:
> db.drones.find({}).pretty()

{
        "_id" : ObjectId("615c7c3683a7f705b8ecf0c6"),
        "utility" : [
                "Recreation"
        ],
        "onSale" : false,
        "name" : "Yuva Plum Flyer 75",
        "price" : 6770,
        "weight" : "2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7c7e83a7f705b8ecf0c7"),
        "utility" : [
                "Security"
        ],
        "onSale" : false,
        "name" : "Red Hawk Ares 9000",
        "price" : 3400,
        "weight" : "3 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7ce483a7f705b8ecf0c8"),
        "utility" : [
                "Monitoring or Inspection"
        ],
        "onSale" : false,
        "name" : "Cattani Mustang 65",
        "price" : 8900,
        "weight" : "5 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7d1b83a7f705b8ecf0c9"),
        "utility" : [
                "Photography"
        ],
        "onSale" : false,
        "name" : "Flovera Julia Cranberry Red",
        "price" : 7800,
        "weight" : "1.2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7dd383a7f705b8ecf0ca"),
        "utility" : [
                "Delivery"
        ],
        "onSale" : false,
        "name" : "X-fin Rafael 5442",
        "price" : 17000,
        "weight" : "50 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c905383a7f705b8ecf0cb"),
        "utility" : [
                "Delivery"
        ],
        "onSale" : false,
        "name" : "X-fin Rafael 5442",
        "price" : 17000,
        "weight" : "50 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c9a9383a7f705b8ecf0cc"),
        "utility" : [
                "Recreation"
        ],
        "onSale" : false,
        "name" : "Yuva Plum Flyer 75",
        "price" : 6770,
        "weight" : "2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c9ab783a7f705b8ecf0cd"),
        "utility" : [
                "Monitoring or Inspection"
        ],
        "onSale" : false,
        "name" : "Cattani Mustang 65",
        "price" : 8900,
        "weight" : "5 kilograms",
        "__v" : 0
}
  • Now let us find the documents that share the same name. We will use the aggregate() method with the $match, $group, and $project stages:
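> db.drones.aggregate([
...     { "$match": { "name": { "$ne": null } } },                // skip documents with a null or missing name
...     { "$group": { "_id": "$name", "count": { "$sum": 1 } } },  // one group per name, counting its documents
...     { "$match": { "count": { "$gt": 1 } } },                   // keep names that occur more than once
...     { "$project": { "name": "$_id", "_id": 0 } }              // output only the name field
... ])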


{ "name" : "Yuva Plum Flyer 75" }
{ "name" : "X-fin Rafael 5442" }
{ "name" : "Cattani Mustang 65" }

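The output lists every name that appears in more than one document, which gives us our duplicates. Note that $group does not guarantee the order of its output, so the names may appear in a different order on your system.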
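This pipeline treats documents as duplicates when their name matches. In the sample data the copies agree on every field except _id, so if the name alone is too loose a criterion, the $group key can be a document instead, for example "_id": { "name": "$name", "price": "$price", "weight": "$weight" }. The sketch below is a plain-JavaScript model of that idea, not MongoDB code; the chosen fields and the trimmed sample documents (including the extra drone with a different price) are hypothetical, for illustration only.

```javascript
// Plain-JavaScript model of grouping on several fields at once, like a $group
// stage whose _id is { name: "$name", price: "$price", weight: "$weight" }.
// Hypothetical sample data; real _id values would be ObjectIds.
const drones = [
  { _id: "ca", name: "X-fin Rafael 5442", price: 17000, weight: "50 kilograms" },
  { _id: "cb", name: "X-fin Rafael 5442", price: 17000, weight: "50 kilograms" },
  // Same name but a different price: not a duplicate under the compound key.
  { _id: "ce", name: "X-fin Rafael 5442", price: 15000, weight: "50 kilograms" },
  { _id: "c7", name: "Red Hawk Ares 9000", price: 3400, weight: "3 kilograms" },
];

const KEY_FIELDS = ["name", "price", "weight"];

function findDuplicateKeys(docs, fields) {
  const counts = new Map(); // serialized key -> { key, count }
  for (const doc of docs) {
    // Build the compound key from the chosen fields, in a fixed order.
    const key = {};
    for (const field of fields) key[field] = doc[field];
    const serialized = JSON.stringify(key);
    const entry = counts.get(serialized) || { key, count: 0 };
    entry.count += 1;
    counts.set(serialized, entry);
  }
  // Like { $match: { count: { $gt: 1 } } }: keep keys shared by several documents.
  return [...counts.values()].filter((entry) => entry.count > 1);
}

console.log(findDuplicateKeys(drones, KEY_FIELDS));
```

Grouping on ["name"] alone would count all three X-fin documents as duplicates, while the compound key only matches the two that agree on price and weight as well.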

Explaining the Code in Brief

The code above uses an aggregation pipeline to find duplicate documents in the drones collection.

The pipeline consists of the following stages:

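  • First, $match with { "$ne": null } skips documents whose name is null or missing.
  • Second, $group groups the remaining documents by name and counts the documents in each group with { "$sum": 1 }.
  • Third, another $match keeps only the groups whose count is greater than 1, that is, the names that appear more than once.
  • Lastly, $project outputs each duplicate name in a name field and hides the _id field.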
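To make the pipeline's logic concrete, here is a plain-JavaScript model of what its four stages compute. It is only a sketch that runs in Node.js without a database, not MongoDB code; the trimmed sample documents, their short string _id values, and the function name findDuplicateNames are hypothetical stand-ins for the real collection.

```javascript
// Plain-JavaScript model of the four aggregation stages (not MongoDB code).
// Hypothetical, trimmed copy of the drones collection: only _id and name are
// kept, and the _id values are short strings instead of ObjectIds.
const drones = [
  { _id: "c6", name: "Yuva Plum Flyer 75" },
  { _id: "c7", name: "Red Hawk Ares 9000" },
  { _id: "c8", name: "Cattani Mustang 65" },
  { _id: "c9", name: "Flovera Julia Cranberry Red" },
  { _id: "ca", name: "X-fin Rafael 5442" },
  { _id: "cb", name: "X-fin Rafael 5442" },
  { _id: "cc", name: "Yuva Plum Flyer 75" },
  { _id: "cd", name: "Cattani Mustang 65" },
];

function findDuplicateNames(docs) {
  // Stage 1, { $match: { name: { $ne: null } } }: drop documents whose name
  // is null or missing ($ne: null also excludes documents without the field).
  const named = docs.filter((doc) => doc.name !== null && doc.name !== undefined);

  // Stage 2, { $group: { _id: "$name", count: { $sum: 1 } } }:
  // one entry per distinct name, counting the documents that have it.
  const counts = new Map();
  for (const doc of named) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }

  // Stage 3, { $match: { count: { $gt: 1 } } }: keep names seen more than once.
  // Stage 4, { $project: { name: "$_id", _id: 0 } }: output only { name }.
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([name]) => ({ name }));
}

console.log(findDuplicateNames(drones));
```

The model returns the same three names as the shell output, though possibly in a different order, just as MongoDB's $group makes no ordering promise.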

That is how you can find duplicate documents in a MongoDB collection.
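The pipeline above returns only the duplicated names. If you also want the _id of every copy, for example to review or remove the extra ones, the $group stage can collect them with the $push accumulator, such as "ids": { "$push": "$_id" }. The sketch below is a plain-JavaScript model of that variant, not MongoDB code; the trimmed sample documents, their short string _id values, and the function name findDuplicateIds are hypothetical.

```javascript
// Plain-JavaScript model of grouping by name while collecting each group's _id
// values, mirroring a $group stage with { ids: { $push: "$_id" } }.
// Hypothetical, trimmed sample data; real _id values would be ObjectIds.
const drones = [
  { _id: "c6", name: "Yuva Plum Flyer 75" },
  { _id: "c7", name: "Red Hawk Ares 9000" },
  { _id: "c8", name: "Cattani Mustang 65" },
  { _id: "c9", name: "Flovera Julia Cranberry Red" },
  { _id: "ca", name: "X-fin Rafael 5442" },
  { _id: "cb", name: "X-fin Rafael 5442" },
  { _id: "cc", name: "Yuva Plum Flyer 75" },
  { _id: "cd", name: "Cattani Mustang 65" },
];

function findDuplicateIds(docs) {
  const groups = new Map(); // name -> array of _id values, in input order
  for (const doc of docs) {
    if (doc.name === null || doc.name === undefined) continue; // like $ne: null
    if (!groups.has(doc.name)) groups.set(doc.name, []);
    groups.get(doc.name).push(doc._id);
  }
  // Keep only names shared by more than one document.
  return [...groups]
    .filter(([, ids]) => ids.length > 1)
    .map(([name, ids]) => ({ name, ids }));
}

for (const { name, ids } of findDuplicateIds(drones)) {
  // In this sample, every id after the first belongs to an added copy.
  console.log(name, "->", ids.join(", "), "| extra copies:", ids.slice(1).join(", "));
}
```

Keeping the first _id of each group and treating the rest as copies is an assumption that fits this sample, where the duplicates were inserted later; in real data you may need another rule to decide which document to keep.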


Conclusion

MongoDB is a popular database management system with many advanced features. Few developers, however experienced, know all of its operators and features by heart.


In this guide, I used an aggregation pipeline built from the $match, $group, and $project stages to find duplicate documents in MongoDB.
