How to Find Duplicate Documents in MongoDB – Basic Guide

Finding duplicate documents in MongoDB can be challenging. MongoDB supports much more than basic CRUD operations: besides querying, printing, and updating data, it offers many operators that can be combined to find duplicates.

In this guide, we’ll use a few easy-to-use operators to demonstrate how to find duplicate documents in MongoDB.

Finding Duplicate Documents in MongoDB

We will use a set of operators with the aggregate() method to find duplicates in MongoDB.

Example Documents:

We have already inserted some documents into the “drones” collection to use as an example. These documents do not contain any duplicate values yet.

> db.drones.find({}).pretty()

{
        "_id" : ObjectId("615c7c3683a7f705b8ecf0c6"),
        "utility" : [
                "Recreation"
        ],
        "onSale" : false,
        "name" : "Yuva Plum Flyer 75",
        "price" : 6770,
        "weight" : "2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7c7e83a7f705b8ecf0c7"),
        "utility" : [
                "Security"
        ],
        "onSale" : false,
        "name" : "Red Hawk Ares 9000",
        "price" : 3400,
        "weight" : "3 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7ce483a7f705b8ecf0c8"),
        "utility" : [
                "Monitoring or Inspection"
        ],
        "onSale" : false,
        "name" : "Cattani Mustang 65",
        "price" : 8900,
        "weight" : "5 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7d1b83a7f705b8ecf0c9"),
        "utility" : [
                "Photography"
        ],
        "onSale" : false,
        "name" : "Flovera Julia Cranberry Red",
        "price" : 7800,
        "weight" : "1.2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7dd383a7f705b8ecf0ca"),
        "utility" : [
                "Delivery"
        ],
        "onSale" : false,
        "name" : "X-fin Rafael 5442",
        "price" : 17000,
        "weight" : "50 kilograms",
        "__v" : 0
}

In the above code, we have used the pretty() method to format the resulting documents so that they are easier to read.


Updated Documents:

Now, let’s add some duplicate documents so that we can see how to find them. After the duplicates have been added, the collection contains the following documents.

> db.drones.find({}).pretty()

{
        "_id" : ObjectId("615c7c3683a7f705b8ecf0c6"),
        "utility" : [
                "Recreation"
        ],
        "onSale" : false,
        "name" : "Yuva Plum Flyer 75",
        "price" : 6770,
        "weight" : "2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7c7e83a7f705b8ecf0c7"),
        "utility" : [
                "Security"
        ],
        "onSale" : false,
        "name" : "Red Hawk Ares 9000",
        "price" : 3400,
        "weight" : "3 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7ce483a7f705b8ecf0c8"),
        "utility" : [
                "Monitoring or Inspection"
        ],
        "onSale" : false,
        "name" : "Cattani Mustang 65",
        "price" : 8900,
        "weight" : "5 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7d1b83a7f705b8ecf0c9"),
        "utility" : [
                "Photography"
        ],
        "onSale" : false,
        "name" : "Flovera Julia Cranberry Red",
        "price" : 7800,
        "weight" : "1.2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c7dd383a7f705b8ecf0ca"),
        "utility" : [
                "Delivery"
        ],
        "onSale" : false,
        "name" : "X-fin Rafael 5442",
        "price" : 17000,
        "weight" : "50 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c905383a7f705b8ecf0cb"),
        "utility" : [
                "Delivery"
        ],
        "onSale" : false,
        "name" : "X-fin Rafael 5442",
        "price" : 17000,
        "weight" : "50 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c9a9383a7f705b8ecf0cc"),
        "utility" : [
                "Recreation"
        ],
        "onSale" : false,
        "name" : "Yuva Plum Flyer 75",
        "price" : 6770,
        "weight" : "2 kilograms",
        "__v" : 0
}
{
        "_id" : ObjectId("615c9ab783a7f705b8ecf0cd"),
        "utility" : [
                "Monitoring or Inspection"
        ],
        "onSale" : false,
        "name" : "Cattani Mustang 65",
        "price" : 8900,
        "weight" : "5 kilograms",
        "__v" : 0
}

Finding Duplicates:

Next, let’s move on to finding duplicate documents in the updated collection. We’ll pass a pipeline to the aggregate() method and use the $match, $group, and $project stages to achieve this:

> db.drones.aggregate([
...     {"$match": {"name" :{ "$ne" : null } } },
...     {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
...     {"$match": {"count" : {"$gt": 1} } },
...     {"$project": {"name" : "$_id", "_id" : 0} }
... ])


{ "name" : "Yuva Plum Flyer 75" }
{ "name" : "X-fin Rafael 5442" }
{ "name" : "Cattani Mustang 65" }

Wonderful! Now we can see the names of all the documents that appear more than once in the collection. Note that this query treats two documents as duplicates when they share the same name.
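
If you also need to know which documents share a name, the $group stage can collect their _id values with $push. The following is a minimal sketch rather than part of the original example: the output field names ids and count and the variable name duplicatePipeline are assumptions chosen for illustration, and the pipeline only runs when the shell-provided db object exists.

```javascript
// Pipeline sketch: group by name, keeping each group's _id values and size.
const duplicatePipeline = [
  // Skip documents whose name is missing or null.
  { $match: { name: { $ne: null } } },
  // One group per name; "ids" and "count" are illustrative field names.
  { $group: { _id: "$name", ids: { $push: "$_id" }, count: { $sum: 1 } } },
  // Keep only the names that occur more than once.
  { $match: { count: { $gt: 1 } } },
  // Rename _id to name for readability.
  { $project: { _id: 0, name: "$_id", ids: 1, count: 1 } }
];

// Run only inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.drones.aggregate(duplicatePipeline).forEach(doc => printjson(doc));
}
```

Each result would then carry the duplicated name together with the _id of every document that uses it, which makes the individual copies easy to look up.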

Code Explanation:

In the above code, we use the aggregate() method to find duplicate documents in the collection.

The pipeline in this example consists of the following stages:

  • First, we filter out documents whose name is missing or null using the $match stage.
  • Second, we group all the documents that have the same name using the $group stage and count the documents in each group with $sum.
  • Third, we keep only the groups whose count is greater than 1 using a second $match stage.
  • Lastly, we rename the _id field of each remaining group to name using the $project stage, so the output lists only the duplicated names.
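
To make the logic of these stages concrete, here is a rough plain-JavaScript equivalent that works on an in-memory array instead of a collection. It is only an illustration of what the pipeline computes, not how MongoDB executes it. The function name findDuplicateNames and the sample array are hypothetical.

```javascript
// Plain-JavaScript mirror of the pipeline stages; no database required.
function findDuplicateNames(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $match: skip documents whose name is missing or null.
    if (doc.name === undefined || doc.name === null) continue;
    // $group: count the documents for each name.
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  // Second $match and $project: keep names seen more than once,
  // returned as { name } objects.
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([name]) => ({ name }));
}

// Hypothetical sample data: only the name field matters here.
const sample = [
  { name: "Yuva Plum Flyer 75" },
  { name: "Red Hawk Ares 9000" },
  { name: "Yuva Plum Flyer 75" },
  { name: null }
];
console.log(findDuplicateNames(sample));
```

With this sample, only "Yuva Plum Flyer 75" is reported, because it is the only name that occurs more than once.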

This way, we have learned how to find duplicate documents in a collection in MongoDB.


Conclusion

Sometimes we need to find duplicates in a collection so that we can remove or combine them and free up some space. Finding duplicates in MongoDB can seem complicated, but in this tutorial we used a simple approach based on the aggregate() method.
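
If the goal is removal, one possible approach (an assumption here, not something this tutorial's example does) is to keep one _id from each duplicate group and delete the rest. The sketch below assumes groups shaped like { ids: [...] }, as a $group stage with $push would produce. The helper name idsToRemove is hypothetical, and the delete only runs inside the shell. Which copy is kept is arbitrary unless the documents are sorted before grouping.

```javascript
// Given duplicate groups, return every _id except the first of each group.
function idsToRemove(groups) {
  return groups.flatMap(group => group.ids.slice(1));
}

// Run only inside the MongoDB shell, where `db` is defined.
if (typeof db !== "undefined") {
  const groups = db.drones.aggregate([
    { $match: { name: { $ne: null } } },
    { $group: { _id: "$name", ids: { $push: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
  ]).toArray();
  // Destructive step: review the list of ids before running this.
  db.drones.deleteMany({ _id: { $in: idsToRemove(groups) } });
}
```

Because deleteMany() is permanent, it is worth printing the result of idsToRemove(groups) and checking it before deleting anything.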

Reference

https://stackoverflow.com/questions/26984799/find-duplicate-records-in-mongodb/26984964#26984964

Aneesha S