Building a “Related Post” Engine Using MongoDB Aggregation

Moving from WordPress to a custom Node.js stack creates many challenges. One of them is having to custom code features that plugins used to provide, such as the related post plugin. In WordPress this is easy: just add a plugin and you are pretty much done.

Also Read: Migrating WordPress Content to MongoDB

I had to custom code the “Related Post” section and in this article, I will explain how I achieved it.

So what is a related post? At the bottom of each article, you should see this:

[Image: Related post engine using MongoDB aggregation]

This section lists other posts related to the one the reader is currently reading.

It’s not completely accurate, but it works. I generated it using the following steps:

Step 1: Tagging data

To suggest posts based on the one the reader is currently viewing, we need information about that post, i.e. what type of post is it? Is it about Node.js, databases, or both?

To get this information, we need to tag each post with relevant labels. Lucky for me, WordPress already had categories and tags as metadata for each post.

So each article in my MongoDB database looks like this:

> db.posts.findOne();
{
    "_id" : ObjectId("5d246ef5f45b115cde3009bc"),
    "id" : 6354,
    "title" : "5 Free Python Programming Courses for Beginners",
    "date" : "2019-07-08T17:29:49",
    "url" : "https://codeforgeek.com/free-python-programming-courses-for-beginners/",
    "slug" : "free-python-programming-courses-for-beginners",
    "status" : "publish",
    "type" : "post",
    "excerpt" : "-----------------",
    "content" : "-----------------",
    "author" : 1,
    "categories" : [
        {
            "name" : "Python",
            "slug" : "python"
        }
    ],
    "tags" : [
        {
            "name" : "python",
            "slug" : "python"
        },
        {
            "name" : "Tips",
            "slug" : "tips"
        }
    ],
    "featured_image" : "-------------------",
    "pageviews" : 197
}

Notice the categories and tags fields. We also store categories and tags in a separate collection.

Step 2: Running aggregation query

MongoDB aggregation is a data processing pipeline in which documents pass through a series of stages and are transformed into an aggregated result. To generate the “Related post” result for our site, we need to search the collection for posts that share at least one category and at least one tag with the current post.

So the process would go like this:

  • Look through the posts and keep those that share a category with the current post.
  • Pass the matched posts to the next stage and keep those that also share a tag.
  • Randomly sample a maximum of four records from the result.

Here is the live MongoDB query that our site is currently using:

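function getRelatedArticle(data, callback) {
    // $in needs arrays, so data.categories and data.tags must be arrays of slugs
    dbo.collection('posts').aggregate([{
            // Stage 1: keep posts that share at least one category
            $match: {
                "categories.slug": {
                    $in: data.categories
                }
            }
        },
        {
            // Stage 2: of those, keep posts that also share at least one tag
            $match: {
                "tags.slug": {
                    $in: data.tags
                }
            }
        },
        {
            // Stage 3: randomly pick up to four of the remaining posts
            $sample: {
                size: 4
            }
        }
    ]).toArray((err, records) => {
        if (err) {
            return callback(true, 'error retrieving related records');
        }
        callback(false, records);
    });
}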

The data variable contains the tags and categories of the post the user is currently reading. We use those tags and categories to look up related posts in the collection. This is of course not perfectly accurate, but it works for now.
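Since the query filters on "categories.slug" and "tags.slug" with $in, the data object needs plain arrays of slugs. Here is a minimal sketch of how such an object could be derived from a post document shaped like the one in Step 1. The helper name toRelatedQueryInput is hypothetical and is not part of the site's code.

```javascript
// Hypothetical helper: turn a post document into the `data` argument
// expected by getRelatedArticle (arrays of category and tag slugs).
function toRelatedQueryInput(post) {
    // Missing or empty metadata becomes an empty array.
    const slugsOf = (items) => (items || []).map((item) => item.slug);
    return {
        categories: slugsOf(post.categories),
        tags: slugsOf(post.tags)
    };
}

// Trimmed-down version of the sample post shown earlier.
const post = {
    categories: [{ name: 'Python', slug: 'python' }],
    tags: [
        { name: 'python', slug: 'python' },
        { name: 'Tips', slug: 'tips' }
    ]
};

console.log(toRelatedQueryInput(post));
// { categories: [ 'python' ], tags: [ 'python', 'tips' ] }
```

Note that a post with no tags would produce an empty array, and $in with an empty array matches nothing, so such a post would get no related posts from this query.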

Also, we cache the returned result in Redis so that we don’t run aggregation queries every single time the user requests a page.
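The caching step could look roughly like the sketch below. It is not the site's actual code: the function names, the "related:" key format, and the one-hour expiry are all assumptions, and cache stands for any object that exposes callback-style get(key, cb) and setex(key, seconds, value, cb) methods. An in-memory stand-in is included so the sketch runs without a Redis server.

```javascript
// Cache-aside sketch. `cache`, the key format, and the TTL are assumptions.
function getRelatedCached(cache, slug, loadRelated, callback) {
    const key = 'related:' + slug; // hypothetical key format
    cache.get(key, (err, cached) => {
        if (!err && cached) {
            // Cache hit: skip the aggregation entirely.
            return callback(false, JSON.parse(cached));
        }
        // Cache miss (or cache error): fall back to the database.
        loadRelated((loadErr, records) => {
            if (loadErr) {
                return callback(true, records);
            }
            // Keep the result for one hour; a failed cache write is ignored.
            cache.setex(key, 3600, JSON.stringify(records), () => {
                callback(false, records);
            });
        });
    });
}

// In-memory stand-in for Redis, only for trying the function out.
// It ignores the expiry time.
function createMemoryCache() {
    const store = new Map();
    return {
        get: (key, cb) => cb(null, store.has(key) ? store.get(key) : null),
        setex: (key, seconds, value, cb) => {
            store.set(key, value);
            cb(null, 'OK');
        }
    };
}
```

With the query function from Step 2, the loader argument would be something like (cb) => getRelatedArticle(data, cb), so the aggregation only runs when the cache has no entry for that post.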

I am also thinking of adding a few more parameters, such as filtering by page views, to get more accurate results.
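As a rough idea of what that might look like, here is a sketch of a function that builds the pipeline with optional extra filters. It is an illustration under assumptions, not the query the site runs: buildRelatedPipeline, minPageviews, excludeId and size are hypothetical names, while the id and pageviews fields come from the sample document in Step 1. The two category and tag conditions are placed in a single $match, which selects the same documents as two consecutive $match stages.

```javascript
// Sketch only: option names are hypothetical, fields match the sample post.
function buildRelatedPipeline(data, options = {}) {
    const match = {
        'categories.slug': { $in: data.categories },
        'tags.slug': { $in: data.tags }
    };
    if (options.excludeId !== undefined) {
        // Leave out the post the reader is currently viewing.
        match.id = { $ne: options.excludeId };
    }
    if (options.minPageviews !== undefined) {
        // Only suggest posts with at least this many page views.
        match.pageviews = { $gte: options.minPageviews };
    }
    return [
        { $match: match },
        { $sample: { size: options.size || 4 } }
    ];
}

const pipeline = buildRelatedPipeline(
    { categories: ['python'], tags: ['python', 'tips'] },
    { excludeId: 6354, minPageviews: 100 }
);
console.log(JSON.stringify(pipeline, null, 2));
```

The returned array can be passed straight to dbo.collection('posts').aggregate(...) in place of the inline pipeline.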

Conclusion

MongoDB aggregation is a powerful tool for producing complex results by breaking a query down into pipeline stages, each of which passes its output to the next. We used it to build this feature, and you can use the same approach for any purpose you see fit.