Cluster in Node.js tutorial

If you run a single Node server on a specific port, it runs on a single thread and takes advantage of only one core, yet as far as I know virtually every server machine out there has more than one core.

So is your Node application taking advantage of those cores and utilizing your resources? If you are not running it in a cluster, you are probably wasting your hardware's capabilities.

In this tutorial, I am going to explain one of the awesome features of Node.js that sets it apart from the rest of the backend technologies: clustering.

What is clustering?

Clustering in Node.js allows you to create separate processes that can share the same server port. For example, if we run one HTTP server on port 3000, that is one server running on a single thread on a single core of the processor.

But I want to take advantage of all the cores available in my machine, so I will cluster my application and run a copy of it on every core. If I run one server on port 3000 on a 4-core processor, I am actually running 4 servers, all listening on port 3000.

So if one worker goes down, another is there to take its place, and under peak traffic the master process hands incoming connections out to the workers, so Node effectively does internal load balancing for you.

The code that does this magic:

The code shown below allows you to cluster your application. It is based on the example in the official Node.js documentation.

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per CPU core.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // Print the PID of every worker that was forked.
  Object.keys(cluster.workers).forEach(function(id) {
    console.log("I am running with ID : " + cluster.workers[id].process.pid);
  });

  // Log whenever a worker dies.
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Worker process: do further processing here.
}

If you observe the code, we load the cluster module on the first line and then count how many CPU cores the machine has. In the "if" condition we check whether the current process is the master; if it is, we fork a copy of the process once per core.

If it is not the master process, we simply run our normal Node program. So there is one master process, which in turn creates as many worker processes as there are cores.

In my case, I have an Intel i5 with 4 cores. If I run this code, I get the following output in the terminal.

(Screenshot: terminal output listing the PIDs of the four worker processes.)
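One thing the official snippet above does when a worker dies is just log it. In practice you will usually want the master to fork a replacement so the pool stays at one worker per core. A minimal sketch of that pattern (my addition, not part of the official example):

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', function(worker, code, signal) {
    // Replace the dead worker so the app keeps using every core.
    console.log('worker ' + worker.process.pid + ' died, forking a new one');
    cluster.fork();
  });
} else {
  // Do further processing.
}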

Our project:

I have used Express as the default HTTP server so far, so for this demo I am going to build a simple Express program that uses clustering.

Directory structure:

--- node_modules
    |--- express
--- app.js
--- cluster.js
--- package.json

So here is package.json.

package.json
{
  "name": "cluster-demo",
  "version": "0.0.1",
  "dependencies": {
    "express": "^4.10.6"
  }
}

Install Express by running:

npm install

Here is our Express code.

app.js
var express = require("express");
var app = express();

app.get('/', function(req, res) {
  res.end("Hello world !");
});

app.listen(3000, function() {
  console.log("Running at PORT 3000");
});

Here is the main file. To use it on your production server, just change the last line to point to your app's entry point.

cluster.js
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per CPU core.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // Log whenever a worker dies.
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Change this line to your Node.js app entry point.
  require("./app.js");
}
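A quick note on how requests actually get shared out: in recent Node versions the master accepts incoming connections and hands them to workers, using a round-robin policy by default on most platforms. If you want to set the policy explicitly, it can be done before forking. A minimal sketch, assuming Node.js 0.12 or later:

var cluster = require('cluster');

// Must be set before the first cluster.fork() call.
// SCHED_RR forces round-robin; SCHED_NONE leaves distribution to the OS.
cluster.schedulingPolicy = cluster.SCHED_RR;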

So now, to run your server, type:

node cluster.js

You should see output like this with the above code.
(Screenshot: terminal output after running node cluster.js.)

Performance analysis:

If we tweak the above code a little and add a line that shows the process ID serving each request, we get the following responses when accessing localhost:3000 from different browsers or clients.
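For reference, the tweak is simply to include process.pid in the response. A sketch of the modified app.js (the exact line behind the screenshot may differ):

var express = require("express");
var app = express();

app.get('/', function(req, res) {
  // process.pid tells us which worker handled this request.
  res.end("Hello world ! Served by process " + process.pid);
});

app.listen(3000, function() {
  console.log("Running at PORT 3000 with PID " + process.pid);
});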
(Screenshot: responses in different browsers showing the process ID that served each request.)
Notice the process IDs: since I am running this code on localhost, seeing different requests answered by different PIDs is about the only proof of clustering I can show here, although on a production workload the distribution may look different.

For example, suppose you are getting 1000 requests per minute on port 3000. Before clustering, every one of those requests is served by a single instance of the program.

After clustering, you have 4 copies of your program running and listening on port 3000, so those 1000 requests per minute are split roughly 250 per worker, and overall throughput increases.

This is just an example; the point is that the internal load balancing is done by Node itself.
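If you want to check the distribution yourself rather than eyeballing browser tabs, a small client script can fire a batch of requests at the clustered server and tally how many each worker answered. A rough sketch, assuming the PID-tagged response from the earlier snippet and the server running on localhost:3000:

var http = require('http');

var total = 100;
var done = 0;
var counts = {};

for (var i = 0; i < total; i++) {
  http.get('http://localhost:3000/', function(res) {
    var body = '';
    res.on('data', function(chunk) { body += chunk; });
    res.on('end', function() {
      // Tally responses by the body text, which includes the worker PID.
      counts[body] = (counts[body] || 0) + 1;
      if (++done === total) {
        console.log(counts);
      }
    });
  });
}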

Conclusion:

In a local environment you may not be able to measure the exact performance improvement, but on a production workload I believe this technique will help you improve the throughput of your system and make the most of your resources.

That's all about clustering in Node.js. Share your views in the comments.

Shahid (UnixRoot) Shaikh

Hey there, this is Shahid, an engineer and blogger from Bombay. I am also an author and have written a programming book on Sails.js, an MVC framework for Node.js.


10 Comments

    1. Any special use of sticky load balancing? My concern is that load balancing is all about distributing traffic or load from one node to another (if available) and making sure that no node gets more load than it's supposed to.

      1. First, I want to say that this was a fantastic article, and many thanks to the author.

        As to the sticky load balancing, in my experience it has come into play in e-commerce situations where things like shopping carts are heavily dependent on maintaining a given TLS session. For example, in our environment we have dozens of e-commerce servers behind a load balancer. If a user was interacting with Server A and was suddenly directed to Server B the next time they hit the load balancer, it would break the way our e-comm folks have the shopping cart set up. I am not saying that I agree with how they have it set up (I am in security architecture, not e-comm), but that is how they do it. In our case I have recommended setting up a load-balanced TLS session cache cluster so any server could pick up any other server's TLS session as needed, eliminating the need for sticky load balancing.

        There are other cases, I am sure, but that was the first thing that came to mind as I read the post.

        Kurt

        1. Hi Kurt,
          Thanks for the nice explanation of the scenario. I seriously had no clue that we could use sticky load balancing in e-commerce. The idea of a TLS session cache cluster is really impressive. Thanks again for the comment.

          1. Hi Shahid,

            websockets make use of sticky load balancers like the one mentioned.

            nice article, keep it up.
            Tom

  1. I really liked the way you have explained everything. I found the explanation brief, but it clears everything up. I really appreciate it. Thanks a lot for the lovely articles.

  2. Hi Shahid,
    One small doubt: in the first screenshot you use `cluster.workers[id].process.pid`. Where did you find the information that `process` has a `pid` property? If I use any other name, it returns `undefined`. Please explain where and how we can find that information.
    Thanks
