Sandeep
Published in NodejsMadeEasy
7 min read · Dec 17, 2019

Worker Threads in Node.js (Overview)

Worker threads were introduced in Node v10.x as an experimental API and have been stable since v12.x. As we all know, the single-threaded model of Node.js (though it also has the libuv thread pool) is well suited to non-blocking I/O operations. But Node.js performs poorly when it runs CPU-intensive (blocking) code, because that code executes on the main thread and blocks everything else from running. The main purpose of this post is to cover:

  • Overview of worker threads.
  • How are they different from normal threads?
  • Why and when to use worker threads?

CPU-intensive tasks

What happens if we need to do intensive synchronous work, such as complex in-memory calculations over a large dataset? Then we have a synchronous block of code that takes a long time and blocks the rest of the code. Imagine that a calculation takes 10s. If we are running a web server, all of the other requests are blocked for at least 10s because of that calculation. That’s a disaster. Anything more than 100ms could be too much.

Node.js golden rule
Don’t block the event loop. Keep it running, and avoid anything that could block the thread, like synchronous network calls or infinite loops.

JavaScript and Node.js were not meant to be used for CPU-bound tasks. Since JavaScript is single-threaded, a CPU-bound task will freeze the UI in the browser and queue up any I/O events in Node.js.

It’s important to differentiate between CPU operations and I/O (input/output) operations. JavaScript code in Node.js is NOT executed in parallel. Only I/O operations run in parallel, because they are executed asynchronously.
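To see the golden rule in action, here is a minimal sketch. The busyWait helper is a hypothetical stand-in for any heavy synchronous calculation, not part of Node.js. A timer that was asked to fire after 10 ms cannot run until the synchronous loop hands the thread back:

```javascript
// Sketch: synchronous CPU work delays a timer that was due after 10 ms.
const start = Date.now();

// Hypothetical stand-in for heavy work: spin for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU on the main thread
  }
}

let timerDelay = null;
setTimeout(() => {
  timerDelay = Date.now() - start;
  console.log(`timer fired after ${timerDelay} ms (asked for 10 ms)`);
}, 10);

// Nothing else can run on this thread until busyWait returns,
// so the timer callback above has to wait at least 200 ms.
busyWait(200);
```

On a web server the delayed timer would be every other incoming request.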

Existing solutions

Node.js already has stable APIs, the cluster and child_process modules, that can be used to achieve parallelism and execute CPU-intensive tasks. But they may not be the ideal solution because:

  1. Cluster API: The cluster module achieves concurrency by spawning worker (child) processes across the CPU cores. Worker processes are spawned using the child_process.fork() method, so that they can communicate with the parent via IPC and pass server handles back and forth. But what if you have only a single-core CPU?
  2. Child-Process API: The child_process.spawn() and child_process.fork() methods create the child process asynchronously, without blocking the Node.js event loop. The child_process.spawnSync() function provides equivalent functionality in a synchronous manner that blocks the event loop until the spawned process either exits or is terminated.
  3. Creating a process is not a good solution unless it is really required, because every worker (child) process consumes a lot of system resources.
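For comparison, this is roughly what offloading a calculation to a child process looks like. It is only a sketch: the sumInChildProcess helper and the inline script are made up for illustration, and it uses child_process.execFile() with process.execPath (the running node binary) rather than fork(), so that everything fits in one file.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: sum the integers 1..n in a separate Node.js process.
function sumInChildProcess(n) {
  // The child runs this script via `node -e` and prints the result to stdout.
  const script = `
    let total = 0;
    for (let i = 1; i <= ${Number(n)}; i++) total += i;
    console.log(total);
  `;
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', script], (err, stdout) => {
      if (err) return reject(err);
      resolve(Number(stdout.trim()));
    });
  });
}

// The event loop stays free while the child works,
// but we paid for a whole new process to get there.
sumInChildProcess(1e6).then((total) => console.log('child computed', total));
```

Every call starts a full Node.js process with its own memory, and the result comes back as text over a pipe. That overhead is what the points above are about.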

Keep in mind:

1. Creating a child process is not a cheap operation; it is costly in terms of OS resource consumption.

2. Threads are very lightweight in terms of resources compared to forked processes. And this is the reason why worker threads were born!

So what’s the solution for Node.js?

The answer is: Worker Threads

Worker threads are somewhat analogous to Web Workers in the browser, where the concept of Workers has existed for a long time. Worker threads enable the use of threads that execute JavaScript in parallel.

One way of making blocking code non-blocking is to run it in a separate thread. This way the main thread is free to continue working on other things, because the blocking operation is blocking a completely different thread. This is the solution Node.js offers with worker threads.
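A minimal sketch of that idea follows. The fibInWorker helper and the naive Fibonacci function are hypothetical examples of slow work. The worker code is passed as a string with { eval: true } only so that the example fits in a single file.

```javascript
const { Worker } = require('worker_threads');

// Worker source kept as a string so the sketch is a single file (eval: true).
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Naive recursive Fibonacci: deliberately CPU-heavy.
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData.n));
`;

// Hypothetical helper: compute fib(n) on a worker thread.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The main thread keeps handling its own events while the worker computes.
const tick = setInterval(() => console.log('main thread is still responsive'), 50);
fibInWorker(30).then((result) => {
  clearInterval(tick);
  console.log('fib(30) =', result);
});
```

The same fib(30) call made directly on the main thread would hold up the interval until it finished.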

To understand Workers, first, it’s necessary to understand how Node.js is structured.

When a Node.js process is launched, it runs:

  • One process
  • One thread
  • One event loop
  • One JS Engine Instance
  • One Node.js Instance

One process: the operating-system process running Node.js. Its process global object can be accessed anywhere and has information about what’s being executed at a time.

One thread: being single-threaded means that only one set of instructions is executed at a time in a given process.

One event loop: this is one of the most important aspects to understand about Node. It’s what allows Node to be asynchronous and have non-blocking I/O, despite the fact that JavaScript is single-threaded, by offloading operations to the system kernel whenever possible and delivering the results through callbacks, promises and async/await.

One JS Engine Instance: this is a computer program that executes JavaScript code.

One Node.js Instance: the computer program that executes Node.js code.

In other words, Node runs on a single thread, and there is just one operation happening at a time in the event loop. One code, one execution (the code is not executed in parallel). This is very useful because it simplifies how you use JavaScript without worrying about concurrency issues.

On the other hand, with worker threads a Node.js process has:

  • One process
  • Multiple threads
  • One event loop per thread
  • One JS Engine Instance per thread
  • One Node.js Instance per thread

This structure is illustrated in a diagram from NodeSource (source: https://nodesource.com/blog/worker-threads-nodejs/).

Worker threads have isolated contexts. They exchange information with the main thread using message passing, so we avoid the race conditions that threads usually suffer from! But they do live in the same process, so they use a lot less memory.
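The isolation is easy to check. In the sketch below, the mainOnlyValue global and the inspectWorker helper are hypothetical names. The main thread sets a global, and the worker reports whether it can see it, along with the isMainThread and threadId values exported by worker_threads.

```javascript
const { Worker, isMainThread, threadId } = require('worker_threads');

// A global that exists only in the main thread's context.
globalThis.mainOnlyValue = 'set on the main thread';

const workerSource = `
  const { parentPort, isMainThread, threadId } = require('worker_threads');
  parentPort.postMessage({
    isMainThread,
    threadId,
    // The worker has its own global object, so this is false.
    seesMainGlobal: typeof globalThis.mainOnlyValue !== 'undefined',
  });
`;

// Hypothetical helper: start a worker and return what it reports about itself.
function inspectWorker() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

console.log('main:', { isMainThread, threadId });
inspectWorker().then((info) => console.log('worker:', info));
```

Because nothing is shared implicitly, anything the worker needs has to arrive through a message or through workerData.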

Main Process and Worker Threads

The worker_threads module enables the use of threads that execute JavaScript in parallel. To access it:

const worker = require('worker_threads');

The ideal is to have multiple Node.js instances inside the same process. With worker threads, a thread can end at some point without that necessarily being the end of the parent process. It’s not good practice for resources that were allocated by a Worker to hang around when the Worker is gone: that’s a memory leak, and we don’t want that. We want to embed Node.js into itself, giving Node.js the ability to create a new thread and then create a new Node.js instance inside that thread, essentially running independent threads inside the same process.

The best solution for CPU-intensive computation in Node.js is to run multiple Node.js instances inside the same process, where memory can be shared and there is no need to pass data via JSON. This is exactly what worker threads do in Node.js.

An Isolate represents an isolated instance of the V8 engine and plays an important role in creating worker threads. Isolates, as the name suggests, are completely closed to the outside world, so they can run in parallel since they are entirely different instances of V8.

What makes Worker Threads special:

  • ArrayBuffers can be transferred to move memory from one thread to another.
  • SharedArrayBuffer is accessible from either thread. It lets you share memory between threads (limited to binary data).
  • Atomics are available. They let you run some operations concurrently and more efficiently, and allow you to implement condition variables in JavaScript.
  • MessagePort is used for communicating between different threads. It can be used to transfer structured data, memory regions and other MessagePorts between different Workers.
  • MessageChannel represents an asynchronous, two-way communications channel used for communicating between different threads.
  • workerData is used to pass startup data. It is an arbitrary JavaScript value that contains a clone of the data passed to this thread’s Worker constructor. The data is cloned as if using postMessage().
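Two of these pieces can be sketched together: a MessageChannel port handed to a worker, and an ArrayBuffer moved (not copied) through the transfer list. The sumViaTransfer helper is a hypothetical name used only for this illustration.

```javascript
const { Worker, MessageChannel } = require('worker_threads');

const workerSource = `
  const { parentPort } = require('worker_threads');
  // Receive a dedicated port and a transferred buffer from the parent.
  parentPort.once('message', ({ port, buffer }) => {
    const bytes = new Uint8Array(buffer);
    let sum = 0;
    for (const b of bytes) sum += b;
    port.postMessage({ sum }); // reply on the private channel
  });
`;

// Hypothetical helper: sum the bytes of a Uint8Array on a worker thread.
function sumViaTransfer(bytes) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    const { port1, port2 } = new MessageChannel();
    const buffer = bytes.buffer;

    port1.once('message', ({ sum }) => {
      port1.close();
      worker.terminate().then(() =>
        // After a transfer the sender's buffer is detached (length 0).
        resolve({ sum, byteLengthAfterTransfer: buffer.byteLength }));
    });
    worker.once('error', reject);

    // Second argument is the transfer list: ownership of both objects moves.
    worker.postMessage({ port: port2, buffer }, [port2, buffer]);
  });
}

sumViaTransfer(new Uint8Array([1, 2, 3, 4])).then((r) => console.log(r));
```

Transferring avoids copying large binary data, at the price that the sending thread can no longer use the buffer.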

API

  • const { Worker, parentPort } = require('worker_threads') => The Worker class represents an independent JavaScript execution thread, and parentPort is an instance of MessagePort.
  • new Worker(filename) or new Worker(code, { eval: true }) => the two main ways of starting a worker (passing the filename or the code that you want to execute). It’s advisable to use the filename in production.
  • worker.on('message'), worker.postMessage(data) => for listening to messages and sending them between the different threads.
  • parentPort.on('message'), parentPort.postMessage(data) => Messages sent using parentPort.postMessage() will be available in the parent thread using worker.on('message'), and messages sent from the parent thread using worker.postMessage() will be available in this thread using parentPort.on('message').
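Put together, the calls above give a simple request/reply loop. This is a sketch: the shout helper is hypothetical, and the eval form is used only to keep it in one file.

```javascript
const { Worker } = require('worker_threads');

const workerSource = `
  const { parentPort } = require('worker_threads');
  // Receives whatever the parent sends with worker.postMessage().
  parentPort.on('message', (text) => {
    // Delivered to worker.on('message') in the parent thread.
    parentPort.postMessage(text.toUpperCase());
  });
`;

// Hypothetical helper: send each string to the worker and collect the replies.
// Assumes `texts` is non-empty; otherwise the promise never settles.
function shout(texts) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    const replies = [];
    worker.on('message', (reply) => {
      replies.push(reply);
      if (replies.length === texts.length) {
        // The worker's listener keeps it alive, so stop it explicitly.
        worker.terminate().then(() => resolve(replies));
      }
    });
    worker.on('error', reject);
    texts.forEach((text) => worker.postMessage(text));
  });
}

shout(['ping', 'pong']).then((replies) => console.log(replies));
```

Messages on a single port arrive in the order they were sent, which is why the replies line up with the inputs here.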

An Example: https://github.com/sandeepp2016/Nodejs-worker-threads-examples

Important points:

  • Keep in mind that creating a Worker (like threads in any language), even though it’s a lot cheaper than forking a process, can still use too many resources depending on your needs. In that case, the docs recommend creating a pool of workers.
  • Don’t use Workers for parallelizing I/O operations.
  • Don’t assume that spawning Workers is cheap.
  • Some types of race conditions can still occur when accessing outside resources. For example, if two worker threads or processes both try to write to the same file, they can clearly conflict with each other and create problems.
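A worker pool can be sketched in a few dozen lines. The TinyPool class below is a hypothetical illustration, not a library API and not production-ready: it has no handling for crashed workers beyond rejecting the task. It starts a fixed number of workers once and queues tasks until one is free.

```javascript
const { Worker } = require('worker_threads');
const os = require('os');

// Each pooled worker answers every message with the sum of squares 1..n.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (n) => {
    let total = 0;
    for (let i = 1; i <= n; i++) total += i * i;
    parentPort.postMessage(total);
  });
`;

class TinyPool {
  constructor(size = Math.max(1, os.cpus().length)) {
    this.queue = []; // tasks waiting for a free worker
    this.workers = Array.from({ length: size },
      () => new Worker(workerSource, { eval: true }));
    this.idle = [...this.workers]; // workers with nothing to do
  }

  // Queue a task; resolves with the worker's reply.
  run(n) {
    return new Promise((resolve, reject) => {
      this.queue.push({ n, resolve, reject });
      this._dispatch();
    });
  }

  // Hand queued tasks to idle workers until one of the two runs out.
  _dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { n, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        this.idle.push(worker); // worker is free again
        resolve(result);
        this._dispatch();
      };
      const onError = (err) => {
        worker.off('message', onMessage);
        reject(err); // a failed worker is not returned to the pool
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(n);
    }
  }

  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

const pool = new TinyPool(2);
Promise.all([10, 100, 1000].map((n) => pool.run(n)))
  .then((results) => console.log(results))
  .finally(() => pool.close());
```

With two workers and three tasks, the third task waits in the queue until one of the workers reports back, so the startup cost is paid only twice.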

Conclusion

  1. The main goal of Workers is to improve performance on CPU-intensive operations, not I/O operations. So only use them if you need to do CPU-intensive tasks with large amounts of data.
  2. Worker threads will not help much with I/O-intensive work, because asynchronous I/O operations are more efficient than Workers can be.
  3. You can share memory with worker threads by passing SharedArrayBuffer objects, which are specifically meant for that.
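As a sketch of the last point, several workers can increment one counter that lives in a SharedArrayBuffer. The countInParallel helper is a hypothetical name. Atomics.add() is used so that concurrent increments are not lost.

```javascript
const { Worker } = require('worker_threads');

const workerSource = `
  const { workerData } = require('worker_threads');
  // Same memory as in the parent: a SharedArrayBuffer is shared, not copied.
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write
  }
`;

// Hypothetical helper: `workers` threads each add 1 to the counter `increments` times.
function countInParallel(workers, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const done = Array.from({ length: workers }, () =>
    new Promise((resolve, reject) => {
      const w = new Worker(workerSource, {
        eval: true,
        workerData: { shared, increments },
      });
      w.once('error', reject);
      w.once('exit', resolve);
    }));
  // Read the final value once every worker has exited.
  return Promise.all(done).then(() => Atomics.load(counter, 0));
}

countInParallel(4, 10000).then((total) => console.log('total =', total));
```

No result is sent back by message here. The parent simply reads the shared memory after the workers have finished.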

In The End

I hope you liked reading it. Feel free to give your feedback. Please share it with your friends if you liked it.

Note: Part 2 is now published.
