Benefits of Streams when working with large data in Node.js


1. What is a stream?

Streams are collections of data, like arrays or strings. The difference is that a stream's data may not all be available at once, and it doesn't have to fit in memory, so even very large data won't overflow memory.

This makes streams really useful when working with large amounts of data, or with data that arrives chunk by chunk from an external source.

However, streams aren't just for working with big data. They also give us the power to compose code.

Just as we can build powerful Linux commands by piping smaller commands together, we can do exactly the same in Node.js with streams, using the pipe() method.

There are four basic types of streams in Node.js: Readable, Writable, Duplex, and Transform.

Readable streams are used for read operations; fs.createReadStream, for example, returns one.

Writable streams are used for write operations; fs.createWriteStream, for example, returns one.

Duplex streams can both read and write; a TCP socket is an example.

A Transform is essentially a Duplex stream that can modify or transform the data as it is written and read. An example is zlib.createGzip, which compresses data with gzip. You can think of a Transform stream as a function whose input is its writable side and whose output is its readable side. As a quick sketch (the file names are hypothetical), piping through such a transform gzips a file:
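const fs = require('fs');
const zlib = require('zlib');

// Read the source file, compress each chunk as it passes through
// the gzip Transform, and write the result to a new file.
fs.createReadStream('input.txt')
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream('input.txt.gz'));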

2. The pipe() method

For example, the simplest use of pipe looks like this (readableSrc and writableDest are hypothetical stream variables):
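readableSrc.pipe(writableDest);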

In this single line of code, we pipe the output of the readable stream (the data source) into the input of the writable stream (the destination). Of course, either end can also be a Duplex or Transform stream.

In Linux, this mechanism is equivalent to the command

$ readableSrc | writableDest

In short, pipe is a technique for feeding the output of one stream in as the input of another. There is no limit to this: pipe returns the destination stream, so calls can be chained:

a.pipe(b).pipe(c).pipe(d)

3. Benefits of using streams when processing large data

Suppose we use Node.js's readFile method to read a file and then write its contents to another file. A minimal sketch (the file names are hypothetical):
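const fs = require('fs');

// readFile buffers the ENTIRE file into memory before the callback runs.
fs.readFile('big-file.txt', (err, data) => {
  if (err) throw err;
  // Only once the whole file is in memory do we write it back out.
  fs.writeFile('copy.txt', data, (err) => {
    if (err) throw err;
  });
});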

In this case, before we write anything, we load the entire contents of the source file into memory. With a large file this can overflow memory, and even when it doesn't, holding the whole file in memory at once is wasteful and inefficient.

Using a stream and pipe instead (again with hypothetical file names):
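const fs = require('fs');

// Data flows through in small chunks, so only one chunk at a time
// needs to be held in memory.
fs.createReadStream('big-file.txt')
  .pipe(fs.createWriteStream('copy.txt'));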

When using pipe, fs.createReadStream splits the data it transfers to the write stream into chunks; the default chunk size (the highWaterMark option) is 64 * 1024 bytes (64KB).

Another way to write the file without using pipe is to handle the stream events ourselves. A sketch (again with hypothetical file names), using a custom chunk size of 20 * 1024 bytes:
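const fs = require('fs');

// Ask the read stream for chunks of 20 * 1024 bytes via highWaterMark.
const readStream = fs.createReadStream('big-file.txt', { highWaterMark: 20 * 1024 });
const writeStream = fs.createWriteStream('copy.txt');

// Each 'data' event delivers one chunk, which we can inspect or
// modify before writing it out ourselves. (Note: unlike pipe, this
// sketch does not handle backpressure.)
readStream.on('data', (chunk) => {
  writeStream.write(chunk);
});

readStream.on('end', () => {
  writeStream.end();
});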

Here we break the data into chunks of 20 * 1024 bytes. pipe transfers data directly from one stream to another, so it gives us no hook to customize the data along the way; we should therefore only use pipe when we don't need to handle the events ourselves.

Reference sources:

https://www.udemy.com/course/learn-node-js-complete-from-very-basics-to-advance

https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93/

