What is the fastest way to write a lot of documents to Firestore?

#1
I need to write a large number of documents to Firestore.

What is the fastest way to do this in Node.js?

#2
As noted in a comment to the OP, I've had the opposite experience when writing documents to Firestore inside a Cloud Function.

**TL;DR: Parallel individual writes are over 5x slower than parallel batch writes when writing 1200 documents to Firestore.**

The only explanation I can think of for this is some sort of bottleneck or request rate limiting between Google Cloud Functions and Firestore. It's a bit of a mystery.

Here's the code for both methods I benchmarked:

```js
const functions = require('firebase-functions');
const admin = require('firebase-admin');

admin.initializeApp();
const db = admin.firestore();


// Parallel Batch Writes
exports.cloneAppBatch = functions.https.onCall((data, context) => {

  return new Promise((resolve, reject) => {

    let fromAppKey = data.appKey;
    let toAppKey = db.collection('/app').doc().id;

    // Clone/copy data from one app subcollection to another
    let startTimeMs = Date.now();
    let docs = 0;

    // Write the app document (and ensure cold start doesn't affect timings below)
    db.collection('/app').doc(toAppKey).set({ desc: 'New App' }).then(() => {

      // Log benchmark
      functions.logger.info(`[BATCH] 'Write App Config Doc' took ${Date.now() - startTimeMs}ms`);

      // Get all documents in the app subcollection
      startTimeMs = Date.now();

      return db.collection(`/app/${fromAppKey}/data`).get();

    }).then(appDataQS => {

      // Log benchmark
      functions.logger.info(`[BATCH] 'Read App Data' took ${Date.now() - startTimeMs}ms`);

      // Batch up documents and write them to the new app subcollection
      startTimeMs = Date.now();

      let commits = [];
      let bDocCtr = 0;
      let batch = db.batch();

      appDataQS.forEach(docSnap => {

        let doc = docSnap.data();
        let docKey = docSnap.id;
        docs++;

        let docRef = db.collection(`/app/${toAppKey}/data`).doc(docKey);

        batch.set(docRef, doc);
        bDocCtr++;

        // Kick off the commit (in parallel) whenever the 500-op batch limit is reached
        if (bDocCtr >= 500) {
          commits.push(batch.commit());
          batch = db.batch();
          bDocCtr = 0;
        }

      });

      // Commit the final partial batch
      if (bDocCtr > 0) commits.push(batch.commit());

      Promise.all(commits).then(results => {
        // Log benchmark
        functions.logger.info(`[BATCH] 'Write App Data - ${docs} docs / ${commits.length} batches' took ${Date.now() - startTimeMs}ms`);
        resolve(results);
      });

    }).catch(err => {
      reject(err);
    });

  });

});


// Parallel Individual Writes
exports.cloneAppNoBatch = functions.https.onCall((data, context) => {

  return new Promise((resolve, reject) => {

    let fromAppKey = data.appKey;
    let toAppKey = db.collection('/app').doc().id;

    // Clone/copy data from one app subcollection to another
    let startTimeMs = Date.now();
    let docs = 0;

    // Write the app document (and ensure cold start doesn't affect timings below)
    db.collection('/app').doc(toAppKey).set({ desc: 'New App' }).then(() => {

      // Log benchmark
      functions.logger.info(`[INDIVIDUAL] 'Write App Config Doc' took ${Date.now() - startTimeMs}ms`);

      // Get all documents in the app subcollection
      startTimeMs = Date.now();

      return db.collection(`/app/${fromAppKey}/data`).get();

    }).then(appDataQS => {

      // Log benchmark
      functions.logger.info(`[INDIVIDUAL] 'Read App Data' took ${Date.now() - startTimeMs}ms`);

      // Gather up documents and write them to the new app subcollection
      startTimeMs = Date.now();

      let commits = [];

      appDataQS.forEach(docSnap => {

        let doc = docSnap.data();
        let docKey = docSnap.id;
        docs++;

        // Kick off each write individually; they run in parallel
        commits.push(db.collection(`/app/${toAppKey}/data`).doc(docKey).set(doc));

      });

      Promise.all(commits).then(results => {
        // Log benchmark
        functions.logger.info(`[INDIVIDUAL] 'Write App Data - ${docs} docs' took ${Date.now() - startTimeMs}ms`);
        resolve(results);
      });

    }).catch(err => {
      reject(err);
    });

  });

});
```


The specific results were (average of 3 runs each):

**Batch Writes:**

Read 1200 docs - 2.4 secs / Write 1200 docs - 1.8 secs

**Individual Writes:**

Read 1200 docs - 2.4 secs / Write 1200 docs - 10.5 secs


**Note:** These results are a lot better than what I was getting the other day - maybe Google was having a bad day - but the relative performance between batch and individual writes remains the same. Would be good to see if anyone else has had a similar experience.

#3
I came across a little library that implements the parallelized batch operations @DG mentioned. It provides a simple interface, so you can keep adding to the batch without worrying about the 500-operation limit.
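
The core idea behind such a wrapper is straightforward. Here's a minimal sketch of the pattern (the `MultiBatch` class and its methods are my own illustration, not the library's actual API):

```js
const admin = require('firebase-admin');

// Hypothetical wrapper that hides Firestore's 500-operation batch limit:
// it rolls over to a fresh batch automatically and commits them in parallel.
class MultiBatch {
  constructor(db) {
    this.db = db;
    this.batch = db.batch();
    this.count = 0;
    this.commits = [];
  }

  set(docRef, data) {
    this.batch.set(docRef, data);
    // Start a new batch once the current one is full
    if (++this.count >= 500) {
      this.commits.push(this.batch.commit());
      this.batch = this.db.batch();
      this.count = 0;
    }
    return this;
  }

  // Commit the final partial batch and wait for all commits to settle
  commit() {
    if (this.count > 0) this.commits.push(this.batch.commit());
    return Promise.all(this.commits);
  }
}
```

With something like this, the batch bookkeeping from the earlier answer collapses into a plain loop of `set()` calls followed by a single `commit()`.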

#4
**TL;DR: The fastest way to perform bulk document creation on Firestore is by performing parallel individual write operations.**

Writing 1,000 documents to Firestore takes:

1. `~105.4s` when using sequential individual write operations
2. `~2.8s` when using batched write operations (2 batches)
3. `~1.5s` when using parallel individual write operations


---

There are three common ways to perform a large number of write operations on Firestore.

1. Performing each individual write operation in sequence.
2. Using batched write operations.
3. Performing individual write operations in parallel.

We'll investigate each in turn below, using an array of randomized document data.
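
All three snippets below assume a shared setup along these lines (the collection name and document shape here are placeholders chosen for illustration):

```js
const admin = require('firebase-admin');

admin.initializeApp();

// Target collection for all three tests (name is arbitrary)
const collection = admin.firestore().collection('bulk-write-test');

// An array of 1,000 randomized documents to write
const datas = [];
for (let i = 0; i < 1000; i++) {
  datas.push({ index: i, value: Math.random() });
}
```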

---

### Individual sequential write operations

This is the simplest possible solution:

```js
async function testSequentialIndividualWrites(datas) {
  while (datas.length) {
    await collection.add(datas.shift());
  }
}
```

We write each document in turn, waiting for each write operation to complete before starting the next one.

Writing 1,000 documents takes about 105 seconds with this approach, so the throughput is roughly **10 document writes per second**.

---

### Using batched write operations

This is the most complex solution.

```js
async function testBatchedWrites(datas) {
  let batch = admin.firestore().batch();
  let count = 0;
  while (datas.length) {
    batch.set(collection.doc(Math.random().toString(36).substring(2, 15)), datas.shift());
    // Commit once the batch is full, or when we've queued the last document
    if (++count >= 500 || !datas.length) {
      await batch.commit();
      batch = admin.firestore().batch();
      count = 0;
    }
  }
}
```

You can see that we create a `WriteBatch` object by calling `batch()`, fill it up to its maximum capacity of 500 operations, and then write it to Firestore. We give each document a generated name that is relatively likely to be unique (good enough for this test).

Writing 1,000 documents takes about 2.8 seconds with this approach, so the throughput is roughly **357 document writes per second**.

That's quite a bit faster than the sequential individual writes. In fact, many developers use this approach because they assume it is the fastest, but as the results above show, that is not true. And the code is by far the most complex, due to the size constraint on batches.

---

### Parallel individual write operations

The Firestore documentation says this about the performance of adding lots of data (in its best-practices guide):

> For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes.

We can put that to the test with this code:

```js
async function testParallelIndividualWrites(datas) {
  await Promise.all(datas.map((data) => collection.add(data)));
}
```

This code kicks off the `add` operations as fast as it can, and then uses `Promise.all()` to wait until they're all finished. With this approach the operations can run in parallel.

Writing 1,000 documents takes about 1.5 seconds with this approach, so the throughput is roughly **667 document writes per second**.

The difference isn't nearly as great as between the first two approaches, but it's still over 1.8 times faster than batched writes.

---

A few notes:

* You can find the full code of this test on GitHub.
* While the test was done with Node.js, you're likely to get similar results across all platforms that the Admin SDK supports.
* Don't perform bulk inserts using client SDKs though, as the results may be very different and much less predictable.
* As usual, the actual performance depends on your machine, the bandwidth and latency of your internet connection, and many other factors. Depending on those, the exact gaps between the approaches may vary too, although I expect the ordering to remain the same.
* If you have any outliers in your own tests, or find completely different results, leave a comment below.
* Batched writes are atomic. So if you have dependencies between the documents, and either all of them must be written or none of them, you should use a batched write (see the sketch below).
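
To illustrate that last point, here's a minimal sketch of an atomic batched write (the collection names, field names, and `placeOrder` function are hypothetical): if any operation fails, neither document is written.

```js
const admin = require('firebase-admin');

const db = admin.firestore();

async function placeOrder() {
  const batch = db.batch();

  // Hypothetical example: an order and its inventory decrement belong together
  batch.set(db.collection('orders').doc(), { item: 'widget', qty: 1 });
  batch.update(db.collection('inventory').doc('widget'), {
    stock: admin.firestore.FieldValue.increment(-1),
  });

  // Atomic: either both writes are applied, or neither is
  await batch.commit();
}
```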


