How to use DataLoaders in GraphQL

Make your GraphQL queries more efficient by batching them

If you’re building a GraphQL-powered API in Node, you’ll almost certainly want DataLoader. This library lets you batch your queries and keep your API as efficient as its REST predecessors. In this article we’ll go over the raw mechanics of the library without getting too bogged down in GraphQL.

What are DataLoaders?

Simply put, DataLoaders solve GraphQL’s “N+1 problem”. If you don’t know what that is, read this quick explanation. I’m going to assume you understand what that is from this point on, but to recap, it states:

For every 1 database query that returns N results, you will need to make N additional queries

That many queries is inefficient. We would be better off batching those N queries into 1 so instead of N+1 we would always just have 2. This is where DataLoaders come in.
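To make the arithmetic concrete, here is a small sketch that counts round trips. The tables and the helper names (findAuthors, findBooksByAuthor, findBooksByAuthors) are made up for illustration; each helper call stands in for one database query.

```javascript
// Hypothetical in-memory tables standing in for a real database.
const authors = [{ id: 1 }, { id: 2 }, { id: 3 }];
const books = [
  { title: 'book 1', author_id: 1 },
  { title: 'book 2', author_id: 2 },
  { title: 'book 3', author_id: 3 },
];

let queryCount = 0;
// Each helper call counts as one round trip to the database.
const findAuthors = () => { queryCount += 1; return authors; };
const findBooksByAuthor = (id) => {
  queryCount += 1;
  return books.filter(book => book.author_id === id);
};
const findBooksByAuthors = (ids) => {
  queryCount += 1;
  return books.filter(book => ids.includes(book.author_id));
};

// Naive: 1 query for the authors, then 1 more per author (N + 1 with N = 3).
queryCount = 0;
findAuthors().forEach(author => findBooksByAuthor(author.id));
const naiveQueries = queryCount;

// Batched: 1 query for the authors, 1 query for all of their books.
queryCount = 0;
findBooksByAuthors(findAuthors().map(author => author.id));
const batchedQueries = queryCount;

console.log({ naiveQueries, batchedQueries }); // { naiveQueries: 4, batchedQueries: 2 }
```

The batched count stays at 2 no matter how many authors come back, which is the whole point.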

DataLoaders: The overview

At the highest level, a DataLoader:

  1. Collects an array of keys during one tick of the event loop
  2. Hits the database once with all those keys
  3. Returns a promise which resolves to an array of values

All you need to make a DataLoader is a batching function that takes in an array of keys and resolves to an array of values. Both arrays must be the same length, with each value at the same index as its key; otherwise the DataLoader cannot match values back to keys and the load fails. Let’s work with the following DataLoader and fake DB:

const DataLoader = require('dataloader');
const fakeDB = ['Tom', 'Bo', 'Kate', 'Sara', 'Gene', 'Noel'];
const batchGetUserById = async (ids) => {
  console.log('called once per tick:', ids);
  return ids.map(id => fakeDB[id - 1]);
};
const userLoader = new DataLoader(batchGetUserById);

We have an async function that iterates through a given array of ids and then returns an array of usernames. This is mocking out how a real program would await a result from the DB using an ORM. Remember, since it’s async, our function’s return value is automatically wrapped in a Promise. We then take the batch function and pass it into a new DataLoader.
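The fake DB returns values in key order by construction. A real query usually gives no such guarantee: rows can come back in any order, and missing ids are simply left out. Here is a sketch of how a batch function might re-align the rows with the keys. The findUsersByIds helper and the row shape are assumptions standing in for an ORM call, and using null for a missing row is just one possible choice.

```javascript
// Hypothetical rows; a query such as WHERE id IN (...) may return them
// in any order and omits ids that do not exist.
const userRows = [
  { id: 3, name: 'Kate' },
  { id: 1, name: 'Tom' },
];

// Hypothetical stand-in for an ORM call.
const findUsersByIds = async (ids) => userRows.filter(row => ids.includes(row.id));

const batchGetUserById = async (ids) => {
  const rows = await findUsersByIds(ids);
  const byId = new Map(rows.map(row => [row.id, row]));
  // One result per key, in the same order as the keys; null marks a missing row.
  return ids.map(id => byId.get(id) || null);
};

batchGetUserById([1, 2, 3]).then(users => console.log(users));
```

The returned array always has exactly one slot per key, so it satisfies the same-length, same-order rule even when the database skips an id.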

Using a DataLoader with .load()

To actually use a DataLoader, you don’t call the batch function. Instead, you use the .load() method. That is how your DataLoader collects all the keys it needs. Each .load() saves the key and returns a promise. At the end of the current tick of the event loop, the DataLoader takes all the collected keys and passes them into the batch function. The batch function resolves to its values, which are then stored with the corresponding keys. Finally, each .load()’s promise resolves to the value of its given key.

A brief aside on the event loop

I keep mentioning ticks, so we should talk about the event loop. If you have 20-ish minutes, watch this video by Phillip Roberts. If not, suffice it to say that each tick of the event loop is what kicks the next resolved async callback onto the main call stack.

All you really need to know is that DataLoader uses this event loop ticking process to decide when to fire the batch function. It does this because by that point, all the .load() calls for a given query will have been made, meaning it knows exactly which keys to check against the DB.
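If it helps to see the idea in code, here is a deliberately tiny sketch of tick-based batching. It is not the library’s actual implementation (that also handles caching, per-key errors and more); createTinyLoader and its internals are names I made up, and process.nextTick is just one way to defer work until the current synchronous code has finished.

```javascript
// Simplified sketch only; names and structure are illustrative.
const createTinyLoader = (batchFn) => {
  let queue = []; // pending { key, resolve, reject } entries for the current tick

  const dispatch = () => {
    const batch = queue;
    queue = [];
    batchFn(batch.map(item => item.key))
      .then(values => batch.forEach((item, i) => item.resolve(values[i])))
      .catch(err => batch.forEach(item => item.reject(err)));
  };

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        // The first key of a tick schedules the dispatch; later keys just join the queue.
        if (queue.length === 0) process.nextTick(dispatch);
        queue.push({ key, resolve, reject });
      });
    },
  };
};

const batchCalls = [];
const tinyLoader = createTinyLoader(async (keys) => {
  batchCalls.push(keys);
  return keys.map(key => key * 10);
});

const results = Promise.all([tinyLoader.load(1), tinyLoader.load(2)]);
results.then(values => console.log(values, batchCalls)); // [ 10, 20 ] [ [ 1, 2 ] ]
```

Both .load() calls happen in the same synchronous run, so the batch function sees a single array with both keys.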

Back to DataLoaders

OK, now look at this example code:

const DataLoader = require('dataloader');
const fakeDB = ['Tom', 'Bo', 'Kate', 'Sara', 'Gene', 'Noel'];
const batchGetUserById = async (ids) => {
  console.log('called once per tick:', ids);
  return ids.map(id => fakeDB[id - 1]);
};
const userLoader = new DataLoader(batchGetUserById);
console.log('\nEvent Loop Tick 1');
userLoader.load(1);
userLoader.load(2).then((user) => {
  console.log('Here is the user: ', user);
});
setTimeout(() => {
  console.log('\nTick 2');
  userLoader.load(3);
  userLoader.load(4);
}, 1000);
setTimeout(() => {
  console.log('\nTick 3');
  userLoader.load(5);
  userLoader.load(6);
}, 2000);

If you run that, you’ll get:

Event Loop Tick 1
called once per tick: [ 1, 2 ]
Here is the user: Bo
Tick 2
called once per tick: [ 3, 4 ]
Tick 3
called once per tick: [ 5, 6 ]

As you can see, our batch function really is only called once per tick. Remember, setTimeout is async, so it kicks us into the next tick every time.

Also notice that even though the batch function is called with an array, .load(2) resolves to the proper user, Bo, and nothing else. Each .load() call always resolves to a single value. In the real world that single value will often be an array, so let’s try a more realistic example:

Make a Pseudo GraphQL Resolver

Take this GraphQL query:

query {
  authors {
    books {
      title
    }
  }
}

In this example our loader would live in the resolver for the books field, and it would find all the books associated with an author_id. Let’s use a for loop to simulate 3 different author parent objects with ids 1, 2, and 3:

const DataLoader = require('dataloader');
const fakeBooksDB = [
  { title: 'book 1', author_id: 1 },
  { title: 'book 2', author_id: 2 },
  { title: 'book 3', author_id: 3 },
  { title: 'book 4', author_id: 3 },
];
const batchGetBooksById = async (ids) => {
  // one array of books per author id, in the same order as the ids
  const books = ids.map((authorId) => {
    return fakeBooksDB
      .filter(book => book.author_id === authorId);
  });
  console.log('I only get fired once');
  return books;
};
const bookLoader = new DataLoader(batchGetBooksById);
// loop simulates 3 author parent resolvers
for (let i = 1; i <= 3; i++) {
  bookLoader.load(i).then((res) => {
    console.log(`\nAuthor #${i} books:`);
    console.log(res);
  });
}

If you run that you’ll get something like:

I only get fired once
Author #1 books:
[ { title: 'book 1', author_id: 1 } ]
Author #2 books:
[ { title: 'book 2', author_id: 2 } ]
Author #3 books:
[ { title: 'book 3', author_id: 3 },
{ title: 'book 4', author_id: 3 } ]

Since this all occurred in the same tick of the event loop, our batch function only fired once. And again, each .load() resolved to one value: an array with either one or two books.
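For reference, here is roughly where that .load() call would sit in an actual resolver map. This is a sketch under assumptions: the (parent, args, context) resolver signature is the common convention, context.bookLoader is a hypothetical name for a loader keyed by author id, and a stand-in loader object is used so the snippet runs without a GraphQL server or the library.

```javascript
// Hypothetical resolver map; context.bookLoader is assumed to be a loader
// keyed by author id and placed on the context by the server setup.
const resolvers = {
  Query: {
    authors: () => [{ id: 1 }, { id: 2 }, { id: 3 }],
  },
  Author: {
    // Runs once per author, but each call only registers a key with the loader.
    books: (author, args, context) => context.bookLoader.load(author.id),
  },
};

// Stand-in loader that records the keys it is asked for.
const requestedKeys = [];
const context = {
  bookLoader: {
    load: async (id) => { requestedKeys.push(id); return []; },
  },
};

// Simulate the server resolving books for every author returned by the query.
const pending = resolvers.Query.authors()
  .map(author => resolvers.Author.books(author, {}, context));
Promise.all(pending).then(() => console.log(requestedKeys)); // [ 1, 2, 3 ]
```

With a real DataLoader on the context, those three keys would reach the batch function as a single array.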

Caching

While the batching is the main focus, DataLoaders also have caching. That means if you ever call .load() with the same key twice, it’ll look up the value in the DataLoader’s key/value store without firing the batch function again:

const DataLoader = require('dataloader');
const fakeDB = ['Tom', 'Bo', 'Kate', 'Sara'];
const batchGetUserById = async (ids) => {
  console.log('I ran!');
  return ids.map(id => fakeDB[id - 1]);
};
const userLoader = new DataLoader(batchGetUserById);
console.log('\nEvent Tick 1');
userLoader.load(1);
userLoader.load(2);
setTimeout(() => {
  console.log('\nEvent Tick 2');
  // key 2 was already loaded in tick 1, so this hits the cache
  userLoader.load(2).then((res) => {
    console.log('cached res: ', res);
  });
}, 1000);

which outputs:

Event Tick 1
I ran!
Event Tick 2
cached res: Bo

The batch function wasn’t run a second time; the loader just looked up the cached value.
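One consequence worth spelling out: a cached key stays cached until you remove it, which is why DataLoader offers .clear(key) and .clearAll(). The sketch below mimics that behavior with a plain Map of promises. It is a simplification, not the library’s code, and withCache is a name I made up.

```javascript
// Simplified sketch of a per-key promise cache; illustrative only.
const withCache = (loadFn) => {
  const cache = new Map();
  return {
    load(key) {
      if (!cache.has(key)) cache.set(key, loadFn(key));
      return cache.get(key); // repeated keys get the same promise back
    },
    clear(key) {
      cache.delete(key);
    },
  };
};

let fetches = 0;
const cachedUsers = withCache(async (id) => {
  fetches += 1;
  return `user ${id}`;
});

const demo = (async () => {
  await cachedUsers.load(2); // fetched
  await cachedUsers.load(2); // served from the cache
  cachedUsers.clear(2);      // e.g. after the user was updated
  await cachedUsers.load(2); // fetched again
  return fetches;
})();
demo.then(count => console.log(count)); // 2
```

Three loads, two fetches: the middle one never touched the loading function.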

The .loadMany() Method

Sometimes you do want to access more than one key at a time. In those cases, use the .loadMany() method:

userLoader.loadMany([3, 4]).then((res) => {
  console.log('Returns an array of values: ', res);
});

If you plugged this into one of the user examples above, you’d get Returns an array of values: [ 'Kate', 'Sara' ]. Neat, right?
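Conceptually, loading many keys is just one .load() per key with the results gathered in key order. Here is a rough sketch of that idea using a stand-in loader; loadManySketch and stubLoader are hypothetical names, and the real method may differ in details such as how a failed key is reported.

```javascript
// Stand-in for a loader so the sketch runs without the library.
const names = ['Tom', 'Bo', 'Kate', 'Sara'];
const stubLoader = { load: async (id) => names[id - 1] };

// One .load() per key, gathered into a single array in key order.
const loadManySketch = (loader, keys) =>
  Promise.all(keys.map(key => loader.load(key)));

const many = loadManySketch(stubLoader, [3, 4]);
many.then(res => console.log(res)); // [ 'Kate', 'Sara' ]
```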

Load away!

There you have it, the fundamentals of DataLoader. For more info, check out this great article. It covers these topics with more concrete examples. I hope this helped, and if you have any questions don’t hesitate to ask below.

Happy coding everyone,

Mike