Learn How to Use Group in MongoDB Aggregation Pipeline | by Rishabh Rawat | Mar, 2022

Before we jump into the aggregation pipeline and the group stage, we need some data to work with. I’ll use an example Movies collection to illustrate the concepts. There’ll be links to the playground for each query throughout the article.

To find the distinct values of a field in a collection, we can use the group stage with that field as the grouping key. Each distinct value appears exactly once in the output, as the _id of its group. Let’s group the movies by their release year:
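
As a minimal sketch, the pipeline could look like the following. The collection name `movies` and the field name `year` are assumptions for illustration; the article's actual schema may differ.

```javascript
// Assumed (hypothetical) document shape: { title, year, runtime, rating, reviews }
const pipeline = [
  // One output document per distinct release year, shaped like { _id: <year> }
  { $group: { _id: "$year" } },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```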

Movies are grouped by their release year

Just as we can group by a single field, we might want to group the data by more than one field, depending on our use case. The MongoDB aggregation pipeline lets us group by as many fields as we want by using a document of fields as the group’s _id.
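
A sketch of a two-field grouping, assuming the hypothetical field names `year` and `runtime` on a `movies` collection:

```javascript
// Assumed (hypothetical) fields: year, runtime
const pipeline = [
  {
    $group: {
      // A document as _id makes each distinct (year, runtime) pair its own group
      _id: { year: "$year", runtime: "$runtime" },
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```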

The combination of release year and runtime acts as the unique key

There are a lot of accumulator functions available in the group stage which can be used to aggregate the data. They help us carry out some of the most common operations on the grouped data. Let’s take a look at some of them:

$count accumulator

It counts the number of documents in each group. Combined with our group-by query, it gives the total number of documents per group.
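
Here is one way this might be written, assuming a hypothetical `year` field. The `$count` accumulator takes an empty document and is available in `$group` from MongoDB 5.0; on older versions, `{ $sum: 1 }` gives the same result.

```javascript
// Assumed (hypothetical) field: year
const pipeline = [
  {
    $group: {
      _id: "$year",
      // Number of movies in each release year (MongoDB 5.0+)
      count: { $count: {} },
      // Equivalent on older versions: count: { $sum: 1 }
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```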

Count movies released in a particular year

$sum accumulator

We can use the $sum accumulator to add up all the values in a field. Let’s group the movies by their rating and sum up the reviews to understand if there’s a correlation between movie rating and the number of reviews.
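
A possible shape for this query, assuming hypothetical numeric fields `rating` and `reviews`:

```javascript
// Assumed (hypothetical) fields: rating, reviews (number of reviews per movie)
const pipeline = [
  {
    $group: {
      _id: "$rating",
      // Adds up the reviews of every movie that shares this rating
      totalReviews: { $sum: "$reviews" },
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```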

$avg accumulator

We might want to find out which year has the highest average movie rating for analytical purposes. Let’s see how we can get those stats from our data using the $avg accumulator:
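
A rough sketch, again assuming hypothetical `year` and `rating` fields:

```javascript
// Assumed (hypothetical) fields: year, rating
const pipeline = [
  {
    $group: {
      _id: "$year",
      // Mean rating of all movies released in that year
      avgRating: { $avg: "$rating" },
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```

The output has one document per year; picking the highest average would need a further sorting step.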

$push accumulator

We want to look at all the ratings for every release year. We can use this data to analyze which year had major variations in the movie ratings. Let’s use the $push accumulator for this job:
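
This could be sketched as follows, assuming hypothetical `year` and `rating` fields:

```javascript
// Assumed (hypothetical) fields: year, rating
const pipeline = [
  {
    $group: {
      _id: "$year",
      // Collects every rating for the year into an array, duplicates included
      ratings: { $push: "$rating" },
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```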

$addToSet accumulator

You can think of this as a variant of the $push accumulator: $addToSet adds a value to the array only if the array doesn’t already contain it. That is the key difference between the two accumulators.
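
For comparison with `$push`, a sketch using the same hypothetical `year` and `rating` fields:

```javascript
// Assumed (hypothetical) fields: year, rating
const pipeline = [
  {
    $group: {
      _id: "$year",
      // Each distinct rating appears once per year; element order is not guaranteed
      uniqueRatings: { $addToSet: "$rating" },
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```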

$min accumulator

Let’s say we want to find out successful release years for the movies. A year is considered successful if all the movies released during that year have a rating greater than 7. Let’s use the $min accumulator to get the successful years:

  • The minRating field holds the minimum rating for each release year.
  • Finally, a $match stage keeps only the years whose minimum rating is greater than 7.
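
The steps above might be put together like this. The field names `year` and `rating` are assumptions; `minRating` is the name used in the text.

```javascript
// Assumed (hypothetical) fields: year, rating
const pipeline = [
  {
    $group: {
      _id: "$year",
      // Lowest rating among the movies released in that year
      minRating: { $min: "$rating" },
    },
  },
  // If even the lowest rating is above 7, every movie that year is above 7
  { $match: { minRating: { $gt: 7 } } },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```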

$first accumulator

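This accumulator is different from the $first array operator, which returns the first element of an array. For each group, the $first accumulator returns the value from the first document in that group, so the result is only meaningful when the documents reach the $group stage in a defined order.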
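
As an illustration only (the article's own query is not shown here), one common use is to sort first and then take the first document per group. The fields `year`, `rating`, and `title` are assumed names.

```javascript
// Assumed (hypothetical) fields: year, rating, title
const pipeline = [
  // Sort so that, within each year, the highest-rated movie comes first
  { $sort: { year: 1, rating: -1 } },
  {
    $group: {
      _id: "$year",
      // Title of the first document seen for each year, i.e. the top-rated one
      topMovie: { $first: "$title" },
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```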

Combine $group stage with $project

The movie rating is a floating-point number. We’ll round that off to the nearest integer to get the movie rating as a whole number. Finally, we’ll group movies by their modified ratings.

  • First, a $project stage rounds each movie’s rating to the nearest integer.
  • Then, a $group stage groups the movies by their rounded rating.
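
A sketch of how the two stages could be combined, assuming a hypothetical `rating` field. The `roundedRating` name and the per-group `count` are additions for illustration, and `$round` requires MongoDB 4.2 or later.

```javascript
// Assumed (hypothetical) field: rating (floating-point)
const pipeline = [
  // 1:1 stage: each movie becomes a document holding only its rounded rating
  { $project: { _id: 0, roundedRating: { $round: ["$rating", 0] } } },
  {
    // n:1 stage: movies sharing a rounded rating collapse into one group
    $group: {
      _id: "$roundedRating",
      count: { $sum: 1 }, // illustrative extra: how many movies fall in each bucket
    },
  },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```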

Sort the results with $sort

The year with the most total movie minutes might give us some insight into movie production and its correlation with audience attention spans over the years. Let’s see how to find it with the following query:
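
One way to express this, assuming hypothetical `year` and `runtime` (in minutes) fields; `totalMinutes` is an illustrative name:

```javascript
// Assumed (hypothetical) fields: year, runtime (minutes)
const pipeline = [
  {
    $group: {
      _id: "$year",
      // Total runtime of all movies released in that year
      totalMinutes: { $sum: "$runtime" },
    },
  },
  // Largest total first, so the top year is the first result
  { $sort: { totalMinutes: -1 } },
];

// In mongosh this would be run as: db.movies.aggregate(pipeline)
```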

$group vs $project stage

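The $group stage has an n:1 relationship between input and output documents: many input documents collapse into one output document per group. The $project stage has a 1:1 relationship: every input document produces exactly one output document.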

$group stage in MongoDB aggregation pipeline
$project stage in MongoDB aggregation pipeline

$group stage has a limit of 100 megabytes of RAM

If you’re working with a massive dataset and you receive an error during group stage execution, you might be hitting the memory limit. The allowDiskUse option doesn’t raise the limit itself; it lets the $group stage write temporary files to disk once the limit is exceeded.
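
A minimal sketch of passing the option, assuming a hypothetical `movies` collection grouped on a `year` field. `allowDiskUse` goes in the options document that is the second argument to `aggregate`.

```javascript
// Assumed (hypothetical) collection and field: movies, year
const pipeline = [{ $group: { _id: "$year", count: { $sum: 1 } } }];

// Options document passed as the second argument to aggregate()
const options = { allowDiskUse: true };

// In mongosh this would be run as: db.movies.aggregate(pipeline, options)
```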

That’s it! This was an introduction to how the group stage works in the MongoDB aggregation pipeline. We looked at how we can group data by a single field (to get distinct values) or by multiple fields, how we can sort the results, how we can carry out complex computations by adding conditions to the $group stage, and the subtle difference between the $group and $project stages.

Here’s an exercise

To solidify your understanding, I have curated a couple of questions related to what we’ve learned in this article. You can download the exercise PDF below. It contains MongoDB playground links containing the answers to all the questions:

5 quick questions on the group stage with answers
