Google Sheets QUERY Function Explained
The QUERY function is one of the most powerful functions in Google Sheets.
QUERY can solve both simple problems as well as some of the more complex data analysis problems.
It is this flexibility that can make it difficult to understand when first encountered. Starting with the “wrong” tutorial where more complex situations are discussed can easily lead to confusion and abandonment of the desire to learn the function.
This post will introduce you to the QUERY function in small bites; learning the set of rules in easy to understand ways that will spark excitement and desire to learn more about this amazing function.
The QUERY Function’s Clauses
Let’s begin with an introduction to the clauses used by the QUERY function. These clauses are like the clauses used by SQL (Structured Query Language).
- Select – Selects which columns to return, and in what order. If omitted, all the table’s columns are returned, in their default order.
- Where – Returns only rows that march a condition. If omitted, all rows are returned.
- Group by – Aggregates values across rows.
- Pivot – Transforms distinct values in columns into new columns.
- Order by – Sorts rows by values in columns.
- Limit – Limits the number of returned rows.
- Offset – skips a given number of first rows.
- Label – Sets column labels.
- Format – Formats the values of certain columns using given formatting patterns.
- Options – Sets additional options.
Don’t worry if you have limited or no knowledge of SQL, it’s not a requirement to understand the QUERY function. What’s important is that you respect the order in which the clauses are used.
You don’t have to use all these clauses each time you construct a QUERY formula. The first clause SELECT is the only clause you will likely use in most, if not all your cases.
If you are planning on making the QUERY function a part of your workflow, I recommend studying the Query Language Reference to gain deeper insight into the uses and operations of these clauses.
Studying this reference guide will reveal many functional gems, like the ability to integrate aggregations, like SUM, AVERAGE, MAX, MIN, and COUNT into your queries.
You can also perform scalar functions, like YEAR, MONTH, and DAY as well as arithmetic operations (add, subtract, multiply, divide).
Writing our First Query
We have our data sitting on another sheet named “Data” starting in cell A1.
The syntax for the QUERY function is as follows:
Pointing to the Data
Our first argument is the source of the data. We can select the exact range that our data occupies…
…but this would not allow for the inclusion of additional data. Instead, we will select the full columns.
Forming the Query
This is the part where you use the above-mentioned clauses. The query’s clauses must be enclosed within clauses.
Our first query will select everything. “Everything” can be represented by an asterisk (*).
=QUERY(Data!A:E, “select *”,
Identifying the Header Row(s)
The final argument defines how many rows (starting with the first row) in the data make up the header row(s).
If this argument is omitted or set to -1, the headers will be guessed based on the content.
As our data has but a single header row, we will ensure the QUERY function doesn’t make an incorrect determination about our headings. We’ll use a “1” for this argument.
=QUERY(Data!A:E, “select *”, 1)
The result is a return of all columns and all rows of data from the data source.
If we want to extract specific columns from data in the same workbook, we need to identify the needed columns by column letter (ex: A, B, C).
If you are using the QUERY function within the IMPORTRANGE function to import data from a different workbook, we need to identify the needed columns by column numbers (ex: Col1, Col2, Col3).
We can get creative by placing the column identifiers in our desired order, this way we are not bound by the order of columns in the source data.
=QUERY(Data!A:E, “select A, C, B, E”, 1)
Performing Query Aggregations
In our data, we have a column named “Scenario” that has three possible values: “Actual”, “Budget”, and “Outlook1”.
Suppose we wish to combine all rows for each of these three scenarios and sum the “Revenue” and “Cash” columns. In essence, create a summary report that aggregates by scenario.
Let’s write the following formula:
=QUERY(Data!A:E, “select A, sum(D), sum(E)” )
We would expect to retrieve the contents of Column “A” and aggregate Columns “D” & “E”. Unfortunately, the results are not as expected.
This is because we are returning every row in Column “A” separately. There is nothing to aggregate when each row remains separate.
The key is to group the rows by scenario, thereby combining all “Actual” rows, all “Budget” rows together, and all “Oulook1” rows together.
The clause that is perfect for this grouping action is (you guessed it) GROUP BY. Let’s update the formula to read with a GROUP BY clause.
=QUERY(Data!A:E, “select A, sum(D), sum(E) group by A” )
Notice that our values are not formatted. We could utilize the FORMAT clause to apply formatting to the results, thereby automating the formatting step and potentially making it dynamic.
=QUERY(Data!A:E, “select A, sum(D), sum(E) group by A format sum(D) ‘$#,##0’, sum(E) ‘$#,##0’”)
The formatting codes used are almost identical to the number formatting codes used by Sheets’ number formatting. This can also be used for formatting dates and times.
Dealing with Empty Rows
Notice in the above image, we have a blank row between the heading and the data. This is because we have included all the empty rows below our data. These rows are being treated as an aggregated scenario of “nothing, nothing, and nothing”.
To ensure we don’t include empty rows, we will filter out the empty rows using the WHERE clause.
A “where clause” makes me think of a very hairy man giving out presents once a year. (dumb joke of the year? Possibly.)
We want to ensure that we only include rows that have content; in other words, rows that are not empty (null).
Because a WHERE clause must come before a GROUP BY clause, we insert the following instruction into our existing formula.
=QUERY(Data!A:E, “select A, sum(D), sum(E) where A is not null group by A format sum(D) ‘$#,##0’, sum(E) ‘$#,##0’”)
We see that the blank row below the headings has been removed.
Creating Cross-Tabular Output
Suppose we only wish to have the sum of “Revenue” by “Scenario”, but we want to break each “Scenario” into its separate “Divisions”?
This is where the PIVOT clause can be of use. The PIVOT clause creates a series of distinct columns by which aggregations can be applied.
PIVOT comes after the GROUP BY clause, so we can write a formula like the following.
=QUERY(Data!A:E, “select A, sum(D) where A is not null group by A pivot C format sum(D) ‘$#,##0’”)
Sorting the Results
If you need to sort the results, you can implement an ORDER BY clause.
Suppose we want to produce a list of “Divisions” aggregated by “Cash” using sum where the “Scenario” is “Actual”.
=QUERY(Data!A:E, “select C, sum(E) where A = ‘Actual’ group by C format sum(E) ‘$#,##0’”)
The results are accurate, but we would like the “Divisions” listed in descending order by the sum of “Cash”. Add the following ORDER BY clause after the GROUP BY clause.
NOTE: The default order is Ascending, but since we wish to have the results in Descending order, we place the “desc” modifier at the end of the ORDER BY clause.
=QUERY(Data!A:E, “select C, sum(E) where A = ‘Actual’ group by C order by sum(E) desc format sum(E) ‘$#,##0’”)
Customizing the Aggregation Headings
It is likely that having a heading named “sum Cash” is going to be less than desirable. We can customize the headings using the LABEL clause.
LABEL comes before FORMAT, so we modify the above formula as follows.
=QUERY(Data!A:E, “select C, sum(E) where A = ‘Actual’ group by C order by sum(E) desc label sum(E) ‘Total Cash’ format sum(E) ‘$#,##0’”)
Limiting the Results
Suppose you only wish to retain the top 2 entries based on the “Cash” aggregations: in this case, “Productivity” and “Games”?
This is where the LIMIT clause comes into play. LIMIT comes after ORDER BY, so we can add the following instruction to our query.
=QUERY(Data!A:E, “select C, sum(E) where A = ‘Actual’ group by C order by sum(E) desc limit 2 label sum (E) ‘Total Cash’ format sum(E) ‘$#,##0’”)
“Where can I learn more?”
This post is but a glimpse into the workings of the QUERY function.
If you would like to learn more about QUERY and many other amazing Google Sheets functions, check out my course title “Google Sheets – The Comprehensive Masterclass”.
There is so much more to do with the QUERY function. Let this post serve as the appetizer to what will surely be a stellar four-course meal.
Feel free to get your own copy of the file HERE.
MASTER GOOGLE SHEETS
GET THE COMPLETE COURSE
Learn anytime that fits your schedule.
Complete the Challenges. Practice. Apply.