Cleanup Suggestions and Column Statistics in Google Sheets
In late 2020, Google released a new feature to Sheets called Smart Clean-up.
Smart Clean-up has two main purposes:
- Cleanup Suggestions – Examine the dataset for potential problems, like duplicate rows in a table, extra spaces in the data, apply number formatting, or identifying commonly understood items that may be misspelled.
- Column Stats – View the most or least occurring items in a selected column, count the number of rows and empty cells, and determine the number of unique values in a column.
Let’s see a few examples of what is certain to be one of Google Sheets more distinguishing features.
Column Statistics is activated by clicking in a table, then selecting Data -> Column Stats.
We are presented with a panel on the right of the screen that displays some interesting information about our data.
We can immediately see how many times each item appears in the data by viewing the Count statistics.
We can also see a list of the most and least frequently occurring items in the data.
An interesting behavior occurs when an item is selected from the Frequency list, all matching items in the table are highlighted.
!!! BEWARE !!!
What is likely to be an undesirable behavior is the Column Stats inability to recognize header rows.
A method to educate Sheets as to the presence of a header row (or simply unwanted rows before the data) is to freeze the rows between the header row and the data.
The Frequency and Summary statistics will not refresh automatically, so you will need to close and reopen the Column Stats feature.
Unfortunately, the header row value remains in the column chart for Count and Distribution. Perhaps this will be corrected in a future update.
Analyzing Other Columns
With the Column Stats feature activated, you can analyze the stats of other columns by either clicking in a column of the data or by selecting the < and > buttons at the top of the Column Stats panel.
When selecting a column of numbers, we can see the Distribution of those values.
Hovering your mouse pointer over a column reveals the count of items that qualify in the defined range.
Selecting a column containing numbers reveals additional statistically oriented aggregations.
NOTE: Because the entire column in the worksheet is being analyzed, the Total Rows is likely to be inaccurate when compared with the actual number of used rows.
With any luck, updates to this feature will include the ability to define a specific range of cells instead of defaulting to the entire columns.
Cleanup Suggestions assists with the cleanup of data before analysis. This reduces the number of errors in the final report.
Smart Cleanup is activated by clicking in a table, then selecting Data -> Cleanup Suggestions.
This loads the Cleanup Suggestions panel to the right displaying any areas of the data that it believes may cause issues during the analysis phase.
We can see that we have duplicate rows in the data. If this is an issue, hovering over the entries in the list will reveal which items will be kept (green) and which will be discarded (red) if the REMOVE action is invoked.
We also see if there are any extraneous spaces before, after, or in the middle of the data.
Selecting the green checkmark button will accept all suggested modifications while selecting Trim All on a per-column basis allows for more focused correction.
When working with repetitive data, items that frequently occur but with minor variations in spelling will be flagged as Inconsistent Data.
By selecting the green checkmark, the suggested spelling will be applied to all flagged cells.
BEWARE: Not all instances of inconsistent data are discovered, even when a refresh is performed. This is likely due to the relative infancy of the feature and will no doubt discover more inconsistencies as the feature matures.
MASTER GOOGLE SHEETS
GET THE COMPLETE COURSE
Learn anytime that fits your schedule.
Complete the Challenges. Practice. Apply.