We begin with a data set that is comprised of sales for agents in specific regions.
Before we start removing duplicates, we must ask ourselves, “What constitutes a duplicate?”
- Should a row of the table be removed if the Region on one row matches the Region of another row?
- Should a row of the table be removed if the Sales Agent on one row matches the Sales Agent of another row?
- Should the entire row match in all aspects to another row to be a candidate for removal?
- Should any combination of the above be required for removal?
We will go with the requirement that the entire row of the table must be the same as another row in the same table to be removed.
NOTE: You need to decide whether to retain a copy of the original data set or replace it with the deduplicated result.
Method #1: Excel’s Remove Duplicate Feature
This method of data removal works in all versions of Excel.
Before invoking the Remove Duplicates feature, let’s make a copy of the data set by:
- clicking in the data and pressing CTRL-A two times,
- click CTRL-C to copy the selected data into memory,
- click in a new location on the sheet and press CTRL-V to paste the copy of the data into the new location.
To remove the duplicate rows from the copy of the data set:
- Click in the newly copied data set.
- Select Data (tab) -> Data Tools (group) -> Remove Duplicates.
- In the Remove Duplicates dialog box, ensure all the listed column headings and the option labeled “My data has headers” are checked and press OK.
The system reports back the number of duplicates that were removed along with the number of unique entries that remain.
Removing the duplicate rows based on a single column
If you wanted to remove all duplicated rows of the table based solely on the “Sales Agent” column, you would follow the above process but only check the box for “Sales Agent” in the Remove Duplicates dialog box.
This would result in the following “before and after” snapshot of the data.
NOTE: Only the row for the first encountered “Sales Agent” name is retained. All rows that follow with the same “Sales Agent” name are removed.
If new records are created in the original table that duplicates existing records, the deduplicated version will no longer accurately reflect the source. The “old” list of unique records would need to be deleted and the Remove Duplicates process would need to be re-executed.
Excel Essentials for the Real World
Method #2 – The Excel UNIQUE Function
This method of data removal works for subscribers of Office 365 and users of Excel for the Web and Excel 2021.
We begin by making a copy of our header row from the original table of data.
Select the cell directly below the “Sales Agent” heading (in this case, cell E2) and enter the following formula:
If you have formatted this table as a proper Excel Table, you can use the table’s name in place of the cell range reference. This is easier to understand and easier to write.
The downside of this method is that the number styles (see the “Sales” column) do not carry forward to the result. You need to format the cells to match if you need to retain that comma style.
The UNIQUE function, as opposed to the Remove Duplicates feature, is dynamic. If new records are created that duplicate existing records, the duplicates will be automatically removed.
Master NEW Excel Functions in Office 365 & Office 2021
Method #3 – Power Query
This method of data removal works in all versions of Excel from version 2013 onward.
Power Query is a data cleansing tool available in Excel that can perform simple to highly complex data cleansing procedures on “dirty” data.
To use Power Query to remove the duplicates in the table, perform the following steps:
- Click in the data to be de-duplicated.
- Select Data (tab) -> Get & Transform (group) -> From Sheet.
NOTE: Older versions of Excel will display the button as “From Table/Range”. This is the same feature only under a different name.
This will load the table into Power Query.
- Click any of the column heading and press CTRL-A to select all columns.
- Select Home (tab) -> Reduce Rows (group) -> Remove Rows -> Remove Duplicates.
You are left with only the unique rows.
- Load the results of the query back into the Excel spreadsheet by clicking the lower part of the Close & Load button (far left of Home ribbon) and selecting “Close & Load to…”.
- In the Import Data dialog box, select the option for Table and Existing Worksheet. Select a cell beside the original data (ex: cell E1 of the Data sheet) and click OK.
This method is semi-dynamic, meaning you must refresh the query’s output to see the updated results.
This can be done by right-clicking on the query output and selecting Refresh or by clicking the Refresh button on the Data tab.
Data can be set to automatically refreshed when one of 3 events occur:
- The user opens the file (see Query Properties image below).
- A timer is established to refresh the results every N-minutes (see Query Properties image below).
- Data is changed in the source table (initiated via an event-driven macro using VBA).
Master Excel Power Query – Beginner to Pro
Did you catch the “mistake”?
Look closely at the Power Query output.
Notice that “Stevie Bridge” in “Asia” with “4378” in “Sales” is listed twice.
The reason this was “missed” is because Power Query is case-sensitive when it comes to analyzing data.
Both the Remove Duplicates feature and the UNIQUE function are case-insensitive; they don’t discriminate between differing cases. Power Query will treat these two records as separate and distinct.
Fixing the “Error”
To force Power Query to remove the duplicate record it “missed”, we must add a step to the query that will standardize the casing of the agent names.
- In the Queries & Connections panel, either double-click the “TSales” query or right-click the “TSales” query and select Edit.
- In the Query Settings panel (right) click the step labeled “Changed Type”.
- Select the heading of the “Sales Agent” column.
- Capitalize the first letter of each word by selecting Transform (tab) -> Text Column (group) -> Format -> Capitalize Each Word.
NOTE: You can also do this by right-clicking the column heading and selecting Transform -> Capitalize Each Word.
- In the Insert Step notification, click insert to add the new step to the query instructions.
- Save the updated query back to the original output table by clicking Home (tab) -> Close & Load.
We are now in possession of only a single “Stevie Bridge” in “Asia” with “4378” in “Sales”.
Feel free to Download the Workbook HERE.
I'm a 6x Microsoft MVP with over 15 years of experience implementing and professionals on Management Information Systems of different sizes and nature.
My background is Masters in Economics, Economist, Consultant, Oracle HFM Accounting Systems Expert, SAP BW Project Manager. My passion is teaching, experimenting and sharing. I am also addicted to learning and enjoy taking online courses on a variety of topics.