Remove Duplicates using Microsoft Excel

Removing duplicate items from a list, or duplicate records from a table is one of the most common tasks required of anyone cleaning “dirty” data.

How you remove duplicates is also a common topic brought up during a job interview that requires Excel knowledge.

Knowing how to remove duplicates can not only assist with data cleansing but may also be that “make it or break it” question that lands you your Excel dream job.

Excel has three main mechanisms to assist with duplicate removal.  Understanding the pros and cons of each method will assist with selecting the best method for a given situation.

It’s also important to understand that in special circumstances, one method may return different results than another method.

Let’s take a look at each of these methods.

We begin with a data set that is comprised of sales for agents in specific regions.

Before we start removing duplicates, we must ask ourselves, “What constitutes a duplicate?”

  • Should a row of the table be removed if the Region on one row matches the Region of another row?
  • Should a row of the table be removed if the Sales Agent on one row matches the Sales Agent of another row?
  • Should the entire row match in all aspects to another row to be a candidate for removal?
  • Should any combination of the above be required for removal?

We will go with the requirement that the entire row of the table must be the same as another row in the same table to be removed.

NOTE:  You need to decide whether to retain a copy of the original data set or replace it with the deduplicated result.

Method #1: Excel’s Remove Duplicate Feature

This method of data removal works in all versions of Excel.

Before invoking the Remove Duplicates feature, let’s make a copy of the data set by:

  1. clicking in the data and pressing CTRL-A two times,
  2. click CTRL-C to copy the selected data into memory,
  3. click in a new location on the sheet and press CTRL-V to paste the copy of the data into the new location.

To remove the duplicate rows from the copy of the data set:

  1. Click in the newly copied data set.
  2. Select Data (tab) -> Data Tools (group) -> Remove Duplicates.

  1. In the Remove Duplicates dialog box, ensure all the listed column headings and the option labeled “My data has headers” are checked and press OK.

The system reports back the number of duplicates that were removed along with the number of unique entries that remain.

Removing the duplicate rows based on a single column

If you wanted to remove all duplicated rows of the table based solely on the “Sales Agent” column, you would follow the above process but only check the box for “Sales Agent” in the Remove Duplicates dialog box.

This would result in the following “before and after” snapshot of the data.

NOTE: Only the row for the first encountered “Sales Agent” name is retained.  All rows that follow with the same “Sales Agent” name are removed.

CONS:

If new records are created in the original table that duplicates existing records, the deduplicated version will no longer accurately reflect the source.  The “old” list of unique records would need to be deleted and the Remove Duplicates process would need to be re-executed.

Excel basics course Leila Gharani

Excel Essentials for the Real World - Complete Course

Learn Excel from Scratch or Fill in the Gaps.

Become Confident.

Packed with Challenges & Files (Excel 2016, 2019, Office 365).

GET ACCESS

Method #2 – The Excel UNIQUE Function

This method of data removal works for subscribers of Office 365 and users of Excel for the Web.

We begin by making a copy of our header row from the original table of data.

Select the cell directly below the “Sales Agent” heading (in this case, cell E2) and enter the following formula:

=UNIQUE(A2:C20)

If you have formatted this table as a proper Excel Table, you can use the table’s name in place of the cell range reference.  This is easier to understand and easier to write.

=UNIQUE(TSales)

CONS:

The downside of this method is that the number styles (see the “Sales” column) do not carry forward to the result.  You need to format the cells to match if you need to retain that comma style.

PROS:

The UNIQUE function, as opposed to the Remove Duplicates feature, is dynamic.  If new records are created that duplicate existing records, the duplicates will be automatically removed.

Excel Dynamic Arrays Course Leila Gharani

Master Excel Functions in Office 365 & Office 2021 - Complete Course

Excel has Changed Forever! DON'T MISS OUT!

GET ACCESS

Method #3 – Power Query

This method of data removal works in all versions of Excel from version 2013 forward.

Power Query is a data cleansing tool available in Excel that can perform simple to highly complex data cleansing procedures on “dirty” data.

To use Power Query to remove the duplicates in the table, perform the following steps:

  1. Click in the data to be de-duplicated.
  2. Select Data (tab) -> Get & Transform (group) -> From Sheet.

NOTE: Older versions of Excel will display the button as “From Table/Range”.  This is the same feature only under a different name.

This will load the table into Power Query.

  1. Click any of the column heading and press CTRL-A to select all columns.

  1. Select Home (tab) -> Reduce Rows (group) -> Remove Rows -> Remove Duplicates.

You are left with only the unique rows.

  1. Load the results of the query back into the Excel spreadsheet by clicking the lower part of the Close & Load button (far left of Home ribbon) and selecting “Close & Load to…”.

  1. In the Import Data dialog box, select the option for Table and Existing Worksheet. Select a cell beside the original data (ex: cell E1 of the Data sheet) and click OK.

CONS:

This method is semi-dynamic, meaning you must refresh the query’s output to see the updated results.

This can be done by right-clicking on the query output and selecting Refresh or by clicking the Refresh button on the Data tab.

PROS:

Data can be set to automatically refreshed when one of 3 events occur:

  • The user opens the file (see Query Properties image below).
  • A timer is established to refresh the results every N-minutes (see Query Properties image below).
  • Data is changed in the source table (initiated via an event-driven macro using VBA).

Power Query course Leila Gharani

Master Excel Power Query Course

Beginner to Advanced (including M) 

Collect, Combine and Analyze Data with Ease - Create Pivot Tables with Data Model.

GET ACCESS

Did you catch the “mistake”?

Look closely at the Power Query output.

Notice that “Stevie Bridge” in “Asia” with “4378” in “Sales” is listed twice.

The reason this was “missed” is because Power Query is case-sensitive when it comes to analyzing data.

Both the Remove Duplicates feature and the UNIQUE function are case-insensitive; they don’t discriminate between differing cases.  Power Query will treat these two records as separate and distinct.

Fixing the “Error”

To force Power Query to remove the duplicate record it “missed”, we must add a step to the query that will standardize the casing of the agent names.

  1. In the Queries & Connections panel, either double-click the “TSales” query or right-click the “TSales” query and select Edit.

  1. In the Query Settings panel (right) click the step labeled “Changed Type”.

  1. Select the heading of the “Sales Agent” column.
  2. Capitalize the first letter of each word by selecting Transform (tab) -> Text Column (group) -> Format -> Capitalize Each Word.

NOTE: You can also do this by right-clicking the column heading and selecting Transform -> Capitalize Each Word.

  1. In the Insert Step notification, click insert to add the new step to the query instructions.

  1. Save the updated query back to the original output table by clicking Home (tab) -> Close & Load.

We are now in possession of only a single “Stevie Bridge” in “Asia” with “4378” in “Sales”.

Practice Workbook

Feel free to Download the Workbook HERE.

Excel Download Practice file

Learn Excel from a Microsoft MVP

Check out my bestselling Excel Courses

Learn anytime that fits your schedule.

Download files. Practice. Apply.

Visit Courses
Free Tutorials