Power Query in Excel has a tool that can pull data from PDF files.

It can grab data from tables or take all the data from a page, even if it’s not in a table.

How to Convert PDF to Excel using Power Query

Power Query allows you to connect to various data sources, including PDF files. Let’s find out how this works.

Imagine we have a safety report with performance data stored on the second page in a table format.

We could try to just highlight the relevant data in the PDF, then copy and paste the data into Excel, but it’s likely not to go as well as we had hoped.

I wouldn’t say this was a big success.

We could spend time rearranging and fixing the pasted data, but we don’t have that kind of time, and the manual fixes would have to be redone every time the PDF changes, which defeats the goal of an easily updatable Excel report.

Using Power Query to Import PDF Tables

Power Query in Excel has a built-in connector used to extract information from PDF files.  To use the connector, we perform the following steps (the example uses Excel 365).

  1. Start Excel to create a blank workbook.
  2. Select Data (tab) -> Get & Transform (group) -> Get Data -> From File -> From PDF.
  3. Browse to the folder that contains the PDF, select the PDF, and click Import.

The Navigator window displays a list of every “proper” table in the PDF as well as every page.

If the needed data is in a table, you can select either the table or the page that holds the table. In most cases, it is best to select the table, as this removes the need to clean unwanted information off the page later.

If you are unsure about which listed table contains the needed information, you can single-click any item in the left-hand list to display a preview of the item’s contents.

  4. Select the table or page and click Transform Data.

The data will be brought into the Power Query Editor where it can be cleaned and/or modified to fit your output needs.

  5. Select Close & Load to send the extracted data to an Excel Table.

The new table of PDF information can be used to drive other Excel objects, like charts and Pivot Tables.

If the PDF is later updated with additional years of statistics (and saved under the same name and location), the user merely needs to right-click the extracted table and select Refresh to receive the updated information.

Example: Import Data from Quarterly Finance Report

Using the Q2 2020 Financial Summary from Tesla, we discover a table on page 4 that has information we need for an Excel report.

If we attempt to highlight and copy/paste the table into Excel, you have probably already guessed what the results will be.

Less than stellar. Let’s use Power Query and the PDF connector to solve the problem.

As in the previous example, we are presented with a list of every table and every page in the PDF file.

We saw that the table we needed exists on page 4, so we select that page and click Transform Data.

The data did not come in as cleanly as we had hoped, but it’s a good start.

Using common Power Query transformation tools (like Merge Columns, Trim Text, Rename Columns, Remove Blank Rows, etc.) we can quickly and easily fix the improper areas of the table.
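To make those clean-up steps a bit more concrete, here is a minimal Python sketch of the same logic: trim the text, remove blank rows, and merge two columns. It is not Power Query code and not what the editor runs internally. The `clean_rows` helper and the sample rows are made up purely for illustration.

```python
def clean_rows(rows, merge=(0, 1), sep=" "):
    """Trim every cell, drop blank rows, then merge two columns into one."""
    cleaned = []
    for row in rows:
        # Trim Text: strip surrounding whitespace (None is treated as empty)
        cells = [(cell or "").strip() for cell in row]
        # Remove Blank Rows: skip rows where every cell is empty
        if not any(cells):
            continue
        # Merge Columns: join the two chosen columns, ignoring empty parts
        left, right = merge
        merged = sep.join(part for part in (cells[left], cells[right]) if part)
        rest = [c for i, c in enumerate(cells) if i not in merge]
        cleaned.append([merged] + rest)
    return cleaned


# Made-up rows that mimic a label split across two columns
sample = [
    ["  Total ", "revenues ", " 1,000 "],
    ["", None, "   "],
    ["Gross", "", "250"],
]
result = clean_rows(sample)
```

In the Power Query Editor these are separate clicks on the ribbon, and each click is recorded as its own applied step.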

Once we Close & Load the data to an Excel table, we can produce any form of report our hearts desire.

Notice that Power Query was smart enough to recognize that numbers stored within parentheses represent negative numbers.  This relieves us of the need to convert those parenthetical numbers to proper negative numbers.
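For readers less familiar with the accounting convention, the small Python sketch below shows what that conversion amounts to. It only illustrates the idea. Power Query does this for us when it detects the column's data type, and the `parse_accounting_number` function is a hypothetical helper, not part of any Power Query API.

```python
def parse_accounting_number(text):
    """Convert text such as '(1,234)' to -1234.0 and '567' to 567.0."""
    # Assumes a comma is the thousands separator
    cleaned = text.strip().replace(",", "")
    # Parentheses around the whole value mark it as negative
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = float(cleaned)
    return -value if negative else value


values = [parse_accounting_number(v) for v in ["(1,234)", "567", " (0.5) "]]
```

Without this recognition, a column like that would stay as text and could not be summed in a chart or Pivot Table.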

Example: Import a Table of Contents

We have a PDF course manual for a training course called Master New Excel Functions in Office 365 (you should check it out; it will change your Excel life), and we want to extract the table of contents from the PDF file.

The table of contents spans pages 3 through 5 in the PDF file.  This means we have three separate tables that need to be stacked as a single table.

If we select each table separately, we will end up with three queries that have to be appended to one another.

To make the process simpler, we will select the entire PDF, then click the Transform Data button.

With the entire PDF file having been brought into the Power Query Editor, we can perform a series of transformations to produce the needed combined table of contents.  These transformations can include:

  • Filter for tables (discard entire pages)
  • Remove all tables except the tables containing table of contents information
  • Remove unneeded columns
  • Combine (append) the tables into a single table
  • Filter unneeded page data
  • Split the topic number from the topic description
  • Rename the columns
  • Set data types
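The core of that list (append the tables, filter out unneeded rows, split the topic number from the description, set data types) can be sketched in a few lines of Python. This is only an illustration of the logic, not Power Query code. The `build_toc` helper, the column names, and the sample topics are all made up, and the sketch assumes every real entry starts with a topic number such as `2` or `2.1`.

```python
import re


def build_toc(tables):
    """Append several page tables, drop non-topic rows, split number from title."""
    # Combine (append) the tables into a single list of rows
    combined = [row for table in tables for row in table]
    toc = []
    for text, page in combined:
        # Split the topic number from the topic description
        match = re.match(r"^(\d+(?:\.\d+)*)\s+(.+)$", text.strip())
        if not match:
            continue  # filter out headings and footers with no topic number
        toc.append({
            "Topic No": match.group(1),
            "Topic": match.group(2),
            "Page": int(page),  # set data type: page number as a whole number
        })
    return toc


# Made-up sample: one small table per PDF page
pages = [
    [("Table of Contents", ""), ("1 Introduction", "6")],
    [("1.1 Course files", "7"), ("2 Dynamic arrays", "12")],
    [("Page 5", ""), ("2.1 FILTER function", "15")],
]
toc = build_toc(pages)
```

Working from the entire PDF in a single query gives the same benefit in Power Query: one set of steps covers all three pages, instead of three queries that must be appended afterwards.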

Our saved output to Excel appears as follows:

That’s it! That’s how easy it is to convert a PDF to Excel.

If you’d like to learn more about the Power of Power Query in Excel, check out my complete course.

Leila Gharani

I'm a 6x Microsoft MVP with over 15 years of experience implementing and training professionals on Management Information Systems of different sizes and nature.

My background: Master's in Economics, economist, consultant, Oracle HFM accounting systems expert, and SAP BW project manager. My passion is teaching, experimenting and sharing. I am also addicted to learning and enjoy taking online courses on a variety of topics.