Do you need to extract and combine table data from multiple PDF files into one Excel table? This guide will show you how to do it easily.

Loading the Data Into Power Query

  • Open Excel and create a new workbook.
  • Go to the Data tab, then click Get Data > From File > From Folder.
  • Find your PDF folder and click Open.
  • You’ll see the metadata for each file in the folder. Click Transform Data to load it into Power Query.

Power Query will show the metadata in the Power Query Editor window, ready for you to work with.

Featured Course

Master Excel Power Query – Beginner to Pro

Power Query is essential for Excel users who work with lots of data. This course teaches you how to use Excel in Power Mode and create meaningful reports with far less effort.
Learn More
Power Query Course cover

Extracting the PDF File Contents

We need the contents of each PDF file, not just the metadata.

  • Click the Expand Binaries button in the top-right of the Content column.
  • In the Combine Binaries dialog box, you will see all tables and pages in the first PDF.
  • Since each file has one page with a single table, select either “Table001” or “Page001.” If your PDFs have multiple tables, selecting “Page001” will capture all tables.

Choosing tables helps avoid unwanted content like text and correctly identifies table headers.

Click OK to start the combining process.

Processing the PDF Files

Power Query will create queries, parameters, and functions to process each PDF in the folder, extracting and combining the data into one table.

Identifying Data Origins

Before loading the data into Excel, we need to identify the region for each transaction. Each file’s name contains the region at the end.

  • Click Transform tab -> Text Column group -> Extract -> Text Between Delimiters.
  • In the dialog box, select an underscore (_) as the Start Delimiter and a period (.) as the End Delimiter.
  • The filenames have many underscores. So, we need to search for the underscore from the end. Set Scan for the Start Delimiter to “From the end of the input.”
  • Click OK and rename the column to “Store.”
  • Finally, click Home tab -> Close group -> Close & Load to load the data into a new Excel sheet.

Now, we have successfully combined 351 rows of data into a single, organized table.

Featured Course

Power Excel Bundle

10x your productivity in Excel 💪. This bundle includes Master Excel Power Query course & Master Power Pivot and DAX course. It’s Excel’s Ultimate Power Tools in ONE convenient (cost savings) bundle. Learnings apply to Power BI as well.
Learn More
Power Excel Bundle course cover

Adding New PDF Data to the Folder

If you get new PDFs periodically, you can easily update your Excel table with just two steps:

  1. Add the new PDF to the folder of existing PDF files.
  2. Refresh the output table in Excel.

Updating the Query With Additional Transformations

Need to update your query? It’s simple. For example, to combine First Name and Last Name into Full Name:

  1. Open the query in the Power Query Editor.
  2. Select the FirstName and LastName columns (in that order).
  3. Click Transform (tab) -> Text Column (group) -> Merge Columns.
  1. In the dialog box, choose “Space” as the Separator and name the new column “Full Name.”
  1. Click OK to apply the transformation.
  2. Click Close & Load to send the updated results back to Excel.

Dealing with Non-PDF Files

Sometimes, a non-PDF file might sneak into your PDF folder. Here’s how to handle it:

  • Identify the Issue: If you refresh the data and see an error, it might be due to a non-PDF file in the folder.
  • Filter for PDFs Only:
    • Open the query in the Power Query Editor.
    • Select the SOURCE step.
    • Filter the Extension column to include only “.pdf” files.
  • Click Close & Load to update the results in Excel.

Handling Files that Slip Through

Sometimes, even with a filter, non-PDF files can still cause issues:

Remember how the Microsoft Word document appeared at the bottom of the metadata list?

If we add a Microsoft Excel file…

Refreshing the query shows the newly added filed at the SOURCE step.

The new Excel file appears at the top of the list because of the alphabetical sorting by filename.

Of course, we added a step to filter out all non-PDF files.

Unbeknownst to us, another query that was created for us named “Sample File” is selecting the first file in the metadata list of files for examination and processing.

When the query was created, the first file in the list was a proper PDF file.

Now the metadata list shows the Excel file as the first file.

The “Sample File” query performed its folder scan, independent of the original query’s folder scan.

This causes the “Transform Sample File” query to fail because it is attempting to perform PDF transformations on an Excel file.

Correcting the Error in the Sample File

To correct this issue, we need to apply the same filter step to the “Sample File” as we did to the main query.

Select the SOURCE step of the “Sample File” query and perform the same filter operation as earlier.

Close & Load the updated queries back into Excel.

All is well with the world, once again.

Leila Gharani

I'm a 6x Microsoft MVP with over 15 years of experience implementing and professionals on Management Information Systems of different sizes and nature.

My background is Masters in Economics, Economist, Consultant, Oracle HFM Accounting Systems Expert, SAP BW Project Manager. My passion is teaching, experimenting and sharing. I am also addicted to learning and enjoy taking online courses on a variety of topics.