Watch the video below:
Today we are going to learn how to get PDF data into Excel using Power query. That means we don’t require any VBA. We navigate to the ‘data’ tab, get data from file and select ‘from PDF’. We are working in Excel 365. A new window called ‘Import Data’ pops up. We navigate to the folder that contains our PDF file and we select our PDF file, for example, ‘LTCG.pdf’. We select on ‘Import’. We select ‘Table001(Page 1) and click on ‘Transform’ data. We can now view our data in the Power Query window. On the right-hand side we have the item ‘Properties’. Under the properties item we have the ‘Name’ text-box. We can click inside it and change the name to CalculateLTCG.
On the left-hand side in the table data we observe two headers Column1 and Column2. Next to the Column1 we have a table icon with a drop-down arrow which we can click to get menu items. Here we select ‘Use First Rows as Headers’ which replaces the Column1 and Column2 headers with the appropriate headers ‘Particulars’ and ‘(Rs.)’. The values under the header ‘(Rs.)’ are text so we click on the icon ‘ABC’ and from the options that drop-down we select ‘whole number’. In the new window called ‘Change Column Type’, we select ‘Replace current’ and our data is converted to whole numbers. Against the item number 3 with the description ‘Less: Indexed Cost of Improvement’ which had the value ‘Nil’ we get the text ‘Error’. We can remove this ‘Error’ entry from the table. On the right-hand side under ‘Applied Steps’ you can note all the steps or actions that we undertaken till now.
To remove the row with the ‘Error’ we select the table icon next to the ‘Particulars’ header and click on the drop-down arrow. From the menu options we select ‘Remove Errors’ and the row is removed and we are left with the cleaned table. So now we have the PDF data in a tabular form which can be brought into Excel.
Under the ‘File’ menu click on the drop-down arrow next to the ‘Close & Load’ item and select ‘Close & Load To…’ . This action saves the changes to the query, closes the Query window and allows us to specify where and how to place the table in the Excel worksheet.
A new window called ‘Import Data’ opens and we want to view the data in our workbook. The options available are Table, Pivot Table Report, Pivot Chart and Only Create Connection. We can also specify the range or cell where we wish to place the data in the current worksheet. We can also add a new worksheet to place the data. Also offered is a check-box with the caption ‘Add this data to the Data Model’.
We have decided to place the table data in this case into the existing worksheet starting at Range A1. You can view in real time how the data is being brought into the worksheet and soon the data table is visible. Now we can manipulate the tabular data according to our requirements. But now comes the most interesting part: we can edit the PDF file using appropriate software like Acrobat Reader and then click on ‘Refresh’ or ‘Refrsh All’ in the ‘Queries and Connections’ group to update our data!