How do I scrape text from a PDF to Excel?

I have a PDF with a small amount of text on it.

I'd like to extract the text and write it to an Excel worksheet.

It seems I can't use the COM interface, because the text I want is not in a form field or a table but in a picture that has been inserted into the PDF. I think I'll need to use iText, but how do I access that data? I don't know whether there is a way to get at it directly, or whether I need to process the document further first.

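I'd like to avoid reading the entire PDF into Excel, as I'm trying to scrape just a couple of specific lines of text.

You should be able to use the COM IStream interface implemented in Ole32.dll. Ole32.dll ships with Windows itself rather than with the .NET Framework; .NET code can reach the interface through System.Runtime.InteropServices.ComTypes.IStream.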

Here is a Stack Overflow answer describing how to use it.
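
For the "just a couple of specific lines" part, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about .NET). It assumes the text has already been pulled out of the PDF by some earlier step, which may mean OCR if the text really is part of an image. The function names, the sample text and the regex patterns are all hypothetical, and a CSV file stands in for the worksheet because Excel opens CSV directly.

```python
import csv
import io
import re


def select_lines(text, patterns):
    """Return the stripped lines of `text` that match any regex in `patterns`."""
    compiled = [re.compile(p) for p in patterns]
    return [
        line.strip()
        for line in text.splitlines()
        if any(rx.search(line) for rx in compiled)
    ]


def write_rows(lines, stream):
    """Write a header row, then one selected line per row, as CSV."""
    writer = csv.writer(stream)
    writer.writerow(["line"])
    for line in lines:
        writer.writerow([line])


# Hypothetical text standing in for whatever the extraction step returned.
sample_text = """ACME Corp
Invoice No: 12345
Thank you for your business
Total Due: $99.50
"""

# Keep only the two lines of interest instead of the whole document.
selected = select_lines(sample_text, [r"^Invoice No:", r"^Total Due:"])

# An in-memory buffer keeps the sketch self-contained; a real run would
# use open("out.csv", "w", newline="") instead.
buffer = io.StringIO()
write_rows(selected, buffer)
print(buffer.getvalue())
```

Filtering before writing means only the matching lines ever reach the spreadsheet, so the rest of the PDF text never has to be loaded into Excel.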
