How do I scrape text from a PDF to Excel?
I have a PDF with a small amount of text on it.
I'd like to extract the text and write it to an Excel worksheet.
It seems that I can't use the COM interface, since the text I want is not in a form field or a table but on a picture that has been "inserted" into the PDF. I think I'll need to use iText, but how do I access that data? I don't know whether there's a way to get at it directly, or whether I need to do more work on the document first.
I'd like to avoid reading the entire PDF into Excel, as I'm only trying to scrape a couple of specific lines of text.

You should be able to use the IStream interface from Ole32.dll. Note that Ole32.dll is a Windows system DLL: it ships with Windows itself, not with the .NET Framework.
Here is a SO answer describing how to use it.
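For the "just a couple of specific lines" part, a minimal sketch follows. It assumes the PDF has a real text layer and that you have already pulled the page text out as a string, for example with iText (as mentioned in the question) or, in Python, with pypdf's `PdfReader(path).pages[n].extract_text()`. If the text really is only part of an embedded picture, there is no text layer and extraction returns nothing useful; you would need OCR first. The patterns `Invoice No:` and `Date:` and the file name `extracted.csv` are hypothetical placeholders. The sketch writes CSV (which Excel opens directly) to stay within the standard library; a library such as openpyxl could write a native .xlsx instead.

```python
import csv
import re
from typing import Iterable, List


def pick_lines(text: str, patterns: Iterable[str]) -> List[str]:
    """Return the stripped lines of `text` that match any regex in `patterns`, in order."""
    compiled = [re.compile(p) for p in patterns]
    picked = []
    for line in text.splitlines():
        stripped = line.strip()
        # Keep only non-empty lines that match one of the wanted patterns.
        if stripped and any(rx.search(stripped) for rx in compiled):
            picked.append(stripped)
    return picked


def write_rows_csv(path: str, rows: List[str]) -> None:
    """Write one extracted line per row; utf-8-sig adds a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.writer(fh)
        for row in rows:
            writer.writerow([row])


if __name__ == "__main__":
    # Hypothetical text standing in for what a PDF text extractor returned.
    sample = "ACME Corp\nInvoice No: 12345\nDate: 2021-03-04\nThank you for your business."
    wanted = pick_lines(sample, [r"^Invoice No:", r"^Date:"])
    print(wanted)  # ['Invoice No: 12345', 'Date: 2021-03-04']
    write_rows_csv("extracted.csv", wanted)
```

Matching on a label at the start of the line is usually more robust than picking lines by position, since text extractors do not always preserve the visual line order of a PDF.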