Why does my PDF parsing not work?

posted in: AI

image

While Word documents tend to have well-defined structures, not all PDFs are created equal. Some are literally images pasted into a document, so there is no text layer to extract, and some are not set up to be accessible. The easiest way I’ve found to tell whether your PDF is going to be that extra bit challenging to extract useful information from is to open it in Adobe Acrobat, go to Document Properties and see if it’s a Tagged PDF. The example above is.
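If you have a folder full of PDFs and don't want to open each one by hand, the same check can be roughly approximated in code. The sketch below is my own heuristic, not what Adobe does internally: it assumes that a tagged PDF declares itself with a `/Marked true` entry in its mark-information dictionary and simply searches the raw file bytes for that marker. The function names `looks_tagged` and `file_looks_tagged` are made up for illustration. It uses only the Python standard library, so it will miss files where that dictionary is stored in a compressed object stream. Treat a negative result as "check manually" rather than "definitely untagged".

```python
import re
import sys
from pathlib import Path

# Heuristic marker (an assumption of this sketch): tagged PDFs normally
# carry "/Marked true" in the catalog's mark-information dictionary.
_MARKED_TRUE = re.compile(rb"/Marked\s+true\b")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain an uncompressed '/Marked true' entry."""
    return _MARKED_TRUE.search(pdf_bytes) is not None


def file_looks_tagged(path: str) -> bool:
    """Read a PDF from disk and apply the same byte-level heuristic."""
    return looks_tagged(Path(path).read_bytes())


if __name__ == "__main__":
    # Usage: python check_tagged.py first.pdf second.pdf
    for name in sys.argv[1:]:
        verdict = "looks tagged" if file_looks_tagged(name) else "no tag marker found"
        print(f"{name}: {verdict}")
```

Files that come back with no tag marker are the ones I'd expect to need extra work, such as OCR for scanned images, before parsing gives anything useful.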