Looking for Java libraries that will extract the text from a PDF while
retaining whitespace formatting, i.e. paragraphs, newlines, etc.
I've looked at and tested PDFBox, which does extract the text; however,
it does not preserve or insert paragraph breaks or newlines in its output.
I've also looked at iText, but according to its FAQ it will not extract
the text from a PDF.
I'd rather not use an external program like "pdftotext"; a pure-Java,
library-based solution would be better.
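Part of the difficulty is that most PDFs do not store paragraphs or line breaks at all, only text drawn at positions on the page, so an extractor has to infer them from the layout. Below is a minimal, stdlib-only sketch of one such gap-based heuristic. It assumes a hypothetical `TextLine` type holding each line's text and baseline y-coordinate (increasing down the page) and a caller-supplied typical line spacing. It is not the API of PDFBox, iText, or any other library, only an illustration of the kind of inference a library would need to perform.

```java
import java.util.List;

public class ParagraphGrouping {

    // Hypothetical model of one extracted text line: its text and its
    // baseline y-coordinate in points (larger values are further down the page).
    record TextLine(String text, double y) {}

    // Joins lines with "\n", but uses a blank line ("\n\n") whenever the vertical
    // gap to the previous line exceeds gapFactor times the typical line spacing.
    static String toParagraphs(List<TextLine> lines, double lineSpacing, double gapFactor) {
        StringBuilder out = new StringBuilder();
        TextLine prev = null;
        for (TextLine line : lines) {
            if (prev != null) {
                double gap = line.y() - prev.y();
                // A larger-than-usual gap is treated as a paragraph break.
                out.append(gap > lineSpacing * gapFactor ? "\n\n" : "\n");
            }
            out.append(line.text());
            prev = line;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<TextLine> lines = List.of(
                new TextLine("First paragraph, line one.", 100),
                new TextLine("First paragraph, line two.", 112),
                new TextLine("Second paragraph starts here.", 136));
        System.out.println(toParagraphs(lines, 12, 1.5));
    }
}
```

With a 12 pt line spacing and a factor of 1.5, the 24 pt gap before the third line exceeds the 18 pt threshold and becomes a blank line, so `main` prints the first two lines together and the third as a separate paragraph. A real extractor would also need to handle columns, indentation, and font-size changes, which this sketch ignores.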