Example using the default setup
Parsing HTML using the default setup is as easy as creating a PDF in five steps:
1. Create a Document object.
2. Get a PdfWriter instance.
3. Open the Document.
4. Invoke XMLWorkerHelper.getInstance().parseXHtml().
5. Close the Document.
Let's take a look at a code snippet that converts the walden.html file to PDF.
In this snippet we use the XMLWorkerHelper class and its parseXHtml() method to do all the work:
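// step 1: create the Document
Document document = new Document();
// step 2: get a PdfWriter that writes to the output file
PdfWriter writer = PdfWriter.getInstance(document,
    new FileOutputStream("results/walden1.pdf"));
// step 3: open the Document
document.open();
// step 4: parse the XHTML and add the result to the Document
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
    HTMLParsingDefault.class.getResourceAsStream("/html/walden.html"), null);
// step 5: close the Document
document.close();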
See HTMLParsingDefault and the resulting PDF, walden1.pdf.
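The snippet takes two things for granted: that the results directory already exists (FileOutputStream throws a FileNotFoundException otherwise) and that walden.html can be found on the classpath (getResourceAsStream() returns null when it can't). The following is a minimal sketch of two helpers that check both conditions up front, using only the standard Java library. The class name ConversionSetup and the method names openOutput and openResource are hypothetical; they are not part of iText or of the original example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; not part of iText or XML Worker.
public class ConversionSetup {

    /** Opens the output file, creating missing parent directories first. */
    public static FileOutputStream openOutput(String path) throws IOException {
        File file = new File(path);
        File parent = file.getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Cannot create directory " + parent);
        }
        return new FileOutputStream(file);
    }

    /** Loads a classpath resource, failing early when it is missing. */
    public static InputStream openResource(Class<?> anchor, String name)
            throws FileNotFoundException {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + name);
        }
        return in;
    }
}
```

With these helpers, the FileOutputStream in step 2 could be obtained from ConversionSetup.openOutput("results/walden1.pdf"), and the input stream in step 4 from ConversionSetup.openResource(HTMLParsingDefault.class, "/html/walden.html"). The parsing call itself stays exactly as shown above.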
The HTML was taken from Project Gutenberg. It's a book by H.D. Thoreau: Walden, or Life in the Woods.
When we look at the first page generated by iText, we see that something went wrong: the first lines of the HTML are rendered as a line of gibberish. What went wrong, and how can we fix it?