html - How can I create a basic human readable plain text representation of XHTML using Java?
Given some simple XHTML, I'd like to create a human-readable plain text version of it. This would involve removing the HTML tags and adding or preserving whitespace.
For example, this input:
<div> <p>this text, <b>bold</b>.</p> <ul> <li>point one</li> <li>point two</li> </ul> </div>
would become:
"this text, bold. point one point two"
(Commas between the <li> items would be ideal... :)
Try the Jericho HTML Parser. You can either just strip the tags, or call on its "Renderer" class, which tries to mimic the formatting (e.g. bulleted lists are tabbed).
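If pulling in a library is not an option, the tag stripping can also be sketched with the XML parser that ships with the JDK. This is only a rough illustration, not Jericho's approach: the class name XhtmlToText, the method toPlainText and the list of block-level tags are all made up for this example, and it assumes the input is well-formed XHTML with no DOCTYPE and no named entities such as &nbsp;.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XhtmlToText {

    // Hypothetical choice of tags treated as block-level (a space is added after them).
    private static final List<String> BLOCKS =
            Arrays.asList("div", "p", "ul", "ol", "li", "br");

    /** Converts well-formed XHTML into one line of plain text. */
    public static String toPlainText(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        append(doc.getDocumentElement(), out);
        // Collapse every run of whitespace into a single space.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // Depth-first walk: copy text nodes, recurse into elements.
    private static void append(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            append(c, out);
        }
        String name = node.getNodeName();
        if ("li".equals(name) && hasNextLi(node)) {
            // Drop trailing whitespace so the comma hugs the item text.
            int end = out.length();
            while (end > 0 && Character.isWhitespace(out.charAt(end - 1))) {
                end--;
            }
            out.setLength(end);
            out.append(',');
        }
        if (BLOCKS.contains(name)) {
            out.append(' ');
        }
    }

    // True if the next element sibling of this <li> is another <li>.
    private static boolean hasNextLi(Node li) {
        for (Node s = li.getNextSibling(); s != null; s = s.getNextSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE) {
                return "li".equals(s.getNodeName());
            }
        }
        return false;
    }
}
```

Calling XhtmlToText.toPlainText(...) on the example input from the question should give "this text, bold. point one, point two", including the commas between list items. Real-world HTML that is not well-formed XML will make the parser throw, which is where a lenient parser like Jericho is the better fit.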