HtmlCleaner

HtmlCleaner is an open-source HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed, and unsuitable for further processing. Any serious consumption of such documents first requires cleaning up the mess and bringing order to the tags, attributes, and ordinary text. Given an HTML document, HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those most web browsers use to build the document object model. However, the user may provide a custom tag and rule set for tag filtering and balancing.
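To give a rough feel for what "tag balancing" means, here is a minimal sketch that uses only the Java standard library. It is not HtmlCleaner's code or API: the class name TagBalancer, the method balance, and the short list of void elements are all assumptions made for illustration. The sketch only closes unclosed tags, drops stray closing tags, and self-closes void elements. It does not quote attributes, escape text, or apply browser-style nesting rules, so its output is not guaranteed to be well-formed XML for arbitrary input.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy class for illustration; not part of HtmlCleaner.
class TagBalancer {
    // Assumed subset of HTML elements that never take a closing tag.
    private static final List<String> VOID =
            Arrays.asList("br", "hr", "img", "input", "meta", "link");

    // group 1: "/" for a closing tag, group 2: tag name, group 3: raw attributes.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>");

    static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open elements
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // copy text between tags unchanged
            pos = m.end();
            String name = m.group(2).toLowerCase();
            if (!m.group(1).isEmpty()) {
                // Closing tag: ignore it if no matching element is open.
                if (!open.contains(name)) {
                    continue;
                }
                // Otherwise close every element opened after the match, then the match.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
                continue;
            }
            // Opening tag: normalize attributes, dropping any trailing "/".
            String attrs = m.group(3).trim();
            if (attrs.endsWith("/")) {
                attrs = attrs.substring(0, attrs.length() - 1).trim();
            }
            if (!attrs.isEmpty()) {
                attrs = " " + attrs;
            }
            if (VOID.contains(name)) {
                out.append('<').append(name).append(attrs).append(" />");
            } else {
                open.push(name);
                out.append('<').append(name).append(attrs).append('>');
            }
        }
        out.append(html.substring(pos)); // trailing text after the last tag
        // Close whatever is still open, innermost first.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }
}
```

With this sketch, TagBalancer.balance("<p>Hello <b>world") closes the two dangling elements innermost first, and a misnested pair such as "<b><i>x</b></i>" comes out properly nested. A real cleaner such as HtmlCleaner does considerably more, including the configurable tag rules mentioned above.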

 
Category: HTML Parsers
License: BSD License
HomePage: http://htmlcleaner.sourceforge.net/




