HTML parser release notes



  • Performance improvements



  • Introduced “at” filter support in BasicElementFilter so matching rules can easily include or discard elements based on their position inside their parent node.

  • Introduced “underHeaderAtRow” filter support in BasicElementFilter to match values at a given row under arbitrary header cells in any position of a table.

  • Added no-varargs method alternatives in BasicElementFilter, including

    • allowing users to provide a space-separated sequence of CSS class names in method classes

    • providing multiple possible attribute values of any type in method attribute.

  • Added support for filtering rows of an entity via the newly introduced interface HtmlRecordFilter.

  • Parser ignores duplicate paths that might have been accidentally defined for the same field.

  • Log messages now display group rules applied to any given matching rule

Bug fixes

  • Fixed handling of matching rules applied over a sequence of sibling elements

  • Fixed the @Validate annotation being ignored in some cases

  • Fixed the list of values matched directly from an HtmlElement returning nulls along with the desired values

  • Persistent fields now have their values cleared if they are defined from a group and that group becomes active.



  • Made license manager dialog appear automatically if a license can’t be found or if it is invalid.

  • Added support for regex validation and custom validations on class attributes and methods annotated with @Validate

Bug fixes

  • Fixed inconsistent result rows produced by link follower fields added in between the parent entity fields.

  • Fixed issues parsing stored files originating from link followers that have been configured to save them under a non-standard file location.



  • Implemented support for the file:// protocol to allow transforming locally stored pages and resources via fetchResources

  • Added support for the @Validate annotation on annotated Java beans.

  • Added support for including fields from the “parent” row in linked entity records: GitHub issue #3

Bug fixes

  • Fixed fetch resources altering CSS files already downloaded in a previous run, which could break the resource paths used in them.

  • Fixed fetch resources not creating daemon threads, which kept the main thread alive if users didn’t explicitly shut down the executor service.
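
The daemon-thread fix above relies on a standard JVM behaviour: the JVM exits once only daemon threads remain, so a pool built from daemon threads cannot keep the process alive. The following is a minimal sketch of that idea using only java.util.concurrent. The class name, the newDaemonExecutor helper and the "fetch-worker" thread names are hypothetical and are not part of the parser's API; how the parser actually builds its executor is not shown in these notes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonExecutorSketch {

    // Hypothetical helper (an assumption, not the parser's API):
    // builds a fixed pool whose worker threads are all daemon threads.
    static ExecutorService newDaemonExecutor(int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread worker = new Thread(runnable, "fetch-worker-" + counter.incrementAndGet());
            // Daemon threads do not prevent the JVM from exiting.
            worker.setDaemon(true);
            return worker;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newDaemonExecutor(2);

        // Ask a worker thread to report whether it is a daemon.
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        boolean daemon = executor.submit(probe).get();

        System.out.println("worker is daemon: " + daemon); // prints "worker is daemon: true"

        // No explicit shutdown here: main returns and the JVM exits,
        // because the only remaining threads are daemon threads.
    }
}
```

With the default thread factory the same program would keep running after main returns, until the executor is shut down explicitly. Daemon workers are stopped abruptly at JVM exit, so work that must complete should still be awaited before main returns.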


Bug fixes

  • Fixed group constants not being applied when declared last: GitHub issue #1

  • Fixed the parser preventing the JVM from shutting down unless HtmlParserSettings.getExecutorService().shutDown() was called explicitly: GitHub issue #2


  • First public release