I've just released #jsoup 1.22.2.
This release makes editing the DOM during traversal more predictable, refreshes the default HTML tag definitions with newer elements and better text boundaries, and improves reliability in parsing and HTTP transport.
I'm happy to announce the release of #jsoup 1.22.1!
This release adds support for the re2j regular expression engine in regex-based CSS selectors and a configurable maximum parser depth, along with a bunch of bug fixes and improvements.
Happy to announce that #jsoup 1.21.2 is out!
Adds custom SSLContext support in HTTP/2 connections, brings DOM/fragment parsing perf gains, and fixes some edge cases in parsing, traversal, cloning, and concurrent reads.
https://t.co/DYrd0kYaPe
Happy to announce that #jsoup v1.21.1 is out now! Lots of improvements, particularly the ability to select nodes directly (such as text and data nodes) with CSS selectors.
This release also delivers performance optimizations, new API enhancements such as flexible tag definitions via TagSet, concise CSS selectors, and parser thread-safety improvements.
Big thanks to everyone who helped out.
https://t.co/4mBgku2ZFT
Very happy to announce that I've just released jsoup 1.20.1!
Lots of improvements and bug fixes: improved HTML parse rules to align with modern browsers, improved XML namespace handling, and a redesigned HTML pretty-printer for better consistency and customizability.
@gamesover @grok Not snark; how did you know it was true? I've had a bunch of times where these sound real but have no underlying basis. It's great when they link to a source, but I find a regular search already gives me that.
The next version of #jsoup will (finally!) support making http/2 requests if you're running on Java 11+. It still works down to Java 8 if you need that.
It's a drop-in update with no changes required for existing Jsoup.connect() code, other than setting a system property (jsoup.useHttpClient) to enable it.
The implementation uses Java's multi-release JAR feature to make requests via the HttpClient implementation when it's available, and falls back to the current HttpURLConnection otherwise. This also gives a path to http/3 support when that JEP lands in Java.
https://t.co/X1BefJff3B
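As a rough illustration of the opt-in-plus-fallback idea, here is a minimal Java sketch. It is not jsoup's code: jsoup selects the transport with a multi-release JAR, whereas this sketch swaps in a simpler runtime check with Class.forName. The TransportChooser class and Transport enum are hypothetical. Only the jsoup.useHttpClient property name comes from the post, and treating the value "true" as the opt-in is an assumption.

```java
// Hypothetical sketch, not jsoup's implementation: jsoup picks its transport via a
// multi-release JAR; this swaps in a runtime class check purely for illustration.
final class TransportChooser {

    enum Transport { HTTP_CLIENT, HTTP_URL_CONNECTION }

    private TransportChooser() {}

    /** True if java.net.http.HttpClient (Java 11+) exists on the running JVM. */
    static boolean httpClientAvailable() {
        try {
            Class.forName("java.net.http.HttpClient");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * Use HttpClient (which can negotiate HTTP/2) only when the caller opted in via the
     * system property AND the class is present; otherwise fall back to HttpURLConnection.
     * Assumption: the opt-in value is "true" (Boolean.getBoolean ignores case).
     */
    static Transport choose() {
        boolean optedIn = Boolean.getBoolean("jsoup.useHttpClient");
        return optedIn && httpClientAvailable()
                ? Transport.HTTP_CLIENT
                : Transport.HTTP_URL_CONNECTION;
    }
}
```

In this sketch, with the property set to true (for example -Djsoup.useHttpClient=true), choose() returns HTTP_CLIENT on Java 11+ and HTTP_URL_CONNECTION on Java 8; without the property it always falls back.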
I've been working on improving parse throughput and reducing memory allocations in jsoup (Java HTML parser) by recycling char[] and byte[] buffers between invocations, avoiding unnecessary heap allocations and garbage collection.
As a result, heap allocations (bytes/op) are down by 6% to 89%, and throughput ranges from a 2% regression to a 143% improvement (with the biggest gains for smaller inputs).
These improvements will be in the next release of jsoup, 1.18.2 (coming soon!)
Details: https://t.co/O8AFz85lSc
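The buffer-recycling idea can be sketched in Java as a per-thread cache of scratch buffers. This is a minimal illustration under stated assumptions, not jsoup's actual code: the CharBufferRecycler class, the 8 KiB buffer size, and the ThreadLocal-plus-SoftReference design are all hypothetical choices. A byte[] version would look the same.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.lang.ref.SoftReference;

// Hypothetical sketch of buffer recycling; not jsoup's implementation.
final class CharBufferRecycler {
    static final int SIZE = 8 * 1024; // assumed scratch-buffer size

    // One cached buffer per thread; SoftReference lets the GC reclaim it under memory pressure.
    private static final ThreadLocal<SoftReference<char[]>> CACHE = new ThreadLocal<>();

    private CharBufferRecycler() {}

    /** Borrow a buffer, reusing this thread's cached one if it is still alive. Contents are stale. */
    static char[] borrow() {
        SoftReference<char[]> ref = CACHE.get();
        char[] buf = (ref != null) ? ref.get() : null;
        if (buf != null) {
            CACHE.remove(); // hand it out exclusively, so a nested borrow() gets a fresh array
            return buf;
        }
        return new char[SIZE];
    }

    /** Return a buffer so the next borrow() on this thread can reuse it instead of allocating. */
    static void release(char[] buf) {
        if (buf != null && buf.length == SIZE) {
            CACHE.set(new SoftReference<>(buf));
        }
    }

    /** Example caller: read a Reader fully, using a recycled buffer as scratch space. */
    static String readAll(Reader in) {
        char[] buf = borrow();
        try {
            StringBuilder sb = new StringBuilder();
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n); // only copy the chars written by this read
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            release(buf); // always hand the buffer back, even on failure
        }
    }
}
```

The try/finally in readAll is the key design choice: the buffer goes back to the thread's cache even if reading fails, so repeated calls on the same thread reuse one scratch char[] instead of allocating a new one each time. ThreadLocal avoids locking, at the cost of holding one cached buffer per thread.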