README.md (94 additions & 35 deletions)
@@ -16,7 +16,7 @@ NekoHTML adds missing parent elements; automatically closes elements with option
16
16
✅ **Standards Compliant** - Follows HTML parsing specifications
17
17
✅ **Well Tested** - Over 8,000 test cases
18
18
✅ **No External Dependencies** - Pure Java implementation
19
-
✅ **Java 8+ Compatible** - Works with Java 8, 11, 17, 21 and beyond
19
+
✅ **Java 17+ Compatible** - Works with Java 17, 21 and beyond
20
20
✅ **Android Support** - Runs on Android platforms
21
21
22
22
The **Htmlunit-NekoHtml** Parser is used by [HtmlUnit](https://htmlunit.sourceforge.io/).
@@ -31,25 +31,11 @@ The **Htmlunit-NekoHtml** Parser is used by [HtmlUnit](https://htmlunit.sourcefo
31
31
32
32
#### Version 5
33
33
34
-
Work on HtmlUnit-NekoHTML 5.0 has started. This new major version will require **JDK 17 or higher**.
34
+
Starting with version 5.0.0, **JDK 17 or higher is required**.
35
+
If you are still on JDK 8, see [Legacy Support (JDK 8)](#legacy-support-jdk-8) below.
35
36
36
37
37
-
#### Legacy Support (JDK 8)
38
-
39
-
If you need to continue using **JDK 8**, please note that versions 4.x will remain available as-is. However,
40
-
**ongoing maintenance and fixes for JDK 8 compatibility are only available through sponsorship**.
41
-
42
-
Maintaining separate fix versions for JDK 8 requires significant additional effort for __backporting__, testing, and release management.
43
-
44
-
**To enable continued JDK 8 support**, please contact me via email to discuss sponsorship options. Sponsorship provides:
45
-
46
-
- __Backporting__ security and bug fixes to the 4.x branch
47
-
- Maintaining compatibility with older Java versions
48
-
- Timely releases for critical issues
49
-
50
-
Without sponsorship, the 4.x branch will not receive updates. Your support ensures the long-term __sustainability__ of this project across multiple Java versions.
51
-
52
-
### Latest release Version 4.21.0 / December 28, 2025
Version 5.0.0 is a major release. The changes below cover everything you need to update when upgrading from any 4.x release.
360
+
361
+
### Java version requirement
362
+
363
+
5.x requires **JDK 17 or higher**. Java 8 and Java 11 are no longer supported.
364
+
If you cannot upgrade your JDK, stay on the [4.x branch](#legacy-support-jdk-8).
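An application that should fail fast on an unsupported runtime, rather than hit a class-version error later, could add a startup guard. The following is a minimal sketch and not part of the library: `JdkGuard` and its methods are hypothetical names, and the only thing it relies on is the standard `java.specification.version` system property.

```java
final class JdkGuard {
    /** Minimum feature release required by 5.x, as stated in this README. */
    static final int MIN_FEATURE = 17;

    /** Maps a specification version such as "1.8", "11" or "17" to its feature number. */
    static int featureVersion(String specVersion) {
        // Releases before Java 9 report "1.x"; later releases report the plain number.
        String s = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        return Integer.parseInt(s);
    }

    /** Returns true if the given specification version satisfies the 5.x requirement. */
    static boolean isSupported(String specVersion) {
        return featureVersion(specVersion) >= MIN_FEATURE;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!isSupported(spec)) {
            throw new IllegalStateException("JDK 17 or higher is required, found " + spec);
        }
        System.out.println("Running on a supported JDK: " + spec);
    }
}
```

Such a guard only helps if the class that contains it is itself compiled for the older runtime; otherwise the JVM rejects the class before the check runs.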
365
+
366
+
### HTMLElements thread-safety fix (since 4.17.0)
367
+
368
+
The shared `HTMLElements` instance no longer caches unknown elements because that cache was not thread-safe. If your code relied on the unknown-element cache, switch to `HTMLElementsWithCache` and create a new instance per parse run:
369
+
370
+
```java
371
+
// 4.x — relied on shared cache in HTMLElements (not thread-safe)
### HTMLScanner document handler is now required (since 4.12.0)
380
+
381
+
`HTMLScanner` now enforces that a document handler is set before parsing. The null check was moved to the setter, so passing `null` will throw immediately rather than failing silently mid-parse. Ensure you always call `setContentHandler` (SAXParser) or provide a handler via the `DOMParser` constructor before calling `parse()`.
382
+
383
+
### Tag name casing for auto-inserted tags (since 4.20.0 / 4.21.0)
384
+
385
+
Auto-inserted tags (e.g. `<html>`, `<head>`, `<body>`) are now consistently produced in **lowercase** by default. If your code compared tag names assuming uppercase auto-inserted tags, update those comparisons or set the `NAMES_ELEMS` property explicitly:
String name = element.getTagName().toLowerCase(Locale.ROOT);
393
+
```
394
+
395
+
### EOF handling changed from exceptions to return codes (since 4.19.0)
396
+
397
+
EOF conditions inside the scanner are now signalled via return codes rather than exceptions. This is an internal change that should not affect typical parser usage, but if you have custom `HTMLScanner` subclasses that override scanning methods and catch internal exceptions for EOF detection, review those overrides.
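To illustrate the difference between the two signalling styles, here is a self-contained sketch that does not use the library's classes. `EofSketch`, its `EOF` sentinel, and both scan methods are hypothetical stand-ins; the real scanner's method names and return values may differ.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class EofSketch {
    /** Hypothetical sentinel for "no more input"; matches Reader.read(). */
    static final int EOF = -1;

    /** Exception style: the read helper throws when the input is exhausted. */
    private static int nextOrThrow(Reader in) throws IOException {
        int c = in.read();
        if (c == EOF) {
            throw new EOFException();
        }
        return c;
    }

    /** An override written against the exception style detects EOF in a catch block. */
    static String scanWithException(String input) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new StringReader(input)) {
            while (true) {
                sb.append((char) nextOrThrow(in));
            }
        } catch (EOFException e) {
            return sb.toString(); // EOF reached
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Return-code style: the loop checks the returned value instead. */
    static String scanWithReturnCode(String input) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new StringReader(input)) {
            int c;
            while ((c = in.read()) != EOF) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

An override that still waits for an exception in the first style would never see its catch block run once the underlying method reports EOF through its return value, which is why such overrides need a review.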
398
+
399
+
### SAXParser factory added (since 4.15.0)
400
+
401
+
`NekoSAXParserFactory` was added as a standard `javax.xml.parsers.SAXParserFactory` entry point. If you were constructing `SAXParser` directly and want to align with the standard JAXP pattern going forward:
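As a rough illustration of the standard JAXP pattern, the sketch below obtains a `SAXParser` from a `SAXParserFactory` and collects element names through a `DefaultHandler`. It uses the JDK's default factory so that it runs without extra dependencies. Substituting a `NekoSAXParserFactory` instance for the `factory` argument is an assumption; its package and construction are not shown here. `JaxpSketch` and `elementNames` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class JaxpSketch {
    /** Parses the markup with a parser from the given factory and returns element names in document order. */
    static List<String> elementNames(SAXParserFactory factory, String markup) {
        List<String> names = new ArrayList<>();
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(
                new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        names.add(qName);
                    }
                });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // JDK default factory; a NekoSAXParserFactory would presumably be passed here instead.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println(elementNames(factory, "<html><body><p>hi</p></body></html>"));
    }
}
```

Because the calling code depends only on `SAXParserFactory`, the concrete factory can be swapped without touching the handler logic.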
- **`XMLLocator`** was simplified and some internal fields removed in 4.5.0. Custom subclasses that accessed internal locator fields will need to be updated.
412
+
- **`XMLAttributesImpl`** was simplified in 4.5.0; direct field access patterns in custom subclasses should be replaced with the public API.
413
+
414
+
415
+
## Migrating from 3.x to 4.x
357
416
358
417
Version 4.x introduces a major change in the handling of encodings - the mapping from the encoding
359
418
label found in the meta tag to the encoding to be used for parsing the document got some significant
@@ -369,7 +428,7 @@ For this also
369
428
encoding translator if you want to keep the old translation behavior (`parser.setProperty(HTMLScanner.ENCODING_TRANSLATOR, EncodingMap.INSTANCE)`)
370
429
371
430
372
-
## Porting from 2.x to 3.x
431
+
## Migrating from 2.x to 3.x
373
432
374
433
Usually the upgrade should be simple:
375
434
@@ -443,18 +502,18 @@ This part is intended for committer who are packaging a release.
443
502
* Create the version on GitHub
444
503
* log in to GitHub and open project https://github.com/HtmlUnit/htmlunit-neko
445
504
* click Releases > Draft new release
446
-
* fill the tag and title field with the release number (e.g. 4.0.0)
505
+
* fill in the tag and title fields with the release number (e.g. 5.0.0)
447
506
* append
448
-
* neko-htmlunit-4.xx.jar
449
-
* neko-htmlunit-4.xx.jar.asc
450
-
* neko-htmlunit-4.xx.pom
451
-
* neko-htmlunit-4.xx.pom.asc
452
-
* neko-htmlunit-4.xx-javadoc.jar
453
-
* neko-htmlunit-4.xx-javadoc.jar.asc
454
-
* neko-htmlunit-4.xx-sources.jar
455
-
* neko-htmlunit-4.xx-sources.jar.asc
456
-
* neko-htmlunit-4.xx-tests.jar
457
-
* neko-htmlunit-4.xx-tests.jar.asc
507
+
* neko-htmlunit-5.xx.jar
508
+
* neko-htmlunit-5.xx.jar.asc
509
+
* neko-htmlunit-5.xx.pom
510
+
* neko-htmlunit-5.xx.pom.asc
511
+
* neko-htmlunit-5.xx-javadoc.jar
512
+
* neko-htmlunit-5.xx-javadoc.jar.asc
513
+
* neko-htmlunit-5.xx-sources.jar
514
+
* neko-htmlunit-5.xx-sources.jar.asc
515
+
* neko-htmlunit-5.xx-tests.jar
516
+
* neko-htmlunit-5.xx-tests.jar.asc
458
517
* and publish the release
459
518
460
519
* Update the version number in pom.xml to start next snapshot development