07-20-2023, 10:27 AM
You might look into jwht-scraper!
This is a complete scraping framework that has all the features a developer could expect from a web scraper:
- Proxy support
- Warning sign support to detect captchas and more
- Complex link-following features
- Multithreading
- Various scraping delays when required
- Rotating User-Agent
- Request auto-retry and HTTP redirection support
- HTTP headers, cookies and more
- GET and POST support
- Annotation configuration
- Detailed scraping metrics
- Async handling of the scraper client
- jwht-htmltopojo, a fully featured framework to map HTML to POJOs
- Custom input format handling and built-in JSON -> POJO mapping
- Full exception handling control
- Detailed logging with log4j
- POJO injection
- Custom processing hooks
- Easy-to-use and well-documented API
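To give an idea of what features like rotating User-Agent and request auto-retry with delays involve, here is a minimal plain-Java sketch. It is not jwht-scraper's code or API: `UserAgentRotator` and `Retry.withRetries` are hypothetical names made up for illustration, and it only uses the JDK.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of jwht-scraper): cycles through User-Agent strings. */
final class UserAgentRotator {
    private final List<String> agents;
    private final AtomicInteger counter = new AtomicInteger();

    UserAgentRotator(List<String> agents) {
        if (agents.isEmpty()) {
            throw new IllegalArgumentException("need at least one User-Agent");
        }
        this.agents = List.copyOf(agents);
    }

    /** Returns the next User-Agent in round-robin order; usable from several threads. */
    String next() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), agents.size());
        return agents.get(index);
    }
}

/** Hypothetical helper (not part of jwht-scraper): retries a task with a fixed delay. */
final class Retry {
    private Retry() {}

    static <T> T withRetries(Callable<T> task, int maxAttempts, Duration delay) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay.toMillis()); // wait before the next attempt
                }
            }
        }
        throw last; // every attempt failed: rethrow the last failure
    }
}
```

You would wrap your own request in it, for example a hypothetical `fetch(url, rotator.next())` call passed to `Retry.withRetries(..., 3, Duration.ofSeconds(1))`. A real framework would typically add backoff, per-status-code rules and proxy switching on top of this.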
It works with the jwht-htmltopojo lib, which itself uses Jsoup, mentioned by several other people here.
Together they will help you build awesome scrapers that map HTML directly to POJOs and bypass the classic scraping problems in only a matter of minutes!
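For readers who have not seen annotation-driven HTML-to-POJO mapping before, here is a toy sketch of the general idea using only the JDK. Everything in it is an assumption for illustration: `@FromTag`, `Article` and `TagMapper` are hypothetical names, not the jwht-htmltopojo API, and a simple regex stands in for a real HTML parser such as Jsoup.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical annotation for this sketch only; it is NOT the jwht-htmltopojo API. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FromTag {
    String value(); // tag name whose first occurrence supplies the field's text
}

/** Example POJO: each annotated String field is filled from one HTML tag. */
class Article {
    @FromTag("h1") String title;
    @FromTag("p") String summary;
}

/** Toy mapper: reads the annotations via reflection and fills the fields. */
final class TagMapper {
    private TagMapper() {}

    static <T> T fromHtml(String html, Class<T> type) throws ReflectiveOperationException {
        T pojo = type.getDeclaredConstructor().newInstance();
        for (Field field : type.getDeclaredFields()) {
            FromTag tag = field.getAnnotation(FromTag.class);
            if (tag == null || field.getType() != String.class) {
                continue; // only annotated String fields are handled in this sketch
            }
            String name = Pattern.quote(tag.value());
            // Naive match of the first <tag ...>inner</tag>; nested markup is kept as raw text.
            Matcher m = Pattern.compile(
                    "<" + name + "\\b[^>]*>(.*?)</" + name + ">",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL).matcher(html);
            if (m.find()) {
                field.setAccessible(true);
                field.set(pojo, m.group(1).trim());
            }
        }
        return pojo;
    }
}
```

Calling `TagMapper.fromHtml(html, Article.class)` returns an `Article` whose `title` and `summary` come from the first `<h1>` and `<p>` in the page. A real library would use proper CSS selectors and type conversion instead of a regex, so treat this only as a picture of the concept.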
Hope this might help some people here!
Disclaimer: I am the one who developed it, so feel free to let me know your remarks!