Web scraping with Java

#11
You might look into jwht-scraper!

This is a complete scraping framework that has all the features a developer could expect from a web scraper:

- Proxy support
- Warning Sign Support to detect captchas and more
- Complex link following features
- Multithreading
- Various scraping delays when required
- Rotating User-Agent
- Request auto-retry and HTTP redirection support
- HTTP headers, cookies and more support
- GET and POST support
- Annotation Configuration
- Detailed Scraping Metrics
- Async handling of the scraper client
- jwht-htmltopojo fully featured framework to map HTML to POJO
- Custom Input Format handling and built-in JSON -> POJO mapping
- Full Exception Handling Control
- Detailed Logging with log4j
- POJO injection
- Custom processing hooks
- Easy to use and well documented API
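
To make two of the items above concrete (rotating User-Agent and request auto-retry), here is a rough plain-Java sketch of the ideas. It does not use jwht-scraper's API: `ScraperSketch`, `UserAgentRotator` and `retry` are hypothetical names made up for illustration, and the "request" is simulated so it runs without a network.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class ScraperSketch {

    /** Hypothetical helper: hands out User-Agent strings in round-robin order. */
    static final class UserAgentRotator {
        private final List<String> agents;
        private final AtomicInteger counter = new AtomicInteger();

        UserAgentRotator(List<String> agents) {
            if (agents.isEmpty()) {
                throw new IllegalArgumentException("at least one User-Agent is required");
            }
            this.agents = List.copyOf(agents);
        }

        String next() {
            // floorMod keeps the index valid even if the counter overflows
            return agents.get(Math.floorMod(counter.getAndIncrement(), agents.size()));
        }
    }

    /** Hypothetical helper: runs the task up to maxAttempts times, pausing between failures. */
    static <T> T retry(Callable<T> task, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // fixed scraping delay before the next try
                }
            }
        }
        throw last; // every attempt failed: rethrow the last error
    }

    public static void main(String[] args) throws Exception {
        UserAgentRotator rotator = new UserAgentRotator(List.of("AgentA/1.0", "AgentB/1.0"));
        AtomicInteger calls = new AtomicInteger();

        // Simulated request: fails twice, then succeeds on the third attempt
        String result = retry(() -> {
            String userAgent = rotator.next();
            if (calls.incrementAndGet() < 3) {
                throw new IOException("simulated failure with " + userAgent);
            }
            return "fetched with " + userAgent;
        }, 5, 10);

        System.out.println(result); // prints "fetched with AgentA/1.0"
    }
}
```

A real scraper would put the HTTP call inside the task passed to `retry` and set the chosen string as the `User-Agent` header.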

It works with the jwht-htmltopojo lib, which itself uses Jsoup, mentioned by several other people here.

Together they will help you build awesome scrapers that map HTML directly to POJOs and get around the classic scraping problems in only a matter of minutes!
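
For readers who have not seen annotation-driven HTML to POJO mapping before, here is a minimal sketch of the general idea. It is not jwht-htmltopojo's API: the `@Select` annotation, the `Product` class and the `map` method are hypothetical names for illustration only. To stay dependency-free it uses the JDK's XML parser and XPath, so it only handles well-formed markup; a library built on Jsoup would cope with real-world HTML and would typically use CSS selectors instead.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class HtmlToPojoSketch {

    /** Hypothetical annotation: XPath expression whose text value fills the field. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Select {
        String value();
    }

    /** Example POJO; field names and expressions are illustrative only. */
    static class Product {
        @Select("//h1[@class='title']") String title;
        @Select("//span[@class='price']") String price;
    }

    /** Parses well-formed markup and fills every @Select String field of a new instance. */
    static <T> T map(String xhtml, Class<T> type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        T pojo = type.getDeclaredConstructor().newInstance();
        for (Field field : type.getDeclaredFields()) {
            Select select = field.getAnnotation(Select.class);
            if (select == null) {
                continue; // unannotated fields are left untouched
            }
            field.setAccessible(true);
            // evaluate(...) returns the string value of the first matching node
            field.set(pojo, xpath.evaluate(select.value(), doc).trim());
        }
        return pojo;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<h1 class='title'>Blue Widget</h1>"
                + "<span class='price'>9.99</span>"
                + "</body></html>";
        Product product = map(page, Product.class);
        System.out.println(product.title + " costs " + product.price); // prints "Blue Widget costs 9.99"
    }
}
```

The point of the pattern is that the selectors live on the POJO itself, so the scraping code stays a single `map` call per page.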

Hope this might help some people here!

Disclaimer: I am the one who developed it. Feel free to let me know your remarks!