07-27-2023, 12:32 AM
I am trying to search Arabic PDFs in Apache Solr. The problem appears to be that Tika indexes the PDF text in reverse order (left-to-right) instead of right-to-left.
I have found several references describing this problem.
However, I don't know how to include the latest version of PDFBox or ICU4J in my Apache Solr installation. My `Apache Solr Contrib/extraction/lib` folder contains `pdfbox-1.6.0.jar` and `icu4j-4.8.1.1.jar`. Will removing those files and replacing them with the latest jars from the projects' pages be enough to make Tika use them?
Please explain, as I have no previous experience with Java servlets. Thanks!
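In case it helps anyone answering, here is a minimal sketch of how the jar versions in that folder could be checked before and after swapping files. It only parses file names of the form `name-version.jar`, which is an assumption based on the two jars named above; the function names and the path in the usage note are hypothetical, and nothing here inspects what Tika actually loads at runtime.

```python
import re
from pathlib import Path

# Assumed naming pattern: "<library>-<version>.jar", e.g. "pdfbox-1.6.0.jar".
JAR_NAME = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")


def parse_jar_name(filename):
    """Split a jar file name into (library, version), or return None."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("version")


def list_jar_versions(lib_dir, libraries=("pdfbox", "icu4j")):
    """Return {library: [versions]} for the named libraries found in lib_dir."""
    found = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        parsed = parse_jar_name(jar.name)
        if parsed and parsed[0] in libraries:
            found.setdefault(parsed[0], []).append(parsed[1])
    return found
```

For example, `list_jar_versions("contrib/extraction/lib")` (adjust the path to the actual install) would show which versions are present. If a library appears with more than one version after copying new jars in, the old jar is probably still in the folder and may conflict with the new one.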