Solr for Arabic PDFs

I am trying to search Arabic PDFs in Apache Solr. The problem appears to be that Tika indexes the PDF text in reversed order (left-to-right instead of right-to-left).
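One way to confirm that diagnosis before changing any libraries is to check whether a word known to be in the PDF shows up in the extracted text only with its characters reversed. The sketch below is a rough illustration, not part of Tika or Solr: `reversed_terms` is a hypothetical helper, and it assumes the extracted text is already available as a plain string and that the reversal is a simple character-order flip (it ignores presentation-form glyphs and mixed-direction runs).

```python
def reversed_terms(extracted_text, expected_terms):
    """Return the expected terms that occur only in reversed character order.

    Hypothetical helper: a term is reported when it is absent from the
    extracted text but its character-reversed form is present.
    """
    hits = []
    for term in expected_terms:
        backwards = term[::-1]
        if term not in extracted_text and backwards in extracted_text:
            hits.append(term)
    return hits


if __name__ == "__main__":
    # Simulate an extractor that emits the line in reversed character order.
    logical = "كتاب جديد"
    simulated_extraction = logical[::-1]
    print(reversed_terms(simulated_extraction, ["كتاب", "جديد"]))
```

If the helper reports the words you searched for, the index likely holds reversed text, which would explain why queries typed in normal order find nothing.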

I have found other reports of this problem, but I don't know how to include the latest version of PDFBox or ICU4J in my Apache Solr installation. My `Apache Solr Contrib/extraction/lib` folder contains `pdfbox-1.6.0.jar` and `icu4j-4.8.1.1.jar`. Would removing these files and replacing them with the latest libraries from their project pages be enough to make Tika use them?
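If the jars are swapped by hand, it may help to verify afterwards that only one version of each library is left in the folder, since an old and a new copy sitting side by side can make it unclear which one gets loaded. The sketch below only inspects file names. `jar_versions` and `duplicated` are hypothetical helpers, the `artifact-version.jar` naming pattern is an assumption based on the two file names quoted above, and nothing here checks whether a newer PDFBox or ICU4J is actually compatible with the bundled Tika.

```python
import re
from pathlib import Path

# Matches names such as "pdfbox-1.6.0.jar" -> ("pdfbox", "1.6.0").
# Assumes the usual "artifact-version.jar" naming convention.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def jar_versions(lib_dir, artifacts=("pdfbox", "icu4j")):
    """Map each artifact of interest to the versions found in lib_dir."""
    found = {name: [] for name in artifacts}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        match = JAR_NAME.match(jar.name)
        if match and match.group("artifact") in found:
            found[match.group("artifact")].append(match.group("version"))
    return found


def duplicated(found):
    """Return only the artifacts that are present in more than one version."""
    return {name: versions for name, versions in found.items() if len(versions) > 1}


if __name__ == "__main__":
    # The path is an assumption; point it at your own extraction lib folder.
    versions = jar_versions("contrib/extraction/lib")
    print(versions)
    print("duplicates:", duplicated(versions))
```

An empty `duplicates` result means each library appears at most once; it says nothing about whether Tika works with the version that remains.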

Please explain, as I have no previous experience with Java servlets. Thanks!



