Topic: How do I enable full text searching of documents?
This guide walks through turning on the ability to pull a document's text into its description field, making it full text searchable.
To enable the ability to pull the text (no images or tables) from an uploaded PDF, Microsoft Word, Plain Text, or HTML document, you must install some supporting software and then enable it for the site:
- make sure you have installed the wv, poppler-utils and lynx packages, as mentioned in the installation guide (this assumes Debian Lenny; check the other "Required Software" guides for other platforms)
- log in as the site admin with the tech admin role (the default admin account should do the trick)
- click on "reconfigure site" in the "Administrator's Toolbox" towards the bottom of any page
- click "Advanced Options"
- click "Documents"
- change the "Enable Converting Documents:" field to true and click "Save"
- restart your server (using the button on the resulting page, or, for earlier versions, from the command line)
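As a quick check for the first step, here is a minimal shell sketch that reports which converter programs are missing. It assumes the Debian packages wv, poppler-utils and lynx provide the executables `wvWare`, `pdftotext` and `lynx`; the function name `check_converters` is made up for illustration. Run it as the account Kete runs under, since that is the account that needs to find the tools.

```shell
#!/bin/sh
# Sketch: report which command-line converters are missing from PATH.
# ASSUMPTION: on Debian the packages wv, poppler-utils and lynx provide
# the executables wvWare, pdftotext and lynx; pass those names as arguments.
# Install the packages first with: apt-get install wv poppler-utils lynx

check_converters() {
  missing=""
  for cmd in "$@"; do
    # command -v succeeds only if the name resolves to something runnable
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="$missing $cmd"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all converters found"
  return 0
}

# Example: check_converters wvWare pdftotext lynx
```

For example, `check_converters wvWare pdftotext lynx` should print `all converters found` once all three packages are installed and on that account's PATH.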
After that, each document detail page will have a "Replace description with document text" link that pulls the document's text into the description, making it full text searchable.
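Conceptually, the link has to pick a text extractor based on the document's type. The sketch below illustrates that idea for the four types listed above; it is not Kete's actual code, and both the MIME-type-to-tool mapping and the helper names `converter_for` and `extract_text` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: choose a text-extraction tool from a document's MIME type.
# ASSUMPTION: illustrative only, not Kete's own implementation; the tools
# are the ones supplied by the wv, poppler-utils and lynx packages.

converter_for() {
  case "$1" in
    application/pdf)    echo pdftotext ;;
    application/msword) echo wvWare ;;
    text/html)          echo lynx ;;
    text/plain)         echo cat ;;
    *)                  echo unsupported; return 1 ;;
  esac
}

# Print the plain text of a file on stdout: extract_text FILE MIME-TYPE
extract_text() {
  file="$1"; mime="$2"
  case "$(converter_for "$mime")" in
    pdftotext) pdftotext "$file" - ;;   # "-" sends the text to stdout
    # wvWare emits HTML; lynx -dump renders it as plain text
    wvWare)    wvWare -c utf-8 --nographics -X "$file" | lynx -dump -force_html -stdin ;;
    lynx)      lynx -dump -force_html "$file" ;;
    cat)       cat "$file" ;;
    *)         echo "no converter for $mime" >&2; return 1 ;;
  esac
}
```

Running `extract_text report.pdf application/pdf` by hand (with a hypothetical `report.pdf`) is a quick way to see whether a given file yields any text at all outside Kete.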
Discuss This Topic
Massoud,
The only time I've seen this error is when the dependencies to convert documents are not installed.
Please ensure the following are installed on the system and accessible by the account you are running Kete under:
- wv
- poppler-utils
- lynx
Regards
Kieran
Hello,
All of these packages are installed and up to date.
> Hello,
>
> All of these packages are installed and up to date.
Does it work now or do you mean that they were installed successfully before and the error remains?
Have you tried any other types of uploaded documents? Is there anything in the log? Does the following work from the server's command line (substitute the actual path to the file for "filename")?
wvWare -c utf-8 --nographics -X filename
If this doesn't give you the document back formatted as HTML, then something is wrong with that command (not Kete). If it does work, we have ruled that out as the cause, and it is more likely Kete that is at fault.
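The same check can be wrapped in a small script that gives a clear yes/no answer. This is a sketch: `check_word_conversion` and `looks_like_html` are hypothetical helper names, and treating the presence of an `<html` tag as success is an assumption about what `wvWare -X` prints. Run it as the account Kete runs under so that PATH and file permissions match what Kete sees.

```shell
#!/bin/sh
# Sketch: run the wvWare test above and report a verdict.
# ASSUMPTION: successful wvWare -X output contains an <html tag.

# Succeeds if stdin contains an <html tag (case-insensitive).
looks_like_html() {
  grep -qi '<html'
}

check_word_conversion() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file"
    return 2
  fi
  # Same command as above; stderr is discarded, only stdout is inspected.
  if wvWare -c utf-8 --nographics -X "$file" 2>/dev/null | looks_like_html; then
    echo "wvWare produced HTML"
  else
    echo "wvWare did not produce HTML"
    return 1
  fi
}
```

A "cannot read" result for a file that exists points at permissions rather than at wvWare itself.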
Yes, this command ran successfully and the conversion to HTML worked.
I tried uploading PDF files containing only English text and got the same error message when clicking:
Document type: application/pdf ( Replace description with document text )
Then I followed the advice in the message ("Please edit the description manually") and pasted the text into the description field by hand, once when adding an Arabic topic (http://kete.maktabat-online.com/site/topics/show/35) and once when adding an English document (http://kete.maktabat-online.com/site/documents/show/6). Both items were added successfully.
Here is what I found: in both cases the text was never indexed. When I search for words from either item, I always get 'No results found. ... ', no matter which field I search in.
This leads me to think that either certain indexing configuration parameters need to be set, or there is a bug in our Kete 1.3 installation.
Thanks, Massoud.
Hey Ahmed & Massoud,
Given the permission problems you had with saving the translation in another topic, I suspect the same thing is showing up here: the process trying to read the document may not have permission to do so. Try this:
$ chmod 777 public/documents
Then try converting the document again. If that still doesn't solve the problem, please email a document that has this issue to kieran [at] katipo [dot] co [dot] nz and I'll see if I can reproduce it.
As for the indexing, try rebuilding your site's Zebra database from the admin toolbox. If that doesn't work, send the document you copied and pasted to the email address above, and I'll try it on my local copy.
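The permission theory can also be tested directly. The sketch below (the function name `unreadable_files` is hypothetical) lists every file or directory under a path that the current user cannot read, plus any directory it cannot enter. It relies on GNU find's `-readable` and `-executable` tests, which are available on Debian. Run it as the account Kete runs under; empty output means that account can read everything there.

```shell
#!/bin/sh
# Sketch: list entries under a directory that the current user cannot read,
# and directories the user cannot traverse. Requires GNU find.

unreadable_files() {
  find "$1" \( ! -readable -o \( -type d ! -executable \) \) -print
}

# Example (from the Kete application directory):
#   unreadable_files public/documents
```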
Regards
Kieran
Hello Kieran,
I created a new .txt file containing only three lines, and the error message is the same.
Hello Kieran,
I will send sample documents (some in English only, some in Arabic only, and some with mixed text) to your (kieran at katipo) email address to try.
The error message still appears.
Thanks, Massoud
KwareTech said:
Not able to replace the description field with document text in Kete 1.3
While testing Kete 1.3, we hit this problem after uploading a .doc Microsoft Word document:
There were problems converting the text of the uploaded document to the document's description. Please edit the description manually.
The .doc file uploaded successfully; it is a Word 2003 document containing only English text. We first saw the message when we used the "Replace description with document text" link on Arabic documents and assumed the problem was related to the Arabic text, until we realized the error also appears with English text. I assume this means there is a bug in this function in Kete 1.3.
Thanks, Massoud.