Topic: DigitalNZ and other search sources in Kete 1.3
Kete 1.3 introduces a new search sources feature. This article explains how to work out the base URL and "more" link URL for DigitalNZ search sources, and how to build those URLs so that they include or exclude certain content.
It does not explain how to set up search sources in Kete itself. For that, please refer to [topic coming soon].
Note: though this topic is specific to DigitalNZ, the general techniques of discovering a particular search source's URLs and their format can be used as a rough guide for other websites that provide search services. If you use search on a site and you see an RSS icon, chances are good you can use it as a Kete search source.
What is DigitalNZ?
DigitalNZ "aims to make New Zealand digital content easy to find, share and use" (taken from DigitalNZ's about page). It's essentially a nationwide search engine for New Zealand-made content. It gathers information from various providers around New Zealand into one place. Kete can then hook into this information by creating searches and displaying the results alongside its own information. There is an example of this functionality on this website. If you type a search term into the top right search box on this page, you'll see possibly related results from DigitalNZ's feed.
Where do I find DigitalNZ?
http://digitalnz.org/ and http://search.digitalnz.org/
When you search for something, you'll see on that search page the following things:
- To the left, results are broken into categories. Clicking one will drill down into results within that category.
- In the middle, you of course have the results. Each result has a title (what the result is called), a provider (where the content of this result originates from), a truncated description of the result, and then a link to the content page.
- To the right, you have a bunch of filters that can refine results. In this topic, we'll only focus on the Content Provider and Website filters.
Acquiring a DigitalNZ API Key
DigitalNZ search is open to everyone, but the API (which Kete relies on) is protected by an API key. In order to use the service, you'll need to sign up for a key. It's free to do so and only takes a few minutes. You can use the same key for multiple search sources on the same site, but it's advised that you sign up for a new API key per site (under the email address that users of the site would contact, like support@yoursite.co.nz).
Wherever you see the text [api_key] in this article, replace it with the key you just got.
Setting up the search source
For the impatient: if you simply want to search DNZ for "every other content provider but me", here's the pattern of the URL -
"http://api.digitalnz.org/v3/records.rss?api_key=[api_key]&text=NOT content_partner:"your content provider name" AND"
Note: If you get confused or are in a rush, skip ahead to the last section, where I show how to use DigitalNZ's hosted searches feature.
The first thing you need to do is find the results you want in a regular search. In this example, I want "All images from the Auckland City Libraries Heritage Images collection". By using the DigitalNZ search, I drill down into the Images category, the Auckland City Libraries content partner, and finally into the Heritage Images collection, and get the following resulting URL.
An important note here: You don't have to use all or any filters if you don't want to. In this example, I'm using a category and 2 filters, but it would easily work without a category and 1 filter, or with just a category. You just need to find a result set you're happy with and then continue on from here.
Note: if you simply want to use all of DigitalNZ's results that match your search term with no other criteria, there is a quick automatic set up included in the Kete search sources administration page.
The next step is to convert that URL into one that the search sources can use. For this, we'll use the format of DigitalNZ's RSS feed. We'll start by extracting the category, content_partner, and collection attributes from the URL. We get:
- category:"Images"
- content_partner:"Auckland City Libraries"
- collection:"Auckland City Libraries Heritage Images Collection"
You'll note that we have stripped out the plus symbols (+) from the original values, because the RSS feed does not need them once the values are wrapped in quotes. Then apply these values to the RSS feed URL. Do this by joining the values above with the word AND, and putting the result in the text value like so:
http://api.digitalnz.org/v3/records.rss?api_key=[api_key]&text=category:"Images" AND collection:"Auckland City Libraries Heritage Images Collection" AND content_partner:"Auckland City Libraries" AND sun (note: this link won't work as written because it requires a personal API key, which I can't post here. You can get your own for free; see the previous section.)
Comparing the results URL with the RSS feed one, we see that all our category, collection and content provider settings are wrapped within the text value. The values are wrapped in double quotes, and they are separated by the word AND. The final word on the end is the search term and matches what we used earlier.
Now when you create your search source (which is covered by another topic), the base URL will be the one above (the RSS feed URL) and the more link will point to the search results URL we formed earlier.
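As a rough illustration of the conversion described above, here is a minimal Python sketch. It assumes the search page URL carries its filters as filter[name]=value pairs and the search term as search_text, in the style of the search URLs shown in this article; the filter[content_partner] key in the usage example and the function name search_url_to_rss are my own illustrative choices, not something DigitalNZ or Kete provides.

```python
from urllib.parse import urlsplit, parse_qsl

# RSS endpoint as quoted in this article.
RSS_ENDPOINT = "http://api.digitalnz.org/v3/records.rss"


def search_url_to_rss(search_url, api_key="[api_key]"):
    """Rebuild a search page URL as an RSS feed URL (hypothetical helper).

    Assumes filters look like filter[name]=value and the search term is
    carried in search_text.
    """
    clauses = []
    term = ""
    # parse_qsl also decodes '+' into spaces, which is the
    # "strip out the plus symbols" step.
    for key, value in parse_qsl(urlsplit(search_url).query):
        if key.startswith("filter[") and key.endswith("]"):
            name = key[len("filter["):-1]
            clauses.append(f'{name}:"{value}"')  # wrap each value in double quotes
        elif key == "search_text":
            term = value
    if term:
        clauses.append(term)  # the search term goes last
    return f"{RSS_ENDPOINT}?api_key={api_key}&text={' AND '.join(clauses)}"


example = (
    "http://search.digitalnz.org/en/search"
    "?filter[category]=Images"
    "&filter[content_partner]=Auckland+City+Libraries"
    "&search_text=sun"
)
print(search_url_to_rss(example))
```

The output keeps the [api_key] placeholder unless you pass your own key, so it can be pasted into notes without leaking a real key.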
Limiting DigitalNZ Results
Limiting results can greatly speed up search source retrieval. If you only need 5 records, there is no reason for DigitalNZ to fetch 100, so let it know. When you set up your search source, enter the number of results you want into the limit field, and then select num_results from the limit param setting. The search source will take care of the rest. It will send that value to DigitalNZ each time it gets results, which should reduce how long it takes to get a response back.
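To make the effect of the limit setting a little more concrete, here is a small Python sketch of what adding a limit parameter to a feed URL could look like. How Kete actually assembles the request is not covered in this article, so treat the helper name with_limit and the exact position of the parameter as assumptions; only the parameter name num_results comes from the text above.

```python
def with_limit(url, limit, limit_param="num_results"):
    """Append a limit parameter to a feed URL (illustrative only)."""
    # Use '&' if the URL already has a query string, otherwise start one.
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{limit_param}={int(limit)}"


feed = "http://api.digitalnz.org/v3/records.rss?api_key=[api_key]&text=sun"
print(with_limit(feed, 5))
```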
Fine-tuning results using AND, NOT, OR, and * operators
If you're getting too many results, or not enough of them, you can fine-tune your results by placing the operators AND, NOT, OR, and * in your search terms. I'll explain each one below, with an example and what it would find.
I've also ranked them in order of strictness, from most strict to least strict. The stricter the search, the more likely the results are to be what you want, but the greater the risk of finding no results at all. The less strict the search, the more likely you are to get heaps of records you aren't looking for. Experiment to find some middle ground.
Furthermore, DigitalNZ makes some assumptions about your search terms in case operators are not present. The default link between search terms is the AND operator, so a search term of "this that" will act the same as "this AND that". It also allows any characters before or after each of your terms, so the full behind-the-scenes query is "*this* AND *that*".
- AND - Requires that the content before it and the content after it both be in the result.
e.g. "this AND that" would find "that it should work this way" but not "that does it"
- NOT - Requires that the content after it not be in the result.
e.g. "this NOT that" would find "this works" but not "this and that does not work"
- OR - Requires that either the content before or the content after or both be in the result.
e.g. "this OR that" would find "that and that" and "has this only"
- * - Matches any letters in place of it till the next space character, or fills gaps when placed between words. Can be used multiple times in the same word.
e.g. "he*" would find "hello" but not "there"
e.g. "*he" would find "the" but not "there"
e.g. "*he*" would find "hey", "hello", "there" (and any word with "he" at the start, in between, or at the end of the word)
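If it helps to see the four operators in action, here is a small Python sketch that mimics them on plain strings. It is only an analogy: it uses Python's own and/or/not plus shell-style wildcards from the standard fnmatch module, matches whole words exactly (so it ignores the implicit wildcards mentioned earlier), and says nothing about how DigitalNZ's search engine is actually implemented. The helper name has is made up for this example.

```python
from fnmatch import fnmatchcase


def has(term, text):
    """True if any whole word in `text` matches `term` ('*' is a wildcard)."""
    return any(fnmatchcase(word, term.lower()) for word in text.lower().split())


def this_and_that(text):
    return has("this", text) and has("that", text)


def this_not_that(text):
    return has("this", text) and not has("that", text)


def this_or_that(text):
    return has("this", text) or has("that", text)


# AND: both words must be present.
print(this_and_that("that it should work this way"), this_and_that("that does it"))
# NOT: the word after NOT must be absent.
print(this_not_that("this works"), this_not_that("this and that does not work"))
# OR: either word is enough.
print(this_or_that("that and that"), this_or_that("has this only"))
# Wildcards: prefix, suffix, and "anywhere in the word".
print(has("he*", "hello"), has("he*", "there"))
print(has("*he", "the"), has("*he", "there"))
print(has("*he*", "hey"), has("*he*", "hello"), has("*he*", "there"))
```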
So how does this apply to DigitalNZ? They support the above four operators on both the search results and RSS feeds. So again, let's create a scenario. We want "all results about the bright sun". We start with the basics and watch as the query string expands. I'll touch on one last concept (using parentheses) afterward.
bright+sun - Start with something basic
bright+AND+sun* - Catch sunlight and sunshine as well.
bright+AND+sun*+NOT+flower - Oops, don't want sunflowers.
bright+AND+sun*+NOT+(flower+OR+autumn+OR+winter+OR+spring) - Only have summer images
Final Result (images category only): http://search.digitalnz.org/en/search?filter[category]=Images&search_text=bright+AND+sun*+NOT+(flower+OR+autumn+OR+winter+OR+spring)
You'll see from above that I've used parentheses in the search term. The operators work on the terms directly before and after them. So if I didn't wrap (flower+OR+autumn+OR+winter+OR+spring) in parentheses, then instead of acting as "get all images with bright and sun* in them, but not any with flower, autumn, winter, or spring", it would execute as "get all images with bright and sun* or any with autumn, winter, or spring, but none with flower". There is a big difference there. Depending on where parentheses are missed or misused, you could get no results, very few, or far more than you were hoping for.
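The difference parentheses make can be tried out with ordinary Python booleans. This is an analogy only: Python binds not tightest, then and, then or, which happens to give the same grouping as the behaviour described above, but I'm not claiming DigitalNZ's parser follows exactly the same rules. The helper has and the sample captions are invented for this example.

```python
from fnmatch import fnmatchcase


def has(term, text):
    """True if any whole word in `text` matches `term` ('*' is a wildcard)."""
    return any(fnmatchcase(word, term) for word in text.lower().split())


def with_parens(text):
    # bright AND sun* NOT (flower OR autumn OR winter OR spring)
    return has("bright", text) and has("sun*", text) and not (
        has("flower", text) or has("autumn", text)
        or has("winter", text) or has("spring", text)
    )


def without_parens(text):
    # bright AND sun* NOT flower OR autumn OR winter OR spring
    # Without grouping, each OR term is enough to match on its own.
    return (
        has("bright", text) and has("sun*", text) and not has("flower", text)
        or has("autumn", text) or has("winter", text) or has("spring", text)
    )


for caption in ["bright summer sunshine", "bright autumn sunshine", "grey winter morning"]:
    print(caption, "|", with_parens(caption), "|", without_parens(caption))
```

Only the first caption passes the parenthesised version, while all three pass the version without parentheses, which is the "far too many results" case.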
As with boolean operators, experiment for a while to find the middle ground (enough valid results to be acceptable).
To turn this into a search source, you simply follow the same procedure as in the previous section. Thus the search for summer images above would be formed as:
Confused?
The above is a lot to take in. Thankfully, DigitalNZ provides a feature called hosted searches.
http://digitalnz.co.nz/customise/create
Select the categories you want, select the keywords, select the content providers, the date range, search name, description, optional logo, and a URL. Hit save and you are done. It takes all the work above and does it automatically for you. What's more, it supports multiple content_provider or website filters in one search source. The instructions under the previous heading allow for this too, but it involves even more complicated logic with parentheses.
However, hosted searches, at this time, require a set keyword selection, and while search works on the web interface, the RSS feed does not yet accept the search_text parameter. That is why this method was not used as the primary way to generate the needed URLs.
In the future, if this service should support search_text on hosted search RSS feeds, then it would make things much easier. Until then, you'll need to follow the instructions above.
Good news though. While this service won't work with search sources yet, you can use it to display content via the feed option on Kete's basket homepages. See 'Homepage Options' under the basket settings on your Kete installation.
Conclusion
That concludes this topic. If DigitalNZ should change its hosted searches functionality (which is likely considering it's fairly new and in alpha) to a point that makes it usable with search sources, then I will edit this topic to take advantage of that. Till then, feel free to ask any questions in the comments or post to the DigitalNZ discussion list.
Walter McGinnis
said Pulling in results from YouTube
Here's a tip about using YouTube as an external search source: make sure to use the "api" version of the search URL.
What to use:
http://gdata.youtube.com/feeds/api/videos?v=2&q=
Notice the "feeds/api/" bit.
Here's what NOT to use:
http://gdata.youtube.com/feeds/base/videos?v=2&q=
Notice the "feeds/base/" bit.
Using the API version will allow Kete to get the thumbnails for the videos.
Tags: YouTube, External Search Sources, thumbnails