2. Guide to Searching

Becky Bell

WALDO

Nicole C. Engard

Fixed typos, changed content where necessary and added new screenshots. 

October 2008

This brief guide explains a chart that shows a sample of how a MARC21 database can be configured, and provides a brief introduction to searching. The indexing fields described in this document relate to bibliographic data; authority database indexing is not addressed.

2.1. Indexing and Searching Description

Koha's databases are indexed by the Zebra open-source software. The overview to the documentation describes Zebra as:

"...Zebra is a high-performance, general-purpose structured text indexing and retrieval engine. It reads records in a variety of input formats (eg. email, XML, MARC) and provides access to them through a powerful combination of Boolean search expressions and relevance-ranked free-text queries.

Zebra supports large databases (tens of millions of records, tens of gigabytes of data). It allows safe, incremental database updates on live systems. Because Zebra supports the industry-standard information retrieval protocol, Z39.50, you can search Zebra databases using an enormous variety of programs and toolkits, both commercial and free, which understands this protocol..." Zebra - User's Guide and Reference, p. 1, http://www.indexdata.dk/zebra/doc/zebra.pdf

Note

The indexing described in this document is the set used by SouthEastern University. Your local indexing may vary.

2.2. Indexing Configuration

There are three configuration files that Koha uses while indexing.

The first configuration file (etc/zebradb/biblios/etc/bib1.att) contains the Z39.50 bib-1 attribute list, plus the Koha local-use attributes for Biblio Indexes, Items Index, and Fixed Fields and other special indexes. The Z39.50 Bib-1 profile is made up of six types of attributes: Use, Relation, Position, Structure, Truncation, and Completeness. The bib-1 'Use' attribute is represented on the chart; the other attributes are used primarily when doing searches. While there are more than 150 Use attributes that could be used to define your indexing set, it's unlikely that you will choose to use them all. The attributes you elect to use become the indexing rules for your database. The other five attribute types define the various ways that a search can be further refined, and are not specifically addressed in this document. For a complete list of the standard Bib-1 attributes, go to http://www.loc.gov/z3950/agency/defns/bib1.html.

The second file (etc/zebradb/marc_defs/[marc21|unimarc]/biblios/record.abs) contains the abstract syntax, which maps the MARC21 tags to the set of Use attributes you choose to use. The rules established in this file provide a passable Bath level 0 and 1 service, which includes author, title, subject, keyword, and exact services such as standard identifiers (LCCN, ISBN, ISSN, etc.).

The third file (etc/zebradb/ccl.properties) contains the Common Command Language (CCL) field mappings. This file combines the bib-1 attribute set file and the abstract file and adds the qualifiers, usually known as index names. The qualifiers, or indexes, for this database are: pn, cpn, cfn, ti, se, ut, nb, ns, sn, lcn, callnum, su, su-to, su-geo, su-ut, yr, pubdate, acqdate, ln, pl, ab, nt, rtype, mc-rtype, mus, au, su-na, kw, pb, ctype, and an.

The Koha Indexing Chart summarizes the contents of all three of these files in a more readable format. The first two columns, labeled Z39.50 attribute and Z39.50 name, match the Z39.50 bib-1 attributes file. The third column, labeled MARC tags indexed, shows which MARC tags are mapped to an attribute. The fourth column, labeled Qualifiers, identifies the search abbreviations used in the internal CCL query. The following description defines the term 'qualifiers'.

Qualifiers are used to direct the search to a particular searchable index, such as the title (ti) and author (au) indexes. The CCL standard itself doesn't specify a particular set of qualifiers, but it does suggest a few shorthand notations. You can customize the CCL parser to support a particular set of qualifiers to reflect the current target profile. Traditionally, a qualifier would map to a particular Use attribute within the BIB-1 attribute set. It is also possible to set other attributes, such as the Structure attribute.
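As a rough illustration of how a qualifier can stand for one or more Bib-1 attributes, the sketch below models each qualifier as a list of (type, value) pairs. The type numbers 1 to 6 and the Use values 4 (Title), 1003 (Author), and 1016 (Any) come from the standard Bib-1 list; the QUALIFIERS table itself, including the "ti-phrase" entry, is a hypothetical example and is not read from Koha's ccl.properties.

```python
from enum import IntEnum


class Bib1Type(IntEnum):
    """The six Bib-1 attribute types and their Z39.50 type numbers."""
    USE = 1
    RELATION = 2
    POSITION = 3
    STRUCTURE = 4
    TRUNCATION = 5
    COMPLETENESS = 6


# Hypothetical qualifier definitions for illustration only.
# Use values: 4 = Title, 1003 = Author, 1016 = Any.
# Structure value 1 means "phrase" in Bib-1.
QUALIFIERS = {
    "ti": [(Bib1Type.USE, 4)],
    "au": [(Bib1Type.USE, 1003)],
    "kw": [(Bib1Type.USE, 1016)],
    "ti-phrase": [(Bib1Type.USE, 4), (Bib1Type.STRUCTURE, 1)],
}


def describe(qualifier: str) -> str:
    """Render a qualifier's attributes as space-separated 'type=value' pairs."""
    pairs = QUALIFIERS[qualifier]
    return " ".join(f"{int(attr_type)}={value}" for attr_type, value in pairs)
```

Here describe("ti") yields a single Use attribute, while describe("ti-phrase") adds a Structure attribute to the same Use attribute, which is the kind of combination the paragraph above refers to.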

In the MARC tags indexed column, there are some conventions used that have specific meanings. They are:

  • A three digit tag (100) means that all subfields in the tag can be used in a search query. So, if you enter a search for 'Jackson' as an author, you will retrieve records where Jackson could be the last name or the first name.

  • A three digit tag that has a '$' followed by a letter (600$a) means that a search query will only search the 'a' subfield.

  • A three digit tag that is followed by a ':' and a letter (240:w) means that a search query can be further qualified. The letter following the ':' identifies how to conduct the search. The most common values you'll see are 'w' (word), 'p' (phrase), 's' (sort), and 'n' (numeric).
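The three conventions above can be read mechanically. The following is a minimal sketch, assuming chart entries take exactly the forms shown (a three-digit tag, an optional '$' subfield, an optional ':' search type); parse_tag_spec and TagSpec are hypothetical names, not part of Koha.

```python
import re
from typing import NamedTuple, Optional

# Search-type letters named in the text above.
SEARCH_TYPES = {"w": "word", "p": "phrase", "s": "sort", "n": "numeric"}

_SPEC = re.compile(
    r"^(?P<tag>\d{3})(?:\$(?P<subfield>[a-z0-9]))?(?::(?P<type>[a-z]))?$"
)


class TagSpec(NamedTuple):
    tag: str
    subfield: Optional[str]     # None means all subfields are searched
    search_type: Optional[str]  # None means no ':' suffix was given


def parse_tag_spec(spec: str) -> TagSpec:
    """Split a chart entry such as '100', '600$a', or '240:w' into its parts."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized chart entry: {spec!r}")
    letter = match.group("type")
    # Letters not listed in SEARCH_TYPES are passed through unchanged.
    search_type = SEARCH_TYPES.get(letter, letter) if letter else None
    return TagSpec(match.group("tag"), match.group("subfield"), search_type)
```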

The contents of the MARC tags, subfields, and/or fixed field elements that are listed in this chart are all indexed. You'll see that not every attribute line is mapped to a specific qualifier (index); LC card number, line 9, is one example. However, every indexed word (a string of characters preceded and followed by a space) can be searched using a keyword (kw) search. So, although an index specific to the LC card number doesn't exist, you can still search by LCCN, since tag 010 is assigned to the LC-card-number attribute. To verify this, enter 72180055 in the persistent search box. You should retrieve The gods themselves, by Isaac Asimov.

Examples of fixed field elements indexing can be seen on the chart between Attribute 8822 and Attribute 8703. These attributes are most commonly used for limiting. The fixed field attributes currently represent the BK codes. Other format codes, if needed, could be defined.

2.3. Basic Searching

The search box that library staff and library patrons will see most often is the persistent search box at the top of the page. Koha interprets the searches as keyword searches.

To start a search, enter one or more words in the search box. When a single word is entered, a keyword search is performed. You can check this by typing one word into the form and noting the number of results. Then repeat the search with a minor change: in front of the search word, type 'kw=' followed by the same search term. The results will be identical.

When you have more than one word in the search box, Koha will still do a keyword search, but a bit differently. Each word will be searched on its own, then the Boolean connector 'and' will narrow your search to those items with all words contained in matching records.
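The behavior described above can be pictured with a toy keyword index. This is a conceptual sketch only, not how Zebra is implemented; the RECORDS data and both function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: record id -> indexed text.
RECORDS = {
    1: "Library mashups exploring new ways to deliver library data",
    2: "Mashups for the web",
    3: "Public library administration",
}


def build_keyword_index(records):
    """Map each lowercased word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def keyword_search(index, query):
    """Look up each word on its own, then AND the result sets together."""
    words = query.lower().split()
    if not words:
        return set()
    result_sets = [index.get(word, set()) for word in words]
    return set.intersection(*result_sets)
```

Because the final step is a set intersection, the order of the words has no effect on the result, and each extra word can only keep the result set the same size or shrink it.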

Suppose you want to find material about how libraries are using mashups. You'll select the major words and enter them into the persistent search box.

The response to this search is:

The order of the words does not affect the retrieval results, so you could also enter the search as "mashups library".

Too many words in the search box will find very few matches, as the following example illustrates:

2.4. Advanced Searching

When you can't find the most appropriate material with a general search, you can move to the Advanced Search page by clicking on the Search option on the persistent toolbar.

The Advanced Search page offers many ways to limit the results of your search. You can search using the Boolean operators AND, OR, and NOT; limit by item type; limit by year and language; limit by the subtypes audience, content, format, or additional content types; and limit by location and by availability.

The first limiting section on the Advanced Search page provides a quick and simple way to use the Boolean operators in your search. Note that this display depends on a system preference setting. This option can be found on the Administration > System Preferences > Searching page. The option called expandedSearchOption must be set to 'show' to see the following display.

In this section you can choose among the many indexes by clicking on the arrow in the first box. The blank box that follows is where you enter your first search term or terms. On the second line, you can choose the Boolean operator you want to use in your search. The options are 'and', 'or', and 'not'. Then, you would again choose the index to search, followed by the second term or terms. If you have more concepts you want to include in your search, you can click the [+] to add another line for your search.

A sample search is shown next, followed by its results:

When you use the Boolean operators to broaden or narrow a search, remember the action of each operator. The 'and' operator narrows the results because the search retrieves only the records that include all of your search terms. The 'or' operator expands the results because the search retrieves records that include any of your search terms. The 'not' operator excludes records that contain the term following the operator.
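One way to keep the three operators straight is to think of each search term as producing a set of record ids. The sketch below uses made-up result sets; combine is a hypothetical helper, not a Koha function.

```python
def combine(left: set, operator: str, right: set) -> set:
    """Combine two sets of record ids the way each Boolean operator does."""
    if operator == "and":
        return left & right  # records matching both terms
    if operator == "or":
        return left | right  # records matching either term
    if operator == "not":
        return left - right  # records matching the first term but not the second
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical result sets for two search terms.
poetry = {1, 2, 3, 4}
shakespeare = {3, 4, 5}
```

With these sets, 'and' returns the two shared records, 'or' returns all five, and 'not' returns the two poetry records that do not also match the second term.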

Note: If you leave this expandedSearchOption set to 'don't show', this is the display you will see:

The Advanced Search page then shows the multiple kinds of limits that can be applied to your search results. Either check a box or select from the drop-down menus to narrow your search. For the date limit, type a year, a year range, or a 'greater than (>)' or 'less than (<)' year.

Finally, you can choose how the results will be sorted. The pre-defined sort options are in the final area of the Advanced Search screen.

The default sort is by relevance, although you can choose to sort by author, by title, by call number, by dates, or by popularity. If you would prefer a different default sort, you can set defaultSortField to one of the other choices in Administration > System Preferences > Searching.

2.5. Common Command Language Searching

Koha uses the Common Command Language (CCL) (ISO 8777) as its internal search protocol. Searches initiated in the graphical interface use this protocol as well, although the searcher doesn't see which indexes, operators, and limiters are being used to conduct the search. The searcher can use the Advanced Search when a more precise result set is desired and the search indexes are somewhat known. However, some library users and many library staff prefer using a command-based structure. This part of the document presents and explains the Koha command-based structure. The indexes, operators, and limiters used are identical to those used in the graphical interface.

2.5.1. Indexes

The CCL standard itself doesn't specify a particular set of qualifiers (indexes), but it does suggest a few shorthand notations such as 'ti', 'au', and 'su'. Koha has a default set of indexes; it's possible to customize that set by adding needed indexes based on local requirements. A qualifier (index) maps to a particular Use attribute within the Z39.50 BIB-1 attribute set. The complete Z39.50 Bib-1 attribute set can be viewed at http://www.loc.gov/z3950/agency/defns/bib1.html.

The standard Koha set of indexes is a fairly common example of MARC21 indexing rules. The indexes that are defined in Koha are indexes typically used by other integrated library systems. The defined Z39.50 Bib-1 attributes mapped to the indexes include:

Table 11.1. Attributes

Bib-1 Attribute                 Qualifier (index)
Personal-name                   pn
Corporate-name                  cpn
Conference-name                 cfn
Title                           ti
Title-series                    se
Title-uniform                   ut
ISBN                            nb
ISSN                            ns
Local number                    sn
Local-classification            lcn and callnum
Subject                         su, su-to, su-geo, su-ut
Pubdate                         yr, pubdate
Date-of-Acquisition             acqdate
Language                        ln
Place-of-publication            pl
Abstract                        ab
Notes                           nt
Record-type                     rtype, mc-rtype, mus
Author                          au, aut
Subject-person-name             su-na
Any (keyword)                   kw
Publisher                       pb
Content-type                    ctype
Koha-Auth-Number                an
Author-personal-bibliography    aub
Author-in-order                 auo

Refer to the Koha Indexing Chart for the MARC21 tags mapped to each Bib-1 Attribute and index combination.
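If you need to go from a qualifier back to its Bib-1 attribute, the table above can be held as a simple lookup. The sketch below transcribes the table as given; attribute_for is a hypothetical helper, and your local index set may differ.

```python
# Bib-1 attribute -> qualifiers, transcribed from the table above.
ATTRIBUTE_QUALIFIERS = {
    "Personal-name": ["pn"],
    "Corporate-name": ["cpn"],
    "Conference-name": ["cfn"],
    "Title": ["ti"],
    "Title-series": ["se"],
    "Title-uniform": ["ut"],
    "ISBN": ["nb"],
    "ISSN": ["ns"],
    "Local number": ["sn"],
    "Local-classification": ["lcn", "callnum"],
    "Subject": ["su", "su-to", "su-geo", "su-ut"],
    "Pubdate": ["yr", "pubdate"],
    "Date-of-Acquisition": ["acqdate"],
    "Language": ["ln"],
    "Place-of-publication": ["pl"],
    "Abstract": ["ab"],
    "Notes": ["nt"],
    "Record-type": ["rtype", "mc-rtype", "mus"],
    "Author": ["au", "aut"],
    "Subject-person-name": ["su-na"],
    "Any (keyword)": ["kw"],
    "Publisher": ["pb"],
    "Content-type": ["ctype"],
    "Koha-Auth-Number": ["an"],
    "Author-personal-bibliography": ["aub"],
    "Author-in-order": ["auo"],
}

# Reverse lookup: qualifier -> Bib-1 attribute name.
QUALIFIER_TO_ATTRIBUTE = {
    qualifier: attribute
    for attribute, qualifiers in ATTRIBUTE_QUALIFIERS.items()
    for qualifier in qualifiers
}


def attribute_for(qualifier: str):
    """Return the Bib-1 attribute name for a qualifier, or None if unknown."""
    return QUALIFIER_TO_ATTRIBUTE.get(qualifier.lower())
```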

2.5.1.1. Audience Examples
  • aud:a Easy

  • aud:c Juvenile

  • aud:d Young adult

  • aud:e Adult

2.5.1.2. Contents Examples
  • fic:1 Fiction

  • fic:0 Non Fiction

  • bio:b Biography

  • mus:j Musical recording

  • mus:i Non-musical recording

2.5.2. Search Syntax

In the persistent search box, single words generally retrieve large sets. To narrow a search, you can use multiple words. Koha automatically uses the 'and' Boolean operator to create a set of records matching your input. When you want to narrow the search to an author or a title or a subject or some other specific field or use a Boolean operator, there isn't an obvious way to accomplish that specificity. The library user can, of course, go to the Advanced Search page; however, if you know how to construct a CCL search, you can achieve more specificity while using the persistent search box on any page.

There is a specific order to the CCL search syntax. Although it can be used for simple searches, it is an especially effective way to perform complex searches, as it affords you a great deal of control over your search results. To construct a CCL search, first enter a desired index code, then an equal sign, followed by your search word(s). Following are examples of simple CCL searches.

  • ti=principles of accounting

  • au=brown joseph

  • su=poetry

  • su-na=Shakespeare

  • kw=marlin
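The simple searches above all follow the index=terms pattern. As a minimal sketch, assuming only that simple form (no comma qualifiers or limiters), a search string can be split as follows. KNOWN_QUALIFIERS repeats the qualifier list given in section 2.2, and split_ccl is a hypothetical helper, not part of Koha.

```python
# Qualifiers listed for this database in section 2.2.
KNOWN_QUALIFIERS = {
    "pn", "cpn", "cfn", "ti", "se", "ut", "nb", "ns", "sn", "lcn", "callnum",
    "su", "su-to", "su-geo", "su-ut", "yr", "pubdate", "acqdate", "ln", "pl",
    "ab", "nt", "rtype", "mc-rtype", "mus", "au", "su-na", "kw", "pb",
    "ctype", "an",
}


def split_ccl(search: str) -> tuple:
    """Split 'index=terms' into (index, terms).

    A search with no '=' is treated as a keyword search, matching the
    behavior of the persistent search box described in section 2.3.
    """
    index, separator, terms = search.partition("=")
    if not separator:
        return ("kw", search.strip())
    index = index.strip().lower()
    if index not in KNOWN_QUALIFIERS:
        raise ValueError(f"unknown index: {index!r}")
    return (index, terms.strip())
```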

You can refine your search by combining search terms with Boolean operators 'and', 'or', or 'not'. Following are examples of searches using Boolean operators.

  • ti=principles of accounting and au=brown joseph

  • su=poetry not su-na=Shakespeare

  • kw=communication and su=debate
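If you assemble such searches in a script, for example to paste into the search box, the pattern is a first clause followed by operator and clause pairs. The helper names below are hypothetical, and the sketch only concatenates strings; it does not check the query against Koha.

```python
OPERATORS = ("and", "or", "not")


def clause(index: str, terms: str) -> str:
    """Build a single 'index=terms' clause."""
    return f"{index}={terms}"


def join_clauses(first: str, *rest) -> str:
    """Join a first clause with any number of (operator, clause) pairs."""
    parts = [first]
    for operator, next_clause in rest:
        if operator not in OPERATORS:
            raise ValueError(f"unknown operator: {operator!r}")
        parts.append(operator)
        parts.append(next_clause)
    return " ".join(parts)
```

For instance, join_clauses(clause("su", "poetry"), ("not", clause("su-na", "Shakespeare"))) reproduces the second example in the list above.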

You can also choose to search for things that start with a character or series of characters:

  • ti,first-in-subfield=C (will show you all titles that start with the letter 'C')

Other string location searches can be performed with the following keywords:

  • rtrn : right truncation

  • ltrn : left truncation

  • lrtrn : left and right truncation

  • st-date : type date

  • st-numeric : type number (integer)

  • ext : exact search on whole subfield (does not work with icu)

  • phr : search on expression anywhere in the subfield

  • startswithnt : subfield starts with

Using specific indexes and Boolean operators is not the only way a search can be refined. You can also search for a phrase when looking for a title, author, or subject. The syntax for this search is index,phr=search words.
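The index,qualifier=terms shape applies to the other keywords in the list above as well as to phr. A small sketch, assuming that shape and using only the keywords named in this section (qualified_search is a hypothetical helper):

```python
# Keywords taken from the list above, plus first-in-subfield from the example.
SEARCH_KEYWORDS = {
    "first-in-subfield": "starts with, at the beginning of a subfield",
    "rtrn": "right truncation",
    "ltrn": "left truncation",
    "lrtrn": "left and right truncation",
    "st-date": "type date",
    "st-numeric": "type number (integer)",
    "ext": "exact search on whole subfield",
    "phr": "search on expression anywhere in the subfield",
    "startswithnt": "subfield starts with",
}


def qualified_search(index: str, keyword: str, terms: str) -> str:
    """Build 'index,keyword=terms', rejecting keywords not listed above."""
    if keyword not in SEARCH_KEYWORDS:
        raise ValueError(f"unknown search keyword: {keyword!r}")
    return f"{index},{keyword}={terms}"
```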

To illustrate the results of various search types, a search was done for the words 'supreme court'. The results illustrate that the search index and the word order make a difference in search results. Only the results count and the search itself are shown in these examples. The search executed is always shown between single quotes.

You can also choose to use limiters in your search query. Some common limiters include dates, languages, record types, and item types. In the Advanced Search, you can either click a box or key in data to limit your search. You can also apply the same limits with CCL by using the syntax in the following examples.

By Date: su=supreme court and yr,st-numeric=>2000

When you limit by date, you can use the '>' (greater than), '<' (less than), '=' (equal), or 'yyyy-yyyy' (range) symbols.
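To make the four date forms concrete, the sketch below turns each one into a test on a publication year. It is an illustration only: it assumes '>' and '<' are strict comparisons and that a range includes both end years, which you should confirm against your own system. year_predicate is a hypothetical name.

```python
import re


def year_predicate(limit: str):
    """Turn '>2000', '<1990', '=2005', or '1990-2000' into a test on a year."""
    limit = limit.strip()
    range_match = re.fullmatch(r"(\d{4})-(\d{4})", limit)
    if range_match:
        low, high = int(range_match.group(1)), int(range_match.group(2))
        # Assumption: both ends of the range are included.
        return lambda year: low <= year <= high
    op_match = re.fullmatch(r"([<>=])(\d{4})", limit)
    if op_match is None:
        raise ValueError(f"unrecognized date limit: {limit!r}")
    operator, value = op_match.group(1), int(op_match.group(2))
    if operator == ">":
        return lambda year: year > value  # assumption: strictly greater
    if operator == "<":
        return lambda year: year < value  # assumption: strictly less
    return lambda year: year == value
```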

By Item Type: su=nursing and itype:BK

Each library will have a different set of item types defined in its circulation configuration. When you set up item types, you define a code and a name for each one. The name will appear on the Advanced Search page. The code you assigned is used as a CCL search limit, formatted as 'itype:x', where 'x' is the assigned code. The initial set of item types in Koha will usually be edited to reflect your collections, so your item type limiters may be different from the initial ones. The initial item type limiters follow.

  • itype:BKS Books, Booklets, Workbooks

  • itype:SR Audio Cassettes, CDs

  • itype:IR Binders

  • itype:CF CD-ROMs, DVD-ROMs, General Online Resources

  • itype:VR DVDs, VHS

  • itype:KT Kit

  • itype:AR Models

  • itype:SER Serials

By format: su=supreme court not l-format:sr

The format limiters are derived from a combination of LDR, 006 and 007 positions. The formats that are currently defined are the following.

  • l-format:ta Regular print

  • l-format:tb Large print

  • l-format:fk Braille

  • l-format:sd CD audio

  • l-format:ss Cassette recording

  • l-format:vf VHS tape

  • l-format:vd DVD video

  • l-format:co CD software

  • l-format:cr Website

By content type: su=supreme court not ctype:l

The content types are taken from the 008 MARC tag, positions 24-27.
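The item type, format, and content type examples above share one shape: a query, then 'and' or 'not', then limiter:code. A minimal sketch of that pattern follows; with_limit is a hypothetical helper that only builds the string, and the codes you pass must be ones defined on your system.

```python
def with_limit(query: str, limiter: str, code: str, exclude: bool = False) -> str:
    """Append a limiter to a query, using 'not' to exclude and 'and' to require."""
    operator = "not" if exclude else "and"
    return f"{query} {operator} {limiter}:{code}"
```

For example, with_limit("su=nursing", "itype", "BK") and with_limit("su=supreme court", "l-format", "sr", exclude=True) reproduce the item type and format examples given above.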

Two other limiter types, Audience and Content, are not described in detail in this document. Their CCL syntax differs only in the limiter itself. Example values are listed under Audience Examples (2.5.1.1) and Contents Examples (2.5.1.2) in case you would like to use these limiters.