Online Catalog Design Models: Are We Moving in the Right Direction?

Charles R. Hildreth, Ph.D.

5. BROWSING AND EXPLORING: A NEW PARADIGM FOR IR/OPAC SYSTEM DESIGN

A recent line of criticism directed at both conventional and probabilistic retrieval models concerns their lack of expressive power in representing the wide variety of actual retrieval situations, search aims, and behaviors. The probabilistic model is thought to be more expressive than the traditional model because it recognizes, explains, and incorporates the uncertainty intrinsic to the retrieval process. But both models assume that this process is essentially query-based. Neither adequately accounts for information seeking that is not query-based or query-centered, such as many kinds of browsing and exploratory searching. It will not do to stretch the meaning of "query" to cover all information seeking behavior and then claim that the model represents it all; this would simply blur important distinctions found in actual information seeking behavior. The probabilistic model has helped us better understand and cope with one common retrieval situation and a related set of search aims and activities. But with its assumptions about unchanging information needs and the query-centeredness of all information seeking, it cannot represent the majority of people's actual retrieval situations and information seeking activities.

We now understand that people do not think in terms of formal Boolean queries. Rather, they pick and choose as they go, and the outcome of this activity may be only a redefinition of the original information need. Modern interactive systems can support this kind of non-linear, trial-and-error thinking. The growing interest in hypertext retrieval systems (witness the wide popularity of the World Wide Web) indicates that this approach is attractive to users.

Current research and theory view the information seeker as an active problem solver with an evolving information need that may be searched iteratively. It matters little whether this set of behaviors is called browsing, exploration, berrypicking, or non-linear searching, as long as we understand that browsing is not one activity but many, any of which may be observed in actual searching behavior. All information retrieval systems, including online catalogs, support some form of browsing. In traditional, query-oriented systems, browsing plays a subordinate, supporting role: it assists with formulating or modifying a query that is then matched exactly or partially against document representations. This probably explains why some people view browsing as a secondary activity rather than real searching. Some forms of browsing are quite different from this, however, and may be the primary information seeking method most people use in real-life searching situations. In light of this, some researchers have suggested that a browsing paradigm replace the query-matching paradigm in the design of information retrieval systems. Before commenting further on this view, it is worth examining the concept and types of browsing more closely.

5.1. The Concept of Browsing

In its original sense, browse is anything a young animal sees as edible: a tender twig, leaf, or shoot of a plant that is fit and easy to eat. These delicacies must be sought out and are the object of selective review, that is, browsing. Browsing takes place in a patch of interest and is characterized by tentative nibbling, at least at the start. Human browsing activity has many connotations. In the context of information seeking and library use, probably the most visible and commonly understood browsing activity is roaming among the shelves of a library or bookstore to scan materials of potential interest or utility. Books and other materials are casually perused in order to decide what, if anything, we want to buy or borrow. Librarians have long recognized that users who come into the library enjoy browsing among the shelves, and so they make special efforts to display groups of related books of potential interest in noticeable, easily browsed ways. Research studies of library users confirm this experience and further show that many library browsers prefer browsing the organized materials on the shelves to searching and browsing in the library catalog (Hyman, 1971, 1982; Hancock-Beaulieu, 1989a).

From our ordinary experiences, we recognize that both the focus of our browsing interests and the strength of our motivation to discover relevant items vary from time to time. When browsing we may employ a variety of techniques ranging from the casual and undirected to the planned and systematic. As Marchionini explains, "These techniques are dependent on the object sought, individual searcher characteristics, the purpose of the search, and the setting and context for conducting the search. The objective of browsing may be well-defined (e.g., a particular antique chair to match a desk), or ill-defined (e.g., an interesting wall hanging for a favorite room)" (Marchionini, 1987). In the latter category, I prefer the example of a tourist on the last day of an island holiday searching about for a souvenir suitable as a memento of the trip.

In his discussion of types of browsing, Apted labels this activity "general purposive browsing." He describes it in the following way:

Contrasted with the aimless, haphazard scanning of publications in a physician's waiting room, browsing is frequently a purposeful activity occasioned by a felt information need or interest. The need may be ill-defined but is nonetheless very real. Oddy reminds us that, "It is important to try to come to grips with the problem of serving a library user who is not able to formulate a precise query, and yet will recognize what he has been looking for when he sees it. A man, left to his own devices among the bookshelves, accomplishes searches of this sort by browsing" (Oddy, 1977). Aided by the various structural and navigation devices provided by the library, he can be expected to browse even more efficiently and effectively. A mix of science, art, and good fortune may be involved in any successful browsing search. Cove and Walsh have described browsing as "the art of not knowing what one wants until one finds it" (Cove & Walsh, 1988).

Past studies of browsing as a library use activity have assumed that browsing in the stacks, the direct shelf approach to searching, is the essential form of browsing for locating items of interest or need. Greene (1977) studied how faculty learned about the books they borrowed from the Georgia Tech University Library, using six "discovery" categories: 1) references in a publication, 2) browsing in the library, 3) from a colleague, 4) from the library catalogs, 5) from memory, and 6) from some other source. His findings indicated that browsing in the stacks was the most used method of finding out about new books. Hancock-Beaulieu's research confirms that users prefer the direct shelf approach to the library's catalog and other bibliographic tools (Hancock-Beaulieu, 1989a). She warns us, however, that "The behavioural aspect of browsing as part of the information seeking activity is far from understood." Her review of shelf browsing studies reveals that users handle only a limited number of items from the shelves and select only a small number of them. Hancock-Beaulieu points out that the searcher who browses only at the shelves may miss related items scattered elsewhere in the collection; in large university libraries, these items may be located in different buildings or even on different campuses. "Shelf consultation seems to produce not only low recall but also low precision." For these reasons, Hancock-Beaulieu believes, shelf browsing should not be considered an alternative to catalog use. Browsing support at the catalog can lead to improved searching at the shelves by providing direction and linkage clues.

In his attempt to apply the theories of successful submarine search operations developed during wartime, Morse (1973) explains the behavior of the library browser in this manner:

Morse's early work may be the first attempt to apply the probability model of information theory to shelf browsing.

Lancaster (1968) describes another form of browsing found in the behavior patterns of individuals who conduct literature searches in a variety of bibliographic tools and retrieval systems:

This sort of "personal searching" is what is now popularly known as direct "end-user" searching, which Lancaster contrasts with the searching patterns of trained search intermediaries. Recent studies have shown that much of Lancaster's characterization still holds, except that end-users of online catalogs seldom guess at and consult the "most likely subject headings."

Fox and Palay offer this definition of browsing: "Access to related information is the essence of browsing" (Fox & Palay, 1980). They encourage retrieval system designers to build systems that give inexperienced or untrained users quick and easy access to related records. Purely random browsing in unorganized, unfamiliar territory makes little sense and probably lies beyond the bounds of normal browsing of any sort. The authors describe the process of browsing as "a heuristic search in a well-connected space of records." They state that browsing in bibliographic retrieval systems should consist of the following iterative, five-step process:

Fox and Palay believe these steps are better supported in a well-designed online retrieval system than in traditional manual systems.
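Fox and Palay's phrase "a heuristic search in a well-connected space of records" can be illustrated with a small best-first traversal. This is a minimal sketch under stated assumptions, not Fox and Palay's system: the link table, the `score` function standing in for the user's sense of which related record looks most promising, and the visit limit are all hypothetical.

```python
import heapq


def heuristic_browse(links, start, score, max_visits=10):
    """Best-first exploration of a linked record space (hypothetical sketch).

    links: dict mapping a record id to the ids of its related records
    score: stand-in for the user's judgment of a record's likely interest
    Returns record ids in the order they were examined.
    """
    visited = []
    seen = {start}
    frontier = [(-score(start), start)]  # negate scores so the heap pops the best first
    while frontier and len(visited) < max_visits:
        _, record = heapq.heappop(frontier)
        visited.append(record)  # the user examines this record
        for neighbor in links.get(record, []):
            if neighbor not in seen:  # each related record is queued only once
                seen.add(neighbor)
                heapq.heappush(frontier, (-score(neighbor), neighbor))
    return visited


# Hypothetical record links and interest estimates.
links = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
interest = {"A": 0.5, "B": 0.2, "C": 0.9, "D": 0.8, "E": 0.1}
order = heuristic_browse(links, "A", interest.get)
print(order)
```

The heuristic decides which related record is looked at next, but the user can only reach records that are linked to something already seen, which is why a well-connected record space matters.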

5.2. Types of Browsing

A review of the literature on browsing reveals several attempts to delineate and classify types or forms of browsing behavior. These can be reduced to three broad categories, shown here with the researchers' different but corresponding labels:

Category 1              Category 2                    Category 3           Source

undirected browsing     semi-directed browsing        directed browsing    (Herner, 1970)
general browsing        general purposive browsing    specific browsing    (Apted, 1971)
serendipity browsing    general purpose browsing      search browsing      (Cove & Walsh, 1988)

General or serendipity browsing is a largely random, unstructured, and undirected activity. The browser may simply be passing time, looking over items near at hand while occupied with another activity or aim. Apted includes in this category the perusal of documents in the hope of finding anything of interest for informal, recreational reading.

In general purpose browsing, the browser does not know in advance where relevant information may turn up, but selects and scans a specific publication or set of publications on a regular basis in the hope of improving his chance of success, or to ensure that nothing of likely interest goes unnoticed. Herner adds that this type of browsing is usually guided by habit, and that the personal scanning of a specific document type follows a predictable pattern.

Apted would probably characterize the search activities described by Lancaster as "specific browsing"; Herner calls them "directed browsing." The searcher has a specific end in mind but does not approach the catalog with a well-formulated search strategy. The activity is directed toward that end from the outset and proceeds in a structured manner. The searcher is deliberate in purpose but consciously adopts a state of mind that is open to clues and suggestions. The searcher expects guidance from the bibliographic tool he chooses to use and often follows its clues and pointers to items or areas of information relevant to his interests or needs.

Browsing can thus be viewed as a family of information seeking activities. As Herner (1970) concludes, browsing is not one but many things:

5.3. Browsing Aids

This classification of browsing activities is useful because it invites us to expand our traditional, unexamined understanding of browsing. Browsing may be more or less planned and directed, or proceed from an information need or interest that is more or less well-defined at the start. In addition, browsing may be carried out in a variety of information media, packages, and bibliographic tools, both manual and online. Many of these media and tools have been systematically designed and structured to facilitate browsing, and they employ structural, semantic, and navigational aids for this purpose. The library itself can be such a tool if its collection is stored and maintained in anything other than a random manner. When direct access to the shelves is permitted, arranging books on the shelves according to a subject scheme or some other classification (e.g., author, genre) facilitates browsing by library users.
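The way a classified shelf arrangement supports browsing can be sketched in a few lines: once items are sorted by class mark, finding one item also brings its subject neighbors into view. This is a minimal illustration assuming hypothetical holdings with Dewey-style class marks; the `shelf_neighbors` helper and its `window` parameter are inventions for this example.

```python
from bisect import bisect_left

# Hypothetical holdings: (class mark, title). These Dewey-style marks sort
# correctly as text because they all share a three-digit integer part.
holdings = [
    ("025.3", "Subject cataloging"),
    ("025.04", "Online retrieval"),
    ("020.1", "Library philosophy"),
    ("025.524", "Reference interviews"),
    ("027.7", "Academic libraries"),
]

shelf = sorted(holdings)  # shelf order follows classification order
marks = [mark for mark, _ in shelf]


def shelf_neighbors(mark, window=1):
    """Return the items within `window` shelf positions of `mark`."""
    i = bisect_left(marks, mark)
    return shelf[max(0, i - window): i + window + 1]


print(shelf_neighbors("025.3"))
```

The neighbors are useful only because the arrangement is systematic; with a random arrangement the same lookup would return unrelated items.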

A book or journal is typically organized and structured to promote browsing. Such devices as tables of contents, indexes, prefaces or introductions, and lists of references both encourage and enhance browsing. Whatever the user's level, specificity, or area of interest, such devices make it easy and convenient to gather and peruse the information needed to make preliminary decisions about the relevance or potential usefulness of the documents.

Various forms of library catalogs, and indexing and abstracting publications or services, manual or online, incorporate devices and features that permit browsing of one kind or another. These sources use structure, recognition, and navigation devices to assist and guide the user looking about for items of interest or pointers to such items. Browsing is essentially visual and depends more on recognition than on recall or a priori formulations of need. A good browsing tool, source, or system exploits the human ability to recognize items of interest, a cognitive ability that is faster and easier to exercise than juggling concepts to specify a need and describe relevant items in advance (Card, Moran and Newell, 1983).

As Liebscher and Marchionini (1988) remind us, filters are also useful browsing devices, and in large information systems they may be an absolute necessity. Recognition is easier than recall or query formulation, but large systems may present the browser with a great many items of potential interest. Thus greater effort may be required to sift the truly useful items from the large set discovered during browsing. "The effectiveness of such a browsing strategy depends largely on the system's ability to facilitate the searcher's filtering activity during browsing" (Liebscher & Marchionini, 1988).
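One simple way a system might support the searcher's filtering during browsing is to let the user switch on optional filters over the set of items found so far. This is a hypothetical sketch, not Liebscher and Marchionini's system: the records, field names, and filter functions are all assumptions made for illustration.

```python
# Hypothetical browse results; in a large system this list could be very long.
records = [
    {"title": "Browsing in online catalogs", "year": 1989, "lang": "eng"},
    {"title": "Hypertext and retrieval", "year": 1992, "lang": "eng"},
    {"title": "Recherche documentaire", "year": 1990, "lang": "fre"},
    {"title": "Catalog use studies", "year": 1971, "lang": "eng"},
]


def apply_filters(items, *filters):
    """Keep only the items that pass every filter the user has switched on."""
    return [item for item in items if all(f(item) for f in filters)]


def recent(record):
    return record["year"] >= 1985


def english(record):
    return record["lang"] == "eng"


kept = apply_filters(records, recent, english)
print([r["title"] for r in kept])
```

With no filters selected, every item is kept, so filtering stays an optional aid rather than a required step before browsing can begin.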

5.4. Searching or Browsing?

Marchionini (1987) discusses three primary reasons why people browse:

Searchers often have difficulty defining and expressing their information needs, and the database structure and vocabulary requirements of the search system may be unknown to them. For such searchers, looking is more inviting than formulating. Browsing is inherently active and engaging, and many users seem to prefer action and encounter to reflection and analysis. It could be said that good browsing systems and sources attract such users, but there are not yet enough good online browsing systems in operation to justify this claim. However, Hancock-Beaulieu's research provides some evidence that the "tool tailors the task" (1989a). Expanded access points and search options in the online catalog probably account for the greater variety of subject search strategies used by searchers in this medium compared with the card catalog.

Reflection on the reasons for and circumstances of browsing should yield a new understanding of the importance of this activity. These insights should inform the design of information retrieval systems and lead to improved browsing capabilities in them. In the past, browsing has often been viewed as secondary or supplemental to primary, query-oriented, directed, structured searching. Bates suggests that there may still be a "lingering tendency in information science to see browsing in contrast to directed searching, to see it as a casual, don't-know-what-I-want behavior that one engages in separately from 'regular' searching" (Bates, 1989).

Searching by browsing is a natural, preferred technique for many people, especially when they are engaged in "general purposive" information seeking. Ellis' research on the information seeking behavior of social scientists shows that various forms of browsing are a standard component of their research and "keeping aware" activities. He recommends that online retrieval systems support browsing of a variety of types of information that supplement the standard bibliographic record (Ellis, 1989). Liebscher and Marchionini's research has demonstrated that, for novice searchers of full-text documents, browsing can be as effective as structured, query-oriented Boolean searching. Marchionini argues that because of the massive amounts of poorly organized information available in electronic form, browsing is even more important in electronic environments than in traditional environments such as open-access libraries (Marchionini, 1987).

Designers of information retrieval systems and online catalogs must expand their knowledge of the browsing requirements of searchers, and provide capabilities and search options in their systems that will support these requirements. Most IR systems support some aspects of browsing, but still implement the paradigm of direct, query-matching retrieval. Browsing also provides a suitable paradigm for information system design, and, perhaps, an even more representative one, given the many varieties of information needs and searching behavior.

5.5. Exploratory Design Models: The Rationale

As stated earlier, most operational OPACs and retrieval systems are based on a design model that does not fully describe much actual information seeking activity. This query-based model assumes that the user comes to the system with a known, specified, or specifiable information need. A second assumption of this design model is that for any given query there exists a single best output set that should be targeted for retrieval. Bates has called this "The Fallacy of the Perfect Thirty-Item Online Search" (1984). This known-subject, query-oriented, perfect-thirty-item search paradigm represents, at most, a minority of the subject searches conducted, especially now that "end-users" are the primary users of our online catalogs and other automated retrieval systems.

Bates (1989) argues that the classic, traditional model of information retrieval should no longer occupy center stage in our thinking about retrieval problems, retrieval performance, and system design: "It represents some searches, but not all, perhaps not even the majority, and that with respect to those it does represent, it frequently does so inadequately." As an alternative, Bates proposes the "berrypicking" model of searching, a model she states is much closer to the actual behavior of information searchers than the "classic" model.

The strength and soundness of Bates' interpretation derive from the fact that she bases it on a consideration of a variety of information seeking activities in which different sources, tools, and retrieval systems, both manual and automated, are used. Close examination of "real life searches" and the literature on the information seeking behavior of scholars, scientists, and other end-users leads to the realization that a frequent, common pattern of searching can be characterized by the berrypicking/evolving search model she describes:

Users employ a number of strategies and techniques when searching in this way, among them various kinds of browsing in both manual and online sources, as well as at the bookshelves. In fact, a searcher may choose to conduct a formal query/best-match search in an online bibliographic database as one step in the evolving, berrypicking process: "It is part of the nature of berrypicking that people adapt the strategy to the particular need at the moment; as the need shifts in part or whole, the strategy often shifts as well - at least for effective searchers" (Bates, 1989). In an evolving/berrypicking search, the search techniques typically change throughout the search (and may or may not include traditional information retrieval techniques), and the sources consulted may differ widely in form and content.

Bates concludes her case for an alternative design model with these words:

Peters (1991) points out that in most information retrieval research, the basic unit of a search or a "search session" is defined very narrowly for purposes of measurement. The search unit may range from a single query input to the system and the system's output, to the period from the time the user approaches the system's search entry device to the moment the user departs this device. End users view the situation quite differently:

A model having the expressive power to represent this variety of strategies, techniques, and sources will better guide our thinking about desirable design features for information retrieval systems. In her eloquent plea for alternative design models that reflect the demonstrated need for a variety of searching capabilities to match various information-seeking needs and users' search objectives, Pejtersen states that "design [of future online catalogs] is to be based on the concept of an adequate 'resource envelope' around the search space instead of support of a particular normative procedure for retrieval interactions" (1992).

With the growing interest in hypertext and browsing information systems, this interest fueled in part by an array of new graphical and direct manipulation user interface technologies, a small group of researchers has been describing how these richly interactive, exploratory systems should function, while others have been testing prototypes and building the theoretical and empirical foundations of such information systems.

Ellis (1989) investigated the information seeking activities of a group of social scientists, because, as he states, "there appeared to be no study which attempted to systematically derive recommendations for information retrieval system design from analysis of actual information-seeking behaviour." From an analysis of the information-seeking patterns of these social scientists manifest in their use of a variety of information resources, Ellis derives a behavioral model that he believes can "underpin recommendations for information retrieval system design."

Ellis (1989) identifies six major categories of characteristic patterns of information-seeking behavior:

  1. Starting: initial search activities
  2. Chaining: following citation chains or other referential connections
  3. Browsing: semi-directed searching in an area of potential interest
  4. Differentiating: using differences between sources as a quality filter
  5. Monitoring: current awareness monitoring of selected sources
  6. Extracting: systematically reviewing a source to identify information

Ellis then discusses each of these behavioral patterns ("features of the model") in detail to derive, as he says, "a set of general recommendations for information retrieval system design and to consider the issues involved in implementing the features of the behavioural model on an experimental system" (1989). Ellis suggests that a hypertext retrieval system would be needed to support the wide variety of searching activities carried out by scholars and scientists.

As we have seen, Bates believes her inductively-arrived-at berrypicking model can serve to inform our thinking about key design features for IR systems (1989). She identifies six search strategies or techniques that information seekers commonly employ in a variety of sources, and, for each of them, suggests a specific equivalent or comparable capability that should be implemented in an online berrypicking search interface. Emphasizing the need for variety and flexibility at the search interface, Bates makes a plea for the adoption of new techniques that do not reflect narrow, rigid assumptions about users' search aims and search styles.

One danger lurking in the behavioral approach to IR system design is the very real possibility that some of the methods customarily used by actual information seekers are inefficient or only partially effective. Other methods in use may simply have become outdated; the card catalog is an example of the latter case, and in recent years online catalog design has broken free of the limits of that outdated model. Bookshelf browsing in libraries is a common information-seeking strategy, and one that holds the potential for great improvement in the online environment. Reasoned analysis must follow the collection of behavioral data, "to assess which aspects of user behaviour could provide feedback to the future design of online interactive catalogues" (Hancock-Beaulieu, 1989).

Marchionini (1987) presents a "framework" for the design of browsable information systems which is based on his research with novice users of electronic encyclopedias. Marchionini reports that these users generally performed in a satisfactory manner, but the research revealed difficulties encountered by the searchers, including disorientation, distraction, and cognitive overload. He outlines the requirements for improved electronic browser systems, "systems that invite and guide browsing," consistent with his research findings.

Marchionini suggests the design framework for browsable information systems includes these five interdependent factors:

STRUCTURAL                          FUNCTIONAL

- information units (nodes)         - display
- relational links between nodes    - navigation
                                    - help/learning

Marchionini notes that electronic documents provide few physical cues when displayed, and that the user can benefit from specialized structural or organizational schemes suited to this medium. The author seems to envisage a network-type database of linked nodes but does not specify a particular organizational scheme. He recommends that the system support both "coarse" (i.e., broad) information unit nodes and "finer" nodes that enable the user to focus on specific information or concepts. A useful suggestion is that of node "filters": broad topics linked to large amounts of information could be trimmed and focused through the use of topic-class filters. Marchionini suggests that users could apply such filters optionally to focus their browsing in wide information or document spaces.

Links express the relationships between information unit nodes. Links have both conceptual and physical aspects: they define a relationship between information nodes, and they also impose a physical order on the nodes in the database, thereby enabling users to actually navigate among them. If the links have been "activated", the user may traverse them at will, seeking to discover new or related items of interest. The nature of the relationship between nodes (e.g., "is cited by," "is on the related topic") is typically expressed in a particular kind of link. It is a challenge for designers to present the meanings of these links in a way that is easy for users to grasp and exploit.
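Typed links of the kind just described can be represented as (source, relationship, target) triples, with the relationship label shown to the user as the meaning of each pathway. This is a hypothetical sketch: the node ids and the `describe_moves` helper are invented for illustration; only the two relationship labels come from the text.

```python
from collections import defaultdict

# Each link is (source node, relationship label, target node).
# The labels follow the examples in the text; the node ids are hypothetical.
links = [
    ("doc1", "is cited by", "doc2"),
    ("doc1", "is on the related topic", "doc3"),
    ("doc2", "is cited by", "doc4"),
]

# Index outgoing links by source node so each node's pathways are quick to list.
outgoing = defaultdict(list)
for source, label, target in links:
    outgoing[source].append((label, target))


def describe_moves(node):
    """List the labeled pathways a user could follow from `node`."""
    return [f"{label} -> {target}" for label, target in outgoing.get(node, [])]


print(describe_moves("doc1"))
```

Displaying the label with each destination is one modest way of presenting what a link means before the user commits to following it.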

Designers are especially challenged to uniquely represent each of a variety of pre-established conceptual linkages and present each to the user as an intuitively meaningful pathway that might be navigated as the user's need, problem, or interest dictates. Marchionini (1987) recommends that navigation for browsing capabilities include support for moving backward and forward (or up and down) between coarse and fine nodes (e.g., broader and narrower term relationships in a thesaurus), and an ample amount of displayed prompts and feedback so that, "In any given state, the user must know what is possible as a next move and what she/he did to arrive in the current state."
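Marchionini's navigation requirements (moving up and down between coarse and fine nodes, and always showing what moves are possible and how the current state was reached) can be sketched as a small session object over a broader/narrower term hierarchy. This is an illustrative sketch, not Marchionini's design: the `BrowseSession` class, its methods, and the thesaurus terms are hypothetical.

```python
class BrowseSession:
    """Hypothetical browser state: position in a broader/narrower hierarchy
    plus back/forward history."""

    def __init__(self, narrower, start):
        self.narrower = narrower  # coarse node -> list of finer nodes
        # Invert the hierarchy so each finer node knows its broader node.
        self.broader = {kid: parent for parent, kids in narrower.items() for kid in kids}
        self.current = start
        self.back_stack = []
        self.forward_stack = []

    def moves(self):
        """List what is possible as a next move from the current state."""
        options = [("down", node) for node in self.narrower.get(self.current, [])]
        if self.current in self.broader:
            options.append(("up", self.broader[self.current]))
        if self.back_stack:
            options.append(("back", self.back_stack[-1]))
        if self.forward_stack:
            options.append(("forward", self.forward_stack[-1]))
        return options

    def go(self, node):
        """Move to a new node; a fresh move discards the forward history."""
        self.back_stack.append(self.current)
        self.forward_stack.clear()
        self.current = node

    def back(self):
        if self.back_stack:  # ignore the request when there is nowhere to go back to
            self.forward_stack.append(self.current)
            self.current = self.back_stack.pop()

    def forward(self):
        if self.forward_stack:
            self.back_stack.append(self.current)
            self.current = self.forward_stack.pop()

    def trail(self):
        """The states that led to the current one, oldest first."""
        return self.back_stack + [self.current]


thesaurus = {
    "Information science": ["Information retrieval", "Cataloging"],
    "Information retrieval": ["Online catalogs", "Hypertext"],
}
s = BrowseSession(thesaurus, "Information science")
s.go("Information retrieval")
s.go("Hypertext")
print(s.moves())
print(s.trail())
```

Here `moves()` answers "what is possible as a next move" and `trail()` answers "what did I do to arrive here," the two kinds of feedback the quoted requirement asks for.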

Before the popularity of the World Wide Web expanded interest in hypertext retrieval, Noerr and Bivins-Noerr (1985) addressed the challenge of providing new search and flexible browsing capabilities in OPACs at the level of database design. They describe an unconventional "entity-relational" database design for information retrieval systems that supports new forms of exploratory searching and navigation among related and linked items in the information database.

Browsing or scanning ordered lists of informational items or records is not new, of course, but the Noerr database design (known commercially as "Tinman" or "Information Navigator") permits the easy typing and sequencing of as many such lists as the database designer deems necessary to support varied and highly targeted browse searching. In addition, Tinman, through its data modeling and mapping support features, permits the designer to define desired relationships between information entities or nodes in the database and to establish links between them, enabling the user to navigate from item to item and node to node in search of related information. Tinman has its conceptual roots in Hjerppe's "HyperCat" (1986).

This unique database design has been described as "entity-relational" and "multi-linked." At a minimum, any database design describes a way of identifying its constituent elements (e.g., files, records, key fields) and how they are organized for one or more data management purposes such as storage, updating, or access. The Noerr database design (actually a meta-design facility, since a database "owner" may design a specific database and how it is to be accessed and navigated) provides powerful data modeling and search definition facilities. In the multi-linked database, each record represents an information entity, which may have associated attributes or properties. These entity descriptions and their properties are contained in the fields of the record. Thus records contain data fields, identified and typed by the designer, but also contain link-to fields. Records of the same type or of different types are linked by placing the prime key of the "linked-to" record in a linking field. This underlying linking mechanism supports navigation from record to record and is as flexible as the designer of a specific database chooses to make it through the links he or she builds.
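The linking mechanism described above (a link-to field holding the prime key of the linked-to record) can be sketched with plain dictionaries. This is a minimal illustration, not Tinman's actual storage format: the record keys, field names, and the `follow` helper are hypothetical.

```python
# Hypothetical multi-linked records keyed by prime key. Each record holds
# ordinary data fields plus a "link_to" field listing the prime keys of
# the records it is linked to.
records = {
    "W1": {"type": "work", "title": "Browsing in catalogs", "link_to": ["N1", "S1"]},
    "W2": {"type": "work", "title": "Catalog use", "link_to": ["S1"]},
    "N1": {"type": "name", "heading": "Doe, Jane", "link_to": ["W1"]},
    "S1": {"type": "subject", "heading": "Online library catalogs", "link_to": ["W1", "W2"]},
}


def follow(key, record_type=None):
    """Resolve a record's link-to keys, optionally keeping only one record type."""
    linked = [records[k] for k in records[key]["link_to"]]
    return [r for r in linked if record_type is None or r["type"] == record_type]


# From a subject record, move to the works linked to it.
print([r["title"] for r in follow("S1", "work")])
```

Because links are just stored keys, the navigation available to users is exactly the set of links the designer chose to build, which is the sense in which the mechanism is "as flexible as" the designer makes it.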

The application designer (of an online catalog, for example) has great freedom and flexibility in modeling the database, from defining the basic types of records to defining search path access, browsing lists, and the links between records and record types that support exploratory navigation at the user interface.

At the user interface, this database design allows two non-traditional ways of searching: "tree" searching and linked-item exploration. In addition to the browse scanning of alphabetical or otherwise ordered lists of item keys (e.g., title keywords, names, subject descriptors, class marks) that is not uncommon in today's online catalogs, it is possible to provide the user with pre-structured, multi-level "tree" searches that require only the same simple mechanics as scrolling or scanning lists and selecting the desired item from a displayed list. For example, a browse tree search could be pre-defined with this structure and sequence: subject; language of publication; title. This would permit the user to select a subject entry from a displayed list, then select a language of publication, and then have the titles indexed by that subject descriptor in that language retrieved for display and assessment or selection. The number and depth of tree searches that can be provided is limited only by the number of unique record sets the designer has identified at the initial data modeling stage. Any available data from these records can be pre-coordinated to produce the tree to be searched, and any number of trees can be created, all converging on the same citations if this is desired.
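The subject; language of publication; title tree in the example above can be sketched as nested groupings that are built ahead of time, so that each level is just a list to browse and select from. This is a minimal illustration under stated assumptions, not the Tinman implementation: the citation records and the `build_tree` helper are hypothetical, and the final level is simply a sorted list of titles.

```python
from collections import defaultdict

# Hypothetical citation records.
citations = [
    {"subject": "Online catalogs", "language": "English", "title": "Browsing the OPAC"},
    {"subject": "Online catalogs", "language": "French", "title": "Le catalogue en ligne"},
    {"subject": "Online catalogs", "language": "English", "title": "Catalog interfaces"},
    {"subject": "Hypertext", "language": "English", "title": "Linked records"},
]


def build_tree(records, levels):
    """Pre-coordinate records into nested browse levels, ending in sorted titles."""
    if not levels:
        return sorted(r["title"] for r in records)
    groups = defaultdict(list)
    for r in records:
        groups[r[levels[0]]].append(r)
    # Sorting the groups gives each level an alphabetical display order.
    return {key: build_tree(group, levels[1:]) for key, group in sorted(groups.items())}


tree = build_tree(citations, ["subject", "language"])

# Browse-select: at each level the user picks one entry from a displayed list.
print(list(tree))                          # subject list
print(list(tree["Online catalogs"]))       # languages under the chosen subject
print(tree["Online catalogs"]["English"])  # titles to assess
```

All the combining work is done when the tree is built; the user only ever scans a list and selects from it.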

The tree searching made possible by the Noerr database system is in reality pre-coordinated searching that requires only browse-select operations on the part of the user. As Noerr and Bivins-Noerr explain, "The levels of search allow the refinement of producing a final result set without explicit combining, as the combination is performed by the tree structure. Thus, a single method 'browse-select' enables extremely complex combined searches, so long as they are pre-defined" (1985). As these pre-structured levels are traversed by the user, a filtering or sifting takes place, narrowing the outcome possibilities, and at each level a more focused context is set for subsequent search selections. While the catalog designer has great flexibility in selecting and structuring these search trees, once they are built in and made operational they impose restrictions on the search options and search paths available to the user. More research is needed to ascertain the optimal number of search trees, and of levels within each tree, for a given database access system such as an online library catalog.
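The claim that "the combination is performed by the tree structure" can be checked with a small sketch: successive browse-select filtering yields the same set as an explicit Boolean AND of the selected values, while the user is limited to the values each level offers. The records and the helpers `browse_select` and `boolean_and` are hypothetical illustrations, not the Noerr implementation.

```python
# Hypothetical records; field names are illustrative only.
RECORDS = [
    {"id": 1, "subject": "Cataloging", "language": "English", "year": "1985"},
    {"id": 2, "subject": "Cataloging", "language": "French", "year": "1985"},
    {"id": 3, "subject": "Indexing", "language": "English", "year": "1989"},
    {"id": 4, "subject": "Cataloging", "language": "English", "year": "1989"},
]


def browse_select(records, selections):
    """Apply successive browse-select choices down a pre-defined tree.

    Each level offers only the values present in the set left by the previous
    selection, and each selection filters that set further.
    """
    current = list(records)
    for field_name, value in selections:
        offered = sorted({r[field_name] for r in current})  # list shown at this level
        if value not in offered:
            raise ValueError(f"{value!r} is not offered at the {field_name!r} level")
        current = [r for r in current if r[field_name] == value]
    return [r["id"] for r in current]


def boolean_and(records, **conditions):
    """The same result stated as an explicit conjunctive (AND) query."""
    return [r["id"] for r in records
            if all(r[f] == v for f, v in conditions.items())]


path = [("subject", "Cataloging"), ("language", "English"), ("year", "1989")]
print(browse_select(RECORDS, path))                                                 # [4]
print(boolean_and(RECORDS, subject="Cataloging", language="English", year="1989"))  # [4]
```

The `ValueError` branch models the restriction noted above: once the tree is fixed, a user cannot select a combination (for example, Indexing followed by French) that the tree does not offer.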

In the pre-structured tree search, user actions consist of a series of browse-select operations, each selection being limited to the set of items created by a selection at the previous level. Freedom from this structure is made possible by the Noerr database system's multi-linking capability. Any number of pre-defined links between records in the database can be established to allow the user exploratory navigation along these linked pathways from record to record.

Navigational links between record types established at the time of the design of a specific database access application provide flexible and optional search paths for the user. Used together with browse-select searching, each "link to" or navigation path selected by the user will lead to a new set of records to browse or select from, or to use as the "jumping-off" link to another set of records (see the screen display sequence illustrated in Figure 7 below). This allows long paths to be traversed and, if the design includes many linked-to record types, gives the user great flexibility in deciding which "related-record" search paths to pursue. It should be noted that the use of the navigation feature is not tied to or dependent on the browse-select method of searching the database. Regardless of how the searcher arrives at a record that has been linked to others, including the retrieval of that record by the use of Boolean techniques, the searcher may then choose the navigation option: moving on directly from that record to related records. (See Figure 6 and Figure 7)
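The record-to-record navigation described above can be sketched as a chain of steps, each of which opens a new set of linked records from which the user picks the next jumping-off point. The starting record may have come from any search method. The records, link types, and the `linked` and `navigate` helpers are hypothetical illustrations.

```python
# Hypothetical linked records: prime key -> label and link-to fields.
RECORDS = {
    "T1": {"label": "Title A", "links": {"author": ["P1"], "subject": ["S1"]}},
    "T2": {"label": "Title B", "links": {"author": ["P1"], "subject": ["S2"]}},
    "T3": {"label": "Title C", "links": {"author": ["P2"], "subject": ["S2"]}},
    "P1": {"label": "Author X", "links": {"titles": ["T1", "T2"]}},
    "P2": {"label": "Author Y", "links": {"titles": ["T3"]}},
    "S1": {"label": "Subject 1", "links": {"titles": ["T1"]}},
    "S2": {"label": "Subject 2", "links": {"titles": ["T2", "T3"]}},
}


def linked(key, link_type):
    """The set of records reachable from `key` through one link type."""
    return RECORDS[key]["links"].get(link_type, [])


def navigate(start, steps):
    """Follow a chain of (link type, chosen record) steps from any starting record.

    The start may have been found by browsing, a tree search, or a Boolean query;
    navigation does not depend on how the user arrived there.
    """
    path = [start]
    for link_type, choice in steps:
        options = linked(path[-1], link_type)  # new set to browse or select from
        if choice not in options:
            raise ValueError(f"{choice!r} is not linked from {path[-1]!r} via {link_type!r}")
        path.append(choice)
    return [RECORDS[k]["label"] for k in path]


# Title A -> its author -> another title by that author -> that title's subject -> a third title
print(navigate("T1", [("author", "P1"), ("titles", "T2"), ("subject", "S2"), ("titles", "T3")]))
# ['Title A', 'Author X', 'Title B', 'Subject 2', 'Title C']
```

The more link types the designer defines, the more branches each step offers, which is the flexibility (and the design burden) the text attributes to this approach.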

Figures 6 and 7. Hypertext Navigation in an Experimental OPAC

The mechanics of browsing and navigating in this manner are extremely simple and easy to learn and to use. They require only list-scrolling, pointing, and selecting devices and skills on the part of the user. However, as with all "hypertext" database applications, "Systems like this need specially structured databases and a degree of forethought on the part of the designers of the catalogues" (Noerr and Bivins-Noerr, 1985). This is something of an understatement because great care must be exercised during the planning and modeling stages to ensure that the needed and desired search, browse, and navigation functionality is available to a variety of searchers who will bring different search needs and tasks to the retrieval system.

In his paper, "Information retrieval by browsing," Cox (1992) describes a prototype browsing system under development. Cox initially defines his browse system in terms of what it is not, namely, a conventional query-formulation/evaluation, matching-operation IR system. Rather, "In this paper, traditional queries are never formulated in the machine and it is suggested they are unnecessary for efficient information retrieval." Instead, states Cox,

According to Cox, the user best searches through browsing, recognition and discovery, rather than by a formal process of explicit query formulation, entry and modification. He recognizes that this search approach requires careful structuring of the database and requires the user to understand this structure to browse effectively. Cox does not entirely rule out a role for traditional query-document matching techniques, but states that, "the mechanism for the system to formulate and evaluate queries is a peripheral activity to the browsing and user discovery. It can be done as a way to support such activities." Such techniques as Boolean searches and query expansion methods are not excluded from the interface, but, under the user's control, would play a supporting role to the "main structure of the interface."

Searching by recognition and discovery in a well-structured space is only the first stage of browsing in Cox's system. When an item of interest has been recognized, the user should then be presented with an array of similarity operations that the system might perform to find items similar or closely related to the item already found. Viewing information retrieval primarily as a gathering function, Cox points out that the user only has to know what each type of similarity means to the system that effects it, and then choose one appropriate to his need at the moment. The user's ability to choose from among several similarity measures, clustering or classing criteria that may be used in the gathering function is key to Cox's approach: "Any system would implement appropriate similarity measures according to the user's browsing needs."
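Cox's second stage can be sketched as a menu of interchangeable similarity operations applied to an item the user has already recognized. The three measures below (same author, Jaccard overlap of subject headings, identical class mark) and the item data are hypothetical examples chosen for illustration; Cox's paper is not quoted here as specifying them.

```python
# Hypothetical item descriptions; fields and values are illustrative only.
ITEMS = {
    "A": {"author": "Smith", "subjects": {"Cataloging", "Indexing"}, "class": "Z693"},
    "B": {"author": "Jones", "subjects": {"Cataloging"}, "class": "Z693"},
    "C": {"author": "Smith", "subjects": {"Hypertext"}, "class": "QA76"},
    "D": {"author": "Lee", "subjects": {"Cataloging", "Indexing"}, "class": "Z695"},
}


def same_author(a, b):
    return 1.0 if a["author"] == b["author"] else 0.0


def shared_subjects(a, b):
    """Jaccard overlap of subject headings."""
    union = a["subjects"] | b["subjects"]
    return len(a["subjects"] & b["subjects"]) / len(union) if union else 0.0


def same_class(a, b):
    return 1.0 if a["class"] == b["class"] else 0.0


# The menu of gathering operations offered to the user.
SIMILARITY = {
    "same author": same_author,
    "shared subjects": shared_subjects,
    "same class mark": same_class,
}


def gather(found, operation):
    """Rank the other items by the user-chosen similarity to an item already found."""
    measure = SIMILARITY[operation]
    seed = ITEMS[found]
    scored = [(measure(seed, ITEMS[k]), k) for k in ITEMS if k != found]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties broken by key
    return [k for score, k in scored if score > 0]


for op in SIMILARITY:
    print(op, gather("A", op))
# same author ['C']
# shared subjects ['D', 'B']
# same class mark ['B']
```

Each operation gathers a different set from the same starting item, which is why letting the user choose among them, rather than fixing a single matching rule, matters to this approach.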

Cox believes it is easier for users to learn types of relatedness or similarity between items, many of which can be carried across databases, than to learn a complex query logic, syntax and language for each information system or database encountered. Searching is more readily accomplished through browsing, scanning, recognizing, selecting (items of interest), followed by choosing and invoking a similarity operation that will identify and retrieve additional related items for display and further assessment. Providing users several similarity or related-item gathering operations from which to choose would appear to be something novel in end-user system design. Recent research that shows different search strategies or techniques may perform comparably in terms of recall while retrieving different sets of documents would seem to lend some support to this design approach (Croft, 1981 and 1987).

5.6. Discussion: "As We Often Seek"

In light of this research, I think we must recognize that information retrieval theory and methods developed over the past twenty years will play an important but limited role in advancing online catalog development into the third generation of systems. There are at least two reasons for this. First, the extended-Boolean, vector-space, probabilistic, and fuzzy-set retrieval models tested under artificial laboratory conditions (generally excluding the human variable) have led to significant but modest improvements in system performance as measured by recall and precision. Second, the "search intermediary" information retrieval paradigm assumed in most IR research, even when embellished with relevance feedback methods, may be the wrong model for representing the information-seeking situation, search process, and behavior of both scholars and general library and information system users.

The information system requirements of online catalog and IR system end-users differ considerably from those of trained search intermediaries. The behavior of trained search specialists is product/output-oriented. Their aim usually is to produce a high quality list of citations or other references for the end user. The "quality" of the output product is measured by such variables as recall, precision, and search efficiency (minimizing the costs to the user of the retrieval process). The search topic or need is usually well-understood (a "known-subject") and well-expressed in advance of the search, and the search is typically processed in a highly-structured, subject-specific domain.
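The two output-quality measures named above have standard definitions: recall is the fraction of all relevant items that were retrieved, and precision is the fraction of retrieved items that are relevant. The sketch below computes both from sets of item identifiers; the counts are made-up numbers for illustration, and returning 0.0 for an empty set is a convention assumed here, not something the text specifies.

```python
def recall_precision(retrieved, relevant):
    """Recall = relevant items retrieved / all relevant items;
    precision = relevant items retrieved / all items retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Four citations retrieved, two of them among the three relevant ones.
r, p = recall_precision(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.67 precision=0.50
```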

We find ourselves in disagreement with IR researchers like Keen who believe that retrieval systems designed for professional search intermediaries should serve as the model for the design of end-user systems: "A successful design for a Ranked system for professional skilled searchers should logically precede attempts to design end-user systems because only then can the true performance limits be understood and the devices and tactics appropriate to a Ranked system be adequately identified and tested, and simplified for end-users." (Keen, 1994) As we have seen with the poor performance of implicit Boolean OPACs, a simplified design based on an inappropriate model is still a misguided design. Behind Keen's statement is the assumption that the search aims, search behavior, and the information needs of end-users are the same as those of the professional search intermediaries. It is precisely this assumption that we are calling into question. To Keen's credit, he recommends that, "A preliminary step should be to look at human manual feedback and query reformulation, to see whether, even in those conditions, an initial search foray can be improved upon." By "manual feedback," Keen means interactive feedback from the user at the search terminal. We recommend that researchers and designers step back a good deal further and observe a much broader vista of human information seeking behavior.

The overall search situation of end-users is fundamentally different from the narrow domain in which the intermediary typically operates. End-users of OPACs, an expanding heterogeneous community, have a variety of information-seeking needs and behaviors. Furthermore, the document collections available to them for online searching are multi-disciplinary in coverage and, at present, are poorly structured and indexed. Evidence indicates that the process of searching and discovery is more central to end-user searching objectives and satisfaction than the delivery of any pre-defined product. Most end users are not going after a specific "known item", nor do they have a well-defined output product in mind at the outset of their interaction with the online system. Scholars may wish to branch out into new disciplines or unfamiliar approaches to a problem. Typically, end users wish to discover materials on a topic of interest, and they seldom have or wish to present a precise expression of that interest. Both the expression of the topical interest and the interest itself may change dynamically during the search and browsing activity.

At the beginning of the age of mechanized storage and retrieval, Vannevar Bush (1945) criticized the prevailing linear, rule-constrained retrieval ("selection") paradigm described by the conventional model of retrieval:

As a solution, Bush proposed his legendary Memex personal storage and retrieval system, perhaps the first hypertext system envisioned.

Unlike the linear, highly structured, logical search strategy approach pursued by the efficient search intermediary (human or machine), much end-user searching can best be described as exploratory, circuitous, and, yes, fully interactive. The search process is likely to be largely a trial-and-error process, having no particular pre-determined end or outcome. In trial-and-error exploratory searching, both the experience of the search process and its initial results can lead the user to new or altered information needs, or to "illogically relevant" information that may be more valuable to the searcher than logically relevant materials (Harter, 1984).

Conventional IR systems and OPACs provide little flexibility in searching or browsing at will in the information databases they contain. Those that do generally exact a large cost in learning time and effort from the user. But as a look back at earlier information systems should suggest, factors such as structure, organization, and other reading or access devices are not what limits the user. There are few options available at the conventional computer system search interface because the designers have put few there. Lack of interface flexibility and search options is imposed neither by today's technology nor by highly-structured databases. In fact, organization, structure, pre-established linkages between information entities or nodes and navigational search methods are the means by which system designers can provide retrieval systems that offer users not only a variety of search options but also increased flexibility in the way they may wish to browse and move about the information source. Hypertext is one way of characterizing this increased, "non-linear" flexibility.

Searchers often have difficulty defining and expressing their information needs. The database structure and vocabulary requirements of the search system may be unknown to the searcher. For such searchers, looking is more inviting than formulating. Browsing is inherently active and engaging, and many users seem to prefer action and encounter to reflection and analysis. It could be said that richly interactive exploratory systems and sources attract such users. This may explain the popularity of the World Wide Web and its "browser" interfaces.

Reflection on why, and in what circumstances, people browse should yield a new understanding of the importance of this activity. These insights should inform the design of information retrieval systems and lead to improved browsing capabilities in these systems. In the past, browsing has often been viewed as a search strategy or technique secondary or supplemental to primary, query-oriented, directed, structured searching. Bates suggests that there may still be a "lingering tendency in information science to see browsing in contrast to directed searching, to see it as a casual, don't-know-what-I-want behavior that one engages in separately from 'regular' searching" (Bates, 1989).

Searching by browsing is a natural, preferred searching technique for many people, especially when they are engaged in "general purposive" information seeking. Ellis' research on the information seeking behavior of social scientists shows that various forms of browsing are a standard component of their research and "keeping aware" activities. He recommends that browsing of a variety of types of information that supplement the standard bibliographic record be provided in online retrieval systems (Ellis, 1989). Liebscher and Marchionini's (1988) research has demonstrated that browsing can be as effective in its results as structured, query-oriented Boolean searching, for novice searchers of full-text documents. Marchionini argues that because of the massive amounts of poorly organized information available in electronic form, browsing is even more important in electronic environments than in traditional environments like those presented by open-access libraries (Marchionini, 1987).

There are a variety of information seeking needs, aims, and strategies that would seem to require searching by semi-directed exploration, recognition, and discovery, rather than searching by explicit query formulation-matching operations, whether or not aided by relevance feedback or query expansion techniques. Thus, it seems self-evident that users would greatly benefit from the development of computer-based information systems that support and encourage searching and exploration of electronic information resources via browsing or "berrypicking."

Designers of information retrieval systems and online catalogs should become more receptive to the browsing requirements of searchers, and begin to provide capabilities and search options in their systems that will support these requirements. Most IR systems support some aspects of browsing, but still implement the paradigm of direct, query-matching retrieval. Exploratory browsing also provides a suitable paradigm for information system design, and, perhaps, an even more representative one, given the many varieties of information needs and searching behavior.