Case Study: Enabling Low-Cost XML-Aware Searching Capable Of Complex Querying

Instance of

Occurrences

Paper

../papers/03-02-08/03-02-08.html

Date of Presentation

Wednesday, 22 May

Time of Presentation

16.45

Presentation Level

In-The-Middle

Abstract

There is a common need among XML projects to have fast, reliable, full-text searching of XML documents that can be applied to entire content repositories. For all but the most trivial cases, the solution must allow for complex querying of element content and attributes along with the ability to search for structural relationships among elements. For many use cases the ideal solution would minimize cost by using existing open source components as opposed to costly commercial alternatives. This paper describes the integration of XML-aware indexing and searching components with a full-text search engine. It details the approach used, the results, some alternatives, and important lessons learned along the way.

The integrated system is designed to meet the following goals:

- Indexing and searching of both full document and XML-specific content: tagname, ancestor, attribute, and processing instruction searching, as well as treeloc and document id information for post-search navigation.
- Development using existing open source components.
- Support for typical search query structure (AND, OR, ...).
- Easy integration of custom business logic.
- Very fast search times.

The XML-aware searching provided is capable of solving many business needs, allowing for complex XML queries, including:

- finding an element with specific content
- finding an element with attributes with specific values
- finding an element with a specific ancestor or ancestor list
- finding a particular processing instruction
- most combinations of the above
- more...

The resulting system is a fast, reliable XML search engine, which has exceeded our expectations in terms of flexibility and low development cost.
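The abstract does not say how the indexing components are implemented, so the following is only an illustrative sketch of the general idea, not the paper's system. It assumes each element can be flattened into one record carrying its tag name, ancestor list, attributes, text content, a treeloc-style position path, and a document id, and that a query is a conjunction of criteria over those fields. The names `ElementRecord`, `index_document`, and `search` are hypothetical. A linear in-memory scan stands in for the full-text engine, and processing instructions are left out because the default `xml.etree.ElementTree` parser discards them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ElementRecord:
    """One searchable entry per element (hypothetical record layout)."""
    doc_id: str
    treeloc: tuple      # 1-based child positions from the root, e.g. (1, 1, 2)
    tag: str
    ancestors: tuple    # tag names from the root down to the parent
    attributes: dict
    text: str           # whitespace-normalized text content of the element


def index_document(doc_id, xml_text):
    """Flatten one XML document into a list of element records."""
    records = []

    def visit(elem, treeloc, ancestors):
        records.append(ElementRecord(
            doc_id=doc_id,
            treeloc=treeloc,
            tag=elem.tag,
            ancestors=ancestors,
            attributes=dict(elem.attrib),
            text=" ".join("".join(elem.itertext()).split()),
        ))
        # Only element children are counted; the default parser drops
        # comments and processing instructions.
        for position, child in enumerate(elem, start=1):
            visit(child, treeloc + (position,), ancestors + (elem.tag,))

    visit(ET.fromstring(xml_text), (1,), ())
    return records


def search(records, tag=None, contains=None, attrs=None, ancestor=None):
    """Return records matching every supplied criterion (an implicit AND)."""
    hits = []
    for rec in records:
        if tag is not None and rec.tag != tag:
            continue
        if contains is not None and contains.lower() not in rec.text.lower():
            continue
        if attrs and any(rec.attributes.get(k) != v for k, v in attrs.items()):
            continue
        if ancestor is not None and ancestor not in rec.ancestors:
            continue
        hits.append(rec)
    return hits


SAMPLE = """<manual>
  <chapter id="c1">
    <title>Installation</title>
    <para audience="admin">Run the installer as root.</para>
  </chapter>
  <chapter id="c2">
    <title>Usage</title>
    <para audience="user">Start the search client.</para>
  </chapter>
</manual>"""

records = index_document("manual-1", SAMPLE)

# Combined query: a <para> inside a <chapter>, with audience="admin",
# whose content mentions "installer".
hits = search(records, tag="para", contains="installer",
              attrs={"audience": "admin"}, ancestor="chapter")
for hit in hits:
    print(hit.doc_id, hit.treeloc, hit.tag)
```

Storing the document id and treeloc on every record is what makes post-search navigation possible: a hit can be resolved back to its exact position in the source document. In a real deployment the records would presumably be fed to a full-text engine as indexed fields rather than scanned in a loop.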

Generated from an XML Topic Map with xtm2xhtml. (c) Stefan Mintert