Dieselpoint Search is the ideal tool for searching collections of XML documents. It offers features not found in any XML database, SQL database, full-text search engine, or any combination of the above.
Dieselpoint Search allows a developer to add XML, index it, and retrieve it. In that sense, it functions much like a pure XML database.
The primary difference is in the retrieval mechanism. Most XML databases support XPath as a query language, and a few support XQuery. Both are designed primarily for navigating an XML tag hierarchy, and neither is designed to perform full-text search. Most XML databases will not allow the developer to perform search-engine-like full-text queries into the database.
Dieselpoint Search, by contrast, supports a rich query language that provides both sophisticated full-text retrieval and retrieval of highly structured data elements. Consider the following query into a database of books:
(“motor cars” AND antique) OR (author:(bob) AND year <= 1990) order by price
and…
Show all categories and the number of hits for each, and
Show all publishers and the number of hits for each.
The query above will return books which contain the exact phrase “motor cars” and the word “antique”, as well as books where the author field contains “bob” and the year of publication is 1990 or earlier. Results will be sorted by price (we could have chosen relevance or any other field in the database). The system will also show the categories and publishers associated with items in the result set, with a hit count for each. Results can come back as an XML document, a standard JDBC result set, or an HTML page.
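To make the semantics of that query concrete, here is a minimal in-memory sketch in Python. It is not Dieselpoint Search's API or implementation; the `Book` class, the sample data, and the helper names are all hypothetical, and the matching is a naive token comparison rather than a real index. It only illustrates what the query asks for: a boolean filter, a sort by price, and hit counts per category and publisher.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Book:  # hypothetical record type, for illustration only
    title: str
    text: str
    author: str
    year: int
    price: float
    category: str
    publisher: str


def tokens(s):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())


def has_phrase(toks, phrase):
    """True if the words of `phrase` appear consecutively in `toks`."""
    n = len(phrase)
    return any(toks[i:i + n] == phrase for i in range(len(toks) - n + 1))


def matches(book):
    """("motor cars" AND antique) OR (author:(bob) AND year <= 1990)"""
    text = tokens(book.text)
    full_text = has_phrase(text, ["motor", "cars"]) and "antique" in text
    structured = "bob" in tokens(book.author) and book.year <= 1990
    return full_text or structured


def search(books):
    """Return hits ordered by price, plus hit counts per category and publisher."""
    hits = sorted((b for b in books if matches(b)), key=lambda b: b.price)
    categories = Counter(b.category for b in hits)
    publishers = Counter(b.publisher for b in hits)
    return hits, categories, publishers


# Made-up sample data.
BOOKS = [
    Book("Antique Motor Cars", "A guide to antique motor cars.",
         "Alice Smith", 2001, 30.0, "Transport", "Acme"),
    Book("Gardening", "Roses and tulips.",
         "Bob Jones", 1985, 12.5, "Hobbies", "Birch"),
    Book("Modern Motor Cars", "All about motor cars today.",
         "Bob Jones", 1995, 20.0, "Transport", "Acme"),
    Book("Old Clocks", "Antique clocks.",
         "Bob Lee", 1990, 18.0, "Hobbies", "Acme"),
]

if __name__ == "__main__":
    hits, categories, publishers = search(BOOKS)
    for b in hits:
        print(b.price, b.title)
    print(dict(categories))
    print(dict(publishers))
```

Note that "Old Clocks" is a hit because its year is exactly 1990, while "Modern Motor Cars" is not: it contains the phrase but not the word "antique", and its year is later than 1990.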
- XML databases generally do not provide a good full text search mechanism, linguistic tools, or the ability to show the number of hits by attribute.
- SQL databases cannot execute the query above without taking a large performance hit. Internally, a SQL database must perform at least four queries to get the results above, and then perform a number of very expensive joins between the result sets. Dieselpoint Search, by contrast, executes only a single query and returns in a few milliseconds, even when gigabytes of text are involved.
- Traditional full-text or web search engines do not provide the ability to handle structured attributes (year <= 1990) or to sort (order by price). They also can't provide the list of categories or publishers and the number of hits for each.
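As a rough illustration of the SQL point, the sketch below assembles the same result with Python's built-in `sqlite3` module. The `books` table, its columns, and the sample rows are assumptions made for the example, and `LIKE` is only a crude stand-in for full-text matching. It is one plausible way to phrase the work, not a statement of how any particular SQL database plans it; the thing to notice is that the same filter has to be repeated in a separate statement for the hits and for each set of counts.

```python
import sqlite3

# The filter from the example query, with LIKE standing in for full-text search.
# SQLite's LIKE is case-insensitive for ASCII text by default.
WHERE = """
    (text LIKE '%motor cars%' AND text LIKE '%antique%')
    OR (author LIKE '%bob%' AND year <= 1990)
"""


def run_queries(conn):
    """Run one statement for the hits and one per facet, repeating the filter."""
    hits = conn.execute(
        f"SELECT title FROM books WHERE {WHERE} ORDER BY price"
    ).fetchall()
    categories = conn.execute(
        f"SELECT category, COUNT(*) FROM books WHERE {WHERE} "
        "GROUP BY category ORDER BY category"
    ).fetchall()
    publishers = conn.execute(
        f"SELECT publisher, COUNT(*) FROM books WHERE {WHERE} "
        "GROUP BY publisher ORDER BY publisher"
    ).fetchall()
    return hits, categories, publishers


# Hypothetical schema and made-up rows, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (title TEXT, text TEXT, author TEXT, year INTEGER, "
    "price REAL, category TEXT, publisher TEXT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Antique Motor Cars", "A guide to antique motor cars.",
         "Alice Smith", 2001, 30.0, "Transport", "Acme"),
        ("Gardening", "Roses and tulips.",
         "Bob Jones", 1985, 12.5, "Hobbies", "Birch"),
        ("Modern Motor Cars", "All about motor cars today.",
         "Bob Jones", 1995, 20.0, "Transport", "Acme"),
        ("Old Clocks", "Antique clocks.",
         "Bob Lee", 1990, 18.0, "Hobbies", "Acme"),
    ],
)

if __name__ == "__main__":
    for result in run_queries(conn):
        print(result)
```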
Dieselpoint Search also provides a full range of linguistic tools, including a thesaurus, stemming, and a “Did you mean…” feature to alert users to possible misspellings.
For most applications, Dieselpoint Search is a much better choice than an XML database.
ECCMA is a special XML format for product data and catalogs. It is the successor to the UN/SPSC product codes. In addition to standard product names and categories, it provides standard product attributes and values.
Dieselpoint Search is ECCMA-aware. It imports ECCMA files and automatically makes all products and product attributes searchable and navigable. It also strips out unnecessary information: ECCMA files tend to be verbose, which can increase the index size and decrease the query performance unless the files are streamlined.
Dieselpoint Search makes it easy to find products using a browse or full-text interface, and to compare products using product attributes, even across categories.
Creating an ECCMA catalog is easy: it takes only a few clicks to generate a functional prototype, which can then be customized. Dieselpoint Search provides the fastest, easiest, and least expensive way to create online catalogs from ECCMA data.
The Dublin Core Metadata Initiative is an attempt to standardize the way document metadata is stored. For example, it defines specific XML tags for authors, titles and dates.
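For readers unfamiliar with the format, the following Python sketch shows what such tags look like and how they might be read with the standard library. The namespace URI is the Dublin Core element set namespace; the `<record>` wrapper element, the sample values, and the `extract_dc_fields` helper are assumptions made for the example, and this is not how Dieselpoint Search itself imports the files.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up record; the <record> wrapper is hypothetical.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Antique Motor Cars</dc:title>
  <dc:creator>Alice Smith</dc:creator>
  <dc:creator>Bob Jones</dc:creator>
  <dc:date>1988-05-01</dc:date>
</record>"""


def extract_dc_fields(xml_text):
    """Collect title, creator (author), and date values from a record."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name in ("title", "creator", "date"):
        fields[name] = [
            el.text.strip()
            for el in root.iterfind(f".//dc:{name}", DC_NS)
            if el.text
        ]
    return fields


if __name__ == "__main__":
    print(extract_dc_fields(SAMPLE))
```

Each field is returned as a list because Dublin Core elements may repeat, as the two `dc:creator` entries do here.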
Dieselpoint Search can import Dublin Core files and make document text and metadata automatically searchable and navigable.
Building a document search and browse application over Dublin Core files is straightforward.
(http://www.adobe.com/products/xmp/main.html)
Adobe’s PDF format works well for unstructured data, but has traditionally supported only limited metadata. Adobe now supports XML embedded directly in the PDF file to carry detailed, structured information behind the scenes.
Dieselpoint Search provides full support for the new format, known as XMP, for Extensible Metadata Platform. XMP data can be searched and navigated just like any other information in a PDF. It can also be extracted for external manipulation.
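As an illustration of what extracting XMP can involve, the sketch below scans a file's bytes for the `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions that wrap an XMP packet. It assumes the packet is stored uncompressed, unencrypted, and in UTF-8, which is not guaranteed for every PDF. The `extract_xmp` helper and the sample bytes are made up for the example, and this is not a description of Dieselpoint Search's own extraction code.

```python
import re
from typing import Optional

# An XMP packet sits between an "xpacket begin" and an "xpacket end"
# processing instruction; capture everything between the two.
_XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def extract_xmp(data: bytes) -> Optional[str]:
    """Return the first XMP packet body found in `data`, or None.

    Assumes the packet is uncompressed, unencrypted UTF-8.
    """
    match = _XPACKET.search(data)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace").strip()


# Made-up bytes that imitate a PDF with an embedded packet.
SAMPLE_PDF = (
    b"%PDF-1.4\n1 0 obj\nstream\n"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>\n'
    b'<?xpacket end="w"?>\n'
    b"endstream\nendobj\n"
)

if __name__ == "__main__":
    print(extract_xmp(SAMPLE_PDF))
```

For a real file, the bytes could be read with `open(path, "rb").read()` and the returned XML passed to any XML parser.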