Dieselpoint News
 
Dieselpoint Announces OpenPipeline: New open source software for crawling, parsing, analyzing and routing documents.

Today, Dieselpoint announces the release of a new pipeline architecture as open source software. Dubbed OpenPipeline, it ties together otherwise incomplete solutions for search and document processing.

Currently, search, text analytics and connector vendors find themselves constantly reengineering their software to integrate with closed, proprietary systems. OpenPipeline works out of the box with Dieselpoint Search, but can be used with any search engine.

Dieselpoint CEO Chris Cleveland explains: “75% of an enterprise search implementation is preprocessing data before sending it to the indexer. Unfortunately, we keep reinventing the wheel for each job. OpenPipeline provides an open framework for standardizing integration across all the components.”

Both broader and simpler than IBM’s UIMA (Unstructured Information Management Architecture), OpenPipeline provides a common architecture for connectors to data sources, file filters, text analyzers and modules that distribute documents across a network. It is fully functional out of the box and includes an installer, a job scheduler, a file scanner and crawlers, document filters, and a point-and-click interface with drag-and-drop module installation.

Document processing can be centralized or parallelized as needed. The transport mechanism is simple: web-services XML over HTTP. RSS/Atom feeds are also possible.

The development philosophy behind OpenPipeline stresses simple, elegant design and massive scalability. Minimal external dependencies and a straightforward plug-in implementation keep the learning curve gentle.

OpenPipeline Beta is available as a free download under the Apache License 2.0. Current Advisory Board members include individuals from enterprise search, text analytics and connector firms and consultancies. More information can be found at OpenPipeline.org.