small strigi api changes

Jos van den Oever jvdoever at gmail.com
Sat Aug 4 22:37:46 BST 2007


Hi all,

The Strigi analysis pipeline lacked an important feature: the ability
to analyze only the first part of a stream. This is required for
applications that want quick 'n dirty extraction of metadata. Believe
it or not, but for some applications, speed is more important than
calculating the checksum of a file or extracting all email addresses.

So I've taken the liberty of applying some changes to the code which,
like ice, became fluid again. Don't worry, the changes are small.

AnalyzerConfiguration got an extra function:
    virtual int64_t maximalStreamReadLength(const Strigi::AnalysisResult& ar) {
        return -1;
    }
which returns the maximum length up to which a (particular) stream may
be read.
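
As a rough illustration of how an application might use this, here is a
minimal sketch of a configuration that caps how much of each stream is
read. It does not include the Strigi headers: the AnalysisResult and
AnalyzerConfiguration types below are simplified stand-ins, and the
QuickConfiguration class, the 'path' field and the 64 KiB limit are
made up for the example. It also assumes that the length is counted in
bytes and that the default of -1 means "no limit", neither of which is
spelled out above.

```cpp
#include <cstdint>
#include <string>

namespace sketch {

// Simplified stand-in for Strigi::AnalysisResult (not the real class).
struct AnalysisResult {
    std::string path;  // hypothetical field, for illustration only
};

// Simplified stand-in for the configuration base class. Only the new
// function is reproduced, with the same shape and default as above.
class AnalyzerConfiguration {
public:
    virtual ~AnalyzerConfiguration() {}
    virtual int64_t maximalStreamReadLength(const AnalysisResult& ar) {
        (void)ar;
        return -1;  // assumed to mean "no limit"
    }
};

// Hypothetical configuration for quick metadata extraction: allow at
// most the first 64 KiB of every stream to be read (assuming bytes).
class QuickConfiguration : public AnalyzerConfiguration {
public:
    int64_t maximalStreamReadLength(const AnalysisResult& ar) override {
        (void)ar;  // the same limit for every stream in this sketch
        return 64 * 1024;
    }
};

}  // namespace sketch
```

Since the function receives the AnalysisResult, a real configuration
could presumably return a different limit per stream instead of one
fixed value.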



For the analyzers StreamSaxAnalyzer, StreamLineAnalyzer, and
StreamEventAnalyzer the function 'endAnalysis' got an argument:
    void endAnalysis(bool complete);
which is true if the stream was read until the end.
This means that the analyzer can avoid reporting properties like
'linecount' that have no meaning if the file was not read completely.
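
To make the 'linecount' case concrete, here is a minimal sketch of a
line-counting analyzer that only reports its result when the stream was
read completely. It is self-contained and does not derive from the real
StreamLineAnalyzer: the LineCountAnalyzer class, its handleLine function
and the Properties map are stand-ins invented for the example. Only the
endAnalysis(bool complete) signature is taken from the change described
above.

```cpp
#include <cstddef>
#include <map>
#include <string>

namespace sketch {

// Hypothetical stand-in for wherever an analyzer reports properties.
typedef std::map<std::string, std::string> Properties;

// Hypothetical line analyzer (not derived from the real Strigi class).
class LineCountAnalyzer {
public:
    explicit LineCountAnalyzer(Properties& out) : out_(out), lines_(0) {}

    // Called once per line; the sketch only counts the calls.
    void handleLine(const char* data, std::size_t length) {
        (void)data;
        (void)length;
        ++lines_;
    }

    // 'complete' is true if the stream was read until the end.
    void endAnalysis(bool complete) {
        if (complete) {
            // The count covers the whole file, so it is meaningful.
            out_["linecount"] = std::to_string(lines_);
        }
        // Otherwise only part of the file was seen: report nothing.
    }

private:
    Properties& out_;
    unsigned long lines_;
};

}  // namespace sketch
```

With a truncated read, endAnalysis(false) leaves the map without a
'linecount' entry rather than storing a number that only covers the
part of the file that was actually read.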




More information about the kde-core-devel mailing list