Hi, some comments about encoding detection (KEncodingDetector)

Wang Hoi zealot.hoi at gmail.com
Mon Aug 4 09:47:49 BST 2008


Here is a modified patch that introduces KEncodingProber (KEncodingDetector2
was a bad name), with a cleaner and more powerful interface:
class KDECORE_EXPORT KEncodingProber
{
public:
    enum ProberState {
        FoundIt, // sure
        NotMe,   // sure not
        Probing  // initial state, or not sure yet
    };
    enum ProberType {
        Universal,
        Arabic,
        ..........
        Unicode,
        WesternEuropean
    };
    KEncodingProber(ProberType proberType=Universal);
    ~KEncodingProber();

    void reset();
    ProberState feed(const QByteArray &data);
    ProberState feed(const char* data, int len);
    ProberState getState() const;
    const char* getEncoding() const;
    float getConfidence() const;          //  0.0 ~ 0.99
private:
    KEncodingProberPrivate* const d;
};

The user can feed data to it continuously until the ProberState changes
from Probing to FoundIt or NotMe. While ProberState == Probing, the user
can also call getConfidence() etc. to get the most confident encoding
guessed from the data fed so far.
It is used to *guess* the encoding of raw text; it cannot read the
encoding directly from HTML/XML declarations (such as
<?xml encoding="xxx" ?>).
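
To make the intended calling pattern concrete, here is a rough sketch of the
feed loop. It does not use the real class: StubProber is a hypothetical
stand-in that only copies the shape of feed()/getState()/getEncoding()/
getConfidence() and recognises nothing but a UTF-8 BOM, and Guess and
probeChunks() are names made up for this example. The actual detection logic
lives in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in mimicking the shape of the proposed interface.
// It is NOT KEncodingProber: it only looks for a UTF-8 byte order mark.
class StubProber {
public:
    enum ProberState { FoundIt, NotMe, Probing };

    ProberState feed(const char* data, int len) {
        assert(data != nullptr || len == 0);
        if (m_state != Probing)
            return m_state;                       // a final verdict is kept
        if (len > 0)
            m_seen.append(data, static_cast<std::size_t>(len));
        if (m_seen.size() >= 3) {                 // enough bytes to decide
            const bool bom = m_seen.compare(0, 3, "\xEF\xBB\xBF") == 0;
            m_state = bom ? FoundIt : NotMe;
        }
        return m_state;
    }
    ProberState getState() const { return m_state; }
    const char* getEncoding() const { return m_state == FoundIt ? "UTF-8" : ""; }
    float getConfidence() const { return m_state == FoundIt ? 0.99f : 0.0f; }

private:
    ProberState m_state = Probing;
    std::string m_seen;                           // bytes fed so far
};

// Result of a probing run (hypothetical helper type).
struct Guess {
    std::string encoding;
    float confidence;
    bool certain;
};

// Feed chunks one by one and stop as soon as the prober leaves Probing.
Guess probeChunks(StubProber& prober, const std::vector<std::string>& chunks) {
    for (const std::string& chunk : chunks) {
        const StubProber::ProberState state =
            prober.feed(chunk.data(), static_cast<int>(chunk.size()));
        if (state != StubProber::Probing)
            break;                                // FoundIt or NotMe: done
    }
    // If the state is still Probing here, encoding/confidence are only
    // the best guess so far.
    return Guess{prober.getEncoding(), prober.getConfidence(),
                 prober.getState() == StubProber::FoundIt};
}
```

With the real prober the loop would presumably look the same, with chunks
coming from a file or socket and the caller falling back to the best guess
when the input ends while the state is still Probing.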
-------------- next part --------------
A non-text attachment was scrubbed...
Name: encodingDetection.patch.tar.bz2
Type: application/x-bzip2
Size: 126607 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20080804/7507b737/attachment.bin>

