Patch v3: Fix KEncodingProber crashiness

Sat Mar 21 12:52:42 GMT 2009

On Samstag 21 März 2009, Jeff Mitchell wrote:
> Yes, these are some issues which I didn't look at as I was just focused
> on the crash and the encoding variable.  :-)
> 
> Peter Oberndorfer wrote:
> > 4)
> > currentConfidence is set when unicodeTest detects something.
> > But the value is never used.
> > So confidence 0.0 is returned i think?
> 
> No, the confidence returned is the confidence of the prober itself.
> currentConfidence itself is, as you surmised, never used.  It's removed
> in the attached v3 patch.
> 

But what if the current prober has not detected anything?
Then the current confidece in the prober is still 0.0

The intention of the code is IMO when a unicode BOM is found
a confidence of 0.99 is returned.

At least i think it did this until 845343

For example katebuffer.cpp feeds in some data and then checks for confidence > 0.5
without ever looking at the found it state.
Maybe this breaks?

> > 5)
> > I think the prober is leaked on setProberType()?
> 
> Yes, it is.  Fixed in the attached v3 patch.
> 

I think a part of this patch is already applied now 942108 ?

> > 6)
> > uselesss #define MINIMUM_THRESHOLD ?
> 
> Yep.  Also fixed.
> 
> --Jeff
> 
> Index: kencodingprober.cpp
> ===================================================================
> --- kencodingprober.cpp (revision 942070)
> +++ kencodingprober.cpp (working copy)
> @@ -34,15 +34,12 @@
>  
>  #include <string.h>
>  
> -#define MINIMUM_THRESHOLD (float)0.2
> -
>  class KEncodingProberPrivate
>  {
>  public:
> -    KEncodingProberPrivate(): encoding(strdup("")), prober(NULL), mStart(true) {};
> +    KEncodingProberPrivate(): prober(NULL), mStart(true) {};
>      ~KEncodingProberPrivate()
>      {
> -        delete encoding;
>          delete prober;
>      }
>      void setProberType(KEncodingProber::ProberType pType)
> @@ -53,6 +50,10 @@
>          * for single-byte encodings (most western encodings), nsSBCSGroupProber is ok,
>          *   because encoding state machine can detect many such encodings.
>          */ 
> +
> +        if (prober)
> +            delete prober;
> +

Why is the if needed?
it is legal to delete a 0 pointer.
The destructor above does this already.

>          switch (proberType) {
>              case KEncodingProber::None:
>                  prober = NULL;
> @@ -130,17 +131,13 @@
>                          break;
>              }  // switch
>  
> -            if (encoding && strlen(encoding))
> -            {
> +            if (!encoding.isEmpty())
>                  proberState = KEncodingProber::FoundIt;
> -                currentConfidence = 0.99f;
> -            }
>          }
>      }
>      KEncodingProber::ProberType proberType;
>      KEncodingProber::ProberState proberState;
> -    float currentConfidence;
> -    const char *encoding;
> +    QString encoding;

Why is this a QString?
it is only ever assigned static char strings.
and then it is only tested once with isEmpty but never used anywhere else?
I think it should somehow be returned in encodingNameByteArray
when unicodeTest() found something.

>      kencodingprober::nsCharSetProber *prober;
>      bool mStart;
>  };
> @@ -201,12 +198,18 @@
>      return d->proberState;
>  }
>  
> +//DEPRECATED, do *not* use
>  const char* KEncodingProber::encodingName() const
>  {
> +    return 0;

Shouldn't we return a strdup'ed text here?
It leaks memory but still works.

something like (warning untested!):
  return strdup(encodingNameByteArray().constData());

or at least fix katebuffer.cpp to not call the old function?

> +}
> +
> +QByteArray KEncodingProber::encodingNameByteArray() const
> +{
>      if (!d->prober)
> -        return strdup("UTF-8");
> +        return QByteArray("UTF-8");
>  
> -    return strdup(d->prober->GetCharSetName());
> +    return QByteArray(d->prober->GetCharSetName());
>  }

Greetings Peter