how Windows browsers encode URL [Re: why the % cruft?]

Vadim Plessky lucy-ples at mtu-net.ru
Tue Jul 9 17:37:52 BST 2002


On Tuesday 09 July 2002 11:15 am, Waldo Bastian wrote:
|  On Tuesday 09 July 2002 12:01 am, Lars Knoll wrote:
|  > > URLs are spec'ed as a sequence of octets (8-bit values); "Unicode URLs"
|  > > basically don't exist. Despite that, we try to handle them anyway, and
|  > > apparently that doesn't always work. (E.g. we need to convert Unicode
|  > > to an 8-bit sequence before we can transfer it to the website, but the
|  > > encoding to use for that is unspecified, so we can only guess.)
|  >
|  > As Dirk already pointed out, IE sends URLs in UTF-8 by default. I'm
|  > pretty sure we could do the same without breaking a lot of web pages
|  > (they'd be broken with IE as well). Maybe there's an HTTP header field we
|  > can set to indicate this?
|
|  My impression was that many non-Latin-1 (e.g. Russian, Japanese, Korean,
| etc.) websites use the "local locale" as encoding and not UTF-8. Maybe Vadim
| can comment on that from the Russian point of view.

I ran the same experiment (searching for ‘пример’, Russian for "example") with
the Windows browsers I have.

Opera 6 / Windows
----------------
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=%3F%3F%3F%3F%3F%3F&btnG=Google+Search 
  --> here Opera fails in exactly the same way as Konqueror: the query is sent as six '?' characters (%3F)
http://www.google.com.ru/search?q=%EF%F0%E8%EC%E5%F0&ie=windows-1251&hl=ru&btnG=%CF%EE%E8%F1%EA+%E2+Google 
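To make the three query strings seen in these tests easier to compare, here is
a minimal Python sketch that percent-encodes the same word under each charset.
The helper name percent_encode is hypothetical, and replacing unrepresentable
characters with '?' is only an assumption about what Opera and Konqueror do
internally; it happens to reproduce the %3F output above.

```python
from urllib.parse import quote

WORD = "пример"  # the search term used in the tests above


def percent_encode(text, charset):
    """Encode text to bytes in `charset`, then percent-escape every byte
    that is not an unreserved ASCII character.

    Assumption: characters the charset cannot represent are replaced
    by '?', which is then escaped as %3F.
    """
    return quote(text, safe="", encoding=charset, errors="replace")


for charset in ("iso-8859-1", "windows-1251", "utf-8"):
    print(charset, percent_encode(WORD, charset))
```

ISO-8859-1 has no Cyrillic letters, so all six characters are lost before the
request is even sent; windows-1251 and UTF-8 both keep the word, but as
different byte sequences.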
 
Netscape 6/Win 
---------------- 
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&btnG=Google+Search 
http://www.google.com.ru/search?q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&ie=UTF-8&oe=UTF-8&hl=ru&btnG=%D0%9F%D0%BE%D0%B8%D1%81%D0%BA+%D0%B2+Google 
 
http://www.yandex.ru/yandsearch?text=%EF%F0%E8%EC%E5%F0 
(URL encoded in windows-1251)
http://search.rambler.ru/cgi-bin/rambler_search?words=%EF%F0%E8%EC%E5%F0&where=1 
(URL encoded in windows-1251)
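Since the URL itself does not say which charset was used, a receiving site can
only guess, as Waldo notes. One possible heuristic, sketched in Python below,
is to try UTF-8 first and fall back to a local charset when the bytes are not
valid UTF-8. This is an illustration only: the function name guess_decode and
the windows-1251 fallback are assumptions, not a description of what Yandex,
Rambler or Google actually do.

```python
from urllib.parse import unquote_to_bytes


def guess_decode(escaped, fallback="windows-1251"):
    """Return (text, charset) for a percent-encoded query value.

    Heuristic: strict UTF-8 decoding rejects most byte sequences that
    were produced by a single-byte Cyrillic charset, so a failure is
    taken as a hint that the fallback charset was used.
    """
    raw = unquote_to_bytes(escaped)  # undo the %XX escapes only
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback


print(guess_decode("%EF%F0%E8%EC%E5%F0"))
print(guess_decode("%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80"))
```

The guess is not reliable in general: a short windows-1251 string can
accidentally be valid UTF-8, and the fallback decode itself can fail on bytes
the fallback charset does not define.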

MS IE6
----------------
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&btnG=Google+Search
http://www.google.com.ru/search?q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&ie=UTF-8&oe=UTF-8&hl=ru&btnG=%D0%9F%D0%BE%D0%B8%D1%81%D0%BA+%D0%B2+Google


And finally, I extracted some words from a mail I have in Chinese:
老蟹

and searched Google for them using Mozilla.
The results were quite good: 721 matches (don't ask me what those words mean!...)
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=%E8%80%81%E8%9F%B9&btnG=Google+Search
Again, UTF-8.
So it seems rather safe to encode URLs as UTF-8, as it's common practice
and accepted not only by MS IE but by Mozilla as well.
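As a rough sketch of what "encode URLs as UTF-8" would mean in practice, the
Python snippet below builds a Google query string the way the Netscape 6 and
IE6 URLs above look. The function name google_search_url is hypothetical, and
the parameter set is simply copied from those URLs; this is not Konqueror code.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build a search URL with every parameter value encoded as UTF-8.

    The ie/oe parameters tell Google which charset the query is in,
    mirroring the URLs produced by Netscape 6 and IE6 above.
    """
    params = {
        "hl": "en",
        "ie": "UTF-8",
        "oe": "UTF-8",
        "q": query,
        "btnG": "Google Search",  # the space becomes '+' in a query string
    }
    return "http://www.google.com/search?" + urlencode(params, encoding="utf-8")


print(google_search_url("пример"))
print(google_search_url("老蟹"))
```

Declaring the charset explicitly in the ie parameter is what lets the server
decode the query without guessing; a site that offers no such parameter
still has to assume an encoding.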

|
|  Cheers,
|  Waldo

My Best Regards,
-- 

Vadim Plessky
http://kde2.newmail.ru  (English)
33 Window Decorations and 6 Widget Styles for KDE
http://kde2.newmail.ru/kde_themes.html
KDE mini-Themes
http://kde2.newmail.ru/themes/





More information about the kfm-devel mailing list