how Windows browsers encode URL [Re: why the % cruft?]
Vadim Plessky
lucy-ples at mtu-net.ru
Tue Jul 9 17:37:52 BST 2002
On Tuesday 09 July 2002 11:15 am, Waldo Bastian wrote:
| On Tuesday 09 July 2002 12:01 am, Lars Knoll wrote:
| > > URLs are spec'ed as a sequence of octets (8-bit values); "Unicode URLs"
| > > basically don't exist. Despite that we try to handle them anyway and
| > > apparently that doesn't always work. (E.g. we need to convert Unicode
| > > to an 8-bit sequence before we can transfer it to the website but the
| > > encoding to use for that is unspecified, so we can only guess.)
| >
| > As Dirk already pointed out, IE sends URLs in UTF-8 by default. I'm
| > pretty sure we could do the same without breaking a lot of web pages
| > (they'd be broken with IE as well). Maybe there's an HTTP header field we
| > can set to indicate this?
|
| My impression was that many non-Latin-1 (e.g. Russian, Japanese, Korean,
| etc.) websites use the "local locale" as encoding and not UTF-8. Maybe Vadim
| can comment on that from the Russian point of view.
I ran the same experiment (searching for ‘пример’) with the Windows browsers
I have.
Opera 6 / Windows
----------------
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=%3F%3F%3F%3F%3F%3F&btnG=Google+Search
--> here Opera fails in exactly the same way as Konqueror: the query is sent
as six "?" characters (%3F)
http://www.google.com.ru/search?q=%EF%F0%E8%EC%E5%F0&ie=windows-1251&hl=ru&btnG=%CF%EE%E8%F1%EA+%E2+Google
Netscape 6/Win
----------------
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&btnG=Google+Search
http://www.google.com.ru/search?q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&ie=UTF-8&oe=UTF-8&hl=ru&btnG=%D0%9F%D0%BE%D0%B8%D1%81%D0%BA+%D0%B2+Google
http://www.yandex.ru/yandsearch?text=%EF%F0%E8%EC%E5%F0
(URL encoded in windows-1251)
http://search.rambler.ru/cgi-bin/rambler_search?words=%EF%F0%E8%EC%E5%F0&where=1
(URL encoded in windows-1251)
MS IE6
----------------
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&btnG=Google+Search
http://www.google.com.ru/search?q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&ie=UTF-8&oe=UTF-8&hl=ru&btnG=%D0%9F%D0%BE%D0%B8%D1%81%D0%BA+%D0%B2+Google
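The three query strings seen above for the same word can be reproduced with a minimal Python sketch. The helper name encode_query is made up for illustration, and the "replace" error handling is only an assumption about how a browser might end up sending "?" characters; it is not taken from any browser's source.

```python
from urllib.parse import quote

def encode_query(text: str, charset: str) -> str:
    """Convert text to bytes in `charset`, then percent-encode those bytes.

    Hypothetical helper for illustration only.
    """
    # errors="replace" substitutes "?" for characters the charset cannot
    # represent; this mimics the %3F result, it is an assumption.
    return quote(text, safe="", encoding=charset, errors="replace")

word = "пример"
for charset in ("utf-8", "windows-1251", "iso-8859-1"):
    print(f"{charset:>12}: {encode_query(word, charset)}")
```

The UTF-8 line matches what Netscape 6 and IE6 sent to Google, the windows-1251 line matches the Yandex and Rambler URLs, and the ISO-8859-1 line matches the failing Opera request.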
And finally, I extracted some words from a mail I have in Chinese:
老蟹
and searched Google for them using Mozilla.
Results were quite good: 721 matches (don't ask me what those words mean!...)
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=%E8%80%81%E8%9F%B9&btnG=Google+Search
Again, UTF-8.
So it seems rather safe to encode URLs as UTF-8, as it's common practice
and is used not only by MS IE but by Mozilla as well.
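Since some Russian sites still receive windows-1251, the receiving side may have to guess. A rough Python sketch of one possible heuristic follows: try strict UTF-8 first and fall back to a local charset. The function name guess_decode and the choice of windows-1251 as the fallback are assumptions for illustration, not something any of the sites above is known to do.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes

def guess_decode(encoded: str, fallback: str = "windows-1251") -> Tuple[str, str]:
    """Return (decoded text, charset used) for a percent-encoded value.

    Hypothetical helper: strict UTF-8 first, then the assumed fallback.
    """
    raw = unquote_to_bytes(encoded)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the local 8-bit charset instead.
        return raw.decode(fallback, errors="replace"), fallback

print(guess_decode("%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80"))
print(guess_decode("%EF%F0%E8%EC%E5%F0"))
print(guess_decode("%E8%80%81%E8%9F%B9"))
```

This is only a heuristic: a short windows-1251 string can occasionally also be valid UTF-8, in which case the guess will be wrong.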
|
| Cheers,
| Waldo
My Best Regards,
--
Vadim Plessky
http://kde2.newmail.ru (English)
33 Window Decorations and 6 Widget Styles for KDE
http://kde2.newmail.ru/kde_themes.html
KDE mini-Themes
http://kde2.newmail.ru/themes/