[Cuis] Sorting Unicode strings (Re: [Unicode] collation sequences (Re: [squeak-dev] Unicode Support))
Dale Henrichs
dale.henrichs at gemtalksystems.com
Mon Dec 7 14:43:52 CST 2015
Hannes,
For GemStone, we are using the ICU library [1]. We have the classes Unicode7,
Unicode16, and Unicode32 (subclasses of CharacterCollection) for
internal strings, and the class Utf8 (a subclass of ByteArray) for
UTF-8-encoded strings.
The ICU library provides the primitive implementations for working with
the Unicode* and Utf8 classes.
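Assuming (a guess from the class names, not something stated above) that Unicode7, Unicode16, and Unicode32 describe the widest code point each class can hold, a string would be stored in the narrowest class that fits. CPython's str uses a similar idea (PEP 393). Below is a minimal Python sketch of that choice. The class names serve only as labels. The sketch also prints the number of bytes a UTF-8 (Utf8-style) encoding would need:

```python
def narrowest_class(s: str) -> str:
    """Guess which hypothetical width class s fits in, by its widest code point."""
    widest = max((ord(ch) for ch in s), default=0)
    if widest < 0x80:
        return "Unicode7"   # every code point fits in 7 bits (ASCII)
    if widest < 0x10000:
        return "Unicode16"  # every code point fits in 16 bits (BMP)
    return "Unicode32"      # needs code points above U+FFFF

for s in ["Hannes", "Müller", "\U0001D11E clef"]:
    # len(s.encode("utf-8")) is the byte count of a UTF-8 encoding of s
    print(s, narrowest_class(s), len(s.encode("utf-8")))
```

The fixed-width classes give constant-time indexing. The UTF-8 form is compact for mostly-ASCII text but has to be scanned to find the n-th character.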
When we started considering Unicode support, we looked at what it would
take to support collation (our main reason for looking at Unicode in
the first place). Once we saw just how complicated the collation rules
can be [2], we were glad to see that someone had already done the hard
work [1].
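Collation in [2] is not a simple code-point comparison. The toy Python key below shows why. It follows the multi-level idea that UTS #10 describes: base letters are compared first, then accents, then case. It is only an illustration built on the standard unicodedata module. It is not the Unicode Collation Algorithm. It ignores contractions, most expansions, and ignorable characters, and it is no substitute for ICU:

```python
import unicodedata

def toy_sort_key(s: str):
    """Three-level key: base letters, then accents, then case (toy, not UCA)."""
    primary, secondary, tertiary = [], [], []
    for ch in s:
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        primary.append(base.casefold())              # level 1: letter, case-folded
        secondary.append(marks)                      # level 2: combining accents
        tertiary.append(0 if base.islower() else 1)  # level 3: lowercase first
    return ("".join(primary), tuple(secondary), tuple(tertiary))

words = ["Äpfel", "apfel", "Apfel", "azur", "äpfel"]
print(sorted(words))                    # code-point order
print(sorted(words, key=toy_sort_key))  # toy multi-level order
```

Plain sorting yields `['Apfel', 'apfel', 'azur', 'Äpfel', 'äpfel']`, so the umlauted words end up after "azur". The toy key yields `['apfel', 'Apfel', 'äpfel', 'Äpfel', 'azur']`, which is the "a A ä Ä" pattern seen in the CLDR charts.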
Reconciling our legacy String implementations (String, DoubleByteString,
and QuadByteString) with the Unicode* classes was also interesting,
because the rules for Unicode equality and our legacy equality
implementation were not quite compatible.
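Here is one concrete source of that kind of mismatch. It is an illustration, not a description of GemStone's actual rules. Unicode treats some different code-point sequences as canonically equivalent. An equality that compares code points one by one therefore disagrees with one that normalizes first. A Python sketch using the standard unicodedata module:

```python
import unicodedata

precomposed = "\u00e9"  # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

def code_point_equal(a: str, b: str) -> bool:
    # Legacy-style equality: same code points in the same order.
    return a == b

def canonically_equal(a: str, b: str) -> bool:
    # Canonical equivalence: compare after normalizing both sides to NFD.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(len(precomposed), len(decomposed))           # 1 2
print(code_point_equal(precomposed, decomposed))   # False
print(canonically_equal(precomposed, decomposed))  # True
```

Whichever definition of equality is chosen, hashing has to agree with it. If equality normalizes, the hash must be computed on the normalized form. Otherwise, equal strings can land in different hash buckets.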
If you are interested, I can share more
details.
Dale
[1] http://site.icu-project.org/
[2] http://unicode.org/reports/tr10/
On 12/07/2015 11:54 AM, H. Hirzel wrote:
> Hello
>
> According to http://www.unicode.org/cldr/charts/27/collation/de.html, the German
> phonebook sort order is:
>
> a A ä Ä ą̈ Ą̈ ǟ Ǟ ạ̈ Ạ̈ ḁ̈ Ḁ̈ b B c C d D e E f F g G h H i I j J k K
> l L m M n N o O ö Ö ǫ̈ Ǫ̈ ȫ Ȫ ơ̈ Ơ̈ ợ̈ Ợ̈ ọ̈ Ọ̈ p P q Q r R s S ss ß t
> T u U ü Ü ǘ Ǘ ǜ Ǜ ǚ Ǚ ų̈ Ų̈ ǖ Ǖ ư̈ Ư̈ ự̈ Ự̈ ụ̈ Ụ̈ ṳ̈ Ṳ̈ ṷ̈ Ṷ̈ ṵ̈ Ṵ̈ v
> V w W x X y Y z Z
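The distinctive rule behind German phonebook order (DIN 5007 variant 2) is that ä, ö, and ü sort as if spelled ae, oe, and ue. Müller therefore files next to Mueller rather than next to Muller. Below is a toy Python key built on that assumption, which also treats ß as ss. The tie-breakers are simplifications chosen for this sketch, not CLDR's exact rules: plain spelling sorts before umlaut spelling, and lowercase before uppercase.

```python
# Assumed DIN 5007-2 ("phonebook") expansions, applied after lowercasing.
PHONEBOOK_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def phonebook_key(name: str):
    folded = name.lower()
    # Level 1: expand umlauts and ß, so "Müller" and "Mueller" tie here.
    primary = "".join(PHONEBOOK_EXPANSIONS.get(ch, ch) for ch in folded)
    # Level 2 (simplified): the unexpanded lowercase spelling, so "mueller" < "müller".
    # Level 3: lowercase before uppercase, position by position.
    tertiary = tuple(ch.isupper() for ch in name)
    return (primary, folded, tertiary)

names = ["Müller", "Mueller", "Muller", "Mylius", "Möller"]
print(sorted(names))                     # code-point order
print(sorted(names, key=phonebook_key))  # phonebook-style order
```

Code-point order gives `['Mueller', 'Muller', 'Mylius', 'Möller', 'Müller']`. The phonebook-style key gives `['Möller', 'Mueller', 'Müller', 'Muller', 'Mylius']`.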
>
> I wonder why it looks like this. It includes a lot of characters that never
> appear in German text.
>
>
> For Spanish there are two orders, 'traditional' and 'standard':
>
> http://www.unicode.org/cldr/charts/27/collation/es.html
>
> standard a A á Á b B c C d D e E é É f F g G h H i I í Í j J k K l L m
> M n N ñ Ñ ņ̃ Ņ̃ ṇ̃ Ṇ̃ ṋ̃ Ṋ̃ ṉ̃ Ṉ̃ o O ó Ó p P q Q r R s S t T u U ú Ú
> ü Ü v V w W x X y Y z Z
>
> traditional a A á Á b B c C ch Ch CH cĥ Cĥ CĤ cȟ Cȟ CȞ cḧ Cḧ CḦ cḣ Cḣ
> CḢ cḩ Cḩ CḨ cḥ Cḥ CḤ cḫ Cḫ CḪ cẖ Cẖ d D e E é É f F g G h H i I í Í j
> J k K l L ll Ll LL lĺ Lĺ LĹ lľ Lľ LĽ lļ Lļ LĻ lḷ Lḷ LḶ lḹ Lḹ LḸ lḽ Lḽ
> LḼ lḻ Lḻ LḺ m M n N ñ Ñ ņ̃ Ņ̃ ṇ̃ Ṇ̃ ṋ̃ Ṋ̃ ṉ̃ Ṉ̃ o O ó Ó p P q Q r R s
> S t T u U ú Ú ü Ü v V w W x X y Y z Z
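In the 'traditional' list above, ch and ll behave as letters of their own, sorting after c and l respectively. UTS #10 calls these contractions. Below is a minimal Python sketch of just that primary-level rule. For illustration it makes several simplifications:

- It folds accents on vowels.
- It keeps ñ as a separate letter.
- It ignores case.
- It assumes precomposed (NFC) input.
- It sends any character outside the listed alphabet to the end.

```python
# Primary alphabet for traditional Spanish, following the quoted chart:
# "ch" sorts after "c", "ll" after "l", and "ñ" after "n".
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
CONTRACTIONS = {"ch", "ll"}
# Accents on vowels are ignored at this (primary) level.
PLAIN_VOWELS = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ü": "u"}

def traditional_key(word: str):
    w = "".join(PLAIN_VOWELS.get(ch, ch) for ch in word.lower())
    ranks, i = [], 0
    while i < len(w):
        step = 2 if w[i:i + 2] in CONTRACTIONS else 1  # take a contraction whole
        # Characters outside the alphabet sort after "z" (a simplification).
        ranks.append(RANK.get(w[i:i + step], len(ALPHABET)))
        i += step
    return ranks

words = ["chico", "cuna", "dado", "llama", "luz", "mano", "ñame", "nube"]
print(sorted(words))                       # code-point order
print(sorted(words, key=traditional_key))  # traditional order
```

Code-point order puts "chico" before "cuna" and "llama" before "luz". The traditional key gives `['cuna', 'chico', 'dado', 'luz', 'llama', 'mano', 'nube', 'ñame']`.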
>
> French is not easy to find in
> http://www.unicode.org/cldr/charts/27/collation/index.html
> and seems to be defined elsewhere:
>
> http://unicode.org/repos/cldr/tags/release-27/common/collation/fr.xml
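French is the classic example of a "backwards secondary" tailoring, in which accent differences are compared from the end of the word instead of from the start. Whether release 27's fr.xml turns this on has not been checked here. As far as I know, recent CLDR versions enable it only for Canadian French (fr_CA), but treat that as something to verify against the XML. Below is a toy Python sketch of the difference. It splits letters from accents with the standard unicodedata module and ignores case:

```python
import unicodedata

def split_levels(word: str):
    """Split a word into base letters and per-letter accents via NFD (toy)."""
    bases, accents = [], []
    for ch in word.lower():
        decomposed = unicodedata.normalize("NFD", ch)
        bases.append(decomposed[0])
        accents.append(decomposed[1:])
    return "".join(bases), accents

def key_forward(word: str):
    bases, accents = split_levels(word)
    return (bases, accents)        # accents compared from the start

def key_backwards(word: str):
    bases, accents = split_levels(word)
    return (bases, accents[::-1])  # accents compared from the end

words = ["côté", "cote", "côte", "coté"]
print(sorted(words, key=key_forward))    # ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=key_backwards))  # ['cote', 'côte', 'coté', 'côté']
```

All four words share the same base letters. Only the direction in which accents are compared decides whether "coté" or "côte" comes first.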
>
> Suggestions and hints are welcome.
>
> --Hannes
> _______________________________________________
> Cuis mailing list
> Cuis at jvuletich.org
> http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org