[Cuis] [Pharo-dev] Unicode Support

Juan Vuletich juan at jvuletich.org
Thu Dec 10 07:45:25 CST 2015


On 12/7/2015 7:53 AM, EuanM wrote:
> ...
> Juan: "Cuis: Chose not to use the Squeak approach. Chose to make the base
> image include and use only 1-byte strings. Chose to use ISO-8859-15"
>
> I have double-checked: each character encoded in ISO Latin-9 (ISO
> 8859-15) is exactly the character represented by the corresponding
> 1-byte codepoint in the Unicode range 0000 to 00FF,
>
> with the following exceptions:
>
> codepoint 20ac - Euro symbol
> character code a4 (replaces codepoint 00a4 generic currency symbol)
>
> codepoint 0160 Latin Upper Case S with Caron
> character code a6 (replaces codepoint 00A6, was ¦ broken bar)
>
> codepoint 0161 Latin Lower Case s with Caron
> character code a8 (replaces codepoint 00A8, was diaeresis)
>
> codepoint 017d Latin Upper Case Z with Caron
> character code b4 (replaces codepoint 00b4 was Acute accent)
>
> codepoint 017e Latin Lower Case z with Caron
> character code b8 (replaces codepoint 00b8 was cedilla)
>
> codepoint 0152 Upper Case OE ligature = Ethel
> character code bc (replaces codepoint 00bc was 1/4 symbol)
>
> codepoint 0153 Lower Case oe ligature = ethel
> character code bd (replaces codepoint 00bd was 1/2 symbol)
>
> codepoint 0178 Upper Case Y diaeresis
> character code be (replaces codepoint 00be was 3/4 symbol)
>
> Juan - I don't suppose we could persuade you to change to ISO Latin-1
> from ISO Latin-9?
>
> It would mean we could run the same 1-byte string encoding across Cuis,
> Squeak, Pharo, and, as far as I can make out so far, Dolphin Smalltalk
> and Gnu Smalltalk.
>
> The downside would be that French users would lose Y with diaeresis,
> users of oe, OE, and s, S, z, Z with caron would lose those
> characters, and everyone would lose the Euro sign.
>
> https://en.wikipedia.org/wiki/ISO/IEC_8859-15.
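
The eight differences listed above can be checked mechanically. The sketch below uses Python purely for illustration (the function name `latin9_differences` is hypothetical, not part of any Smalltalk). It decodes every byte 0x00-0xFF with Python's built-in `latin-1` and `iso8859_15` codecs and reports the bytes whose decodings differ, together with the Unicode character names.

```python
import unicodedata


def latin9_differences():
    """Return (byte, latin1_char, latin9_char) for every byte that decodes differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        l1 = raw.decode("latin-1")     # ISO 8859-1: byte value == codepoint
        l9 = raw.decode("iso8859_15")  # ISO 8859-15 (Latin-9)
        if l1 != l9:
            diffs.append((b, l1, l9))
    return diffs


if __name__ == "__main__":
    for b, l1, l9 in latin9_differences():
        # Print names rather than glyphs so the output is plain ASCII.
        print(f"0x{b:02X}: U+{ord(l1):04X} {unicodedata.name(l1)}"
              f" -> U+{ord(l9):04X} {unicodedata.name(l9)}")
```

Running it prints exactly eight lines, one per byte in the list above (A4, A6, A8, B4, B8, BC, BD, BE); every other byte maps to the same codepoint in both encodings.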

Yes, you might persuade me of anything. More specifically, you need to 
convince me that the upsides outweigh the downsides.
The downside would be losing the Euro sign and a couple of glyphs used 
in French.
I don't see the upside. Sharing stuff between different Smalltalks 
should (most likely) be done in UTF-8. Cuis can handle any UTF-8, using 
NCRs (numeric character references) to represent codepoints outside the 
set that is directly supported (be it ISO8859-15 or ISO8859-1). So, how 
would switching to ISO8859-1 help?
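
To make the NCR fallback concrete, here is a minimal sketch in Python. It is not Cuis code: the function names are hypothetical, and it assumes decimal NCRs of the form `&#N;`, which is what Python's standard `xmlcharrefreplace` error handler emits. Any codepoint the 1-byte charset cannot hold is written as an NCR, and the reverse step turns NCRs back into characters.

```python
import re


def utf8_to_single_byte(data: bytes, charset: str = "iso8859_15") -> bytes:
    """Hypothetical helper: UTF-8 in, 1-byte charset out, NCRs for anything unsupported."""
    text = data.decode("utf-8")
    # "xmlcharrefreplace" replaces unencodable characters with "&#N;" (decimal).
    return text.encode(charset, errors="xmlcharrefreplace")


def single_byte_to_utf8(data: bytes, charset: str = "iso8859_15") -> bytes:
    """Hypothetical inverse: decode the 1-byte charset, expand decimal NCRs, re-encode as UTF-8."""
    text = data.decode(charset)
    text = re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9 \u20ac \u65e5".encode("utf-8")   # "café € 日"
    print(utf8_to_single_byte(sample, "iso8859_15"))    # b'caf\xe9 \xa4 &#26085;'
    print(utf8_to_single_byte(sample, "latin-1"))       # b'caf\xe9 &#8364; &#26085;'
```

With ISO 8859-15 the Euro sign fits in one byte (0xA4), while ISO 8859-1 has to fall back to `&#8364;`; the CJK character needs an NCR in both. Either way the UTF-8 round trip is lossless, which is the point being made. A real implementation would also need to escape literal `&#` sequences already present in the text, which this sketch does not do.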

Cheers,
Juan Vuletich
