[Cuis] About adding a Unicode handling porting layer

H. Hirzel hannes.hirzel at gmail.com
Tue Feb 5 09:40:33 CST 2013


P.S.
The need for a Unicode solution becomes visible, for example, with
the README.md file

https://github.com/hhzl/Cuis-WebClient/blob/master/README.md

"Germán Arduino" appears properly there, whereas in Cuis it is
displayed as the attached screenshot shows.



On 2/5/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
> Hello all
>
> In the meantime I am investigating how to construct a small library
> which works with WideCharacters and WideStrings, together with the
> FileStream and UTF8Converter classes which deal with them.
>
> As a start I filed out String and Character and changed the names and
> class references in it to WideString and WideCharacter. I now can
> create Unicode strings in Cuis. Probably I'll simplify both
> WideCharacter and WideString in order to be able to focus more on the
> problem as such and learn how to implement it in a simple and
> straightforward way. The Unicode-Add-On library then may serve as a
> prerequisite for loading WebClient. Germán Arduino and I have to
> figure out what actually is needed.
>
> To understand how WideCharacters work, it was helpful to have a look at
> the class ColorArray.
> It has only 4 methods.
>
> The subclass definition is special
>
> ArrayedCollection variableWordSubclass: #ColorArray
> 	instanceVariableNames: ''
> 	classVariableNames: ''
> 	poolDictionaries: ''
> 	category: 'Collections-Arrayed'
>
> Using
> #variableWordSubclass:
> instead of the regular
> #subclass:
>
> means that an array of 32-bit integers is made available to work with.
>
> A Color is similar to a Unicode character in the sense that an
> instance of the class Color can be completely described with a 32-bit
> integer. So internally the class ColorArray does not actually store
> instances of Color though it is made to appear so as seen from
> outside.
>
> When I want to access a color in aColorArray I do
>    aColorArray at: index
>
> and aColorArray internally accesses a 32-bit integer (= a word) and
> converts it to aColor by asking class Integer to do it
>
> Integer>>
>   asColorOfDepth: d
> 	"Return a color value representing the receiver as color of the given depth"
> 	^Color colorFromPixelValue: self depth: d
>
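A rough sketch of the same storage trick, written in Python rather than Smalltalk so that it can be run without a Cuis image. The class name CodePointArray and its methods are hypothetical and are not part of Cuis or Squeak. The sketch only assumes what is described for ColorArray above: the collection stores plain 32-bit words and converts each word to an object (here a character) when it is accessed.

```python
from array import array


class CodePointArray:
    """Hypothetical analogue of ColorArray: stores words, hands out characters."""

    def __init__(self, text=""):
        # Typecode "L" is an unsigned integer of at least 32 bits per element.
        self._words = array("L", (ord(ch) for ch in text))

    def __len__(self):
        return len(self._words)

    def __getitem__(self, index):
        # The conversion from integer to character happens only on access.
        return chr(self._words[index])

    def __setitem__(self, index, ch):
        # Store the code point, not the character object itself.
        self._words[index] = ord(ch)

    def __str__(self):
        return "".join(chr(word) for word in self._words)


sample = CodePointArray("aΩ€")
print(len(sample), sample[1])
```

Seen from outside, the collection appears to hold characters, although no character objects are stored in it.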
> Juan once wrote that he left out Unicode because he thought it was
> 'too complicated'. Looking at the implementation in Squeak I think
> things could be done differently. It depends on what is actually
> needed. Reviewing the code is surely a good thing. At the moment I'd
> like to go for a relatively thin layer to make web application porting
> straightforward.
>
>
> Regards
> Hannes
>
>
> On 2/4/13, Casey Ransberger <casey.obrien.r at gmail.com> wrote:
>> This is cool. Good start. Someday I want to be able to have a class
>> called
>>>> :D
>>
>> On Tue, Jan 22, 2013 at 8:23 AM, H. Hirzel <hannes.hirzel at gmail.com>
>> wrote:
>>
>>> The attached change set prevents Cuis from silently ignoring
>>> characters which are not in ISO 8859-15.
>>>
>>> For example if you paste a text snippet which contains the letter
>>> Omega (Ω) into a TextWindow it is displayed as &#937;
>>>
>>> The part which does it the other way round is not included.
>>>
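One way to keep characters outside ISO 8859-15 visible instead of dropping them is to write them as numeric character references. The sketch below is Python, not the change set itself, and the function names are made up for illustration. It relies on Python's standard "xmlcharrefreplace" error handler and on html.unescape for the reverse direction.

```python
import html


def escape_for_latin9(text):
    """Encode text as ISO 8859-15; characters outside it become &#nnn; (decimal)."""
    return text.encode("iso8859_15", errors="xmlcharrefreplace")


def restore_from_latin9(data):
    """Reverse direction: decode the bytes and expand character references again."""
    return html.unescape(data.decode("iso8859_15"))


escaped = escape_for_latin9("Omega: Ω")
print(escaped)                       # b'Omega: &#937;'
print(restore_from_latin9(escaped))  # Omega: Ω
```

Note that html.unescape also expands named references such as &amp;amp;, so the reverse step is not lossless if the original text already contained literal character references.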
>>> --Hannes
>>>
>>>
>>>
>>> On 1/22/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
>>> > Hello Germán
>>> >
>>> > On 1/22/13, Germán Arduino <garduino at gmail.com> wrote:
>>> >> Nice if you will develop the needed code!
>>> >>
>>> >> The first need I have is in the methods of Swazoo that I commented on
>>> >> in another mail, but I think that is simpler; I just wasn't aware of
>>> >> the support already in place in Cuis itself.
>>> >
>>> > Yes, it took me some time as well to find out that Cuis does indeed
>>> > have some limited Unicode support.
>>> >
>>> > Juan originally wrote that Cuis had dropped Unicode support.
>>> >
>>> > When I have a look at Cuis from outside I cannot say that this is the
>>> > case, as Cuis reads and writes UTF8 text files. Unicode text
>>> > snippets pasted through the clipboard into a Cuis TextEditor also pass
>>> > in well. The only limitation is that internally it only handles the
>>> > code points which are in https://de.wikipedia.org/wiki/ISO_8859-15.
>>> > And if I work in a Cuis workspace with
>>> >
>>> >     nn asCharacter
>>> >
>>> > where nn is an Integer
>>> >
>>> >    nn must belong to ISO_8859-15
>>> >
>>> >
>>> > ISO_8859-15 is good for most European languages. If we had an
>>> > add-on to cater for the occasional Unicode characters which do
>>> > not fall into the set covered by ISO_8859-15, that would make UTF8
>>> > text file processing with Cuis safe.
>>> >
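To make the ISO 8859-15 restriction concrete, here is a small Python check (not Cuis code; the helper name latin9_byte is hypothetical) that reports which byte value, if any, a Unicode character has in ISO 8859-15. It uses Python's built-in "iso8859_15" codec.

```python
def latin9_byte(ch):
    """Return the ISO 8859-15 byte value for ch, or None if it has none."""
    try:
        return ch.encode("iso8859_15")[0]
    except UnicodeEncodeError:
        return None


for ch in "á€Ω":
    print(repr(ch), latin9_byte(ch))
```

The a with acute accent and the euro sign each map to a single byte, while Omega has no byte value and would need the kind of add-on discussed here.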
>>> >
>>> > --Hannes
>>> >
>>> >
>>> >>
>>> >> Germán.
>>> >>
>>> >> 2013/1/22 H. Hirzel <hannes.hirzel at gmail.com>:
>>> >>> Hello Germán and Juan
>>> >>>
>>> >>> As we have seen we can say that Cuis handles Unicode to a certain
>>> >>> limited extent.
>>> >>>
>>> >>> I will post a summary writeup of what I know about it later. I am
>>> >>> interested in working on/contributing to an add-on which loads
>>> >>> Unicode support into Cuis.
>>> >>>
>>> >>> For general work I need
>>> >>>
>>> >>> a)
>>> >>> an add-on so that Cuis can process arbitrary UTF8 text files. However
>>> >>> the majority of the content characters will fall into the
>>> >>>   https://de.wikipedia.org/wiki/ISO_8859-15
>>> >>> range. So it is fine if the other characters are rendered as \unnn or
>>> >>> &#nnn;
>>> >>>
>>> >>> b)
>>> >>> Another, more rewarding but maybe more difficult, way would be to
>>> >>> replace the String class with a class which handles 16-bit characters
>>> >>> instead of 8-bit characters. In terms of structure all would remain
>>> >>> the same. Characters would be 16-bit like in Java.
>>> >>>
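A point that may matter for option b), offered as a side note rather than as something stated in this thread: a 16-bit unit holds every code point up to U+FFFF directly, but code points above that need two units (a surrogate pair) in UTF-16, which is how Java's 16-bit char handles them. The Python sketch below (the helper name is hypothetical) lists the 16-bit units for a given text.

```python
def utf16_code_units(text):
    """Return the list of 16-bit UTF-16 code units that represent text."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print([hex(unit) for unit in utf16_code_units("Ω")])   # ['0x3a9']
print([hex(unit) for unit in utf16_code_units("𝄞")])   # ['0xd834', '0xdd1e']
```

So a 16-bit String class would cover Omega with one element per character, while a character such as the musical G clef (U+1D11E) would occupy two elements.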
>>> >>>
>>> >>> This will come later. At the moment I am working on ContentPack
>>> >>> version 2 which will run on Cuis, Squeak and Pharo.
>>> >>>
>>> >>> Kind regards
>>> >>>
>>> >>> --Hannes
>>> >>>
>>> >>>> 2013/1/22 Germán Arduino <garduino at gmail.com>:
>>> >>>>> Thanks for the comments Hannes / Juan:
>>> >>>>>
>>> >>>>> I will look into it when I have time, or, if you prefer, Hannes, and
>>> >>>>> want to help, I will integrate it when I finish with Aida.
>>> >>>>>
>>> >>>>> Germán.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> 2013/1/21 Juan Vuletich <juan at jvuletich.org>:
>>> >>>>>> Hi Germán,
>>> >>>>>>
>>> >>>>>> Cool! Just a remark: Cuis does include conversion to/from UTF-8 for
>>> >>>>>> the charset it supports (ISO-8859-15, covering nearly all the Latin
>>> >>>>>> alphabets).
>>> >>>>>>
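As an illustration of what such a conversion does at the byte level, here is a minimal Python sketch. It is not the Cuis implementation, and the function names are invented for this example. It transcodes between UTF-8 and ISO-8859-15 for text that stays within the supported charset.

```python
def utf8_to_latin9(data):
    """Transcode UTF-8 bytes to ISO-8859-15 bytes (raises if a character does not fit)."""
    return data.decode("utf-8").encode("iso8859_15")


def latin9_to_utf8(data):
    """Transcode ISO-8859-15 bytes to UTF-8 bytes (always possible)."""
    return data.decode("iso8859_15").encode("utf-8")


name_utf8 = "Germán".encode("utf-8")
name_latin9 = utf8_to_latin9(name_utf8)
print(len(name_utf8), len(name_latin9))  # 7 6
```

The accented letter takes two bytes in UTF-8 and one byte in ISO-8859-15, and a character outside the charset makes utf8_to_latin9 raise UnicodeEncodeError in this sketch.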
>>> >>>>>> Cheers,
>>> >>>>>> Juan Vuletich
>>> >>>>>>
>>> >>>>>> Germán Arduino wrote:
>>> >>>>>>>
>>> >>>>>>> Hi:
>>> >>>>>>>
>>> >>>>>>> The first versions of Sport and Swazoo working in Cuis 4.1 with all
>>> >>>>>>> tests green are ready to install.
>>> >>>>>>>
>>> >>>>>>> The changes I did in Swazoo are:
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> - Avoid Unicode support that doesn't exist in Cuis
>>> >>>>>>>
>>> >>> ......
>>> >>>
>>> >>> _______________________________________________
>>> >>> Cuis mailing list
>>> >>> Cuis at jvuletich.org
>>> >>> http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Sincerely,
>>> >> Germán Arduino
>>> >> about.me/garduino
>>> >>
>>> >
>>>
>>>
>>
>>
>> --
>> Casey Ransberger
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ScreenShotWithUTF8displayProblem.png
Type: image/png
Size: 52748 bytes
Desc: not available
URL: <http://jvuletich.org/pipermail/cuis_jvuletich.org/attachments/20130205/915f4469/attachment-0004.png>

