[Cuis] About adding a Unicode handling porting layer

H. Hirzel hannes.hirzel at gmail.com
Tue Feb 5 09:47:48 CST 2013


I forgot to mention the explanation

The README.md file as used by GitHub is encoded in UTF8
http://en.wikipedia.org/wiki/UTF8 whereas the Cuis File List browser
assumes that text files are encoded in ISO8859-15.

http://en.wikipedia.org/wiki/ISO/IEC_8859-15

This actually calls for a preference to tell Cuis how to interpret text files

- UTF8 or
- ISO8859-15
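
A small workspace sketch of why a name like Germán comes out wrong,
assuming the file bytes are simply mapped one byte to one character (I
have not checked what the File List browser does internally). The letter
á is U+00E1, which UTF8 encodes as two bytes; read as ISO8859-15, each
byte becomes a character of its own.

```smalltalk
| codePoint leadByte trailByte |
codePoint := 16rE1.	"á, U+00E1"
"Two-byte UTF8 form: 110xxxxx 10xxxxxx"
leadByte := 16rC0 bitOr: (codePoint bitShift: -6).	"16rC3"
trailByte := 16r80 bitOr: (codePoint bitAnd: 16r3F).	"16rA1"
"Interpreting each byte as one ISO8859-15 character:"
String
	with: (Character value: leadByte)
	with: (Character value: trailByte)	"'Ã¡'"
```

Under that assumption 'Germán' would show up as 'GermÃ¡n'.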


On 2/5/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
> P.S.
> the necessity for a Unicode solution becomes visible for example with
> the README.md file
>
> https://github.com/hhzl/Cuis-WebClient/blob/master/README.md
>
> "Germán Arduino" appears properly there whereas in Cuis it is
> displayed as the attached screen shot shows.
>
>
>
> On 2/5/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
>> Hello all
>>
>> In the meantime I am investigating how to construct a small library
>> which works with WideCharacters and WideStrings, together with the
>> FileStream and UTF8Converter which deal with them.
>>
>> As a start I filed out String and Character and changed the names and
>> class references in it to WideString and WideCharacter. I now can
>> create Unicode strings in Cuis. Probably I'll simplify both
>> WideCharacter and WideString in order to be able to focus more on the
>> problem as such and learn how to implement it in a simple and
>> straightforward way. The Unicode-Add-On library then may serve as a
>> prerequisite for loading WebClient. Germán Arduino and I have to
>> figure out what actually is needed.
>>
>> Helpful to understand how WideCharacters work was to have a look at
>> the class ColorArray.
>> It has only 4 methods.
>>
>> The subclass definition is special
>>
>> ArrayedCollection variableWordSubclass: #ColorArray
>> 	instanceVariableNames: ''
>> 	classVariableNames: ''
>> 	poolDictionaries: ''
>> 	category: 'Collections-Arrayed'
>>
>> Using
>> #variableWordSubclass:
>> instead of the regular
>> #subclass:
>>
>> means that an array of 32-bit integers is made available to work with.
>>
>> A Color is similar to a Unicode character in the sense that an
>> instance of the class Color can be completely described with a 32-bit
>> integer. So internally the class ColorArray does not actually store
>> instances of Color though it is made to appear so as seen from
>> outside.
>>
>> When I want to access a color in aColorArray I do
>>    aColorArray at: index
>>
>> and the aColorArray actually internally accesses a 32bit integer (= a
>> word) and converts it to aColor by asking class Integer to do it
>>
>> Integer>>
>>   asColorOfDepth: d
>> 	"Return a color value representing the receiver as color of the given depth"
>> 	^Color colorFromPixelValue: self depth: d
>>
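
The same pattern could carry Unicode code points instead of pixel
values. A minimal sketch follows; CodePointArray and its category are
made-up names for illustration and are not part of Cuis. The class
definition and the workspace lines are meant to be evaluated separately.

```smalltalk
"Hypothetical class: a word-indexed collection, one 32-bit slot per element."
ArrayedCollection variableWordSubclass: #CodePointArray
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Unicode-Sketch'

"Workspace usage: the slots hold plain integers."
| codePoints |
codePoints := CodePointArray new: 1.
codePoints at: 1 put: 937.	"Greek capital letter Omega, U+03A9"
codePoints at: 1	"937"
```

A wide string class built this way could then override at: to wrap the
stored integer in a character object, in the way ColorArray answers a
Color.
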
>> Juan once wrote that he left out Unicode because he thought it was
>> 'too complicated'. Looking at the implementation in Squeak I think
>> things could be done differently. It depends on what is actually
>> needed. Reviewing the code is surely a good thing. At the moment I'd
>> like to go for a relatively thin layer to make web application porting
>> straightforward.
>>
>>
>> Regards
>> Hannes
>>
>>
>> On 2/4/13, Casey Ransberger <casey.obrien.r at gmail.com> wrote:
>>> This is cool. Good start. Someday I want to be able to have a class
>>> called
>>>>>> :D
>>>
>>> On Tue, Jan 22, 2013 at 8:23 AM, H. Hirzel <hannes.hirzel at gmail.com>
>>> wrote:
>>>
>>>> The attached change set prevents Cuis from silently ignoring
>>>> characters which are not in ISO 8859-15.
>>>>
>>>> For example if you paste a text snippet which contains the letter
>>>> Omega (Ω) into a TextWindow it is displayed as &#937;
>>>>
>>>> The part which does it the other way round is not included.
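
A rough workspace sketch of that kind of replacement, assuming the
substitute is an HTML-style numeric character reference (the &#nnn; form
mentioned further down the thread). It is not the code from the change
set. For brevity the block passes only ASCII through unchanged; a real
version would have to test membership in ISO 8859-15 instead.

```smalltalk
| render |
render := [:codePoint |
	codePoint < 128
		ifTrue: [String with: (Character value: codePoint)]
		ifFalse: ['&#', codePoint printString, ';']].
render value: 65.	"'A'"
render value: 937	"'&#937;'"
```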
>>>>
>>>> --Hannes
>>>>
>>>>
>>>>
>>>> On 1/22/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
>>>> > Hello Germán
>>>> >
>>>> > On 1/22/13, Germán Arduino <garduino at gmail.com> wrote:
>>>> >> Nice if you will develop the needed code!
>>>> >>
>>>> >> The first need I have is in the Swazoo methods that I commented on
>>>> >> in another mail, but I think that is simpler; I just wasn't aware
>>>> >> of the support already in place in Cuis itself.
>>>> >
>>>> > Yes, it took me some time as well to find out that Cuis indeed has
>>>> > some limited Unicode support.
>>>> >
>>>> > Juan originally wrote that Cuis had dropped Unicode support.
>>>> >
>>>> > When I have a look at Cuis from outside I cannot say that it is the
>>>> > case as Cuis consumes and writes UTF8 text files. Unicode text
>>>> > snippets pasted through the clipboard into a Cuis TextEditor also
>>>> > pass
>>>> > in well. The only limitation is that internally it only handles the
>>>> > code points which are in https://de.wikipedia.org/wiki/ISO_8859-15.
>>>> > And if I work in a Cuis workspace  with
>>>> >
>>>> >     nn asCharacter
>>>> >
>>>> > where nn is an Integer
>>>> >
>>>> >    nn must belong to ISO_8859-15
>>>> >
>>>> >
>>>> > ISO_8859-15 is good for most European languages. If we had an
>>>> > add-on to cater for the occasional Unicode characters which do
>>>> > not fall into the set covered by ISO_8859-15, that would make UTF8
>>>> > text file processing with Cuis safe.
>>>> >
>>>> >
>>>> > --Hannes
>>>> >
>>>> >
>>>> >>
>>>> >> Germán.
>>>> >>
>>>> >> 2013/1/22 H. Hirzel <hannes.hirzel at gmail.com>:
>>>> >>> Hello Germán and Juan
>>>> >>>
>>>> >>> As we have seen we can say that Cuis handles Unicode to a certain
>>>> >>> limited extent.
>>>> >>>
>>>> >>> I will post a summary writeup of what I know about it later. I am
>>>> >>> interested in working/contributing to an add-on which loads Unicode
>>>> >>> support into Cuis.
>>>> >>>
>>>> >>> For general work I need
>>>> >>>
>>>> >>> a)
>>>> >>> an add-on so that Cuis can process arbitrary UTF8 text files.
>>>> >>> However
>>>> >>> the majority of the content characters will fall into the
>>>> >>>   https://de.wikipedia.org/wiki/ISO_8859-15
>>>> >>> range. So it is fine if the other characters are rendered as \unnn
>>>> >>> or
>>>> >>> &#nnn;
>>>> >>>
>>>> >>> b)
>>>> >>> Another more rewarding but maybe more difficult way would be to
>>>> >>> replace the String class with a class which handles 16bit
>>>> >>> characters
>>>> >>> instead of 8 bit characters. In terms of structure all would remain
>>>> >>> the same. Characters would be 16bit like in Java.
>>>> >>>
>>>> >>>
>>>> >>> This will come later. At the moment I am working on ContentPack
>>>> >>> version 2 which will run on Cuis, Squeak and Pharo.
>>>> >>>
>>>> >>> Kind regards
>>>> >>>
>>>> >>> --Hannes
>>>> >>>
>>>> >>>> 2013/1/22 Germán Arduino <garduino at gmail.com>:
>>>> >>>>> Thanks for the comments Hannes / Juan:
>>>> >>>>>
>>>> >>>>> I will look into it when I have time, or, if you prefer, Hannes, and
>>>> >>>>> want to help, I will integrate it when I finish with Aida.
>>>> >>>>>
>>>> >>>>> Germán.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> 2013/1/21 Juan Vuletich <juan at jvuletich.org>:
>>>> >>>>>> Hi Germán,
>>>> >>>>>>
>>>> >>>>>> Cool! Just a remark: Cuis does include conversion to/from utf-8
>>>> >>>>>> for
>>>> >>>>>> the
>>>> >>>>>> charset it supports (ISO-8859-15, covering nearly all the latin
>>>> >>>>>> alphabets).
>>>> >>>>>>
>>>> >>>>>> Cheers,
>>>> >>>>>> Juan Vuletich
>>>> >>>>>>
>>>> >>>>>> Germán Arduino wrote:
>>>> >>>>>>>
>>>> >>>>>>> Hi:
>>>> >>>>>>>
>>>> >>>>>>> The first versions of Sport and Swazoo working in Cuis 4.1 with
>>>> >>>>>>> all
>>>> >>>>>>> tests green are ready to install.
>>>> >>>>>>>
>>>> >>>>>>> The changes I did in Swazoo are:
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> - Avoid Unicode support that doesn't exist in Cuis
>>>> >>>>>>>
>>>> >>> ......
>>>> >>>
>>>> >>> _______________________________________________
>>>> >>> Cuis mailing list
>>>> >>> Cuis at jvuletich.org
>>>> >>> http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Sincerely,
>>>> >> Germán Arduino
>>>> >> about.me/garduino
>>>> >>
>>>> >>
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Casey Ransberger
>>>
>>
>
