[Cuis] About adding a Unicode handling porting layer

H. Hirzel hannes.hirzel at gmail.com
Tue Feb 5 03:20:24 CST 2013


Hello all

In the meantime I have been investigating how to construct a small library
that works with WideCharacters and WideStrings, together with the
FileStream and UTF8Converter that deal with them.

As a start I filed out String and Character and changed the class names
and class references in them to WideString and WideCharacter. I can now
create Unicode strings in Cuis. I will probably simplify both
WideCharacter and WideString so that I can focus on the problem itself
and learn how to implement it in a simple and straightforward way. The
Unicode add-on library may then serve as a prerequisite for loading
WebClient. Germán Arduino and I have to figure out what is actually
needed.

Having a look at the class ColorArray helped me understand how
WideCharacters work.
It has only 4 methods.

The subclass definition is special:

ArrayedCollection variableWordSubclass: #ColorArray
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Collections-Arrayed'

Using
#variableWordSubclass:
instead of the regular
#subclass:

means that an array of 32-bit integers is made available to work with.
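
To illustrate what such word storage looks like from a workspace, here is a
minimal sketch. It assumes the image contains WordArray, which Squeak-family
images declare with #variableWordSubclass: as well; the code points are only
sample values.

```smalltalk
"Workspace sketch, assuming WordArray is present in the image."
| words |
words := WordArray new: 3.
words at: 1 put: 16r03A9.     "GREEK CAPITAL LETTER OMEGA"
words at: 2 put: 16r20AC.     "EURO SIGN"
words at: 3 put: 16r1F600.    "a code point that does not fit into 16 bits"
words at: 1                   "answers a plain integer, not a Character"
```

Each slot holds an unsigned 32-bit value, so every Unicode code point fits
into one slot without any further encoding.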

A Color is similar to a Unicode character in the sense that an
instance of the class Color can be completely described by a 32-bit
integer. So internally the class ColorArray does not actually store
instances of Color, though from the outside it appears to.

When I want to access a color in aColorArray I do
   aColorArray at: index

and aColorArray internally accesses a 32-bit integer (= a word) and
converts it to aColor by asking class Integer to do it:

Integer>>asColorOfDepth: d
	"Return a color value representing the receiver as color of the given depth"
	^Color colorFromPixelValue: self depth: d
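
The same pattern could carry over to wide strings. The following is only a
sketch under assumptions, not the Squeak implementation: the category name
and the WideCharacter messages #codePoint: and #codePoint are hypothetical
and would have to be defined by the add-on.

```smalltalk
"Hypothetical sketch: store code points as words, answer characters on access."
ArrayedCollection variableWordSubclass: #WideString
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'Unicode-AddOn'

WideString>>at: index
	"Answer a WideCharacter built from the 32-bit word stored at index.
	 WideCharacter class>>codePoint: is an assumed constructor."
	^WideCharacter codePoint: (self basicAt: index)

WideString>>at: index put: aWideCharacter
	"Store only the code point of the argument and answer the argument.
	 WideCharacter>>codePoint is an assumed accessor."
	self basicAt: index put: aWideCharacter codePoint.
	^aWideCharacter
```

As with ColorArray, no character objects are kept in the collection itself;
they are created on access, which keeps the storage compact.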

Juan once wrote that he left out Unicode because he thought it was
'too complicated'. Looking at the implementation in Squeak, I think
things could be done differently. It depends on what is actually
needed. Reviewing the code is surely a good thing. At the moment I'd
like to go for a relatively thin layer to make porting web applications
straightforward.


Regards
Hannes


On 2/4/13, Casey Ransberger <casey.obrien.r at gmail.com> wrote:
> This is cool. Good start. Someday I want to be able to have a class called
>> :D
>
> On Tue, Jan 22, 2013 at 8:23 AM, H. Hirzel <hannes.hirzel at gmail.com> wrote:
>
>> The attached change set prevents Cuis from silently ignoring
>> characters which are not in ISO 8859-15.
>>
>> For example, if you paste a text snippet which contains the letter
>> Omega (Ω) into a TextWindow, it is displayed as &#937;
>>
>> The part which does it the other way round is not included.
>>
>> --Hannes
>>
>>
>>
>> On 1/22/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
>> > Hello Germán
>> >
>> > On 1/22/13, Germán Arduino <garduino at gmail.com> wrote:
>> >> Nice if you will develop the needed code!
>> >>
>> >> The first need I have is in the methods of Swazoo that I commented on in
>> >> another mail, but I think that is simpler; I just wasn't aware of the
>> >> support already in place in Cuis itself.
>> >
>> > Yes, it took me some time as well to find out that Cuis indeed has
>> > some limited Unicode support.
>> >
>> > Juan originally wrote that Cuis had dropped Unicode support.
>> >
>> > When I look at Cuis from the outside I cannot say that this is the
>> > case, as Cuis reads and writes UTF8 text files. Unicode text
>> > snippets pasted through the clipboard into a Cuis TextEditor also come
>> > in well. The only limitation is that internally it only handles the
>> > code points which are in https://de.wikipedia.org/wiki/ISO_8859-15.
>> > And if I work in a Cuis workspace with
>> >
>> >     nn asCharacter
>> >
>> > where nn is an Integer
>> >
>> >    nn must belong to ISO_8859-15
>> >
>> >
>> > ISO_8859-15 is good for most European languages. If we had an
>> > add-on to cater for the occasional Unicode characters which do
>> > not fall into the set covered by ISO_8859-15, that would make UTF8 text
>> > file processing with Cuis safe.
>> >
>> >
>> > --Hannes
>> >
>> >
>> >>
>> >> Germán.
>> >>
>> >> 2013/1/22 H. Hirzel <hannes.hirzel at gmail.com>:
>> >>> Hello Germán and Juan
>> >>>
>> >>> As we have seen we can say that Cuis handles Unicode to a certain
>> >>> limited extent.
>> >>>
>> >>> I will post a summary writeup of what I know about it later. I am
>> >>> interested in working/contributing to an add-on which loads Unicode
>> >>> support into Cuis.
>> >>>
>> >>> For general work I need
>> >>>
>> >>> a)
>> >>> an add-on so that Cuis can process arbitrary UTF8 text files. However
>> >>> the majority of the content characters will fall into the
>> >>>   https://de.wikipedia.org/wiki/ISO_8859-15
>> >>> range. So it is fine if the other characters are rendered as \unnn or
>> >>> &#nnn;
>> >>>
>> >>> b)
>> >>> Another more rewarding but maybe more difficult way would be to
>> >>> replace the String class with a class which handles 16-bit characters
>> >>> instead of 8-bit characters. In terms of structure all would remain
>> >>> the same. Characters would be 16-bit like in Java.
>> >>>
>> >>>
>> >>> This will come later. At the moment I am working on ContentPack
>> >>> version 2 which will run on Cuis, Squeak and Pharo.
>> >>>
>> >>> Kind regards
>> >>>
>> >>> --Hannes
>> >>>
>> >>>> 2013/1/22 Germán Arduino <garduino at gmail.com>:
>> >>>>> Thanks for the comments Hannes / Juan:
>> >>>>>
>> >>>>> I will look into it when I have time, or if you prefer, Hannes, and
>> >>>>> want to help, I will integrate it when I finish with Aida.
>> >>>>>
>> >>>>> Germán.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> 2013/1/21 Juan Vuletich <juan at jvuletich.org>:
>> >>>>>> Hi Germán,
>> >>>>>>
>> >>>>>> Cool! Just a remark: Cuis does include conversion to/from utf-8 for
>> >>>>>> the charset it supports (ISO-8859-15, covering nearly all the Latin
>> >>>>>> alphabets).
>> >>>>>>
>> >>>>>> Cheers,
>> >>>>>> Juan Vuletich
>> >>>>>>
>> >>>>>> Germán Arduino wrote:
>> >>>>>>>
>> >>>>>>> Hi:
>> >>>>>>>
>> >>>>>>> The first versions of Sport and Swazoo working in Cuis 4.1 with all
>> >>>>>>> tests green are ready to install.
>> >>>>>>>
>> >>>>>>> The changes I made in Swazoo are:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> - Avoid Unicode support that doesn't exist in Cuis
>> >>>>>>>
>> >>> ......
>> >>>
>> >>> _______________________________________________
>> >>> Cuis mailing list
>> >>> Cuis at jvuletich.org
>> >>> http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
>> >>
>> >>
>> >>
>> >> --
>> >> Sincerely,
>> >> Germán Arduino
>> >> about.me/garduino
>> >>
>> >
>>
>>
>
>
> --
> Casey Ransberger
>



