[Cuis] About adding a Unicode handling porting layer

Germán Arduino garduino at gmail.com
Sat Feb 2 16:27:55 CST 2013


Yes, I agree that an installable layer would be the best option.

But Unicode and related topics are not my field of expertise, so I do
not have much to add :(


2013/2/2 H. Hirzel <hannes.hirzel at gmail.com>:
> Your feedback, Germán,
>
> makes me consider analyzing what it would take to write a simple
> Unicode porting layer.
>
> I am thinking of an Add-On to Cuis which people can load if they want
> to work in a more Unicode-compliant way.
>
> In fact 50% of all HTML files on the internet are encoded in UTF8 and
> the text files with which I work are mostly UTF8. So if I use
> WebClient to download an HTML file and want to further process it I
> have to deal with workarounds. Even HTML files in major European
> languages often have Unicode characters like special hyphens, quotation
> marks, graphical symbols etc.
>
> One idea I'd like to try out is just to replace the class String which
> only stores bytes (8bit) with a String class which stores words (32
> bit). It is a bit of a waste in terms of space but conceptually it would
> be straightforward. Space measurement has shown that there are not all
> that many strings in Cuis. The major part is taken by bitmaps.
>
> I just have to figure out how to work with these
> variableByteSubclasses with which I have not done much in the past.
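
To make the space trade-off above concrete, here is a minimal sketch in Python (not Smalltalk, and not Cuis's actual String class). It assumes a string stored as fixed 32-bit words, one per code point; the helper names `to_words` and `char_at` are hypothetical and exist only for this illustration.

```python
import struct


def to_words(text: str) -> bytes:
    """Store each code point as one little-endian 32-bit word (no BOM)."""
    return text.encode("utf-32-le")


def char_at(words: bytes, index: int) -> str:
    """Fetch the character at a zero-based index with one fixed-offset read."""
    (code_point,) = struct.unpack_from("<I", words, 4 * index)
    return chr(code_point)


sample = "a\u03a9\u20ac"  # 'a', Greek capital Omega, Euro sign
words = to_words(sample)
utf8 = sample.encode("utf-8")

# Fixed width costs 4 bytes per character; UTF-8 needs 1, 2 and 3 bytes here.
print(len(words), len(utf8))
print(char_at(words, 1))
```

The fixed-width form uses more memory, but indexing stays a simple offset calculation, which is the conceptual simplicity mentioned above.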
>
> --Hannes
>
>
>
>
>
> On 2/1/13, Germán Arduino <garduino at gmail.com> wrote:
>> Thanks Hannes, this is very useful to me.
>>
>> My next step in porting is to polish WebClient and, among other
>> things, Unicode is an issue.
>>
>> Germán.
>>
>> 2013/2/1 H. Hirzel <hannes.hirzel at gmail.com>:
>>> Thank you Juan,
>>> for adding the Unicode fix so that pasting text through the clipboard
>>> does not silently lose characters. More things like this (including
>>> comments) later.
>>>
>>> I have realized that what I wrote earlier is wrong. Cuis reads and
>>> saves files in ISO8859-15 by default and not with Unicode. However it
>>> is not too difficult to read and write a Unicode file.
>>>
>>> I have started some notes on this here
>>> https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md
>>>
>>> Regards
>>> Hannes
>>>
>>> On 1/23/13, Juan Vuletich <juan at jvuletich.org> wrote:
>>>> Thanks Hannes, just integrated this.
>>>>
>>>> Cheers,
>>>> Juan Vuletich
>>>>
>>>> H. Hirzel wrote:
>>>>> The attached change set prevents Cuis from silently ignoring
>>>>> characters which are not in ISO 8859-15.
>>>>>
>>>>> For example if you paste a text snippet which contains the letter
>>>>> Omega (Ω) into a TextWindow it is displayed as &#937;
>>>>>
>>>>> The part which does it the other way round is not included.
>>>>>
>>>>> --Hannes
>>>>>
>>>>>
>>>>>
>>>>> On 1/22/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
>>>>>
>>>>>> Hello Germán
>>>>>>
>>>>>> On 1/22/13, Germán Arduino <garduino at gmail.com> wrote:
>>>>>>
>>>>>>> Nice if you will develop the needed code!
>>>>>>>
>>>>>>> The first need I have concerns the Swazoo methods that I mentioned in
>>>>>>> another mail, but I think that is simpler; I just was not
>>>>>>> aware of the support already in place in Cuis itself.
>>>>>>>
>>>>>> Yes, it took me some time as well to find out that Cuis indeed has
>>>>>> some limited Unicode support.
>>>>>>
>>>>>> Juan originally wrote that Cuis had dropped Unicode support.
>>>>>>
>>>>>> Looking at Cuis from the outside, I cannot say that this is the
>>>>>> case, as Cuis consumes and writes UTF8 text files. Unicode text
>>>>>> snippets pasted through the clipboard into a Cuis TextEditor also pass
>>>>>> in well. The only limitation is that internally it only handles the
>>>>>> code points which are in https://de.wikipedia.org/wiki/ISO_8859-15.
>>>>>> And if I work in a Cuis workspace  with
>>>>>>
>>>>>>     nn asCharacter
>>>>>>
>>>>>> where nn is an Integer
>>>>>>
>>>>>>    nn must belong to ISO_8859-15
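
As an illustration of the restriction above, the following Python sketch (not Cuis code) tests whether an integer belongs to ISO 8859-15. It assumes `nn` is meant as a Unicode code point, which the message does not state; the function name `in_latin9` is hypothetical.

```python
def in_latin9(code_point: int) -> bool:
    """Return True if the Unicode code point has a slot in ISO 8859-15."""
    try:
        chr(code_point).encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False


print(in_latin9(0xE9))    # e with acute accent
print(in_latin9(0x20AC))  # Euro sign, added in ISO 8859-15
print(in_latin9(0x3A9))   # Greek capital Omega
```

Note that a few code points below 256 are excluded as well: ISO 8859-15 replaced the generic currency sign (U+00A4), for example, with the Euro sign.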
>>>>>>
>>>>>>
>>>>>> ISO_8859-15 is good for most European languages. If we had an
>>>>>> Add-On to cater for occasional other characters of Unicode which do
>>>>>> not fall into the set covered by ISO_8859-15 that would make UTF8 text
>>>>>> file processing with Cuis safe.
>>>>>>
>>>>>>
>>>>>> --Hannes
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Germán.
>>>>>>>
>>>>>>> 2013/1/22 H. Hirzel <hannes.hirzel at gmail.com>:
>>>>>>>
>>>>>>>> Hello Germán and Juan
>>>>>>>>
>>>>>>>> As we have seen we can say that Cuis handles Unicode to a certain
>>>>>>>> limited extent.
>>>>>>>>
>>>>>>>> I will post a summary writeup of what I know about it later. I am
>>>>>>>> interested in working on/contributing to an add-on which loads Unicode
>>>>>>>> support into Cuis.
>>>>>>>>
>>>>>>>> For general work I need
>>>>>>>>
>>>>>>>> a)
>>>>>>>> an add-on so that Cuis can process arbitrary UTF8 text files. However
>>>>>>>> the majority of the content characters will fall into the
>>>>>>>>   https://de.wikipedia.org/wiki/ISO_8859-15
>>>>>>>> range. So it is fine if the other characters are rendered as \unnn or
>>>>>>>> &#nnn;
>>>>>>>>
>>>>>>>> b)
>>>>>>>> Another, more rewarding but maybe more difficult, way would be to
>>>>>>>> replace the String class with a class which handles 16bit characters
>>>>>>>> instead of 8 bit characters. In terms of structure all would remain
>>>>>>>> the same. Characters would be 16bit like in Java.
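
Option (a) above can be sketched in a few lines of Python (for illustration only, not Cuis code). The sketch assumes the `&#nnn;` form with a decimal code point; the function name `to_latin9_with_ncr` is hypothetical. It relies on Python's standard `xmlcharrefreplace` error handler, which emits exactly that form for characters the target charset cannot represent.

```python
def to_latin9_with_ncr(utf8_bytes: bytes) -> str:
    """Decode UTF-8 text and keep only ISO 8859-15 characters verbatim.

    Every other character is rendered as a decimal numeric character
    reference such as &#937;.
    """
    text = utf8_bytes.decode("utf-8")
    latin9 = text.encode("iso8859_15", errors="xmlcharrefreplace")
    return latin9.decode("iso8859_15")


source = "caf\u00e9 \u20ac \u03a9".encode("utf-8")
print(to_latin9_with_ncr(source))
```

Characters inside the ISO 8859-15 range pass through unchanged, so text in most European languages is barely affected, while nothing outside the range is dropped silently.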
>>>>>>>>
>>>>>>>>
>>>>>>>> This will come later. At the moment I am working on ContentPack
>>>>>>>> version 2 which will run on Cuis, Squeak and Pharo.
>>>>>>>>
>>>>>>>> Kind regards
>>>>>>>>
>>>>>>>> --Hannes
>>>>>>>>
>>>>>>>>
>>>>>>>>> 2013/1/22 Germán Arduino <garduino at gmail.com>:
>>>>>>>>>
>>>>>>>>>> Thanks for the comments Hannes / Juan:
>>>>>>>>>>
>>>>>>>>>> I will look into it when I have time, or, if you prefer, Hannes, and
>>>>>>>>>> want to help, I will integrate it when I finish with Aida.
>>>>>>>>>>
>>>>>>>>>> Germán.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2013/1/21 Juan Vuletich <juan at jvuletich.org>:
>>>>>>>>>>
>>>>>>>>>>> Hi Germán,
>>>>>>>>>>>
>>>>>>>>>>> Cool! Just a remark: Cuis does include conversion to/from utf-8 for
>>>>>>>>>>> the charset it supports (ISO-8859-15, covering nearly all the Latin
>>>>>>>>>>> alphabets).
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Juan Vuletich
>>>>>>>>>>>
>>>>>>>>>>> Germán Arduino wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi:
>>>>>>>>>>>>
>>>>>>>>>>>> The first versions of Sport and Swazoo working in Cuis 4.1 with all
>>>>>>>>>>>> tests green are ready to install.
>>>>>>>>>>>>
>>>>>>>>>>>> The changes I did in Swazoo are:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> - Avoid Unicode support that doesn't exist in Cuis
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>> ......
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Cuis mailing list
>>>>>>>> Cuis at jvuletich.org
>>>>>>>> http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sincerely,
>>>>>>> Germán Arduino
>>>>>>> about.me/garduino
>>>>>>>