[Cuis] About adding a Unicode handling porting layer

H. Hirzel hannes.hirzel at gmail.com
Wed Feb 6 06:21:56 CST 2013


Hello Angel

On 2/6/13, Angel Java Lopez <ajlopez2000 at gmail.com> wrote:
> Hi people!
>
> I just found:
> http://wiki.squeak.org/squeak/857 Unicode at Squeak
> http://www.is.titech.ac.jp/~ohshima/squeak/
> http://www.is.titech.ac.jp/~ohshima/squeak/squeak-multilingual-e.html (pending,
> to read)

Thank you for reminding us of these documents. They contain
information about the implementation of Unicode in Squeak 3.8 which
was released in 2005.

I have added the references you sent to the UnicodeNotes.md document
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md

> It's not clear to me (I'm not a Smalltalker/Squeaker/Cuiser):
>
> Is Ohshima's work a change in the Squeak VM, or in the String class using
> pure Smalltalk/Squeak?

It is mainly more Smalltalk code (String, ByteString, WideString,
MultiByteFileStream, TextConverter, UTF8TextConverter and many more),
but in addition certain changes had to be made to the virtual
machine. For example, the clipboard is now in Unicode (UTF8).

> Why is that work not included in Cuis? Can it not be ported?

That is what we are aiming at here   :-)

The question is what exactly? And how should we adapt/change it? Make
it simpler?

I have started a repository
     https://github.com/hhzl/Cuis-Multilingual-TextConversion

where I am copying three classes from Squeak at the moment

    https://github.com/hhzl/Cuis-Multilingual-TextConversion/tree/master/CopiedFromSqueak

Actually I copy only two classes in full. For the abstract class
TextConverter I filed in only the class definition, and I am now adding
the methods I need one by one. Maybe I will fold that code into
	UTF8TextConverter later.


The reason why it was not ported by Juan is that he wanted to focus on
Morphic and leave out some complex subsystems like Unicode support,
Monticello and others.

The Unicode support in Squeak models 'language'.

For example, in Squeak 4.4 the TextConverter class refers to
a LanguageEnvironment:

TextConverter class>>defaultSystemConverter

	^LanguageEnvironment defaultSystemConverter


and then

LanguageEnvironment class>>defaultSystemConverter

	SystemConverterClass ifNil: [
		SystemConverterClass := self currentPlatform class systemConverterClass].
	^ SystemConverterClass new.


which refers to class Locale in the category 'System-Localization'

So the question is what should be adapted.

The current Character class in Cuis is 8 bit only. Not that there
couldn't be more, as the character values are integers which are 32 bit,
but it is restricted on purpose.
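
To make the 8 bit limit concrete, here is a small workspace sketch. It
is only an illustration and uses plain integers, not the Character
class; the sample code points are my own choice. It marks which Unicode
code points would fit into 8 bits.

```smalltalk
"Workspace sketch: Unicode code points handled as plain integers.
 The sample values are illustrative: U+0041, U+00E9, U+20AC, U+1F600."
| codePoints |
codePoints := #(65 233 8364 128512).
"Answer, for each code point, whether it fits into 8 bits."
codePoints collect: [:cp | cp <= 255]
```

Note that the 8 bit value and the Unicode code point need not be the
same number: in ISO8859-15 the euro sign is byte value 164, whereas its
Unicode code point is 8364 (U+20AC).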

What is named String in Cuis is a ByteString in Squeak. Juan has
reworked the Character / String classes considerably. It is a nice
implementation for ISO8859-15 and in some cases surpasses what is in
Squeak. And it is more 'compact' and 'cleaner'. And it has 'hooks' for
Unicode as outlined here

https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md#implementation-in-cuis-41

At the moment I want to focus on a library which, when added, permits
Cuis 4.1 to read and write UTF8 files. This is possible as of now but
not in the File List (see
https://github.com/hhzl/Cuis-Add-Ons/blob/master/UnicodeNotes.md#implementation-in-cuis-41
with the screen shot here
http://jvuletich.org/pipermail/cuis_jvuletich.org/attachments/20130205/915f4469/attachment-0001.png)
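
To show what reading UTF8 involves at the byte level, here is a minimal
workspace sketch. It is not the code of UTF8TextConverter in Squeak and
not what is in my repository; it is a hypothetical illustration which
assumes well-formed input and does no error checking. It turns a
ByteArray of UTF8 bytes into a collection of integer code points, using
only basic messages (bitAnd:, bitShift:, bitOr:).

```smalltalk
"Workspace sketch (hypothetical, not the Squeak UTF8TextConverter code):
 decode a ByteArray holding well-formed UTF8 into Unicode code points."
| bytes codePoints i byte extra cp |
bytes := #[72 195 169 226 130 172].	"UTF8 for H, U+00E9, U+20AC"
codePoints := OrderedCollection new.
i := 1.
[i <= bytes size] whileTrue: [
	byte := bytes at: i.
	"The lead byte tells how many continuation bytes follow."
	byte < 16r80
		ifTrue: [extra := 0. cp := byte]
		ifFalse: [
			byte < 16rE0
				ifTrue: [extra := 1. cp := byte bitAnd: 16r1F]
				ifFalse: [
					byte < 16rF0
						ifTrue: [extra := 2. cp := byte bitAnd: 16r0F]
						ifFalse: [extra := 3. cp := byte bitAnd: 16r07]]].
	"Each continuation byte contributes its low six bits."
	1 to: extra do: [:k |
		cp := (cp bitShift: 6) bitOr: ((bytes at: i + k) bitAnd: 16r3F)].
	codePoints add: cp.
	i := i + extra + 1].
codePoints asArray	"print it: #(72 233 8364)"
```

The last two code points show the issue for Cuis: 233 still fits into
an 8 bit Character, 8364 does not, so a converter has to decide what to
do with such values.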

--Hannes


> Angel "Java" Lopez
> @ajlopez
>
> On Wed, Feb 6, 2013 at 8:24 AM, H. Hirzel <hannes.hirzel at gmail.com> wrote:
>
>> Ken
>>
>> Having a comparison of a specification/implementation of a simple
>> Unicode layer in another language is helpful.
>>
>> https://code.google.com/p/chibi-scheme/source/browse/lib/scheme/char.sld
>> [1]
>>
>> So my aim is at doing something similar in the sense that I want to
>> leave Cuis 4.1 more or less as is (maybe minor corrections) and then
>> have an Add-On for more Unicode support.
>>
>> Thank you
>>
>> --Hannes
>>
>>
>>
>> [1]
>> (define-library (scheme char)
>> (import (scheme base))
>> (cond-expand
>> (full-unicode
>> (import (chibi char-set full)
>> (chibi char-set base)
>> (chibi iset base))
>> (include "char/full.scm")
>> (include "char/case-offsets.scm"))
>> (else
>> (include "char/ascii.scm")
>> (import
>> (only (chibi)
>> string-ci<=? string-ci<? string-ci=? string-ci>=? string-ci>?
>> char-ci<=? char-ci<? char-ci=? char-ci>=? char-ci>?
>> char-alphabetic? char-lower-case? char-numeric?
>> char-upper-case? char-whitespace? digit-value
>> char-upcase char-downcase))))
>> (include "digit-value.scm")
>> (export
>> char-alphabetic? char-ci<=? char-ci<? char-ci=? char-ci>=? char-ci>?
>> char-downcase char-foldcase char-lower-case? char-numeric?
>> char-upcase char-upper-case? char-whitespace? digit-value
>> string-ci<=? string-ci<? string-ci=? string-ci>=? string-ci>?
>> string-downcase string-foldcase string-upcase))
>>
>> On 2/6/13, Ken Dickey <Ken.Dickey at whidbey.com> wrote:
>> > On Tue, 5 Feb 2013 09:20:24 +0000
>> > "H. Hirzel" <hannes.hirzel at gmail.com> wrote:
>> >
>> >> Hello all
>> >>
>> >> In the meantime I am investigating how to construct a small library
>> >> which works with WideCharacters and WideStrings and the FileStream and
>> >> UTF8Converter which deals with it.
>> >
>> > Hannes,
>> >
>> > Indeed Unicode is moby complex.
>> >
>> >       http://www.unicode.org/versions/Unicode6.2.0/
>> >
>> > I don't know if it helps, but Scheme has probably the minimal defined
>> > Unicode support -- basically read/write, code points, comparisons, and
>> > up/down-casing. The scheme standards group has argued Unicode implementation
>> > features for years. [See 7th draft]
>> >       http://scheme-reports.org/2012/working-group-1.html
>> >
>> > Chibi-Scheme is a bytecode implementation written in C which implements this
>> > support.
>> >
>> >       https://code.google.com/p/chibi-scheme/
>> >
>> > This might be a stretch, but the implementation strategy has been gone over
>> > by many eyeballs.
>> >
>> > $0.02,
>> > -KenD
>> >
>> > _______________________________________________
>> > Cuis mailing list
>> > Cuis at jvuletich.org
>> > http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
>> >
>>
>>
>



