[Cuis] Ropes & Unicode

Sat Feb 16 12:16:04 CST 2013

On 2/16/13, H. Hirzel <hannes.hirzel at gmail.com> wrote:
> Interesting observation, Ken
>
> This may be considered confirmation that we should move on with the
> implementation of Ropes.
>
> According to http://static.rust-lang.org/doc/0.5/std/rope.html
>
> "Ropes are a high-level representation of text that offers much better
> performance than strings for common operations, and generally reduce
> memory allocations and copies, while only entailing a small
> degradation of less common operations."
>
> .....
> "In addition, the tree structure of ropes makes them suitable as a
> form of index to speed-up access to Unicode characters by index in
> long chunks of text."
>
>
> And the basic string type in Rust holds UTF-8 encoded characters
> http://dl.rust-lang.org/doc/0.3/tutorial.html   (in version 0.3)
>
> Should the Rust language Ropes API
>     http://static.rust-lang.org/doc/0.5/std/rope.html#type-rope
> be taken as a model for the Cuis implementation?
>
> So far class Rope in the Cuis implementation has 11 methods:
>
>     Rope selectors
>      a Set(#asString
>               #,
>               #stringRepresentation
>               #first
>               #doesNotUnderstand:
>               #last
>               #asText
>               #printOn:
>               #copyReplaceFrom:to:with:
>               #printString
>               #asRope)

Actually, I realize that the subclasses of Rope
        FlatRope,
        ConcatRope and
        SubRope
also implement
   #at:
   #at:put:
   #size
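To make the division of labour between the three subclasses concrete, here is a minimal sketch in Rust (to match the Rope API discussed in this thread). It is not the Cuis code and not Rust's old `std::rope` module: the enum and the method names `flat`, `concat`, `sub`, `size` and `at` are hypothetical stand-ins for FlatRope, ConcatRope, SubRope, `#,`, `#size` and `#at:`. It assumes SubRope is a window into another rope, uses 0-based indices (Smalltalk's `#at:` is 1-based), and stores leaves as `Vec<char>` for constant-time character access rather than packed UTF-8. `#at:put:` is left out because mutating a node that other ropes share would need copy-on-write.

```rust
use std::rc::Rc;

/// Hypothetical Rust analogue of the Cuis Rope hierarchy.
/// Indices are 0-based, unlike Smalltalk's 1-based #at:.
#[derive(Debug)]
enum Rope {
    /// Leaf that owns its characters (plays the role of FlatRope).
    Flat(Vec<char>),
    /// Two ropes joined end to end, total length cached (like ConcatRope).
    Concat { left: Rc<Rope>, right: Rc<Rope>, len: usize },
    /// Window of `len` characters starting at `start` in another rope
    /// (assumed semantics of SubRope).
    Sub { base: Rc<Rope>, start: usize, len: usize },
}

impl Rope {
    /// Builds a leaf from a string slice.
    fn flat(s: &str) -> Rc<Rope> {
        Rc::new(Rope::Flat(s.chars().collect()))
    }

    /// Number of characters (analogue of #size).
    fn size(&self) -> usize {
        match self {
            Rope::Flat(chars) => chars.len(),
            Rope::Concat { len, .. } | Rope::Sub { len, .. } => *len,
        }
    }

    /// Character at `index` (analogue of #at:), or None when out of range.
    fn at(&self, index: usize) -> Option<char> {
        match self {
            Rope::Flat(chars) => chars.get(index).copied(),
            Rope::Concat { left, right, len } => {
                let left_len = left.size();
                if index >= *len {
                    None
                } else if index < left_len {
                    left.at(index)
                } else {
                    right.at(index - left_len)
                }
            }
            Rope::Sub { base, start, len } => {
                if index < *len {
                    base.at(*start + index)
                } else {
                    None
                }
            }
        }
    }

    /// Concatenation (analogue of #,): one new node, both operands shared.
    fn concat(left: &Rc<Rope>, right: &Rc<Rope>) -> Rc<Rope> {
        let len = left.size() + right.size();
        Rc::new(Rope::Concat { left: Rc::clone(left), right: Rc::clone(right), len })
    }

    /// Window of `len` characters from `start`; panics if it runs past the end.
    fn sub(base: &Rc<Rope>, start: usize, len: usize) -> Rc<Rope> {
        assert!(start + len <= base.size(), "sub-rope out of range");
        Rc::new(Rope::Sub { base: Rc::clone(base), start, len })
    }
}

fn main() {
    let hello = Rope::flat("Hello, ");
    let world = Rope::flat("wörld");
    let both = Rope::concat(&hello, &world);
    let tail = Rope::sub(&both, 7, 5);
    println!("{} {:?} {:?}", both.size(), both.at(8), tail.at(0));
}
```

In this sketch concatenation allocates a single `Concat` node that shares both operands, so its cost does not depend on the length of the text; the trade-off is that `at` walks down the tree, so indexing gets slower as the tree gets deeper.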


> Interesting candidates from the Rust language Rope API are:
>
> Function append_char - Add one char to the end of the rope
> Function prepend_char - Add one char to the beginning of the rope
> Function append_str - Add one string to the end of the rope
>
> Function char_at - The character at position pos
> Function char_len - The number of characters in the rope
> Function cmp - Compare two ropes by Unicode lexicographical order.
> Function eq - Returns true if both ropes have the same content
> (regardless of their structure), false otherwise
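> Function ge - Returns true if the first rope is greater than or equal to
> the second (in the order defined by cmp), false otherwise
> Function gt - Returns true if the first rope is strictly greater than
> the second (in the order defined by cmp), false otherwise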
> Function iter_chars - Loop through a rope, char by char, until the end.
>
> I would say the high priorities are:
>
> a) Rope construction (i.e. appending and prepending instances of
> Character and String)
> b) streaming over a Rope. This is a typical operation when you deal
> with a large text file.
> c) finding a subrope
>
> And, of course, performance tests with random data to find the point
> at which Ropes become more efficient than Strings.
>
> --Hannes
>
> On 2/16/13, Ken Dickey <Ken.Dickey at whidbey.com> wrote:
>> BTW,
>>
>> Doing a web search on +Rope +Unicode, I found that Mozilla is developing
>> a programming language called Rust which uses Ropes with packed UTF-8
>> strings.
>>
>> The internal documentation suggests that heavy users of strings should
>> use ropes instead.
>>
>> Note:
>> 	http://static.rust-lang.org/doc/0.5/std/rope.html
>>
>> FYI,
>> -KenD
>>
>> _______________________________________________
>> Cuis mailing list
>> Cuis at jvuletich.org
>> http://jvuletich.org/mailman/listinfo/cuis_jvuletich.org
>>
>
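Hannes's three priorities (construction, streaming, finding a subrope), plus the structure-independent `eq` from the Rust list, can be sketched together. This is a hypothetical Rust sketch, not the Cuis implementation and not Rust's `std::rope` API: the `Rope` type and the methods `append_str`, `prepend_str`, `append_char`, `prepend_char`, `chars`, `find` and `content_eq` are illustrative, loosely named after the Rust functions listed above. Each operation returns a new rope that shares the old one; streaming walks the leaves with an explicit stack so the text is never flattened, and `find` is a naive sliding-window search that returns a character index.

```rust
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical persistent rope: leaves hold immutable text, inner nodes join
/// two ropes. Every operation returns a new rope that shares the old one.
#[derive(Clone)]
enum Rope {
    Leaf(Rc<str>),
    Node(Rc<Rope>, Rc<Rope>, usize), // left, right, cached character count
}

impl Rope {
    fn new(s: &str) -> Rope {
        Rope::Leaf(Rc::from(s))
    }

    /// Number of characters (cf. Rust's char_len); counting a leaf is O(length).
    fn char_len(&self) -> usize {
        match self {
            Rope::Leaf(s) => s.chars().count(),
            Rope::Node(_, _, n) => *n,
        }
    }

    fn join(left: Rope, right: Rope) -> Rope {
        let n = left.char_len() + right.char_len();
        Rope::Node(Rc::new(left), Rc::new(right), n)
    }

    // (a) Construction: only the new text is allocated; the old rope is shared.
    fn append_str(&self, s: &str) -> Rope { Rope::join(self.clone(), Rope::new(s)) }
    fn prepend_str(&self, s: &str) -> Rope { Rope::join(Rope::new(s), self.clone()) }
    fn append_char(&self, c: char) -> Rope { self.append_str(&c.to_string()) }
    fn prepend_char(&self, c: char) -> Rope { self.prepend_str(&c.to_string()) }

    // (b) Streaming: visit leaves left to right with an explicit stack,
    // so the text is never flattened into one big string.
    fn leaves(&self) -> impl Iterator<Item = &str> + '_ {
        let mut stack: Vec<&Rope> = vec![self];
        std::iter::from_fn(move || {
            while let Some(node) = stack.pop() {
                match node {
                    Rope::Leaf(s) => return Some(&**s),
                    Rope::Node(left, right, _) => {
                        stack.push(&**right); // pushed first, so left is visited first
                        stack.push(&**left);
                    }
                }
            }
            None
        })
    }

    fn chars(&self) -> impl Iterator<Item = char> + '_ {
        self.leaves().flat_map(|leaf| leaf.chars())
    }

    // (c) Finding a sub-rope: character index of the first match of `pat`,
    // using a naive sliding window over the character stream (O(n * m)).
    fn find(&self, pat: &str) -> Option<usize> {
        let pat: Vec<char> = pat.chars().collect();
        if pat.is_empty() {
            return Some(0);
        }
        let mut window: VecDeque<char> = VecDeque::with_capacity(pat.len());
        for (i, c) in self.chars().enumerate() {
            if window.len() == pat.len() {
                window.pop_front();
            }
            window.push_back(c);
            if window.len() == pat.len() && window.iter().eq(pat.iter()) {
                return Some(i + 1 - pat.len());
            }
        }
        None
    }

    /// Same characters regardless of tree shape (cf. Rust's eq).
    fn content_eq(&self, other: &Rope) -> bool {
        self.char_len() == other.char_len() && self.chars().eq(other.chars())
    }
}

fn main() {
    let r = Rope::new("wörld").prepend_str("Hello, ").append_char('!');
    let text: String = r.chars().collect();
    println!("{} {} {:?}", text, r.char_len(), r.find("ld!"));
    println!("{}", r.content_eq(&Rope::new("Hello, wörld!")));
}
```

The naive search is enough to show that a match may span leaf boundaries; a serious implementation would likely use a better string-search algorithm, and the random-data performance tests Hannes mentions are what would show where such a rope starts to beat a plain String.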