[Cuis] Why or why not OMeta? (was Re: Brainstorming question: what non-trivial uses can you think of for an object-based parser? (strings not invited))

Fri May 22 14:49:40 CDT 2015

Hi Thierry,

On Fri, 2015-05-22 at 09:56 +0200, Thierry Goubier wrote:
> Hi all,
> 
> 
> first post here about Cuis, and this is a question I am interested
> in... I do believe the Viewpoints Research Institute's documents have
> a few answers about that (parsers for network protocols, etc...). But
> still...
> 

I could see network protocols as another time-series application.

> 
> I'm in a strange position about OMeta, which is that I don't see the
> benefits. I have the same position about PetitParser, but with even
> worse data points: I know precisely the performance loss of going the
> PetitParser way.
> 

Not strange at all given where it sounds like you're coming from re:
performance being a key requirement.  I won't try to put any spin on it:
everything I've seen indicates that OMeta is among the slowest parsers
out there, but pretty quick given its approach.  Computing power being
what it is today, for many applications the response is 'it's fast
enough' or 'who cares?' (see the World Wide Web, client- and server-side,
for a perfect example).  I would imagine that if you have heavy data processing
workloads or have very specific response time requirements, then you do
care and OMeta wouldn't work for the application.  However, as a
language for DSLs, at most you're typically only going to see a small
fraction of a second of overhead.  Another way to think of it: if speed
OF the solution is the priority, don't use OMeta.  If speed TO the
solution is the priority, that's what OMeta does well.  I'll get more
specific below...

> 
> I have been writing compiler front-ends for the past 7 years, first
> with Flex / Bison and C, and then with Smalltalk / SmaCC (I maintain
> SmaCC for Pharo). I see the work done by John Brant and Don Roberts
> first hand (RB, SmaCC, generalised refactoring in SmaCC) and I know
> that both OMeta and PetitParser use what is, for me, a very limited
> form of parsing, with a large performance penalty on top. Moreover,
> grammars produced in the PetitParser case are as long as, if not
> longer than, the equivalent SmaCC grammar.
> 

I believe this is one of the areas where OMeta is quite strong: its
grammars are short... very short... 'where did the grammar go?' short.
Consider this example I posted earlier to parse Squeak array
constructors.  Here is the Smalltalk version (i.e. what OMeta is
actually doing behind the scenes):

arrayConstr
        ^ self ometaOr: {
                [true ifTrue: [
                        self apply: #token withArgs: {'{'}.
                        self apply: #expr.
                        self many: [true ifTrue: [
                                self apply: #token withArgs: {'.'}.
                                self apply: #expr]].
                        self ometaOr: {
                                [self apply: #token withArgs: {'.'}].
                                [self apply: #empty]}.
                        self apply: #token withArgs: {'}'}]].
                [true ifTrue: [
                        self apply: #token withArgs: {'{'}.
                        self apply: #token withArgs: {'}'}]]}

and here's the OMeta version:

arrayConstr =

        "{" expr ("." expr)* ("." | empty) "}"
|       "{" "}"

The only things missing are semantic predicates and actions, so the
ultimate size and readability will be dictated more by how much
Smalltalk code it takes to actually do the work with what OMeta has
parsed.
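
For anyone who doesn't read Smalltalk, here is a rough Python sketch of
the same idea: ordered choice with backtracking, plus a 'many' loop.
This is not OMeta and not its API.  The names (ArrayParser,
ordered_choice, many) are made up for illustration, 'expr' is a stand-in
that only accepts a single integer token, and the returned lists play
the role of the semantic actions the grammar above leaves out.

```python
class ParseError(Exception):
    """Raised when a rule fails to match at the current position."""


class ArrayParser:
    """Hypothetical PEG-style parser over a list of pre-split tokens."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def token(self, expected):
        # Consume one token if it equals `expected`, otherwise fail.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == expected:
            self.pos += 1
            return expected
        raise ParseError(f"expected {expected!r} at position {self.pos}")

    def ordered_choice(self, *alternatives):
        # Try each alternative in order; rewind the input after a failure.
        start = self.pos
        for alternative in alternatives:
            try:
                return alternative()
            except ParseError:
                self.pos = start
        raise ParseError(f"no alternative matched at position {start}")

    def many(self, rule):
        # Apply `rule` zero or more times, undoing the last partial match.
        results = []
        while True:
            start = self.pos
            try:
                results.append(rule())
            except ParseError:
                self.pos = start
                return results

    def expr(self):
        # Stand-in rule (assumption): an expression is one integer token.
        if self.pos < len(self.tokens) and isinstance(self.tokens[self.pos], int):
            value = self.tokens[self.pos]
            self.pos += 1
            return value
        raise ParseError(f"expected an expression at position {self.pos}")

    def dot_expr(self):
        # Matches: "." expr
        self.token(".")
        return self.expr()

    def array_constr(self):
        # "{" expr ("." expr)* ("." | empty) "}"  |  "{" "}"
        def non_empty():
            self.token("{")
            first = self.expr()
            rest = self.many(self.dot_expr)
            self.ordered_choice(lambda: self.token("."), lambda: None)
            self.token("}")
            return [first] + rest

        def empty():
            self.token("{")
            self.token("}")
            return []

        return self.ordered_choice(non_empty, empty)
```

Calling ArrayParser(["{", 1, ".", 2, ".", "}"]).array_constr() walks the
first alternative, including the optional trailing dot; the input
["{", "}"] fails inside the first alternative, rewinds, and matches the
second.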

> 
> So what are the benefits of OMeta? Note that SmaCC would very easily
> do parsing over any kind of objects, not only tokens.
> 

I understand that OMeta isn't unique in being an object parser and I
started this thread mainly because I'm wondering how much value people
can see in parsing things other than text/binary streams.  i.e. is it a
genuinely useful feature or a gimmick/freebie that won't see much use?
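
To make 'parsing objects' concrete, here is a small Python sketch of
what I have in mind.  It is only an illustration under my own
assumptions: the Event class, the EventParser class and the
start/sample/stop 'grammar' are all invented for this example and have
nothing to do with OMeta's actual implementation.  The point is just
that the input is a list of objects, and the rule matches on their
attributes rather than on characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical time-series record; `kind` is what rules match on."""
    kind: str
    value: float = 0.0


class ParseError(Exception):
    """Raised when a rule fails to match at the current position."""


class EventParser:
    """Hypothetical parser whose input is a list of Event objects."""

    def __init__(self, events):
        self.events = list(events)
        self.pos = 0

    def event(self, kind):
        # Consume one object if its `kind` matches, otherwise fail.
        if self.pos < len(self.events) and self.events[self.pos].kind == kind:
            matched = self.events[self.pos]
            self.pos += 1
            return matched
        raise ParseError(f"expected a {kind!r} event at position {self.pos}")

    def many1(self, rule):
        # One or more matches; rewind past the final failed attempt.
        results = [rule()]
        while True:
            start = self.pos
            try:
                results.append(rule())
            except ParseError:
                self.pos = start
                return results

    def session(self):
        # session = start sample+ stop   -> the list of sample values
        self.event("start")
        samples = self.many1(lambda: self.event("sample"))
        self.event("stop")
        return [sample.value for sample in samples]
```

Feeding it [Event("start"), Event("sample", 1.5), Event("sample", 2.0),
Event("stop")] yields the two sample values; a start followed directly
by a stop raises ParseError because the rule requires at least one
sample.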

As to the first part of your question, here goes:  The fundamental
concept that really grabs me is that OMeta is written in the host
language and uses source-to-source translation to target the host
language, essentially hijacking the host language and environment so
that the parser fades into the background.  Want a new DSL?  Subclass
OMeta2 and add methods with your rules... done.  Want a new dialect of
said DSL?  Subclass your first DSL and tweak as needed.  Want to write a
program in your DSL?  Create a new class and set up the compiler for
that class to use your parser as its 'Language'.  For example, I could
create a subclass called Lisp and write every method in that class as
either pure Lisp or as a hybrid of Lisp, Smalltalk, and any other DSLs I
had created, provided I set up the parsing correctly.  I'm not aware of
any other parser that does it quite so elegantly.

Now here are the downsides:  Alex, the original author of OMeta, is a
parser / languages guy.  This work was related to his employment at VPRI
and subsequent PhD work.  He's since moved on to other things and
there's still a lot missing from OMeta on Smalltalk in terms of tooling
to actually realize the vision.  The lack of debugging support will
drive you nuts until you get used to what it's telling you:  have a
syntax error in your rules? '<-- parse error around here -->'... have
fun! A semantic error in your parser? Get used to looking at your
decompiled code (i.e. the actual Smalltalk it generates) when things go
wrong to figure it out.  Have a logic/runtime error (i.e. your generated
code is sending a message to nil)?  Ditto re: looking at the decompiled
code when it crashes while running.  When everything is correct and
working, OMeta is pure joy.  When it isn't, welcome back to 1980s-style
debugging.  Also, if you have an ambiguous grammar, look elsewhere...
OMeta won't work for you.  Finally, as I mentioned at the top, OMeta
isn't going to set any new parser speed records.

> 
> Thierry
> 

Hope this helps,
Phil