
[lojban] Re: Help parsing Lojban from Python? (Hey, Riley! :)



Oh, it turns out I was looking in the "wrong" repo:
https://github.com/lojban/camxes-py has the Python 3 stuff done
already, by mezohe.

On Sun, Aug 29, 2021 at 10:12:56AM -0400, Riley Lynch wrote:
> I'm going to be spending most of the day driving, but before I do that,
> I'll try to address a few questions here, and then can follow up by mail or
> IRC.
> 
> (1) I want to confirm that camxes-py is the preferred Python option
> these days
> 
> I'm not aware of other parsers in python. I specifically developed the
> parser because I wanted a python implementation to complement your java
> implementation and Masato and Ilmen's javascript parsers.

Well, there's https://github.com/lojban/python-camxes :)

> I notice now that Randall Holmes has developed a Python PEG parser for
> Loglan.
> 
> (2) I want to be able to run …  it in a direct, straightforward way … and
> tree should contain an obvious python representation of the
> parse tree
> (5) Make a mode that collapses productions with only one child
> 
> Running will be the easy part. The representation of the parse tree raises
> some interesting questions.

I'm *far* more interested in someone else doing the running part,
FWIW; I feel competent to play with the parse tree after the fact,
but I don't really know idiomatic Python so if I try to make a
library out of what's there it's going to suck.

> For camxes-py, I created a transformation of the parse tree which
> replicated the output of Ilmentufa. I did this so that I could run against
> the test corpus that you set up for java camxes and verify not only that
> the python parser could accept the same corpus as the java and javascript
> parsers, but that it was comprehending the same structures.
> 
> That said, the output exposes a lot of the mechanics of the parser
> specification and obscures the semantics. Ideally, I'd like for the test
> suites to target compatibility with a semantically-structured
> representation of the parse. There's been some work on Ilmentufa to
> post-process the parse tree into something more palatable. Have you taken a
> look at that?

Nope, I actually didn't realize that ilmentufa was a thing until
this conversation.  (I'd heard of it, but didn't know what it was.)

(Side comment: the "About" page for both camxes and jboski now
points to all alternatives I'm aware of.)

> (3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but
> works on 0.6.2
> 
> I wrote against the most recent version of parsimonious at that time. Glad
> to see work has continued. I remember the author was working on some
> performance enhancements, and one problem with camxes-py in its current
> form is that it is slow.

Again, I'm perfectly happy to do that part, fwiw.

> (4) Update to Python 3
> 
> I agree that this should be done.
> 
> 
> On Sat, Aug 28, 2021 at 12:02 AM Robin Lee Powell <robinleepowell@gmail.com>
> wrote:
> 
> > Feel free to come find me on Libera IRC, or suggest a preferred chat
> > option for you.
> >
> > The stuff I want is actually quite simple, though:
> >
> > (1) I want to confirm that camxes-py is the preferred Python option
> > these days
> >
> > (2) I want to be able to run "run" (see
> > https://github.com/teleological/camxes-py/blob/master/camxes.py#L89
> > ) or something like it in a direct, straightforward way, i.e.:
> >
> >           import camxespy
> >           tree = camxespy.run("mi klama", transformer='camxes-morphology')
> >
> > , and tree should contain an obvious python representation of the
> > parse tree.
> >
> > This requires, AFAICT (I don't actually know Python very well) that
> > camxes-py have a library structure to it that it doesn't currently
> > have and that the options be configurable in some way other than
> > OptionParser.
> >
> > I can actually do all that myself, but I'm not really a pythonista
> > and what I do won't be idiomatic at all.
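
One way to get a library-shaped interface is to keep all option handling in a thin CLI wrapper and put the real work in an importable function. The following is a minimal sketch of that split, not camxes-py's actual code: `parse`, `register_transformer`, `TRANSFORMERS`, and the "raw" transformer are hypothetical names, and `parse` is only a whitespace-splitting placeholder standing in for the real PEG parse.

```python
import argparse
import json

# Hypothetical registry mapping a transformer name to a callable.
TRANSFORMERS = {}


def register_transformer(name):
    """Decorator that records a tree transformer under the given name."""
    def register(func):
        TRANSFORMERS[name] = func
        return func
    return register


@register_transformer("raw")
def raw(tree):
    """Return the parse tree unchanged."""
    return tree


def parse(text):
    """Placeholder for the real parser: just splits on whitespace."""
    return ["text", *text.split()]


def run(text, transformer="raw"):
    """Library entry point: parse text and return plain Python data."""
    try:
        transform = TRANSFORMERS[transformer]
    except KeyError:
        raise ValueError(f"unknown transformer: {transformer!r}") from None
    return transform(parse(text))


def main(argv=None):
    """Thin CLI wrapper: option parsing lives here, not in run()."""
    parser = argparse.ArgumentParser(description="Parse Lojban text")
    parser.add_argument("text")
    parser.add_argument("-t", "--transformer", default="raw",
                        choices=sorted(TRANSFORMERS))
    args = parser.parse_args(argv)
    print(json.dumps(run(args.text, transformer=args.transformer)))
    return 0
```

With this layout a caller can write `tree = run("mi klama")` after importing the module, while a script entry point only needs to call `main()`. Options become ordinary keyword arguments instead of values pulled out of a command-line parser.
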
> >
> > Stretch goals:
> >
> > (3) Update to most-recent parsimonious; it currently breaks on
> > 0.8.1, but works on 0.6.2
> >
> > (4) Update to Python 3, but I'm perfectly capable of making a PR for
> > this myself.
> >
> > (5) Make a mode that collapses productions with only one child, i.e.
> > make the output look like this (in terms of productions not syntax):
> >
> >         rlpowell@stodi> echo "mi klama" | camxes -f
> >         Flat layout requested.
> >          text=(  sentence=(  CMAVO=(  KOhA=( mi )  )  BRIVLA=(  gismu=( klama )  )  )  )
> >
> > Instead of this:
> >
> > root@66324b4aed4b:/src# python camxes.py "mi klama"
> >
> > ["text",["text_1",["paragraphs",["paragraph",["statement",["statement_1",["statement_2",["statement_3",["sentence",[["terms",["terms_1",["terms_2",["abs_term",["abs_term_1",["sumti",["sumti_1",["sumti_2",["sumti_3",["sumti_4",["sumti_5",["sumti_6",["KOhA_clause",[["KOhA","mi"]]]]]]]]]]]]]]],["CU"]],["bridi_tail",["bridi_tail_1",["bridi_tail_2",["bridi_tail_3",["selbri",["selbri_1",["selbri_2",["selbri_3",["selbri_4",["selbri_5",["selbri_6",["tanru_unit",["tanru_unit_1",["tanru_unit_2",["BRIVLA_clause",[["BRIVLA",["gismu","klama"]]]]]]]]]]]]]],["tail_terms",["VAU"]]]]]]]]]]]]]]]
> >
> > , but as I said before this is not hard to do after the fact once
> > you have the parse tree.
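
The after-the-fact collapsing can be sketched in a few lines over the nested-list form shown above. This is a minimal sketch under some assumptions about that form: a list whose first element is a string is a labeled production, any other list is an unlabeled group of sibling nodes, and a label with nothing under it (such as ["CU"]) is an elided terminator that can be dropped. The function names are made up for illustration, and the sample tree is hand-abbreviated from the quoted output. Unlike the "camxes -f" layout, this version keeps only the innermost label of each single-child chain, so BRIVLA and CMAVO do not survive.

```python
def _collapse(node):
    """Return a list of zero or more simplified nodes for `node`."""
    if isinstance(node, str):
        return [node]
    labeled = bool(node) and isinstance(node[0], str)
    children = []
    for child in (node[1:] if labeled else node):
        children.extend(_collapse(child))
    if not labeled:
        return children        # splice unlabeled groups into the parent
    if not children:
        return []              # elided terminator such as ["CU"]
    if len(children) == 1 and isinstance(children[0], list):
        return children        # single-child production: keep only the child
    return [[node[0], *children]]


def collapse(tree):
    """Collapse single-child productions in a nested-list parse tree."""
    result = _collapse(tree)
    return result[0] if len(result) == 1 else result


def render(node):
    """Render a collapsed, labeled tree in a flat label=( ... ) style."""
    if isinstance(node, str):
        return node
    label, *children = node
    return f"{label}=( {' '.join(render(c) for c in children)} )"


# Hand-abbreviated version of the "mi klama" tree, same nesting conventions.
tree = [
    "text",
    ["paragraphs",
     ["sentence",
      [["terms", ["sumti", ["KOhA_clause", [["KOhA", "mi"]]]]],
       ["CU"]],
      ["bridi_tail",
       ["selbri", ["BRIVLA_clause", [["BRIVLA", ["gismu", "klama"]]]]],
       ["tail_terms", ["VAU"]]]]],
]

flat = collapse(tree)
print(render(flat))
```

Keeping the innermost label is a choice made for brevity; a variant that preserves selected outer labels (for example everything ending in "_clause" or the word classes) would need an explicit keep-list.
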
> >
> >
> > On Fri, Aug 27, 2021 at 11:15:54PM -0400, Riley Martinez-Lynch
> > wrote:
> > > Robin, I'd be happy to make whatever changes are needed to make it
> > > work. I don't see the CLI interface as an essential part of the
> > > interface, and if I can do something to make it easier to access
> > > programmatically, I'd like to do that. Glad to take cues here, or
> > > if you wanted to jump on a call or chat, can do that too.
> > >
> > > Sent from my iPhone
> > >
> > > > On Aug 26, 2021, at 10:11 PM, Robin Lee Powell <robinleepowell@gmail.com> wrote:
> > > >
> > > > 
> > > > In service to making certain parts of the lojban.org infra a bit
> > > > more resilient, I'm updating some stuff that uses
> > > > https://github.com/lojban/python-camxes .  This relies on java and
> > > > the camxes jar, which, whatever, but it's also built on LEPL, which
> > > > no longer works (see for example
> > > > https://github.com/modoboa/modoboa/issues/1780 ).
> > > >
> > > > https://github.com/teleological/camxes-py is a pure Python
> > > > replacement, but is a CLI program rather than a library; it's really
> > > > not designed to be used as a library.  I'd love it if someone
> > > > updated and fixed that.
> > > >
> > > > Unless there's another option?  What's the state of the art in this
> > > > space?
> > > >
> >
