
[lojban] Re: Help parsing Lojban from Python? (Hey, Riley! :)



I'm going to be spending most of the day driving, but before I do that, I'll try to address a few questions here, and then we can follow up by mail or IRC.

(1) I want to confirm that camxes-py is the preferred Python option
these days

I'm not aware of other parsers in Python. I specifically developed the parser because I wanted a Python implementation to complement your Java implementation and Masato and Ilmen's JavaScript parsers.

I notice now that Randall Holmes has developed a Python PEG parser for Loglan.

(2) I want to be able to run …  it in a direct, straightforward way … and tree should contain an obvious python representation of the
parse tree
(5) Make a mode that collapses productions with only one child

Running will be the easy part. The representation of the parse tree raises some interesting questions.

For camxes-py, I created a transformation of the parse tree which replicated the output of Ilmentufa. I did this so that I could run against the test corpus that you set up for Java camxes and verify not only that the Python parser could accept the same corpus as the Java and JavaScript parsers, but that it was comprehending the same structures.

That said, the output exposes a lot of the mechanics of the parser specification and obscures the semantics. Ideally, I'd like for the test suites to target compatibility with a semantically-structured representation of the parse. There's been some work on Ilmentufa to post-process the parse tree into something more palatable. Have you taken a look at that?

(3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but works on 0.6.2

I wrote against the most recent version of parsimonious at that time. Glad to see work has continued. I remember the author was working on some performance enhancements, and one problem with camxes-py in its current form is that it is slow.

(4) Update to Python 3

I agree that this should be done.


On Sat, Aug 28, 2021 at 12:02 AM Robin Lee Powell <robinleepowell@gmail.com> wrote:
Feel free to come find me on Libera IRC, or suggest a preferred chat
option for you.

The stuff I want is actually quite simple, though:

(1) I want to confirm that camxes-py is the preferred Python option
these days

(2) I want to be able to run "run" (see
https://github.com/teleological/camxes-py/blob/master/camxes.py#L89
) or something like it in a direct, straightforward way, i.e.:

          import camxespy
          tree = camxespy.run("mi klama", transformer='camxes-morphology')

, and tree should contain an obvious python representation of the
parse tree.

This requires, AFAICT (I don't actually know Python very well) that
camxes-py have a library structure to it that it doesn't currently
have and that the options be configurable in some way other than
OptionParser.
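
To make that request concrete, here is a minimal sketch of the usual
Python pattern: a plain library function that takes keyword arguments,
plus a thin command-line wrapper around it. Everything named here is
an assumption for illustration. `_stub_parse` is a placeholder that
stands in for the real parser, and `run`/`main` are hypothetical names,
not the current camxes-py API. The only transformer name used is the
one from the example above.

```python
import argparse
import json


def _stub_parse(text, transformer):
    # Placeholder only: stands in for the real parser so the sketch runs.
    # It is NOT camxes-py code and ignores the transformer name.
    return ["text", text.split()]


def run(text, transformer="camxes-morphology", backend=_stub_parse):
    """Library entry point: plain arguments in, plain Python data out."""
    return backend(text, transformer)


def main(argv=None):
    """Thin CLI wrapper: only argument handling and printing live here."""
    ap = argparse.ArgumentParser(description="Parse Lojban text (sketch).")
    ap.add_argument("text")
    ap.add_argument("--transformer", default="camxes-morphology")
    args = ap.parse_args(argv)
    print(json.dumps(run(args.text, transformer=args.transformer)))
    return 0
```

With that split, a caller can write `tree = run("mi klama")` without
touching option parsing, and a script can still call `main()` for
command-line use.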

I can actually do all that myself, but I'm not really a pythonista
and what I do won't be idiomatic at all.

Stretch goals:

(3) Update to most-recent parsimonious; it currently breaks on
0.8.1, but works on 0.6.2

(4) Update to Python 3, but I'm perfectly capable of making a PR for
this myself.

(5) Make a mode that collapses productions with only one child, i.e.
make the output look like this (in terms of productions not syntax):

        rlpowell@stodi> echo "mi klama" | camxes -f
        Flat layout requested.
         text=(  sentence=(  CMAVO=(  KOhA=( mi )  )  BRIVLA=(  gismu=( klama )  )  )  )

Instead of this:

root@66324b4aed4b:/src# python camxes.py "mi klama"
["text",["text_1",["paragraphs",["paragraph",["statement",["statement_1",["statement_2",["statement_3",["sentence",[["terms",["terms_1",["terms_2",["abs_term",["abs_term_1",["sumti",["sumti_1",["sumti_2",["sumti_3",["sumti_4",["sumti_5",["sumti_6",["KOhA_clause",[["KOhA","mi"]]]]]]]]]]]]]]],["CU"]],["bridi_tail",["bridi_tail_1",["bridi_tail_2",["bridi_tail_3",["selbri",["selbri_1",["selbri_2",["selbri_3",["selbri_4",["selbri_5",["selbri_6",["tanru_unit",["tanru_unit_1",["tanru_unit_2",["BRIVLA_clause",[["BRIVLA",["gismu","klama"]]]]]]]]]]]]]],["tail_terms",["VAU"]]]]]]]]]]]]]]]

, but as I said before this is not hard to do after the fact once
you have the parse tree.
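
A rough sketch of that after-the-fact collapsing, assuming the tree
has the nested-list shape printed above: a labeled node is a list
whose first element is a string, and a list that starts with another
list is an unlabeled sequence of nodes. The names `collapse` and
`keep` are hypothetical, and the sample tree is abbreviated by hand
from the output above rather than produced by the parser. This will
not reproduce the Java camxes flat layout exactly, since the
production names differ.

```python
def is_labeled(node):
    # A labeled node looks like ["label", child, ...].
    return isinstance(node, list) and bool(node) and isinstance(node[0], str)


def collapse(node, keep=frozenset()):
    """Replace each single-child production by its child, unless its label is in keep."""
    if isinstance(node, str):
        return node  # leaf text
    if not is_labeled(node):
        # Unlabeled sequence of nodes: collapse each item, unwrap a singleton.
        items = [collapse(child, keep) for child in node]
        return items[0] if len(items) == 1 else items
    label = node[0]
    children = [collapse(child, keep) for child in node[1:]]
    if len(children) == 1 and is_labeled(children[0]) and label not in keep:
        return children[0]
    return [label] + children


# Hand-abbreviated sample in the same shape as the output above.
sample = ["text", ["paragraphs", ["sentence",
    [["terms", ["sumti", ["KOhA_clause", [["KOhA", "mi"]]]]], ["CU"]],
    ["bridi_tail",
        ["selbri", ["BRIVLA_clause", [["BRIVLA", ["gismu", "klama"]]]]],
        ["tail_terms", ["VAU"]]]]]]

flat = collapse(sample, keep={"text", "BRIVLA"})
```

Which labels go in `keep` is a design choice. With an empty `keep`,
the root of the sample collapses all the way down to the `sentence`
node.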


On Fri, Aug 27, 2021 at 11:15:54PM -0400, Riley Martinez-Lynch
wrote:
> Robin, I'd be happy to make whatever changes are needed to make it
> work. I don't see the CLI interface as an essential part of the
> interface, and if I can do something to make it easier to access
> programmatically, I'd like to do that. Glad to take cues here, or
> if you wanted to jump on a call or chat, can do that too.
>
> Sent from my iPhone
>
> > On Aug 26, 2021, at 10:11 PM, Robin Lee Powell <robinleepowell@gmail.com> wrote:
> >
> > 
> > In service to making certain parts of the lojban.org infra a bit
> > more resilient, I'm updating some stuff that uses
> > https://github.com/lojban/python-camxes .  This relies on java and
> > the camxes jar, which, whatever, but it's also built on LEPL, which
> > no longer works (see for example
> > https://github.com/modoboa/modoboa/issues/1780 ).
> >
> > https://github.com/teleological/camxes-py is a pure Python
> > replacement, but is a CLI program rather than a library; it's really
> > not designed to be used as a library.  I'd love it if someone
> > updated and fixed that.
> >
> > Unless there's another option?  What's the state of the art in this
> > space?
> >
