
[lojban] Re: Help parsing Lojban from Python? (Hey, Riley! :)



Javascript Camxes (Ilmentufa) is at:

https://github.com/lojban/ilmentufa/ (repository)

https://lojban.github.io/ilmentufa/camxes.html (one of the HTML interfaces)

https://lojban.github.io/ilmentufa/glosser/glosser.htm (another HTML interface, allowing nested boxes output)

Ilmentufa can also be used via command line by running "run_camxes.js" with Node.js (see the readme file for details).

—Ilmen.


On 2021-08-29, Robin Lee Powell wrote:
Oh, it turns out I was looking in the "wrong" repo:
https://github.com/lojban/camxes-py has the Python 3 stuff done
already, by mezohe

On Sun, Aug 29, 2021 at 10:12:56AM -0400, Riley Lynch wrote:
I'm going to be spending most of the day driving, but before I do that,
I'll try to address a few questions here, and then can follow up by mail or
IRC.

(1) I want to confirm that camxes-py is the preferred Python option
these days

I'm not aware of other parsers in python. I specifically developed the
parser because I wanted a python implementation to complement your java
implementation and Masato and Ilmen's javascript parsers.
Well, there's https://github.com/lojban/python-camxes :)

I notice now that Randall Holmes has developed a Python PEG parser for
Loglan.

(2) I want to be able to run …  it in a direct, straightforward way … and
tree should contain an obvious python representation of the
parse tree
(5) Make a mode that collapses productions with only one child

Running will be the easy part. The representation of the parse tree raises
some interesting questions.
I'm *far* more interested in someone else doing the running part,
FWIW; I feel competent to play with the parse tree after the fact,
but I don't really know idiomatic Python so if I try to make a
library out of what's there it's going to suck.

For camxes-py, I created a transformation of the parse tree which
replicated the output of Ilmentufa. I did this so that I could run against
the test corpus that you set up for java camxes and verify not only that
the python parser could accept the same corpus as the java and javascript
parsers, but that it was comprehending the same structures.

That said, the output exposes a lot of the mechanics of the parser
specification and obscures the semantics. Ideally, I'd like for the test
suites to target compatibility with a semantically-structured
representation of the parse. There's been some work on Ilmentufa to
post-process the parse tree into something more palatable. Have you taken a
look at that?
Nope, I actually didn't realize that ilmentufa was a thing until
this conversation.  (I'd heard of it, but didn't know what it was.)

(Side comment: the "About" page for both camxes and jboski now
points to all alternatives I'm aware of.)

(3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but
works on 0.6.2

I wrote against the most recent version of parsimonious at that time. Glad
to see work has continued. I remember the author was working on some
performance enhancements, and one problem with camxes-py in its current
form is that it is slow.
Again, I'm perfectly happy to do that part, fwiw.

(4) Update to Python 3

I agree that this should be done.


On Sat, Aug 28, 2021 at 12:02 AM Robin Lee Powell <robinleepowell@gmail.com>
wrote:

Feel free to come find me on Libera IRC, or suggest a preferred chat
option for you.

The stuff I want is actually quite simple, though:

(1) I want to confirm that camxes-py is the preferred Python option
these days

(2) I want to be able to run "run" (see
https://github.com/teleological/camxes-py/blob/master/camxes.py#L89
) or something like it in a direct, straightforward way, i.e.:

           import camxespy
           tree = camxespy.run("mi klama", transformer='camxes-morphology')

, and tree should contain an obvious python representation of the
parse tree.

This requires, AFAICT (I don't actually know Python very well), that
camxes-py have a library structure to it that it doesn't currently
have and that the options be configurable in some way other than
OptionParser.
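
Until such a library interface exists, one stopgap is to wrap the existing command line. The following is only a sketch under stated assumptions: it assumes that `python camxes.py "<text>"` prints nothing but a JSON-encoded tree on stdout, as in the sample output further down this message, and the function name `parse` and its `script` parameter are made up for illustration; they are not part of camxes-py.

```python
import json
import subprocess
import sys


def parse(text, script="camxes.py"):
    """Run the camxes-py CLI on `text` and return the decoded parse tree.

    Hypothetical helper: `script` is the path to camxes.py, and the CLI
    is assumed to write a single JSON document to stdout.
    """
    result = subprocess.run(
        [sys.executable, script, text],
        capture_output=True,  # requires Python 3.7 or later
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    # json.loads ignores leading and trailing whitespace, so a blank
    # line before the tree is harmless.
    return json.loads(result.stdout)
```

Spawning an interpreter per call is slow, so this is only useful for occasional parses; an importable module would avoid that cost.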

I can actually do all that myself, but I'm not really a pythonista
and what I do won't be idiomatic at all.

Stretch goals:

(3) Update to most-recent parsimonious; it currently breaks on
0.8.1, but works on 0.6.2

(4) Update to Python 3, but I'm perfectly capable of making a PR for
this myself.

(5) Make a mode that collapses productions with only one child, i.e.
make the output look like this (in terms of productions not syntax):

         rlpowell@stodi> echo "mi klama" | camxes -f
         Flat layout requested.
          text=(  sentence=(  CMAVO=(  KOhA=( mi )  )  BRIVLA=(  gismu=( klama )  )  )  )

Instead of this:

root@66324b4aed4b:/src# python camxes.py "mi klama"

["text",["text_1",["paragraphs",["paragraph",["statement",["statement_1",["statement_2",["statement_3",["sentence",[["terms",["terms_1",["terms_2",["abs_term",["abs_term_1",["sumti",["sumti_1",["sumti_2",["sumti_3",["sumti_4",["sumti_5",["sumti_6",["KOhA_clause",[["KOhA","mi"]]]]]]]]]]]]]]],["CU"]],["bridi_tail",["bridi_tail_1",["bridi_tail_2",["bridi_tail_3",["selbri",["selbri_1",["selbri_2",["selbri_3",["selbri_4",["selbri_5",["selbri_6",["tanru_unit",["tanru_unit_1",["tanru_unit_2",["BRIVLA_clause",[["BRIVLA",["gismu","klama"]]]]]]]]]]]]]],["tail_terms",["VAU"]]]]]]]]]]]]]]]

, but as I said before this is not hard to do after the fact once
you have the parse tree.
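
A minimal sketch of that after-the-fact collapsing, assuming the tree has already been loaded into nested Python lists shaped like the output above (a label string followed by children, with occasional unlabelled wrapper lists). The function name `collapse` is hypothetical, and the rule is cruder than the Java camxes flat layout: it drops every wrapper that has exactly one child node, so it would also drop BRIVLA around gismu.

```python
def collapse(node):
    """Collapse chains of single-child productions in a nested-list tree."""
    if not isinstance(node, list):
        return node  # a terminal string such as "mi"
    children = [collapse(child) for child in node]
    # A labelled node starts with its production name; wrappers do not.
    labelled = bool(children) and isinstance(children[0], str)
    body = children[1:] if labelled else children
    # Exactly one child, and it is itself a node: drop this wrapper.
    if len(body) == 1 and isinstance(body[0], list):
        return body[0]
    return children


# Abbreviated version of the "mi klama" tree above (most levels omitted).
tree = ["text", ["sentence",
        [["terms", ["sumti", ["KOhA_clause", [["KOhA", "mi"]]]]], ["CU"]],
        ["bridi_tail",
         ["selbri", ["BRIVLA_clause", [["BRIVLA", ["gismu", "klama"]]]]],
         ["tail_terms", ["VAU"]]]]]

flat = collapse(tree)
```

Nodes whose only child is a word (such as `["KOhA", "mi"]`) and empty nodes (such as `["CU"]`) are left alone, so no text is lost.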


On Fri, Aug 27, 2021 at 11:15:54PM -0400, Riley Martinez-Lynch
wrote:
Robin, I'd be happy to make whatever changes are needed to make it
work. I don't see the CLI interface as an essential part of the
interface, and if I can do something to make it easier to access
programmatically, I'd like to do that. Glad to take cues here, or
if you wanted to jump on a call or chat, can do that too.


On Aug 26, 2021, at 10:11 PM, Robin Lee Powell <
robinleepowell@gmail.com> wrote:

In service to making certain parts of the lojban.org infra a bit
more resilient, I'm updating some stuff that uses
https://github.com/lojban/python-camxes .  This relies on java and
the camxes jar, which, whatever, but it's also built on LEPL, which
no longer works (see for example
https://github.com/modoboa/modoboa/issues/1780 ).

https://github.com/teleological/camxes-py is a pure Python
replacement, but is a CLI program rather than a library; it's really
not designed to be used as a library.  I'd love it if someone
updated and fixed that.

Unless there's another option?  What's the state of the art in this
space?

