[lojban] Re: Help parsing Lojban from Python? (Hey, Riley! :)
- To: Riley Lynch <shunpiker@gmail.com>
- Subject: [lojban] Re: Help parsing Lojban from Python? (Hey, Riley! :)
- From: Robin Lee Powell <robinleepowell@gmail.com>
- Date: Sun, 29 Aug 2021 10:59:38 -0700
- Cc: lojban@googlegroups.com
- Delivery-date: Sun, 29 Aug 2021 10:59:47 -0700
- In-reply-to: <CAB-HkawLxFuQZT8_vuZf8wmudq=kBkY=y5yC=OWom2fNyvQANw@mail.gmail.com>
- References: <20210827021139.GO309000@gmail.com> <B6AB9786-1025-4F5E-A196-C83556B8EDC6@gmail.com> <20210828040224.GS309000@gmail.com> <CAB-HkawLxFuQZT8_vuZf8wmudq=kBkY=y5yC=OWom2fNyvQANw@mail.gmail.com>
- Reply-to: lojban@googlegroups.com
- Sender: lojban@googlegroups.com
Oh, it turns out I was looking in the "wrong" repo:
https://github.com/lojban/camxes-py already has the Python 3 stuff
done, by mezohe.
On Sun, Aug 29, 2021 at 10:12:56AM -0400, Riley Lynch wrote:
> I'm going to be spending most of the day driving, but before I do that,
> I'll try to address a few questions here, and then can follow up by mail or
> IRC.
>
> (1) I want to confirm that camxes-py is the preferred Python option
> these days
>
> I'm not aware of other parsers in python. I specifically developed the
> parser because I wanted a python implementation to complement your java
> implementation and Masato and Ilmen's javascript parsers.
Well, there's https://github.com/lojban/python-camxes :)
> I notice now that Randall Holmes has developed a Python PEG parser for
> Loglan.
>
> (2) I want to be able to run … it in a direct, straightforward way … and
> tree should contain an obvious python representation of the
> parse tree
> (5) Make a mode that collapses productions with only one child
>
> Running will be the easy part. The representation of the parse tree raises
> some interesting questions.
I'm *far* more interested in someone else doing the running part,
FWIW; I feel competent to play with the parse tree after the fact,
but I don't really know idiomatic Python so if I try to make a
library out of what's there it's going to suck.
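
For concreteness, here is a minimal sketch of the library shape being asked for: one importable run() function that takes its options as keyword arguments, plus a thin command-line wrapper around it. Everything named here is an assumption for illustration, not camxes-py's actual API: _parse is a stand-in that just splits on whitespace, and the TRANSFORMERS table and its "raw" and "leaves" entries are invented. argparse is used for the wrapper only because it is in the standard library.

```python
import argparse
import json

# Hypothetical transformer registry; the real transformer names live in camxes-py.
TRANSFORMERS = {
    "raw": lambda tree: tree,
    "leaves": lambda tree: [child[1] for child in tree[1:]],
}


def _parse(text):
    """Placeholder for the real parser: builds a trivial nested-list tree."""
    return ["text", *(["word", word] for word in text.split())]


def run(text, transformer="raw"):
    """Parse `text` and return the tree after applying the named transformer."""
    try:
        transform = TRANSFORMERS[transformer]
    except KeyError:
        raise ValueError(f"unknown transformer: {transformer!r}") from None
    return transform(_parse(text))


def main(argv=None):
    """Thin CLI wrapper: option handling only, all real work happens in run()."""
    parser = argparse.ArgumentParser(description="Parse Lojban text (sketch).")
    parser.add_argument("text")
    parser.add_argument("-t", "--transformer", default="raw",
                        choices=sorted(TRANSFORMERS))
    args = parser.parse_args(argv)
    print(json.dumps(run(args.text, args.transformer)))
    return 0
```

With that split, a caller can write run("mi klama", transformer="leaves") without touching the option parser, and main() can still be wired up as a script entry point.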
> For camxes-py, I created a transformation of the parse tree which
> replicated the output of Ilmentufa. I did this so that I could run against
> the test corpus that you set up for java camxes and verify not only that
> the python parser could accept the same corpus as the java and javascript
> parsers, but that it was comprehending the same structures.
>
> That said, the output exposes a lot of the mechanics of the parser
> specification and obscures the semantics. Ideally, I'd like for the test
> suites to target compatibility with a semantically-structured
> representation of the parse. There's been some work on Ilmentufa to
> post-process the parse tree into something more palatable. Have you taken a
> look at that?
Nope, I actually didn't realize that ilmentufa was a thing until
this conversation. (I'd heard of it, but didn't know what it was.)
(Side comment: the "About" page for both camxes and jboski now
points to all alternatives I'm aware of.)
> (3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but
> works on 0.6.2
>
> I wrote against the most recent version of parsimonious at that time. Glad
> to see work has continued. I remember the author was working on some
> performance enhancements, and one problem with camxes-py in its current
> form is that it is slow.
Again, I'm perfectly happy to do that part, fwiw.
> (4) Update to Python 3
>
> I agree that this should be done.
>
>
> On Sat, Aug 28, 2021 at 12:02 AM Robin Lee Powell <robinleepowell@gmail.com>
> wrote:
>
> > Feel free to come find me on Libera IRC, or suggest a preferred chat
> > option for you.
> >
> > The stuff I want is actually quite simple, though:
> >
> > (1) I want to confirm that camxes-py is the preferred Python option
> > these days
> >
> > (2) I want to be able to run "run" (see
> > https://github.com/teleological/camxes-py/blob/master/camxes.py#L89
> > ) or something like it in a direct, straightforward way, i.e.:
> >
> > import camxespy
> > tree = camxespy.run("mi klama", transformer='camxes-morphology')
> >
> > , and tree should contain an obvious python representation of the
> > parse tree.
> >
> > This requires, AFAICT (I don't actually know Python very well) that
> > camxes-py have a library structure to it that it doesn't currently
> > have and that the options be configurable in some way other than
> > OptionParser.
> >
> > I can actually do all that myself, but I'm not really a pythonista
> > and what I do won't be idiomatic at all.
> >
> > Stretch goals:
> >
> > (3) Update to most-recent parsimonious; it currently breaks on
> > 0.8.1, but works on 0.6.2
> >
> > (4) Update to Python 3, but I'm perfectly capable of making a PR for
> > this myself.
> >
> > (5) Make a mode that collapses productions with only one child, i.e.
> > make the output look like this (in terms of productions not syntax):
> >
> > rlpowell@stodi> echo "mi klama" | camxes -f
> > Flat layout requested.
> > text=( sentence=( CMAVO=( KOhA=( mi ) ) BRIVLA=( gismu=(
> > klama ) ) ) )
> >
> > Instead of this:
> >
> > root@66324b4aed4b:/src# python camxes.py "mi klama"
> >
> > ["text",["text_1",["paragraphs",["paragraph",["statement",["statement_1",["statement_2",["statement_3",["sentence",[["terms",["terms_1",["terms_2",["abs_term",["abs_term_1",["sumti",["sumti_1",["sumti_2",["sumti_3",["sumti_4",["sumti_5",["sumti_6",["KOhA_clause",[["KOhA","mi"]]]]]]]]]]]]]]],["CU"]],["bridi_tail",["bridi_tail_1",["bridi_tail_2",["bridi_tail_3",["selbri",["selbri_1",["selbri_2",["selbri_3",["selbri_4",["selbri_5",["selbri_6",["tanru_unit",["tanru_unit_1",["tanru_unit_2",["BRIVLA_clause",[["BRIVLA",["gismu","klama"]]]]]]]]]]]]]],["tail_terms",["VAU"]]]]]]]]]]]]]]]
> >
> > , but as I said before this is not hard to do after the fact once
> > you have the parse tree.
> >
> >
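
A rough sketch of that after-the-fact collapsing, assuming the tree has the nested-list shape of the python camxes.py output quoted above: a production is [name, child, ...], a bare list of nodes is a sequence, and a string is a leaf. The function name collapse and the hand-shortened example tree are made up for illustration. The rule used here is stricter than what "camxes -f" prints: any named node whose only child is another named node is dropped, so the root and BRIVLA disappear as well.

```python
def is_named(node):
    """True for a production node of the form [name, child, ...]."""
    return isinstance(node, list) and bool(node) and isinstance(node[0], str)


def collapse(node):
    """Drop every production whose only child is itself a production."""
    if isinstance(node, str):
        return node  # leaf text
    if not is_named(node):
        # Anonymous sequence of nodes: collapse each item, unwrap singletons.
        items = [collapse(child) for child in node]
        return items[0] if len(items) == 1 else items
    name, children = node[0], [collapse(child) for child in node[1:]]
    if len(children) == 1 and is_named(children[0]):
        return children[0]  # single-child chain: keep only the child
    return [name, *children]


# Hand-shortened stand-in for the "mi klama" tree (not real parser output).
tree = [
    "text",
    ["sentence",
        [["terms", ["sumti", ["KOhA_clause", [["KOhA", "mi"]]]]], ["CU"]],
        ["bridi_tail",
            ["selbri", ["BRIVLA_clause", [["BRIVLA", ["gismu", "klama"]]]]],
            ["tail_terms", ["VAU"]]]],
]

flat = collapse(tree)
# flat == ["sentence",
#          [["KOhA", "mi"], ["CU"]],
#          ["bridi_tail", ["gismu", "klama"], ["VAU"]]]
```

Keeping the root or particular productions (BRIVLA, say) would only need an extra check on name before the single-child branch.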
> > On Fri, Aug 27, 2021 at 11:15:54PM -0400, Riley Martinez-Lynch
> > wrote:
> > > Robin, I'd be happy to make whatever changes are needed to make it
> > > work. I don't see the CLI interface as an essential part of the
> > > interface, and if I can do something to make it easier to access
> > > programmatically, I'd like to do that. Glad to take cues here, or
> > > if you wanted to jump on a call or chat, can do that too.
> > >
> > > Sent from my iPhone
> > >
> > > > On Aug 26, 2021, at 10:11 PM, Robin Lee Powell <robinleepowell@gmail.com> wrote:
> > > >
> > > >
> > > > In service to making certain parts of the lojban.org infra a bit
> > > > more resilient, I'm updating some stuff that uses
> > > > https://github.com/lojban/python-camxes . This relies on java and
> > > > the camxes jar, which, whatever, but it's also built on LEPL, which
> > > > no longer works (see for example
> > > > https://github.com/modoboa/modoboa/issues/1780 ).
> > > >
> > > > https://github.com/teleological/camxes-py is a pure Python
> > > > replacement, but is a CLI program rather than a library; it's really
> > > > not designed to be used as a library. I'd love it if someone
> > > > updated and fixed that.
> > > >
> > > > Unless there's another option? What's the state of the art in this
> > > > space?
> > > >
> >