Date: Sun, 29 Aug 2021 10:59:38 -0700
From: Robin Lee Powell
To: Riley Lynch
Cc: lojban@googlegroups.com
Subject: [lojban] Re: Help parsing Lojban from Python? (Hey, Riley! :)
Message-ID: <20210829175938.GA1107525@gmail.com>

Oh, it turns out I was looking in the "wrong" repo:
https://github.com/lojban/camxes-py has the Python 3 stuff done
already, by mezohe.

On Sun, Aug 29, 2021 at 10:12:56AM -0400, Riley Lynch wrote:
> I'm going to be spending most of the day driving, but before I do that,
> I'll try to address a few questions here, and then can follow up by mail or
> IRC.
>
> (1) I want to confirm that camxes-py is the preferred Python option
> these days
>
> I'm not aware of other parsers in python. I specifically developed the
> parser because I wanted a python implementation to complement your java
> implementation and Masato and Ilmen's javascript parsers.

Well, there's https://github.com/lojban/python-camxes :)

> I notice now that Randall Holmes has developed a Python PEG parser for
> Loglan.
>
> (2) I want to be able to run … it in a direct, straightforward way … and
> tree should contain an obvious python representation of the
> parse tree
> (5) Make a mode that collapses productions with only one child
>
> Running will be the easy part. The representation of the parse tree raises
> some interesting questions.

I'm *far* more interested in someone else doing the running part,
FWIW; I feel competent to play with the parse tree after the fact,
but I don't really know idiomatic Python so if I try to make a
library out of what's there it's going to suck.

> For camxes-py, I created a transformation of the parse tree which
> replicated the output of Ilmentufa. I did this so that I could run against
> the test corpus that you set up for java camxes and verify not only that
> the python parser could accept the same corpus as the java and javascript
> parsers, but that it was comprehending the same structures.
>
> That said, the output exposes a lot of the mechanics of the parser
> specification and obscures the semantics. Ideally, I'd like for the test
> suites to target compatibility with a semantically-structured
> representation of the parse. There's been some work on Ilmentufa to
> post-process the parse tree into something more palatable. Have you taken a
> look at that?

Nope, I actually didn't realize that ilmentufa was a thing until
this conversation. (I'd heard of it, but didn't know what it was.)

(Side comment: the "About" page for both camxes and jboski now
points to all alternatives I'm aware of.)

> (3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but
> works on 0.6.2
>
> I wrote against the most recent version of parsimonious at that time. Glad
> to see work has continued. I remember the author was working on some
> performance enhancements, and one problem with camxes-py in its current
> form is that it is slow.

Again, I'm perfectly happy to do that part, fwiw.
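
Going back to item (2) for a moment: here is a rough sketch of the
"library first, CLI second" shape I have in mind. It is NOT
camxes-py's actual code. _stub_parse, TRANSFORMERS and the
transformer names "raw" and "words" are placeholders I made up so
the example runs on its own; the real parser and its real
transformers would go in their place. The only point is that the
options become plain keyword arguments to run(), and the command
line handling (argparse here, instead of OptionParser) shrinks to
a thin wrapper around it.

```python
import argparse
import json


def _stub_parse(text):
    """Placeholder for the real PEG parse; returns a trivial nested list."""
    return ["text", [["word", word] for word in text.split()]]


# Hypothetical transformer registry: name -> function(tree) -> Python object.
TRANSFORMERS = {
    "raw": lambda tree: tree,
    "words": lambda tree: [leaf[1] for leaf in tree[1]],
}


def run(text, transformer="raw"):
    """Library entry point: options are keyword arguments, not CLI flags."""
    try:
        transform = TRANSFORMERS[transformer]
    except KeyError:
        raise ValueError(f"unknown transformer: {transformer!r}") from None
    return transform(_stub_parse(text))


def main(argv=None):
    """Thin CLI wrapper: parse arguments, call run(), print the result."""
    parser = argparse.ArgumentParser(description="Parse text (sketch only).")
    parser.add_argument("text")
    parser.add_argument("-t", "--transformer", default="raw",
                        choices=sorted(TRANSFORMERS))
    args = parser.parse_args(argv)
    print(json.dumps(run(args.text, transformer=args.transformer)))
    return 0
```

With that split, a caller does run("mi klama", transformer="words")
directly, and a console script only needs to call main().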
> (4) Update to Python 3
>
> I agree that this should be done.
>
>
> On Sat, Aug 28, 2021 at 12:02 AM Robin Lee Powell wrote:
>
> > Feel free to come find me on Libera IRC, or suggest a preferred chat
> > option for you.
> >
> > The stuff I want is actually quite simple, though:
> >
> > (1) I want to confirm that camxes-py is the preferred Python option
> > these days
> >
> > (2) I want to be able to run "run" (see
> > https://github.com/teleological/camxes-py/blob/master/camxes.py#L89
> > ) or something like it in a direct, straightforward way, i.e.:
> >
> >     import camxespy
> >     tree = camxespy.run("mi klama", transformer='camxes-morphology')
> >
> > , and tree should contain an obvious python representation of the
> > parse tree.
> >
> > This requires, AFAICT (I don't actually know Python very well) that
> > camxes-py have a library structure to it that it doesn't currently
> > have and that the options be configurable in some way other than
> > OptionParser.
> >
> > I can actually do all that myself, but I'm not really a pythonista
> > and what I do won't be idiomatic at all.
> >
> > Stretch goals:
> >
> > (3) Update to most-recent parsimonious; it currently breaks on
> > 0.8.1, but works on 0.6.2
> >
> > (4) Update to Python 3, but I'm perfectly capable of making a PR for
> > this myself.
> >
> > (5) Make a mode that collapses productions with only one child, i.e.
> > make the output look like this (in terms of productions not syntax):
> >
> >     rlpowell@stodi> echo "mi klama" | camxes -f
> >     Flat layout requested.
> >     text=( sentence=( CMAVO=( KOhA=( mi ) ) BRIVLA=( gismu=( klama ) ) ) )
> >
> > Instead of this:
> >
> >     root@66324b4aed4b:/src# python camxes.py "mi klama"
> >
> > ["text",["text_1",["paragraphs",["paragraph",["statement",["statement_1",["statement_2",["statement_3",["sentence",[["terms",["terms_1",["terms_2",["abs_term",["abs_term_1",["sumti",["sumti_1",["sumti_2",["sumti_3",["sumti_4",["sumti_5",["sumti_6",["KOhA_clause",[["KOhA","mi"]]]]]]]]]]]]]]],["CU"]],["bridi_tail",["bridi_tail_1",["bridi_tail_2",["bridi_tail_3",["selbri",["selbri_1",["selbri_2",["selbri_3",["selbri_4",["selbri_5",["selbri_6",["tanru_unit",["tanru_unit_1",["tanru_unit_2",["BRIVLA_clause",[["BRIVLA",["gismu","klama"]]]]]]]]]]]]]],["tail_terms",["VAU"]]]]]]]]]]]]]]]
> >
> > , but as I said before this is not hard to do after the fact once
> > you have the parse tree.
> >
> >
> > On Fri, Aug 27, 2021 at 11:15:54PM -0400, Riley Martinez-Lynch
> > wrote:
> > > Robin, I'd be happy to make whatever changes are needed to make it
> > > work. I don't see the CLI interface as an essential part of the
> > > interface, and if I can do something to make it easier to access
> > > programmatically, I'd like to do that. Glad to take cues here, or
> > > if you wanted to jump on a call or chat, can do that too.
> > >
> > > Sent from my iPhone
> > >
> > > > On Aug 26, 2021, at 10:11 PM, Robin Lee Powell <robinleepowell@gmail.com> wrote:
> > > >
> > > > In service to making certain parts of the lojban.org infra a bit
> > > > more resilient, I'm updating some stuff that uses
> > > > https://github.com/lojban/python-camxes . This relies on java and
> > > > the camxes jar, which, whatever, but it's also built on LEPL, which
> > > > no longer works (see for example
> > > > https://github.com/modoboa/modoboa/issues/1780 ).
> > > >
> > > > https://github.com/teleological/camxes-py is a pure Python
> > > > replacement, but is a CLI program rather than a library; it's really
> > > > not designed to be used as a library. I'd love it if someone
> > > > updated and fixed that.
> > > >
> > > > Unless there's another option? What's the state of the art in this
> > > > space?
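
For reference, a minimal sketch of item (5), collapsing
single-child productions after the fact. It assumes the
nested-list shape shown in the camxes.py output quoted above: a
node is either a string leaf, a named list [name, child, ...], or
an unnamed list that merely groups sibling nodes. The function
name collapse and the keep parameter are my own invention, not
part of camxes-py, and this does not try to reproduce the exact
"camxes -f" layout. It only drops wrapper productions whose sole
child is another production, unless the name is listed in keep.

```python
def _is_named(node):
    """True for a [name, *children] list (first element is a string)."""
    return isinstance(node, list) and bool(node) and isinstance(node[0], str)


def _collapsed_children(items, keep):
    """Yield collapsed children, splicing unnamed grouping lists in place."""
    for item in items:
        if isinstance(item, list) and not _is_named(item):
            yield from _collapsed_children(item, keep)
        else:
            yield collapse(item, keep)


def collapse(node, keep=frozenset()):
    """Drop productions whose only child is another production.

    Names listed in `keep` are never dropped, even with a single child.
    """
    if isinstance(node, str):
        return node
    if not _is_named(node):
        raise ValueError("expected a string leaf or a [name, *children] list")
    name = node[0]
    children = list(_collapsed_children(node[1:], keep))
    if name not in keep and len(children) == 1 and isinstance(children[0], list):
        return children[0]  # sole child is a production: skip this wrapper
    return [name, *children]


# Hand-abbreviated tree in the same shape as the output quoted above.
tree = ["text", ["paragraph", ["sentence",
        [["terms", ["sumti", ["KOhA", "mi"]]], ["CU"]],
        ["bridi_tail", ["selbri", ["BRIVLA", ["gismu", "klama"]]]]]]]

flat = collapse(tree)
kept = collapse(tree, keep={"text", "BRIVLA"})
```

Here flat comes out as a "sentence" node holding ["KOhA", "mi"],
["CU"] and ["gismu", "klama"], while kept also preserves the "text"
and "BRIVLA" wrappers. Which names belong in keep is a judgment
call that depends on what the consumer of the tree needs.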