Subject: [lojban] Re: Help parsing Lojban from Python? (Hey, Riley! :)
To: lojban@googlegroups.com
From: Ilmen
Date: Sun, 29 Aug 2021 23:52:19 +0200

Javascript Camxes (Ilmentufa) is at:
• https://github.com/lojban/ilmentufa/ (repository)
• https://lojban.github.io/ilmentufa/camxes.html (one of the HTML interfaces)
• https://lojban.github.io/ilmentufa/glosser/glosser.htm (another HTML interface, allowing nested boxes output)

Ilmentufa can also be used via command line by running "run_camxes.js"
with Node.js (see the readme file for details).

-Ilmen.

On 2021-08-29, Robin Lee Powell wrote:
> Oh, it turns out I was looking in the "wrong" repo:
> https://github.com/lojban/camxes-py has the Python 3 stuff done
> already, by mezohe
>
> On Sun, Aug 29, 2021 at 10:12:56AM -0400, Riley Lynch wrote:
>> I'm going to be spending most of the day driving, but before I do that,
>> I'll try to address a few questions here, and then can follow up by mail or
>> IRC.
>>
>> (1) I want to confirm that camxes-py is the preferred Python option
>> these days
>>
>> I'm not aware of other parsers in python. I specifically developed the
>> parser because I wanted a python implementation to complement your java
>> implementation and Masato and Ilmen's javascript parsers.
> Well, there's https://github.com/lojban/python-camxes :)
>
>> I notice now that Randall Holmes has developed a Python PEG parser for
>> Loglan.
>>
>> (2) I want to be able to run … it in a direct, straightforward way … and
>> tree should contain an obvious python representation of the
>> parse tree
>> (5) Make a mode that collapses productions with only one child
>>
>> Running will be the easy part. The representation of the parse tree raises
>> some interesting questions.
> I'm *far* more interested in someone else doing the running part,
> FWIW; I feel competent to play with the parse tree after the fact,
> but I don't really know idiomatic Python so if I try to make a
> library out of what's there it's going to suck.
>
>> For camxes-py, I created a transformation of the parse tree which
>> replicated the output of Ilmentufa. I did this so that I could run against
>> the test corpus that you set up for java camxes and verify not only that
>> the python parser could accept the same corpus as the java and javascript
>> parsers, but that it was comprehending the same structures.
>>
>> That said, the output exposes a lot of the mechanics of the parser
>> specification and obscures the semantics. Ideally, I'd like for the test
>> suites to target compatibility with a semantically-structured
>> representation of the parse. There's been some work on Ilmentufa to
>> post-process the parse tree into something more palatable. Have you taken a
>> look at that?
> Nope, I actually didn't realize that ilmentufa was a thing until
> this conversation. (I'd heard of it, but didn't know what it was.)
>
> (Side comment: the "About" page for both camxes and jboski now
> points to all alternatives I'm aware of.)
>
>> (3) Update to most-recent parsimonious; it currently breaks on 0.8.1, but
>> works on 0.6.2
>>
>> I wrote against the most recent version of parsimonious at that time. Glad
>> to see work has continued. I remember the author was working on some
>> performance enhancements, and one problem with camxes-py in its current
>> form is that it is slow.
> Again, I'm perfectly happy to do that part, fwiw.
>
>> (4) Update to Python 3
>>
>> I agree that this should be done.
>>
>>
>> On Sat, Aug 28, 2021 at 12:02 AM Robin Lee Powell wrote:
>>
>>> Feel free to come find me on Libera IRC, or suggest a preferred chat
>>> option for you.
>>>
>>> The stuff I want is actually quite simple, though:
>>>
>>> (1) I want to confirm that camxes-py is the preferred Python option
>>> these days
>>>
>>> (2) I want to be able to run "run" (see
>>> https://github.com/teleological/camxes-py/blob/master/camxes.py#L89
>>> ) or something like it in a direct, straightforward way, i.e.:
>>>
>>>     import camxespy
>>>     tree = camxespy.run("mi klama", transformer='camxes-morphology')
>>>
>>> , and tree should contain an obvious python representation of the
>>> parse tree.
>>>
>>> This requires, AFAICT (I don't actually know Python very well) that
>>> camxes-py have a library structure to it that it doesn't currently
>>> have and that the options be configurable in some way other than
>>> OptionParser.
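
A rough sketch of the library shape asked for in (2), for illustration only: a plain run() function takes its options as keyword arguments, and the command line becomes a thin wrapper around it. Everything here is an assumption rather than camxes-py code. The parse() stand-in only splits on whitespace, the TRANSFORMERS registry and its "raw"/"json" names are made up, and only the transformer keyword comes from the call quoted above.

```python
import argparse
import json


def parse(text):
    """Placeholder parser: NOT camxes-py, it only splits the text into words."""
    return ["text"] + [["word", w] for w in text.split()]


# Hypothetical registry: transformer name -> function applied to the tree.
TRANSFORMERS = {
    "raw": lambda tree: tree,
    "json": lambda tree: json.dumps(tree),
}


def run(text, transformer="raw"):
    """Library entry point: options are keyword arguments, not CLI flags."""
    try:
        transform = TRANSFORMERS[transformer]
    except KeyError:
        raise ValueError("unknown transformer: %r" % transformer) from None
    return transform(parse(text))


def main(argv=None):
    """Thin command-line wrapper; all real work happens in run()."""
    ap = argparse.ArgumentParser(description="Parse a Lojban text (sketch).")
    ap.add_argument("text")
    ap.add_argument("-t", "--transformer", default="raw",
                    choices=sorted(TRANSFORMERS))
    args = ap.parse_args(argv)
    print(run(args.text, transformer=args.transformer))
    return 0
```

With this split, other code can call run("mi klama", transformer="json") directly, and a script can still call main() to keep the existing command-line behaviour.
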
>>>
>>> I can actually do all that myself, but I'm not really a pythonista
>>> and what I do won't be idiomatic at all.
>>>
>>> Stretch goals:
>>>
>>> (3) Update to most-recent parsimonious; it currently breaks on
>>> 0.8.1, but works on 0.6.2
>>>
>>> (4) Update to Python 3, but I'm perfectly capable of making a PR for
>>> this myself.
>>>
>>> (5) Make a mode that collapses productions with only one child, i.e.
>>> make the output look like this (in terms of productions not syntax):
>>>
>>>     rlpowell@stodi> echo "mi klama" | camxes -f
>>>     Flat layout requested.
>>>     text=( sentence=( CMAVO=( KOhA=( mi ) ) BRIVLA=( gismu=( klama ) ) ) )
>>>
>>> Instead of this:
>>>
>>>     root@66324b4aed4b:/src# python camxes.py "mi klama"
>>>
>>> ["text",["text_1",["paragraphs",["paragraph",["statement",["statement_1",["statement_2",["statement_3",["sentence",[["terms",["terms_1",["terms_2",["abs_term",["abs_term_1",["sumti",["sumti_1",["sumti_2",["sumti_3",["sumti_4",["sumti_5",["sumti_6",["KOhA_clause",[["KOhA","mi"]]]]]]]]]]]]]]],["CU"]],["bridi_tail",["bridi_tail_1",["bridi_tail_2",["bridi_tail_3",["selbri",["selbri_1",["selbri_2",["selbri_3",["selbri_4",["selbri_5",["selbri_6",["tanru_unit",["tanru_unit_1",["tanru_unit_2",["BRIVLA_clause",[["BRIVLA",["gismu","klama"]]]]]]]]]]]]]],["tail_terms",["VAU"]]]]]]]]]]]]]]]
>>>
>>> , but as I said before this is not hard to do after the fact once
>>> you have the parse tree.
>>>
>>>
>>> On Fri, Aug 27, 2021 at 11:15:54PM -0400, Riley Martinez-Lynch wrote:
>>>> Robin, I'd be happy to make whatever changes are needed to make it
>>>> work. I don't see the CLI interface as an essential part of the
>>>> interface, and if I can do something to make it easier to access
>>>> programmatically, I'd like to do that. Glad to take cues here, or
>>>> if you wanted to jump on a call or chat, can do that too.
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Aug 26, 2021, at 10:11 PM, Robin Lee Powell <robinleepowell@gmail.com> wrote:
>>>>>
>>>>> In service to making certain parts of the lojban.org infra a bit
>>>>> more resilient, I'm updating some stuff that uses
>>>>> https://github.com/lojban/python-camxes . This relies on java and
>>>>> the camxes jar, which, whatever, but it's also built on LEPL, which
>>>>> no longer works (see for example
>>>>> https://github.com/modoboa/modoboa/issues/1780 ).
>>>>>
>>>>> https://github.com/teleological/camxes-py is a pure Python
>>>>> replacement, but is a CLI program rather than a library; it's really
>>>>> not designed to be used as a library. I'd love it if someone
>>>>> updated and fixed that.
>>>>>
>>>>> Unless there's another option? What's the state of the art in this
>>>>> space?
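
For stretch goal (5) in the quoted thread, collapsing productions with only one child can be done after the fact on the nested-list tree that camxes.py prints. What follows is a minimal sketch under stated assumptions, not code from camxes-py or Ilmentufa. It assumes a node is a list whose first item is the production name, that a list starting with another list is an unlabeled group of nodes, and that plain strings are words. The names collapse, is_labeled, keep and EXAMPLE are made up here, and EXAMPLE is a hand-abbreviated version of the "mi klama" tree. The rule keeps the innermost label of each single-child chain, so it will not reproduce the `camxes -f` output exactly; the keep argument is one way to protect labels such as "text" or "BRIVLA".

```python
def is_labeled(node):
    """True for [name, child, ...]; False for strings and unlabeled groups."""
    return isinstance(node, list) and bool(node) and isinstance(node[0], str)


def collapse(node, keep=frozenset()):
    """Return a copy of the tree with single-child productions removed."""
    if isinstance(node, str):
        return node  # a word: nothing to collapse
    if is_labeled(node):
        label, children = node[0], node[1:]
    else:
        label, children = None, node  # unlabeled group of sibling nodes
    new_children = []
    for child in children:
        collapsed = collapse(child, keep)
        if isinstance(collapsed, list) and not is_labeled(collapsed):
            new_children.extend(collapsed)  # splice group members into parent
        else:
            new_children.append(collapsed)
    if label is None:
        return new_children
    # A production wrapping exactly one other production is dropped,
    # unless the caller asked to keep that label.
    if (len(new_children) == 1 and is_labeled(new_children[0])
            and label not in keep):
        return new_children[0]
    return [label] + new_children


# Hand-abbreviated version of the "mi klama" tree (an assumption, not real output).
EXAMPLE = ["text", ["sentence",
    [["terms", ["sumti", ["KOhA_clause", [["KOhA", "mi"]]]]], ["CU"]],
    ["bridi_tail",
        ["selbri", ["BRIVLA_clause", [["BRIVLA", ["gismu", "klama"]]]]],
        ["tail_terms", ["VAU"]]]]]

flat = collapse(EXAMPLE)
flat_keeping_some = collapse(EXAMPLE, keep={"text", "BRIVLA"})
```

Because the function builds new lists instead of editing in place, the original parse tree stays available for any code that still needs the full production chain.
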