Date: Fri, 28 Jul 2017 17:43:34 +0200
From: Sukender
To: lojban@googlegroups.com
Subject: Re: [lojban] Spaces in jbovlaste

Thanks for the clarification. I didn't even imagine that this big random
compound cmavo would be valid! You made my evening! ;-)

About the {lonu} entry, I fully agree. But I can't filter all of them out...
Or can I?
If you have any (simple) idea for such a rule, go ahead!

By the way, I have already filtered out a few words: I found some of huge
length (even a weird one about the Macarena!). As these may be spam, I added
an arbitrary rule that throws away every entry longer than 22 characters.
A finer rule may be needed...

Cheers,

-- 
Sukender

On 28 July 2017 17:33:14 CEST, Adam Lopresto <adamlopresto@gmail.com> wrote:
>If you're going to allow cmavo to be combined arbitrarily (which is
>probably appropriate), then there's no reason for {lonu} to have its own
>entry. So I'd suggest not adding any cmavo clusters.
>
>And {lonulonucalo} can be grammatical; you just need the right text after
>it. {lonulonucalo nu jamna kei mi damba cu nandu mi cu se zungi mi}, "I
>feel guilty that it was hard for me to fight during the war." As you said,
>a full grammar checker would be needed to really get things right, and
>that's a separate problem.
>
>On Fri, Jul 28, 2017 at 6:54 AM <sukender1@gmail.com> wrote:
>
>> coi la .ilmen.
>>
>> I just applied your idea (added split entries) and added merged entries...
>> And I also found a very simple way to add compound cmavo!
>> Indeed:
>>
>> - I created a script that splits jbovlaste entries into cmavo and
>>   non-cmavo using a simple regex (based on the rules listed in the CLL,
>>   chapter 4.2).
>> - Then I tagged all cmavo with a flag "C" and added the Hunspell rule
>>   "CCC*" (~= "CC+"), which means you can "glue" 2 or more cmavo together.
>>
>> Of course, this will allow ungrammatical things such as "lonulonucalo",
>> but once again, that is not the spell-checker's role.
>>
>> I tried your example "calonu". It seems the "lonu" entry exists, so my
>> dictionary interprets it as a "normal word" (= non-simple-cmavo) instead
>> of a "compound cmavo".
>> But all the following combinations are now valid:
>>
>> - ca, lo, nu
>> - lo nu, lonu, ca lo, calo
>> - ca lonu, calo nu, calonu
>>
>> Only calo and calonu are detected as compounds (remember "lonu" is an
>> entry), but either way, that works as expected.
>> Experimental cmavo support will be added soon.
>>
>> Do you know of other rules that would be worth integrating?
>> Please test ( https://github.com/Sukender/lojban-spell-check-dist ) and
>> give feedback! ki'e
>>
>> I still have issues with dots in LibreOffice (.i .a and such)... And some
>> words of "le cmalu noltru" are not recognized yet. Is there any other word
>> source I can use?
>>
>> co'o
>>
>> -- 
>> Sukender
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups
>> "lojban" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to lojban+unsubscribe@googlegroups.com.
>> To post to this group, send email to lojban@googlegroups.com.
>> Visit this group at https://groups.google.com/group/lojban.
>> For more options, visit https://groups.google.com/d/optout.
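The first step in the quoted message (splitting jbovlaste entries into cmavo and non-cmavo with a regex, then tagging the cmavo with the Hunspell flag "C") could look roughly like the Python sketch below. The pattern is an assumption: a permissive approximation of the cmavo shapes described in CLL chapter 4.2 (an optional single consonant followed by vowels, with apostrophes allowed between vowel groups), not the regex actually used in lojban-spell-check. `CMAVO_RE`, `split_entries` and `to_dic_lines` are hypothetical names.

```python
import re

# ASSUMPTION: a permissive approximation of the cmavo shapes in CLL 4.2, not
# the regex used by the real spell-checker script. It accepts an optional
# single consonant followed by vowel groups that may be joined by apostrophes,
# e.g. "a", "ca", "coi", "ki'e", "y'y". Any second consonant is rejected, so
# "lonu" and "jamna" do not match.
CMAVO_RE = re.compile(r"[bcdfgjklmnprstvxz]?[aeiouy]+(?:'[aeiouy]+)*")


def split_entries(words):
    """Split dictionary words into (cmavo, non_cmavo) lists (hypothetical helper)."""
    cmavo, non_cmavo = [], []
    for word in words:
        # Leading/trailing dots mark pauses (".i", ".a"), not letters.
        bare = word.strip(".").lower()
        if CMAVO_RE.fullmatch(bare):
            cmavo.append(word)
        else:
            non_cmavo.append(word)
    return cmavo, non_cmavo


def to_dic_lines(words):
    """Build Hunspell .dic lines: the entry count first, then one word per
    line, with cmavo tagged by the "C" flag as "word/C"."""
    cmavo, _ = split_entries(words)
    tagged = set(cmavo)
    return [str(len(words))] + [f"{w}/C" if w in tagged else w for w in words]
```

For example, `split_entries(["ca", "lonu", "ki'e"])` returns `(["ca", "ki'e"], ["lonu"])`: "lonu" contains a second consonant, so it ends up with the non-cmavo entries, consistent with the dictionary treating it as a "normal word".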
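The rule "CCC*" accepts a word made of two or more C-flagged pieces. The sketch below emulates that check in Python with a small dynamic-programming pass; it is only an emulation, not Hunspell's own compounding code, and `split_cluster` and `drop_clusters` are hypothetical helpers. The same test suggests one possible answer to "Or can I?" about entries like {lonu}: an entry that splits entirely into known cmavo is a cluster the compound rule already accepts, so it can be dropped, as Adam suggests. This assumes the usual morphology, where brivla contain a consonant cluster and cmavo never do, so a brivla should not split cleanly into cmavo.

```python
def split_cluster(word, cmavo):
    """Return one way to cut `word` into two or more pieces that all belong to
    the set `cmavo`, or None if that is impossible.

    This only emulates what a COMPOUNDRULE such as "CCC*" accepts when every
    cmavo carries the "C" flag; real Hunspell behaviour also depends on other
    affix-file options.
    """
    n = len(word)
    # nxt[i] = end of the shortest cmavo starting at i such that the rest of
    # the word can also be covered by cmavo; None if word[i:] cannot be covered.
    nxt = [None] * (n + 1)
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            if word[i:j] in cmavo and (j == n or nxt[j] is not None):
                nxt[i] = j
                break
    # Not coverable at all, or coverable only as one single cmavo.
    if nxt[0] is None or nxt[0] == n:
        return None
    pieces, i = [], 0
    while i < n:
        pieces.append(word[i:nxt[i]])
        i = nxt[i]
    return pieces


def drop_clusters(entries, cmavo):
    """Drop entries that are just cmavo glued together (e.g. "lonu")."""
    return [e for e in entries if split_cluster(e, cmavo) is None]
```

With `cmavo = {"ca", "lo", "nu"}`, `split_cluster("calonu", cmavo)` returns `["ca", "lo", "nu"]`, `split_cluster("ca", cmavo)` returns `None` (a single cmavo is not a compound), and `drop_clusters(["lonu", "jamna", "calo"], cmavo)` keeps only `["jamna"]`.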
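The arbitrary length rule from the reply can be sketched as follows. In this version (the function name is hypothetical), an entry with more than 22 characters is discarded and one of exactly 22 is kept; the dropped entries are returned separately so they can be reviewed while looking for a finer rule.

```python
MAX_LEN = 22  # arbitrary threshold taken from the message


def filter_long_entries(entries, max_len=MAX_LEN):
    """Return (kept, dropped); entries longer than max_len characters are
    treated as probable spam and dropped."""
    kept, dropped = [], []
    for entry in entries:
        if len(entry) > max_len:
            dropped.append(entry)
        else:
            kept.append(entry)
    return kept, dropped
```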