From: Adam Lopresto <adamlopresto@gmail.com>
Date: Fri, 28 Jul 2017 15:51:44 +0000
Subject: Re: [lojban] Spaces in jbovlaste
To: lojban@googlegroups.com
jbovlaste should already be filtered to contain only Lojban, and there are, broadly, three types of Lojban words:
cmevla are everything that ends in a consonant
brivla all contain a consonant cluster and end in a vowel
cmavo optionally start with a single consonant, and consist entirely of vowels and apostrophes after that.

So, I think you could filter all cmavo clusters by looking for anything that matches /.+[^aeiou'].*[aeiou]/ but doesn't match /[^aeiou'][^aeiou']/. That is: it contains a non-vowel somewhere after the first letter, ends in a vowel, and doesn't contain a consonant cluster.

At least, that seems like a good start.
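
A minimal sketch of that filter in Python, in case it helps. The two patterns are the ones quoted above; applying the first to the whole word with fullmatch() and the second anywhere in the word with search() is an assumption of this sketch, and the helper names is_cmavo_cluster and drop_cmavo_clusters are made up for illustration. Note that the patterns as written treat "y" as a non-vowel.

```python
import re

# Patterns taken from the message above. How they are anchored (fullmatch
# for the first, search for the second) is an assumption of this sketch.
HAS_LATER_CONSONANT = re.compile(r".+[^aeiou'].*[aeiou]")
CONSONANT_CLUSTER = re.compile(r"[^aeiou'][^aeiou']")


def is_cmavo_cluster(word: str) -> bool:
    """Return True if `word` looks like several cmavo glued together."""
    return (
        HAS_LATER_CONSONANT.fullmatch(word) is not None
        and CONSONANT_CLUSTER.search(word) is None
    )


def drop_cmavo_clusters(words):
    """Keep every word that is not a suspected cmavo cluster (hypothetical helper)."""
    return [w for w in words if not is_cmavo_cluster(w)]


print(drop_cmavo_clusters(["ca", "lo", "nu", "lonu", "calonu", "klama"]))
# ['ca', 'lo', 'nu', 'klama']
```

Single cmavo such as "ca" survive because the first pattern needs a non-vowel after the first letter, and brivla such as "klama" survive because they contain a consonant cluster.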
On Fri, Jul 28, 2017 at 10:43 AM Sukender <sukender@free.fr> wrote:
Thanks for the clarification. I didn't even imagine that this big random compound cmavo would be valid! You made my evening! ;-)

About the {lonu} entry, I completely agree. But I can't filter them all out... Or can I? If you have any (simple) idea for a rule that does that, go ahead!
By the way, I already filtered out a few words. I did find some of huge length (even a weird one about Macarena!). As they may be spam, I added an arbitrary rule that throws away every word longer than 22 characters. Maybe a finer rule has to be found...

Cheers,


--
Sukender




On 28 July 2017 17:33:14 CEST, Adam Lopresto <adamlopresto@gmail.com> wrote:
If you're going to allow cmavo to be combined arbitrarily (which is probably appropriate), then there's no reason for {lonu} to have its own entry. So I'd suggest not adding any cmavo clusters.
And {lonulonucalo} can be grammatical; you just need the right text after it. {lonulonucalo nu jamna kei mi damba cu nandu mi cu se zungi mi}, "I feel guilty that it was hard for me to fight during the war." As you said, a full grammar checker would be needed to really get things right, and that's a separate problem.

On Fri, Jul 28, 2017 at 6:54 AM <sukender1@gmail.com> wrote:
coi la .ilmen.

I just applied your idea (added split entries) and added merged entries... And I also found a very simple way to add compound cmavo!
Indeed:
  • I created a script that splits jbovlaste entries into cmavo and non-cmavo, by using a simple regex (using rules listed in the CLL, chapter 4.2)
  • Then I tagged all cmavo with a flag "C", and added the Hunspell rule "CCC*" (~= "CC+"), which means you can "glue" 2 or more cmavo together.
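
The two steps above could be sketched roughly as follows. This is not the actual script: the regex CMAVO_RE is only a hypothetical approximation of the cmavo shape (an optional consonant, then vowels optionally joined by apostrophes), and the function names are made up. The "CCC*" rule is the one mentioned above; the COMPOUNDMIN 1 line is an assumption of this sketch, added so that one- and two-letter cmavo can take part in compounds.

```python
import re

# Hypothetical approximation of the cmavo shape; not the project's regex.
CMAVO_RE = re.compile(r"[bcdfgjklmnprstvxz]?[aeiouy]+(?:'[aeiouy]+)*")


def split_entries(words):
    """Split words into (cmavo, others), preserving the input order."""
    cmavo, others = [], []
    for word in words:
        (cmavo if CMAVO_RE.fullmatch(word) else others).append(word)
    return cmavo, others


def build_dic(words):
    """Return Hunspell .dic text in which every cmavo carries the flag C."""
    cmavo, others = split_entries(words)
    lines = [str(len(cmavo) + len(others))]  # first line: number of entries
    lines += [f"{w}/C" for w in cmavo]       # cmavo may be glued together
    lines += others                          # other words get no flag
    return "\n".join(lines) + "\n"


# Assumed .aff fragment. "CCC*" comes from the message; COMPOUNDMIN 1 is an
# assumption of this sketch, not something the message states.
AFF_FRAGMENT = "COMPOUNDMIN 1\nCOMPOUNDRULE 1\nCOMPOUNDRULE CCC*\n"

print(build_dic(["ca", "lo", "nu", "lonu", "klama"]), end="")
```

With this input the cmavo "ca", "lo" and "nu" are written as "ca/C", "lo/C" and "nu/C", while "lonu" and "klama" are written without a flag.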
Of course, this will allow ungrammatical things such as "lonulonucalo", but once again, that is not the spell-checker's role.

I tried your example "calonu". It seems the "lonu" entry exists, so my dictionary interprets that as a "normal word" (= non-simple-cmavo) instead of a "compound cmavo". But all the following combinations are now valid:
  • ca, lo, nu
  • lo nu, lonu, ca lo, calo
  • ca lonu, calo nu, calonu
Only calo & calonu are detected as a compound (remember "lonu" is an entry), but anyway that works as expected.
Experimental cmavo support will be added soon.
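
To make the behaviour above concrete, here is a small Python model of it. It is only an illustration of the "2 or more cmavo glued together" rule, not how Hunspell works internally; the function names and the tiny word sets are assumptions of this sketch.

```python
def can_glue(word, cmavo, min_parts=2):
    """True if `word` splits into at least `min_parts` words from `cmavo`."""
    # best[i] = largest number of cmavo that exactly cover word[:i],
    # or -1 if word[:i] cannot be covered at all.
    best = [-1] * (len(word) + 1)
    best[0] = 0
    for end in range(1, len(word) + 1):
        for start in range(end):
            if best[start] >= 0 and word[start:end] in cmavo:
                best[end] = max(best[end], best[start] + 1)
    return best[len(word)] >= min_parts


def classify(word, cmavo, other_entries):
    """Return 'entry', 'compound' or 'unknown' for a single word."""
    if word in cmavo or word in other_entries:
        return "entry"      # found in the dictionary as it is
    if can_glue(word, cmavo):
        return "compound"   # two or more cmavo glued together
    return "unknown"


cmavo = {"ca", "lo", "nu"}
other_entries = {"lonu"}  # "lonu" has its own entry, as noted above
for w in ["ca", "lonu", "calo", "calonu"]:
    print(w, classify(w, cmavo, other_entries))
# ca entry
# lonu entry
# calo compound
# calonu compound
```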

Do you know of other rules that would be worth integrating?
Please test ( https://github.com/Sukender/lojban-spell-check-dist ) and give feedback! ki'e

I still have issues with dots in LibreOffice (.i .a and such)... And some words of "le cmalu noltru" are not recognized yet. Is there any other word source I can use?

co'o

--
Sukender

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at https://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.
