Re: [lojban] Spaces in jbovlaste

Subject: Re: [lojban] Spaces in jbovlaste

Date: Thu, 27 Jul 2017 06:43:39 -0700 (PDT)

On Thursday, 27 July 2017 at 13:04:54 UTC+2, Ilmen wrote:

If spell checkers are only concerned with identifying what is a correct
word and what isn't,

Exactly! For now, my first concern is to get a first step towards spell/grammar checking for common software (see the other thread). That's clearly a "better than nothing" idea... and yes, it's clearly sub-optimal.

then you should disregard Jbovlaste entries
containing whitespace (they are multi-word lexemes), or even better,
check all the words that compose them to see if any of them is missing
from your spell-check whitelist (I strongly suspect there exist bu and
zei compounds containing words that appear nowhere else in the
dictionary…).

Great! I'll do that. Thanks.
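
A minimal sketch of that check, assuming the entries are available as plain strings. The function name, the sample entries, and the sample whitelist are all made up for illustration; this is not jbovlaste's actual export format.

```python
def missing_component_words(entries, whitelist):
    """Return component words of multi-word entries that are not in the whitelist."""
    missing = set()
    for entry in entries:
        parts = entry.split()
        if len(parts) < 2:
            continue  # single-word entries are covered by the whitelist itself
        missing.update(part for part in parts if part not in whitelist)
    return sorted(missing)


# Hypothetical sample data, for illustration only
entries = ["re zei zgabube", "tai da'i", "klama"]
whitelist = {"re", "zei", "tai", "da'i", "klama"}
print(missing_component_words(entries, whitelist))
```

Any word this reports would then be added to the whitelist by hand or flagged for review.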

"re zei zgabube" is indeed a sequence of three words. It is present in
the dictionary because it is an independent lexeme: you cannot
accurately derive its meaning from its parts. This occurs all the time
in natlangs; think for example of the English "take off".

Okay. But as you mentioned, spell checkers only check spelling! So in the English ones, "take" and "off" are checked separately. The grammar checker, however, should detect the meaning of "take off" as a unit instead of "take" and "off" separately.

As for cmavo sequences, people are allowed to chain them up without
whitespace in between (this causes no ambiguity), although nowadays it
seems more common to always separate them with whitespace. For a
spell-checker, two strategies are possible: the lazy one would be to
enforce the style of putting whitespace between every cmavo, thus
marking e.g. "lonu" as incorrect; the second strategy, more involved,
would be to check any unknown letter string to see if it matches a
sequence of cmavo, and allow it if it does (e.g. if the program hits
"calonu" and is able to find it can be a sequence of cmavo ca+lo+nu,
only then would it allow it). But I don't know if the software you're
using is able to do that without an explicit and systematic list of all
allowable cmavo strings…

You're right. I guess I'll insert both the "split" and "merged" forms of jbovlaste entries ("tai da'i" and "taida'i"). But as long as the reference doesn't list ALL possible combinations ("ca lo nu", "ca lonu", "calonu", etc.), and as long as there are no precise rules for generating "affixes" (i.e. compound-word generation for spell checkers), it will be hard to be precise.
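
For reference, the second strategy (accepting an unknown string only if it decomposes into known cmavo) could be sketched as below. This is only an illustration under stated assumptions: the function name and the tiny cmavo set are hypothetical, and it relies on a plain word list rather than on Lojban's actual morphology rules.

```python
def split_into_cmavo(text, cmavo):
    """Return one way to split text into known cmavo, or None if there is none."""
    # best[i] holds a segmentation of text[:i], or None if none was found
    best = [None] * (len(text) + 1)
    best[0] = []
    max_len = max(map(len, cmavo), default=0)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece in cmavo:
                best[end] = best[start] + [piece]
                break
    return best[len(text)]


# Hypothetical, very incomplete cmavo list
cmavo = {"ca", "lo", "nu", "tai", "da'i"}
print(split_into_cmavo("calonu", cmavo))
print(split_into_cmavo("taida'i", cmavo))
print(split_into_cmavo("calonx", cmavo))
```

A checker built this way would need the full cmavo list, but not a list of every allowable cmavo string.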

I'll start with a very basic spell checker and maybe add rules later on... if there are enough people willing to help! I'm clearly not experienced enough in Lojban to easily tell which rules are the most important. Can you think of a few rules that could be integrated?

I guess that the rule "a cmavo can follow a cmavo as suffix" could be nice, but I don't know how to implement it. I'm currently struggling with https://www.systutorials.com/docs/linux/man/4-hunspell/#lbAI
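
One possible direction, offered as an untested sketch rather than a known-good configuration: the hunspell(5) format has a COMPOUNDFLAG directive that lets flagged dictionary words be chained into compounds, and a COMPOUNDMIN directive for the minimum length of a compound part (default 3, too long for most cmavo). The helper below only generates the text of a hypothetical .aff/.dic pair. The function name, the flag letter "C", and the sample cmavo are assumptions, and the result has not been checked against Hunspell itself.

```python
def build_hunspell_files(cmavo, flag="C"):
    """Return (aff_text, dic_text) marking every cmavo as a compoundable word."""
    aff_lines = [
        "SET UTF-8",
        "WORDCHARS '",           # treat the apostrophe as part of a word
        f"COMPOUNDFLAG {flag}",  # words carrying this flag may be chained together
        "COMPOUNDMIN 1",         # allow very short compound parts (default is 3)
    ]
    words = sorted(set(cmavo))
    # The first line of a .dic file is the approximate number of entries
    dic_lines = [str(len(words))] + [f"{word}/{flag}" for word in words]
    return "\n".join(aff_lines) + "\n", "\n".join(dic_lines) + "\n"


# Hypothetical sample cmavo
aff_text, dic_text = build_hunspell_files(["nu", "ca", "lo"])
print(aff_text)
print(dic_text)
```

If this works as intended, "calonu" would be accepted because each part carries the compound flag, while gismu and other words without the flag would stay out of compounds.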

If the software were to need an explicit and exhaustive list of allowed
words, I guess it wouldn't be very handy to use for very synthetic
languages (e.g. Turkish, Quechua, Greenlandic…), which might have an
infinite number of valid words.

Well, that's the "affix" stuff I just wrote about. I don't know anything about those languages, but surely they have "good" affix/replacement rules in their dictionaries.

Anyway, thank you very much for the clarification.

Sukender
