
Re: [lojban] Spaces in jbovlaste



If a spell checker is only concerned with identifying what is and isn't a correct word, then you should disregard jbovlaste entries containing whitespace (they are multi-word lexemes), or even better, check each word they are composed of to see whether any is missing from your spell-check whitelist (I strongly suspect there exist bu and zei compounds containing words that appear nowhere else in the dictionary…).
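As a rough sketch of that check in Python: treat single-word entries as the whitelist, split every entry containing whitespace, and report the component words that have no entry of their own. The function name `missing_components` and the sample entry list are made up for illustration; the sample says nothing about what jbovlaste actually contains.

```python
def missing_components(entries: list[str]) -> list[str]:
    """Return the words that occur only inside multi-word entries."""
    # Single-word entries form the spell-check whitelist.
    whitelist = {e for e in entries if len(e.split()) == 1}
    missing = set()
    for entry in entries:
        words = entry.split()
        if len(words) > 1:
            # Multi-word lexeme: check each component word separately.
            missing.update(w for w in words if w not in whitelist)
    return sorted(missing)


# Hypothetical sample, NOT real jbovlaste data.
SAMPLE_ENTRIES = ["re", "zei", "tai", "klama", "re zei zgabube", "tai da'i"]
print(missing_components(SAMPLE_ENTRIES))
```

Any word this reports could then be added to the whitelist on its own, without importing the whitespace-containing entry itself.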

"re zei zgabube" is indeed a sequence of three words. It is present in the dictionary because it is an independent lexeme: you cannot accurately derive its meaning from its parts. This happens all the time in natlangs; think, for example, of the English "take off".

As for cmavo sequences, people are allowed to chain them together without whitespace in between (this causes no ambiguity), although nowadays it seems more common to always separate them with whitespace. For a spell checker, two strategies are possible. The lazy one would be to enforce the style of putting whitespace between every cmavo, thus marking e.g. "lonu" as incorrect. The second, more involved strategy would be to check any unknown letter string to see whether it matches a sequence of cmavo, and allow it if it does (e.g. if the program hits "calonu" and finds that it can be read as the cmavo sequence ca+lo+nu, only then would it allow it). But I don't know whether the software you're using is able to do that without an explicit and systematic list of all allowable cmavo strings…
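To make the second strategy concrete, here is a minimal Python sketch. It assumes you already have a set of known cmavo (the tiny `SAMPLE_CMAVO` set below is only an illustrative stand-in for the real list) and uses a plain dictionary-based word-break search. It is not a Lojban morphology parser, and I don't know whether aspell can be made to call anything like it.

```python
from typing import Optional

# Illustrative stand-in; a real checker would load the full cmavo list.
SAMPLE_CMAVO = {"ca", "lo", "nu", "tai", "da'i"}


def split_into_cmavo(text: str, cmavo: set[str]) -> Optional[list[str]]:
    """Return one way to split text into known cmavo, or None if there is none."""
    n = len(text)
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best: list[Optional[list[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and text[start:end] in cmavo:
                best[end] = prefix + [text[start:end]]
                break
    return best[n]


print(split_into_cmavo("calonu", SAMPLE_CMAVO))
print(split_into_cmavo("calonx", SAMPLE_CMAVO))
```

A spell checker would only fall back to this for strings that are not in its whitelist, and accept the string when the result is a non-empty list. Note that the empty string yields an empty list, so the caller should treat that case separately.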

If the software needs an explicit and exhaustive list of allowed words, I guess it wouldn't be very practical for highly synthetic languages (e.g. Turkish, Quechua, Greenlandic…), which might have an infinite number of valid words.

—Ilmen.


On 27/07/2017 10:49, sukender1@gmail.com wrote:
coi ro do

I found entries with spaces in jbovlaste. This is an issue for spell checking dictionaries (actually in "aspell"). I know that spaces are not relevant when parsing Lojban, but they're still important for human reading. This is why I would not like a rule like "import every entry and remove spaces everywhere"...

So, I understand that it may be normal for compound cmavo, like "tai da'i", but can't these be written without a space ("taida'i") without breaking the reading flow? However, some entries seem very strange to me, such as "re zei zgabube". Aren't these 3 separate words?

Thank you for your explanations.

co'o

