coi la .ilmen.
I just applied your idea (added split entries) and added merged entries... And I also found a very simple way to add compound cmavo!
Indeed:
- I created a script that splits jbovlaste entries into cmavo and non-cmavo with a simple regex (based on the rules listed in the CLL, chapter 4.2)
- Then I tagged all cmavo with a flag "C", and added the Hunspell rule "CCC*" (~= "CC+"), which means you can "glue" 2 or more cmavo together.
Of course, this will allow ungrammatical things such as "lonulonucalo", but once again, catching those is not the spell checker's job.
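For anyone who wants to reproduce the two steps above, here is a minimal sketch in Python. It is not my actual script: the regex is only an approximation of the cmavo shapes described in the CLL (optional leading dot, at most one initial consonant, then vowels, possibly separated by apostrophes), and the names CMAVO_RE, is_cmavo, split_entries and to_dic_lines are made up for the illustration.

```python
import re

# Assumed approximation of the cmavo shapes (V, VV, V'V, CV, CVV, CV'V, ...):
# optional leading dot, at most one initial consonant, then vowel groups
# separated by apostrophes. No consonant clusters, no final consonant.
CMAVO_RE = re.compile(r"^\.?[bcdfgjklmnprstvxz]?[aeiouy]+(?:'[aeiouy]+)*$")


def is_cmavo(word):
    """Return True if the word has the shape of a simple cmavo."""
    return CMAVO_RE.match(word) is not None


def split_entries(words):
    """Split a list of entries into (cmavo, non-cmavo)."""
    cmavo = [w for w in words if is_cmavo(w)]
    others = [w for w in words if not is_cmavo(w)]
    return cmavo, others


def to_dic_lines(cmavo, others):
    """Build the lines of a Hunspell .dic file: a word count first,
    then one entry per line, with cmavo tagged with the flag "C"."""
    entries = [f"{w}/C" for w in cmavo] + list(others)
    return [str(len(entries))] + entries


if __name__ == "__main__":
    cmavo, others = split_entries(["ca", "lo", "nu", "lonu", "klama", ".i"])
    print("\n".join(to_dic_lines(cmavo, others)))
```

Note that "lonu" falls on the non-cmavo side here, because it contains a second consonant after the first vowel.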
I tried your example "calonu". It seems a "lonu" entry exists, so my dictionary interprets it as a "normal word" (= non-simple-cmavo) instead of a "compound cmavo". But all of the following combinations are now valid:
- ca, lo, nu
- lo nu, lonu, ca lo, calo
- ca lonu, calo nu, calonu
Only calo & calonu are detected as compounds (remember that "lonu" is an entry of its own), but either way it works as expected.
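To make the behaviour above easier to check, here is a small Python emulation of the "two or more cmavo glued together" rule. It is only a sketch of the idea, not of how Hunspell works internally, and it ignores every other Hunspell setting. The word sets and the names is_compound and classify are hypothetical.

```python
def is_compound(token, cmavo):
    """Return True if token can be cut into two or more cmavo."""
    n = len(token)
    # max_parts[i] = largest number of cmavo covering token[:i], -1 if none
    max_parts = [-1] * (n + 1)
    max_parts[0] = 0
    for i in range(1, n + 1):
        for j in range(i):
            if max_parts[j] >= 0 and token[j:i] in cmavo:
                max_parts[i] = max(max_parts[i], max_parts[j] + 1)
    return max_parts[n] >= 2


def classify(token, cmavo, others):
    """Return "word" for a plain entry, "compound" for glued cmavo,
    and "unknown" for anything else."""
    if token in cmavo or token in others:
        return "word"
    if is_compound(token, cmavo):
        return "compound"
    return "unknown"


if __name__ == "__main__":
    cmavo = {"ca", "lo", "nu"}   # entries tagged with the flag "C"
    others = {"lonu"}            # entries without the flag
    for token in ["ca", "lonu", "calo", "calonu", "lonulonucalo", "caklama"]:
        print(token, classify(token, cmavo, others))
```

Entries are looked up before the compound rule is tried, which is why "lonu" comes out as a plain word while "calo" and "calonu" come out as compounds.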
Experimental cmavo support will be added soon.
Do you know of any other rules that would be worth integrating?
I still have issues with dots in LibreOffice (.i, .a and such)... And some words from "le cmalu noltru" are not recognized yet. Is there any other word source I can use?
co'o
--
Sukender