
[lojban] Re: la cmaxes, a minimal morphology parser





On Tuesday, December 20, 2016 18:05:21 UTC+3, cogas uasanbon wrote:


On Friday, December 25, 2015 21:02:19 UTC+9, Gleki Arxokuna wrote:
Here is a short PEG.js parser for the morphology of Lojban words.

Features:
1. It only checks the morphology of words; everything else is thrown away. Hence you don't need much prettification: a simple
'[["cmavo","coi"],["cmavo","do"],["cmavo","mi"],["gismu","tavla"],["cmavo","do"]]'
is returned.
2. It suits cases where you need a parser of minimal size: morfologi.js, the compiled parser ready to use in JavaScript-compatible apps, is under 30 kilobytes of uncompressed (but minified) JavaScript.
3. It can help you study Lojban morphology from the PEG grammar, which is easier to grasp when everything else is removed.
4. It can help restore omitted spaces within compound cmavo and similar words (so that you can apply your own writing conventions).
5. It is somewhat faster than the full grammar parser when you run numerous queries. For example, the app la sutysisku now uses this parser to automatically determine which word class a given dictionary entry belongs to.
6. It doesn't support zoi ... zoi quotations (a separate preprocessor is needed).
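Because the output is just a JSON string of [word class, word] pairs, an app can consume it with very little code. The sketch below is a hypothetical illustration, not code from the parser or from la sutysisku: the helper names (toEntries, classOfSingleWord, respace) are invented, the call that produces the string via morfologi.js is not shown, and the single-word string '[["gismu","tavla"]]' is an assumed output for a lone headword.

```javascript
// Hypothetical helpers for consuming the parser's output. They assume the
// output is a JSON string of [wordClass, word] pairs, as in the example above;
// producing that string with the compiled morfologi.js parser is not shown.

// Parse the JSON string into an array of { cls, word } objects.
function toEntries(output) {
  return JSON.parse(output).map(([cls, word]) => ({ cls, word }));
}

// Return the word class of a single dictionary headword, or null when the
// output does not consist of exactly one word.
function classOfSingleWord(output) {
  const entries = toEntries(output);
  return entries.length === 1 ? entries[0].cls : null;
}

// Rejoin the recognized words with single spaces, e.g. to restore spaces
// that were omitted inside a compound cmavo.
function respace(output) {
  return toEntries(output).map((e) => e.word).join(" ");
}

const sample =
  '[["cmavo","coi"],["cmavo","do"],["cmavo","mi"],["gismu","tavla"],["cmavo","do"]]';

console.log(respace(sample)); // coi do mi tavla do
console.log(classOfSingleWord('[["gismu","tavla"]]')); // gismu
console.log(classOfSingleWord(sample)); // null
```

Returning null for multi-word input keeps the dictionary-entry lookup strict: an entry that splits into several words is not silently assigned the class of its first word.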

{zarrja}, {zallja}, {zammja}, {zannja} are grammatical in this ilmentufa. Is that a bug?

It detects the word classes of grammatically correct words.
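Lojban morphology does not allow the same consonant twice in a row inside a word. Since the parser is meant to classify words rather than validate them, an app that needs to reject forms like {zarrja} could run a separate check. The function below is a hypothetical sketch of such a pre-check (findDoubledConsonants is an invented name, not part of ilmentufa). It assumes plain lowercase Lojban letters and only looks for doubled consonants, not any other morphological rule.

```javascript
// Hypothetical pre-check, not part of the parser: report every place where the
// same Lojban consonant appears twice in a row within a single word.
const CONSONANTS = new Set("bcdfgjklmnprstvxz");

function findDoubledConsonants(word) {
  const found = [];
  const w = word.toLowerCase();
  for (let i = 0; i + 1 < w.length; i++) {
    // Only consonants count; doubled vowels are handled by other rules.
    if (CONSONANTS.has(w[i]) && w[i] === w[i + 1]) {
      found.push(w.slice(i, i + 2));
    }
  }
  return found;
}

for (const word of ["zarrja", "zallja", "zammja", "zannja", "tavla"]) {
  const doubled = findDoubledConsonants(word);
  console.log(word, doubled.length ? `rejected (${doubled.join(", ")})` : "ok");
}
```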

--
You received this message because you are subscribed to the Google Groups "lojban" group.
Visit this group at https://groups.google.com/group/lojban.