[lojban] Re: Lojban tokenizer for machine learning, first version



Oleg Parashchenko <olpa@uucode.com> writes:

> I've just released the first version of a lojban tokenizer. It is intended 
> for use in machine learning applications and therefore is a bit different 
> from a linguistic tokenizer. In particular, it does sub-word tokenization.
>
> Additionally, there is a lexer, which can be used to develop alternative 
> tokenizers.

.uanai How is that different from any of the other Lojban parsers that
have been written?  I am interested in your lexer, however.  Which
version of the grammar did you use?  The PEG?  I'd be very curious to
see how your lexer distinguishes between lujvo and fu'ivla.
