For various reasons we may need N-gram statistics from the Lojban corpus.
Not that it's hard to generate such stats.
But first we need to preprocess the log of our history:
Messages from "mensi" and "livla" must definitely be removed.
Anything else?
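
To make the filtering step concrete, here is a rough sketch in Python. It assumes each log line looks like `<nick> message text`, which is only a guess at the log format; the names `BOT_NICKS`, `LINE_RE` and `strip_bot_messages` are made up for illustration. Only "mensi" and "livla" are on the removal list so far; anything else would have to be added once we agree on it.

```python
import re

# Nicknames whose messages are dropped. Only these two are named in this post;
# extend the set as further candidates are agreed on.
BOT_NICKS = {"mensi", "livla"}

# ASSUMPTION: each log line has the form "<nick> message text".
# The real log format may differ, in which case this pattern must change.
LINE_RE = re.compile(r"^<(?P<nick>[^>]+)>\s*(?P<text>.*)$")


def strip_bot_messages(lines, bot_nicks=BOT_NICKS):
    """Return the message texts of all lines not sent by a nick in bot_nicks."""
    kept = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Line does not fit the assumed format (join/part notices etc.).
            continue
        if match.group("nick").lower() in bot_nicks:
            continue
        kept.append(match.group("text"))
    return kept
```

Lines that do not fit the assumed pattern are silently skipped here; it may be better to collect them separately so we can see what else needs handling.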
I'd like to eventually develop an algorithm for preprocessing this log.
Any help is welcome.
A spreadsheet might be needed instead, though, since the list can get long.
PS. If you wonder where N-grams might be needed, the immediate application is "collect the most frequent phrases in Lojban and make a phrasebook out of them".
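
For the phrasebook idea, the counting itself could look roughly like the sketch below. It assumes the messages are already cleaned and that splitting on whitespace is good enough as tokenization, which is crude for Lojban (pauses written as "." and compound cmavo are not handled). The function names `count_ngrams` and `top_phrases` are made up for illustration.

```python
from collections import Counter


def count_ngrams(messages, n):
    """Count word n-grams inside each message, never across message boundaries."""
    counts = Counter()
    for message in messages:
        # ASSUMPTION: whitespace tokenization; a real Lojban tokenizer
        # would be more careful about "." and compound words.
        words = message.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def top_phrases(messages, n, k):
    """Return the k most frequent n-grams as (phrase, count) pairs."""
    most_common = count_ngrams(messages, n).most_common(k)
    return [(" ".join(gram), count) for gram, count in most_common]
```

For example, `top_phrases(messages, 3, 100)` would give the 100 most frequent three-word sequences, which could then be reviewed by hand before going into a phrasebook.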