Sooner or later, we're going to need something that can go through
the corpus (http://www.lojban.org/corpus/) and answer questions
like "show me all sentences in which the x3 of tubnu is filled", to
help us figure out how to fix the various gismu list problems.
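Once each sentence has been parsed and its places numbered, the query itself is the easy part. Here is a minimal sketch, assuming a hypothetical record format in which each parsed bridi carries its selbri and a map from place number (relative to the base gismu) to sumti. `Bridi` and `sentences_with_place` are illustrative names, not part of any existing Lojban tool, and the sketch treats each sentence as a single bridi.

```python
from dataclasses import dataclass, field


@dataclass
class Bridi:
    """One parsed sentence (hypothetical record format): the sentence
    text, its selbri, and the filled places keyed by place number
    relative to the base gismu."""
    text: str
    selbri: str
    places: dict[int, str] = field(default_factory=dict)


def sentences_with_place(corpus, gismu, place):
    """Return the text of every sentence whose selbri is `gismu`
    and whose place number `place` (1 for x1, 3 for x3, ...) is filled."""
    return [b.text for b in corpus if b.selbri == gismu and place in b.places]


corpus = [
    Bridi("ti tubnu lo tirse", "tubnu", {1: "ti", 2: "lo tirse"}),
    Bridi("ti tubnu lo tirse lo djacu", "tubnu",
          {1: "ti", 2: "lo tirse", 3: "lo djacu"}),
    Bridi("mi dunda ti do", "dunda", {1: "mi", 2: "ti", 3: "do"}),
]
print(sentences_with_place(corpus, "tubnu", 3))
# ['ti tubnu lo tirse lo djacu']
```

All of the hard work lies in producing the `places` map, so the query stays this simple whatever the parser has to do internally.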
I've given this some thought while working on my parser. There are problems, ranging from easy ones to increasingly difficult ones.
First of all, the corpus must be "cleaned" so that it passes a parser. Obvious mistakes can be corrected and incomprehensible sections removed. Intentional deviations from the morphology and/or the grammar (like some of the stuff in Alice) can be commented out, with an alternative form that does pass the parser supplied in their place.
Leaving aside lujvo, tanru and all the mess with connectives, things are pretty easy.
Enumerating the sumti in simple sentences with no FA/SE is a trivial exercise, and I already have a piece of code that does it in the output stage of my parser.
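Without FA or SE the numbering is purely positional: sumti are counted x1, x2, ... in order of appearance, whichever side of the selbri they stand on. The one wrinkle is the observative. When there are no sumti before the selbri, x1 is left unspecified and the first sumti after it fills x2. This is a sketch, assuming the parser has already split the bridi into the sumti before the selbri (`head`) and those after it (`tail`). The function name is hypothetical.

```python
def enumerate_places(head, tail):
    """Number the sumti of a simple bridi with no FA and no SE.

    head: sumti before the selbri; tail: sumti after it.
    Sumti are numbered in order of appearance across head and tail.
    In an observative bridi (empty head), x1 is unspecified, so the
    tail starts at x2."""
    start = 1 if head else 2
    return {n: sumti for n, sumti in enumerate(head + tail, start=start)}


print(enumerate_places(["mi"], ["ti", "do"]))   # mi dunda ti do
# {1: 'mi', 2: 'ti', 3: 'do'}
print(enumerate_places([], ["lo zarci"]))       # klama lo zarci
# {2: 'lo zarci'}
```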
FA is slightly more involved but can probably also be done without any extra information from the syntax stage.
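FA only needs local bookkeeping. A tagged sumti goes to the place its tag names (fa = x1 ... fu = x5), and an untagged sumti takes the place after the one filled by the previous sumti. The sketch below assumes the same hypothetical head/tail split as the plain case, with each sumti paired with its FA tag (or `None`). It returns (place, sumti) pairs rather than a dict, so a place that FA has caused to be filled twice is not silently overwritten.

```python
# Place numbers assigned by the FA cmavo.
FA = {"fa": 1, "fe": 2, "fi": 3, "fo": 4, "fu": 5}


def enumerate_with_fa(head, tail):
    """Number the sumti of a bridi whose sumti may carry FA tags.

    head / tail: lists of (tag, sumti) pairs before / after the selbri,
    where tag is a FA cmavo or None.  A tagged sumti goes to its tagged
    place; an untagged one takes the place after the previous sumti's.
    Returns (place, sumti) pairs in order of appearance."""
    result = []
    # As in the untagged case, an observative bridi starts at x2.
    next_place = 1 if head else 2
    for tag, sumti in head + tail:
        place = FA[tag] if tag is not None else next_place
        result.append((place, sumti))
        next_place = place + 1
    return result


# fi do fa mi dunda ti  (same meaning as: mi dunda ti do)
print(enumerate_with_fa([("fi", "do"), ("fa", "mi")], [(None, "ti")]))
# [(3, 'do'), (1, 'mi'), (2, 'ti')]
```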
SE is more complicated, as it involves backtracking when we want the sumti numbered relative to the base gismu.
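One way to handle the backtracking is a second pass. First number the sumti relative to the converted selbri, as usual. Then, once the SE cmavo in the selbri are known, map each place back to the base gismu. Each of se/te/ve/xe swaps x1 with x2/x3/x4/x5, so for stacked conversions like "se te broda" the swaps are undone from the outermost inwards. This is a sketch under the assumption that the SE cmavo have already been collected from the selbri in spoken order. `to_base_place` and `renumber` are hypothetical names.

```python
# Place swapped with x1 by each SE cmavo.
SE = {"se": 2, "te": 3, "ve": 4, "xe": 5}


def to_base_place(place, conversions):
    """Map a place of a converted selbri back to the base gismu.

    conversions: the SE cmavo in spoken order, outermost first, so
    'se te broda' is ['se', 'te'].  Each SE swaps x1 with one other
    place; the swaps are undone from the outside in."""
    for cmavo in conversions:
        other = SE[cmavo]
        if place == 1:
            place = other
        elif place == other:
            place = 1
    return place


def renumber(places, conversions):
    """Renumber (place, sumti) pairs relative to the base gismu."""
    return [(to_base_place(p, conversions), s) for p, s in places]


# ti se dunda mi: ti is x1 of 'se dunda', i.e. x2 of dunda
print(renumber([(1, "ti"), (2, "mi")], ["se"]))
# [(2, 'ti'), (1, 'mi')]
```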
Doing anything with lujvo requires splitting them into their component rafsi to see what needs to be done. The cases range from easy (just a SE-rafsi plus a single gismu) via hairy to well-nigh impossible.
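The easy lujvo case can be recognised from its shape alone: one of the SE rafsi (sel, ter, vel, xel) followed by a full five-letter gismu. This is a minimal sketch under that assumption. It checks only the CVCCV/CCVCV shape of the remainder, not whether the gismu exists, and it does not handle the gismu appearing as a short rafsi or any other lujvo form, which is where a real splitter and the rafsi list come in. `split_se_lujvo` is a hypothetical name. The SE cmavo it returns can then be treated like an explicit SE in front of the gismu.

```python
import re

# rafsi of the SE cmavo
SE_RAFSI = {"sel": "se", "ter": "te", "vel": "ve", "xel": "xe"}

C = "[bcdfgjklmnprstvxz]"   # Lojban consonants
V = "[aeiou]"               # Lojban vowels (y excluded)
# A full gismu has the shape CVCCV or CCVCV.
GISMU = re.compile(f"(?:{C}{V}{C}{C}|{C}{C}{V}{C}){V}")


def split_se_lujvo(word):
    """Split a lujvo of the form SE-rafsi + full gismu, e.g.
    'seltubnu' -> ('se', 'tubnu').  Returns None for any other shape;
    only the shape is checked, not whether the gismu exists."""
    prefix, rest = word[:3], word[3:]
    if prefix in SE_RAFSI and GISMU.fullmatch(rest):
        return SE_RAFSI[prefix], rest
    return None


print(split_se_lujvo("seltubnu"))   # ('se', 'tubnu')
print(split_se_lujvo("tubnu"))      # None
```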
In this context tanru are more or less like split lujvo.
I haven't thought about the effect of connectives at all yet, but the added complexity probably ranges from trivial to hairy.
I'd start with the easy bits, as it is always better to have something reasonably soon than a promised perfection that probably never arrives. I can add sumti enumeration, including the FA/SE cases (but excluding lujvo and tanru), to my parser pretty soon, and the simplest lujvo case (SE-rafsi + gismu) as soon as I get a lujvo splitter done. For now, I'd draw the line there. "Pretty soon" means sometime in August, as I'll spend July doing something completely different,
like attending 60+ chamber music concerts, which, alas, doesn't necessarily mean I can stop tinkering with ideas in my head entirely.
Veijo