Is the empty string (ε) really intended to be part of the Lojban language, or is this a quirk of the machine grammar implementations?
It was done on purpose. It would be trivial to change the grammar to disallow it. If we disallow it, however, would we also have to consider the text "valsi si" ungrammatical?
It makes sense that you would want to keep that grammatical.
If it is included, as the parsers indicate, then a null Lojban text is found at every position where the beginning of a non-null Lojban text can't be parsed.
I'm not sure I follow. The grammar can only parse one text at a time; it can't parse a string of texts, so I'm not sure I see what the problem is.
What I mean is that because the empty string is grammatical, the parser will report a text for any input, whether or not the input is grammatical and can be consumed. This is easy enough to correct for. What's proving to be a more difficult obstacle in using the parser to detect Lojban texts is the "bare cmevla" rule: "lojban is an improved version of loglan" parses as a grammatical bare cmevla sequence. But that's a whole 'nother kettle of fish.
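To illustrate the empty-string point, here is a rough sketch in Python. It is only a toy: the regular expression stands in for a grammar whose start rule can match nothing, the word list is arbitrary, and the function names are made up for this example. It is not the Lojban grammar or any real parser's API.

```python
import re

# Toy stand-in for a grammar whose start rule can match the empty string.
# NOT the Lojban grammar: the word list is an arbitrary assumption.
TOY_TEXT = re.compile(r"(?:\s*(?:mi|do|klama|valsi|si)\b)*\s*")


def consumed_prefix(source: str) -> int:
    """Return how many leading characters the toy grammar consumes."""
    match = TOY_TEXT.match(source)
    # The pattern can match zero characters, so match is never None.
    return match.end()


def naive_is_text(source: str) -> bool:
    """Report a text whenever the parse succeeds, which is always."""
    return consumed_prefix(source) >= 0


def strict_is_text(source: str) -> bool:
    """Report a text only if the whole input was consumed."""
    return consumed_prefix(source) == len(source)


if __name__ == "__main__":
    for sample in ["valsi si", "hello world", "mi klama xyz", ""]:
        print(repr(sample), naive_is_text(sample), strict_is_text(sample))
```

The naive check succeeds on every input because an empty match still counts as a parse. Requiring that the match reach the end of the input is one way to correct for that, and it still leaves the empty string itself grammatical.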
In any case, thank you.
mi'e la mukti mu'o