Re: [lojban] Word break algorithm so far
On Monday 13 January 2003 11:06, Jorge Llambias wrote:
> la pier cusku di'e ("Pier says the following:")
>
> >3. Pick the first piece that has not been resolved.
> > C. If the piece does not end in 'y' or a consonant and has no
> > consonant that is adjacent to a consonant when 'y' is removed:
> > I. Number the consonants starting with 1 and find the last one whose
> > number is a power of 2.
> > II. If this consonant is the first letter in the piece or there are
> > no consonants, resolve the string as a cmavo.
> > III. If this consonant is not the first letter, split before it.
>
> Why do you need I, II and III? Shouldn't you just split before
> every consonant at this point?
I wrote the program to split once each time it examines a piece, or at most
twice, doing two different kinds of split. Given that constraint, this is the
most efficient way to break a piece that consists entirely of cmavo. If a
piece ends in a long string of BY, it hits another part of the algorithm that
takes quadratic time, so the n log n time this step takes is moot.
I have to check whether the consonant is the first letter; otherwise I would
break off a null (empty) piece, which is an error (though it is currently
marked as a cmavo).
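For illustration, here is a minimal Python sketch of step C as quoted above, including the first-letter check that avoids breaking off a null piece. It is not the program described in this message. The consonant set, the function names (`qualifies_for_step_c`, `step_c`, `break_cmavo_string`) and the literal reading of "adjacent when 'y' is removed" are assumptions. It ignores commas, capitals and the separate BY handling, and it finds the first unresolved piece with a linear scan for clarity rather than speed.

```python
# Assumed Lojban consonant set; every other character is treated as non-consonant.
CONSONANTS = set("bcdfgjklmnprstvxz")


def qualifies_for_step_c(piece: str) -> bool:
    """Hypothetical check of step C's precondition: the piece must not end in
    'y' or a consonant, and no two consonants may be adjacent once every 'y'
    is removed."""
    if not piece or piece[-1] == "y" or piece[-1] in CONSONANTS:
        return False
    stripped = piece.replace("y", "")
    return not any(a in CONSONANTS and b in CONSONANTS
                   for a, b in zip(stripped, stripped[1:]))


def step_c(piece: str):
    """Apply step C once. Returns ("cmavo", piece) if the piece is resolved,
    or ("split", left, right) after splitting before the last consonant whose
    1-based number is a power of 2."""
    positions = [i for i, ch in enumerate(piece) if ch in CONSONANTS]
    if not positions:
        return ("cmavo", piece)                       # II: no consonants
    k = 1 << (len(positions).bit_length() - 1)        # I: largest power of 2 <= count
    pos = positions[k - 1]
    if pos == 0:
        return ("cmavo", piece)                       # II: splitting would leave a null piece
    return ("split", piece[:pos], piece[pos:])        # III: split before that consonant


def break_cmavo_string(piece: str) -> list[str]:
    """Repeatedly apply step C to the first unresolved piece until all are resolved."""
    if not qualifies_for_step_c(piece):
        raise ValueError(f"{piece!r} does not meet step C's precondition")
    pieces = [(piece, False)]                         # (text, resolved?)
    while True:
        idx = next((i for i, (_, done) in enumerate(pieces) if not done), None)
        if idx is None:
            return [text for text, _ in pieces]
        result = step_c(pieces[idx][0])
        if result[0] == "cmavo":
            pieces[idx] = (result[1], True)
        else:
            pieces[idx:idx + 1] = [(result[1], False), (result[2], False)]
```

For example, `break_cmavo_string("lenumibaku")` first splits before the 4th consonant ("lenumi" | "baku"), then keeps halving, and returns `["le", "nu", "mi", "ba", "ku"]`. Because each split leaves at most half of the consonant count (rounded up) on either side, the pieces shrink geometrically, which is where the n log n bound comes from.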
phma