
Re: [lojban] Word break algorithm so far



On Monday 13 January 2003 11:06, Jorge Llambias wrote:
> la pier cusku di'e
>
> >3.  Pick the first piece that has not been resolved.
> >   C.  If the piece does not end in 'y' or a consonant and has no
> >       consonant that is adjacent to a consonant when 'y' is removed:
> >     I.  Number the consonants starting with 1 and find the last one whose
> >         number is a power of 2.
> >     II. If this consonant is the first letter in the piece or there are
> >         no consonants, resolve the string as a cmavo.
> >     III. If this consonant is not the first letter, split before it.
>
> Why do you need I, II and III? Shouldn't you just split before
> every consonant at this point?

I wrote the program to split a piece once each time it examines it, or at
most twice when it does two different kinds of split. Given that constraint,
this is the most efficient way to break a piece that consists entirely of
cmavo. If a piece ends in a long string of BY, it hits another part of the
algorithm that takes quadratic time, so the n log n time spent on this step
is moot.

I have to check whether the consonant is the first letter; otherwise I would
break off a null piece, which is an error (though it is currently marked as a
cmavo).
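
For illustration, step C can be sketched in Python roughly as follows. This
is not the actual program: the names split_cmavo_piece and break_cmavo, the
CONSONANTS set, and the driver loop are assumptions made for the sketch, and
it presumes the piece already meets the conditions of step C (no consonant
clusters, does not end in 'y' or a consonant).

```python
# Hypothetical sketch of step C; names and structure are assumptions,
# not taken from the actual program.
CONSONANTS = set("bcdfgjklmnprstvxz")  # assumed Lojban consonant set


def split_cmavo_piece(piece):
    """Do one split. Return (left, right), or None if the piece is one cmavo."""
    positions = [i for i, ch in enumerate(piece) if ch in CONSONANTS]
    if not positions:
        return None  # II: no consonants, so resolve as a cmavo
    # I: consonants are numbered from 1; find the last power-of-2 number.
    k = 1
    while k * 2 <= len(positions):
        k *= 2
    index = positions[k - 1]
    if index == 0:
        return None  # II: splitting here would break off a null piece
    return piece[:index], piece[index:]  # III: split before that consonant


def break_cmavo(piece):
    """Apply single splits repeatedly until every piece is resolved."""
    pending, resolved = [piece], []
    while pending:
        current = pending.pop(0)
        parts = split_cmavo_piece(current)
        if parts is None:
            resolved.append(current)
        else:
            pending = list(parts) + pending
    return resolved


print(break_cmavo("lenumi"))  # ['le', 'nu', 'mi']
```

With three consonants, "lenumi" is first split before consonant 2 ("le" and
"numi"), and "numi" is then split again on a later examination. Without the
index == 0 check, a piece such as "le" would be split into an empty string
and "le".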

phma