
Re: [lojban] Word break algorithm so far

To: lojban@yahoogroups.com
Subject: Re: [lojban] Word break algorithm so far
From: Pierre Abbat <phma@webjockey.net>
Date: Wed, 15 Jan 2003 08:14:33 -0500
In-reply-to: <F24b3yJmCSZb5HOcfBz00012c78@hotmail.com>
Organization: dis
References: <F24b3yJmCSZb5HOcfBz00012c78@hotmail.com>
User-agent: KMail/1.5

On Monday 13 January 2003 11:06, Jorge Llambias wrote:
> la pier cusku di'e
>
> >3.  Pick the first piece that has not been resolved.
> >   C.  If the piece does not end in 'y' or a consonant and has no
> >       consonant that is adjacent to a consonant when 'y' is removed:
> >     I.  Number the consonants starting with 1 and find the last one whose
> >         number is a power of 2.
> >     II. If this consonant is the first letter in the piece or there are
> >         no consonants, resolve the string as a cmavo.
> >     III. If this consonant is not the first letter, split before it.
>
> Why do you need I, II and III? Shouldn't you just split before
> every consonant at this point?

I wrote the program to split a piece once each time it examines it, or at
most twice when it does two different kinds of split. Given that constraint,
splitting before the last consonant whose number is a power of 2 is the most
efficient way to break a piece that consists entirely of cmavo; it takes
O(n log n) time. If a piece ends in a long string of BY, it hits another part
of the algorithm that takes quadratic time, so the O(n log n) cost here
hardly matters.

I have to check whether the consonant is the first letter; otherwise I would
break off a null piece, which is an error (though it is currently marked as a
cmavo).
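
To make steps I-III concrete, here is a minimal Python sketch. It is not the
actual program; the names CONSONANTS, split_index and break_cmavo are made up
for illustration, and it assumes the piece has already passed the test in
step C (no consonant clusters, does not end in 'y' or a consonant).

```python
# Illustrative sketch only, not the real word-break program.
# Assumes the piece already satisfies step C.
CONSONANTS = set("bcdfgjklmnprstvxz")


def split_index(piece):
    """Return the index to split before, or None if the piece is one cmavo."""
    # Step I: positions of the consonants, numbered 1, 2, 3, ...
    positions = [i for i, ch in enumerate(piece) if ch in CONSONANTS]
    if not positions:
        return None  # Step II: no consonants, so resolve as a cmavo
    # Largest power of 2 that does not exceed the consonant count.
    k = 1
    while k * 2 <= len(positions):
        k *= 2
    idx = positions[k - 1]
    # Step II: chosen consonant is the first letter, so resolve as a cmavo;
    # splitting here would break off a null piece.
    # Step III: otherwise split before it.
    return idx if idx > 0 else None


def break_cmavo(piece):
    """Apply one split per examination until every piece is resolved."""
    pending = [piece]
    words = []
    while pending:
        current = pending.pop()
        idx = split_index(current)
        if idx is None:
            words.append(current)
        else:
            # Push the right half first so the left half is examined next.
            pending.append(current[idx:])
            pending.append(current[:idx])
    return words


print(break_cmavo("lenuba"))
```

With "lenuba" the consonants are l=1, n=2, b=3, so the first split falls
before "n", and "nuba" is then split before "b" on a later examination.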

phma

