Word resolution algorithm so far



1.  Scan the line from left to right. Convert all spaces to pauses
    unless preceded by comma; convert space to comma if preceded by comma.
2.  Break at all pauses (cannot pause in the middle of a word).
3.  Pick the first piece that has not been resolved.
  A.  If the piece ends in a consonant:
    I.  Make a decapitalized copy of the string with commas removed.
    II. (ala'um option off) Search backward in the string for a place that is
        preceded by "la", "lai", "la'i", or "doi" where the 'l' or 'd' is not
        immediately preceded by a consonant.
    II. (ala'um option on) Search backward in the string for a place that is
        preceded by "la", "lai", "la'i", or "doi" where the 'l' or 'd' is not
        immediately preceded by a consonant and such that the character at
        that place is a consonant.
    III.If you found such a place:
      a.  Split before the place and call the second part a cmene.
      b.  If the second part does not begin with a consonant, resolve it as an
          error. (not necessary if ala'um option is on)
      c.  Search backward in the first part for a consonant. If it is not
          the first character, split before it and resolve the second part as
          a cmavo.
    IV. If you did not find such a place, resolve the piece as a cmene.
  B.  If the piece ends in 'y':
    I.  Search backward for a consonant.
    II. If you find one:
      a.  If it is preceded by a consonant, resolve the piece as an error.
      b.  If it is not preceded by a consonant, break before the consonant
          and resolve the second piece as a cmavo.
    III.If you do not find one, resolve the piece as a cmavo.
  C.  If the piece does not end in 'y' or a consonant and has no consonant
      that is adjacent to a consonant when 'y' is removed:
    I.  Number the consonants starting with 1 and find the last one whose
        number is a power of 2.
    II. If this consonant is the first letter in the piece or there are no
        consonants, resolve the string as a cmavo.
    III.If this consonant is not the first letter, split before it.
  D.  If the piece contains 'y' and no consonant following 'y' is followed
      two letters later, not counting apostrophes and commas, by a vowel,
      split it after 'y'. (e.g. ly.Ebucy.Obukybu.DENpabu)
  Z.  Resolve any other kind of piece as an error.
999.If there are any more pieces unresolved, return to step 3.

3.D is not implemented yet. The reason for writing it that way is that {kybu}
stands for a single letter, so it is more natural to say {kybu.DENpabu} than
{ky.buDENpabu}.
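
For illustration only, here is a rough Python sketch of three parts of the
algorithm: steps 1 and 2 (spaces to pauses, break at pauses), the rule for a
piece ending in 'y', and the power-of-2 splitting rule. The function names and
return shapes are made up for this sketch and are not taken from the actual
program. It assumes that "preceded by comma" is tested against the text as
already converted, that the consonants are "bcdfgjklmnprstvxz", and that the
caller has already checked each rule's precondition.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")  # assumed consonant set; "'" and "," excluded


def is_consonant(ch):
    return ch.lower() in CONSONANTS


def to_pieces(line):
    """Steps 1-2: turn each space into a pause ('.') or comma, then break at pauses."""
    out = []
    for ch in line:
        if ch == " ":
            # Assumption: the preceding character is the already-converted one.
            out.append("," if out and out[-1] == "," else ".")
        else:
            out.append(ch)
    return [piece for piece in "".join(out).split(".") if piece]


def resolve_y_piece(piece):
    """Rule for a piece ending in 'y'.

    Returns (unresolved_prefix, word, kind), kind being "cmavo" or "error".
    """
    for i in range(len(piece) - 1, -1, -1):
        if is_consonant(piece[i]):
            if i > 0 and is_consonant(piece[i - 1]):
                return ("", piece, "error")
            # Break before the consonant; the tail is a cmavo.
            return (piece[:i], piece[i:], "cmavo")
    return ("", piece, "cmavo")  # no consonant at all


def split_at_power_of_two(piece):
    """Power-of-2 rule: [piece] means "resolve as cmavo", else the two halves."""
    positions = [i for i, ch in enumerate(piece) if is_consonant(ch)]
    if not positions:
        return [piece]
    power = 1
    while power * 2 <= len(positions):
        power *= 2
    cut = positions[power - 1]  # last consonant whose number is a power of 2
    return [piece] if cut == 0 else [piece[:cut], piece[cut:]]
```

With these definitions, to_pieces("ly.Ebucy.Obukybu.DENpabu") gives the four
pieces of the example, resolve_y_piece("Ebucy") leaves "Ebu" unresolved and
yields the cmavo "cy", and split_at_power_of_two("lenu") splits into "le" and
"nu", each of which is then resolved as a cmavo on a later pass.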

phma