[lojban-beginners] Re: welcome and question about brivla recognizing
On Sunday 20 July 2008 03:54:55 Mateusz Grotek wrote:
> Hello.
> I'm new to lojban, recently started reading CLL, but have some questions
> about brivla recognition from speech stream.
> What is exact algorithm for doing it? I tried to create one, but it
> looks like i have to count letters before stress, what i don't wanna do.
> Is it really needed? (Because of something what is called "tosmabru
> failure" in book). And point 5b) in draft look for me somehow wrong, but
> maybe it's my fault. Could you explain it to me please?
I found a bug in the algorithm, though I no longer remember what it was. I
modified the algorithm, implemented it (the implementation is called valfendi),
and added support for fu'ivla rafsi. The following is the part of it that
resolves brivla:
----
3. Pick the first piece that has not been resolved.
li'o
E. If the piece contains a consonant followed two letters later, not
counting apostrophes and commas, by a vowel, and there is no 'y' after
the letter between the consonant and the vowel, then there is a (possibly
invalid) brivla in the piece.
I. Make a copy of the string, decapitalize all consonants, remove all
commas adjacent to consonants, and insert commas before consonant
clusters, between adjacent nondiphthong vowels, and after each pair
of vowels without a comma between them.
II. If the stress option is set and no vowel in the piece is stressed,
stress the vowel in the next-to-last syllable not counting syllables
which have 'y' in them.
III. Capitalize all letters in all syllables which have at least one capital
letter in them.
IV. Scan forward for a stressed vowel other than 'Y' after or at the first
CC or CyC consonant cluster, then scan forward to the end of the next
syllable, ignoring syllables with 'y' in them. If the next syllable is
itself stressed, reset the count.
V. If you reached the end of the word looking for a stressed vowel or the
next syllable, resolve the piece as an error. If the next syllable
begins with a non-initial consonant cluster, a vowel, or an apostrophe,
go back to IV and keep looking. If the next syllable begins with a
valid consonant cluster or single consonant, break before it and
consider the first part, which is a brivlavau. If there is no next
syllable, the whole piece is a brivlavau.
VI. If the piece does not have a CC or CyC consonant cluster in the first
five letters, not counting apostrophe, comma, or 'y', find the first
consonant after the first letter and break before it.
VI. If the piece has a consonant cluster in the first five letters, find
the first CC or CyC consonant cluster and check whether it is a valid
initial cluster (if it contains 'y' it is not) and whether the part
beginning there is a slinku'i (see below) or monosyllabic. If the CyC
contains a stressed 'Y', it is not a consonant cluster; thus
/ledYcIlta/ is {le dy cilta} "D's thread", but /ledycIlta/ is
{ledycilta} "hypha". If the part contains a diphthong and no other
vowels, it is monosyllabic even if there is a comma in the diphthong,
e.g. {pru,a}.
a. If the brivlavau begins with a consonant cluster, it is a valid
initial cluster, and the brivlavau is not a slinku'i and is not
monosyllabic, resolve it as a brivla.
b. If the brivlavau begins with a consonant cluster but the cluster is
not a valid initial cluster or the brivlavau is a slinku'i or
monosyllabic, resolve it as an error.
c. If the brivlavau does not begin with a consonant cluster, the cluster
is a valid initial cluster, and the part beginning there is not a
slinku'i and is not monosyllabic, break before the consonant cluster
and resolve the second part as a brivla.
d. If the brivlavau does not begin with a consonant cluster and the
cluster is not a valid initial cluster or the part beginning there
is a slinku'i, resolve the brivlavau as a brivla.
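Sub-rules a to d amount to a small decision table. Here is a minimal Python
sketch of just that table, not the valfendi code: the function name, the
boolean parameters, and the returned labels are all made up for illustration,
and the caller is assumed to have already worked out the four facts. Note that
the rules as written say nothing about a brivlavau that does not begin with a
cluster when the part at the cluster is a valid initial, not a slinku'i, but
monosyllabic; the sketch reports that case as "unspecified" rather than guess.

```python
def resolve_brivlavau(begins_with_cluster, valid_initial, slinkuhi, monosyllabic):
    """Apply sub-rules a-d to one brivlavau.

    All four arguments are booleans computed elsewhere (hypothetical inputs):
      begins_with_cluster: the brivlavau starts with the CC/CyC cluster
      valid_initial:       that cluster is a valid initial cluster
      slinkuhi:            the part beginning at the cluster is a slinku'i
      monosyllabic:        the part beginning at the cluster is monosyllabic
    Returns one of the illustrative labels below.
    """
    part_ok = valid_initial and not slinkuhi and not monosyllabic
    if begins_with_cluster:
        # a: everything checks out; b: anything wrong is an error.
        return "brivla" if part_ok else "error"
    if part_ok:
        # c: break before the cluster; the second part is the brivla.
        return "break-before-cluster"
    if not valid_initial or slinkuhi:
        # d: the whole brivlavau is the brivla.
        return "whole-brivlavau-is-brivla"
    # Valid initial, not a slinku'i, but monosyllabic: not covered by a-d.
    return "unspecified"
```

For example, the tosmabru case (no leading cluster, valid initial cluster, part
not a slinku'i and not monosyllabic) comes out as "break-before-cluster".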
A slinku'i, as far as word breaking is concerned, is anything that matches
the regex
^C[raf3]*([gim]?$|[raf4]?y)
but does not match the regex
^[raf3]*([gim]?$|[raf4]?y)
where
C matches any consonant
[raf3] matches any 3-letter rafsi, meaningful or not (any CCV where CC is a
valid initial pair, CVC, or CVV where the VV is a diphthong allowed in lujvo)
[raf4] matches any 4-letter rafsi, meaningful or not (any CCVC where CC is a
valid initial pair, or CVCC for any CC)
[gim] matches any gismu, meaningful or not ([raf4]V).
Anything after the first 'y' is ignored. It has no effect on where to break the
word, only on whether the word is valid.
----
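The slinku'i test described above can be spelled out as two ordinary regexes.
What follows is a rough Python sketch, not the valfendi implementation. It
assumes the input is already lowercase with commas and apostrophes removed,
takes the 48 valid initial pairs from CLL, and assumes "diphthong allowed in
lujvo" means ai, ei, oi, au. It follows the definition literally, so CV'V
rafsi are not treated as [raf3]; all names are hypothetical.

```python
import re

V = "[aeiou]"                      # 'y' is deliberately not a vowel here
C = "[bcdfgjklmnprstvxz]"
INITIAL_PAIRS = (
    "bl br cf ck cl cm cn cp cr ct dj dr dz fl fr gl gr jb jd jg jm jv "
    "kl kr ml mr pl pr sf sk sl sm sn sp sr st tc tr ts vl vr xl xr "
    "zb zd zg zm zv"
).split()
II = "(?:" + "|".join(INITIAL_PAIRS) + ")"   # valid initial pair
DIPH = "(?:ai|ei|oi|au)"                     # assumed lujvo diphthongs

RAF3 = f"(?:{II}{V}|{C}{V}{C}|{C}{DIPH})"    # CCV, CVC, CVV
RAF4 = f"(?:{II}{V}{C}|{C}{V}{C}{C})"        # CCVC, CVCC (any CC)
GIM = f"(?:{RAF4}{V})"                       # gismu shape
TAIL = f"(?:{GIM}?$|{RAF4}?y)"               # text after the first 'y' is ignored

WITH_C = re.compile(f"^{C}{RAF3}*{TAIL}")
WITHOUT_C = re.compile(f"^{RAF3}*{TAIL}")


def is_slinkuhi(part):
    """True if part matches the first regex but not the second."""
    return bool(WITH_C.match(part)) and not WITHOUT_C.match(part)
```

Under these assumptions is_slinkuhi("skambavmi") is true (s + kam + bavmi),
while is_slinkuhi("stabla") is false because sta + bla also matches the second
regex, and a plain gismu shape such as "skami" matches neither.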
The tosmabru is handled by rule 3.E.VI.c (sub-rule c of the second rule
numbered VI above). It is possible to find arbitrarily long sequences in which
changing one of the last five letters turns the sequence from a brivla into a
tosmabru, but finding a long one in which both the brivla and the valsrsmabru
(the valsrtosmabru minus the CV cmavo) are sensible is pretty difficult. So
when listening to speech, you'll probably lex such words by recognizing them,
not by counting letters backward from valid or invalid pairs.
Pierre