From: MorphemeAddict <lytlesw@gmail.com>
Date: Mon, 20 Apr 2015 16:22:40 -0400
Subject: Re: [lojban-beginners] What actually are the rules of word formation?
To: lojban-beginners@googlegroups.com
Reply-To: lojban-beginners@googlegroups.com

Very well done!

As I was reading through it, I wanted more examples, but looking back for an example of no examples, I don't find any.

stevo

On Sun, Apr 19, 2015 at 4:19 PM, mezohe <wow.jvs@gmail.com> wrote:
de'i li 2015-04-19 ti'u li 16:18 li'ai TR NS di'e cusku:
When I first learned about loglan/lojban I thought all words were of the form `CVCCV` or `CCVCV`, and compound words (lujvo) were always
`(CVC|CCV)...CV`.

That seems correct for original, pre-rafsi Loglan. Words with other shapes were invented later.

I knew names had more flexibility. Later I learned
that borrowed words did too. Even later I learned that compound words
too had more flexible rules, e.g. hyphen letters. And finally in the end, staring me in the face, it dawned on me that even simple words like
`brivla` did not fit the original formations.

Par for the course. Most of the learning materials leave the morphology unexplained, and even the CLL leaves some questions open.

After reading a fair bit, I find myself more confused than ever. Is
there any single set of definitive rules defining what is and what isn't a
legal lojban word?

Each parser has its own set of word formation rules. Most words in use are recognized by all parsers but there are edge cases. I'll try to lay out how camxes does it, since the existing documents explain it a little disjointedly.

------ MEZOHE'S CONDENSED CLL 2.0 MORPHOLOGY CHAPTER COUNTERFEIT ------

=== Phonemes ===

At the most basic level, an utterance is made of phonemes. Here are the main classes of phonemes (there are subclasses as seen later):

- consonants {zunsna}:
    bdgjvz (voiced), cfkpstx (unvoiced), lmnr (syllabic)
- glides {karmlisna}: i u
- h {me'o .y'y}: '
- word break (glottal stop) {depybu'i}: .
- vowels {karsna}: a e i o u
- diphthongs: au ai ei oi
- y {me'o .ybu}: y

The comma {me'o slaka bu} isn't a phoneme, but is used to separate syllables for clarity. Removing it has no effect.

i and u are vowels, unless a vowel or diphthong follows, in which case they are glides. Glide-diphthong pairs win over glide-vowel pairs, which win over diphthongs.

At this level, strings of consonants follow these rules:
- consonants can be next to consonants, word breaks, vowels,
  diphthongs, and y
- no consonant can be followed by itself
- voiced consonants can't be next to voiceless ones, and vice versa
- sibilants (cjsz) can't be next to each other
- x can't be next to c or k
- the substrings mz, nts, ntc, ndz, ndj are not allowed
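
As a rough illustration, the consonant rules above can be written as a small predicate. This is only a sketch in Python, not code taken from camxes; the function names are made up for this example, and it assumes that the voicing rule only applies between the voiced and unvoiced groups, so that l, m, n, r may sit next to either.

```python
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
SIBILANTS = set("cjsz")
FORBIDDEN_SUBSTRINGS = ("mz", "nts", "ntc", "ndz", "ndj")


def consonant_pair_ok(a, b):
    """Check one pair of adjacent consonants against the general rules."""
    if a == b:
        return False  # no consonant can be followed by itself
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False  # voiced next to unvoiced
    if a in SIBILANTS and b in SIBILANTS:
        return False  # two sibilants in a row
    if "x" in (a, b) and (a in "ck" or b in "ck"):
        return False  # x next to c or k
    return True


def consonant_string_ok(cluster):
    """Check a whole run of consonants, including the forbidden substrings."""
    if any(bad in cluster for bad in FORBIDDEN_SUBSTRINGS):
        return False
    return all(consonant_pair_ok(a, b) for a, b in zip(cluster, cluster[1:]))
```

For instance, `consonant_string_ok("st")` holds, while `"sd"`, `"ll"`, `"cs"`, `"kx"` and `"nts"` are all rejected. These are the general rules only; the stricter rules for syllable-initial clusters are a separate matter.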

Glides must follow a word break, vowel, diphthong, or y, and be followed by a vowel, diphthong, or y. i as a glide can't follow a diphthong ending in i, and u as a glide can't follow the diphthong au.

h can't be next to a consonant, glide, or glottal stop.

Vowels, diphthongs, and y can be next to consonants, glides, h, and word breaks.

=== Syllables ===

These are the shapes syllables {slaka} can have:

* Vowel syllable
  - a word break, a glide, or up to three consonants
  - then a vowel or a diphthong
  - then optionally a consonant
  - e.g. .a, spa, pan, blaif, stra

* h-syllable
  - the letter '
  - then a vowel or diphthong
  - then optionally a consonant
  - e.g. 'u, 'ei, 'am

* y-syllable
  - a word break, a glide, or up to three consonants
  - then the letter y
  - e.g. by, .y, gry, zbly

* hy-syllable
  - the string "'y"

* consonantal syllable {zunsnaslaka}
  - a consonant
  - then a syllabic consonant
  - e.g. fl, sm, rn

When a syllable starts with more than one consonant, the rules for these clusters {zunsnagri} are more restrictive than the general ones above. These are the permissible initial doubles, stolen with love from CLL:

    pl pr                       fl fr
    bl br                       vl vr

    cp cf      ct ck cm cn      cl cr
    jb jv      jd jg jm
    sp sf      st sk sm sn      sl sr
    zb zv      zd zg zm

    tc tr      ts               kl kr
    dj dr      dz               gl gr

    ml mr                       xl xr

And the permissible initial triples:

    cfr cfl sfr sfl   jvr jvl zvr zvl
    cpr cpl spr spl   jbr jbl zbr zbl
    ckr ckl skr skl   jgr jgl zgr zgl
    ctr     str       jdr     zdr
    cmr cml smr sml   jmr jml zmr zml

When segmenting text into syllables, a consonant that could either start a syllable or end one is always taken to start one. In other words, onsets are greedy, codas are lazy.
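
To make the "greedy onsets, lazy codas" idea concrete, here is a minimal Python sketch of a syllable splitter. It is not how camxes is implemented (camxes is a PEG grammar); the names `is_valid_onset` and `syllabify` are invented for this example, and it deliberately handles only plain consonants, the vowels a e i o u and the four diphthongs. Glides, ', y, consonantal syllables and the general consonant rules are all left out.

```python
import re

# Permissible initial pairs and triples, copied from the tables above.
INITIAL_PAIRS = set("""
    pl pr fl fr bl br vl vr
    cp cf ct ck cm cn cl cr  jb jv jd jg jm
    sp sf st sk sm sn sl sr  zb zv zd zg zm
    tc tr ts kl kr  dj dr dz gl gr
    ml mr xl xr
""".split())
INITIAL_TRIPLES = set("""
    cfr cfl sfr sfl jvr jvl zvr zvl
    cpr cpl spr spl jbr jbl zbr zbl
    ckr ckl skr skl jgr jgl zgr zgl
    ctr str jdr zdr
    cmr cml smr sml jmr jml zmr zml
""".split())

NUCLEUS = "(au|ai|ei|oi|[aeiou])"  # diphthongs are tried before single vowels
SUPPORTED = re.compile("[bcdfgjklmnprstvxzaeiou]+")


def is_valid_onset(cluster):
    """True if the consonant string may start a syllable."""
    if len(cluster) <= 1:
        return True  # no consonant, or a single one
    if len(cluster) == 2:
        return cluster in INITIAL_PAIRS
    return cluster in INITIAL_TRIPLES  # anything longer is never in the set


def syllabify(word):
    """Split a word into syllables, giving each onset as much as it can take."""
    if not SUPPORTED.fullmatch(word):
        raise ValueError("only plain consonants and a/e/i/o/u are supported")
    pieces = re.split(NUCLEUS, word)  # consonant runs alternate with nuclei
    runs, nuclei = pieces[0::2], pieces[1::2]
    if not nuclei:
        raise ValueError("no vowel found")
    onset = runs[0]
    if not is_valid_onset(onset):
        raise ValueError("invalid initial cluster: " + onset)
    syllables = []
    for i, nucleus in enumerate(nuclei):
        run = runs[i + 1]  # consonants between this nucleus and the next
        if i == len(nuclei) - 1:
            coda, next_onset = run, ""
        else:
            if not run:
                raise ValueError("adjacent vowels are outside this sketch")
            # Smallest coda first, i.e. the longest valid onset wins.
            cut = next(k for k in range(len(run) + 1) if is_valid_onset(run[k:]))
            coda, next_onset = run[:cut], run[cut:]
        if len(coda) > 1:
            raise ValueError("a coda holds at most one consonant: " + coda)
        syllables.append(onset + nucleus + coda)
        onset = next_onset
    return syllables
```

With this, `syllabify("pastu")` gives `['pa', 'stu']` because st is a valid initial pair, while `syllabify("vedli")` gives `['ved', 'li']` because dl is not, so the d has to stay behind as a coda.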

=== Words ===

Words can be cmavo, cmevla, or brivla. cmavo and brivla are made of syllables, while cmevla are free strings of phonemes.

cmavo are composed of:

- one vowel- or y-syllable, with at most one initial consonant and no
  final consonant
- optionally followed by any number of h- or hy-syllables without any
  final consonants

Examples: .a, ba, bai, ba'i, ba'ai, by, by'i, ia, iai, iy, ua'ai'y

There are two exceptions: "ybu", also spelled "y.bu", is a single cmavo despite the medial consonant and word break, and "y" surrounded by word breaks and not followed by "bu" is a word break itself, not a cmavo.

cmavo can be stressed on any syllable.
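
The cmavo shape described above is regular enough to be approximated with a regular expression. The following Python sketch is an illustration only: the name `looks_like_cmavo` is invented, the leading dot is treated as optional because it is often left out in writing (an assumption on my part), and the two exceptions just mentioned ("ybu" and a lone "y") are not modelled.

```python
import re

CONSONANT = "[bcdfgjklmnprstvxz]"
NUCLEUS = "(?:au|ai|ei|oi|[aeiou])"  # diphthong or single vowel

# Optional word break, optional single consonant or glide, a vowel/diphthong
# or y, then any number of h- or hy-syllables, none with a final consonant.
CMAVO_SHAPE = re.compile(
    rf"\.?(?:{CONSONANT}|[iu])?(?:{NUCLEUS}|y)(?:'(?:{NUCLEUS}|y))*"
)


def looks_like_cmavo(word):
    """True if the whole string has the cmavo shape sketched above."""
    return CMAVO_SHAPE.fullmatch(word) is not None
```

All of the examples listed above (.a, ba, bai, ba'i, ba'ai, by, by'i, ia, iai, iy, ua'ai'y) pass this check, while strings such as "pan", "bla" or "pastu" do not.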

cmevla are arbitrary strings of phonemes, following phoneme but not syllable restrictions, starting with a word break, containing no word breaks, and ending with a consonant followed by a word break. They can be stressed on any vowel, diphthong, or syllabic consonant.
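
The outer frame of a cmevla (word break, no inner word breaks, final consonant, word break) can be checked with a one-line pattern. This Python sketch uses an invented name, `has_cmevla_frame`, and checks the frame only; the phoneme restrictions on what goes between the dots are not tested here.

```python
import re

# A dot, then anything without dots or whitespace, then a consonant and a dot.
CMEVLA_FRAME = re.compile(r"\.[^.\s]*[bcdfgjklmnprstvxz]\.")


def has_cmevla_frame(text):
    """True if the text is dot-delimited, has no inner dots, and ends in a consonant."""
    return CMEVLA_FRAME.fullmatch(text) is not None
```

So ".larfin." has the right frame, but "larfin." (no initial word break), ".lojbo." (ends in a vowel) and ".la.rfin." (inner word break) do not.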

A brivla is composed of any number of initial rafsi followed by a final rafsi. It must begin with a vowel syllable, end with a vowel- or h-syllable, and have at least two syllables. It may not be a slinkuhi, and may not start with a sequence of cmavo that yields a valid word when removed. Stress (marked here with a grave accent) is on the second-last vowel- or h-syllable.

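
The stress rule can be illustrated on a word that has already been split into syllables. In this Python sketch the function name is invented and the input is assumed to be a list of unaccented syllables written with the comma convention from above; y-, hy- and consonantal syllables are simply skipped when counting.

```python
VOWELS = set("aeiou")


def stressed_syllable_index(syllables):
    """Return the index of the second-last vowel- or h-syllable."""
    candidates = [
        i for i, syllable in enumerate(syllables)
        # y- and hy-syllables end in y; consonantal syllables have no vowel.
        if not syllable.endswith("y") and VOWELS & set(syllable)
    ]
    if len(candidates) < 2:
        raise ValueError("need at least two vowel- or h-syllables")
    return candidates[-2]
```

For fi,pr,koi this picks the first syllable, since the consonantal syllable "pr" does not count, and for cpi,ku,ku it picks the middle one.
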
A final rafsi is:

- a zihevla:
  - a vowel syllable
  - followed by any number of vowel, h-, or consonantal syllables
  - followed by a vowel- or h-syllable with no final consonant
  - is not a gismu or sequence of more than one rafsi
  - e.g. cpi,kù,ku  àl,ga  fì,pr,koi  glàu,ka  sprà,'e
- or a gismu:
  - a CV vowel syllable followed by a CCV one
  - or a CVC one then a CV one
  - or a CCV one then a CV one
  - e.g. pà,stu  vèd,li  tsà,ni
- or a short final rafsi:
  - a CVV or CCV vowel syllable, e.g. xau, cpa
  - or a CV vowel syllable followed by a 'V h-syllable,
    e.g. fà'i
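
All three gismu syllable patterns above come down to the two letter shapes CVCCV and CCVCV. As a small illustration, this Python sketch (with the invented name `gismu_shape`) reports which of the two a string has. It looks at letter shape only and does not check whether the consonant cluster is actually permitted.

```python
import re

C = "[bcdfgjklmnprstvxz]"
V = "[aeiou]"

GISMU_SHAPES = {
    "CVCCV": re.compile(f"{C}{V}{C}{C}{V}"),  # CV,CCV or CVC,CV
    "CCVCV": re.compile(f"{C}{C}{V}{C}{V}"),  # CCV,CV
}


def gismu_shape(word):
    """Return 'CVCCV' or 'CCVCV' if the word has that shape, else None."""
    for name, pattern in GISMU_SHAPES.items():
        if pattern.fullmatch(word):
            return name
    return None
```

Here "pastu" and "vedli" come out as CVCCV and "tsani" as CCVCV, while "brivla" and "cpa" return `None`.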

An initial rafsi is any one of these:

- a gismu followed by the syllable "'y"
    e.g. fasnu'y
- a gismu with its final vowel replaced with y
    e.g. fasny
- a zihevla followed by the syllable "'y"
    e.g. sorpeka'y
- a CV vowel syllable followed by a Cy y-syllable
    e.g. fa,ky
- a short y-less rafsi, unless the following rafsi is a zihevla rafsi:
  - a vowel syllable of the form CVV, CVVr, CVC, or CCV
  - or a CV syllable followed by a 'V or 'Vr syllable
    e.g. gau  gaur  gas  jbu  li,'a  li,'ar
- a short y-less rafsi followed by a short final rafsi followed by "'y"
    e.g. cau,cni,'y  ri,'ar,ju,'o,'y  mul,fau,'y  jbo,jbe,'y
- a zihevla that ends in a vowel syllable with its final vowel replaced
  with y, unless the result breaks up into a string of any other rafsi
    e.g. ka,'or,ty  a,sny

If a CVVr or CV'Vr rafsi is followed by a rafsi beginning with "r", and only then, the final "r" of the first rafsi is replaced with an "n".
If a rafsi ending in "y" is followed by a rafsi beginning with a vowel, and only then, an "'" is prepended to the second rafsi. In other situations where sticking two rafsi together violates phoneme or syllable rules, the left rafsi needs to be replaced with one ending with "y".
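
The first two joining rules are mechanical enough to sketch in Python. The function name `join_rafsi` is invented, and the inputs used to illustrate it are just strings of the right shape, not necessarily attested words. The third rule (swapping the left rafsi for one ending in "y") needs knowledge of which rafsi exist and is not attempted.

```python
import re

# CVVr (with a diphthong) or CV'Vr
R_FINAL_RAFSI = re.compile(r"[bcdfgjklmnprstvxz](?:au|ai|ei|oi|[aeiou]'[aeiou])r")
VOWELS = tuple("aeiou")


def join_rafsi(left, right):
    """Glue two rafsi together, applying the r -> n and 'y rules only."""
    if R_FINAL_RAFSI.fullmatch(left) and right.startswith("r"):
        left = left[:-1] + "n"  # final r becomes n before another r
    elif left.endswith("y") and right.startswith(VOWELS):
        right = "'" + right  # keep y and the vowel apart with '
    return left + right
```

For example, `join_rafsi("gaur", "tcini")` leaves the r alone and gives "gaurtcini", whereas `join_rafsi("gaur", "rinka")` gives "gaunrinka", and `join_rafsi("fasny", "alga")` gives "fasny'alga".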

A brivla consisting of just a zihevla is called a zihevla, one consisting of just a gismu is a gismu, and all others are called lujvo.

A slinkuhi {valslinku'i} is a [consonant followed by a brivla that up to its first y-syllable, or if no y-syllables, in its entirety, is composed of non-zihevla rafsi] that itself can't be broken up into a string of rafsi.
  e.g. _p_rà,'i  _s_pòr,te  _z_bla,zdà,vro  _c_nar,jy,fra,gà,ri

Other non-words also behave like slinkuhi, in that prepending a cmavo makes them a word, but these arise from rules other than the one named slinkuhi.
  e.g. cpa  cpau  cpra  cprau  (brivla must have 2+ syllables)
       cl,pàr,nu  (brivla must start with a vowel syllable)

A tosmabru {valrtosmabru} is a sequence of cmavo followed by a brivla. tosmabru can be coerced into being brivla by adding a consonant at the end of the last syllable of the first cmavo.

  e.g. gau,tcì,ni -> gau tcini; cmavo + gismu
       gaur,tcì,ni -> gaurtcini; a single lujvo
       .a,'u,nain,mo -> .a'u nainmo; cmavo + zihevla
       .a,'ur,nain,mo -> .a'urnainmo; a single zihevla
       boi,kèi,foi -> boi kèi foi; three cmavo
       boir,kèi,foi -> boirkeifoi; a single lujvo

=== Word breaks, glottal stops ===

All word breaks may be pronounced as glottal stops, and some word breaks have to. Glottal stops are required before and after all cmevla, as well as before all words starting with a vowel or "y". They are also required after certain cmavo:

- When pronouncing two words together would break a phonotactic rule,
  they need to be separated with a glottal stop.
    e.g. "au" "uàn,mo" -> {.au .uanmo}

- Each pair of cmavo of the form CV Cy followed by either a brivla or a
  cmavo of the form CVV or CV'V needs a glottal stop between the last
  and second-last word.
    e.g. "ca" "vy" "càr,vi" -> {ca vy. carvi} /Sa.vy?.'Sar.vi/
         (/Sa.vy.'Sar.vi/ would be {cavycarvi}, a lujvo)

- Every stressed cmavo followed by a brivla starting with a consonant
  cluster needs a glottal stop after the cmavo.
    e.g. "bà" "sna,jù,'i" -> {bà. snaju'i} /'ba?.sna.'Zu.hi/
         (/'ba.sna.'Zu.hi/ would be {basna jù'i}, a gismu and a cmavo)

=== Parser peculiarities ===

jbofihe, popular before camxes came along, has different rules than camxes.

* Vowel syllables

  - They may start with any number of consonants, and the rule for
    initial triples doesn't exist. The only restriction is that all
    pairs in the initial cluster need to be valid initial pairs.
      e.g. {stsmla'u} is a word

  - They may end with up to two consonants, not just one.
      e.g. {bongnanba} is a word

  - Syllables beginning with glides are their own type, and if not
    preceded by a glottal stop, they continue the word like an
    h-syllable.
      e.g. {.aierne} is one word, not two,
           {.ia} always starts with a glottal stop

  - Syllables beginning with vowels don't require a word boundary
    before them.
      e.g. {sincrboa} is a word, {.joan.} is a word

(Or, more accurately, jbofihe has no notion of syllables in the sense that camxes does, but even under jbofihe practically no one would use words that violated these modified syllable rules)

* cmevla

Dotside doesn't apply: the beginning of cmevla can also be delimited by some cmavo, namely {la}, {lai}, {la'i}, or {doi}. If one of these cmavo precedes a cmevla, no initial glottal stop is required. cmevla can't contain any of these cmavo. For example {la .larfin.} parses as three words, "la" "la" "rfin".

* brivla

zihevla as final rafsi, rafsi beginning with vowels, and rafsi ending in "'y" do not exist.
  e.g. {bardykentauru}, {.algyro'i}, {sorpeka'ykla} aren't words

rafsi with CVCy shape are illegal if the corresponding CVC rafsi is legal in the situation.
  e.g. {jbobanyjvo} isn't a word, only {jbobanjvo} is

rafsi with CVVr or CV'Vr shape are only recognized as rafsi if using the corresponding CVV or CV'V rafsi would result in tosmabru.
  e.g. {lerpi'oci'arci'e} is a zihevla,
       {lerpi'oci'aci'e} is a lujvo,
       {ci'arci'e} is a lujvo

All brivla must have a consonant cluster within the first five letters after ' and y are removed. {ko'oinde} is not a word.
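
The last jbofihe rule is easy to state as code. This Python sketch is an illustration under the reading given above, not jbofihe's actual implementation; the name `has_early_cluster` is invented, and "consonant cluster" is taken to mean two adjacent consonant letters.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")


def has_early_cluster(word):
    """True if two consonants are adjacent within the first five letters,
    counted after ' and y have been removed."""
    letters = word.replace("'", "").replace("y", "")
    head = letters[:5]
    return any(a in CONSONANTS and b in CONSONANTS for a, b in zip(head, head[1:]))
```

For {ko'oinde} the first five letters after stripping are "kooin", which contain no cluster, so the check fails, in line with the statement that it is not a word under jbofihe. A gismu such as "pastu" passes trivially.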

----------------------------------------------------------------------

I hope that I didn't overlook too many rules and that the text is fairly understandable. Do tell if something is wrong or unclear.

mu'o do


--
You received this message because you are subscribed to the Google Groups "Lojban Beginners" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban-beginners+unsubscribe@googlegroups.com.
To post to this group, send email to lojban-beginners@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban-beginners.
For more options, visit https://groups.google.com/d/optout.
