From: mezohe
Date: Sun, 19 Apr 2015 22:19:12 +0200
To: lojban-beginners@googlegroups.com
Subject: Re: [lojban-beginners] What actually are the rules of word formation?

On 2015-04-19 at 16:18, TR NS wrote:
> When I first learned about loglan/lojban I thought all words were of the
> form `CVCCV` or `CCVCV`, and compound words (lujvo) were always
> `(CVC|CCV)...CV`.

That seems correct for original, pre-rafsi Loglan. Words with other
shapes were invented later.

> I knew names had more flexibility. Later I learned
> that borrowed words did too. Even later I learned that compound words
> too had more flexible rules, e.g. hypen letters. And finally in the end,
> starring me in the face, it dawned on me that even simple words like
> `brivla` did not fit the original formations.

Par for the course. Most of the learning materials leave the morphology
unexplained, and even the CLL leaves some questions open.

> After reading a fair bit, I find myself more confused than ever. Are
> there any single set of definitive rules defining what and what isn't a
> legal lojban word?

Each parser has its own set of word formation rules. Most words in use
are recognized by all parsers, but there are edge cases. I'll try to lay
out how camxes does it, since the existing documents explain it a little
disjointedly.

------ MEZOHE'S CONDENSED CLL 2.0 MORPHOLOGY CHAPTER COUNTERFEIT ------

=== Phonemes ===

At the most basic level, an utterance is made of phonemes. Here are the
main classes of phonemes (there are subclasses as seen later):

 - consonants {zunsna}: bdgjvz (voiced), cfkpstx (unvoiced), lmnr (syllabic)
 - glides {karmlisna}: i u
 - h {me'o .y'y}: '
 - word break (glottal stop) {depybu'i}: .
 - vowels {karsna}: a e i o u
 - diphthongs: au ai ei oi
 - y {me'o .ybu}: y

The comma {me'o slaka bu} isn't a phoneme, but is used to separate
syllables for clarity. Removing it has no effect.

i and u are vowels, unless a vowel or diphthong follows, in which case
they are glides. Glide-diphthong pairs win over glide-vowel pairs, which
win over diphthongs.

At this level, strings of consonants follow these rules:
 - consonants can be next to consonants, word breaks, vowels, diphthongs,
   and y
 - no consonant can be followed by itself
 - voiced consonants can't be next to unvoiced ones
 - sibilants (cjsz) can't be next to each other
 - x can't be next to c or k
 - the substrings mz, nts, ntc, ndz, ndj are not allowed

Glides must follow a word break, vowel, diphthong, or y, and be followed
by a vowel, diphthong, or y. i as a glide can't follow a diphthong
ending in i, and u as a glide can't follow the diphthong au.

h can't be next to a consonant, glide, or glottal stop.

Vowels, diphthongs, and y can be next to consonants, glides, h, and word
breaks.

=== Syllables ===

These are the shapes syllables {slaka} can have:

* Vowel syllable
  - a word break, a glide, or up to three consonants
  - then a vowel or a diphthong
  - then optionally a consonant
  - e.g. .a, spa, pan, blaif, stra
* h-syllable
  - the letter '
  - then a vowel or diphthong
  - then optionally a consonant
  - e.g. 'u, 'ei, 'am
* y-syllable
  - a word break, a glide, or up to three consonants
  - then the letter y
  - e.g.
    by, .y, gry, zbly
* hy-syllable
  - the string "'y"
* consonantal syllable {zunsnaslaka}
  - a consonant
  - then a syllabic consonant
  - e.g. fl, sm, rn

When a syllable starts with more than one consonant, the rules for these
clusters {zunsnagri} are more restrictive than the general ones above.
These are the permissible initial doubles, stolen with love from CLL:

  pl pr fl fr
  bl br vl vr
  cp cf ct ck cm cn cl cr
  jb jv jd jg jm
  sp sf st sk sm sn sl sr
  zb zv zd zg zm
  tc tr ts kl kr
  dj dr dz gl gr
  ml mr xl xr

And the permissible initial triples:

  cfr cfl sfr sfl jvr jvl zvr zvl
  cpr cpl spr spl jbr jbl zbr zbl
  ckr ckl skr skl jgr jgl zgr zgl
  ctr     str     jdr     zdr
  cmr cml smr sml jmr jml zmr zml

When segmenting text into syllables, a consonant that could either start
a syllable or end one is always taken to start one. In other words,
onsets are greedy, codas are lazy.

=== Words ===

Words can be cmavo, cmevla, or brivla. cmavo and brivla are made of
syllables, while cmevla are free strings of phonemes.

cmavo are composed of:
 - one vowel- or y-syllable, with at most one initial consonant and no
   final consonant
 - optionally followed by any number of h- or hy-syllables without any
   final consonants
Examples: .a, ba, bai, ba'i, ba'ai, by, by'i, ia, iai, iy, ua'ai'y

There are two exceptions: "ybu", also spelled "y.bu", is a single cmavo
despite the medial consonant and word break, and "y" surrounded by word
breaks and not followed by "bu" is a word break itself, not a cmavo.

cmavo can be stressed on any syllable.

cmevla are arbitrary strings of phonemes, following phoneme but not
syllable restrictions, starting with a word break, containing no
internal word breaks, and ending with a consonant followed by a word
break. They can be stressed on any vowel, diphthong, or syllabic
consonant.

A brivla is composed of any number of initial rafsi followed by a final
rafsi.
It must begin with a vowel syllable, end with a vowel- or
h-syllable, and have at least two syllables. It may not be a slinkuhi,
and may not start with a sequence of cmavo that yields a valid word when
removed. Stress (marked here with a grave accent) is on the second-last
vowel- or h-syllable.

A final rafsi is:
 - a zihevla:
   - a vowel syllable
   - followed by any number of vowel, h-, or consonantal syllables
   - followed by a vowel- or h-syllable with no final consonant
   - that is not a gismu or a sequence of more than one rafsi
   - e.g. cpi,kù,ku  àl,ga  fì,pr,koi  glàu,ka  sprà,'e
 - or a gismu:
   - a CV vowel syllable followed by a CCV one
   - or a CVC one then a CV one
   - or a CCV one then a CV one
   - e.g. pà,stu  vèd,li  tsà,ni
 - or a short final rafsi:
   - a CVV or CCV vowel syllable, e.g. xau, cpa
   - or a CV vowel syllable followed by a 'V h-syllable, e.g. fà'i

An initial rafsi is any one of these:
 - a gismu followed by the syllable "'y"
   e.g. fasnu'y
 - a gismu with its final vowel replaced with y
   e.g. fasny
 - a zihevla followed by the syllable "'y"
   e.g. sorpeka'y
 - a CV vowel syllable followed by a Cy y-syllable
   e.g. fa,ky
 - a short y-less rafsi, unless the following rafsi is a zihevla rafsi:
   - a vowel syllable of the form CVV, CVVr, CVC, or CCV
   - or a CV syllable followed by a 'V or 'Vr syllable
   e.g. gau  gaur  gas  jbu  li,'a  li,'ar
 - a short y-less rafsi followed by a short final rafsi followed by "'y"
   e.g. cau,cni,'y  ri,'ar,ju,'o,'y  mul,fau,'y  jbo,jbe,'y
 - a zihevla that ends in a vowel syllable with its final vowel replaced
   with y, unless the result breaks up into a string of any other rafsi
   e.g. ka,'or,ty  a,sny

If a CVVr or CV'Vr rafsi is followed by a rafsi beginning with "r", and
only then, the final "r" of the first rafsi is replaced with an "n".
If a rafsi ending in "y" is followed by a rafsi beginning with a vowel,
and only then, an "'" is prepended to the second rafsi.
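
The two adjustments just described (r becomes n before an r-initial
rafsi, and ' is inserted between y and a vowel) can be sketched in
Python. This is only an illustration, not how camxes does it: the name
join_rafsi and the regular expression are made up for this example, rafsi
are assumed to be written without syllable commas, VV is assumed to be
one of the four diphthongs, and "beginning with a vowel" is taken to mean
one of a e i o u.

```python
import re

CONSONANTS = "bcdfgjklmnprstvxz"
VOWELS = "aeiou"

# Rafsi shapes CVVr (VV assumed to be a diphthong) and CV'Vr.
R_FINAL_RAFSI = re.compile(
    rf"^[{CONSONANTS}](?:ai|ei|oi|au|[{VOWELS}]'[{VOWELS}])r$"
)


def join_rafsi(left: str, right: str) -> str:
    """Glue two rafsi, applying only the two adjustments from the text.

    Hypothetical helper: it does not check that the result is a valid word.
    """
    if R_FINAL_RAFSI.match(left) and right.startswith("r"):
        # Final "r" of a CVVr / CV'Vr rafsi turns into "n" before "r".
        left = left[:-1] + "n"
    elif left.endswith("y") and right.startswith(tuple(VOWELS)):
        # A rafsi ending in "y" followed by a vowel gets "'" in between.
        right = "'" + right
    return left + right


print(join_rafsi("gaur", "tcini"))  # no adjustment needed
print(join_rafsi("gaur", "rafsi"))  # made-up pairing, shows r -> n
print(join_rafsi("fasny", "alga"))  # made-up pairing, shows the added '
```

The helper deliberately leaves every other combination untouched.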
In other situations where sticking two rafsi together violates phoneme
or syllable rules, the left rafsi needs to be replaced with one ending
with "y".

A brivla consisting of just a zihevla is called a zihevla, one
consisting of just a gismu is a gismu, and all others are called lujvo.

A slinkuhi {valslinku'i} is a [consonant followed by a brivla that up to
its first y-syllable, or, if it has no y-syllables, in its entirety, is
composed of non-zihevla rafsi] that itself can't be broken up into a
string of rafsi.
  e.g. _p_rà,'i  _s_pòr,te  _z_bla,zdà,vro  _c_nar,jy,fra,gà,ri

Other non-words also behave like slinkuhi, in that prepending a cmavo
makes them a word, but these arise from rules other than the one named
slinkuhi.
  e.g. cpa cpau cpra cprau (brivla must have 2+ syllables)
       cl,pàr,nu (brivla must start with a vowel syllable)

A tosmabru {valrtosmabru} is a sequence of cmavo followed by a brivla.
tosmabru can be coerced into being brivla by adding a consonant at the
end of the last syllable of the first cmavo.
  e.g. gau,tcì,ni     -> gau tcini;    cmavo + gismu
       gaur,tcì,ni    -> gaurtcini;    a single lujvo
       .a,'u,nain,mo  -> .a'u nainmo;  cmavo + zihevla
       .a,'ur,nain,mo -> .a'urnainmo;  a single zihevla
       boi,kèi,foi    -> boi kèi foi;  three cmavo
       boir,kèi,foi   -> boirkeifoi;   a single lujvo

=== Word breaks, glottal stops ===

All word breaks may be pronounced as glottal stops, and some must be.
Glottal stops are required before and after all cmevla, as well as
before all words starting with a vowel or "y". They are also required
after certain cmavo:

 - When pronouncing two words together would break a phonotactic rule,
   they need to be separated with a glottal stop.
   e.g. "au" "uàn,mo" -> {.au .uanmo}
 - Each pair of cmavo of the form CV Cy followed by either a brivla or a
   cmavo of the form CVV or CV'V needs a glottal stop between the Cy
   cmavo and the word that follows it.
   e.g. "ca" "vy" "càr,vi" -> {ca vy. carvi} /Sa.vy?.'Sar.vi/
        (/Sa.vy.'Sar.vi/ would be {cavycarvi}, a lujvo)
 - Every stressed cmavo followed by a brivla starting with a consonant
   cluster needs a glottal stop after the cmavo.
   e.g. "bà" "sna,jù,'i" -> {bà. snaju'i} /'ba?.sna.'Zu.hi/
        (/'ba.sna.'Zu.hi/ would be {basna jù'i}, a gismu and a cmavo)

=== Parser peculiarities ===

jbofihe, popular before camxes came along, has different rules from
camxes.

* Vowel syllables
  - They may start with any number of consonants, and the rule for
    initial triples doesn't exist. The only restriction is that all
    pairs in the initial cluster need to be valid initial pairs.
    e.g. {stsmla'u} is a word
  - They may end with up to two consonants, not just one.
    e.g. {bongnanba} is a word
  - Syllables beginning with glides are their own type, and if not
    preceded by a glottal stop, they continue the word like an
    h-syllable.
    e.g. {.aierne} is one word, not two,
         {.ia} always starts with a glottal stop
  - Syllables beginning with vowels don't require a word boundary before
    them.
    e.g. {sincrboa} is a word, {.joan.} is a word

(Or, more accurately, jbofihe has no notion of syllables in the sense
that camxes does, but even under jbofihe practically no one would use
words that violated these modified syllable rules.)

* cmevla

Dotside doesn't apply: the beginning of cmevla can also be delimited by
some cmavo, namely {la}, {lai}, {la'i}, or {doi}. If one of these cmavo
precedes a cmevla, no initial glottal stop is required. cmevla can't
contain any of these cmavo. For example, {la .larfin.} parses as three
words, "la" "la" "rfin".

* brivla

zihevla as final rafsi, rafsi beginning with vowels, and rafsi ending in
"'y" do not exist.
  e.g. {bardykentauru}, {.algyro'i}, {sorpeka'ykla} aren't words

rafsi with CVCy shape are illegal if the corresponding CVC rafsi is
legal in the situation.
  e.g.
       {jbobanyjvo} isn't a word, only {jbobanjvo} is

rafsi with CVVr or CV'Vr shape are only recognized as rafsi if using the
corresponding CVV or CV'V rafsi would result in tosmabru.
  e.g. {lerpi'oci'arci'e} is a zihevla, {lerpi'oci'aci'e} is a lujvo,
       {ci'arci'e} is a lujvo

All brivla must have a consonant cluster within the first five letters
after ' and y are removed. {ko'oinde} is not a word.

----------------------------------------------------------------------

I hope that I didn't overlook too many rules and that the text is fairly
understandable. Do tell if something is wrong or unclear.

mu'o do
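
As a rough illustration of the consonant rules from the Phonemes,
Syllables, and Parser peculiarities sections, here is a minimal Python
sketch. It is not taken from camxes or jbofihe: every function name is
made up for this example, and the input is assumed to be a string of
consonants only. The pair and triple tables are the ones listed in the
Syllables section.

```python
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
SIBILANTS = set("cjsz")
FORBIDDEN_SUBSTRINGS = ("mz", "nts", "ntc", "ndz", "ndj")

INITIAL_PAIRS = set(
    "pl pr fl fr bl br vl vr cp cf ct ck cm cn cl cr jb jv jd jg jm "
    "sp sf st sk sm sn sl sr zb zv zd zg zm tc tr ts kl kr dj dr dz "
    "gl gr ml mr xl xr".split()
)
INITIAL_TRIPLES = set(
    "cfr cfl sfr sfl jvr jvl zvr zvl cpr cpl spr spl jbr jbl zbr zbl "
    "ckr ckl skr skl jgr jgl zgr zgl ctr str jdr zdr "
    "cmr cml smr sml jmr jml zmr zml".split()
)


def pair_ok(a: str, b: str) -> bool:
    """Phoneme-level check for two adjacent consonants."""
    if a == b:
        return False  # no consonant followed by itself
    if (a in VOICED and b in UNVOICED) or (a in UNVOICED and b in VOICED):
        return False  # voiced next to unvoiced
    if a in SIBILANTS and b in SIBILANTS:
        return False  # two sibilants
    if "x" in (a, b) and (a in "ck" or b in "ck"):
        return False  # x next to c or k
    return True


def cluster_ok(cluster: str) -> bool:
    """General (non-initial) cluster check from the Phonemes section."""
    if any(bad in cluster for bad in FORBIDDEN_SUBSTRINGS):
        return False
    return all(pair_ok(a, b) for a, b in zip(cluster, cluster[1:]))


def camxes_onset_ok(onset: str) -> bool:
    """Syllable-initial cluster: one consonant, a listed pair, or a listed triple."""
    if len(onset) == 1:
        return True
    if len(onset) == 2:
        return onset in INITIAL_PAIRS
    return onset in INITIAL_TRIPLES


def jbofihe_onset_ok(onset: str) -> bool:
    """jbofihe-style rule: any length, every adjacent pair must be an initial pair."""
    return all(a + b in INITIAL_PAIRS for a, b in zip(onset, onset[1:]))


# The onset of {stsmla'u} is fine for the jbofihe-style rule only.
print(camxes_onset_ok("stsml"), jbofihe_onset_ok("stsml"))
```

Every listed initial pair and triple also passes cluster_ok, which is a
quick sanity check that the initial-cluster tables really are a subset
of what the general rules allow.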