From: vpbroman@gmail.com
To: lojban
Date: Sat, 12 Aug 2017 14:58:19 -0700 (PDT)
Subject: [lojban] Re: gismu database

Sources.
The gismu, rafsi, rough gloss, and prolix definition came from jbovlaste.
The short glosses were my improvements, based on the definitions.
The mnemonics came from the etymology in the 6 source languages, evaluated by software for relevance by similarity, then adjusted by hand.
The frequency rank is my own mish-mash, combined from several frequency sources.
The glosses for conversion sumti came from a flash-card deck, I forget which, polished by myself.
The types and case tags are my own homebrew, but they frequently come from annotations in the jbovlaste def itself.
The example sentences are from la gleki's dictionary, from la muplis, and from my own creativity.
The lujvo examples were selected by software from a big file of lujvo that I analyzed -- but they ought to be chosen for interest.
The classification of gismu will be based on a listing I found that sorts gismu into categories, which I am refining.
Processing this stuff into a dictionary might be a little better than a dump of jbovlaste, but would still call for proofreading.

Spell checking of lojban is more algorithmic than for most languages.
The gismu and cmavo lists are fixed, except for classes allowing for experimental forms.
The possible lujvo are not limited by a dictionary but by the morphology algorithm.
Even the fuhivla are open-ended, but limited by the rules.
Still, a spell checker would usefully flag impossible words, as distinct from not-yet-defined words, as needing attention.
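
To make the "impossible versus not-yet-defined" distinction concrete, here is a minimal Python sketch. It is an illustration only, not a real lojban morphology checker: it looks solely at the CVCCV/CCVCV letter pattern of gismu and ignores the restrictions on consonant pairs as well as cmavo, lujvo, and fuhivla. The function names and the tiny `known` set are hypothetical.

```python
# Simplified sketch: classify a word against the gismu letter pattern only.
# Assumption: real lojban morphology has more rules (permitted consonant
# pairs, other word classes); none of them are modelled here.

CONSONANTS = set("bcdfgjklmnprstvxz")
VOWELS = set("aeiou")


def cv_shape(word):
    """Map each letter to C (consonant), V (vowel), or ? (anything else)."""
    return "".join(
        "C" if ch in CONSONANTS else "V" if ch in VOWELS else "?"
        for ch in word
    )


def has_gismu_shape(word):
    """True if the word fits one of the two gismu letter patterns."""
    return cv_shape(word) in ("CVCCV", "CCVCV")


def classify(word, known_words):
    """Return a coarse spell-check verdict for a candidate gismu."""
    if word in known_words:
        return "defined"
    if has_gismu_shape(word):
        return "gismu-shaped but not in the list"
    return "not gismu-shaped"


if __name__ == "__main__":
    known = {"cusku"}  # hypothetical stand-in for the full gismu list
    for candidate in ("cusku", "klama", "kuuku"):
        print(candidate, "->", classify(candidate, known))
```

A fuller checker would return the middle verdict for any morphologically valid but undefined word and reserve a hard error for words that no rule can produce.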

mihe bremenli
On Saturday, August 12, 2017 at 12:59:03 AM UTC-7, Benoit Neil wrote:
coi do

Thanks for sharing. As I'm developing spell checking dictionaries, I have questions...
  • What sources do you use for your database?
  • Do you think it would be wise to integrate your entries into my dictionaries, given that I use jbovlaste as my only source for now?
  • Do you think there might be some data I could use to improve spell checking?
ki'e .i co'o

.i mi'e la .sykyndyr.

On Saturday, August 12, 2017 at 03:24:50 UTC+2, Vincent Broman wrote:
In case this might be useful to others, and perhaps in hopes of gathering helpful diffs/patches if anyone does updates,
I wanted to share my evolving database on the gismu.

lojban_gismu_dict.txt
https://app.box.com/s/tq9jcjlrwj5ah21hy0hldhd1bn25wo7m

The format looks like this.
<gismu> <cvc-rafsi>     <ccv-rafsi>     <cvv-rafsi>     #<frequency-rank>
fanva: <short-gloss>
morji: <mnemonics-cognates>
klesi: <gismu-category>
<predicate-template-with-slots>
A/fa: <type-of-A>: <A-case-role>: <gloss-for-lo-gismu>
B/fe: <type-of-B>: <B-case-role>: <gloss-for-lo-se-gismu>
C/fi: <type-of-C>: <C-case-role>: <gloss-for-lo-te-gismu>
D/fo: <type-of-D>: <D-case-role>: <gloss-for-lo-ve-gismu>
E/fu: <type-of-E>: <E-case-role>: <gloss-for-lo-xe-gismu>
mupli: <example-sentence-filling-all-slots>
lujvo: <examples-of-short-lujvo-with-these-short-rafsi>

cusku   cus     sku             #14
fanva: express
morji: express say
klesi:
person A expresses or says text B for audience C via expressive medium D;
A/fa: PRS: ACT: expresser
B/fe: TXT: PRD: expressed words
C/fi: PRS: DST: audience
D/fo: THI: INS: expressive medium
mupli: le gunka jatna cu cusku se duhu miha ba mutce gunka kei miha lo samselmri
lujvo: cuskahi cuskuhi skudji skuspu biksku cnisku
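
For anyone who wants to consume the file programmatically, the record layout above could be read with something like the following Python sketch. It is an assumption-laden illustration, not the tool used to build the database: the names `Entry`, `Place`, and `parse_entry` are hypothetical, the header is split on whitespace (so the rafsi come back as a plain list, without recording which of the cvc/ccv/cvv columns was empty), and any line without a recognised `key:` prefix is taken to be the predicate template.

```python
from dataclasses import dataclass, field

# The cusku record shown above, used here as sample input.
SAMPLE = """\
cusku   cus     sku             #14
fanva: express
morji: express say
klesi:
person A expresses or says text B for audience C via expressive medium D;
A/fa: PRS: ACT: expresser
B/fe: TXT: PRD: expressed words
C/fi: PRS: DST: audience
D/fo: THI: INS: expressive medium
mupli: le gunka jatna cu cusku se duhu miha ba mutce gunka kei miha lo samselmri
lujvo: cuskahi cuskuhi skudji skuspu biksku cnisku
"""

FIELD_KEYS = ("fanva", "morji", "klesi", "mupli", "lujvo")


@dataclass
class Place:
    arg_type: str   # e.g. PRS, TXT, THI
    case_role: str  # e.g. ACT, PRD, DST, INS
    gloss: str


@dataclass
class Entry:
    gismu: str
    rafsi: list
    rank: int
    template: str = ""
    fields: dict = field(default_factory=dict)
    places: dict = field(default_factory=dict)


def parse_entry(text):
    """Parse one record of the format sketched in this message."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    head = lines[0].split()
    entry = Entry(gismu=head[0], rafsi=head[1:-1],
                  rank=int(head[-1].lstrip("#")))
    for line in lines[1:]:
        key, sep, rest = line.partition(":")
        if sep and "/" in key:
            # Place line such as "A/fa: PRS: ACT: expresser".
            arg_type, case_role, gloss = (p.strip() for p in rest.split(":", 2))
            entry.places[key.split("/")[1]] = Place(arg_type, case_role, gloss)
        elif sep and key in FIELD_KEYS:
            entry.fields[key] = rest.strip()
        else:
            # Anything else is treated as the predicate template.
            entry.template = line.rstrip(";")
    return entry


if __name__ == "__main__":
    e = parse_entry(SAMPLE)
    print(e.gismu, e.rafsi, e.rank)
    print(e.places["fe"])
```

Keying the places by their FA tag (`fa`, `fe`, ...) rather than by letter is just one choice; the letters A to E would work equally well.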

The short glosses are the actual English text one would put in between x1/A and x2/B, usually without a leading "is" or a trailing "of".
The gismu categories are a future enhancement not yet present.
The definition templates in many cases are simplified from the prolix definitions usually found.
The argument types and cases have mnemonics defined in the following lists. They are pretty debatable.
types-cases.txt
https://app.box.com/s/26vo9vgncz0f1xrnukc79l8ftl7bczo5

The database includes all the standard gismu and defs, but a lot of other info in it is very incomplete.
I am in the process of adding in many of la gleki's example sentences from his dictionary, which are useful for learning.
Enjoy.

mihe bremenli
