From: Jacob Errington
Date: Sat, 5 Apr 2014 19:13:17 -0400
Subject: Re: [lojban] ANNOUNCE: Beta-version of a new Lojban corpus search system
To: lojban@googlegroups.com

.i je'u doi la .dan. do banli .i sei lakne lo do se finti cu balrai lo ro lojbo tutci

This will be incredibly useful to anyone wanting to do research on words and their frequencies, which will of course be of the utmost importance in the furthering of Lojban.

ki'e sai .dan.
.i mi'e la tsani mu'o

On 5 April 2014 18:08, Dan Rosén wrote:

> ju'i jbopli
>
> I have made a new Lojban corpus search system. The idea is to enhance
> the study of the usage and development of the language. Besides
> single-word search, the tool supports searching by selma'o, place
> structures of bridi, seltau, date, IRC nick, and more.
>
> Good old jbofi'e has attempted to parse all sentences, and its terbri
> information is extracted from successful parses. When the parse fails,
> cmafi'e is used for word segmentation and selma'o-tagging.
>
> The system is kindly hosted by durka at:
>
> https://www.alexburka.com/~danr
>
> Here are some examples of what you can do with the extended search.
> (Please be patient when clicking the links; it takes a little while to
> render the pages.)
>
> Searching for usages of traji3:
>
> https://www.alexburka.com/~danr/#?stats_reduce=word&cqp=%5Btags%20_%3D%20%22traji3%22%5D&search_tab=1&within=sentence&hpp=100&search=cqp
>
> Please note that you can click on the words in the search results to get
> more information in the right-hand sidebar.
>
> Searching for self-greetings, COI + mi:
>
> https://www.alexburka.com/~danr/#?cqp=%5Bpos%20%3D%20%22COI%22%5D%20%5Bword%20%3D%20%22mi%22%5D&stats_reduce=word&search_tab=1&within=sentence&search=cqp
>
> Searching for IRC messages authored by Robin:
>
> https://www.alexburka.com/~danr/#?stats_reduce=word&cqp=%5B_.text_nick%20%3D%20%22rlpowell%22%20%26%20lbound(sentence)%5D&search_tab=1&within=sentence&search=cqp&page=855
>
> Usages of pi'o as terminal rafsi (or zi'evla):
>
> https://www.alexburka.com/~danr/#?stats_reduce=word&cqp=%5Bword%20%26%3D%20%22pi'o%22%20%26%20pos%20%3D%20%22BRIVLA%22%5D&search_tab=1&within=sentence&search=cqp
>
> Examples of statistics:
>
> Statistics of lo + BRIVLA:
>
> https://www.alexburka.com/~danr/#?cqp=%5Bword%20%3D%20%22lo%22%5D%20%5Bpos%20%3D%20%22BRIVLA%22%5D&stats_reduce=word&search_tab=1&within=sentence&search=cqp&result_tab=1
>
> The most common seltau (the info is set to "end with q"):
>
> https://www.alexburka.com/~danr/#?cqp=%5Btrans%20%26%3D%20%22q%22%5D&search_tab=1&within=sentence&page=0&search=cqp&stats_reduce=word&result_tab=1
>
> Popular selbri in gadri (info ends with n):
>
> https://www.alexburka.com/~danr/#?cqp=%5Btrans%20%26%3D%20%22n%22%5D&search_tab=1&within=sentence&page=0&search=cqp&result_tab=1
>
> There is also a comparison mode, which requires you to save two searches
> by pressing the down arrow next to the search button. Then you can search
> for statistically significant differences between them. Try it yourself by
> comparing the words used without the IRC corpus against those used in
> the IRC corpus alone!
>
> The system is an adaptation of the Swedish corpus search system Korp:
> http://spraakbanken.gu.se/korp
> If you are interested in helping out, please do contact me.
>
> Happy exploring!
>
> mi'e .danr. ko banli mu'o

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.
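
Note on the search links in the quoted announcement: judging from the links themselves, each one carries its query percent-encoded in a `cqp` parameter placed after the `#` of the URL. The following is a minimal sketch, assuming only Python 3's standard library, of how to read that query back out before clicking a link. The helper name `extract_cqp` is hypothetical and is not part of the search system.

```python
from urllib.parse import parse_qs, urlsplit


def extract_cqp(url: str) -> str:
    """Return the decoded `cqp` value stored in a search link's fragment."""
    # Everything after '#' is the fragment; in these links it starts with '?'.
    fragment = urlsplit(url).fragment.lstrip("?")
    # parse_qs splits on '&' and percent-decodes every value.
    params = parse_qs(fragment)
    return params["cqp"][0]


# The "COI + mi" link from the announcement.
coi_mi_url = (
    "https://www.alexburka.com/~danr/#?cqp=%5Bpos%20%3D%20%22COI%22%5D"
    "%20%5Bword%20%3D%20%22mi%22%5D&stats_reduce=word&search_tab=1"
    "&within=sentence&search=cqp"
)

print(extract_cqp(coi_mi_url))  # [pos = "COI"] [word = "mi"]
```

Read this way, the self-greeting search is two bracketed token patterns in a row: a word tagged COI followed by the word "mi". The other links can be decoded the same way.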