From: Dan Rosén
Date: Sun, 6 Apr 2014 00:08:47 +0200
To: lojban@googlegroups.com
Subject: [lojban] ANNOUNCE: Beta-version of a new Lojban corpus search system

ju'i jbopli (Hey, Lojban users!)

I have made a new Lojban corpus search system. The idea is to support the study of the usage and development of the language. Besides searching for a single word, the search tool also supports searching for selma'o, bridi place structures, seltau, dates, irc nicks, and more.

Good old jbofi'e has attempted to parse every sentence, and terbri (place) information is extracted from the successful parses. When a parse fails, cmafi'e is used for word segmentation and selma'o tagging.
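
A minimal Python sketch of that fallback, under stated assumptions: `Token`, `annotate`, and the two callables are hypothetical stand-ins, since jbofi'e and cmafi'e are separate programs whose interfaces are not described here. What it illustrates is that only sentences with a successful parse carry terbri labels; the rest get word boundaries and selma'o tags only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class Token:
    """Hypothetical token record; the corpus's real attribute layout may differ."""
    word: str
    pos: str                          # selma'o, e.g. "COI", "KOhA", "BRIVLA"
    tags: frozenset = frozenset()     # terbri labels such as "traji3"

# Hypothetical interfaces standing in for jbofi'e and cmafi'e.
FullParser = Callable[[str], Optional[List[Token]]]  # returns None when the parse fails
Segmenter = Callable[[str], List[Tuple[str, str]]]   # returns (word, selma'o) pairs

def annotate(sentence: str, parse: FullParser, segment: Segmenter) -> List[Token]:
    """Prefer the full parse; otherwise fall back to segmentation and selma'o tags."""
    tokens = parse(sentence)
    if tokens is not None:
        return tokens
    # Fallback path: no terbri information is available for this sentence.
    return [Token(word, pos) for word, pos in segment(sentence)]

if __name__ == "__main__":
    # Stubs: a parser that always fails and a fixed segmenter.
    print(annotate("coi mi", lambda s: None, lambda s: [("coi", "COI"), ("mi", "KOhA")]))
```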

The system is kindly hosted by durka at:

https://www.alexburka.com/~danr

Here are some examples of what you can do with the extended search:
(Please be patient when clicking the links; the pages take a little while to render.)

Searching for usages of traji3:
https://www.alexburka.com/~danr/#?stats_reduce=word&cqp=%5Btags%20_%3D%20%22traji3%22%5D&search_tab=1&within=sentence&hpp=100&search=cqp

Please note that you can click on the words in the search results to get more information in the sidebar on the right.

Searching for self-greetings, COI + mi:
https://www.alexburka.com/~danr/#?cqp=%5Bpos%20%3D%20%22COI%22%5D%20%5Bword%20%3D%20%22mi%22%5D&stats_reduce=word&search_tab=1&within=sentence&search=cqp
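
To make the bracket syntax in these links concrete, here is a toy Python emulation of the token constraints they use. It is only a sketch under assumptions: tokens are modelled as plain dicts with hypothetical `word`, `pos`, and `tags` keys; `=` is equality, `&=` is "ends with" (as the statistics examples describe it), and `_=` is read as "the set-valued attribute contains". The real queries are evaluated by the Korp/CQP backend, not by code like this.

```python
from typing import Callable, Dict, List

Token = Dict[str, object]            # hypothetical layout: {"word", "pos", "tags"}
Constraint = Callable[[Token], bool]

def eq(attr: str, value: str) -> Constraint:
    """[attr = "value"]"""
    return lambda tok: tok[attr] == value

def ends_with(attr: str, suffix: str) -> Constraint:
    """[attr &= "suffix"]"""
    return lambda tok: str(tok[attr]).endswith(suffix)

def contains(attr: str, value: str) -> Constraint:
    """[attr _= "value"], assuming a set-valued attribute"""
    return lambda tok: value in tok[attr]

def both(left: Constraint, right: Constraint) -> Constraint:
    """[left & right] on a single token"""
    return lambda tok: left(tok) and right(tok)

def find(sentence: List[Token], pattern: List[Constraint]) -> List[int]:
    """Start positions where consecutive tokens satisfy each constraint in turn."""
    n = len(pattern)
    return [i for i in range(len(sentence) - n + 1)
            if all(check(tok) for check, tok in zip(pattern, sentence[i:i + n]))]

if __name__ == "__main__":
    # Hypothetical annotated sentence "coi mi"
    sentence = [
        {"word": "coi", "pos": "COI", "tags": set()},
        {"word": "mi", "pos": "KOhA", "tags": set()},
    ]
    # Equivalent of [pos = "COI"] [word = "mi"]
    print(find(sentence, [eq("pos", "COI"), eq("word", "mi")]))  # [0]
```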

Searching for irc messages authored by Robin:
https://www.alexburka.com/~danr/#?stats_reduce=word&cqp=%5B_.text_nick%20%3D%20%22rlpowell%22%20%26%20lbound(sentence)%5D&search_tab=1&within=sentence&search=cqp&page=855

Usages of pi'o as a terminal rafsi (or in zi'evla):
https://www.alexburka.com/~danr/#?stats_reduce=word&cqp=%5Bword%20%26%3D%20%22pi'o%22%20%26%20pos%20%3D%20%22BRIVLA%22%5D&search_tab=1&within=sentence&search=cqp
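
The part of each link after `cqp=` is simply the query in percent-encoded form, so links can be read or built by hand. A small Python sketch using only the standard library's `urllib.parse`; the base URL and the extra parameters (`search_tab`, `within`, `search`) are copied from the example links, and which of them the frontend actually requires is an assumption.

```python
from urllib.parse import parse_qs, quote, urlsplit

BASE = "https://www.alexburka.com/~danr/"  # host given in this announcement

def cqp_from_link(link: str) -> str:
    """Return the decoded CQP query stored in the link's #? fragment."""
    fragment = urlsplit(link).fragment      # e.g. "?cqp=...&search=cqp"
    params = parse_qs(fragment.lstrip("?"))
    return params["cqp"][0]

def link_for_cqp(cqp: str) -> str:
    """Build a search link, keeping ' and () literal as the example links do."""
    return (BASE + "#?cqp=" + quote(cqp, safe="'()")
            + "&search_tab=1&within=sentence&search=cqp")

if __name__ == "__main__":
    example = ("https://www.alexburka.com/~danr/#?cqp=%5Bpos%20%3D%20%22COI%22%5D"
               "%20%5Bword%20%3D%20%22mi%22%5D&stats_reduce=word&search_tab=1"
               "&within=sentence&search=cqp")
    print(cqp_from_link(example))  # [pos = "COI"] [word = "mi"]
    print(link_for_cqp('[word &= "pi\'o" & pos = "BRIVLA"]'))
```

Encoding the pi'o query this way reproduces the `cqp` value of the pi'o link exactly.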

Examples of statistics:

Statistics of lo + BRIVLA:
https://www.alexburka.com/~danr/#?cqp=%5Bword%20%3D%20%22lo%22%5D%20%5Bpos%20%3D%20%22BRIVLA%22%5D&stats_reduce=word&search_tab=1&within=sentence&search=cqp&result_tab=1

The most common seltau (the info is set to "end with q"):
https://www.alexburka.com/~danr/#?cqp=%5Btrans%20%26%3D%20%22q%22%5D&search_tab=1&within=sentence&page=0&search=cqp&stats_reduce=word&result_tab=1

Popular selbri in gadri (info ends with n):
https://www.alexburka.com/~danr/#?cqp=%5Btrans%20%26%3D%20%22n%22%5D&search_tab=1&within=sentence&page=0&search=cqp&result_tab=1
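
The statistics view groups the hits by word (the `stats_reduce=word` parameter in the links) and shows how often each occurs. A minimal Python sketch of that counting step for the seltau example, assuming tokens are (word, info) pairs; only the final letter of each info value (q for seltau, n for selbri in gadri) comes from this announcement, and the toy data is made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def top_words(tokens: Iterable[Tuple[str, str]], suffix: str,
              n: int = 10) -> List[Tuple[str, int]]:
    """Count words whose info value ends with `suffix`, most frequent first."""
    counts = Counter(word for word, info in tokens if info.endswith(suffix))
    return counts.most_common(n)

if __name__ == "__main__":
    # Toy (word, info) pairs with made-up info values ending in q or n.
    toy = [("sutra", "q"), ("klama", "n"), ("sutra", "q"), ("mutce", "q"), ("prenu", "n")]
    print(top_words(toy, "q"))  # [('sutra', 2), ('mutce', 1)]
```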

There is also a comparison mode, which requires you to save two searches by pressing the down arrow next to the search button. You can then look for statistically significant differences between them. Try it yourself by comparing the words used in all corpora except the irc corpus with those used in the irc corpus alone!
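
The announcement does not say which test the comparison mode uses. One standard choice for comparing word frequencies between two corpora is Dunning's log-likelihood statistic G² = 2 Σ O ln(O/E), sketched below in Python as an illustration only; treat it as an assumption, not as the system's documented method. Here `a` and `b` are a word's counts in the two saved searches and `c` and `d` are their total token counts.

```python
import math

def log_likelihood(a: int, c: int, b: int, d: int) -> float:
    """Dunning's G2 for a word seen a times in corpus 1 (c tokens) and b times in corpus 2 (d tokens)."""
    total = c + d
    e1 = c * (a + b) / total  # expected count in corpus 1 if usage were equal
    e2 = d * (a + b) / total  # expected count in corpus 2 if usage were equal
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:      # 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2

if __name__ == "__main__":
    # Same relative frequency in both corpora gives G2 = 0.
    print(log_likelihood(10, 1000, 20, 2000))
    # A word ten times more frequent in corpus 1 gives a large G2.
    print(log_likelihood(100, 1000, 10, 1000))
```

With one degree of freedom, a G² above about 3.84 corresponds to p < 0.05.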

The system is an adaptation of the Swedish corpus search system Korp:
http://spraakbanken.gu.se/korp

If you are interested in helping out, please contact me.

Happy exploring!

mi'e .danr. ko banli mu'o (I am danr. Be great! Over.)

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban.
For more options, visit https://groups.google.com/d/optout.