Date: Tue, 16 Apr 2013 03:02:11 -0700 (PDT)
From: la gleki
To: lojban@googlegroups.com
Message-Id: <7e6e66ac-0d51-4d9d-a49f-6e96741629dc@googlegroups.com>
References: <20130415195125.GB11548@stodi.digitalkingdom.org>
 <20130416090920.GA18465@stodi.digitalkingdom.org>
Subject: Re: [lojban] Request for a full frequency list of all lojbanic words for an Android app.
MIME-Version: 1.0
X-Original-Sender: gleki.is.my.name@gmail.com
Reply-To: lojban@googlegroups.com
Precedence: list
Mailing-list: list lojban@googlegroups.com; contact lojban+owners@googlegroups.com
X-Google-Group-Id: 1004133512417
Sender: lojban@googlegroups.com
Content-Type: multipart/alternative; boundary="----=_Part_1533_9877932.1366106531901"

------=_Part_1533_9877932.1366106531901
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, April 16, 2013 1:34:16 PM UTC+4, Ross Ogilvie wrote:
>
> Okay, I filtered my previous frequency list of lojban words, removing all
> cmene and non lojban words, then manually picked out some author's names
> that are brivla.
>

What do you mean by corpus? The IRC log saves only parsable sentences, but I
can still see many English words. What is the source of this corpus?

Also, I think we can trim the list to only the first 5000 words/clusters.
The rest can be added manually from jbovlaste.

> Please find attached.
>
> -- Ross
>
> On 16 April 2013 19:09, Robin Lee Powell <rlpo...@digitalkingdom.org> wrote:
>
>> On Tue, Apr 16, 2013 at 12:36:01AM -0700, la gleki wrote:
>> >
>> > On Monday, April 15, 2013 11:51:25 PM UTC+4, Robin Powell wrote:
>> > >
>> > > On Fri, Apr 12, 2013 at 07:57:17AM -0700, la gleki wrote:
>> > > > peeps, i need ur help.
>> > > > we are gonna have Swype/Swipe feature for MultiLing android keyboard. I
>> > > > need a list of all lojbanic words + frequency of each.
>> > > > i know of a gismu frequency list. But it seems that not all gismu are there
>> > > > (less than 1342). What about cmavo, fu'ivla?
>> > > >
>> > > > Of course, most rare words can be given the lowest rating but what are the
>> > > > most frequent words?
>> > > > Can we rerun the algorithm to count all the occurrences of all words?
>> > >
>> > > http://users.digitalkingdom.org/~rlpowell/hobbies/lojban/flashcards/?C=M;O=D
>> > > -- the _freq lists should have everything.
>> > >
>> > > It should be pretty easy to regenerate this stuff with the latest
>> > > from http://corpus.lojban.org/ , but I am (as usual) not
>> > > volunteering.
>> > >
>> >
>> > Is there a script that can generate such lists?
>>
>> The scripts I used are in that same directory; not sure what's what
>> at this point, though.
>>
>> -Robin
>>
>

--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+unsubscribe@googlegroups.com.
To post to this group, send email to lojban@googlegroups.com.
Visit this group at http://groups.google.com/group/lojban?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

------=_Part_1533_9877932.1366106531901
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

------=_Part_1533_9877932.1366106531901--