From: Gleki Arxokuna
Date: Mon, 8 Feb 2016 16:35:14 +0300
Subject: [lojban] N-grams of Lojban corpus
To: lojban@googlegroups.com

For various reasons we may need N-gram statistics for the Lojban corpus. Generating the statistics is not hard in itself, but first we need to preprocess the log of our history:

http://www.lojban.org/irclogs/irclogs.zip

Messages from "mensi" and "livla" definitely must be removed. Anything else?

I'd like to eventually develop an algorithm for preprocessing this log. Any help is welcome.

I have started adding lists of N-grams here:
https://mw.lojban.org/papri/N-grams_of_Lojban_corpus

Spreadsheets may be needed instead, since the lists can get long.

PS. If you wonder where N-grams might be needed, the immediate application is "collect the most frequent phrases in Lojban and make a phrasebook out of them".
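
To make the preprocessing idea concrete, here is a minimal sketch of one possible approach: drop the messages from the two nicks named above, then count word N-grams within each remaining message. It is only an illustration under assumptions. The log line format ("time <nick> text"), the sample lines, the nicks user_a and user_b, and the function names human_messages and ngram_counts are all hypothetical; the real layout of irclogs.zip has not been checked here, and splitting on whitespace is a deliberately naive tokenizer.

```python
import re
from collections import Counter

# Nicks named in the message; extend as other senders to exclude turn up.
EXCLUDED_NICKS = {"mensi", "livla"}

# ASSUMPTION: each log line looks like "12:34 <nick> message text",
# with the leading timestamp optional. The real log format may differ.
LINE_RE = re.compile(r"^\s*(?:\S+\s+)?<\s*([^>\s]+)\s*>\s*(.*)$")


def human_messages(lines):
    """Yield message texts, skipping unparsable lines and excluded nicks."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # joins, parts, topic changes, and other non-message lines
        nick, text = match.groups()
        if nick.lower() in EXCLUDED_NICKS:
            continue
        yield text


def ngram_counts(messages, n):
    """Count word n-grams inside each message, never across messages."""
    counts = Counter()
    for text in messages:
        words = text.split()  # naive whitespace tokenization
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


# Hypothetical sample lines, not taken from the real log.
sample_log = [
    "12:00 <user_a> coi ro do",
    "12:01 <mensi> coi ro do",
    "12:02 <user_b> coi ro do .i ma nuzba",
    "*** user_c has joined the channel",
]

bigrams = ngram_counts(human_messages(sample_log), 2)
for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)
```

Counting within a single message keeps one speaker's last word from pairing with the next speaker's first word, which would otherwise add spurious phrases to a phrasebook. Other cleanup steps (quoted text, URLs, non-Lojban chatter) would still need their own rules.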