From: Pierre Abbat
To: lojban-list@lojban.org
Subject: [lojban] Re: Grammar checking wikipedia bot.
Date: Tue, 29 Aug 2006 19:32:32 -0400

On Saturday 26 August 2006 21:09, Einar Faanes wrote:
> coi ro do
>
> I mentioned this on the irc-channel earlier today, but I think I should
> post it here as well. I have an idea which I think is possible and which
> I think should be set alive. When lojban is parseable (and we have a
> parser) we should take advantage of that by automatically check the
> lojban wikipedia for spelling errors by channeling the text through
> jbofi'e. I think that this can be done by using the wikimedia bot
> framework, which is among other things used to update and add
> interlanguage links.
>
> It should be possible to make the bot download a page, strip it of
> non-lojban elements (wikimarkup etc.), check the text and post an
> errormessage on either the articles discussion page or a centralized
> reference page. The bot is written in python. I'm no programmer, but
> know others here which are and which may find this interesting.

Sounds good, though a couple of things would cause false errors:

1. Some words are valid fu'ivla that jbofi'e doesn't recognize, such as
   {srutio} (a discarded form of {strutione}, ostrich) and {largectremia}
   (crape myrtle). Also, the PEG accepts some lujvo made with fu'ivla that
   jbofi'e doesn't.

2. Some tables don't parse if you just remove the markup. The prefix chart
   in [[treci'e]] is set up so that the cells of each row fill the blanks
   in the sentence formed by the header row. If you removed the markup from
   a table in the English Wikipedia, the result wouldn't parse in English
   either.

phma
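
A rough Python sketch of the two workarounds, offered only as an illustration: drop whole tables instead of just their markup, and keep an allowlist of words known to be valid. The function names, the regular expressions, and the allowlist contents are all assumptions made for this example. The call to jbofi'e itself is left out, since its interface isn't described in this thread.

```python
import re

# Hypothetical allowlist of words known to be valid even if the parser
# rejects them; the entries are the examples given in the message.
KNOWN_VALID = {"srutio", "strutione", "largectremia"}

# Simplified wikitext patterns (assumptions; real wikitext has more cases,
# such as nested templates, which these do not handle).
TABLE_RE = re.compile(r"\{\|.*?\|\}", re.DOTALL)         # {| ... |} tables
TEMPLATE_RE = re.compile(r"\{\{.*?\}\}", re.DOTALL)      # {{ ... }} templates
LINK_RE = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")  # [[target|label]]
HEADING_RE = re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.MULTILINE)
EMPHASIS_RE = re.compile(r"'{2,}")                       # ''italic'', '''bold'''


def strip_wikimarkup(wikitext):
    """Return plain text, dropping tables entirely rather than unwrapping them."""
    text = TABLE_RE.sub(" ", wikitext)     # table cells rarely form sentences
    text = TEMPLATE_RE.sub(" ", text)
    text = LINK_RE.sub(r"\1", text)        # keep the visible label of a link
    text = HEADING_RE.sub(r"\1", text)     # keep heading text, drop the = signs
    text = EMPHASIS_RE.sub("", text)       # single apostrophes are left alone
    return " ".join(text.split())          # collapse all whitespace


def real_errors(unrecognized_words, allowlist=KNOWN_VALID):
    """Filter a parser's list of unrecognized words against the allowlist."""
    return [word for word in unrecognized_words if word not in allowlist]
```

A bot could run `strip_wikimarkup` on each downloaded page, pass the result to the parser, and report only what `real_errors` leaves over. Skipping tables means errors inside them go unchecked, which seems a fair trade against a flood of false reports.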