From: Steven Arnold
Subject: [lojban] Re: Loglish: A Modest Proposal
Date: Sat, 13 Aug 2005 20:11:33 -0400
To: lojban-list@lojban.org

On Aug 13, 2005, at 4:00 PM, Arnt Richard Johansen wrote:

> To quote your web page:
>
> # [...] avoid what's really annoying about Lojban (the lack of a full
> # vocabulary).
>
> I suppose that lack of vocabulary will always be a problem in
> knowledge representation systems, until someone develops AGI or a
> way to extract a suitable dictionary from a text corpus.

WordNet is a system that attempts to take a set of "core meanings" and associate those meanings with words from different languages. It is accessible over the Internet.
I invented a language by writing a Python program that fetched the list of core meanings and assigned each one a word from a list of candidate words. It was a very fast route to a dictionary of more than 26,000 words.

Granted, the dictionary needed a little data grooming. There were a number of meanings that, to me, didn't deserve a separate word, and there were others that I wanted to make sure got shorter words, since I expected them to be used more often. But I think the data grooming was by far the smaller part of the task, and by using WordNet I probably saved hundreds of hours of word development compared to doing it all by hand. That, combined with using Markov chains for word generation, produced an excellent base language in a very short time.

I'd be happy to share the source code of these tools with anyone who is interested; email me privately for that.

steve
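The original tools aren't included in the post, so the following is only a minimal Python sketch of the two ideas described, under stated assumptions: a letter-level Markov chain (order 2) trained on a handful of sample words generates candidate words, and the shortest candidates are handed to the meanings assumed to be used most often. The meaning list, the sample words, and all function names (`train_markov_model`, `generate_word`, `build_dictionary`) are hypothetical. In a real pipeline the meanings would come from WordNet (for example through NLTK's `wordnet.all_synsets()`), which is not shown here.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # padding symbols marking the start and end of a word


def train_markov_model(sample_words, order=2):
    """Map each `order`-letter context to the list of letters seen after it."""
    model = defaultdict(list)
    for word in sample_words:
        padded = START * order + word + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate_word(model, rng, order=2, max_len=10):
    """Walk the chain from the start context until END is drawn or max_len is hit."""
    context = START * order
    letters = []
    while len(letters) < max_len:
        # Repeated entries in the list make common transitions more likely.
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)


def build_dictionary(meanings, sample_words, order=2, seed=0, max_attempts=10_000):
    """Give each meaning a distinct generated word, shortest words first.

    Assumes `meanings` is already ordered from most to least frequently used.
    """
    model = train_markov_model(sample_words, order)
    rng = random.Random(seed)  # seeded so the same inputs give the same dictionary
    candidates = set()
    for _ in range(max_attempts):
        if len(candidates) >= len(meanings):
            break
        candidates.add(generate_word(model, rng, order))
    if len(candidates) < len(meanings):
        raise ValueError("sample words too few to yield enough distinct words")
    shortest_first = sorted(candidates, key=lambda w: (len(w), w))
    return dict(zip(meanings, shortest_first))


if __name__ == "__main__":
    # Hypothetical "core meanings", most frequent first, and a few sample words.
    meanings = ["water", "person", "eat", "house", "mountain", "language"]
    samples = ["klama", "broda", "gerku", "mlatu", "prami",
               "tavla", "zdani", "cilre", "djica", "kanla"]
    for meaning, word in build_dictionary(meanings, samples).items():
        print(meaning, word)
```

Sorting the candidates by length before pairing them with meanings is just one simple way to honor the "shorter words for frequent meanings" goal; the manual grooming step would still be needed to merge or drop meanings that don't deserve their own word.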