From: jkominek@miranda.org
Date: Sun, 23 Jan 2005 08:48:38 -0700
To: jbovlaste@lojban.org
Subject: [jbovlaste] Re: [lojban] Re: French and Russian speakers needed, please.

On Sun, Jan 23, 2005 at 02:03:11PM -0000, Dr EK Sklyanin wrote:
> Thanks for the advice.
>
> However, does the fixed-fields-length format work well with the
> multibyte UTF8 encoding which I am going to use (or, should I use an
> 8bit encoding, like KOI8-R?). And if it is compatible, then should I use
> the fields lengths exactly the same as in the English file?
We need it in UTF-8, but feel free to use whatever encoding is most convenient for you. We can easily convert KOI8-R to UTF-8 if necessary.

The current snarfgismu scripts process the Esperanto and Spanish lists, which use a mix of fixed-width columns and tab delimiters. We shouldn't have any problems handling fixed field lengths with UTF-8 data, but tab-delimited data would probably be slightly easier to deal with anyway.

-- 
Jay Kominek
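The reason tab-delimited data is easier with UTF-8 can be shown with a short Python sketch. It is not the snarfgismu code: it assumes a hypothetical two-column row (a Russian gloss followed by a gismu) with made-up column widths, and it only illustrates three points: KOI8-R text re-encodes cleanly to UTF-8, fixed widths counted in characters stop matching byte offsets once Cyrillic is stored as UTF-8 (two bytes per letter), and tab-delimited rows parse the same whatever the field lengths are.

```python
import csv
import io

# Hypothetical sample row and column widths (in characters).
# These are illustrative assumptions, not the real gismu-list layout.
FIELDS = ("идти", "klama")
WIDTHS = (6, 6)


def koi8r_to_utf8(raw: bytes) -> bytes:
    """Re-encode a KOI8-R byte string as UTF-8."""
    return raw.decode("koi8_r").encode("utf-8")


def pad_fixed_width(fields, widths) -> str:
    """Left-justify each field to a width counted in characters."""
    return "".join(f.ljust(w) for f, w in zip(fields, widths))


def split_by_bytes(line: bytes, widths):
    """Cut a line at byte offsets, as a byte-oriented parser would.

    With UTF-8 input this can misalign fields, and a cut may even
    fall in the middle of a multibyte character.
    """
    out, pos = [], 0
    for w in widths:
        out.append(line[pos:pos + w])
        pos += w
    return out


def split_by_chars(line: str, widths):
    """Cut an already-decoded line at character offsets."""
    out, pos = [], 0
    for w in widths:
        out.append(line[pos:pos + w].rstrip())
        pos += w
    return out


def parse_tab_delimited(text: str):
    """Parse tab-separated rows; field widths never matter."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    koi8 = "идти".encode("koi8_r")
    print(len(koi8), len(koi8r_to_utf8(koi8)))  # 4 8

    line = pad_fixed_width(FIELDS, WIDTHS)
    print(split_by_chars(line, WIDTHS))  # ['идти', 'klama']

    chunks = split_by_bytes(line.encode("utf-8"), WIDTHS)
    print([c.decode("utf-8") for c in chunks])  # ['идт', 'и  kl']

    print(parse_tab_delimited("идти\tklama\n"))  # [['идти', 'klama']]
```

In short, fixed widths keep working with UTF-8 as long as the reading script decodes each line first and counts characters; a script that slices raw bytes needs the widths recomputed, while a tab-delimited file sidesteps the question entirely.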