Date: Wed, 2 Jan 2013 11:00:34 -0800 (PST)
From: evarismb@gmail.com
To: lojban@googlegroups.com
Reply-To: lojban@googlegroups.com
Subject: [lojban] Re: How to export tatoeba in simple format

Hi,
I'm a German teacher at a Spanish university and I've tried to adapt your script to download a bilingual CSV (German-Spanish) from Tatoeba. The problem is that I have absolutely no programming or Linux knowledge, and I can't figure out why it doesn't work. It would be very nice if you could give me a hint on how to do that.
Thank you!

On Wednesday, 21 March 2012 17:52:12 UTC+1, ianek wrote:
OK, I've made it. http://dl.dropbox.com/u/17805197/parse-tatoeba.tar.gz
Unpack it to a directory with links.csv and sentences.csv from Tatoeba.
Run ./prepare-links.sh once. (You'll have to do it again only if you replace links/sentences with newer files.)
Then run ./make-pairs.sh [language-code] > [some filename].csv
For example ./make-pairs.sh eng > jbo-eng.csv

I've made it so that it gathers all of the interlinked sentences. This has some drawbacks. Do you know the "phone game"? If you do, you know what I'm saying. If you don't, you will know when you look at some pairs...
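
The "phone game" effect can be illustrated with a small sketch. This is not necessarily how make-pairs.sh works internally; it only assumes that links are pairs of sentence ids and follows them transitively with a breadth-first search. The function name, the ids and the language table are invented for illustration.

```python
from collections import defaultdict, deque


def reachable_translations(links, start, lang_of, target_lang):
    """Return {sentence_id: hop_count} for target_lang sentences reachable from start."""
    # Treat every link as undirected.
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    # Plain breadth-first search that records how many links were followed.
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)

    return {
        sid: n
        for sid, n in hops.items()
        if sid != start and lang_of.get(sid) == target_lang
    }


# Invented chain: jbo(1) - eng(2) - pol(3) - eng(4)
lang_of = {1: "jbo", 2: "eng", 3: "pol", 4: "eng"}
links = [(1, 2), (2, 3), (3, 4)]
found = reachable_translations(links, 1, lang_of, "eng")
print(found)
```

Sentence 4 is paired with sentence 1 only through two intermediate translations, which is where the meaning can drift. Keeping only results with a small hop count is one possible way to limit that.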

mu'o mi'e ianek

On Wednesday, March 7, 2012 7:36:44 PM UTC+1, ianek wrote:
http://dl.dropbox.com/u/17805197/jbo-rus.csv

But it's probably not complete, for the reason I mentioned.

On 7 Mar, 19:32, ianek <jane...@gmail.com> wrote:
> I've just found out that links.csv is not complete, i.e. it doesn't
> cover all the pairs. For example, we have a Lojban sentence "lo purci
> ka'e te djuno gi'e na ka'e se galfi .i lo balvi ka'e se galfi gi'e na
> ka'e te djuno" and a Polish sentence "Przeszłość może być tylko
> poznana, nie zmieniona. Przyszłość może być tylko zmieniona, nie
> poznana." and they're not linked to each other, but they both are
> linked to "The past can only be known, not changed. The future can
> only be changed, not known.". I wonder if there's a rule that such
> sentences always have a "common relative"; it would certainly make
> things easier. But I think that now using a database (maybe sqlite3)
> would be necessary.
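
A rough sketch of the "common relative" idea, done in memory rather than in a database. It assumes sentences are (id, language, text) rows and links are pairs of ids; the function name and the sample rows are made up. It pairs two sentences when they are linked directly or share at least one linked sentence.

```python
from collections import defaultdict


def pairs_within_two_links(sentences, links, lang_a, lang_b):
    """Pair lang_a and lang_b sentences linked directly or through one shared sentence."""
    lang_of = {sid: lang for sid, lang, _ in sentences}
    text_of = {sid: text for sid, _, text in sentences}

    # Treat links as undirected so a one-way entry still counts.
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    pairs = set()
    for a, linked in neighbours.items():
        if lang_of.get(a) != lang_a:
            continue
        for pivot in linked:
            # Direct link between the two languages.
            if lang_of.get(pivot) == lang_b:
                pairs.add((text_of[a], text_of[pivot]))
            # Link through the pivot, i.e. the "common relative".
            for b in neighbours[pivot]:
                if lang_of.get(b) == lang_b:
                    pairs.add((text_of[a], text_of[b]))
    return sorted(pairs)


# Invented sample: the jbo and pol sentences are both linked only to the eng one.
sentences = [(1, "jbo", "coi"), (2, "eng", "Hello."), (3, "pol", "Witaj.")]
links = [(1, 2), (2, 1), (3, 2), (2, 3)]
via_relative = pairs_within_two_links(sentences, links, "jbo", "pol")
print(via_relative)
```

Whether every unlinked pair really has such a common relative is exactly the open question above; this sketch simply finds the ones that do.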
>
> mu'o mi'e ianek
>
> On 7 Mar, 15:51, ianek <jane...@gmail.com> wrote:
>
> > What platform? Is Linux ok?
>
> > On 7 Mar, 11:44, gleki <gleki.is.my.n...@gmail.com> wrote:
>
> > > I'm interested, and actually in doing it periodically myself rather than
> > > by request, because the database is live and is being updated by us.
>
> > > Of course I know about those three files.
>
> > > For now, I'd prefer such an export for several directions at once (a
> > > multilingual spreadsheet).
> > > I want all sentences for which we have Lojban translations,
> > > i.e.
> > > first column: Lojban
> > > second column: English
> > > then I need
> > > Japanese
> > > Chinese
> > > Russian
> > > Arabic
> > > Spanish
> > > Polish
> > > French
> > > German
>
> > > I'll repeat once again. An automated script for doing so would be awesome.
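
One way such a multilingual table could be assembled, as a sketch only. It assumes (id, language, text) rows for sentences and (id, id) pairs for links, uses direct links only, and puts several translations in the same language into one cell. The function name, the " | " separator and the sample data are illustrative choices, not part of any existing script.

```python
from collections import defaultdict


def multilingual_rows(sentences, links, base_lang, other_langs):
    """One row per base_lang sentence: its text, then one cell per language in other_langs."""
    by_id = {sid: (lang, text) for sid, lang, text in sentences}

    # Index links by source id so each sentence is looked up once.
    linked = defaultdict(list)
    for src, dst in links:
        linked[src].append(dst)

    rows = []
    for sid in sorted(by_id):
        lang, text = by_id[sid]
        if lang != base_lang:
            continue
        cells = defaultdict(list)
        for dst in linked[sid]:
            if dst in by_id:
                dst_lang, dst_text = by_id[dst]
                cells[dst_lang].append(dst_text)
        if not cells:
            continue  # no translations at all, so nothing to show
        rows.append([text] + [" | ".join(cells[code]) for code in other_langs])
    return rows


# Invented sample data; "ki'e" has no translations and is skipped.
sentences = [
    (1, "jbo", "coi"),
    (2, "eng", "Hello."),
    (3, "pol", "Witaj."),
    (4, "jbo", "ki'e"),
]
links = [(1, 2), (2, 1), (1, 3), (3, 1)]
table = multilingual_rows(sentences, links, "jbo", ["eng", "rus", "pol"])
print(table)
```

The rows can then be written out with csv.writer(..., delimiter="\t") to get a file that a spreadsheet program can open.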
>
> > > On Wednesday, March 7, 2012 2:47:17 AM UTC+4, ianek wrote:
>
> > > > I've created the list for you, but it was an ugly hack in bash. A
> > > > better way would be to create a database and import sentences.csv and
> > > > links.csv to it, and then write a very simple program instead of
> > > > hacking around with grep etc. But it would be more work of course. And
> > > > maybe not faster, considering that import would take time.
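
The database approach might look roughly like this with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names are invented, the data is loaded from small in-memory lists instead of the real files, and the query relies on links.csv listing each link in both directions.

```python
import sqlite3


def build_db(sentences, links):
    """Load (id, lang, text) rows and (sentence_id, translation_id) rows into SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT)")
    db.execute("CREATE TABLE links (sentence_id INTEGER, translation_id INTEGER)")
    db.executemany("INSERT INTO sentences VALUES (?, ?, ?)", sentences)
    db.executemany("INSERT INTO links VALUES (?, ?)", links)
    # The index keeps the join usable once the tables get large.
    db.execute("CREATE INDEX links_by_sentence ON links (sentence_id)")
    return db


def direct_pairs(db, lang_a, lang_b):
    """Return (text_a, text_b) tuples for directly linked sentences."""
    query = """
        SELECT a.text, b.text
        FROM links
        JOIN sentences AS a ON a.id = links.sentence_id
        JOIN sentences AS b ON b.id = links.translation_id
        WHERE a.lang = ? AND b.lang = ?
        ORDER BY a.id, b.id
    """
    return db.execute(query, (lang_a, lang_b)).fetchall()


# Invented sample rows; every link appears in both directions.
sentences = [(1, "jbo", "coi"), (2, "eng", "Hello."), (3, "deu", "Hallo.")]
links = [(1, 2), (2, 1), (2, 3), (3, 2)]
db = build_db(sentences, links)
jbo_eng = direct_pairs(db, "jbo", "eng")
print(jbo_eng)
```

The same query works for any two language codes present in the data, which is the main attraction over a per-language grep pipeline.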
>
> > > > Here you go: http://dl.dropbox.com/u/17805197/jbo-eng.csv
> > > > It's a tab-separated list; any spreadsheet program should read it.
>
> > > > As a by-product, I am able to produce such a list for any other
> > > > language available in Tatoeba instantly, if anyone's interested.
>
> > > > mu'o mi'e ianek
>
> > > > On 6 Mar, 22:17, ianek <jane...@gmail.com> wrote:
>
> > > > > http://tatoeba.org/pol/download_tatoeba_example_sentences http://tatoe...
>
> > > > > There are actually three columns: id, language, sentence, but with
> > > > > some database-fu or script-fu or maybe even spreadsheet-fu you can get
> > > > > what you want. Or maybe I'll hack it together in a while.
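
As a minimal "script-fu" sketch of that first step: reading the three columns and keeping one language. It assumes the file is tab-separated with no header row; the function name and the sample rows are invented.

```python
import csv
import io


def sentences_by_language(lines, lang):
    """Return {sentence_id: text} for every row whose language column equals lang."""
    # QUOTE_NONE: quote characters inside a sentence are ordinary text here,
    # not CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    result = {}
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed rows instead of guessing
        sentence_id, language, text = row
        if language == lang:
            result[int(sentence_id)] = text
    return result


# Tiny made-up sample in the same id / language / sentence layout.
sample = io.StringIO(
    "1\teng\tHello.\n"
    "2\tjbo\tcoi\n"
    "3\teng\tThank you.\n"
)
english = sentences_by_language(sample, "eng")
print(english)
```

With the real export, the StringIO object would be replaced by a file opened with encoding="utf-8" and newline="".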
>
> > > > > mu'o mi'e ianek
>
> > > > > On 6 Mar, 15:19, gleki <gleki.is.my.n...@gmail.com> wrote:
>
> > > > > > I wanna export the Tatoeba database into a simple spreadsheet with two
> > > > > > columns:
> > > > > > one for English and another one for Lojban.
>
> > > > > > Does anyone know how to do that?
