[lojban] Re: How to export tatoeba in simple format
OK, I've made it: http://dl.dropbox.com/u/17805197/parse-tatoeba.tar.gz
Unpack it into a directory containing links.csv and sentences.csv from Tatoeba.
Run ./prepare-links.sh once. (You'll have to run it again only if you replace links.csv/sentences.csv with newer files.)
Then run ./make-pairs.sh [language-code] > [some filename].csv
For example ./make-pairs.sh eng > jbo-eng.csv
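The shell scripts themselves aren't reproduced in this thread, so what follows is only a rough Python sketch of the direct-link case, not a port of make-pairs.sh. It assumes sentences.csv lines look like id<TAB>lang<TAB>text and links.csv lines look like id<TAB>id; the function names are made up for illustration.

```python
def parse_sentences(lines):
    """Map sentence id -> (language code, text) from 'id\tlang\ttext' lines."""
    sentences = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3:  # skip malformed lines
            sid, lang, text = parts
            sentences[sid] = (lang, text)
    return sentences


def parse_links(lines):
    """Yield (id_a, id_b) pairs from 'id\tid' lines."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            yield parts[0], parts[1]


def direct_pairs(sentences, links, src="jbo", dst="eng"):
    """Return sorted (src_text, dst_text) tuples for directly linked sentences."""
    pairs = set()
    for a, b in links:
        for x, y in ((a, b), (b, a)):  # treat each link as undirected
            if x in sentences and y in sentences:
                if sentences[x][0] == src and sentences[y][0] == dst:
                    pairs.add((sentences[x][1], sentences[y][1]))
    return sorted(pairs)
```

Opening both files with encoding="utf-8", passing them through parse_sentences and parse_links, and printing each pair as jbo_text<TAB>other_text gives a two-column file in the spirit of jbo-eng.csv, though without the indirect pairs the scripts collect.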
I've made it gather all of the interlinked sentences, not only the directly linked ones. This has some drawbacks. Do you know the "phone game"? If you do, you know what I'm saying: the meaning can drift a little with every hop through another translation. If you don't, you will know when you look at some pairs...
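One way to gather all interlinked sentences, rather than only direct links, is a breadth-first search over the link graph. This is a hypothetical sketch, not what the scripts do internally; it assumes sentences is a dict mapping id to (lang, text) and links is an iterable of (id, id) pairs. It also records the number of hops, so indirect, phone-game-prone pairs can be reviewed or cut off; max_hops is an illustrative parameter.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Undirected adjacency sets from (id_a, id_b) link pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def linked_pairs(sentences, graph, src="jbo", dst="eng", max_hops=None):
    """For every src sentence, find dst sentences reachable through any chain
    of links. Returns sorted (src_text, dst_text, hops) triples; hops == 1 is a
    direct link, larger values are indirect ("phone game") links."""
    results = []
    for start, (lang, text) in sentences.items():
        if lang != src:
            continue
        dist = {start: 0}  # BFS distances from this src sentence
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if max_hops is not None and dist[node] >= max_hops:
                continue  # don't expand past the hop limit
            for nxt in graph.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for sid, hops in dist.items():
            if hops > 0 and sid in sentences and sentences[sid][0] == dst:
                results.append((text, sentences[sid][1], hops))
    return sorted(results)
```

Sorting or filtering on the third field (for example keeping only hops <= 2) is a cheap way to trade coverage against the drift described above.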
mu'o mi'e ianek
On Wednesday, March 7, 2012 7:36:44 PM UTC+1, ianek wrote:
http://dl.dropbox.com/u/17805197/jbo-rus.csv
But it's probably not complete, for the reason I mentioned.
On 7 Mar, 19:32, ianek <jane...@gmail.com> wrote:
> I've just found out that links.csv is not complete, i.e. it doesn't
> cover all the pairs. For example, we have a Lojban sentence "lo purci
> ka'e te djuno gi'e na ka'e se galfi .i lo balvi ka'e se galfi gi'e na
> ka'e te djuno" and a Polish sentence "Przeszłość może być tylko
> poznana, nie zmieniona. Przyszłość może być tylko zmieniona, nie
> poznana." and they're not linked to each other, but they both are
> linked to "The past can only be known, not changed. The future can
> only be changed, not known.". I wonder if there's a rule that such
> sentences always have a "common relative"; it would certainly make
> things easier. But I think that using a database (maybe sqlite3)
> would now be necessary.
>
> mu'o mi'e ianek
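Checking the "common relative" case (two sentences not linked to each other but both linked to a third one, like the Lojban/Polish/English example) needs only one level of indirection. A minimal sketch, assuming sentences maps id to (lang, text) and links is a list of (id, id) pairs; the function name is hypothetical:

```python
from collections import defaultdict


def common_relative_pairs(sentences, links, src="jbo", dst="pol"):
    """Find src/dst sentence pairs that are not linked to each other but share
    at least one directly linked sentence (a "common relative").
    Returns {(src_id, dst_id): sorted list of shared neighbour ids}."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def lang_of(sid):
        return sentences.get(sid, ("", ""))[0]

    found = defaultdict(set)
    for middle, linked in neighbours.items():
        srcs = [s for s in linked if lang_of(s) == src]
        dsts = [d for d in linked if lang_of(d) == dst]
        for s in srcs:
            for d in dsts:
                if d not in neighbours.get(s, ()):  # skip already-linked pairs
                    found[(s, d)].add(middle)
    return {pair: sorted(mids) for pair, mids in found.items()}
```

Returning the shared ids, not just the pair, makes it easy to see which English (or other) sentence connects them.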
>
> On 7 Mar, 15:51, ianek <jane...@gmail.com> wrote:
>
> > What platform? Is Linux ok?
>
> > On 7 Mar, 11:44, gleki <gleki.is.my.n...@gmail.com> wrote:
>
> > > I'm interested. And actually in periodically doing it myself. Not by
> > > request.
> > > Because the database is live and is being updated by us.
>
> > > Of course I know about those three files.
>
> > > For now, I'd prefer such an export for several directions at once (a
> > > multilingual spreadsheet).
> > > I want all sentences for which we have lojban translations.
> > > i.e.
> > > first column lojban
> > > 2 column english
> > > then i need
> > > japanese
> > > chinese
> > > russian
> > > arabic
> > > spanish
> > > polish
> > > french
> > > german
>
> > > I'll repeat once again. An automated script for doing so would be awesome.
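A multilingual spreadsheet like the one described (Lojban first, then English, then the other languages) could be sketched as below. The language codes are an assumption (Tatoeba is assumed to use ISO 639-3 codes, with Mandarin as cmn), only direct links are used, and several translations in the same language are joined into one cell. The sketch assumes sentences maps id to (lang, text) and links is a list of (id, id) pairs.

```python
import csv
import io
from collections import defaultdict

# Assumed column order and codes, matching the requested languages.
COLUMNS = ["jbo", "eng", "jpn", "cmn", "rus", "ara", "spa", "pol", "fra", "deu"]


def multilingual_rows(sentences, links, columns=COLUMNS):
    """One row per sentence in columns[0]; each other cell holds the directly
    linked translations in that language, joined with ' | '."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    rows = []
    for sid, (lang, text) in sorted(sentences.items()):  # deterministic order
        if lang != columns[0]:
            continue
        cells = defaultdict(list)
        for other in sorted(neighbours.get(sid, ())):
            if other in sentences:
                other_lang, other_text = sentences[other]
                cells[other_lang].append(other_text)
        rows.append([text] + [" | ".join(cells[c]) for c in columns[1:]])
    return rows


def to_tsv(rows, columns=COLUMNS):
    """Render a header plus rows as tab-separated text for a spreadsheet."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()
```

Swapping in an indirect lookup (a breadth-first search over links) would fill more cells, at the cost of the "phone game" drift discussed later in the thread.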
>
> > > On Wednesday, March 7, 2012 2:47:17 AM UTC+4, ianek wrote:
>
> > > > I've created the list for you, but it was an ugly hack in bash. A
> > > > better way would be to create a database and import sentences.csv and
> > > > links.csv to it, and then write a very simple program instead of
> > > > hacking around with grep etc. But it would be more work of course. And
> > > > maybe not faster, considering that import would take time.
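The database route could look roughly like this with Python's built-in sqlite3 module: load both files once, index them, and let SQL do the pairing. Table, column, and function names are invented for the sketch, and each link is queried in both directions in case links.csv stores only one.

```python
import sqlite3


def read_tsv(path, ncols):
    """Yield tuples of ncols tab-separated fields; the last keeps extra tabs."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t", ncols - 1)
            if len(parts) == ncols:
                yield tuple(parts)


def build_db(sentence_rows, link_rows, path=":memory:"):
    """Load (id, lang, text) and (id_a, id_b) rows into SQLite and index them."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT)")
    con.execute("CREATE TABLE links (a INTEGER, b INTEGER)")
    con.executemany("INSERT INTO sentences VALUES (?, ?, ?)",
                    ((int(i), lang, text) for i, lang, text in sentence_rows))
    con.executemany("INSERT INTO links VALUES (?, ?)",
                    ((int(a), int(b)) for a, b in link_rows))
    con.execute("CREATE INDEX links_a ON links (a)")
    con.execute("CREATE INDEX links_b ON links (b)")
    con.execute("CREATE INDEX sentences_lang ON sentences (lang)")
    con.commit()
    return con


# Each link is read in both directions, in case links.csv lists only one.
PAIRS_SQL = """
SELECT DISTINCT s.text, t.text
FROM (SELECT a AS x, b AS y FROM links UNION SELECT b, a FROM links) AS l
JOIN sentences AS s ON s.id = l.x
JOIN sentences AS t ON t.id = l.y
WHERE s.lang = ? AND t.lang = ?
ORDER BY s.text, t.text
"""


def pairs(con, src="jbo", dst="eng"):
    """Return (src_text, dst_text) tuples for directly linked sentences."""
    return con.execute(PAIRS_SQL, (src, dst)).fetchall()
```

Usage would be something like con = build_db(read_tsv("sentences.csv", 3), read_tsv("links.csv", 2), "tatoeba.db") on a fresh database file, then pairs(con, "jbo", "rus"). Keeping the file means the slow import happens once per download rather than once per query.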
>
> > > > Here you go: http://dl.dropbox.com/u/17805197/jbo-eng.csv
> > > > It's a tab-separated list; any spreadsheet program should read it.
>
> > > > As a by-product, I am able to produce such a list for any other
> > > > language available in tatoeba instantly, if anyone's interested.
>
> > > > mu'o mi'e ianek
>
> > > > On 6 Mar, 22:17, ianek <jane...@gmail.com> wrote:
>
> > > > http://tatoeba.org/pol/download_tatoeba_example_sentences http://tatoe...
>
> > > > > There are actually three columns: id, language, sentence, but with
> > > > > some database-fu or script-fu or maybe even spreadsheet-fu you can get
> > > > > what you want. Or maybe I'll hack it together in a while.
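The id column is the key that links.csv refers to, so a first step in any script-fu is parsing sentences.csv into (id, language, sentence) rows. A small sketch, assuming one tab-separated sentence per line; the helper names are hypothetical:

```python
from collections import Counter


def read_sentences(lines):
    """Parse 'id<TAB>lang<TAB>text' lines into (id, lang, text) tuples,
    skipping malformed lines; tabs inside the text are kept."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3:
            yield tuple(parts)


def by_language(rows, lang):
    """Return (id, text) for every row in the given language."""
    return [(sid, text) for sid, code, text in rows if code == lang]


def language_counts(rows):
    """Count sentences per language code."""
    return Counter(code for _, code, _ in rows)
```

language_counts is a quick way to see which language codes actually occur in a given dump before picking columns.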
>
> > > > > mu'o mi'e ianek
>
> > > > > On 6 Mar, 15:19, gleki <gleki.is.my.n...@gmail.com> wrote:
>
> > > > > > I wanna export the Tatoeba database into a simple spreadsheet with two
> > > > > > columns.
> > > > > > One for English and another one for Lojban
>
> > > > > > Does anyone know how to do that ?
--
You received this message because you are subscribed to the Google Groups "lojban" group.
To view this discussion on the web visit https://groups.google.com/d/msg/lojban/-/PLp6H0iMVuIJ.