[lojban] Technical, Help Request: What information *should* a Lojban dictionary system have?
- To: lojban-list@lojban.org, bpfk@lojban.org, jbovlaste@lojban.org
- Subject: [lojban] Technical, Help Request: What information *should* a Lojban dictionary system have?
- From: Robin Lee Powell <rlpowell@digitalkingdom.org>
- Date: Sat, 11 Sep 2010 14:50:35 -0700
(*Please* redirect all followups to the main list (I'd say the
jbovlaste list, but that's a lot harder to get on, so...))
Some of us have had brief chats about what a re-done jbovlaste would
look like. The UI part is pretty well understood, inasmuch as web
UIs are decently consistent these days and, besides, people like
http://vlasisku.lojban.org/, so that provides a good starting point.
Much more interesting to me is the back-end data: What sorts of
things *should* a Lojbanic dictionary store, ideally?
What got this started is the realization that Lojban isn't English,
and that, in particular, the brivla definitions seem anti-Lojbanic.
When I see
x1 gets/procures/acquires/obtains/accepts x2 from source x3
that kind of looks to me like a verb; I see the big thing in the
middle as being "the meaning" of "the verb".
Lojban isn't like that: brivla are as much or more about the
*places* than about the central meaning-concept.
This led to me wondering what a definition format that really
focused on the places would look like; I don't really have an answer
yet, but this in turn led to a lot of other stuff.
In particular, it seemed to me that if you had the right kind of
information about the places, you could generate the sort of
definition I pasted above automatically from that.
Then we had the smart.fm thing, which made it obvious that not all
definitions suit all situations; it was very important there to pare
the definitions down to bare essentials. It was also a giant pain.
So I got to thinking about what sort of data we'd have to have to
generate different levels of detail in the definitions.
As part of that, I ended up extracting some data from jbofihe, some
of the data it uses to generate English glosses, like this:
[([klama1 (go-er(s)):] mi /I, me/) /[is, does]/ <<klama /go-ing/>> ([klama2 (destination(s)):] le /the/ zarci /trading place(s)/)]
Which is kind of ugly, but if you strip out anything that's not
between /.../, you get:
I, me [is, does] go-ing the trading place(s)
which is really rather good. Good enough that one of my
girlfriends, who has never studied a word of Lojban, reads my blog
posts that way.
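The stripping step described above is easy to sketch in Python. This is only an illustration, not jbofihe's own code: the function name strip_to_gloss is made up, and it assumes a gloss never contains a literal slash.

```python
import re


def strip_to_gloss(annotated: str) -> str:
    """Keep only the text between /.../ pairs, joined by single spaces.

    Assumption: a gloss never contains a literal "/" itself.
    """
    # Each match is the text between one opening and one closing slash.
    pieces = re.findall(r"/([^/]*)/", annotated)
    return " ".join(piece.strip() for piece in pieces)


line = (
    "[([klama1 (go-er(s)):] mi /I, me/) /[is, does]/ <<klama /go-ing/>> "
    "([klama2 (destination(s)):] le /the/ zarci /trading place(s)/)]"
)
print(strip_to_gloss(line))  # I, me [is, does] go-ing the trading place(s)
```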
So this left me thinking that I want dictionary software which
could, given the right data, serve *all* of these purposes: formal
dictionary definitions, casual definitions, and glossing (which
implies very detailed information about the individual places).
I don't know exactly what this looks like, but I *think* we can get
all that by just talking about the places themselves. The
resulting formal definition might look a bit different; I'm not sure
yet, which is why I'm posting this: I want help coming up with
something awesome.
A reasonable starting point for discussion is what jbofihe uses to
generate its glosses, I think:
# x1 gets/procures/acquires/obtains/accepts x2 from source x3 [previous possessor not implied]
cpacu1:A;acquire
cpacu2:P;acquired
cpacu3:D;source* of acquisition
cpacu3t:source
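To make the shape of that data concrete, here is a rough Python sketch that reads those lines into records. It is a guess at the structure based only on the sample above: the PlaceEntry class and parse_places function are hypothetical names, a trailing "t" on the key is taken to mark a tag entry, and the "*" inside a keyword is kept verbatim because its meaning isn't spelled out here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaceEntry:
    brivla: str                 # e.g. "cpacu"
    place: int                  # e.g. 3 for x3
    type_letter: Optional[str]  # "A", "P", "D", ... or None if absent
    keyword: str                # English keyword, kept verbatim (including any "*")
    is_tag: bool                # True for keys ending in "t" (assumed: tag form)


# Key layout assumed from the sample: word, place number, optional "t".
KEY_RE = re.compile(r"(.+?)(\d+)(t?)")


def parse_places(text: str) -> List[PlaceEntry]:
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "#" comment lines
        key, _, rest = line.partition(":")
        match = KEY_RE.fullmatch(key)
        if match is None:
            continue  # not a line in the assumed format
        letter, sep, keyword = rest.partition(";")
        if not sep:
            # No ";" means no type letter, only a keyword.
            letter, keyword = None, rest
        entries.append(
            PlaceEntry(match.group(1), int(match.group(2)), letter,
                       keyword, match.group(3) == "t")
        )
    return entries


SAMPLE = """\
# x1 gets/procures/acquires/obtains/accepts x2 from source x3 [previous possessor not implied]
cpacu1:A;acquire
cpacu2:P;acquired
cpacu3:D;source* of acquisition
cpacu3t:source
"""

for entry in parse_places(SAMPLE):
    print(entry)
```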
And here's what the letters mean:
Letter │ Type      ││ Noun           │ Verb    │ Qualifier │ Tag
───────┼───────────┼┼────────────────┼─────────┼───────────┼────────────────
A      │ Act       ││ X-er(s)        │ X-ing   │ X-ing     │ X-er(s)
D      │ Discrete  ││ X(s)           │ being X │ X         │ X
S      │ Substance ││ X              │ being X │ X         │ X
P      │ Property  ││ X thing(s)     │ being X │ X         │ X thing(s)
R      │ Rev. prop ││ thing(s) X     │ being X │ X         │ thing(s) X
I      │ Idiomatic ││ thing(s) X-ing │ X-ing   │ X-ing     │ thing(s) X-ing
E      │ Event     ││ X(s)           │ being X │ X         │ X
That actual format is .. not great :), but the information is
fantastic.
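As a sketch of how the letter table turns a keyword into its different forms, the Python below simply substitutes the keyword for X in each pattern. The names TEMPLATES and render are made up for illustration, the substitution is deliberately naive (no English spelling rules), and it does nothing special with a "*" in a keyword, since what that marker means isn't defined here.

```python
# Patterns copied from the letter table; "X" stands for the keyword.
TEMPLATES = {
    "A": {"noun": "X-er(s)", "verb": "X-ing", "qualifier": "X-ing", "tag": "X-er(s)"},
    "D": {"noun": "X(s)", "verb": "being X", "qualifier": "X", "tag": "X"},
    "S": {"noun": "X", "verb": "being X", "qualifier": "X", "tag": "X"},
    "P": {"noun": "X thing(s)", "verb": "being X", "qualifier": "X", "tag": "X thing(s)"},
    "R": {"noun": "thing(s) X", "verb": "being X", "qualifier": "X", "tag": "thing(s) X"},
    "I": {"noun": "thing(s) X-ing", "verb": "X-ing", "qualifier": "X-ing",
          "tag": "thing(s) X-ing"},
    "E": {"noun": "X(s)", "verb": "being X", "qualifier": "X", "tag": "X"},
}


def render(letter: str, keyword: str, role: str) -> str:
    """Fill the keyword into the pattern for this type letter and role.

    Naive substitution only; raises KeyError for an unknown letter or role.
    """
    return TEMPLATES[letter][role].replace("X", keyword)


print(render("A", "acquire", "noun"))   # acquire-er(s)
print(render("P", "acquired", "noun"))  # acquired thing(s)
print(render("A", "go", "verb"))        # go-ing
```

The "go-ing" and "destination(s)" forms this produces line up with the ones in the klama gloss earlier in this message.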
How can we expand that so that we could, in theory, have enough
information to serve all masters? What would the resulting
dictionary definitions look like?
-Robin
--
http://singinst.org/ : Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei". My personal page: http://www.digitalkingdom.org/rlp/