From thinkit8@lycos.com Sat Jul 14 22:25:44 2001
Return-Path: <thinkit8@lycos.com>
X-Sender: thinkit8@lycos.com
X-Apparently-To: lojban@yahoogroups.com
Date: Sun, 15 Jul 2001 05:25:39 -0000
To: lojban@yahoogroups.com
Subject: Re: a machine-code natural language?
Message-ID: <9ir9gj+18ae@eGroups.com>
In-Reply-To: <Pine.LNX.4.21.0107142017150.16381-100000@mercury.sabren.com>
User-Agent: eGroups-EW/0.82
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Length: 6194
X-Mailer: eGroups Message Poster
X-Originating-IP: 24.5.121.32
From: thinkit8@lycos.com

ah...you bring up some interesting points, but perhaps i need to 
clarify...

--- In lojban@y..., Michal Wallace <sabren@m...> wrote:
> On Sat, 14 Jul 2001 thinkit8@l... wrote:
> 
> > i originally found out about lojban when lojbab replied to a post of
> > mine to sci.lang about a binary-coded natural language. basically, i
> > was thinking of encoding a natural language much the way computers
> > encode program code. that is, there are certain bit fields for
> > determining the operation, and for supplying the data to be operated
> > on. do you think it would be feasible to encode what english (and
> > lojban) attempts to express in a manner similar to machine code?
> > lojban is a very good bridge to attempt this--with its parsable text
> > and unambiguous nature. but it is meant to be a spoken language, and
> > as such a vastly different approach would have to be taken. have
> > there been any attempts at this?
> 
> 
> This seems like a pretty interesting question, but it's kind of wide
> open.. Just about all the lojban I've seen so far HAS been encoded on
> a computer - as ASCII text.
> 
> Now, since lojban only needs 26 symbols (abcdefgijklmnoprstuvxyz.',)
> and the space character, it only needs 5 bits per letter, so one step
> in the direction you're talking about might be to pack one and a half
> symbols into each byte on disk..

well, true. but this isn't really what i'm getting at. the 26 
symbols are mainly due to the range of human speech...and i'm 
thinking of going beyond that and making a purely written language.
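
for reference, the 5-bit packing michal describes could be sketched roughly like this in python. this is only an illustration: the symbol order in ALPHABET and the one-byte length prefix are assumptions made here, not part of any existing lojban encoding. 27 symbols fit in 5 bits, which works out to 8/5 = 1.6 symbols per byte.

```python
# 23 letters + the three marks . ' , + space = 27 symbols,
# so 5 bits (32 values) per symbol are enough.
# the ordering of the symbols is an arbitrary choice for this sketch.
ALPHABET = "abcdefgijklmnoprstuvxyz.', "


def pack(text):
    """Pack text at 5 bits per symbol, prefixed by a one-byte symbol count."""
    if len(text) > 255:
        raise ValueError("this sketch only handles up to 255 symbols")
    bits = 0
    for ch in text:
        # ALPHABET.index raises ValueError for symbols lojban does not use
        bits = (bits << 5) | ALPHABET.index(ch)
    nbytes = (5 * len(text) + 7) // 8
    # left-align the bit string so the first symbol sits in the top bits
    bits <<= nbytes * 8 - 5 * len(text)
    return bytes([len(text)]) + bits.to_bytes(nbytes, "big")


def unpack(data):
    """Reverse pack(): read the count, then pull out 5 bits per symbol."""
    count = data[0]
    bits = int.from_bytes(data[1:], "big")
    total = (len(data) - 1) * 8
    return "".join(
        ALPHABET[(bits >> (total - 5 * (i + 1))) & 31] for i in range(count)
    )


packed = pack("coi rodo")
print(len("coi rodo"), len(packed))   # 8 6
print(unpack(packed))                 # coi rodo
```

the saving is modest (8 bytes down to 6 here, including the length byte), and the result is still just the spoken-language spelling in a tighter container.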

> Of course if I write a program in ASCII text and run it through a
> compiler, I don't just get a shorter version of the code.. I usually
> get a much larger number of instructions, spelled out in excruciating
> detail. Unless I wrote my program in assembly language, where there's
> a one to one mapping between instructions I type and instructions the
> computer understands, the machine code and the original program will
> take completely different forms.

well...sometimes you get executables smaller than your source file. 
a numeric literal, for instance, is often smaller in the executable 
than in the source (1000000 is seven characters of text but fits in 
a four-byte integer). but often, yes, the machine code ends up taking 
up much more space than the source code...especially with RISC.

> If I'm getting your meaning, you're talking about a lojban compiler,
> not just a compact encoding of the words themselves... It's very
> possible to parse lojban and do just about any transformation you like
> on the corresponding syntax tree.. You could conceivably even have a
> lojban virtual machine that responded in certain ways to different
> bridi..

yes, that's pretty much true. the analogy is of source code being 
lojban text, and the resultant executable being what i'm getting at.
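
the bit-field idea from the original post could be sketched roughly like this in python. everything specific here is invented for illustration: the 32-bit word, the field widths (8 bits for the operation, 12 bits for each operand) and the tiny code tables are assumptions, not an existing encoding.

```python
# hypothetical layout of one 32-bit "statement" word:
#   bits 24-31: operation (the relation being asserted)
#   bits 12-23: first operand
#   bits  0-11: second operand
OPERATIONS = {"hits": 1, "is_inside": 2}                  # made-up codes
CONCEPTS = {"dog": 1, "cat": 2, "you": 3, "forest": 4}    # made-up codes


def encode(op, a, b):
    """Pack an operation and two operands into a single integer."""
    return (OPERATIONS[op] << 24) | (CONCEPTS[a] << 12) | CONCEPTS[b]


def decode(word):
    """Split a word back into its (operation, operand, operand) codes."""
    return (word >> 24) & 0xFF, (word >> 12) & 0xFFF, word & 0xFFF


word = encode("hits", "dog", "cat")
print(hex(word))      # 0x1001002
print(decode(word))   # (1, 1, 2)
```

nothing in the word depends on how the statement would be pronounced; the codes only identify the relation and the things it relates.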

> But the question is.. What would the machine do upon seeing this code?
> 
> When I type [print "hello, world"], it's shorthand for an extremely
> complex series of instructions dealing with the internal workings of
> my computer (like the fact that I want the BIOS to print some
> text). When I say "coi rodo", I'm also expressing a huge amount of
> information (like the fact that there's more than one person
> listening, that I'm addressing all of them, the likelihood that I've
> just arrived or begun speaking)..
> 
> ni'o
> 
> 
> If you think about language as modelling the world, rather than
> listing instructions, then one purpose of a lojban compiler might
> simply be to expand as much data as possible from a given bridi..
> 
> For example, suppose a text adventure game began:
> 
> {do nenri lo ricgri} => in(you, a forest)
> 
> That's plenty of info for a human player because the human has
> probably seen a forest or at least can imagine one, and can therefore
> imagine trees, ground, sky, the species of trees, sounds in the air,
> time of day, and so on.
> 
> But suppose you wanted to convert the game into a 3D virtual world.
> One approach is to hire a bunch of 3D animators to build it for you.
> In the future, a language compiler might be able to build it for you
> itself, just by expanding the short description into a huge detailed
> description, much the way conventional compilers turn high level
> instructions into detailed instructions.
> 
> In that case, the virtual reality system could conceivably parse the
> statement and use it to create a virtual world. Since ricgri is (I
> hope) a girzu [gri] of tricu [ric], the world-building software would
> know to describe a whole bunch of tricu.. And if there were a
> knowledge base in which the VR machine could discover that tricu grow
> on top of a loldi made of dertu, then those would have to be there
> too...
> 
> Kind of an interesting idea..
> 
> Cheers,
> 
> - Michal
> ----------------------------------------------------------------------
> let me host you! http://www.sabren.com me: http://www.sabren.net
> ----------------------------------------------------------------------

we perhaps had different things in mind. what you're describing is 
not so much the encoding of the language as the processing of it. 
a computer that understands what a forest is can process a simple 
code of "forest", or "group of tall cylindrical static organisms" or 
whatever the code turns out to be, and draw a plot of it of any size. 
it would not be useful to store everything about a forest in the 
binary text, when you can just store a description of it and expand 
that out during processing to whatever is needed.
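
a rough python sketch of that split between a short stored code and expansion at processing time, using michal's {ricgri} example. the rafsi table and the REQUIRES knowledge base are hypothetical and hold only the facts mentioned in this thread; real lujvo decomposition is more involved than cutting a word into 3-letter pieces.

```python
# hypothetical rafsi table and knowledge base, just enough for {ricgri}
RAFSI = {"ric": "tricu", "gri": "girzu"}
REQUIRES = {                # "if X is present, Y must be too" (toy facts)
    "tricu": ["loldi"],     # trees grow on top of a ground/floor
    "loldi": ["dertu"],     # the ground is made of dirt
}


def split_lujvo(word):
    """Split a lujvo made only of 3-letter rafsi into its gismu."""
    return [RAFSI[word[i:i + 3]] for i in range(0, len(word), 3)]


def expand(concept, seen=None):
    """Collect a concept plus everything the knowledge base says it needs."""
    seen = [] if seen is None else seen
    if concept not in seen:
        seen.append(concept)
        for implied in REQUIRES.get(concept, []):
            expand(implied, seen)
    return seen


modifier, head = split_lujvo("ricgri")   # tricu modifies girzu
scene = [head] + expand(modifier)
print(scene)   # ['girzu', 'tricu', 'loldi', 'dertu']
```

the stored text stays as short as the single word; the ground and the dirt only appear when the processor expands it.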

your analogy of compiled lojban makes a lot of sense to me. what i'm 
imagining is a coding mechanism designed without any regard for 
spoken communication, with a one-to-one correspondence to the 
intended meaning. thus, much like you can't tell whether a for loop 
or a while loop generated a piece of code, you couldn't tell whether 
a compiled lojban text came from "le gerku darxi le mlatu" or "le 
mlatu se darxi le gerku". now, since i thought of this before i even 
knew of lojban, i'm still thinking of building this from the ground 
up rather than making it a compiled form of an existing language. 
but lojban does seem uniquely suited to this.
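
as a minimal sketch of that one-to-one property in python: the toy "compiler" below handles only sentences of the shape "le A [se] SELBRI le B" (an assumption made for this example; it is nowhere near a real lojban parser) and reduces both word orders to the same tuple. compile_bridi is a hypothetical name.

```python
def compile_bridi(text):
    """Reduce 'le A [se] SELBRI le B' to a canonical (selbri, x1, x2) tuple."""
    words = text.split()
    # expected shape: le <sumti> [se] <selbri> le <sumti>
    if len(words) not in (5, 6) or words[0] != "le" or words[-2] != "le":
        raise ValueError("this sketch only handles 'le A [se] SELBRI le B'")
    first, last = words[1], words[-1]
    middle = words[2:-2]
    if middle[0] == "se":
        # se swaps the x1 and x2 places, so undo the swap
        return (middle[1], last, first)
    return (middle[0], first, last)


a = compile_bridi("le gerku darxi le mlatu")
b = compile_bridi("le mlatu se darxi le gerku")
print(a, a == b)   # ('darxi', 'gerku', 'mlatu') True
```

once both sentences have been reduced to the same tuple, there is no way to recover which surface form produced it, the same way a compiled loop does not record whether it was written as for or while.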