From: thinkit8@lycos.com
Date: Sun, 15 Jul 2001 05:25:39 -0000
To: lojban@yahoogroups.com
Subject: Re: a machine-code natural language?
Message-ID: <9ir9gj+18ae@eGroups.com>

ah...you bring up some interesting points, but perhaps i need to
clarify...

--- In lojban@y..., Michal Wallace wrote:
> On Sat, 14 Jul 2001 thinkit8@l... wrote:
>
> > i originally found out about lojban when lojbab replied to a post of
> > mine to sci.lang about a binary-coded natural language. basically, i
> > was thinking of encoding a natural language much the way computers
> > encode program code. that is, there are certain bit fields for
> > determining the operation, and for supplying the data to be operated
> > on. do you think it would be feasible to encode what english (and
> > lojban) attempts to express in a manner similar to machine code?
> > lojban is a very good bridge to attempt this--with its parsable text
> > and unambiguous nature. but it is meant to be a spoken language, and
> > as such a vastly different approach would have to be taken. has
> > there been any attempts at this?
>
>
> This seems like a pretty interesting question, but it's kind of wide
> open.. Just about all the lojban I've seen so far HAS been encoded on
> a computer - as ASCII text.
>
> Now, since lojban only needs 26 symbols (abcdefgijklmnoprstuvxyz.',)
> and the space character, it only needs 5 bits per letter, so one step
> in the direction you're talking about might be to pack one and a half
> symbols into each byte on disk..

well, true. but this isn't really what i'm getting at. the 26 symbols
are mainly due to the range of human speech...and i'm thinking of
going beyond that and making a purely written language.

> Of course if I write a program in ASCII text and run it through a
> compiler, I don't just get a shorter version of the code.. I usually
> get a much larger number of instructions, spelled out in excruciating
> detail. Unless I wrote my program in assembly language, where there's
> a one to one mapping between instructions I type and instructions the
> computer understands, the machine code and the original program will
> take completely different forms.

well...sometimes you get executables smaller than your source file.
obviously a numeric literal will be much smaller in the executable
than the source. but often, yes, the machine code ends up taking up
much more space than the source code...especially with RISC.

> If I'm getting your meaning, you're talking about a lojban compiler,
> not just a compact encoding of the words themselves... It's very
> possible to parse lojban and do just about any transformation you like
> on the corresponding syntax tree.. You could conceivably even have a
> lojban virtual machine that responded in certain ways to different
> bridi..

yes, that's pretty much true. the analogy is of source code being
lojban text, and the resultant executable being what i'm getting at.

> But the question is.. What would the machine do upon seeing this code?
>
> When I type [print "hello, world"], it's shorthand for an extremely
> complex series of instructions dealing with the internal workings of
> my computer (like the fact that I want the BIOS to print some
> text). When I say "coi rodo", I'm also expressing a huge amount of
> information (like the fact that there's more than one person
> listening, that I'm addressing all of them, the likelihood that I've
> just arrived or begun speaking)..
>
> ni'o
>
>
> If you think about language as modelling the world, rather than
> listing instructions, then one purpose of a lojban compiler might
> simply be to expand as much data as possible from a given bridi..
>
> For example, suppose a text adventure game began:
>
> {do nenri lo ricgri} => in(you, a forest)
>
> That's plenty of info for a human player because the human has
> probably seen a forest or at least can imagine one, and can therefore
> imagine trees, ground, sky, the species of trees, sounds in the air,
> time of day, and so on.
>
> But suppose you wanted to convert the game into a 3D virtual world.
> One approach is to hire a bunch of 3D animators to build it for you.
> In the future, a language compiler might be able to build it for you
> itself, just by expanding the short description into a huge detailed
> description, much the way conventional compilers turn high level
> instructions into detailed instructions.
>
> In that case, the virtual reality system could conceivably parse the
> statement and use it to create a virtual world. Since ricgri is (I
> hope) a girzu [gri] of tricu [ric], the world-building software would
> know to describe a whole bunch of tricu.. And if there were a
> knowledge base in which the VR machine could discover that tricu grow
> on top of a loldi made of dertu, then those would have to be there
> too...
>
> Kind of an interesting idea..
>
> Cheers,
>
> - Michal
> ----------------------------------------------------------------------
> let me host you! http://www.sabren.com me: http://www.sabren.net
> ----------------------------------------------------------------------

we perhaps had different things in mind. what you're supposing is not
so much the encoding of the language as the processing of it. a
computer that understands what a forest is can process a simple code
of "forest", or "group of tall cylindrical static organisms" or
whatever the code turns out to be, and draw any sized plot of it. it
would not be useful to store everything about a forest in the binary
text, when you can just use a description of it, and in processing
expand it out to whatever is needed.

your analogy of compiled lojban makes a lot of sense to me. what i'm
imagining is a coding mechanism designed without any regard for spoken
communication, and having a one-to-one correspondence with the
intended meaning. thus, much like you can't tell if a for loop or a
while loop generated a piece of code--you couldn't tell if a compiled
lojban text was "le gerku darxi le mlatu" or "le mlatu se darxi le
gerku".

now, since i thought of this before i even knew of lojban, i'm still
thinking of building this from the ground up without trying to make it
a compiled form of an existing language. but lojban does seem uniquely
suited to this.
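
A minimal sketch of that last point, under loudly stated assumptions: the
code tables, the 8-bit field width, and the function names parse and encode
are all hypothetical, and the parser handles only the two-place sentence
pattern quoted above. It also accepts the separator "cu", which standard
Lojban grammar would normally require before the selbri in sentences like
these. The idea is to reduce each sentence to a canonical (selbri, x1, x2)
triple and then pack the triple into machine-code-style bit fields, so both
surface forms yield the same word.

```python
# Hypothetical code tables: the numbers are arbitrary, picked only for this sketch.
SELBRI_CODES = {"darxi": 0x01}                # darxi: x1 hits x2
SUMTI_CODES = {"gerku": 0x01, "mlatu": 0x02}  # gerku: dog, mlatu: cat

FIELD_BITS = 8  # assumed width of every bit field


def parse(text):
    """Parse 'le A [cu] [se] SELBRI le B' into a canonical (selbri, x1, x2)."""
    words = text.split()
    if len(words) < 5 or words[0] != "le":
        raise ValueError(f"unsupported sentence: {text!r}")
    first, rest = words[1], words[2:]
    if rest[0] == "cu":        # optional separator before the selbri
        rest = rest[1:]
    swapped = rest[0] == "se"  # "se" exchanges the x1 and x2 places
    if swapped:
        rest = rest[1:]
    if len(rest) != 3 or rest[1] != "le":
        raise ValueError(f"unsupported sentence: {text!r}")
    selbri, second = rest[0], rest[2]
    # Undo the "se" swap so the triple no longer records the surface order.
    x1, x2 = (second, first) if swapped else (first, second)
    return selbri, x1, x2


def encode(text):
    """Pack the canonical triple as [selbri | x1 | x2] bit fields."""
    selbri, x1, x2 = parse(text)
    return (
        (SELBRI_CODES[selbri] << (2 * FIELD_BITS))
        | (SUMTI_CODES[x1] << FIELD_BITS)
        | SUMTI_CODES[x2]
    )


for sentence in ("le gerku darxi le mlatu", "le mlatu se darxi le gerku"):
    print(f"{sentence!r} -> {encode(sentence):06x}")
```

Both sentences print the same six hex digits, so nothing in the encoded
form says which wording produced it. A real design would need far more than
this (more places, tenses, nesting), which the sketch does not attempt.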