[lojban] Re: Lojban tokenizer for machine learning, first version
- To: lojban@googlegroups.com
- Subject: [lojban] Re: Lojban tokenizer for machine learning, first version
- From: scope845hlang343jbo@icebubble.org
- Date: Tue, 14 Jun 2022 14:58:57 +0000
- Delivery-date: Tue, 14 Jun 2022 12:33:31 -0700
- Envelope-to: lojban-list-archive@lojban.org
- In-reply-to: <d1a72031-b3ed-4164-bfba-bfa5fa65893bn@googlegroups.com> (Oleg Parashchenko's message of "Sun, 12 Jun 2022 01:32:55 -0700 (PDT)")
- References: <d1a72031-b3ed-4164-bfba-bfa5fa65893bn@googlegroups.com>
- Reply-to: lojban@googlegroups.com
- Sender: lojban@googlegroups.com
Oleg Parashchenko <olpa@uucode.com> writes:
> I've just released the first version of a lojban tokenizer. It is intended
> for use in machine learning applications and therefore is a bit different
> from a linguistic tokenizer. In particular, it does sub-word tokenization.
>
> Additionally, there is a lexer, which can be used to develop alternative
> tokenizers.
.uanai How is that different from any of the other Lojban parsers that
have been written? I am interested in your lexer, however. Which
version of the grammar did you use? The PEG? I'd be very curious to
see how your lexer distinguishes between lujvo and fu'ivla.