Will a flagship (>60T training bytes) open-weights LLM from Meta which doesn't use a tokenizer be released in 2025?
43% chance
Resolves YES if, in 2025, Meta releases the weights of an LLM that was trained on at least 60T bytes of data (roughly equivalent to the 15T tokens used to train the Llama 3.1 models) and that does not use standard fixed-vocabulary tokenization.
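For intuition, the 60T-byte threshold lines up with Llama 3.1's reported 15T-token training budget if one assumes roughly 4 bytes of text per token; a quick sanity check of that conversion (the 4 bytes/token figure is an assumption for illustration, not part of the market terms):

```python
# Back-of-envelope conversion between training bytes and tokens.
# Assumption: ~4 bytes of UTF-8 text per BPE token on average (rule of thumb,
# not a figure stated in this market).
training_bytes = 60e12        # 60T bytes, the threshold used in this market
bytes_per_token = 4           # assumed average bytes per token
approx_tokens = training_bytes / bytes_per_token
print(f"~{approx_tokens / 1e12:.0f}T tokens")  # -> ~15T tokens, matching Llama 3.1's budget
```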
A qualifying model must be released under a license roughly as permissive as Llama 3.1.
This market was spurred by recent research from Meta demonstrating a proof-of-concept tokenizer-free LLM. A qualifying model from Meta does not need to use the patching technique from that paper, as long as it does not use tokenization.
https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/
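For reference, a minimal sketch of the distinction the market turns on: a conventional LLM consumes IDs drawn from a fixed tokenizer vocabulary, while a byte-level model consumes raw UTF-8 bytes. The paper's approach additionally groups bytes into dynamically sized patches, which this sketch does not attempt to reproduce, and the tokenizer call shown is hypothetical.

```python
# Illustrative only; not Meta's BLT implementation.
text = "Byte Latent Transformer"

# Standard fixed-vocabulary tokenization (hypothetical tokenizer and IDs):
# token_ids = tokenizer.encode(text)   # e.g. a few IDs over a ~128k vocabulary

# Tokenizer-free input: every UTF-8 byte is its own symbol, vocabulary size 256.
byte_ids = list(text.encode("utf-8"))
print(byte_ids[:8])   # [66, 121, 116, 101, 32, 76, 97, 116]
```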
This question is managed and resolved by Manifold.
Related questions
Will OpenAI release a tokenizer with vocab size > 150k by end of 2024?
42% chance
When will the first fully open-source advanced LLM (data, code, weights) be released?
Will Meta release an open source language model that outperforms GPT-4 by the end of 2024?
67% chance
Will OpenAI release an LLM moderation tool in 2024?
67% chance
Will Meta AI's MEGABYTE architecture be used in the next-gen LLMs?
42% chance
Will Meta ever deploy its best LLM without releasing its model weights up through AGI?
79% chance
Will any open-source Transformers LLM that functions as a dense mixture of experts be released by end of 2024?
46% chance
Will the next major LLM by OpenAI use a new tokenizer?
77% chance
When will OpenAI release a more capable LLM?
Will OpenAI release weights to a model designed to be easily interpretable (2024)?
8% chance