Q*Bert Reynolds@sh.itjust.works to Technology@lemmy.ml • 1-bit LLM performs similarly to full-precision Transformer LLMs with the same model size and training tokens, but is much more efficient in terms of latency, memory, throughput, and energy consumption.
7 months ago · Says 1-bit, then goes on to describe inputs as -1, 0, or 1. That's 2-bit. Am I missing something here?
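For anyone counting: a ternary value (-1, 0, or +1) carries log₂(3) ≈ 1.58 bits of information, which is why this line of work is usually branded "b1.58" rather than strictly 1-bit; "1-bit LLM" is the headline shorthand. A quick sanity check of the arithmetic:

```python
import math

# Information content of one weight that can take 3 values (-1, 0, +1).
bits_per_ternary_weight = math.log2(3)

# A true 1-bit weight has 2 states; a 2-bit weight has 4 states.
print(f"{bits_per_ternary_weight:.2f} bits per weight")  # 1.58 bits per weight
```

So the honest answer is "neither 1-bit nor 2-bit": two bits would be enough to store a ternary weight but would waste one of the four states, while the information-theoretic cost is 1.58 bits.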