• 0 Posts
  • 116 Comments
Joined 2 years ago
Cake day: July 21st, 2023






  • Q4 will give you roughly 98% of the quality of Q8 at about twice the speed, plus room for much longer context lengths.

    If you don’t need the full context length, you can try loading the model at shorter context length, meaning you can load more layers on the GPU, meaning it will be faster.

    And you can usually configure your inference engine to keep the model loaded at all times, so you’re not losing so much time when you first start the model up.

    Ollama attempts to dynamically load the right context length for your request, but in my experience that just results in a really inconsistent, and often long, time to first token.

    The nice thing about vLLM is that your model is always loaded, so you don’t have to worry about that. But then again, it needs much more VRAM.
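For Ollama specifically, a rough sketch of pinning the context length and keeping the model resident could look like the following. It assumes Ollama is listening on its default local port and uses the `num_ctx` option and `keep_alive` field of its `/api/generate` endpoint. The model tag, the 8192-token context and the 24h keep-alive are placeholder values I picked, not recommendations.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="deepseek-r1:32b-qwen-distill-q4_K_M",
                  num_ctx=8192, keep_alive="24h"):
    """Build a generate request with a fixed context length and keep-alive."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON reply instead of a stream
        "keep_alive": keep_alive,         # how long the model stays loaded afterwards
        "options": {"num_ctx": num_ctx},  # fixed context window for this request
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Needs a running Ollama server with the model already pulled.
    print(generate("Say hi in five words."))
```

Sending the same `num_ctx` every time should avoid reloads caused by a changing context size, and a long `keep_alive` keeps the weights in memory between requests.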


  • In my experience anything similar to qwen-2.5:32B comes closest to gpt-4o. I think it should run on your setup. The 14b model is alright too, but definitely inferior. Mistral Small 3 also seems really good. Anything smaller is usually really dumb and I doubt it would work for you.

    You could probably run some larger 70b models at a snail’s pace too.

    Try the DeepSeek R1 - qwen 32b distill, something like deepseek-r1:32b-qwen-distill-q4_K_M (name on ollama) or some fine-tune of it. It’ll be by far the smartest model you can run.

    There are various fine-tunes that remove some of the censorship (ablated/abliterated) or are optimized for RP, which might do better for your use case. But I personally haven’t used them, so I can’t promise anything.
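To get a feel for what fits on a given setup, here is a back-of-the-envelope memory estimate. It is only a sketch: the bits-per-weight figures (about 4.5 for Q4, about 8.5 for Q8) are approximations, K-quants like q4_K_M come out a bit larger, and the layer and head counts below are assumed values for a 32b-class model, not numbers taken from any model card. Runtime overhead is ignored.

```python
GIB = 2**30


def weight_gib(n_params, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Approximate KV cache size in GiB (keys + values, 16-bit by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value / GIB


if __name__ == "__main__":
    # Assumed effective bits per weight; real GGUF files vary by quant type.
    quants = {"Q4": 4.5, "Q8": 8.5}
    for name, n_params in [("14b", 14e9), ("32b", 32e9), ("70b", 70e9)]:
        sizes = ", ".join(
            f"{q}: {weight_gib(n_params, bpw):.1f} GiB" for q, bpw in quants.items()
        )
        print(f"{name} weights -> {sizes}")

    # Assumed architecture for a 32b-class model: 64 layers, 8 KV heads, head dim 128.
    for ctx in (8192, 32768):
        print(f"KV cache at {ctx} tokens: {kv_cache_gib(64, 8, 128, ctx):.1f} GiB")
```

Whatever doesn't fit in VRAM ends up in system RAM, which is where the snail's pace for the 70b models comes from.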










  • I don’t understand how everyone can be so blind to the surveillance that already exists.

    Literally all your communications, purchases and browsing history, 90% of people’s photos and contacts, everything you ever say near your phone or smart devices, your health data from devices like Fitbit, your home and your exact location down to a meter are already being spied on live. There are cm-resolution spy satellites, 4D maps of the entire globe are being created via services like Pokemon Go, phones create high-resolution 3D maps of your face and store them in the cloud, and mesh-networked devices like Alexa now surveil you even without internet access. Not to mention full remote access to all your devices.

    Sometimes with a thin veneer of privacy on top of it, like Apple pretends to have.

    Basically the only part of you that the surveillance state doesn’t constantly surveil already is your butthole.

    Even avoiding just 10% of this surveillance in your daily life is almost impossible.