• jatone@lemmy.dbzer0.com
      English
      arrow-up
      1
      arrow-down
      13
      ·
      edit-2
      4 days ago

      quants are pretty basic. switching from floats to ints (faster instruction sets) are the well known issues. both those are related to information theory, but there are other things I legally can’t mention. shrug. suffice to say the model sizes are going to be decreasing dramatically.

      edit: the first two points require reworking the base infrastructure to support, which is why they haven’t hit widespread adoption. but the research showing that 3 bits is as good as 64 is intuitive once you tie the original inspiration for some of the AI designs. that reduction alone means you can get a 21x reduction in model size, which is pretty solid.
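For reference, the 21x figure in the comment above is just the ratio of bit widths, 64 / 3. A minimal sketch of that arithmetic follows; the 32- and 16-bit rows are an assumption added for comparison (commonly used storage widths for model weights), not something the comment claims, and `size_reduction` is a hypothetical helper name.

```python
# Ratio of storage per weight when dropping to 3 bits.
# The 64-bit baseline is the figure from the comment; 32 and 16 are
# assumed comparison points, not taken from the thread.
TARGET_BITS = 3


def size_reduction(source_bits: int, target_bits: int = TARGET_BITS) -> float:
    """Return how many times smaller the weights get, ignoring all overhead."""
    return source_bits / target_bits


for bits in (64, 32, 16):
    print(f"{bits} -> {TARGET_BITS} bits: {size_reduction(bits):.1f}x smaller")
```

The ratio only covers raw weight storage; it says nothing about whether output quality survives the change.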

      • self@awful.systems
        English
        arrow-up
        23
        ·
        4 days ago

        both those are related to information theory, but there are other things I legally can’t mention. shrug.

        hahahaha fuck off with this. no, the horseshit you’re fetishizing doesn’t fix LLMs. here’s what quantization gets you:

        • the LLM runs on shittier hardware
        • the LLM works worse too
        • that last one’s kinda bad when the technology already works like shit
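The "works worse" point in the list above can be illustrated with a rough sketch of rounding error under uniform quantization. Everything here is an assumption for illustration: the weights are random made-up numbers rather than a real model, `quantize` and `mean_abs_error` are hypothetical helpers, and real quantization schemes are more elaborate than a single min/max grid.

```python
import random


def quantize(values, bits):
    """Uniformly quantize values to 2**bits levels over their own min/max range."""
    levels = 2 ** bits
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    # Snap each value to the nearest representable level.
    return [lo + round((v - lo) / step) * step for v in values]


def mean_abs_error(original, approx):
    """Average absolute difference between original and quantized values."""
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


random.seed(0)
# Made-up stand-in for model weights (assumption: roughly Gaussian values).
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for bits in (8, 4, 3):
    err = mean_abs_error(weights, quantize(weights, bits))
    print(f"{bits} bits: mean absolute error {err:.4f}")
```

Fewer bits means fewer representable levels, so the per-weight rounding error grows. The sketch measures only that rounding error, not how the output of an actual model changes.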

        anyway speaking of basic information theory:

        but the research showing that 3 bits is as good as 64 is intuitive once you tie the original inspiration for some of the AI designs.

        lol

        • eestileib@sh.itjust.works
          English
          arrow-up
          12
          ·
          edit-2
          4 days ago

          Honestly, the research showing that a schlong that’s 3mm wide is just as satisfying as one that’s 64 is intuitive once you tie the original inspiration for some of the sex positions.

        • khalid_salad@awful.systems
          English
          arrow-up
          6
          ·
          3 days ago

          It’s actually super easy to increase the accuracy of LLMs.

          import pytorch # or ollama or however you fucking dorks use this nonsense
          from decimal import Decimal
          

          I left out all the other details because it’s pretty intuitive why it works if you understand why floats have precision issues.

        • killingspark@feddit.org
          English
          arrow-up
          10
          ·
          4 days ago

          I have seen these 3 bit ai papers on hacker news a few times. And the takeaway apparently is: the current models are being pretty shitty at what we want them to do, and we can reach a similar (but slightly worse) level of shittiness with 3 bits.

          But that doesn’t say anything about how both technologies could progress in the future. I guess you can compensate for having only three bits to pass between nodes by just having more nodes. But that doesn’t really seem helpful, neither for storage nor compute.

          Anyways yeah it always strikes me as a kind of trend that maybe has an application in a very specific niche but is likely bullshit if applied to the general case

          • V0ldek@awful.systems
            English
            arrow-up
            12
            ·
            4 days ago

            If anything that sounds like an indictment? Like, the current models are so incredibly fucking bad that we could achieve the same with three bits and a ham sandwich

          • BlueMonday1984@awful.systems
            English
            arrow-up
            2
            ·
            3 days ago

            Far as I can tell, the only real benefit here is significant energy savings, which would take LLMs from “useless waste of a shitload of power” to “useless waste of power”.