autotldr@lemmings.world [bot] 1 year ago

This is the best summary I could come up with:

When an algorithm or model can accurately guess the next piece of data in a sequence, it shows it’s good at spotting these patterns.
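The link between prediction and compression can be made concrete. A minimal sketch, assuming a toy two-symbol alphabet and two hypothetical predictors (neither is from the study): if a model assigns probability p to the symbol that actually comes next, an ideal lossless coder spends about -log2(p) bits on it, so better guesses mean fewer bits.

```python
import math

ALPHABET = "ab"  # assumption: toy alphabet, for illustration only


def code_length_bits(sequence, predict):
    """Sum of -log2 p(symbol | history): the ideal lossless code length in bits."""
    total = 0.0
    for i, symbol in enumerate(sequence):
        probs = predict(sequence[:i])  # the predictor only sees the past
        total += -math.log2(probs[symbol])
    return total


def uniform_predictor(history):
    """Knows nothing: every symbol is equally likely."""
    return {s: 1 / len(ALPHABET) for s in ALPHABET}


def counting_predictor(history):
    """Learns as it goes: add-one smoothed symbol frequencies of the history."""
    counts = {s: 1 + history.count(s) for s in ALPHABET}
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


data = "aaaaaaaabaaaaaaa"
uniform_bits = code_length_bits(data, uniform_predictor)
adaptive_bits = code_length_bits(data, counting_predictor)
print(f"uniform: {uniform_bits:.1f} bits, adaptive: {adaptive_bits:.1f} bits")
```

The predictor that picks up the pattern needs fewer bits than the uniform one. An entropy coder such as arithmetic coding can get close to these totals in practice.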

The study’s results suggest that even though Chinchilla 70B was mainly trained to deal with text, it’s surprisingly effective at compressing other types of data as well, often better than algorithms specifically designed for those tasks.

This opens the door for thinking about machine learning models as not just tools for text prediction and writing but also as effective ways to shrink the size of various types of data.

Over the past two decades, some computer scientists have proposed that the ability to compress data effectively is akin to a form of general intelligence.

The idea is rooted in the notion that understanding the world often involves identifying patterns and making sense of complexity, which, as mentioned above, is similar to what good data compression does.

The relationship between compression and intelligence is a matter of ongoing debate and research, so we’ll likely see more papers on the topic emerge soon.

The original article contains 709 words, the summary contains 175 words. Saved 75%. I’m a bot and I’m open source!

iopq@lemmy.world 1 year ago
How do those figures compare to state-of-the-art compression?

- NegativeInf@lemmy.world 1 year ago
  This chart shows both raw and adjusted compression rates. The adjusted rate includes the size of the model. For a lot of this, it really only works well on server-scale data, because the model doing the compressing is so large. But it also lends some credence to other papers that show you can use compression to build generative models, or combine it with k-means to get decent results.
  
  - iopq@lemmy.world 1 year ago
    Lossless JPEG XL and WebP are much better at compression:
    
    siipo.la/…/lossless-comparison-median-file-size-1…
    
    Source: siipo.la/…/whats-the-best-lossless-image-format-c…
    
    That means the 58.5% for PNG could come down to 30% with a state-of-the-art lossless format, which is better than the 48% from Chinchilla 70B.
    
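The two points in this thread (the adjusted rate counts the model, and a modern lossless format already beats the raw figure) can be checked with a little arithmetic. This is a rough sketch, not the paper's method: the 58.5%, 30% and 48% rates are the ones quoted above, while the model size (70B parameters at 2 bytes each) and the 1 GB dataset are assumptions made only for illustration.

```python
def adjusted_rate(raw_bytes, compressed_bytes, model_bytes):
    """Compressed output plus the model itself, as a fraction of the raw data."""
    return (compressed_bytes + model_bytes) / raw_bytes


def break_even_bytes(model_bytes, model_rate, baseline_rate):
    """Raw data size at which shipping the model starts to pay off.

    Solves model_rate * D + model_bytes == baseline_rate * D for D.
    Returns None when the model's raw rate is not better than the baseline.
    """
    if model_rate >= baseline_rate:
        return None
    return model_bytes / (baseline_rate - model_rate)


MODEL_BYTES = 70e9 * 2  # ASSUMPTION: 70B parameters stored at 2 bytes each
DATASET_BYTES = 1e9     # ASSUMPTION: a 1 GB dataset, chosen for illustration

# Rates quoted in the thread (compressed size / raw size).
PNG_RATE, MODERN_LOSSLESS_RATE, CHINCHILLA_RATE = 0.585, 0.30, 0.48

print(adjusted_rate(DATASET_BYTES, CHINCHILLA_RATE * DATASET_BYTES, MODEL_BYTES))
print(break_even_bytes(MODEL_BYTES, CHINCHILLA_RATE, PNG_RATE))
print(break_even_bytes(MODEL_BYTES, CHINCHILLA_RATE, MODERN_LOSSLESS_RATE))
```

Under these assumptions the adjusted rate on 1 GB is far above 100%, the model only beats PNG once there is more than a terabyte of data to compress, and it never catches a format that already reaches 30%.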