
Implementing a spellchecker on 64 kB of RAM back in the 1970s led to a compression algorithm that's technically unbeaten and part of it is still in use today

⁨539⁩ ⁨likes⁩

Submitted ⁨⁨1⁩ ⁨month⁩ ago⁩ by ⁨cm0002@lemmy.world⁩ to ⁨technology@lemmy.world⁩

https://www.pcgamer.com/software/implementing-a-spellchecker-on-64-kb-of-ram-back-in-the-1970s-led-to-a-compression-algorithm-thats-technically-unbeaten-and-part-of-it-is-still-in-use-today/


Comments

  • NoSpotOfGround@lemmy.world ⁨1⁩ ⁨month⁩ ago

    The real meat of the story is in the referenced blog post: blog.codingconfessions.com/…/how-unix-spell-ran-i…:

    TL;DR

    If you’re short on time, here’s the key engineering story:

    • McIlroy’s first innovation was a clever linguistics-based stemming algorithm that reduced the dictionary to just 25,000 words while improving accuracy.
    • For fast lookups, he initially used a Bloom filter—perhaps one of its first production uses. Interestingly, Dennis Ritchie provided the implementation. They tuned it to have such a low false positive rate that they could skip actual dictionary lookups.
    • When the dictionary grew to 30,000 words, the Bloom filter approach became impractical, leading to innovative hash compression techniques.
    • They computed that 27-bit hash codes would keep collision probability acceptably low, but needed compression.
    • McIlroy’s solution was to store differences between sorted hash codes, after discovering these differences followed a geometric distribution.
    • Using Golomb's code, a compression scheme designed for geometric distributions, he achieved 13.60 bits per word—remarkably close to the theoretical minimum of 13.57 bits.
    • Finally, he partitioned the compressed data to speed up lookups, trading a small memory increase (final size ~14 bits per word) for significantly faster performance.
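
For anyone who wants to poke at the last few bullets, here is a rough Python sketch of the idea: sort the hash codes, store the gaps between neighbours, and Golomb-code the gaps. It is not McIlroy's code. The function names (`golomb_encode`, `golomb_decode`, `min_bits_per_word`, `demo`) are made up for illustration, random 27-bit integers stand in for real word hashes, and the parameter `m` is picked with the common rule of thumb `ln(2) * mean gap`, which is an assumption rather than the value used in spell. The bound is computed as log2 of the number of ways to choose 30,000 codes out of 2^27, divided by 30,000.

```python
import math
import random
from itertools import accumulate

HASH_BITS = 27        # hash width quoted in the summary
NUM_WORDS = 30_000    # dictionary size quoted in the summary


def golomb_encode(values, m):
    """Golomb-encode non-negative ints as a string of '0'/'1' (m >= 2)."""
    b = (m - 1).bit_length()          # ceil(log2(m))
    cutoff = (1 << b) - m             # remainders below this take b - 1 bits
    out = []
    for v in values:
        q, r = divmod(v, m)
        out.append("1" * q + "0")     # quotient in unary, closed by a '0'
        if r < cutoff:                # truncated binary for the remainder
            out.append(format(r, "b").zfill(b - 1))
        else:
            out.append(format(r + cutoff, "b").zfill(b))
    return "".join(out)


def golomb_decode(bits, m, count):
    """Inverse of golomb_encode: read `count` values back from the bit string."""
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    pos = 0
    values = []
    for _ in range(count):
        q = 0
        while bits[pos] == "1":       # unary quotient
            q += 1
            pos += 1
        pos += 1                      # skip the closing '0'
        r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
        pos += b - 1
        if r >= cutoff:               # long form: one more bit follows
            r = (r << 1 | int(bits[pos])) - cutoff
            pos += 1
        values.append(q * m + r)
    return values


def min_bits_per_word(universe, n):
    """log2(C(universe, n)) / n, computed with lgamma to avoid huge integers."""
    ln_comb = (math.lgamma(universe + 1) - math.lgamma(n + 1)
               - math.lgamma(universe - n + 1))
    return ln_comb / math.log(2) / n


def demo(seed=1):
    rng = random.Random(seed)
    # Stand-in for hashed dictionary words: distinct random 27-bit codes.
    codes = sorted(rng.sample(range(1 << HASH_BITS), NUM_WORDS))
    gaps = [codes[0]] + [hi - lo for lo, hi in zip(codes, codes[1:])]
    mean_gap = (1 << HASH_BITS) / NUM_WORDS
    m = max(2, round(math.log(2) * mean_gap))   # rule-of-thumb parameter
    bits = golomb_encode(gaps, m)
    # Round trip: decoding the gaps and summing them restores the codes.
    assert list(accumulate(golomb_decode(bits, m, len(gaps)))) == codes
    return len(bits) / NUM_WORDS, min_bits_per_word(1 << HASH_BITS, NUM_WORDS)


if __name__ == "__main__":
    bits_per_word, bound = demo()
    print(f"Golomb-coded gaps: {bits_per_word:.2f} bits/word")
    print(f"Lower bound:       {bound:.2f} bits/word")
```

Running it prints the measured bits per word next to the bound, so you can compare them with the numbers in the summary. A string of '0'/'1' characters is used instead of packed bytes purely to keep the sketch readable.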
    • ch00f@lemmy.world ⁨1⁩ ⁨month⁩ ago

      For anyone struggling, lemmy web interface added the colon into the URL for the blog post link. Here’s a clickable version without the colon:

      blog.codingconfessions.com/…/how-unix-spell-ran-i…

      • 0x0@programming.dev ⁨1⁩ ⁨month⁩ ago

        here’s another

      • NoSpotOfGround@lemmy.world ⁨1⁩ ⁨month⁩ ago

        Thanks, and sorry about that! I removed the colon from near my URL now, just in case.

    • db2@lemmy.world ⁨1⁩ ⁨month⁩ ago

      Thank you

    • potate@lemmy.ca ⁨1⁩ ⁨month⁩ ago

      The blog post is an incredible read.

  • ColeSloth@discuss.tchncs.de ⁨1⁩ ⁨month⁩ ago

    Old school coding and game programming was magic. The clever tricks that NES game programmers came up with to work around hardware limitations were phenomenal. It went way beyond the bushes and clouds in Mario being the same thing but in a different color.

    • xavier666@lemm.ee ⁨1⁩ ⁨month⁩ ago

      I am still in awe of the fast inverse square root method used in QuakeIII. Good times.

      • VindictiveJudge@lemmy.world ⁨1⁩ ⁨month⁩ ago

        IIRC, someone got with the author of that bit of code to ask how they came up with it, but they had simply learned it from someone else. So they tracked them down and found that they had also learned it from someone else. They eventually landed on Greg Walsh as the original author, but for a bit the code had no known origin.

      • Blooper@lemmynsfw.com ⁨1⁩ ⁨month⁩ ago

        I read this article and I know it’s written in English, but I’ve accepted defeat in trying to understand it.

        I write code for a living and I’m doing my best to ignore the feelings of inadequacy I’m currently experiencing.

    • REDACTED@infosec.pub ⁨1⁩ ⁨month⁩ ago

      Check out demoscene. The mind-blowing things they create with only kilobytes…

      • ColeSloth@discuss.tchncs.de ⁨1⁩ ⁨month⁩ ago

        Yeah. The average NES game was only 200 KB.

      • Regrettable_incident@lemmy.world ⁨1⁩ ⁨month⁩ ago

        I had a ZX81, 1K RAM, still could play Pong.

      • xavier666@lemm.ee ⁨1⁩ ⁨month⁩ ago

        Thanks for this. Got a burst of nostalgia

      • noxypaws@pawb.social ⁨1⁩ ⁨month⁩ ago

        Here’s one of my recent-ish faves on GB, music is so damn catchy

        www.youtube.com/watch?v=GleZBHhOsmE

    • General_Effort@lemmy.world ⁨1⁩ ⁨month⁩ ago

      nes game programmers

      Were these guys even Real Programmers?

      Here’s a great talk about a guy who worked on a 1982 game for the Atari 2600, a game console first released in 1977. It’s a fascinating insight into the early evolution of computing. They didn’t work around limitations. They used a machine to do whatever it could. If anyone has ever wondered by what standard C is a high-level language, this is for you. Or if you want to know how we ever could have developed something to connect the abstract logic of some algorithm with some glowing pixels on a screen.

      Pitfall Classic Postmortem With David Crane Panel at GDC 2011 (Atari 2600)

      There’s an ancient myth that a god created the first pair of tongs. Tongs need to be forged in a smithy. Obviously, you need tongs for that.

    • jasoman@lemmy.world ⁨1⁩ ⁨month⁩ ago

      In Oblivion on Xbox they even rebooted the console on a loading screen to clear up RAM.

      • Romkslrqusz@lemm.ee ⁨1⁩ ⁨month⁩ ago

        *Morrowind

    • sirboozebum@lemmy.world ⁨1⁩ ⁨month⁩ ago

      Restrictions and boundaries spur innovation.

      • jdeath@lemm.ee ⁨1⁩ ⁨month⁩ ago

        any constraints, really. pretty cool!

    • Valmond@lemmy.world ⁨1⁩ ⁨month⁩ ago

      The old scrollers in non-consoles (consoles had hardware scrollers) used funky tech too to reduce overdraw. Fun times.

  • troyunrau@lemmy.ca ⁨1⁩ ⁨month⁩ ago

    Long article for one sentence of trivia and no info on the algo itself. The death of the internet is upon us.

    • adespoton@lemmy.ca ⁨1⁩ ⁨month⁩ ago

      Doesn’t even name the algorithm, and somehow spells LZMA wrong, despite just having written it out longhand.

      Well, it’s PC Gamer.

      • troyunrau@lemmy.ca ⁨1⁩ ⁨month⁩ ago

        Probably mostly AI written.

    • GrabtharsHammer@lemmy.world ⁨1⁩ ⁨month⁩ ago

      I’d like to imagine they took the short trivia fact and applied the inverse of the compression algorithm to bloat it into something that satisfied the editor.

    • rice@lemmy.org ⁨1⁩ ⁨month⁩ ago

      The blog post it links to has all the info, but it is more of a series of changes to the dictionary instead of 1 set thing

  • L3s@lemmy.world ⁨1⁩ ⁨month⁩ ago

    !lemmysilver

    • LemmySilverBot@lemmy.world [bot] ⁨1⁩ ⁨month⁩ ago

      Thank you for voting. You can vote again in 24 hours. leaderboard

  • lud@lemm.ee ⁨1⁩ ⁨month⁩ ago

    What’s the Weissman score?

    • fatal_internal_error@lemmy.world ⁨1⁩ ⁨month⁩ ago

      So it’s gonna be a dick measuring contest?

      • agelord@lemmy.world ⁨1⁩ ⁨month⁩ ago

        I’ll measure the most.

  • SirFasy@lemmy.world ⁨1⁩ ⁨month⁩ ago

    If it ain’t broke, don’t fix it.

  • 0x0@programming.dev ⁨1⁩ ⁨month⁩ ago

    Only 1 GiB of RAM? Moooom!
    Shut up Johnny, Voyager’s still out there with way less.

    • rmuk@feddit.uk ⁨1⁩ ⁨month⁩ ago

      Yeah, but I’ve not got two hundred Firefox tabs open on Voyager.
