Implementing a spellchecker on 64 kB of RAM back in the 1970s led to a compression algorithm that's technically unbeaten and part of it is still in use today

Submitted 10 months ago by cm0002@lemmy.world to technology@lemmy.world

https://www.pcgamer.com/software/implementing-a-spellchecker-on-64-kb-of-ram-back-in-the-1970s-led-to-a-compression-algorithm-thats-technically-unbeaten-and-part-of-it-is-still-in-use-today/


Comments


NoSpotOfGround@lemmy.world ⁨10⁩ ⁨months⁩ ago
The real meat of the story is in the referenced blog post: blog.codingconfessions.com/…/how-unix-spell-ran-i…:

TL;DR

If you’re short on time, here’s the key engineering story:

McIlroy’s first innovation was a clever linguistics-based stemming algorithm that reduced the dictionary to just 25,000 words while improving accuracy.

For fast lookups, he initially used a Bloom filter—perhaps one of its first production uses. Interestingly, Dennis Ritchie provided the implementation. They tuned it to have such a low false positive rate that they could skip actual dictionary lookups.

When the dictionary grew to 30,000 words, the Bloom filter approach became impractical, leading to innovative hash compression techniques.

They computed that 27-bit hash codes would keep collision probability acceptably low, but needed compression.

McIlroy’s solution was to store differences between sorted hash codes, after discovering these differences followed a geometric distribution.

Using Golomb's code, a compression scheme designed for geometric distributions, he achieved 13.60 bits per word, remarkably close to the theoretical minimum of 13.57 bits.

Finally, he partitioned the compressed data to speed up lookups, trading a small memory increase (final size ~14 bits per word) for significantly faster performance.
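
The hash-difference and Golomb steps in the summary above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not McIlroy's actual code: the 30,000 "hash codes" are random 27-bit integers standing in for real word hashes, the divisor `m` is picked with the common rule of thumb (mean gap times ln 2), and all function names are made up for this example.

```python
import math
import random

def golomb_encode(value, m):
    """Golomb code of a non-negative integer, as a bit string (m >= 2)."""
    q, r = divmod(value, m)
    bits = "1" * q + "0"           # quotient in unary, terminated by a 0
    b = (m - 1).bit_length()       # ceil(log2(m))
    cutoff = (1 << b) - m          # remainders below cutoff use b-1 bits
    if r < cutoff:
        bits += format(r, f"0{b - 1}b")
    else:
        bits += format(r + cutoff, f"0{b}b")
    return bits

def golomb_decode(bits, pos, m):
    """Decode one value starting at bits[pos]; return (value, new_pos)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1                       # skip the terminating 0
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    pos += b - 1
    if r >= cutoff:                # long form: one more bit follows
        r = (r << 1 | int(bits[pos])) - cutoff
        pos += 1
    return q * m + r, pos

def compress(sorted_hashes, m):
    """Encode the gaps between consecutive sorted hash codes."""
    out, prev = [], 0
    for h in sorted_hashes:
        out.append(golomb_encode(h - prev, m))
        prev = h
    return "".join(out)

def decompress(bits, count, m):
    """Rebuild the sorted hash codes by summing the decoded gaps."""
    out, pos, prev = [], 0, 0
    for _ in range(count):
        gap, pos = golomb_decode(bits, pos, m)
        prev += gap
        out.append(prev)
    return out

HASH_BITS = 27      # hash width quoted in the summary
N_WORDS = 30_000    # dictionary size quoted in the summary
universe = 1 << HASH_BITS

# Assumption: random distinct integers stand in for real word hashes.
rng = random.Random(0)
hashes = sorted(rng.sample(range(universe), N_WORDS))

# Assumption: rule-of-thumb divisor for geometrically distributed gaps.
m = round(universe / N_WORDS * math.log(2))

encoded = compress(hashes, m)
bits_per_word = len(encoded) / N_WORDS

# Information-theoretic minimum: log2(C(universe, N_WORDS)) / N_WORDS.
log2_choose = (math.lgamma(universe + 1) - math.lgamma(N_WORDS + 1)
               - math.lgamma(universe - N_WORDS + 1)) / math.log(2)
lower_bound = log2_choose / N_WORDS

print(f"Golomb: {bits_per_word:.2f} bits/word, minimum: {lower_bound:.2f}")
```

On random stand-in hashes the two printed figures should land close to the ones quoted in the summary, since random 27-bit values also produce roughly geometric gaps. Bit strings are used here purely for readability; a real implementation would pack the bits into bytes.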
- ch00f@lemmy.world ⁨10⁩ ⁨months⁩ ago
  For anyone struggling, the Lemmy web interface pulled the trailing colon into the URL of the blog post link. Here’s a clickable version without the colon:
  
  blog.codingconfessions.com/…/how-unix-spell-ran-i…
  
  - 0x0@programming.dev ⁨10⁩ ⁨months⁩ ago
    here’s another
    
  - NoSpotOfGround@lemmy.world ⁨10⁩ ⁨months⁩ ago
    Thanks, and sorry about that! I removed the colon from near my URL now, just in case.
    
- db2@lemmy.world ⁨10⁩ ⁨months⁩ ago
  Thank you
  
- potate@lemmy.ca ⁨10⁩ ⁨months⁩ ago
  The blog post is an incredible read.
  
ColeSloth@discuss.tchncs.de ⁨10⁩ ⁨months⁩ ago
Old-school coding and game programming was magic. The clever tricks that NES game programmers came up with to work around hardware limitations were phenomenal. It went way beyond the bushes and clouds in Mario being the same sprite in a different color.

- xavier666@lemm.ee ⁨10⁩ ⁨months⁩ ago
  I am still in awe of the fast inverse square root method used in Quake III. Good times.
  
  - VindictiveJudge@lemmy.world ⁨10⁩ ⁨months⁩ ago
    IIRC, someone got with the author of that bit of code to ask how they came up with it, but they had simply learned it from someone else. So they tracked them down and found that they had also learned it from someone else. They eventually landed on Greg Walsh as the original author, but for a bit the code had no known origin.
    
  - Blooper@lemmynsfw.com ⁨10⁩ ⁨months⁩ ago
    I read this article and I know it’s written in English, but I’ve accepted defeat in trying to understand it.
    
    I write code for a living and I’m doing my best to ignore the feelings of inadequacy I’m currently experiencing.
    
- REDACTED@infosec.pub ⁨10⁩ ⁨months⁩ ago
  Check out the demoscene. The mind-blowing things they create with only kilobytes…
  
  - ColeSloth@discuss.tchncs.de ⁨10⁩ ⁨months⁩ ago
    Yeah. The average NES game was only 200 KB.
    
  - Regrettable_incident@lemmy.world ⁨10⁩ ⁨months⁩ ago
    I had a ZX81 with 1 KB of RAM and could still play Pong.
    
  - xavier666@lemm.ee ⁨10⁩ ⁨months⁩ ago
    Thanks for this. Got a burst of nostalgia
    
  - noxypaws@pawb.social ⁨10⁩ ⁨months⁩ ago
    Here’s one of my recent-ish faves on GB, music is so damn catchy
    
    www.youtube.com/watch?v=GleZBHhOsmE
    
- General_Effort@lemmy.world ⁨10⁩ ⁨months⁩ ago
  
  nes game programmers
  
  Were these guys even Real Programmers?
  
  Here’s a great talk by a guy who worked on a 1982 game for the Atari 2600, a game console first released in 1977. It’s a fascinating insight into the early evolution of computing. They didn’t work around limitations. They used a machine to do whatever it could. If anyone has ever wondered by what standard C is a high-level language, this is for you. Or if you want to know how we ever could have developed something to connect the abstract logic of some algorithm with some glowing pixels on a screen.
  
  Pitfall Classic Postmortem With David Crane Panel at GDC 2011 (Atari 2600)
  
  There’s an ancient myth that a god created the first pair of tongs. Tongs need to be forged in a smithy. Obviously, you need tongs for that.
  
- jasoman@lemmy.world ⁨10⁩ ⁨months⁩ ago
  In Oblivion on Xbox, they even reboot the console during a loading screen to free up RAM.
  
  - Romkslrqusz@lemm.ee ⁨10⁩ ⁨months⁩ ago
    *Morrowind
    
- sirboozebum@lemmy.world ⁨10⁩ ⁨months⁩ ago
  Restrictions and boundaries spur innovation.
  
  - jdeath@lemm.ee ⁨10⁩ ⁨months⁩ ago
    any constraints, really. pretty cool!
    
- Valmond@lemmy.world ⁨10⁩ ⁨months⁩ ago
  The old scrollers in non-consoles (consoles had hardware scrollers) used funky tech too to reduce overdraw. Fun times.
  
troyunrau@lemmy.ca ⁨10⁩ ⁨months⁩ ago
Long article for one sentence of trivia and no info on the algo itself. The death of the internet is upon us.

- adespoton@lemmy.ca ⁨10⁩ ⁨months⁩ ago
  Doesn’t even name the algorithm, and somehow spells LZMA wrong, despite just having written it out longhand.
  
  Well, it’s PC Gamer.
  
  - troyunrau@lemmy.ca ⁨10⁩ ⁨months⁩ ago
    Probably mostly AI written.
    
- GrabtharsHammer@lemmy.world ⁨10⁩ ⁨months⁩ ago
  I’d like to imagine they took the short trivia fact and applied the inverse of the compression algorithm to bloat it into something that satisfied the editor.
  
- rice@lemmy.org ⁨10⁩ ⁨months⁩ ago
  The blog post it links to has all the info, but it describes a series of changes to the dictionary rather than one set thing.
  
L3s@lemmy.world ⁨10⁩ ⁨months⁩ ago
!lemmysilver

- LemmySilverBot@lemmy.world [bot] ⁨10⁩ ⁨months⁩ ago
  Thank you for voting. You can vote again in 24 hours. leaderboard
  
lud@lemm.ee ⁨10⁩ ⁨months⁩ ago
What’s the Weissman score?

- fatal_internal_error@lemmy.world ⁨10⁩ ⁨months⁩ ago
  So it’s gonna be a dick measuring contest?
  
  - agelord@lemmy.world ⁨10⁩ ⁨months⁩ ago
    I’ll measure the most.
    
SirFasy@lemmy.world ⁨10⁩ ⁨months⁩ ago
If it ain’t broke, don’t fix it.

0x0@programming.dev ⁨10⁩ ⁨months⁩ ago
Only 1 GiB of RAM? Moooom!
Shut up Johnny, Voyager’s still out there with way less.

- rmuk@feddit.uk ⁨10⁩ ⁨months⁩ ago
  Yeah, but I’ve not got two hundred Firefox tabs open on Voyager.
  