xkcd #2934: Bloom Filter

Submitted ⁨⁨8⁩ ⁨months⁩ ago⁩ by ⁨randomaccount43543@lemmy.world⁩ to ⁨xkcd@lemmy.world⁩

https://imgs.xkcd.com/comics/bloom_filter_2x.png

Alt text:

Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they’re the right one you can never be sure.

source

Comments

Sort:hotnew top

hydroptic@sopuli.xyz ⁨8⁩ ⁨months⁩ ago
Do you want to learn about probabilistic data structures?

maybe

no
source
LPThinker@lemmy.world ⁨8⁩ ⁨months⁩ ago
For anyone interested in learning more about bloom filters, this is a technical but extremely accessible and easy to follow introduction to them, including some excellent interactive visualizations: https://samwho.dev/bloom-filters/

source
- gbuttersnaps@programming.dev ⁨8⁩ ⁨months⁩ ago
  This was a great read, thanks for sharing!
  
  source
randomaccount43543@lemmy.world ⁨8⁩ ⁨months⁩ ago
explainxkcd.com/2934

source
Xeroxchasechase@lemmy.world ⁨8⁩ ⁨months⁩ ago
Is’nt bloom filter a shader that makes the picture look hazy and bright?

source
- Ephera@lemmy.ml ⁨8⁩ ⁨months⁩ ago
  That’s the bloom shader effect: en.wikipedia.org/wiki/Bloom_(shader_effect)
  
  But yeah, some people might refer to that as “bloom filter”, although it’s not what’s meant here.
  
  source
  - Xeroxchasechase@lemmy.world ⁨8⁩ ⁨months⁩ ago
    Thanks
    
    source
- Die4Ever@programming.dev ⁨8⁩ ⁨months⁩ ago
  en.m.wikipedia.org/wiki/Bloom_filter
  
  source
- rockSlayer@lemmy.world ⁨8⁩ ⁨months⁩ ago
  A bloom filter is a data structure that is most useful in creating a spell check algorithm
  
  source
  - hydroptic@sopuli.xyz ⁨8⁩ ⁨months⁩ ago
    That’s definitely not what they’re most useful for. I mean, you probably can use a bloom filter for implementing spell check, but saying that’s where they’re most useful severely misses the point of probabilistic set membership queries.
    
    Bloom filters and their relatives are great when you have a huge set of values – eg. 100s of millions of user IDs in some database – and you want to have a very fast way of checking whether some value might be in that set, without having to query the database. Naturally this assumes that you’ve prepopulated a bloom filter with whatever values you need to be checking.
    
    If the result of the bloom filter query is “nope”, you know that the value’s definitely not in the set, but if the result is “maybe” then you can go ahead and double-check by querying the database. This means that the vast majority of checks don’t have to hit that slow DB at all, and even though you’ll get some false positives this’ll still be much much much faster than having to go through that DB every time.
    
    source
    -> View More Comments
Audalin@lemmy.world ⁨8⁩ ⁨months⁩ ago
There’s a recent algorithm using somewhat similar ideas for approximate counting of unique objects in a stream with constant memory:

quantamagazine.org/computer-scientists-invent-an-…

source
- Mubelotix@jlai.lu ⁨8⁩ ⁨months⁩ ago
  I think I like hash-based probabilistic counting better, but this is interesting
  
  source