Alt text:
Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they’re the right one you can never be sure.
Submitted 8 months ago by randomaccount43543@lemmy.world to xkcd@lemmy.world
https://imgs.xkcd.com/comics/bloom_filter_2x.png
Alt text:
Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they’re the right one you can never be sure.
For anyone interested in learning more about bloom filters, this is a technical but extremely accessible and easy to follow introduction to them, including some excellent interactive visualizations: https://samwho.dev/bloom-filters/
This was a great read, thanks for sharing!
Is’nt bloom filter a shader that makes the picture look hazy and bright?
That’s the bloom shader effect: en.wikipedia.org/wiki/Bloom_(shader_effect)
But yeah, some people might refer to that as “bloom filter”, although it’s not what’s meant here.
Thanks
A bloom filter is a data structure that is most useful in creating a spell check algorithm
That’s definitely not what they’re most useful for. I mean, you probably can use a bloom filter for implementing spell check, but saying that’s where they’re most useful severely misses the point of probabilistic set membership queries.
Bloom filters and their relatives are great when you have a huge set of values – eg. 100s of millions of user IDs in some database – and you want to have a very fast way of checking whether some value might be in that set, without having to query the database. Naturally this assumes that you’ve prepopulated a bloom filter with whatever values you need to be checking.
If the result of the bloom filter query is “nope”, you know that the value’s definitely not in the set, but if the result is “maybe” then you can go ahead and double-check by querying the database. This means that the vast majority of checks don’t have to hit that slow DB at all, and even though you’ll get some false positives this’ll still be much much much faster than having to go through that DB every time.
There’s a recent algorithm using somewhat similar ideas for approximate counting of unique objects in a stream with constant memory:
I think I like hash-based probabilistic counting better, but this is interesting
hydroptic@sopuli.xyz 8 months ago
Do you want to learn about probabilistic data structures?