
A young computer scientist and two colleagues show that searches within data structures called hash tables can be much faster than previously deemed possible.

392 likes

Submitted 2 months ago by Cat@ponder.cat to technology@lemmy.world

https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/

Comments

  • Valmond@lemmy.world 2 months ago

    The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected — even to the authors themselves.

    WTF?

    I mean, first of all, the “article” is just crap IMO, only hinting at “amazing” things, but also containing basic errors like the above. It’s like they don’t understand complexity: a constant average time on what? A specific size of a hash table? Or do they mean it scales linearly with its size? Just go through the whole table and voilà, you have constant time already.

    So if they did find something, it’s not well explained in the article IMO. Shame, because those kinds of things are interesting.

    • blx@lemmy.zip 2 months ago

      “Constant average query time” is not that hard to understand. It means that sometimes access time is e.g. linear, and sometimes you get your content before executing the code. With a hash table large enough and full enough, this can be used to fetch content seconds, minutes, days, potentially years before the program even exists. That’s one hell of a breakthrough.

      • ripcord@lemmy.world 2 months ago

        …before the program even exists…?

    • RandomVideos@programming.dev 2 months ago

      They managed to get the time to O(1)

      • deegeese@sopuli.xyz 2 months ago

        The article is discussing how to reduce the constant time factor, which depends on the filling fraction; that fraction is a speed-memory tradeoff chosen when creating the hash table.

        The innovation described allows for the use of fuller tables which are resized less frequently, or faster insertion/retrieval for the existing filling fraction.
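
        For a feel of how that constant factor moves with the filling fraction, here is a minimal, textbook-style linear-probing table in Python that just counts how many slots each insert touches. It is not the scheme from the paper; the class name, capacity, and fill levels are made up for illustration.

            # Minimal open-addressing (linear-probing) table, written only to show
            # how the cost of an insert grows as the table fills up.
            # NOT the scheme from the paper; all names and sizes are illustrative.
            import random

            class ProbingTable:
                def __init__(self, capacity=4096):
                    self.slots = [None] * capacity
                    self.capacity = capacity
                    self.used = 0

                def insert(self, key, value):
                    """Insert (key, value) and return the number of slots probed."""
                    if self.used == self.capacity:
                        raise RuntimeError("table full; a real table would resize here")
                    i = hash(key) % self.capacity
                    probes = 1
                    while self.slots[i] is not None and self.slots[i][0] != key:
                        i = (i + 1) % self.capacity   # greedy: march to the next slot
                        probes += 1
                    if self.slots[i] is None:
                        self.used += 1
                    self.slots[i] = (key, value)
                    return probes

            t = ProbingTable()
            for target in (0.5, 0.9, 0.99):
                probes = []
                while t.used / t.capacity < target:
                    probes.append(t.insert(random.random(), None))
                print(f"filling to {target:.0%}: avg {sum(probes) / len(probes):.1f} probes/insert")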

  • meowmeowbeanz@sh.itjust.works 2 months ago

    Hash tables. The backbone of computing, optimized to death by generations of neckbeards convinced they’d squeezed out every drop of efficiency. Then some undergrad casually strolls in, ignores four decades of academic dogma, and yeets Yao’s conjecture into the sun. Turns out you can make insertion times collapse from O(x) to O((log x)^2), if you’re naive enough to not know the “rules.”

    The real kicker? Non-greedy tables now achieve constant average query times, rendering decades of “optimal” proofs obsolete. Academia’s response? A mix of awe and quiet despair. This is why innovation thrives outside the echo chamber of tenured gatekeepers regurgitating theorems like stale propaganda.

    But let’s not pretend this changes anything practical tomorrow. It’s a beautiful math flex—a reminder that theoretical CS isn’t solved, just trapped in peer-reviewed groupthink. Forty years to disprove a conjecture. How many more sacred cows are grazing untouched?
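
    For scale, here is a quick back-of-the-envelope comparison of the two bounds quoted above, taking x = 1 / (fraction of slots still empty) as in the article and ignoring all constants; it shows the shape of the gap, not the paper’s actual numbers.

        import math

        # Worst-case expected insertion cost for the classical greedy approach grows
        # like x; the new bound grows like (log x)^2. Constants are ignored.
        for load in (0.9, 0.99, 0.999, 0.9999):
            x = 1 / (1 - load)
            print(f"{load:.2%} full: x = {x:7.0f}, (log2 x)^2 = {math.log2(x) ** 2:6.1f}")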

    • Presently42@lemmy.ca 2 months ago

      This is genuinely a beautifully written comment. I’d expect it to have been written by an author with a mathematical or physics background, like Asimov or Pynchon. Bravo!

      • meowmeowbeanz@sh.itjust.works 2 months ago

        Thanks for the compliment! For context, I do have an academic background, though no degree. My knowledge in computer science is self-taught, but I’ve built a solid foundation in physics, math (though it’s always humbling), philosophy, and literature. It’s less about formal credentials and more about chasing intellectual rabbit holes.

        Maybe that’s why I’m so allergic to gatekeeping nonsense. Academia’s obsession with rigid frameworks feels like a straitjacket for creativity. The beauty of CS—and science as a whole—is that it thrives on breaking rules, not worshipping them.

        As for Pynchon: he’s a postmodern literary juggernaut. His works are dense, chaotic, and packed with esoteric references—math, history, conspiracy theories. Comparing my comment to his writing? That’s high praise for anyone who thrives in the chaos of ideas.

        Anyway, the real credit goes to those audacious enough to challenge orthodoxy. They’re the ones who remind us that progress isn’t born from conformity but from questioning everything we think we know.

      • steeznson@lemmy.world 2 months ago

        Not weird enough to be Pynchon but it was a good comment nonetheless

    • UndercoverUlrikHD@programming.dev 2 months ago

      A lot of famous scientists make their breakthroughs at a fairly young age, before their minds get locked into a certain paradigm. Look up Thomas Kuhn’s The Structure of Scientific Revolutions, which puts forward some interesting ideas on how science is conducted.

      • meowmeowbeanz@sh.itjust.works 2 months ago

        The irony of citing Kuhn here isn’t lost on me. His Structure of Scientific Revolutions is practically a manual for how entrenched paradigms suffocate innovation. The young, unjaded minds he describes are precisely the ones who can dismantle decades of “consensus” with a single insight. But let’s not romanticize this—most breakthroughs don’t come from genius alone but from ignorance of the so-called rules.

        That said, the real tragedy is how academia weaponizes peer review to enforce conformity. Paradigm shifts like these aren’t celebrated; they’re tolerated begrudgingly, often posthumously. Yao’s conjecture stood for 40 years not because it was unassailable but because questioning it was career suicide. Imagine how many more revolutions we’d see if the system didn’t punish dissent.

  • tyler@programming.dev 2 months ago

    This is incredible, but why does the article end by stating that this might not have any immediate applications? Shouldn’t this immediately result in more efficient hash tables in everyday programming languages?

    • OhNoMoreLemmy@lemmy.ml 2 months ago

      Hash tables are super efficient when they’re not nearly full. So the standard trick is just to resize them when they’re too close to capacity.

      The new approach is probably only going to be useful in highly memory constrained applications, where resizing isn’t an option.
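
      That “standard trick” in sketch form, for anyone curious: grow the array and rehash everything once the load factor passes a threshold. The 2/3 cutoff and the doubling growth factor are illustrative choices here, not any particular language’s implementation.

          # Sketch of resize-on-threshold: keep the table sparse by growing it.
          # The 2/3 threshold and 2x growth factor are illustrative only.

          class ResizingTable:
              def __init__(self):
                  self.capacity = 8
                  self.slots = [None] * self.capacity
                  self.used = 0

              @staticmethod
              def _place(slots, capacity, key, value):
                  """Linear-probe (key, value) into slots; return True if a new slot was filled."""
                  i = hash(key) % capacity
                  while slots[i] is not None and slots[i][0] != key:
                      i = (i + 1) % capacity
                  is_new = slots[i] is None
                  slots[i] = (key, value)
                  return is_new

              def insert(self, key, value):
                  # The expensive step: past ~2/3 full, every existing entry is
                  # rehashed into an array twice the size.
                  if (self.used + 1) * 3 > self.capacity * 2:
                      new_capacity = self.capacity * 2
                      new_slots = [None] * new_capacity
                      for entry in self.slots:
                          if entry is not None:
                              self._place(new_slots, new_capacity, *entry)
                      self.slots, self.capacity = new_slots, new_capacity
                  if self._place(self.slots, self.capacity, key, value):
                      self.used += 1

          t = ResizingTable()
          for i in range(100):
              t.insert(f"key{i}", i)
          print(t.capacity, t.used)   # 256 100: the table kept growing to stay sparse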

      • deegeese@sopuli.xyz 2 months ago

        Hash tables are used in literally everything and they always need to minimize resizing because it’s a very expensive operation.

        I suspect this will silently trickle into lots of things once it gets picked up by standard Python and JavaScript platforms, but that will take years.

      • Sekoia@lemmy.blahaj.zone 2 months ago

        So… databases? Especially in data centers? Still a nice boost in that case

    • source_of_truth@lemmy.world 2 months ago

      I don’t think the speed of hash tables is a problem in most applications.

      • zkfcfbzr@lemmy.world 2 months ago

        Hash tables are often used behind the scenes. dicts and sets in python both utilize hash tables internally, for example.
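
        One quick way to see that machinery, for the curious: dict and set lookups go through the key’s __hash__ and __eq__ rather than scanning. The NoisyKey class is just a made-up illustration.

            # dict and set lookups call __hash__ (then __eq__), not a linear scan.
            class NoisyKey:
                def __init__(self, name):
                    self.name = name

                def __hash__(self):
                    print(f"__hash__ called for {self.name!r}")
                    return hash(self.name)

                def __eq__(self, other):
                    return isinstance(other, NoisyKey) and self.name == other.name

            d = {NoisyKey("a"): 1}         # hashes "a" on insert
            print(d[NoisyKey("a")])        # hashes again on lookup, then compares; prints 1
            print(NoisyKey("b") in {NoisyKey("a"), NoisyKey("b")})   # sets hash too; prints True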

      • lime@feddit.nu 2 months ago

        anything that deserializes arbitrary json will put it into a hash table, right? it would definitely speed up the web.
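
        In Python at least (JavaScript engines have their own object representations), a parsed JSON object does land in a dict, so every later field access is a hash-table lookup:

            import json

            # json.loads turns every JSON object into a dict, i.e. a hash table.
            payload = json.loads('{"user": {"id": 42, "name": "ada"}, "ok": true}')
            print(type(payload))              # <class 'dict'>
            print(payload["user"]["name"])    # ada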

    • rockSlayer@lemmy.world 2 months ago

      Infrastructural APIs are much slower to change, and in a lot of cases the use of those APIs is dependent on a specific version. The change will definitely occur over time as the practical limitations are discovered.

  • BeigeAgenda@lemmy.ca 2 months ago

    Here’s a link to the paper “Tiny Pointers”: arxiv.org/pdf/2111.12800. Those editors at Quanta Magazine write their summary in a strange fashion, for instance using x instead of n, which is normally used in computer science when talking about big O notation.

    • deegeese@sopuli.xyz 2 months ago

      In the article, x is not the size of the hash table; it measures how full the table is. The table is (1 - 1/x) full, i.e. x is the inverse of the fraction of slots still empty. A 1000-element table that is 90% full has x = 10, N = 1000.

      Since they’re not discussing scaling with data size, it would be confusing to use O(N) notation; people would assume that’s what was meant.

    • Lojcs@lemm.ee 2 months ago

      This is the paper the article is about: arxiv.org/pdf/2501.02305

  • gandalf_der_12te@discuss.tchncs.de 2 months ago

    Why are these articles even published? They are so incredibly non-technical that it’s just impossible to get the point as a reader.

    Why even link these articles here? Throw that pop-science in the trash bin, where it belongs.

  • Mubelotix@jlai.lu 2 months ago

    Very sus. This article is not technically convincing at all. Probably a few students who dream of inventing the next big thing. Hash table complexity was mathematically proven to be generally optimal. If you think you’ve optimized it, it’s because your tests don’t reflect the common real world.

    • Capsicones@lemmy.blahaj.zone 2 months ago

      The paper was published by IEEE, with professors as co-authors; only the first author is a student. And I wouldn’t dismiss it out of hand like that because of a magazine article. Students come up with breakthroughs all the time. The paper itself says it disproves Yao’s conjecture. I personally plan to implement and benchmark this because the results seem so good. It could be another Fibonacci heap situation, but maybe not. Hash tables are so widely used that it might even be worthwhile to make special hardware to use this on servers, if our current computer architecture is the only thing holding back the performance.

  • Harbinger01173430@lemmy.world 2 months ago

    I don’t care about the technobabble that group of CS students produced.

    When will I be able to enjoy faster dicts in my Python code?
