Comment on China solves 'century-old problem' with new analog chip that is 1,000 times faster than high-end Nvidia GPUs

Melobol@lemmy.ml 2 days ago

I asked ChatGPT to explain the paper - here is what it said - so you don’t have to:

Many computing tasks (especially in things like signal processing, wireless communications, scientific computing, and AI) boil down to solving equations of the form A x = b, where a known matrix A times an unknown vector x equals a known vector b.
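For concreteness, here is the kind of digital baseline the analogue chip is being compared against; this tiny numpy example is my own illustration, not from the paper:

```python
import numpy as np

# The problem the chip accelerates: solve A x = b for x.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

# Conventional digital route: floating-point solver, roughly O(n^3) for dense A.
x = np.linalg.solve(A, b)
print(x)          # the solution vector
print(A @ x - b)  # residual, ~0 up to float64 rounding error
```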

Traditionally these are solved on digital computers with floating-point arithmetic, and for large problems this can be slow and energy-intensive.

An alternative is analogue computing, where operations are carried out directly in hardware (for example, using resistive memory devices) rather than converting everything to the digital domain. This can potentially be much faster and more energy-efficient.

But analogue computing has historically had two big problems: precision (how accurate the answers are) and scalability (how large a problem it can handle). This paper addresses both.

What they did

They used resistive random-access memory (RRAM) chips: memory devices where each cell’s conductance (how easily it lets current through) acts as one number in a matrix.
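The basic physics trick, as I understand it (an idealized sketch assuming perfect devices, not the paper's circuit): conductances store the matrix, applied voltages encode the vector, and Ohm's and Kirchhoff's laws do the multiply-and-sum in parallel.

```python
import numpy as np

# Idealized RRAM crossbar doing a matrix-vector product "for free":
# each cell's conductance G[i, j] stores A[i, j]; applying voltages V
# (the vector x) makes each cell pass current G*V (Ohm's law), and the
# currents summed along each row (Kirchhoff's current law) give A @ x.
G = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])   # conductances (siemens) encoding the matrix
V = np.array([0.1, 0.2])        # input voltages encoding the vector

row_currents = G @ V            # what the analogue array computes physically
print(row_currents)             # proportional to A @ x
```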

They built an analogue system that does two key steps:

A low-precision analogue matrix inversion (LP-INV) step.

A high-precision analogue matrix-vector multiplication (HP-MVM) step, using bit slicing (splitting each number into several lower-precision slices) to boost precision (see the sketch just below).
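Here is roughly how bit slicing can recover precision from low-precision cells; this is a simplified digital illustration of the idea (split each entry into small slices, multiply each slice, recombine with binary weights), not the paper's exact scheme.

```python
import numpy as np

def bit_sliced_mvm(A_int, x, bits_per_slice=3, n_slices=3):
    """Matrix-vector product using several low-precision slices of A,
    mimicking how limited-precision analogue cells can be combined.
    Entries of A_int must fit in bits_per_slice * n_slices bits."""
    base = 1 << bits_per_slice                 # e.g. 8 levels for 3-bit cells
    result = np.zeros_like(x, dtype=float)
    for s in range(n_slices):
        slice_s = (A_int >> (s * bits_per_slice)) % base   # one 3-bit slice
        result += (base ** s) * (slice_s @ x)              # weight and accumulate
    return result

A_int = np.array([[300, 17], [5, 450]])   # 9-bit entries
x = np.array([1.0, 2.0])
print(bit_sliced_mvm(A_int, x))           # [334. 905.]
print(A_int @ x)                          # matches the full-precision product
```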

They also developed a method called “BlockAMC” (Block Analog Matrix Computing), which partitions a large matrix into blocks so that the analogue method can scale to larger sizes.
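For intuition on how block partitioning lets small arrays handle bigger matrices, the standard 2×2-block (Schur complement) inversion identity below only ever inverts blocks of the small size; I'm assuming BlockAMC exploits something along these lines, though the paper's actual recursion and analogue mapping differ in detail.

```python
import numpy as np

def block_inverse(M, k):
    """Invert M via the classic 2x2-block Schur-complement identity;
    only k x k inversions are needed, so each one could in principle
    be handed to a small analogue inversion array."""
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    Ainv = np.linalg.inv(A)             # small, array-sized inversion
    S = D - C @ Ainv @ B                # Schur complement
    Sinv = np.linalg.inv(S)             # second small inversion
    top = np.hstack([Ainv + Ainv @ B @ Sinv @ C @ Ainv, -Ainv @ B @ Sinv])
    bottom = np.hstack([-Sinv @ C @ Ainv, Sinv])
    return np.vstack([top, bottom])

M = np.random.rand(16, 16) + 16 * np.eye(16)   # well-conditioned test matrix
print(np.allclose(block_inverse(M, 8), np.linalg.inv(M)))   # True
```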

They built the hardware: RRAM chips fabricated in a foundry 40-nm CMOS process with a one-transistor, one-resistor (1T1R) cell configuration, supporting 3-bit multilevel conductance (eight states per cell).

Using their analogue system, they experimentally performed a 16×16 real-valued matrix inversion with ~24-bit fixed-point precision (comparable to 32-bit floating point).
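One way a rough inverse plus accurate multiplications can yield a high-precision answer is classical iterative refinement; I'm assuming the LP-INV/HP-MVM pairing plays an analogous role (the paper's exact scheme may differ), and the sketch below just shows why that kind of combination works.

```python
import numpy as np

def refine(A, b, A_inv_rough, iters=10):
    """Iterative refinement: a rough inverse (stand-in for LP-INV) is
    corrected with accurate residuals (stand-in for HP-MVM), so the final
    solution is far more precise than the inverse used to obtain it."""
    x = A_inv_rough @ b                      # coarse initial solve
    for _ in range(iters):
        r = b - A @ x                        # high-precision residual
        x = x + A_inv_rough @ r              # low-precision correction
    return x

A = np.random.rand(16, 16) + 16 * np.eye(16)
b = np.random.rand(16)
A_inv_rough = np.linalg.inv(A).astype(np.float16).astype(float)  # ~3 decimal digits
print(np.linalg.norm(A @ refine(A, b, A_inv_rough) - b))         # tiny residual
```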

They also demonstrated a real-world application: signal detection in a massive MIMO wireless-communication system (16×4 and 128×8 antenna configurations) with high-order modulation (256-QAM). Their analogue solver matched the performance of a digital processor within two to three cycles.
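For context on why MIMO detection is a natural fit: a linear detector ends up solving exactly an A x = b problem. The toy zero-forcing example below is my own illustration (with simplified 16-QAM-style symbols rather than 256-QAM, and the paper's detector may use a different linear formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy uplink: 16 receive antennas, 4 transmit antennas, complex channel H.
H = (rng.normal(size=(16, 4)) + 1j * rng.normal(size=(16, 4))) / np.sqrt(2)
x_sent = rng.choice([-3, -1, 1, 3], size=4) + 1j * rng.choice([-3, -1, 1, 3], size=4)
y = H @ x_sent + 0.01 * (rng.normal(size=16) + 1j * rng.normal(size=16))

# Zero-forcing detection: solve the normal equations (H^H H) x = H^H y,
# i.e. precisely the "A x = b" problem the analogue solver targets.
A = H.conj().T @ H
b = H.conj().T @ y
x_hat = np.linalg.solve(A, b)
print(np.round(x_hat.real) + 1j * np.round(x_hat.imag))   # recovers x_sent
```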

They measured the speed (the analogue inversion circuit converged in ~120 ns for a 4×4 matrix) and estimated that their approach could offer ~1,000× higher throughput and ~100× better energy efficiency than state-of-the-art digital processors at the same precision.

Why it matters

If you can solve matrix equations much faster and with much less energy, that opens up possibilities for things like base stations in wireless networks (which have many antennas), real-time signal processing, AI training, and scientific simulation.

Using analogue hardware like RRAM arrays helps overcome the “von Neumann bottleneck” (the slowdown and energy cost of moving data between memory and processor) because the memory is the compute.

The fact that they reached high precision (comparable to digital float32) matters because a major criticism of analogue computing has been that it is too noisy and low-precision for serious tasks. This shows it can be done.

The scalability (via their BlockAMC approach) means this isn’t just a toy 2×2 demonstration; they show matrices up to 16×16 and hint at larger sizes.

Important caveats & challenges

The arrays currently demonstrated for LP-INV are small (8×8), and scaling to much larger arrays still faces engineering challenges (device reliability, wiring resistance, noise, etc.).

The BlockAMC algorithm introduces overhead as problems grow: the complexity isn’t strictly constant for arbitrarily large matrices, so there is some cost to scaling.

While the big energy and throughput gains are estimates, real-world integration (with all the peripheral circuitry: DACs, ADCs, control logic) will still need refinement.

Applications: they demonstrate wireless signal detection (MIMO), which is a good fit, but other domains (scientific computing, general AI) may have different requirements (matrix size, sparsity, conditioning).

Analogue computing still has to deal with device variability, drift, calibration, faults in memory cells, etc. The paper mentions some of these (e.g., stuck-at faults) and how to mitigate them.

In everyday terms

Imagine you have a huge table of numbers (a matrix) and you need to find a vector x so that when the matrix multiplies x you get some result b; this is just solving a system of linear equations. Normally a computer does this step by step in digital form, which takes time and energy, especially for large tables.

What these researchers did is build a physical piece of hardware where the table of numbers is literally encoded in a memory chip (via conductances) and the solving is done by analogue electrical flows. Because the currents flow in parallel and settle almost instantly (relative to clocked digital logic), this can be much faster and more efficient.

They also built in ways to make the answers very accurate (not just approximate) and to scale the method up to realistic sizes. In short: they revived the old “analogue computing” idea, but with modern memory chips, and showed it can match digital precision while running faster and at lower power.

source