Comment on NVIDIA CEO Huang urges faster AI development—to make it safer

just_another_person@lemmy.world 1 year ago

TPUs are specific to individual model frameworks, and engineers avoid them for that reason. The most successful adoptions so far are vendor-locked NN models a la Amazon (Trainium) and Google (Coral), and neither has wide adoption because their scope is limited. The GPU game being flexible in this arena is exactly why companies like OpenAI are struggling to justify the costs in using them over TPUs: it’s easy to run up front, but the cost is insane, and TPU is even more expensive in most cases. TPUs are also inflexible should you need to do something like multi-model inference (detection+evaluation+result…etc).

As I said, ASICs are single purpose, so you’re stuck running a limited model engine (TensorFlow) and instruction set. They also take a lot of engineering effort to design, so unless you’re going all-in on a specific engine and think you’re going to be good for years, it’s short-sighted to do so. If you read up, you’ll see the most commonly deployed edge boards in the world are…Jetsons.

Enter FPGAs.

FPGAs offer speedups in the 2x-5x range for specific workloads like transcoding and inference, and much higher for ML purposes and in-memory datasets (think Apache Ignite+Arrow workloads), at a massive reduction in power and cooling, so they are obviously very attractive for datacenters to put into production. The newer slew of chips are even reprogrammable “on the fly”, meaning a simple context switch and flash can take milliseconds, and multi-purpose workloads can exist in a single application, where this was problematic before.

So unless you’ve got some articles about the most prominent AI companies currently using GPUs and moving to ASIC, the field is wide open for FPGA, and datacenter adoption of FPGAs says it’s the path forward unless Nvidia starts kicking out more efficient devices.

drdabbles@lemmy.world 1 year ago
Now ask OpenAI to type out for you what the drawbacks of FPGAs are. Also, the newest slew of chips is using partially charged NAND gates instead of FPGA.

Almost every ASIC in use right now implements the basic math functions, activations, etc., and the higher-level work happens in more generalized silicon. You cannot get the transistor densities necessary for modern accelerator work in an FPGA.

- just_another_person@lemmy.world 1 year ago
  Friend, I do this for a living, and I have no idea why you’re even bringing gating into the equation, because it doesn’t even matter.
  
  I’m assuming you’re a big crypto fan, because that’s about all I could say of ASIC in an HPC type of environment to be good for. Companies who pay the insane amounts of money for “AI” right now want a CHEAP solution, and ASIC is the most short-term, e-wastey, inflexible solve to that problem. When you get a job in the industry and understand the different vectors, let’s talk. Otherwise, you’re just spouting junk.
  
  - drdabbles@lemmy.world 1 year ago
    
    I’m assuming you’re a big crypto fan
    
    Swing and a miss.
    
    because that’s about all I could say of ASIC in an HPC type of environment to be good for
    
    Really? Gee, I think switching fabrics might have a thing to tell you. For someone that does this for a living, to not know the extremely common places that ASICs are used is a bit of a shock.
    
    want a CHEAP solution
    
    Yeah, I already covered that in my initial comment, thanks for repeating my idea back to me.
    
    and ASIC is the most short-term
    
    Literally being added to the Intel tiles in Sapphire Rapids and beyond. Used in every switch, network card, and millions of other devices. Every accelerator you can list is an ASIC. Shit, I’ve got a Xilinx Alveo U30 in my basement at home. But yeah, because you can get an FPGA instance in AWS, you think you know that ASICs aren’t used. lmao
    
    e-wastey
    
    I’ve got bad news for you about ML as a whole.
    
    inflexible
    
    Sometimes the flexibility of a device’s application isn’t the device itself, but how it’s used. Again, if I can do thousands, tens of thousands, or hundreds of thousands of integer operations in a tenth of the power and a tenth of the clock cycles, then load those results into a segment of activation functions that can do the same, and all I have to do is move this data with HBM and perhaps add some cheap ARM cores, bridge all of this into a single SoC product, and sell them on the open market, well then I’ve created every single modern ARM product that has ML acceleration. And also NVIDIA’s latest products.
    
    Woops.
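
For readers following the "inflexible" exchange above, here is a rough Python sketch of the pipeline being described: a block of integer multiply-accumulates feeding an activation stage, with the result scaled back to 8 bits. The function names, the ReLU activation, and the `scale` parameter are illustrative assumptions, not details of any particular chip or of either commenter's hardware.

```python
from typing import List


def int8_matvec(weights: List[List[int]], x: List[int]) -> List[int]:
    """Integer multiply-accumulate stage (hypothetical stand-in for a fixed-function block).

    Inputs are assumed to be int8 values; Python ints do not overflow,
    so the sums behave like wide accumulators.
    """
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        for v in row:
            if not -128 <= v <= 127:
                raise ValueError("weights must fit in int8")
    for v in x:
        if not -128 <= v <= 127:
            raise ValueError("inputs must fit in int8")
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(acc: List[int]) -> List[int]:
    """Activation stage; ReLU is chosen here only as a simple example."""
    return [max(0, a) for a in acc]


def requantize(acc: List[int], scale: float) -> List[int]:
    """Scale the wide accumulators back into the int8 range, clamping on overflow."""
    return [max(-128, min(127, round(a * scale))) for a in acc]


def layer(weights: List[List[int]], x: List[int], scale: float) -> List[int]:
    """One quantized layer: integer MACs, then activation, then requantization."""
    return requantize(relu(int8_matvec(weights, x)), scale)


if __name__ == "__main__":
    print(layer([[1, -2], [3, 4]], [10, 20], 0.5))
```

The point of the split is that the first and second stages are fixed math that can live in dedicated silicon, while deciding which layers to run and in what order can stay on general-purpose cores.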
    
    When you get a job in the industry
    
    I’ve been a hardware engineer for longer than you’ve been alive, most likely. I built my first FPGA product in the 90s. I strongly suspect you just found this hammer and don’t actually know what the market as a whole entails, let alone the long LONG history of all of these things.
    
    Do look up ASICs in switching, BTW. You might learn something.
    
    ZahzenEclipse@kbin.social 1 year ago
    You're such a dick for no reason. It definitely bolsters your claim that you're an old-school tech guy lol
    
    just_another_person@lemmy.world 1 year ago
    Let’s just shut this down right now. If you built FPGAs ever, it was in college in the 90s, at an awful program of a US university that trained you in SQL on the side and had zero idea of how hardware works. I’m sorry for that.
    
    The world has changed in the last 30 years, and the future of integer operations is in reprogrammable chips. All the benefit of a fab chip, and none of the downside in a cloud environment.
    
    The very idea that you think all these companies are looking to design and build their own single purpose chips for things like inference shows you have zero idea of where the industry is headed.
    
    You’re only describing how ASIC is used in switches, cool. That’s what it’s meant for. That’s not how general use computing works in the world anymore, buddy. It’s never going to be a co-proc in a laptop that can load models and do general inference, or be a useful function for localized NN. It’s simply for the single purpose uses as you said.
    