Comment on Squiggly Boie

NotANumber@lemmy.dbzer0.com ⁨1⁩ ⁨day⁩ ago

To be more specific, this is an MLP (Multi-Layer Perceptron). "Neural network" is a catch-all term that includes other things such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), diffusion models, and of course Transformers.
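An MLP like the one in the picture is just stacked matrix multiplications with a nonlinearity in between. A minimal sketch in NumPy (the layer sizes and random weights here are purely illustrative, not from the picture):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # elementwise nonlinearity; without it, stacked layers collapse to one linear map
    return np.maximum(0, x)

# Two layers: 4 inputs -> 8 hidden units -> 2 outputs (hypothetical sizes)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def mlp(x):
    h = relu(x @ W1 + b1)   # hidden layer
    return h @ W2 + b2      # output layer (raw scores/logits)

print(mlp(rng.normal(size=4)).shape)  # (2,)
```

Training would then adjust W1, b1, W2, b2 by gradient descent, but the forward pass above is the whole "architecture".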

What you are arguing with online is some variant of a Generative Pre-trained Transformer (GPT), which does have MLP or MoE (Mixture of Experts) layers, but that's only one part of what it is. It also has multi-headed attention mechanisms and embedding and unembedding matrices.
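To make the "MLP is only one part" point concrete, here is a hedged sketch of a single pre-norm transformer block, where the MLP is one of two sublayers alongside multi-headed attention. All sizes and weights are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, seq = 16, 4, 5   # hypothetical model width, head count, sequence length
d_head = d_model // n_heads

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-5)

# random projection matrices standing in for learned weights
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
W1 = rng.normal(size=(d_model, 4 * d_model)) * 0.1
W2 = rng.normal(size=(4 * d_model, d_model)) * 0.1

def attention(x):
    # split the model dimension into heads: (seq, d_model) -> (n_heads, seq, d_head)
    def heads(m):
        return (x @ m).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = heads(Wq), heads(Wk), heads(Wv)
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # per-head attention weights
    out = (scores @ v).transpose(1, 0, 2).reshape(seq, d_model)   # merge heads back
    return out @ Wo

def mlp(x):
    # this is the MLP sublayer the comment mentions -- one piece of the block, not the whole model
    return np.maximum(0, x @ W1) @ W2

def block(x):
    x = x + attention(layer_norm(x))  # attention sublayer with residual connection
    x = x + mlp(layer_norm(x))        # MLP sublayer with residual connection
    return x

x = rng.normal(size=(seq, d_model))
print(block(x).shape)  # (5, 16)
```

A full GPT stacks many such blocks between an embedding matrix (tokens in) and an unembedding matrix (logits out); this sketch omits masking, biases, and learned norm parameters.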

I know all this and wouldn't call myself a machine learning expert; I just use these things. Though I did once train a simple MLP like the one in the picture. I think it's quite bad to call yourself a machine learning expert without knowing all of this and more.
