Comment on Obvious bias in AI-generated content might actually be useful (if used wisely)
j4k3@lemmy.world 1 month ago
An LLM is like a reflection of your prompt in the mirror of the training data, plus the distortion created by the QKV alignment bias implementation and configuration. It is only a simulacrum of yourself. The underlying profile the model creates of you ultimately forms your ideal informational counterpart. It is the alignment that does much of the biasing.
In the case of the gender of doctors, it is probably premature to call it a bias in the model itself, as opposed to a bias in the implementation of the interface. The first port of call would be to look into the sampling techniques used in the zero-shot and embedding models; these are the models that convert the image and text into numbers. Then there is a ton of potential for issues in the sigma/guidance/sampling algorithm and how it is constrained. I tend to favor ADM adaptive sampling. I can get away with a few general PID settings, but I need to dial them in for specific imagery when I find something I like. This is the same PID tuning you might find in a precision temperature sensor and controller.

The range of ways the noise can be constrained largely determines the path traveled through the neural layers of the model. If I'm using an exponential constraint for guidance, that exponential curve decides how much of the image is derived at which point: very little of the image comes from the early layers of the model, building up to where the later layers of the network resolve the majority of the image. The point at which this ends is largely just a setting. This timing also affects how many layers of alignment the image is subjected to in practice. Alignment enforces our cultural norms, but it is largely a form of overtraining and causes a lot of peripheral issues. For instance, the actual alignment is on the order of a few thousand parameters per layer, whereas each model layer is on the order of tens of millions of parameters.
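To make the exponential idea concrete, here's a minimal sketch of an exponential (geometric) sigma schedule of the kind many samplers offer. The function name and the sigma_min/sigma_max defaults are illustrative, not taken from any specific model loader; the point is just that noise decays by a fixed fraction per step, so most of the detail resolves late:

```python
def exponential_sigmas(n_steps, sigma_min=0.03, sigma_max=14.6):
    """Geometric interpolation from sigma_max down to sigma_min.

    Each step removes a fixed *fraction* of the remaining noise rather than
    a fixed amount, so the noise level stays high through the early steps
    and most of the image is resolved in the later steps.
    """
    ratio = sigma_min / sigma_max
    return [sigma_max * ratio ** (i / (n_steps - 1)) for i in range(n_steps)]

sigmas = exponential_sigmas(10)
```

Swapping this curve for a linear or polynomial one changes which portion of the denoising trajectory (and therefore which layers' behavior) dominates the final image.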
When the noise is constrained, it is basically like an audio sine wave being attenuated. The sampling and guidance control the overshoot and undershoot of the waveform to bring it into a desired shape. These undulations pass through the model to find a path of least resistance, except with tensor ranks there are far more dimensions than the four of Cartesian space plus time. These undulations and the sampling techniques used may have a large impact on the consistency of the imagery generated. This pattern is not necessarily present in the model itself; it can instead be an artifact of the technique used to sample and guide the output.
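One concrete form of that "guidance pushing the waveform" is classifier-free guidance, which is a common rule in diffusion pipelines (though I'm not claiming it's the exact mechanism in any one implementation). A scale above 1 deliberately overshoots past the conditional prediction, and the sampler then has to keep that overshoot from blowing up, much like damping an over-driven signal:

```python
import numpy as np

def cfg_step(uncond_pred, cond_pred, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward (and past) the conditioned one. scale=1 is just
    # the conditional prediction; scale>1 overshoots it on purpose.
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

uncond = np.array([0.0, 1.0])
cond = np.array([1.0, 3.0])
guided = cfg_step(uncond, cond, 7.5)
```

The higher the scale, the stronger the pull toward the prompt, and the more the sampler's constraints shape which path the result takes through the model.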
There are similar types of nuances present in the text embedding and zero shot models.
There is also some potential for issues in the randomization of noise seeds. Computers are notoriously bad at generating truly random numbers.
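Worth noting that the "random" seed is really deterministic: the same integer always reproduces the same starting latent, which is both why results are repeatable and why the seed is pseudo-random rather than truly random. A quick NumPy sketch (shapes here are just illustrative):

```python
import numpy as np

# Two generators seeded identically produce bit-identical "noise":
# the starting latent is a deterministic function of the seed integer.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
latent_a = rng_a.standard_normal((4, 64, 64))
latent_b = rng_b.standard_normal((4, 64, 64))
```

Any structural quirks of the PRNG are therefore baked into every image that starts from its output.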
I’m no expert. This is my present understanding in abstract simplification; I’m not a citable source and could easily be wrong in some aspects. It is, however, my functional understanding from using, tweaking, and modding some basic aspects of the model loader source code.
gaiussabinus@lemmy.world 1 month ago
I would like to see the performance of an FFT-optimized AI. I imagine CPU performance would be amazing.
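The appeal is that the convolution theorem turns an O(n·k) sliding-window convolution into O(n log n) FFT multiplications, which CPUs handle well. A minimal sketch of the equivalence (1-D, NumPy; whether this wins for real network layers depends heavily on kernel and feature-map sizes):

```python
import numpy as np

def fft_conv1d(signal, kernel):
    # Convolution theorem: conv(a, b) = IFFT(FFT(a) * FFT(b)).
    # Zero-padding both transforms to the full output length turns the
    # circular convolution into an ordinary linear one.
    n = len(signal) + len(kernel) - 1
    return np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

sig = np.array([1.0, 2.0, 3.0, 4.0])
ker = np.array([0.25, 0.5, 0.25])
out = fft_conv1d(sig, ker)
```

For the tiny 3x3 kernels common in vision models the FFT overhead usually loses, which is part of why direct and Winograd convolutions tend to dominate in practice.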