Yeah, you’re right that it is different from simply stealing content. However, LLMs still use protected material as input, and it seems that at least parts of those works can be uniquely identified in the output. That can be considered problematic, even if the data is deconstructed into embeddings between input and output.