That’s a fundamental misunderstanding of how diffusion models work. These models extract concepts and can effortlessly combine them into new images.
If it learns woman + crown = queen
and queen - woman + man = king
it can combine any such concepts with each other.
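To make the arithmetic concrete, here's a toy sketch in Python. The vectors are hand-made and the three axes (female, male, royal) are my own assumption for illustration; real models learn far higher-dimensional representations and don't store concepts this cleanly. The helper names are hypothetical too.

```python
import math

# Toy "concept" vectors on three made-up axes: (female, male, royal).
# These values are assumptions for illustration, not learned embeddings.
CONCEPTS = {
    "woman": (1.0, 0.0, 0.0),
    "man":   (0.0, 1.0, 0.0),
    "crown": (0.0, 0.0, 1.0),
    "queen": (1.0, 0.0, 1.0),
    "king":  (0.0, 1.0, 1.0),
}

def add(a, b):
    """Element-wise sum of two vectors."""
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    """Element-wise difference of two vectors."""
    return tuple(x - y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(vec, exclude=()):
    """Name of the known concept most similar to vec, skipping the inputs."""
    candidates = [name for name in CONCEPTS if name not in exclude]
    return max(candidates, key=lambda name: cosine(vec, CONCEPTS[name]))

# woman + crown lands closest to queen
woman_plus_crown = add(CONCEPTS["woman"], CONCEPTS["crown"])
print(nearest(woman_plus_crown, exclude={"woman", "crown"}))

# queen - woman + man lands closest to king
queen_to_king = add(sub(CONCEPTS["queen"], CONCEPTS["woman"]), CONCEPTS["man"])
print(nearest(queen_to_king, exclude={"queen", "woman", "man"}))
```

The point is only that once concepts live as directions in a shared space, combining them is plain arithmetic. Nothing about the combination has to appear in the training data.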
As Stability has noted, any model that has the concept of naked and the concept of child in it can be used like this. They tried to remove naked for Stable Diffusion 2 and nobody used it.
Nobody trained these models on CSAM, and the problem is a dilemma in the same way a knife is a dilemma. We all know a malicious person can use a knife for murder, including of children. Yet society has decided that knives have sufficient other uses that we still allow their sale pretty much everywhere.
No, they’ll train on laundered model output. Like every llama.
The investment thesis that the data is valuable is bonkers. It’s not. Not only has it been exfiltrated and can be laundered in a dozen ways, Reddit also won’t be able to effectively assert copyright.
Look at Facebook. It’s full of reposted Quora content now, with AI images and AI-laundered text.
Reddit is dead