I think it might be this:
What is distillation?
Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.
Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.
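To make the teacher/student loop described above concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: `teacher`, `collect_teacher_outputs`, and `train_student` are hypothetical names, and the "teacher" is just a hidden linear function standing in for a black-box model you can only query. Real LLM distillation would record text (or token probabilities) and fine-tune a neural network on it, but the shape of the process is the same: query, record, fit.

```python
import random


def teacher(x):
    """Hypothetical stand-in for a black-box teacher: callers only see outputs."""
    return 3.0 * x + 1.0


def collect_teacher_outputs(inputs):
    """Step 1: send inputs to the teacher and record what comes back."""
    return [(x, teacher(x)) for x in inputs]


def train_student(pairs, epochs=2000, lr=0.05):
    """Step 2: fit a small student (y = w*x + b) to the recorded pairs.

    Plain gradient descent on mean squared error; the student never
    sees the teacher's internals, only its recorded outputs.
    """
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


random.seed(0)
inputs = [random.uniform(-1.0, 1.0) for _ in range(50)]
pairs = collect_teacher_outputs(inputs)
w, b = train_student(pairs)


def student(x):
    """The distilled model: mimics the teacher using only learned w and b."""
    return w * x + b


print(f"student learned w={w:.3f}, b={b:.3f}")
```

The learned `w` and `b` should land close to the teacher's hidden 3 and 1, even though the student only ever saw input/output pairs. That is also why API access alone is enough to do this, just less conveniently than having the teacher's weights in hand.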
They are building a plant in Arizona, but I doubt it’ll ever be as good as what Taiwan can do. That’s not just because Taiwan has the skills: if Taiwan doesn’t have this, then what’s the point of protecting it? It’s sort of a way of saying, if you want to continue to access the best chips in the world, you should protect us from China.