161.

How has DeepSeek improved the Transformer architecture?

epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture

This Gradient Updates issue reviews the major architectural changes introduced in DeepSeek’s most recent model.