Lecture - Theories of Deep Learning MT25, XV, A few things we missed and a summary
Dropout
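As a reminder of the mechanics, here is a minimal sketch of the common "inverted" form of dropout. The notes do not say which variant the lecture used, so this form is an assumption. The names `dropout`, `p_drop` and `rng` are illustrative and not taken from the lecture.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout (assumed variant): zero each unit with probability
    p_drop and rescale survivors by 1 / (1 - p_drop), so the expected
    activation is unchanged. At evaluation time the input passes through."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must lie in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)              # fixed seed, only for reproducibility
x = [1.0] * 10000
y_train = dropout(x, p_drop=0.5, rng=rng)
mean_train = sum(y_train) / len(y_train)   # close to 1.0 in expectation
y_eval = dropout(x, p_drop=0.5, rng=rng, training=False)
```

Rescaling at training time means nothing has to change at evaluation time, which is why this form is usually preferred in practice.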
Skip connections
Tokenisation
- Tokens are probably different in a chat context vs a coding context
How sparse can you make your nets before losing loads of accuracy?
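One simple way to impose sparsity is global magnitude pruning: zero the weights with the smallest absolute values. The sketch below assumes that scheme purely for illustration, since the notes do not say which sparsification method was discussed. `magnitude_prune` and `sparsity` are hypothetical names.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the fraction `sparsity` of entries
    that have the smallest absolute value set to zero (assumed scheme)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_prune = int(sparsity * len(weights))
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
w_sparse = magnitude_prune(w, sparsity=0.5)
# The three smallest-magnitude weights (0.01, -0.05, 0.2) are zeroed.
```

To answer the question empirically, one would evaluate the pruned network at increasing values of `sparsity` and look for the point where accuracy starts to fall sharply.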
Major omissions
- How much does depth improve generalisation error?