Lecture - Theories of Deep Learning MT25, XV, A few things we missed and a summary
- Dropout
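A minimal numpy sketch of dropout as usually implemented (the "inverted" variant, which rescales at training time so no change is needed at test time); all names here are illustrative, not from the lecture:

```python
import numpy as np

def dropout(x, p, rng, train=True):
    # Inverted dropout: zero each unit with probability p during training,
    # and rescale the survivors by 1/(1-p) so the expected activation
    # is unchanged; at test time the function is the identity.
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((4, 8))
y = dropout(x, 0.5, rng)                 # entries are either 0.0 or 2.0
z = dropout(x, 0.5, rng, train=False)    # identity at test time
```

The rescaling by 1/(1-p) is what lets the same forward pass be used at test time without averaging over masks.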
- Skip connections
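A sketch of the skip-connection idea (residual block, y = x + F(x)); the weights and sizes below are made up for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    # Skip connection: the output is the input plus a learned residual F(x),
    # so gradients have an identity path straight through the block and
    # very deep stacks remain trainable.
    return x + W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal(d)
W1 = 0.01 * rng.standard_normal((d, d))
W2 = 0.01 * rng.standard_normal((d, d))
y = residual_block(x, W1, W2)
# with small weights the block stays close to the identity map,
# and with zero weights it is exactly the identity
```

Near-identity initialisation is why stacking many such blocks does not destroy the signal the way stacking plain layers can.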
- Tokenisation
- Token distributions are probably different in a chat context vs a coding context (e.g. indentation and identifiers tokenise differently from common words)
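A toy illustration of the tokenisation point above: a greedy longest-match tokeniser over a small hypothetical vocabulary (standing in for a learned BPE vocabulary) covers chat-like text in few tokens but fragments code into many short fallback tokens:

```python
def tokenize(text, vocab):
    # Greedy longest-match tokenisation against a fixed vocabulary;
    # anything not covered falls back to single-character tokens.
    tokens, i = [], 0
    while i < len(text):
        piece = next(
            (text[i:j] for j in range(len(text), i, -1) if text[i:j] in vocab),
            text[i],
        )
        tokens.append(piece)
        i += len(piece)
    return tokens

# Hypothetical vocabulary, chosen to mimic a chat-trained tokeniser.
vocab = {"hello", " world", "def", " f", "(", ")", ":"}
chat_tokens = tokenize("hello world", vocab)  # 2 tokens
code_tokens = tokenize("def f(x):", vocab)    # 6 tokens
```

Real subword tokenisers (e.g. BPE) work differently in detail, but the effect is the same: token counts per character depend heavily on what text the vocabulary was trained on.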
- How sparse can you make your networks before losing substantial accuracy?
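One way to probe the sparsity question empirically is magnitude pruning: zero the smallest weights and see how much the output moves. A toy numpy sketch on a random linear layer (not any particular result from the lecture):

```python
import numpy as np

def magnitude_prune(W, sparsity):
    # Zero out the smallest-magnitude fraction `sparsity` of the weights.
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    pruned = W.copy()
    pruned[np.abs(W) <= thresh] = 0.0
    return pruned

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
baseline = W @ x
for s in (0.5, 0.9, 0.99):
    err = np.linalg.norm((magnitude_prune(W, s) - W) @ x) / np.linalg.norm(baseline)
    # relative output error grows as sparsity increases
```

Trained networks tolerate far more pruning than this random layer suggests, which is exactly what makes the question interesting.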
- Major omissions
- How well does depth improve generalisation error?