Efficient Neural Networks
Update Oct 2024
UC Berkeley has an AI systems course, taught in various versions by Ion Stoica, Joseph Gonzalez, and Matei Zaharia (what a cast 😲). It’s basically a paper reading course that is updated each iteration. I highly recommend starting there - they do a much better job than I do.
Disclaimer
There are better resources than this if you are trying to learn more about efficient neural networks.
- https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey (covers more than just LLMs)
- https://mlsys.stanford.edu/cs528/
- https://pll.harvard.edu/course/fundamentals-tinyml and https://tinyml.seas.harvard.edu/
- Matei Zaharia’s AI-Systems Course
- https://github.com/mosharaf/eecs598/tree/w24-genai and https://github.com/mosharaf/eecs598/tree/w21-ai
- https://sites.google.com/view/efficientml/home?authuser=0
- https://hanlab.mit.edu/courses/2023-fall-65940
- https://dlsyscourse.org/
- https://www.fast.ai/ (meh, but has some okay content)
Papers
The idea is to log, for each category, a few classical papers alongside some newer ones.
I aim to keep updating this list as I read more.
- Quantization and Dynamic Quantization methods (see the binarization sketch after this list)
- BinaryConnect: Training Deep Neural Networks with binary weights during propagations and Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
- DNQ: Dynamic Network Quantization
- CPT: Efficient Deep Neural Network Training via Cyclic Precision
- Pruning of neurons and bits
- Knowledge Distillation (see the distillation-loss sketch after this list)
- Federated Learning (see the FedAvg sketch after this list)
- Federated Learning: Strategies for Improving Communication Efficiency and Communication-Efficient Learning of Deep Networks from Decentralized Data
- Decentralized Federated Averaging
- A Field Guide to Federated Optimization
- Fed-ensemble: Improving Generalization through Model Ensembling in Federated Learning
- Auxo: Efficient Federated Learning via Scalable Client Clustering
- Matrix Operation Optimization (Algorithmic and Hardware; see the block-sparse matmul sketch after this list)
- Improving the speed of neural networks on CPUs
- Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models and Monarch: Expressive Structured Matrices for Efficient and Accurate Training
- Implementing block-sparse matrix multiplication kernels using Triton
- Sputnik (not a paper, but a super cool library to reimplement)
- Efficient Block Approximate Matrix Multiplication and Fixed-sparsity matrix approximation from matrix-vector products
- Model Compression
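Sketches
To make a few of these categories concrete, here are some minimal, self-contained sketches. First, a BinaryConnect-style layer in PyTorch: weights are snapped to ±1 on the forward pass while gradients update the underlying real-valued weights via a straight-through estimator. This is a rough sketch of the core idea, not the paper’s actual code; all names are mine.
```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Module):
    """Linear layer whose forward pass uses sign(W) in {-1, +1}, while
    gradients flow to the underlying real-valued W (straight-through)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)

    def forward(self, x):
        w_bin = torch.sign(self.weight)
        # Forward uses w_bin; backward treats sign() as the identity.
        w = self.weight + (w_bin - self.weight).detach()
        return x @ w.t()

# Toy usage: one SGD step on random data.
layer = BinaryLinear(8, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, target = torch.randn(16, 8), torch.randn(16, 4)
loss = ((layer(x) - target) ** 2).mean()
loss.backward()
opt.step()
with torch.no_grad():
    layer.weight.clamp_(-1, 1)  # BinaryConnect keeps the real weights in [-1, 1]
```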
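For Knowledge Distillation, a sketch of the classic softened-softmax loss (Hinton et al.). T and alpha are illustrative hyperparameters, not values from any particular paper.
```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the teacher's soft targets with the hard-label loss.
    The T*T factor keeps gradient magnitudes comparable across temperatures."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits and labels.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
loss = distillation_loss(s, t, torch.randint(0, 10, (8,)))
loss.backward()
```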
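For Federated Learning, a minimal FedAvg round in NumPy (in the spirit of McMahan et al.): each client runs local SGD, and the server averages the returned weights, weighted by local dataset size. The linear model and client data are toy stand-ins.
```python
import numpy as np

def client_update(global_w, X, y, lr=0.1, epochs=5):
    """Local SGD on a least-squares objective; returns updated weights."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """One communication round: dataset-size-weighted average of client updates."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    updates = [client_update(global_w, X, y) for X, y in clients]
    return np.average(updates, axis=0, weights=sizes / sizes.sum())

rng = np.random.default_rng(0)
true_w = rng.normal(size=3)
clients = []
for n in (50, 120, 80):  # heterogeneous client dataset sizes
    X = rng.normal(size=(n, 3))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w = np.zeros(3)
for _ in range(20):
    w = fedavg_round(w, clients)
print("recovered:", w, "true:", true_w)
```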
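And for the matrix-operation bucket, a NumPy correctness reference for block-sparse matmul with a BSR-like layout (block row pointers plus block column indices). This mirrors what a block-sparse kernel (e.g., in Triton or Sputnik) computes; the layout and names here are mine, and a real kernel would tile and parallelize this rather than loop in Python.
```python
import numpy as np

def block_sparse_matmul(blocks, block_cols, row_ptr, B, block=16):
    """Compute A @ B where A is stored as dense nonzero blocks.

    blocks:     (nnz_blocks, block, block) nonzero blocks of A
    block_cols: block-column index of each nonzero block
    row_ptr:    CSR-style offsets into `blocks` for each block-row
    """
    n_block_rows = len(row_ptr) - 1
    out = np.zeros((n_block_rows * block, B.shape[1]))
    for i in range(n_block_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = block_cols[k]
            out[i*block:(i+1)*block] += blocks[k] @ B[j*block:(j+1)*block]
    return out

# Sanity check against a dense matmul on a random block pattern.
rng = np.random.default_rng(0)
block, nbr, nbc = 16, 4, 4
mask = rng.random((nbr, nbc)) < 0.5  # ~50% of blocks are nonzero
blocks, block_cols, row_ptr = [], [], [0]
A = np.zeros((nbr * block, nbc * block))
for i in range(nbr):
    for j in range(nbc):
        if mask[i, j]:
            b = rng.normal(size=(block, block))
            blocks.append(b)
            block_cols.append(j)
            A[i*block:(i+1)*block, j*block:(j+1)*block] = b
    row_ptr.append(len(blocks))
B = rng.normal(size=(nbc * block, 8))
assert np.allclose(block_sparse_matmul(np.array(blocks), block_cols, row_ptr, B, block), A @ B)
```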