Paper Recommendation 2022
Tips:
- Check [Project]/[Blog] first for a brief understanding of the paper, and watch their introduction video if available.
- Only about 2 papers are chosen for each area. If you find a paper interesting, please use [Connected Papers] to find related works!
Backbone
-
[2015] Deep Residual Learning for Image Recognition [Paper]
One of the best-known papers in CV; its ResNet architecture is among the most widely used backbones.
-
[2020] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Paper]
Transformers have mainly been used in NLP tasks, but Vision Transformers (ViT) have become a recent trend in computer vision.
Further readings:
Perception (Classification, Segmentation, Detection)
2D
- [2014] Fully Convolutional Networks for Semantic Segmentation [Paper]
- [2015] You Only Look Once: Unified, Real-Time Object Detection [Paper]
3D
Unlike 2D images, 3D data can be represented in many ways (voxel volumes, point clouds, meshes, and implicit functions). Because dense 3D convolutions are generally limited by GPU memory, exploiting the sparsity of 3D data is a recurring topic in 3D perception.
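To make the sparsity point concrete, here is a minimal sketch, not taken from any of the listed papers, that voxelizes a synthetic point cloud and reports how many cells of the dense grid are actually occupied. The helper names (`voxelize`, `sample_sphere_surface`), the unit-sphere shape, and the 64-cell resolution are all illustrative assumptions.

```python
import math
import random


def voxelize(points, resolution):
    """Map points in [-1, 1]^3 to the set of occupied voxel indices."""
    occupied = set()
    for point in points:
        index = tuple(
            # Clamp so that a coordinate of exactly 1.0 stays inside the grid.
            min(resolution - 1, int((c + 1.0) / 2.0 * resolution))
            for c in point
        )
        occupied.add(index)
    return occupied


def sample_sphere_surface(n, rng):
    """Sample n points on the unit sphere (hypothetical stand-in for a scan)."""
    points = []
    for _ in range(n):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        points.append(tuple(c / norm for c in v))
    return points


rng = random.Random(0)
points = sample_sphere_surface(20000, rng)
resolution = 64  # assumed grid size; a dense grid has resolution**3 cells

occupied = voxelize(points, resolution)
dense_cells = resolution ** 3
occupancy = len(occupied) / dense_cells
print(f"{len(occupied)} of {dense_cells} voxels occupied ({occupancy:.2%})")
```

Because the points lie on a surface, the occupied set can never exceed the number of points, while the dense grid grows cubically with resolution. Storing only the occupied indices, as the `set` does here, is the basic idea that sparse 3D methods build on.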
-
[2016] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [Paper]
Point clouds are an efficient 3D representation widely used in 3D tasks, and this paper was the first to propose an effective way of processing them.
Further readings:
-
[2019] Point-Voxel CNN for Efficient 3D Deep Learning [Paper]
Sparse 3D convolution is another popular way of handling point clouds.
Further readings:
Generation
- [2014] Generative Adversarial Networks [Paper]
-
[2018] A Style-Based Generator Architecture for Generative Adversarial Networks [Paper] [code]
The StyleGAN series is known for its realistic image generation.
Further readings:
-
[2021] DALL-E: Zero-Shot Text-to-Image Generation [Blog] [Paper] [Online dalle-mini]
Further readings: DALL-E 2, Imagen, Parti...
3D Reconstruction
-
A general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline still widely used today.
-
[2019] DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation [Paper]
An implicit 3D representation that trains an MLP to predict signed distance function (SDF) values.
-
[2020] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [Project] [Paper]
Photo-realistic novel view synthesis of 3D scenes from only RGB images or video.
Further reading:
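As background for the DeepSDF entry above, the sketch below shows what a signed distance function returns, using a sphere whose SDF has a closed form. It is only an analytic illustration: DeepSDF trains an MLP to approximate such a function for shapes with no closed form. The name `sphere_sdf` and its parameters are hypothetical, and the sign convention assumed here is negative inside, positive outside.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to the sphere surface (negative inside)."""
    return math.dist(point, center) - radius


# Query one point inside, one on the surface, and one outside the unit sphere.
for query in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(query, sphere_sdf(query))
```

The surface is the set of points where the function is zero, which is why an implicit representation can describe a shape without storing voxels or mesh faces.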