In recent years, Deep Learning has become a dominant Machine Learning tool for a wide variety of domains. One of its biggest successes has been in Computer Vision, where performance on problems such as object and action recognition has improved dramatically. In this course, we will read up on various Computer Vision problems and the state-of-the-art techniques involving different neural architectures, and brainstorm about promising new directions.
Please sign up here at the beginning of class.
This class is a graduate seminar course in computer vision. The class will cover a diverse set of topics in Computer Vision and various Neural Network architectures. It will be an interactive course where we will discuss interesting topics on demand and the latest research buzz. The goal of the class is to learn about different domains of vision; to understand, identify, and analyze the main challenges and what works and what doesn't; and to identify interesting new directions for future research.
Prerequisites: Courses in computer vision and/or machine learning (e.g., CSC320, CSC420, CSC411) are highly recommended (otherwise you will need some additional reading), and basic programming skills are required for projects.
This class uses Piazza. On this webpage, we will post announcements and assignments. Students will also be able to post questions and discussions in a forum-style manner, either to their instructors or to their peers.
We will have an invited speaker for this course:
as well as several invited lectures / tutorials:
Each student will need to write two paper reviews each week, present once or twice in class (depending on enrollment), participate in class discussions, and complete a project (done individually or in pairs).
The final grade will consist of the following | |
---|---|
Participation (attendance, participation in discussions, reviews) | 15% |
Presentation (presentation of papers in class) | 25% |
Project (proposal, final report) | 60% |
The first class will present a short overview of neural network architectures; however, the details will be covered when reading on particular topics. Readings will touch on a diverse set of topics in Computer Vision. The course will be interactive -- we will add interesting topics on demand and the latest research buzz.
| Date | Topic | Reading / Material | Speaker | Slides |
|---|---|---|---|---|
| Jan 12 | Admin & Introduction(s) | | Sanja Fidler | admin |
| Convolutional Neural Networks | | | | |
| Jan 19 | Convolutional Neural Nets (tutorial) | Resources: Stanford's cs231 class, VGG's Practical CNN Tutorial. Code: CNN Tutorial for TensorFlow, Tutorial for caffe, CNN Tutorial for Theano | Yukun Zhu (invited) | [pdf] [code] |
| | Image Segmentation | Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs [PDF] [code] L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L Yuille | Shenlong Wang | [pdf] [code] |
| Jan 26 | Very Deep Networks | Highway Networks [PDF] [code] Rupesh Kumar Srivastava, Klaus Greff, Jurgen Schmidhuber; Deep Residual Learning for Image Recognition [PDF] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | Renjie Liao (invited) | [pdf] |
| | Object Detection | Rich feature hierarchies for accurate object detection and semantic segmentation [PDF] [code] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik; Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [PDF] [code (Matlab)] [code (Python)] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun | Kaustav Kundu | [pdf] |
| Feb 2 | Stereo Siamese Networks | Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches [PDF] [code] Jure Žbontar, Yann LeCun; Learning to Compare Image Patches via Convolutional Neural Networks [PDF] [code] Sergey Zagoruyko, Nikos Komodakis | Wenjie Luo | [pdf] |
| | Depth from Single Image | Designing Deep Networks for Surface Normal Estimation [PDF] Xiaolong Wang, David Fouhey, Abhinav Gupta | Mian Wei | [pptx] [pdf] |
| Feb 9 | Image Generation | Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [PDF] Alec Radford, Luke Metz, Soumith Chintala; Generating Images from Captions with Attention [PDF] Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov | Elman Mansimov (invited) | [pdf] |
| | Domain Adaptation, Zero-shot Learning | Simultaneous Deep Transfer Across Domains and Tasks [PDF] Eric Tzeng, Judy Hoffman, Trevor Darrell; Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions [PDF] Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov | Lluis Castrejon | [pdf] |
| Recurrent Neural Networks | | | | |
| Feb 23 | RNNs and Neural Language Models | Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [PDF] [code] Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel; Skip-Thought Vectors [PDF] [code] Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler | Jamie Kiros (invited) | |
| Mar 1 | Modeling Words | Efficient Estimation of Word Representations in Vector Space [PDF] [code] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean | Eleni Triantafillou | [pdf] |
| | Describing Videos | Sequence to Sequence -- Video to Text [PDF] Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko | Erin Grant | [pdf] |
| | Image-based QA | Ask Your Neurons: A Neural-based Approach to Answering Questions about Images [PDF] Mateusz Malinowski, Marcus Rohrbach, Mario Fritz | Yunpeng Li | [pdf] |
| Mar 8 | Variational Autoencoders | Auto-Encoding Variational Bayes [PDF] Diederik P Kingma, Max Welling; Tutorial: Bayesian Reasoning and Deep Learning [PDF] Shakir Mohamed | Yura Burda (invited) | [pdf] |
| | Text-based QA | End-To-End Memory Networks [PDF] Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus | Marina Samuel | [pdf] |
| | Neural Reasoning | Recursive Neural Networks Can Learn Logical Semantics [PDF] Samuel R. Bowman, Christopher Potts, Christopher D. Manning | Rodrigo Toro Icarte | [pdf] |
| Mar 15 | Neural Programming | Neural GPUs Learn Algorithms [PDF] Lukasz Kaiser, Ilya Sutskever; Neural Programmer-Interpreters [PDF] Scott Reed, Nando de Freitas; Neural Programmer: Inducing Latent Programs with Gradient Descent [PDF] Arvind Neelakantan, Quoc V. Le, Ilya Sutskever | Jimmy Ba (invited) | |
| | Conversation Models | A Neural Conversational Model [PDF] Oriol Vinyals, Quoc Le | Caner Berkay Antmen | [pdf] |
| | Sentiment Analysis | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank [PDF] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts | Zhicong Lu | [pdf] |
| Mar 22 | Video Representations | Unsupervised Learning of Video Representations using LSTMs [PDF] Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov | Kamyar Ghasemipour | [pdf] |
| | CNN Visualization | Explaining and Harnessing Adversarial Examples [PDF] Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy | Neill Patterson | [pdf] |
| Mar 29 | Direction Following (Robotics) | Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences [PDF] Hongyuan Mei, Mohit Bansal, Matthew R. Walter | Alan Yusheng Wu | [pdf] |
| | Visual Attention | Recurrent Models of Visual Attention [PDF] Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu | Matthew Shepherd | [pdf] |
| | Music | A First Look at Music Composition using LSTM Recurrent Neural Networks [PDF] Douglas Eck, Jurgen Schmidhuber; Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network [PDF] Andrew J.R. Simpson, Gerard Roma, Mark D. Plumbley | Charu Jaiswal | [pdf] |
| | Music generation | Overview of music generation | Urban Jezernik (invited) | |
| | Pose and Attributes | PANDA: Pose Aligned Networks for Deep Attribute Modeling [PDF] Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev | Sidharth Sahdev | [pptx] |
| | Image Style | A Neural Algorithm of Artistic Style [PDF] [code] Leon A. Gatys, Alexander S. Ecker, Matthias Bethge | Nancy Iskander | [pdf] |
| Apr 5 | Human gaze | Where Are They Looking? [PDF] Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba | Abraham Escalante | [pdf] |
| | Instance Segmentation | Monocular Object Instance Segmentation and Depth Ordering with CNNs [PDF] Ziyu Zhang, Alex Schwing, Sanja Fidler, Raquel Urtasun; Instance-Level Segmentation with Deep Densely Connected MRFs [PDF] Ziyu Zhang, Sanja Fidler, Raquel Urtasun | Min Bai | [pdf] |
| | Scene Understanding | Attend, Infer, Repeat: Fast Scene Understanding with Generative Models [PDF] S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey E. Hinton | Namdar Homayounfar | [pdf] |
| | Reinforcement Learning | Playing Atari with Deep Reinforcement Learning [PDF] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller | Jonathan Chung | [pdf] |
| | Medical Imaging | Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning [PDF] Oren Z. Kraus, Lei Jimmy Ba, Brendan Frey | Alex Lu | [pptx] |
| | Humor | We Are Humor Beings: Understanding and Predicting Visual Humor [PDF] Arjun Chandrasekaran, Ashwin K Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh | Shuai Wang | [pdf] |