AMA: Yoshua Bengio (self.MachineLearning)

Yoshua Bengio ( http://www.iro.umontreal.ca/~bengioy/yoshua_en/index.html ) is one of the machine learning professors who led the deep learning renaissance of 2006, along with Geoff Hinton and Yann LeCun. His research work focuses on advancing machine learning to the point that it can be used to solve artificial intelligence applications. He is one of the last deep learning professors to remain completely in academia, after several other deep learning professors have joined companies such as Google and Facebook.

Yoshua will answer your questions on Thursday from 1-2PM EST. I am one of his grad students and I am creating this thread in advance so that people who are not able to go online at that time can post questions ahead of time. I'll post a verification from my Google+ account in the comments, and Yoshua will post a verification of his username on Thursday.


[–]mdooder 41 points 1 year ago

Hello Prof. Bengio, what motivates you to stay in academia? What do you think about corporate research labs in terms of productivity and innovation when compared to academic labs? Does research flexibility (doing what you want, more or less) play a large role in this decision?

[–]yoshua_bengio (Prof. Bengio) 29 points 1 year ago

I like academia because I can choose what to work on, I can choose to work on long-term goals, I can work for the benefit of humanity rather than for a specific company, and I can talk about my work freely. Note that to different degrees, my esteemed colleagues in large industrial labs also enjoy some of that freedom.

[–]alecradford 30 points 1 year ago*

Hi there! I'm an undergrad and your work combined with Hinton's is a huge inspiration to me! A bunch of questions, so feel free to answer all or none!

Hinton semi-recently offered an awesome MOOC on Coursera over NNs. The resources and lectures it provided are what allowed me and many others to build homebrew nets and really get into the field. It would be a great resource if another researcher at the forefront of the field offered their own take, do you have any plans for something like this?

As a leading professor in the field, how do you personally view the resurgence of interest in modern NN applications? Do you believe it's well deserved recognition, guilty of overhype, some mixture of the two, or something completely different! On a similar note, how do you feel about the portrayal of modern NN research in popular literature?

I'm interested in using unsupervised techniques to learn automated data augmentations/corruptions for increasing generalization performance, which I hope is a promising hybrid of supervised and unsupervised learning that's different from traditional pretraining. A lot of advances have been made using "simple" data augmentations/corruptions pioneered in your lab like gaussian noise corruption and what we now call input dropout in the context of DAEs. Preliminary results on MNIST seem successful (~0.8% permutation invariant) and I can send code if you are interested but admittedly I'm just an undergrad with no formal research experience. Do you see this as an area with potential and could you point me to any resources or papers that you are aware of - I've had a hard time finding them.
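
For readers unfamiliar with the two "simple" corruptions named above, here is a minimal sketch of each, assuming the input is a plain list of floats. The function names and default noise levels are illustrative assumptions, not taken from any particular library or paper.

```python
import random


def gaussian_corrupt(x, sigma=0.1, rng=random):
    """Add zero-mean Gaussian noise with standard deviation sigma to every input."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def masking_corrupt(x, drop_prob=0.25, rng=random):
    """Set each input to zero independently with probability drop_prob (input dropout)."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]


# Usage: corrupt a toy input with a seeded generator so runs are repeatable.
rng = random.Random(0)
clean = [0.0, 0.5, 1.0, 0.25]
noisy = gaussian_corrupt(clean, sigma=0.1, rng=rng)
masked = masking_corrupt(clean, drop_prob=0.25, rng=rng)
```

In a denoising auto-encoder the corrupted vector is the network input and the clean vector is the reconstruction target.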

No one has a crystal ball, but what do you see as the most interesting areas of research for continuing to advance your work? The last few years have seen purely supervised techniques make a lot of headway riding off the success of dropout, for instance.

Thank you so much for doing this AMA, it's great to have you here on /r/MachineLearning!

[–]yoshua_bengio (Prof. Bengio) 22 points 12 months ago

I have no clear plan for a MOOC but I might do one eventually. In the meantime, I am writing a new and more complete book on deep learning (with Ian Goodfellow and Aaron Courville). Some draft chapters should come out in the next few months and feedback from the community and students would be great. Note that Hugo Larochelle (formerly a PhD student with me and a post-doc with Hinton) has great videos on deep learning http://www.youtube.com/playlist?list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH (and slides on his web page).

I believe that the recent surge of interest in NNets just means that the machine learning community wasted many years not exploring them, in the 1996-2006 decade, mostly. There is also hype, especially if you consider the media. That is unfortunate and dangerous, and will be exploited especially by companies trying to make a quick buck. The danger is to see another bust when wild promises are not followed by outstanding results. Science mostly moves by small steps and we should stay humble.

I have no crystal ball but I believe that improving our ability to model joint distributions (either in an unsupervised way or conditioned on some input, either explicitly or implicitly through learning of good representations) is going to be crucial for future progress of deep learning towards AI-level machine understanding of the world around us.

Another easy prediction is that we need to and will make progress towards efficiently training much larger models. This involves improvements in the way we train models (the numerical optimization involved), as well as in ways to do it more efficiently computationally (e.g. through parallelization and other tricks that avoid doing the computation associated with all the parts of the network for every example).

You can find out more in my arxiv paper on "looking forward": http://arxiv.org/abs/1305.0445

[–]Sigmoid_Freud 14 points 1 year ago

Traditional (deep or non-deep) Neural Networks seem somewhat limited in the sense that they cannot keep any contextual information. Each datapoint/example is viewed in isolation. Recurrent Neural Networks overcome this, but they seem to be very hard to train and have been tried in a variety of designs with apparently relatively limited success.

Do you think RNNs will become more prevalent in the future? For which applications and using what designs?

Thank you very much for taking your time to do this!

[–]yoshua_bengio (Prof. Bengio) 16 points 1 year ago

Recurrent or recursive nets are really useful tools for modelling all kinds of dependency structures on variable-sized objects. We have made progress on ways to train them and it is one of the important areas of current research in the deep learning community. Examples of applications: speech recognition (especially the language part), machine translation, sentiment analysis, speech synthesis, handwriting synthesis and recognition, etc.

[–]omphalos 2 points 1 year ago

I'd be curious to hear his thoughts on any intersection between liquid state machines (one approach to this problem) and deep learning.

[–]yoshua_bengio (Prof. Bengio) 11 points 1 year ago*

Liquid state machines and echo state networks do not learn the recurrent weights, i.e., they do not learn the representation. Instead, learning good representations is the central purpose of deep learning. In a way, the echo-state / liquid state machines are like SVMs, in the sense that we put a linear predictor on top of a fixed set of features. The features are functions of the past sequence through the smartly initialized recurrent weights, in the case of echo state networks and liquid state machines. Those features are good, but they can be even better if you learn them!
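
The "fixed features plus linear predictor" structure can be sketched in a few lines. This is a toy illustration under stated assumptions, not a faithful echo state network: the recurrent weights are simply drawn small and uniform (a crude stand-in for the spectral-radius scaling used in the ESN literature), the readout is fitted with the plain delta rule, and all function names and constants are hypothetical.

```python
import math
import random


def make_reservoir(n_in, n_hidden, rec_scale=0.9, seed=0):
    """Draw input and recurrent weights once; they are never trained afterwards."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    bound = rec_scale / math.sqrt(n_hidden)  # assumption: keeps the recurrence small
    w_rec = [[rng.uniform(-bound, bound) for _ in range(n_hidden)]
             for _ in range(n_hidden)]
    return w_in, w_rec


def run_reservoir(w_in, w_rec, inputs):
    """Return the hidden state after each input; each state summarizes the past sequence."""
    n_hidden = len(w_rec)
    h = [0.0] * n_hidden
    states = []
    for x in inputs:
        h = [math.tanh(sum(w_in[i][j] * x[j] for j in range(len(x)))
                       + sum(w_rec[i][k] * h[k] for k in range(n_hidden)))
             for i in range(n_hidden)]
        states.append(h)
    return states


def train_readout(states, targets, lr=0.05, epochs=50):
    """Fit only the linear readout on top of the fixed features (delta rule)."""
    w_out = [0.0] * len(states[0])
    for _ in range(epochs):
        for h, y in zip(states, targets):
            err = sum(w * hi for w, hi in zip(w_out, h)) - y
            w_out = [w - lr * err * hi for w, hi in zip(w_out, h)]
    return w_out
```

Learning the representation, in the sense used above, would mean also updating `w_in` and `w_rec`, which this sketch deliberately never does.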

[–]yoshua_bengio (Prof. Bengio) 6 points 12 months ago

See the answer I already gave there: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfpboj8

[–]Noncomment 3 points 12 months ago

Did you mean recursion?

[–]omphalos 2 points 12 months ago

Thank you for the reply. Yes I understand the analogy to SVMs. Honestly I was wondering about something more along the lines of using the liquid state machine's untrained "chaotic" states (which encode temporal information) as feature vectors that a deep network can sit on top of, and thereby construct representations of temporal patterns.

[–]rpascanu 3 points 12 months ago

I would add that ESNs or LSMs can provide insights into why certain things work or don't work for RNNs. So having a good grasp of them could definitely be useful for deep learning. An example is Ilya's work on initialization (jmlr.org/proceedings/papers/v28/sutskever13.pdf), where they show that an initialization based on the one proposed by Herbert Jaeger for ESNs is very useful for RNNs as well.

They also offer quite a strong baseline most of the time.

[–]freieschaf 2 points 1 year ago

Take a look at Schmidhuber's page on RNNs. There is quite a lot of info on them, and especially on LSTMNN, an architecture of RNN designed precisely for tackling the issue of vanishing gradient when training RNNs and so allowing them to keep track of a longer context.

[–]PasswordIsntHAMSTER 13 points 1 year ago

Hi Prof. Bengio, I'm an undergrad at McGill University doing research in type theory. Thank you for doing this AMA!

Questions:

  • My field is extremely concerned with formal proofs. Is there a significant focus on proofs in machine learning too? If not, how do you make sure to maintain scientific rigor?

  • Is there research being done about the use of deep learning for program generation? My intuition is that eventually we could use type theory to specify a program and deep learning to "search" for an instantiation of the specification, but I feel like we're quite far from that.

  • Can you give me examples of exotic data structures used in ML?

  • How would I get into deep learning starting from zero? I don't know what resources to look at, though if I develop some rudiments I would LOVE to apply for a research position on your team.

[–]yoshua_bengio (Prof. Bengio) 10 points 12 months ago

There is a simple way to get scientific rigor without proofs, and it's used throughout science: it's called the scientific method, and it relies on experiments and hypothesis-testing ;-) Besides, math is getting into more deep learning papers. I have been interested for some time in proving properties of deep vs shallow architectures (see papers with Delalleau, and more recently with Pascanu). With Nicolas Le Roux I worked on the approximation properties of RBMs and DBNs. I encourage you to also look at the papers by Montufar. Fancy math there.

Deep learning from 0? There is lots of material out there, some listed on deeplearning.net:

  • My 2009 paper/book (a new one is on the way!): http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf

  • Hugo Larochelle's neural networks course & youtube videos: http://www.youtube.com/playlist?list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH (slides on his webpage)

  • Practical recommendations for training deep nets: http://arxiv.org/abs/1206.5533

  • A recent review: https://arxiv.org/abs/1206.5538

[–]PokerPirate 2 points 1 year ago

On a related note, I am doing research in probabilistic programming languages. Do you think there will ever be a "deep learning programming language" (whatever that means) that makes it easier for non-experts to write deep learning models?

[–]ian_goodfellow[S] 5 points 12 months ago

I am one of Yoshua's graduate students and our lab develops a python package called Pylearn2 that makes it relatively easy for non-experts to do deep learning:

https://github.com/lisa-lab/pylearn2

You'll still need to have some idea of what the algorithms are meant to be doing, but at least you won't have to implement them yourself.

[–]nxvd 5 points 1 year ago

It's not a programming language in the usual sense, but Theano is a pretty neat way to describe and train neural network architectures, however deep they are and whatever their characteristics. It's actually developed by people in Dr. Bengio's lab if I'm not mistaken.

[–]serge_cell 2 points 1 year ago

IMHO there definitely should be. There are several open source packages with similar functionality right now, and different research papers refer to different packages for reproducing their results. It would be great if one didn't have to install and learn a new package to reproduce a result, but could just use a ready-made cfg or script in a DL language. It would improve reproducibility too: results reproduced with different implementations are more relatable.

[–]PokerPirate 1 point 1 year ago

> There are several open source packages with similar functionality right now

links?

[–]serge_cell 3 points 1 year ago

I am mostly familiar with convolutional networks, so most of the packages here are for CNNs and autoencoders.

Fastest:

1. cuda-convnet, the most used GPGPU implementation, used in other packages too: https://code.google.com/p/cuda-convnet/ (there are also several forks on GitHub)
2. caffe: https://github.com/BVLC/caffe
3. nnForge: http://milakov.github.io/nnForge/

Based on cuda-convnet, but includes more stuff:

4. pylearn2: https://github.com/lisa-lab/pylearn2

Other stuff: http://deeplearning.net/software_links/

[–]polyguo 2 points 1 year ago

What probabilistic programming languages are you researching? Any experience with Church? I have an internship this summer with someone who does research using PPLs and it would be immensely useful to me if you could point me to resources that would allow me to get more familiar with the subject matter. Papers and actual code would be best.

[–]PokerPirate 1 point 1 year ago

Have you been to http://probmods.org? It's a pretty thorough tutorial.

[–]polyguo 2 points 1 year ago

I'm actually taking the probabilistic graphical models course on Coursera and I got a copy of Koller's book. I'm familiar with the theory, but I've yet to see mature code written in PPLs.

And, yes, I've been to the site. I'm actually going to be working with one of the authors.

[–]PokerPirate 1 point 1 year ago

> I've yet to see mature code written in PPLs

me too :)

[–]dwf 1 point 1 year ago

> Is there a significant focus on proofs in machine learning too?

Machine learning is a big field. The folks who submit to COLT would be big on proofs. Others, not as much. Empirical study counts for a lot.

[–]orwells1 1 point 12 months ago

Can't see a reply so this might help:

“There is a strong oral tradition in training neural networks so if you read the papers it will be hard to understand how to do it, really the best thing is to just spend a couple of years next to someone who does it and ask them a lot of questions. Because there are a lot of those, so, to get results there are a lot of things you need to do and there are really boring and they are really hacky, and you don’t want to write them in your papers so you don’t, and so if you try and get into the field it can still be done, and people have done it but you need to be prepared for a lot of trial and error.”

Ilya Sutskever https://vimeo.com/77050653 2013, 1:05:13

[–]wardnath 15 points 1 year ago*

Dr. Bengio, In your paper Big Neural Networks Waste Capacity you suggest that gradient descent does not work as well with a lot of neurons as it does with fewer. (1) Why do the increased interactions create worse local minima? (2) Do you think hessian free methods like in (Martens 2010) are sufficient to overcome these issues?

Thank You!

Ref: Dauphin, Yann N., and Yoshua Bengio. "Big neural networks waste capacity." arXiv preprint arXiv:1301.3583 (2013).

Martens, James. "Deep learning via Hessian-free optimization." Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.

[–]dhammack 9 points 1 year ago

I think the answer to this one is that the increased interactions just lead to more curvature (off diagonal Hessian terms). Gradient descent, as a first-order technique, ignores curvature (it assumes the Hessian is the identity matrix). So what happens is that gradient descent is less effective in bigger nets because you tend to "bounce around" minima.
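
A toy illustration of the "bouncing" effect, assuming a two-parameter quadratic whose curvatures differ by a factor of 100. This simplified sketch uses a diagonal Hessian (no off-diagonal interactions), so it shows only the mismatch between one global step size and unequal curvature; the function name and constants are illustrative.

```python
def gradient_descent_quadratic(curvatures, start, lr, steps):
    """Minimize f(w) = 0.5 * sum(c_i * w_i**2) with plain gradient descent.

    Returns the full path of iterates, starting with the initial point.
    """
    w = list(start)
    path = [tuple(w)]
    for _ in range(steps):
        grad = [c * wi for c, wi in zip(curvatures, w)]  # df/dw_i = c_i * w_i
        w = [wi - lr * g for wi, g in zip(w, grad)]
        path.append(tuple(w))
    return path


# Curvature 1 along the first axis, 100 along the second.
# With lr just below 2/100, the steep coordinate flips sign at every step
# while the flat coordinate shrinks by less than 2% per step.
path = gradient_descent_quadratic([1.0, 100.0], [1.0, 1.0], lr=0.019, steps=50)
```

A step size small enough to be stable in the steep direction is far too small for the flat one, which is the kind of curvature information a first-order method never sees.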

[–]yoshua_bengio (Prof. Bengio) 9 points 1 year ago

This is essentially in agreement with my understanding of the issue. It's not clear that we are talking about local minima, but what I call 'effective local minima', because training gets stuck (they could also be saddle points or other kinds of flat regions). We also know that 2nd order methods don't do miracles, in many cases, so something else is going on that we do not understand yet.

[–]ian_goodfellow[S] 10 points 1 year ago

Verification post: https://plus.google.com/103174629363045094445/posts/2fqbkyYULAf

[–]hf98hf43j2klhf9 7 points 1 year ago

We should try to request Yann LeCun as well; he seems to be open to the idea.

[–]Megatron_McLargeHuge 11 points 1 year ago

With the recent success of maxout and hinge activations, how relevant is the older work on RBM pretraining using various contrastive divergence tweaks? What do you think is still worth investigating about stochastic models?

How biologically plausible is maxout, and should we care?

[–]yoshua_bengio (Prof. Bengio) 4 points 12 months ago*

The older work on RBM and auto-encoders is certainly still worth further investigation, along with the construction of other novel unsupervised learning procedures.

For one thing, unsupervised procedures (and pre-training) remain a key ingredient to deal with the semi-supervised and transfer learning cases (and domain adaptation, and non-stationary data), when the number of labeled examples of the new classes (or of the changed distribution) is small. This is how we won the two 2011 transfer learning competitions (held at ICML and NIPS).

Furthermore, looking farther into the future, unsupervised learning is very appealing for other reasons:

  • take advantage of huge quantities of unlabeled data

  • learn about the statistical dependencies between all the variables observed so that you can answer NEW questions (not seen during training) about any subset of variables given any other subset

  • it's a very powerful regularizer and can help the learner to disentangle the underlying factors of variation, making it much easier to solve new tasks from very few examples

  • it can be used in the supervised case when the output variable (to be predicted) is a very high-dimensional composite object (like an image or a sentence), i.e., a so-called structured output

Maxout and other such pooling units do something that may be related to the local competition (often through inhibitory interneurons) between neighboring neurons in the same area of cortex.

[–]ian_goodfellow[S] 3 points 12 months ago

Right now pretraining does seem to be helpful for preventing overfitting in cases where there is very little labeled training data available. It no longer seems to be necessary as an optimization technique for deep networks, since we can just use the piecewise linear activation functions that are easy to optimize even for very deep networks.

Probabilistic models are still useful for tasks like classification with missing input (because they can reason about the missing inputs), or tasks where the goal is to repair damaged inputs (example: photo touchup) or infer the values of missing inputs, or where the task is just to generate realistic samples of data. It can also often be useful to have a probabilistic model that you use as part of a larger system. For example, if you want to use a neural net as part of an HMM, the HMM requires that its observation and transition models provide real probabilities.

Rectified linear units were partially motivated by biological plausibility concerns, because some neuroscientific evidence suggests that real neurons rarely operate in the regime where they reach their maximum firing rate.

I'm the grad student who came up with maxout, and I didn't have any biological plausibility concerns in mind when I came up with it. After I started using maxout for machine learning, another of Yoshua's grad students, Caglar Gulcehre, told me that there is some neuroscientific evidence for a function similar to maxout but with an absolute value being used in the deeper layers of the cortex. I don't know much about this myself. One thing about maxout that makes it a little bit difficult to explain in biological terms is the fact that maxout units can take on negative values. This is a bit awkward for a biological neuron since it's not possible to have a negative firing rate. But maybe biological neurons could use some average firing rate to indicate 0, and indicate negative values by firing less often than that.
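
For readers who have not seen it, a single maxout unit is just the maximum over a few affine functions of the input. The sketch below is a minimal illustration under that definition; the function name, argument layout, and example weights are assumptions made here for clarity.

```python
def maxout(x, weights, biases):
    """One maxout unit: the maximum over k affine pieces w_i . x + b_i."""
    return max(
        sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


# Two pieces with slopes 1 and 2: the output is negative for negative input.
negative_output = maxout([-1.0], [[1.0], [2.0]], [0.0, 0.0])

# Slopes +1 and -1 give the absolute value of the input.
absolute_value = maxout([-3.0], [[1.0], [-1.0]], [0.0, 0.0])

# Slopes 1 and 0 recover a rectified linear unit, max(x, 0).
rectified = maxout([-3.0], [[1.0], [0.0]], [0.0, 0.0])
```

The first example shows the negative-output case discussed above; the other two show that the absolute value and the rectifier are special cases of the same piecewise linear form.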

My main interest is in engineering intelligent systems, not necessarily understanding how the human brain works. Because that's what my interest is, I am not very concerned with biological plausibility. Right now it seems easier to make progress in machine learning just by working from first principles than by reverse-engineering the brain. We don't have good enough sensor equipment to extract the kind of information from the brain that we would need to make reverse engineering it convenient.

[–]jkyle1234 13 points 1 year ago*

Hello Prof. Bengio, thank you for the AMA. What recommendations would you have for someone without a PhD who wants to get started with deep learning?

[–]yoshua_bengio (Prof. Bengio) 4 points 12 months ago

See some of the pointers I put above: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq7a3s

[–]32er234 1 point 12 months ago

Something wrong with the link

[–]uber_kerbonaut 1 point 12 months ago

Maybe he's referring to this one: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfpn5yp

[–][deleted] 12 points 1 year ago

Dear Yoshua, thanks for doing this!

You are, to my knowledge, the only ML academic to publicly (and wonderfully!) speculate about the sociocultural perspectives afforded by the vantage of deep representation learning. In your fascinating article "Culture vs Local Minima" you touch on many important things, some of which I'm very curious about:

  • You describe how individuals learn by being immersed in culture. We both agree that they don't always learn very wholesome things. If you were king of the world, and you could prescribe a set of concepts that should be a part of every childhood learning trajectory, what would those be and to what end?

  • A corollary of "cultural immersion" is that the specific process of learning is not evident to the learner, the world simply "is" in a particular way. The author David Foster Wallace phrased this phenomenon as akin to fish having to figure out what water is. In your opinion, is this phenomenon an experiential byproduct of the neural architecture, or does it confer some learning benefit?

  • Why do you think that cultural trends become entrenched and cause their learners to fight to stay in (what could be argued to be) local optima - like e.g. the conflicts between various religious institutions and Enlightenment philosophy, or patriarchal society vs the suffragettes, etc.? Is this a case of very pernicious parameters, or is there some benefit to the learners in question?

  • Do you have an opinion on such concepts as mindfulness meditation, and if so, how do you think they relate to the exploration of "idea space"?

Again, thanks a lot for taking the time. In the space of human ideas you are a trailblazer, and we are immensely richer for your presence!

[–]yoshua_bengio (Prof. Bengio) 9 points 1 year ago

I am not a social scientist or a psychologist, so my opinions on these subjects should be taken as such. My opinion is that many learners stay entrenched in their beliefs because these beliefs have become part of their identity, their definition of who they are, and it's harder and scary to change that. There may also be a more computational aspect related to the notion of effective local minima (the optimization getting stuck). I believe that a lot of what our brain does is try to bring coherence to all of our experience, in order to construct a better model of the world. Mathematically, this may be related to the problem of inference, by which a learner searches for plausible explanations (latent variables) of the observed data. In stochastic models, inference is done by a form of stochastic exploration of configurations (and a Markov chain really looks like a series of free associations). Meditation and other time spent not doing anything directed but just thinking may well be useful to help us explore in this way. Sometimes it clicks, i.e., we find an explanation that fits well with many things. This is also how scientific ideas often seem to emerge (for me at least).

[–]yoshua_bengio (Prof. Bengio) 10 points 1 year ago

Verification post: https://plus.google.com/112504130537129706790/posts/eqdBAysAyqR

[–]vondragon 8 points 1 year ago

I live in Montreal, working in the technology startup world. Very interested in your work, thank you for doing this AMA Professor Bengio. I worked hard to filter down to one question:

There seems to be a lot of disinterest from Machine Learning specialists and academics in general towards ML competitions hosted by Kaggle and the like. I recognize the odds of winning are quite low, making the return on the investment of your time even worse, but it would seem to be even worse for the ML enthusiasts who are flocking to participate. It would seem a few hours from an ML domain expert could be really beneficial on the right open datasets. Can you imagine an open, collaborative approach to competitive machine learning where experts and enthusiasts work effectively together?

[–]EJBorey 10 points 1 year ago

Here's an example where experts won a Kaggle contest: http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/ And here, where they won the Netflix Prize: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

But I think the reason why they don't work on the problems is that the bad ML researchers won't win and therefore not publish, while the good ones would get paid millions of dollars by companies to answer the same questions! Why do it for free?

[–]vondragon 6 points 1 year ago

I would estimate that a majority of the time ML 'experts' do win the competitions, but they might not be recognized experts.

When a "non-expert" does win, they typically make up for their lack of domain-specific ML knowledge by being an expert in a related domain like stats, math, programming, etc.

I think the dataset is an important factor to consider here. Is it possible for an ML researcher to spend an insignificant amount of their time applying some of their knowledge to building the model, at which point a larger crowd of less specialized people can compete on the remaining work?

[–]PasswordIsntHAMSTER 2 points 1 year ago

I'm in Montreal too, where do you work? o.O

[–]vondragon 1 point 1 year ago

Near Sherbrooke =D

[–]dwf 2 points 12 months ago

ML researchers are usually trying to push the methodological envelope, but that's often not required to solve some arbitrary domain problem. Usually dealing with the mountain of annoyances of real-world data sources is what takes up the majority of the time, and then a random forest, boosted tree ensemble or SVM will do an acceptable job (especially compared to the usually pitiful posted baseline). Doing really, really well may require some finesse but also a large time investment, that won't typically be rewarded in an academic incentive structure (as far as being rewarded monetarily, there's also something seriously wrong with the economics of Kaggle, as is well-articulated by this lightning talk; anyone who's any good and has a clue what they're worth won't bother).

In short, winning competitions is usually only useful to an academic if it demonstrates a particular research-related point.

[–]marvinalone 9 points 1 year ago

What's your opinion of Solomonoff Induction and AIXI? I'm just starting to read up on the topic, and I can't quite decide whether it's serious work, or a fringe theory by a small group of people who all cite each other.

[–]dylanbyte 2 points 1 year ago

I am interested in this also.

[–]eaturbrainz 2 points 1 year ago

Not Bengio, but reasonably well-versed in this specific topic.

It's serious work by theoreticians. You need a freaking Turing oracle to make those algorithms work, and all the relevant proofs are about global optimality in the presence of that Turing oracle, not about how good a learning/error rate you're going to get out of a finite sample with limited computing power (as you're going to need to build real algorithms).

That said, Schmidhuber and Hutter (who invented AIXI) have publication and competition records like nobody fucking else.

[–]dwf 2 points 12 months ago

I'll just say that while the IDSIA group's competition record and benchmark results are impressive, it's important to compare apples to apples. Comparing a method that uses elastic distortions and other dataset augmentation strategies against one that does not tells you nothing about either method; it's been known for decades that more data helps, and that you can sometimes acquire more data by artificially augmenting a given training set with distortions. It's important to not conflate impressive engineering with scientific novelty.

[–]EJBorey 9 points 1 year ago

We have all been hearing about the performance achievable via deep learning (in academic journals such as the New York Times, no less!). I've also heard that it's difficult for non-experts to get these techniques to work: Ilya Sutskever says that there is a weighty oral tradition about the design and training of deep networks and that the best way to learn how is to work for years with someone who is already an expert (source: http://vimeo.com/77050653).

I studied machine learning but not deep learning. Going back to grad school is not really an option for me. How can I learn how to design, build, and train deep neural networks without access to the oral tradition? Could you write it down for us somewhere?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

See the pointers I put above: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq6wf0

http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq7a3s

[–]EJBorey 1 point 12 months ago

The second link is broken.

Do Hugo Larochelle's videos answer the questions here: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq4rvi ?

[–]dylanbyte 2 points 1 year ago

Related to this: would it be possible to use a Bayesian approach to try and encode some of this folk-lore knowledge?

What is the road-map to making deep learning accessible to all?

Thank you.

[–]yoshua_bengio (Prof. Bengio) 8 points 12 months ago

Hyper-parameter optimization has already been found to be a useful way to (partially) automate the search for good configurations in deep learning.

The idea is to automate the process of selecting the knobs, bells and whistles of machine learning algorithms, and especially of deep learning algorithms. We call such "knobs" hyper-parameters. They are different from the parameters that are learned during training, in that they are typically set by hand, by trial and error, or through a dumb and extensive exploration of all combinations of values (called "grid search"). Deep learning and neural networks in general involve many more such knobs to be tuned, and that was one of the reasons why many practitioners stayed far from neural networks in the past. It gave the impression of deep learning as a "black art", and it remains true that strong expertise helps a lot, but the research on hyper-parameter optimization is helping to move towards a more fully automated deep learning.

The idea of optimizing hyper-parameters is old, but had not had as much visible success until recently. One of the main early contributors to this line of work (before it was applied to machine learning hyper-parameter optimization) is Frank Hutter (along with collaborators), who devoted his PhD thesis (2009) to algorithms for optimizing knobs that are typically set by hand in software systems in general. My former PhD student James Bergstra and I worked on hyper-parameter optimization a couple of years ago, and we first proposed a very simple alternative to standard methods (called "grid search"). This alternative, called "random sampling", works very well and is very easy to implement.

http://jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
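To make the contrast between grid search and random sampling concrete, here is a minimal sketch in plain Python. It is not code from the paper: the objective `toy_validation_error`, the two hyper-parameter names, and the value ranges are all made up for illustration. With the same budget of 9 trials, grid search only ever tries 3 distinct learning rates, while random sampling tries 9.

```python
import itertools
import random


def toy_validation_error(learning_rate, num_hidden):
    """Hypothetical objective in which the learning rate matters far more."""
    return (learning_rate - 0.03) ** 2 + 1e-9 * (num_hidden - 300) ** 2


def grid_search(objective, grid):
    """Evaluate every combination of the listed values."""
    names = sorted(grid)
    trials = [dict(zip(names, values))
              for values in itertools.product(*(grid[n] for n in names))]
    return min(trials, key=lambda cfg: objective(**cfg)), len(trials)


def random_search(objective, samplers, num_trials, seed=0):
    """Draw every hyper-parameter independently for each trial."""
    rng = random.Random(seed)
    trials = [{name: draw(rng) for name, draw in samplers.items()}
              for _ in range(num_trials)]
    return min(trials, key=lambda cfg: objective(**cfg)), len(trials)


grid = {"learning_rate": [0.001, 0.01, 0.1], "num_hidden": [100, 300, 1000]}
samplers = {
    # log-uniform draw between 1e-3 and 1e-1
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, -1),
    "num_hidden": lambda rng: rng.randint(100, 1000),
}

best_grid, n_grid = grid_search(toy_validation_error, grid)
best_rand, n_rand = random_search(toy_validation_error, samplers, num_trials=9)
print("grid:  ", best_grid, "after", n_grid, "trials")
print("random:", best_rand, "after", n_rand, "trials")
```

Which of the two wins on a given run depends on the objective and the seed; the sketch only shows how differently the two strategies spend the same budget.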

We then proposed applying to deep learning the kinds of algorithms Hutter had developed for other contexts, called sequential optimization. This was published at NIPS'2011, in collaboration with another PhD student who devoted his thesis to this work, Remi Bardenet, and his supervisor Balazs Kegl (previously a prof in my lab, now in France).

http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf

This work has been followed up very successfully by researchers at U. Toronto, including Jasper Snoek (then a student of Geoff Hinton), Hugo Larochelle (who did his PhD with me) and Ryan Adams (now a faculty at Harvard) with a paper at NIPS'2012 where they showed that they could push the state-of-the-art on the ImageNet competition, helping to improve the same neural net that made Krizhevsky, Sutskever and Hinton famous for breaking records in object recognition.

http://www.dmi.usherb.ca/~larocheh/publications/gpopt_nips.pdf

Snoek et al. released software called 'spearmint' that has since been used by many researchers, and I found out recently that Netflix has been using it in their new work aiming to take advantage of deep learning for movie recommendations:

http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html

[–]james_bergstra 1 point 12 months ago*

Plug for Bayesian Optimization and Hyperopt:

FWIW my take is that Bayesian Optimization + Experts designing the search spaces for SMBO algorithms is the way to deal with this: e.g. other post and ICML paper on tuning ConvNets

The Hyperopt Python package provides SMBO for ConvNets, NNets, and (soon) a range of classifiers from scikit-learn (hyperopt-sklearn).

Sign up for Hyperopt-announce to get alerts about new stuff such as upcoming Gaussian-Process and regression-tree-based SMBO search algorithms similar to Jasper Snoek's Spearmint and Frank Hutter's SMAC software.

[–]EJBorey 2 points 1 year ago

Actually, I wasn't asking about the Bayesian optimization work that Jasper Snoek et al. are doing, because I don't think it will be possible to automate away all human judgement in the design of these things. Rather, I wanted to know how to quickly acquire the necessary intuition without postdoc-ing in Bengio's, Hinton's, or LeCun's labs.

Deep learning will never be practical if there are only 10 people on the planet who can get it to work! Is there a way to quickly become one of the savants?

[–]orwells1 1 point 12 months ago*

Hello, same here. I fit the bill of their intended PhD students (according to Y. LeCun's page, awesome math + coder), but wanted to avoid more PhDs/post-docs. I went through a reasonable number of papers, but in most of them either explanations are missing or the authors later comment online on the "human in the loop optimization"/"tricks of the trade"/"black magic". I'm not sure if I should be investing much more of my time alone, if the full knowledge is not there. Is it? Thanks a lot for doing this!

[–]serge_cell 9 points 1 year ago

Hi Prof. Bengio, There has been some work on applying "higher" math (algebraic/tropical geometry, category theory) to deep learning. Notably, John Healy claimed several years ago to have improved a neural net (ART1) with category theory. What's your opinion on this approach? Will it remain only a toy model for the foreseeable future, or is there some promise in it?

[–]yoshua_bengio (Prof. Bengio) 4 points 12 months ago

See the above suggestions: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq7a3s

Regarding algebraic/tropical geometry, look at the work of Morton & Montufar.

[–]polyguo 2 points 1 year ago

Source? I'm extremely interested in the intersection between Programming Language Theory and Machine Learning. This seems to be right there.

[–]serge_cell 2 points 1 year ago

Healy:
http://www.ece.unm.edu/~mjhealy/
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.98.6807
Tropical geometry: "Tropical geometry of statistical models"
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.242.9890

[–]n_dimensional 8 points 1 year ago

Dear Prof. Bengio,

I am about to finish my PhD in computational neuroscience and I am very interested in the "gray area" between neuroscience and machine learning.

What aspects of brain computation do you think are (or will be) most relevant for machine learning?

If you could know the answer to one question about how the brain computes information, what would that be?

Thanks!

[–]yoshua_bengio (Prof. Bengio) 6 points 12 months ago

Understanding how learning proceeds in brains is clearly the subject most relevant to machine learning. We don't have a clue of how brains can learn in the kinds of efficient ways that we are able to implement in artificial neural networks, so this could be really important, and a place where information could flow both ways between machine learning research and computational neuroscience.

[–]exellentpossum 21 points 1 year ago

When asked about sum product networks, one of the original Google Brain team members told me he's not interested in tractable models.

What's your opinion about sum product networks? They made a big splash at NIPS one year and now they've disappeared.

[–]yoshua_bengio (Prof. Bengio) 6 points 1 year ago

There are many kinds of intractabilities that show up in different places with various learning algorithms. The more tractable, the easier to deal with in general, but it should not be at the price of losing crucial expressive power. I don't have a sufficiently clear mental fix on the expressive power of SPNs to know how much we lose (if anything) through this parametrization of a joint distribution. In any case, all the interesting models that I know of suffer from intractability of minimizing the training criterion wrt the parameters (i.e. training is fundamentally hard, at least in theory). SVMs and other related kernel machines do not suffer from that problem, but they may suffer from poor generalization unless you provide them with the right feature space (which is precisely what is hard, and what deep learning is trying to do).

[–]celestec 3 points 1 year ago

Hi exellentpossum, I am studying some machine learning on my own, and have not yet come across "tractable models." What exactly is a tractable model? (Searching on my own didn't help much...) Sorry if this is a dumb question.

[–]exellentpossum 3 points 1 year ago

In the context of sum product networks, it means that inference is tractable or doesn't suffer from the exponential growth in computational cost when you add more variables.

This comes at a price, though: sum product networks can only represent certain types of distributions. More specifically, probability distributions whose parameterization can be expressed as a product of factors (when multiplied out this creates a much larger polynomial). I'm not sure of the exact scope of distributions this encompasses, but it does include hierarchical mixture models.

[–]Scrofuloid 3 points 1 year ago*

Not quite. All graphical models can be represented as products of factors, and deep belief networks and such are special cases of graphical models. The cost of exact inference in graphical models is generally exponential in the treewidth of the graph. So, in conventional graphical model wisdom, low-treewidth graphical models were considered 'tractable', and high-treewidth models were 'intractable', so you'd have to use MCMC or BP or other approximate algorithms to solve them.

Any graphical model can be compiled into an SPN-like structure (an arithmetic circuit, or AC). The problem is that in the worst-case, the resulting circuit can be exponentially large. So even though inference is still linear in the size of the circuit, it's potentially exponential in the size of the original graphical model. But it turns out certain high-treewidth graphical models can still be compiled into compact circuits, so you can still do efficient inference on them. This means that there are certain high-treewidth graphical models on which inference is tractable -- kind of a surprise to the graphical models community.

You can think of ACs and SPNs as a way to compactly represent context-specific independences. They can compactly represent distributions that would result in high-treewidth graphical models if you tried to represent them in the usual graphical models way. The difference between ACs and SPNs is that ACs are compiled from Bayesian networks, as a means of performing inference on them. SPNs directly use the circuit to represent a probability distribution. So instead of training a graphical model and hoping you can compile it into a compact circuit (AC), you directly learn a compact circuit that fits your training data (SPN).
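As a rough illustration of "inference is linear in the size of the circuit", here is a tiny hand-built SPN over two binary variables in plain Python. The structure, the weights, and the class names (`Leaf`, `Sum`, `Product`) are invented for this sketch and are not taken from Poon & Domingos' paper. A joint probability, a marginal, and the normalizer are each computed with a single bottom-up pass; marginalizing a variable just means letting all of its indicator leaves evaluate to 1.

```python
class Leaf:
    """Indicator leaf: 1 if `var` equals `value` or `var` is marginalized out."""

    def __init__(self, var, value):
        self.var, self.value = var, value

    def evaluate(self, evidence):
        observed = evidence.get(self.var)  # None means "sum this variable out"
        return 1.0 if observed is None or observed == self.value else 0.0


class Sum:
    """Weighted sum (mixture) of child nodes."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children  # list of (weight, node)

    def evaluate(self, evidence):
        return sum(w * c.evaluate(evidence) for w, c in self.weighted_children)


class Product:
    """Product of child nodes over disjoint sets of variables."""

    def __init__(self, children):
        self.children = children

    def evaluate(self, evidence):
        result = 1.0
        for child in self.children:
            result *= child.evaluate(evidence)
        return result


def bernoulli(var, p_true):
    """Univariate distribution over a binary variable, written as a sum node."""
    return Sum([(p_true, Leaf(var, 1)), (1.0 - p_true, Leaf(var, 0))])


# A two-component mixture of fully factorized distributions.
spn = Sum([
    (0.6, Product([bernoulli("x1", 0.9), bernoulli("x2", 0.2)])),
    (0.4, Product([bernoulli("x1", 0.1), bernoulli("x2", 0.7)])),
])

joint = spn.evaluate({"x1": 1, "x2": 0})   # P(x1=1, x2=0)
marginal_x1 = spn.evaluate({"x1": 1})      # P(x1=1), x2 summed out
normalizer = spn.evaluate({})              # sums over everything
print(joint, marginal_x1, normalizer)
```

Each query touches every node once, so the cost grows with the number of nodes in the circuit rather than with the number of joint configurations.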

[–]exellentpossum 1 point 1 year ago

I agree, SPNs can represent any probability distribution. But there is a certain set which can be represented efficiently. Can you be more specific about this set of distributions which can take advantage of the factorization property of SPNs (a distribution with a reasonably sized circuit)?

[–]Scrofuloid 1 point 1 year ago

Hm. I don't know if there's a one-line way to characterize that set of distributions. It includes all low-treewidth graphical models, and some high-treewidth distributions with context-specific independences. Poon & Domingos' paper had a section relating SPNs to various other representations.

[–]BeatLeJuce 7 points 1 year ago

  1. Why do Deep Networks actually work better than shallow ones? We know a 1-Hidden-Layer Net is already a Universal Approximator (for better or worse), yet adding additional fully connected layers usually helps performance. Were there any theoretical or empirical investigations into this? Most papers I read just showed that they WERE better, but there were very few explanations as to why, and where there was an explanation, it was mostly speculation. What is your view on the matter?

  2. What was your most interesting idea that you never managed to publish?

  3. What was funniest/weirdest/strangest paper you ever had to peer-review?

  4. If I read your homepage correctly, you teach your classes in French rather than English. Is this a personal preference or mandated by your University (or by other circumstances)?

[–]yoshua_bengio (Prof. Bengio) 6 points 12 months ago

Being a universal approximator does not tell you how many hidden units you will need. For arbitrary functions, depth does not buy you anything. However, if your function has structure that can be expressed as a composition, then depth could help you save big, both in a statistical sense (fewer parameters can express a function that has a lot of variations, and so fewer examples are needed to learn it) and in a computational sense (fewer parameters = less computation, basically).

I teach in French because U. Montreal is a French-language university. However, three quarters of my graduate students are non-francophones, so it is not a big hurdle.

[–]rpascanu 1 point 12 months ago

Regarding 1, there is some work in this direction. You can check out these papers:

http://arxiv.org/abs/1312.6098 (about rectifier deep MLPs),

http://arxiv.org/abs/1402.1869 (about deep MLPs with piecewise-linear activations),

RBM_Representational_Efficiency.pdf,

http://arxiv.org/abs/1303.7461.

Basically, the universal approximation theorem says that a one-hidden-layer MLP can approximate any function if you allow yourself an arbitrarily large number of hidden units, which in practice one cannot do. One advantage of deep models over shallow ones is that they can be (exponentially) more efficient at representing certain families of functions (arguably the families of functions we actually care about).
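A small sketch of what "exponentially more efficient" can look like for piecewise-linear networks. The construction below (repeatedly composing a two-unit ReLU "tent" layer) is a standard textbook-style example chosen for illustration; it is not taken from the papers linked above, and the function names are made up. Each extra layer adds 2 ReLU units but doubles the number of linear pieces, whereas a one-hidden-layer ReLU net on a scalar input with n units has at most n + 1 pieces.

```python
def relu(x):
    return max(0.0, x)


def tent_layer(x):
    """One hidden layer with two ReLU units; maps [0, 1] onto [0, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_net(x, depth):
    """Compose the same two-unit layer `depth` times."""
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_pieces(depth):
    """Count linear pieces on [0, 1] by counting slope-sign changes.

    This works here because consecutive pieces of the composed tent map
    alternate between rising and falling. The dyadic grid contains every
    breakpoint and is exact in floating point.
    """
    steps = 2 ** (depth + 2)
    ys = [deep_net(i / steps, depth) for i in range(steps + 1)]
    rising = [b > a for a, b in zip(ys, ys[1:])]
    return 1 + sum(r1 != r2 for r1, r2 in zip(rising, rising[1:]))


for depth in (1, 2, 3, 6):
    print(depth, "layers,", 2 * depth, "ReLU units ->",
          count_linear_pieces(depth), "linear pieces")
```

With 6 layers (12 units) the function has 64 pieces; a single hidden layer would need at least 63 units to match that count.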

[–]shanwhiz 6 points 1 year ago

We have seen deep learning work really well for image/video/sound. Do you foresee it working for text classification as well? Most papers that have tried text/document classification using deep learning have not done better than the conventional SVM/Bayes. What are your thoughts on this?

[–]yoshua_bengio (Prof. Bengio) 9 points 1 year ago

I predict that deep learning will have a big impact in natural language processing. It has already had an impact, in part due to an old idea of mine (from NIPS'2000 and a 2003 paper in JMLR): represent words by a learned vector of attributes, learned so as to model the probability distribution of sequences of words in natural language text. The current challenge is to learn distributed representations for sequences of words, phrases and sentences. Look at the work of Richard Socher, which is pretty impressive. Look at the work of Tomas Mikolov, who beat the state of the art in language models using recurrent networks and who found that these distributed representations magically capture some form of analogical relationships between words. For example, if you take the representation for Italy minus the representation for Rome, plus the representation for Paris, you get something close to the representation for France: Italy - Rome + Paris = France. Similarly, you get that King - Man + Woman = Queen, and so on. Since the model was not trained explicitly to do these things, this is really amazing.
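The vector arithmetic behind "Italy - Rome + Paris = France" can be sketched as follows. The vectors here are hand-made toy vectors with interpretable dimensions, not learned embeddings, and the `analogy` helper is a hypothetical name; real word vectors are learned from text and the match is only approximate.

```python
import math

# Hand-made toy vectors, NOT learned embeddings.
# Dimensions: [is_country, is_capital, italy, france, germany]
vectors = {
    "Italy":   [1, 0, 1, 0, 0],
    "Rome":    [0, 1, 1, 0, 0],
    "France":  [1, 0, 0, 1, 0],
    "Paris":   [0, 1, 0, 1, 0],
    "Germany": [1, 0, 0, 0, 1],
    "Berlin":  [0, 1, 0, 0, 1],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("Italy", "Rome", "Paris"))   # nearest remaining word to the result
```

Subtracting Rome from Italy removes the "Italy-ness" and leaves a country-minus-capital direction; adding Paris then lands on the vector for France.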

[–]hapagolucky 10 points 1 year ago

I see more and more pop media articles extolling deep learning as a panacea that will make AI a reality (Wired is especially guilty of this). Given the AI winters of the 1970's and 1980's that arose from overhyped expectations, what can deep learning and ML researchers and advocates do to mitigate this from happening again?

[–]yoshua_bengio (Prof. Bengio) 4 points 12 months ago

Stick to the scientific ways of demonstrating advances (which are often lacking in companies branding themselves as doing deep learning). Avoid overselling. Stay humble while not losing the motivation associated with the long-term vision that brought us here in the first place.

[–][deleted] 7 points 1 year ago

Hi Bengio. I'm a masters candidate in robotics, mostly doing reinforcement learning mushed together with some ML regression methods for the identification of interesting value functions and state space representations.

How is your work life balance? Do you have fun? What sorts of things do you do to unwind?

I'm considering doing a PhD, but I literally feel like just getting a part-time job and doing independent research, because the academic environment can be pretty stifling.

Also, Montreal seems really fun!

J

[–]yoshua_bengio (Prof. Bengio) 16 points 1 year ago

Life balance. That is tough. Many prominent scientists will tell you the same story. My inclination is to work as much as I can: that is probably part of the reason for my early success, but this may threaten my health and personal life. We live in an environment which puts so much pressure on us that it is easy to forget that we are humans and we need breaks and to take care of our body (I have some health issues that I cannot just ignore) and our relationships with other humans. Some kind of self-discipline helps, but I found that what works best is to cultivate what is rewarding and pleasurable and at the same time is good for me and my physical and emotional well-being. For example I like very much to walk (many ideas come!), not to mention eating healthily and enjoying a romantic relationship based on authenticity and where I can really be myself.

Oh, and yes, Montreal IS fun ;-)

The advantage of academia is that you can focus on research and that you can benefit enormously from the interactions with other researchers. Research is a collective enterprise. This is NOT like what you tend to see in science-fiction movies. Never forget that!

[–][deleted] 1 point 12 months ago

This is really refreshing to hear!

I have been struggling with balance as well. I think I should find my balanced way of being a scientist as well, and find a supervisor who wants to be my long term colleague and friend - not just a pedantic sort of guide and disciplinary figure. Perhaps giving up on academia is the easy way out. Perhaps what I really need to do is make more inspirational friends, and help join and build the community I want to be a part of.

Thanks so much for the candid response! It's very eye opening. I hope you keep being awesome and inspiring people like me! (but not so much that we keep losing so much sleep on our work :p)

[–]Derpscientist 6 points 1 year ago*

Dr Bengio,

I'd like to thank you for the amazing research and software (Theano, Pylearn2) that your lab has contributed.

What are your feelings on Hinton and LeCun moving to industry?

What about academia and publishing your research is more valuable than the floating point overflow of money you could make at private companies?

Are you nervous that machine learning will go the way of time-series analysis, where a lot of advanced research takes place behind closed doors because the intellectual property is so valuable?

Given the recent advancements in training discriminative neural networks, what role do you envision generative neural networks play in the future?

[–]yoshua_bengio (Prof. Bengio) 8 points 12 months ago

I think that with Hinton & LeCun in industry, there will be more rapid advance in applying deep learning to really interesting and large-scale problems. The down side may be a temporarily reduced offer in terms of supervising new graduate students for deep learning. However, there are many young faculty who are at the forefront of deep learning research and who are eager to take new strong students. And the fact that deep learning is being used heavily in industry means that more students get to know about the field and are excited to jump into it.

Personally, I prefer the freedom of academia over more zeros in my salary. See also what I wrote above: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfpbc1g

I believe that a lot of research will continue to happen in academia and that in the large industrial labs the incentive to publish will remain high.

I think that generative networks are very important for the future. See what I wrote above about unsupervised learning (the two are not synonymous, but often come together, especially since we found the generative interpretation of auto-encoders; see the work with Guillaume Alain, http://arxiv.org/pdf/1305.6663.pdf):

http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq7v4v

[–]quaternion 1 point 1 year ago

Could you provide additional info on who and what you are referring to with time series analysis?

[–]tryolabs_feco 9 points 1 year ago*

Hi Yoshua, very excited about this AMA, thank you for your time. I have a few questions:
- What are the biggest challenges in ML nowadays?
- What are the most interesting and/or creative ways you have seen people/businesses using ML?
- What does the future of Machine Learning look like?

[–]freieschaf 4 points 1 year ago

Last year I did my undergrad thesis on NLP using probabilistic models and neural networks partly inspired by your work. I became interested and at that point I considered doing further work on NLP. Currently I am pursuing an MSc degree taking several related courses.

But, after several months, I haven't found NLP to be as motivating as I was expecting it to be; research in this area seems to be a little stagnant, from my limited point of view. What do you think are some challenges that are moving, or are going to move, this field forward?

Thanks for taking the time to answer some questions here!

[–]yoshua_bengio (Prof. Bengio) 8 points 1 year ago

I believe that the really interesting challenge in NLP, which will be the key to actual "natural language understanding", is the design of learning algorithms that will be able to learn to represent meaning. For example, I am working on ways to model sequences of words (language modeling) or to translate a sentence in one language into a corresponding one in another language. In both of these cases we are trying to learn a representation of the meaning of a phrase or sentence (not just of a single word). In the case of translation, you can think of it like an auto-encoder: the encoder (that is specialized to French) can map a French sentence into its meaning representation (represented in a universal way), while a decoder (that is specialized to English) can map this to a probability distribution over English sentences that have the same meaning (i.e. you can sample a plausible translation). With the same kind of tool you can obviously paraphrase, and with a bit of extra work, you can do question answering and other standard NLP tasks.

We are not there yet, and the main challenges I see have to do with numerical optimization (it is difficult not to underfit neural networks when they are trained on huge quantities of data). There are also more computational challenges: we need to be able to train much larger models (say 10000x bigger), and we can't afford to wait 10000x more time for training. Parallelizing is not simple but should help.

All this will of course not be enough to get really good natural language understanding. To do this well would basically allow a system to pass some Turing test, and it would require the computer to understand a lot of things about how our world works. For this we will need to train such models with more than just text. The meaning representation for sequences of words can be combined with the meaning representation for images or video (or other modalities, but image and text seem the most important for humans). Again, you can think of the problem as translating from one modality to another, or of asking whether two representations are compatible (one expresses a subset of what the other expresses). In a simpler form, this is already how Google image search works. And traditional information retrieval also fits the same structure (replace "image" by "document").

[–]akshayxyz 1 point 12 months ago

I am not from academia, but ever since I started following machine learning, I keep getting interesting ideas/problems to solve. Here is one I had a few years back.

Take simple math word problems, e.g. simple ratio/proportion, rate/motion, age, give/take etc. They can (have to) be translated into a bunch of constants, unknown(s) and math relations/concepts, eventually to find some unknown(s). Everyone who understands the concepts will come up with similar equations, and definitely one correct answer. You can view it as an NLP problem. How to solve it? Well, I don't know; maybe by first trying to extract basic concepts/relations from standard (and simple) word problems?

Thinking aloud: you may start by doing something like "part of (math) speech" tagging, or get some labeled data (problem -> math equation) and see if you can find some hidden factors/relations defining the translations...

[–]deeperredder 4 points 1 year ago*

While deep nets have helped move the state of the art forward in natural language text understanding, the improvements there haven't really been significant. Where do you think significant progress can come from in that field?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

I do think that significant progress will come in the area of natural language processing, most importantly, natural language understanding. Progressively, though (because full understanding is essentially AI-level understanding of the world around us). See my previous answer:

http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfpje92

[–]CyberByte 11 points 1 year ago

What will be the role of deep neural nets in Artificial General Intelligence (AGI) / Strong AI?

Do you believe AGI can be achieved (solely) by further developing these networks? If so: how? If not: why not, and are they still suitable for part of the problem (e.g. perception)?

Thanks for doing this AMA!

[–]davidscottkrueger 6 points 12 months ago

Hi! My name's David Krueger; I'm a Master's student in Bengio's lab (LISA).

My response is: it is not clear what their role will be. AGI may be theoretically achievable solely by developing NNs (especially if we include RNNs), but this is not how it will actually take place.

What incompetentrobot said is literally false, but there is a kernel of truth, which is that Deep Learning (so far) just provides a set of methods for solving certain well-defined types of general Machine Learning problems (such as function approximation, density estimation, sampling from complex distributions, etc.).

So the point is that the contributions of the Deep Learning community haven't been about solving fundamentally new kinds of problems, but rather finding better ways to solve fundamental problems.

[–]willis77 8 points 1 year ago

Have you observed practical applications where deep learning succeeds but traditional ML fails? i.e. not simply improving the state of the art on an image benchmark by X%, but a case where an intractable problem is made tractable, solely via deep learning?

[–]yoshua_bengio (Prof. Bengio) 9 points 12 months ago

There is a constructed task on which all the traditional black-box machine learning methods that were tried failed, and where some deep learning variants work reasonably well (and where guiding the hidden representation completely nails the task, showing the importance of looking for algorithms that can discover good intermediate representations that disentangle the underlying factors). Note that many deep learning approaches also failed, so this is interesting. See http://arxiv.org/abs/1301.4083. What's particular about this task is that it is the composition of two much easier tasks (detecting objects, performing a logical operation on the result), i.e., it intrinsically requires more depth than a simple object recognition task.

[–]SnowLong 2 points 1 year ago*

I believe no one had a commercially deployed system that could search untagged images until deep convolutional nets hugely improved the state of the art on the ImageNet benchmark. It took less than half a year for Google to implement search in personal galleries after promising results were shown. So in a way traditional methods failed: none were good enough to actually put into production...

[–]Should_I_say_this 7 points 1 year ago

Can you describe what you are currently researching, first by bringing us up to speed on the current techniques used and then what you are trying to do to advance that?

[–]SnowLong 8 points 1 year ago

I think your question was answered by Yoshua here:

Deep Learning of Representations: Looking Forward

Yoshua Bengio

arXiv:1305.0445v2 [cs.LG] 7 Jun 2013

[–]Should_I_say_this 1 point 1 year ago

This is excellent thanks!

[–]dwf 4 points 12 months ago

Following on work Ian and I did on maxout, I recently did some work empirically interrogating how and why dropout works, focusing on the rectified linear case. More recently I've been working on hyperparameter optimization.

[–]exellentpossum 3 points 1 year ago

It would be cool if members from Bengio's group could also answer this (like Ian).

[–]rpascanu 7 points 12 months ago

I've done some work lately on the theory side (showing that deep models can be more efficient than shallow ones):

  • http://arxiv.org/abs/1402.1869

  • http://arxiv.org/abs/1312.6098

I've been spending quite a bit of time on natural gradient. I'm currently exploring variants of the algorithm, and I'm interested in how it addresses problems specific to non-convex optimization.

  • http://arxiv.org/abs/1301.3584

And, of course, recurrent networks which have been the focus of my PhD since I started. Particularly I worked on understanding the difficulties of training them (http://arxiv.org/abs/1211.5063) and how depth can be added to RNNs (http://arxiv.org/abs/1312.6026).

[–]caglargulcehre 5 points 12 months ago

Hi, my name is Caglar Gulcehre and I am a PhD student at the LISA lab. You can access my academic page here: http://www-etud.iro.umontreal.ca/~gulcehrc/.

I have done some work related to Yoshua Bengio's "Culture and Local Minima" paper; basically we focused on empirically validating the optimization difficulty of learning high-level abstract problems: http://arxiv.org/abs/1301.4083

Recently I've started working on recurrent neural networks, and we have joint work with Razvan Pascanu, Kyung Hyun Cho and Yoshua Bengio: http://arxiv.org/abs/1312.6026

I've also worked on a new kind of activation function which we claim is more efficient at representing complicated functions than regular activation functions (e.g. sigmoid, tanh, etc.):

http://arxiv.org/abs/1311.1780

Nowadays I am working on statistical machine translation and on learning and generating sequences using RNNs and whatnot. But I am still interested in the optimization difficulty of learning high-level (or abstract) tasks.

[–]ian_goodfellow[S] 5 points 1 year ago

I'm helping Yoshua write a textbook, and working on getting Pylearn2 into a cleaner and better documented state before I graduate.

[–]exellentpossum 1 point 12 months ago

Any particular developments in deep learning that you're excited about?

[–]ian_goodfellow[S] 5 points 12 months ago

I'm very excited about the extremely large scale neural networks built by Jeff Dean's team at Google. The idea of neural networks is that while an individual neuron can't do anything interesting, a large population of neurons can. For most of the 80s and 90s, researchers tried to use neural networks that had fewer artificial neurons than a leech. In retrospect, it's not very surprising that these networks didn't work very well, when they had such a small population of neurons. With the modern, large-scale neural networks, we have nearly as many neurons as a small vertebrate animal like a frog, and it's starting to become fairly easy to solve complicated tasks like reading house numbers out of unconstrained photos: http://www.technologyreview.com/view/523326/how-google-cracked-house-number-identification-in-street-view/ I'm joining Jeff Dean's team when I graduate because it's the best place to do research on very large neural networks like this.

[–]Letter_Guardian 3 points 1 year ago

Hi Prof. Bengio,

Thank you for doing this AMA. Questions:

  1. How much do you think we can actually accomplish in the big data challenge?

  2. Do you think data alone is sufficient to solve practical problems, as opposed to using some kind of expert knowledge?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

At the end of the day there is only data. Expert knowledge is also coming from past experience: either communicated by some humans (recently, or in past generations, through cultural evolutio) or from genetic evolution (which also relies on experience to engrave knowledge into genes). What this may potentially say is that we may need different kinds of optimization methods and not just those based on local descent (like most learning algorithms).

All that being said, if I try to solve a practical problem in the short term, it can be very useful to use prior knowledge. There are many ways that this has been done in deep learning, either through preprocessing, architecture and/or training objective (e.g. especially through regularizers and pre-training strategies). However, I much prefer when the data can override the prior that is injected (and this is also theoretically more sound, as one considers that more and more data can be exploited).

[–]FuzzySets 3 points 1 year ago

I'm currently finishing up my undergrad in philosophy of science and logic and I am trying to make the switch to computer science for masters work with the intention of pursuing machine learning at the PhD level. Besides filling in the obvious knowledge gaps in mathematics and basic programming skills, what are some of the things a person in my position could do to make themselves a more attractive candidate for your field of work? Thanks so much for visiting us at r/MachineLearning!

[–]yoshua_bengio (Prof. Bengio) 10 points 12 months ago

Read deep learning papers and tutorials, starting from the introductory material and moving your way up. Take notes on your reading, trying to summarize what you learned.

Implement some of these algorithms yourself, from scratch, to make sure you understand the math for real, implementing variants of these, not just a copycat of a pseudo-code you found in a paper.

Play with these implementations on real data, maybe competing in Kaggle competitions. The point is that a lot is learned by actually putting your hands in data and playing with variants of these algorithms (this is true in general for machine learning).

Write about your experiences and results and thoughts in a blog. Initiate contact with researchers in the field and ask them if they would like you to work remotely on some of the projects and ideas they have. Try to do an internship.

Apply to graduate school in a lab that actually does these things.

Is the roadmap clear enough?

[–]karmicthreat 3 points 1 year ago

So I've had a desire to get deep into Deep Learning and general machine learning for a while. I'm currently taking the computational neurology course coursera offers. I'll follow that up with the ML and NN courses.

Where do you recommend someone go from there? I've not seen much that is at the grad level out there.

[–]last_useful_man 1 point 1 year ago

computational neurology course coursera offers

https://www.coursera.org/courses?orderby=upcoming&search=computational%20neurology

(comes up empty) - care to clarify? Clinical neurology perhaps?

[–]karmicthreat 2 points 1 year ago

Sorry, I meant computational neuroscience. Which makes sense, since neurology would be more the study of disorders of the nerves. While that is interesting, I'm not really after that particular aspect of the CNS.

[–]last_useful_man 1 point 1 year ago

Holy moly, that exists! https://www.coursera.org/course/compneuro

Awesome, thank you!

[–]lars_ 3 points 1 year ago

Hi! The guys behind the Blue Brain project intend to build a working brain by reverse engineering the human brain. I heard Hinton be critical of this approach in a talk. I got the impression that he believed the kind of work that is done within ML would be more likely to lead to a general strong AI.

Let's imagine we are some time in the future, and we have created strong artificial intelligence - that passes the Turing test, and generally passes as alive and conscious. If we look at the code for this AI, do you think it would mostly be a result of reverse engineering the human brain, or would it be mostly made of parts that we humans have invented on our own?

[–]yoshua_bengio (Prof. Bengio) 6 points 12 months ago

I don't think that Hinton was critical of the idea of reverse-engineering the brain, i.e., to consider what we can learn from the brain in order to build intelligent machines. I suspect he was critical of the approach in which one tries to get all the details right without an overarching computational theory that would explain why the computation makes sense (especially from a machine learning perspective). I remember him making that analogy: imagine copying all the details of a car (but with an imperfect copy), putting them together, and then turning on the key and hoping for the car to move forward. It's just not going to work. You need to make sense of these details.

[–]redkk 3 points 1 year ago

Hi Sir, I am a self-learner trying to train a sparse autoencoder with linear/relu units. What would be a suitable sparsity cost which is differentiable? I saw something that uses KL divergence but could not understand it. Is sparsity-inducing formula a holy grail or secret? Thanks, KK.

[–]yoshua_bengio (Prof. Bengio) 5 points 12 months ago

Not a holy grail or secret. With a denoising auto-encoder setup and rectifiers, you easily get sparsity, especially with an L1 penalty. With sigmoids you are better off with the KL divergence penalty. It just says that the output of the units should be close to some small target (like 0.05) on average, but instead of penalizing squared difference it uses the KL divergence, which is more appropriate for comparing probabilities. My colleague Roland Memisevic is more involved than I am in experimenting with such things and could probably tell you more.
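
[Editor's sketch] The two penalties mentioned in this reply can be written in a few lines. This is a minimal illustration, not code from the thread: the function names are hypothetical, the KL form shown is the usual Bernoulli KL divergence between the target rate and each unit's mean activation, and the clipping constant `eps` is an assumption added to keep the logarithms finite.

```python
import math

def kl_sparsity_penalty(mean_activations, target=0.05, eps=1e-8):
    """KL(Bernoulli(target) || Bernoulli(rho_hat)), summed over hidden units.

    mean_activations: average output of each sigmoid unit over a batch,
    so every value is expected to lie in (0, 1).
    """
    total = 0.0
    for rho_hat in mean_activations:
        # Clip to avoid log(0); eps is an assumption of this sketch.
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        total += (target * math.log(target / rho_hat)
                  + (1.0 - target) * math.log((1.0 - target) / (1.0 - rho_hat)))
    return total

def l1_sparsity_penalty(activations):
    """L1 penalty on hidden activations, the option suggested for rectifier units."""
    return sum(abs(a) for a in activations)

# The KL penalty vanishes when every unit's mean activation equals the target
# and grows as the mean drifts away from it.
print(kl_sparsity_penalty([0.05, 0.05]))
print(kl_sparsity_penalty([0.5, 0.2]))
print(l1_sparsity_penalty([0.0, 2.0, 0.5]))
```

Both functions are differentiable almost everywhere in the activations, so in practice either one would be added to the reconstruction loss with a weighting coefficient.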

[–]evc123 3 points 1 year ago

Hi Prof Bengio,

Is it possible to get into Lisa-Lab without any Machine learning/Deep Learning publications? The university I'm attending does a tiny bit of research in computer vision, bioinformatics, and 1980s-era neural networks; but none of it as contemporary or as in-depth as the research at Lisa-Lab and the other labs listed on Deeplearning.net

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

We have taken such candidates recently, especially if they are strong in math and computer science. Note that we have pretty much filled the open positions for Fall 2014, though.

[–]ddebarr 3 points 12 months ago

As EJBorey says, "I've heard that it's difficult for non-experts to get these techniques to work." What is the most promising work being done to automate the configuration of deep learning networks? Thanks!

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

Please see this reply: http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq884k

[–]SnowLong 6 points 1 year ago

Are there attempts to apply neural nets to the task of machine translation?

When do you think NN-based approaches will replace statistical methods in commercially deployed MT systems? I mean, in speech recognition (all major industry players) and vision (Google, Baidu) tasks NNs are already deployed...

[–]yoshua_bengio (Prof. Bengio) 5 points 12 months ago

I just started a page that lists some of the papers on neural nets for machine translation: https://docs.google.com/document/d/1lqo5N1LzVWNPy1sYuujNa5vVNmyP5Zjv6VtEVgcFr6k

Briefly, since neural nets already beat n-grams on language modeling, you can first use them to replace the language-modeling part of MT. Then you can use them to replace the translation table (after all it's just another table of conditional probabilities). Other fun stuff is going on. The most exciting and ambitious approaches would completely scrap the current MT pipeline and learn to do end-to-end MT purely with a deep model. The interesting aspect of this is that the output is structured (it is a joint distribution over sequences of words), not a simple point-wise prediction (because there are many translations that are appropriate for a given source sentence).
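
[Editor's note] For readers unfamiliar with the pipeline being discussed, the two components named in this reply can be located in the classical noisy-channel formulation of statistical MT. This is a textbook simplification and an assumption of this note, since the reply does not name a specific pipeline; here $f$ is the source sentence, $e$ a candidate translation, and $e_t$ its $t$-th word.

```latex
% Classical pipeline: a language model and a translation model, estimated separately.
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \;
        \underbrace{P(e)}_{\text{language model}} \;
        \underbrace{P(f \mid e)}_{\text{translation model}}

% End-to-end alternative: model the structured output directly as a joint
% distribution over the words of the target sentence (chain rule).
P(e \mid f) \;=\; \prod_{t=1}^{T} P\!\left(e_t \mid e_1, \dots, e_{t-1}, f\right)
```

In this reading, a neural language model replaces the estimate of $P(e)$, a neural translation model replaces the table behind $P(f \mid e)$, and an end-to-end deep model would learn the second expression as a whole.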

[–]SnowLong 1 point 12 months ago

Thank you! Insights help and I'm starting to read papers so thanx for the list too (:

[–]EJBorey 1 point 1 year ago

Sure. Here's a New York Times article that talks about real-time machine translation from English into Mandarin: http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html

[–]SnowLong 3 points 1 year ago

I saw that video from MS, very impressive one. But I do not believe MT part was done using NNs. Speech recognition - YES. Speech synthesis - most likely. MT - nope.

[–]Two-Tone- 5 points 1 year ago

What are your thoughts on Google acquiring all of these different AI related companies the last year or so?

[–]totes_meta_bot 2 points 1 year ago*

This thread has been linked to from elsewhere on reddit.

  • [/r/compsci] Deep learning pioneer Yoshua Bengio taking questions for his AMA in /r/MachineLearning

  • [/r/artificial] Deep learning pioneer Yoshua Bengio AMA: Thursday 1-2PM EST in /r/MachineLearning

  • [/r/Futurology] Deep learning pioneer Yoshua Bengio taking questions for his AMA in /r/MachineLearning

I am a bot. Comments? Complaints? Send them to my inbox!

[–]EJBorey 2 points 1 year ago

Any advice on hiring your students? What is compelling to the modern machine learning PhD?

[–]kablunk 2 points 1 year ago*

Sorry for being so mundane: What as yet unexplored fields do you see machine learning being applied to in the future?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

I would rather ask about fields where machine learning is NOT going to be applied ;-)

[–]sssub 2 points 1 year ago

Dear Prof. Bengio,

In Neuroinformatics several researchers work in the field of 'Reservoir Computing' (random sparse RNN with a linear read-out which is trained). Comparing this architecture to 'Deep networks' I see a lot of similarities in both approaches. There seems to be a strong link between learning abstract features in deep architectures and plasticity mechanisms in spiking reservoirs.

I would very much like to hear your opinion on this

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

Biological motivation is indeed very interesting, but learning the recurrent weights is crucial to get computational competence, as I wrote there:

http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfpboj8

[–]rpascanu 1 point 12 months ago

Correct me if I'm wrong, but the Reservoir Computing paradigm assumes that the reservoir weights (the recurrent and input-to-hidden weight matrices) are randomly sampled (from a carefully crafted distribution) and not learned. By plasticity mechanisms, are you referring to RC methods that use some local learning mechanism for the weights?

If not, I believe one can answer your question along these lines. Both RC approaches and DL approaches try to extract useful features from data. However, RC does not learn this feature extractor, while DL does. Of course, as you pointed out, there are a lot of similarities. There are a lot of things DL could learn from RC research, and the other way around.
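
[Editor's sketch] The distinction drawn here (random, fixed feature extractor with a trained linear read-out) can be made concrete with a toy reservoir. This is a minimal sketch under stated assumptions, not a reference implementation: the function names, the reservoir size, the weight scaling, the delayed-sine task, and the use of a plain LMS update for the read-out (instead of the ridge regression often used in practice) are all choices made for illustration.

```python
import math
import random

def make_reservoir(n_units, seed=0):
    """Sample input and recurrent weights once; they are never trained."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    scale = 0.5 / math.sqrt(n_units)  # small weights keep the dynamics stable
    w_rec = [[rng.uniform(-scale, scale) for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_rec

def run_reservoir(inputs, w_in, w_rec):
    """Drive the fixed recurrent network with a scalar input sequence."""
    n = len(w_in)
    h = [0.0] * n
    states = []
    for x in inputs:
        h = [math.tanh(w_in[i] * x + sum(w_rec[i][j] * h[j] for j in range(n)))
             for i in range(n)]
        states.append(h)
    return states

def predict(h, w_out):
    return sum(w * hi for w, hi in zip(w_out, h))

def train_readout(states, targets, lr=0.01, epochs=100):
    """Only the linear read-out is learned (simple LMS / delta rule)."""
    w_out = [0.0] * len(states[0])
    for _ in range(epochs):
        for h, y in zip(states, targets):
            err = predict(h, w_out) - y
            w_out = [w - lr * err * hi for w, hi in zip(w_out, h)]
    return w_out

def mse(states, targets, w_out):
    return sum((predict(h, w_out) - y) ** 2
               for h, y in zip(states, targets)) / len(targets)

def demo():
    """Toy task: reproduce the previous input from the current reservoir state."""
    inputs = [math.sin(0.3 * t) for t in range(200)]
    targets = [0.0] + inputs[:-1]
    w_in, w_rec = make_reservoir(20)
    states = run_reservoir(inputs, w_in, w_rec)
    w_out = train_readout(states, targets)
    untrained_error = mse(states, targets, [0.0] * 20)
    trained_error = mse(states, targets, w_out)
    return untrained_error, trained_error

print(demo())
```

A deep learning approach would differ in exactly one respect: gradients would also flow into `w_in` and `w_rec`, so the features themselves would be learned rather than sampled.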

[–]sssub 1 point 12 months ago

Yes, I am referring to local biologically-inspired learning mechanisms. An example being Spike-timing dependent plasticity (STDP) which is then investigated in reservoir systems. Such architectures look a lot like autoencoders.

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago*

"Looking a lot like" is interesting, but we need a theory of how this enables doing something useful, like capturing the distribution of the data, or approximately optimizing a meaningful criterion.

[–]US932H923 2 points 12 months ago

Who are some of the people you have a lot of respect for?

What was the last fiction book that you've read?

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

I have a lot of respect for a lot of people! One clue is who I cite! Another is who I invite to the workshops and conferences I organize.

[–]m4linka 2 points 12 months ago

Dear Prof. Bengio.

In my experience with using different neural network models, it seems that either a good initialization (for example via pretraining, or some sort of guided learning), or the structure (think of the convolutional net), or standard regularization like the L2 norm is crucial for learning. In my opinion all of them are special forms of regularization. Therefore, it looks like 'without prior assumptions, there is no learning'. In the era of 'big data' we can slowly decrease the influence of the regularization part, and therefore develop more 'data-driven' approaches.

Nonetheless, some form of regularization is still needed. To me it seems there is a complexity gap between training networks from scratch (and keeping the regularization as small as possible), and using regularized networks (structure, L2 norm, pre-training, smart initialization, ...). Something like P-hard vs NP-hard in complexity theory.

Are you aware of any literature that tackles this problem from the formal or experimental perspective?

[–]yoshua_bengio (Prof. Bengio) 4 points 12 months ago

In a theoretical sense, you would imagine that as the amount of data goes to infinity priors become useless. Not so, I believe. Not only because of the potentially exponential gains (in terms of number of examples saved) of some priors, but also because there are computational implications of some priors. For example, the depth prior can save you both statistically and computationally, when it allows you to represent a highly variable function with a reasonable number of parameters. Another example is the time for training. If (effective) local minima are an issue, then even with more training data, you would get stuck in poor solutions that a good initialization (like pre-training) could avoid. Unless you let both the amount of data and the computational resources go to infinity (and not just "large"), I think some forms of broad priors are really important.

[–]m4linka 1 point 12 months ago

Not only because of the potentially exponential gains (in terms of number of examples saved) of some priors

That is interesting. Could you point out some literature on this topic?

[–]davidscottkrueger 1 point 12 months ago

According to yesterday's talk, the private dataset network in this paper was trained without regularization, suggesting that with enough data it may not be needed (although it likely depends on the dataset/task). http://arxiv.org/pdf/1312.6082v2.pdf

[–]US932H923 2 points 12 months ago

When you're learning something new, do you spend time trying to figure out how the learning process is happening in your own brain?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

Typically not. I get too excited when something clicks. My brain races and my urge is to write my understanding down or talk about it.

But at other times, I do marvel at this phenomenon and I think about it.

[–]DavidJayHarris 2 points 12 months ago

Hi Professor Bengio, thanks so much for answering our questions. I was wondering what you thought of stochastic feedforward methods like Tang and Salakhutdinov presented at NIPS last year.

It seems to me like a great way to get some of the benefits of stochastic methods (especially the ability to predict at multiple modes) while retaining the efficiency of feedforward methods that can be trained by backprop. It seems like there are some interesting parallels between this approach and the stochastic networks your lab has been working on, and I'd love to hear your thoughts on the comparison.

Thanks again!

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

I very much like their paper. We have been working on very similar stuff!

[–]nxvd 2 points 12 months ago

Hello Dr. Bengio,

Thank you for your time. There are two questions I would like to ask you, if you don't mind:

  • How is the atmosphere in your lab?
  • What do you look for in a graduate student?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

I consider one of my greatest successes to be having contributed to a collaborative, open, and collegial atmosphere in the lab. The common good is not an idle concept here. It also helps to make students a lot more motivated: they enjoy their time here and enjoy contributing to group efforts.

[–]dhammack 1 point 1 year ago

If I were summarizing the results from deep models, I'd say that deep models are excelling in problems where humans held the previous state of the art (vision/audio/language).

Do you know of any successes in problems of the opposite nature: problems where statistical methods are already better than humans? One example I can think of is the Merck Kaggle challenge won by George Dahl, but I'd love to hear of some more.

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

Yes, I know of some such cases, in the realm of recommendation systems or fraud detection, when the number of input variables is large and cannot be easily visualized or digested by a human. Although I don't know of head-to-head comparisons with human performance, the sheer speed advantage makes it impractical to even consider humans for such jobs (except maybe to consider the few cases flagged by a machine).

[–]zach_will 4 points 1 year ago

Hi Professor!

I always find myself resorting to ensembles and random forests in my projects (I think I can just internalize decision trees much better than deep learning). Could you offer the flip side for why I should be excited about neural networks?

(I mostly work with "medium-sized" data, and it usually fits on a single machine.)

Thanks!

[–]yoshua_bengio (Prof. Bengio) 4 points 12 months ago

I wrote some papers explaining why decision trees are doomed to generalize poorly:

http://www.iro.umontreal.ca/~lisa/pointeurs/bengio+al-decisiontrees-2010.pdf

The key point is that decision trees (and many other machine learning algorithms) partition the input space and then allocate separate parameters to each region. Thus no generalization to new regions or across regions. No way you can learn a function which needs to vary across a number of distinguished regions that is greater than the number of training examples. Neural nets do not suffer from that and can generalize "non-locally" because each parameter is re-used over many regions (typically HALF of all the input space, in a regular neural net).
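
[Editor's sketch] The parameter-sharing argument can be checked numerically in two dimensions. This is a toy illustration under assumptions of its own, not code from the linked paper: each "unit" is a single hyperplane (here, a line) that is active on one side, the unit directions and offsets are chosen by hand, and regions are counted by sampling a grid, which can only undercount them.

```python
import math

def make_units(n_units):
    """n_units hyperplane units (a, b, c) in 2-D, active where a*x + b*y + c > 0.

    Directions are evenly spaced so no two boundaries are parallel; offsets are
    small so every boundary crosses the sampled window.
    """
    units = []
    for k in range(n_units):
        angle = math.pi * k / n_units
        offset = 0.1 * (k - (n_units - 1) / 2.0)
        units.append((math.cos(angle), math.sin(angle), offset))
    return units

def grid_points(n=120, span=3.0):
    step = 2.0 * span / (n - 1)
    return [(-span + i * step, -span + j * step)
            for i in range(n) for j in range(n)]

def activation_pattern(point, units):
    """Which units are active at this input; one pattern per linear region."""
    x, y = point
    return tuple(a * x + b * y + c > 0 for a, b, c in units)

def count_regions(units, points):
    return len({activation_pattern(p, units) for p in points})

def active_fraction(unit, points):
    a, b, c = unit
    return sum(1 for x, y in points if a * x + b * y + c > 0) / len(points)

units = make_units(8)
points = grid_points()
n_params = 3 * len(units)            # three numbers per unit
n_regions = count_regions(units, points)
max_regions = 1 + 8 + (8 * 7) // 2   # upper bound for 8 lines in the plane

print("parameters:", n_params)
print("distinct regions found on the grid:", n_regions, "of at most", max_regions)
print("fraction of inputs where each unit is active:",
      [round(active_fraction(u, points), 2) for u in units])
```

Each unit is active on roughly half of the sampled inputs, so its three parameters are shared by many regions at once. A decision tree distinguishing the same number of regions would need a separate leaf, with its own parameters and its own training examples, for each one.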

[–]kablunk 3 points 1 year ago

What are some things that self-taught machine learning scientists lack that those trained in a formal environment (university or similar) have?
(I'm asking as a member of the first group)

[–]SuperFX 4 points 1 year ago

There seems to be a recent trend where a lot of deep learning researchers have moved to industry, ostensibly to gain access to very large data sets. Do you think deep learning research within academia can continue to flourish without such access? Or is the field invariably moving toward HPC and massive data sets as prerequisites?

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

I think that there are plenty of huge datasets available for free out there. Think about all of Wikipedia, all of YouTube, etc. Not to mention: all of the internet.

Computing power is another question, but actually in some countries like Canada, the government is encouraging (or forcing) scientists to share computational resources. The result is that I have access to more computational power than most of my American colleagues. Plus, the cost of computing power continues to go down.

[–]javiermares 3 points 1 year ago*

Professor Bengio,

What do you think of Ray Kurzweil's PRTM? Do you think any of its characteristics could be implemented on current deep learning techniques to improve their capabilities?

Thank you.

[–]yohamoha 3 points 1 year ago

Hello, professor. I have a question that I always ask experts in their fields: In your field of study, what is the best book/paper you know of? Why? (here "best" can have any meaning, as long as it's specified)

Thanks.

[–]yoshua_bengio (Prof. Bengio) 5 points 12 months ago

There are too many good papers.

My students have put together a list of papers to read for the new students of the lab:

https://docs.google.com/document/d/1IXF3h0RU5zz4ukmTrVKVotPQypChscNGf5k6E25HGvA

[–]hltt 1 point 1 year ago

Can you think of any interesting deep learning approaches to NLP other than the Recursive Neural Networks of Richard Socher?

[–]rpascanu 1 point 12 months ago

RNNs as in recurrent neural networks (e.g. Tomas Mikolov's work) are also very interesting IMHO.

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

Indeed.

[–][deleted] 1 point 1 year ago

Hi professor Yoshua Bengio.

Do you think that machine learning as we understand it today will be the basis of future AI?

Which is a bigger obstacle to making AI stronger, hardware limitations or algorithmic/software problems? What is the biggest obstacle to making AI better in general?

What do you think of Ray Kurzweil's prediction that an AI will pass the Turing test by 2029? He has placed a bet on this prediction.

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

I won't bet on the year that AI will pass the Turing test, but I will certainly bet that machine learning will be a central technology to future AI.

The biggest obstacle to improving AI is to improve machine learning. To improve ML enough to get there, there are still many obstacles. Only some of them have to do with computing power. Others are more conceptual. For example I am convinced that there are still fundamental obstacles to learning the joint distribution of many variables for AI-like tasks. I also think that we have not even scratched the surface of the optimization challenges involved in training very large deep nets. Then there is reinforcement learning, which will be clearly necessary and on which advances are clearly needed (see the recent exciting work by the DeepMind people, on learning to play 80's Atari games, and presented at the Deep Learning Workshop at NIPS, which I organized).

[–][deleted] 1 point 12 months ago

Thank you for your response.

[–]edersantana 1 point 1 year ago

What suggestions would you give to a young professor building a new research lab on machine learning, neural networks and such? What do you think are the most important aspects of lab environment, hardware and software resources? What about international cooperation? Also, how can one be competitive worldwide?

[–]yoshua_bengio (Prof. Bengio) 6 points 12 months ago

Focus on your research.

Engage in collaboration and discussion with scientists from whom you can learn.

Read. Read. Read.

Focus on your research.

Nourish your graduate students intellectually and at a personal relational level, like a father with his children.

Go to the best conferences of your field. Talk. Talk. Talk.

Keep thinking about the long term and steering back in the directions that you believe are promising, even though it's tempting to follow the trend and make incremental contributions. Believe in yourself.

Focus on your research.

[–]sixwings 1 point 1 year ago

Professor Bengio,

Thank you for taking our questions. How do you respond to this criticism of Deep Learning from Jeff Hawkins:

Hawkins, author of On Intelligence, a 2004 book on how the brain works and how it might provide a guide to building intelligent machines, says deep learning fails to account for the concept of time. Brains process streams of sensory data, he says, and human learning depends on our ability to recall sequences of patterns: when you watch a video of a cat doing something funny, it’s the motion that matters, not a series of still images like those Google used in its experiment. “Google’s attitude is: lots of data makes up for everything,” Hawkins says.

Source: Deep Learning

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

See the replies below. There is plenty of deep learning work involving temporal structure. More will come, for sure.

[–]richardabrich 1 point 1 year ago

Recurrent neural networks model temporal relationships implicitly. They're often used for speech recognition. There has been some work on deep recurrent neural networks. [1,2]

[1] http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf

[2] http://papers.nips.cc/paper/5166-training-and-analysing-deep-recurrent-neural-networks.pdf

[–]rpascanu 1 point 12 months ago

http://arxiv.org/abs/1312.6026.

RNNs are also used in NLP. Some other interesting work that goes towards recurrent models (for scene parsing this time) is this: http://arxiv.org/abs/1306.2795

[–]davidscottkrueger 1 point 12 months ago

Of course, this cannot be taken as a valid criticism of the promise or potential of Deep Learning, because DL can account for the concept of time.

However, I think the point he is making about systems that interact with the world in real time vs. systems that don't is huge, and currently, DL's big successes are not in real-time applications.

I think a greater emphasis on real-time methods across the board would be a good thing. And I think that Reinforcement Learning will ultimately be more important than supervised/unsupervised learning.

[–]hf98hf43j2klhf9 1 point 1 year ago

[META] In the comments at the verification page it looks like Yann LeCun is open to the AMA idea as well! Should we try to request him as well?

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

It would be fun.

[–]32er234 1 point 12 months ago

You sending him an email will probably be more effective than all of us trying to bombard his social media pages ;-)

[–]IdoNotKnowShit 1 point 1 year ago

Bonjour professeur Bengio! Thank you so much for this AMA! Here are a few questions of mine (not chosen i.i.d.):

Where does deep learning show promise? And in what application would it be an absolutely horrible choice?

Why do stacked RBMs work? Is this something that can be explained in a thoroughly formal manner or is there still some magic that needs to be unraveled?

What would you say is the relationship between ensemble learning and deeply layered learning?

Can you describe some of the work your lab/grad students is/are doing and why you support it?

What are some of the best things about living in Montreal?

How do you like to approach a research question? What kind of working environment do you prefer?

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago*

There is no such thing as magic, except in our emotional interpretation. I believe that I have a fairly rounded interpretation of why stacks of RBMs or regularized auto-encoders work so well. I have written about this; see in particular the 2012/2013 review paper with Courville & Vincent:

http://arxiv.org/abs/1206.5538

(also published in PAMI 2013)

I don't know of relationships between ensemble learning and deep layered learning besides the beautiful interpretation of dropout. For example, see http://arxiv.org/abs/1312.6197

My students have written a few words about studying in Montreal, for new graduate candidates:

http://www.iro.umontreal.ca/~bengioy/yoshua_en/index_files/open_positions.html

Montreal is a large city with 4 universities, a very rich cultural tradition, near nature, and where the quality of life (including security) is among the best (the 4th best in North America, according to Mercer). The cost of living is substantially lower than in other similar-sized North American cities.

[–]moseconseco2 1 point 1 year ago

Can you talk about the connection, if there is one, between big, structured knowledge projects like Google's Knowledge Graph (built largely on the entity graph Freebase) and deep learning?

Is it significant that the data of the knowledge graph has this recursive network structure that looks a lot like the layers of abstraction in a deep learning setup?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

There is plenty of room in the Knowledge Graph project for machine learning, and so for deep learning. In particular, you want ML to help you guess the missing attributes of objects in the graph and even guess the missing relationships (so that you can even automatically insert new objects in the graph, based on some of their attributes).

[–]strayadvice 1 point 12 months ago

This question is regarding deep learning. From what I understand, the success of deep neural networks on a training task relies on choosing the right meta parameters, like network depth, hidden layer sizes, sparsity constraint, etc. And there are papers on searching for these parameters using random search. Perhaps some of this relies on good engineering as well. Is there a resource where one could find "suggested" meta parameters, maybe for specific class of tasks? It would be great to start with these tested parameters, then searching/tweaking for better parameters for a specific task.

What is the state of research on dealing with time series data with deep neural nets? Deep RNN's perhaps?

[–]yoshua_bengio (Prof. Bengio) 3 points 12 months ago

Regarding the first question you asked, please refer to what I wrote earlier about hyper-parameter optimization (including random search):

http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfq884k

James Bergstra continues to be involved in this line of work.

[–]rpascanu 2 points 12 months ago

What is the state of research on dealing with time series data with deep neural nets? Deep RNN's perhaps?

Here is a list of more recent work. The idea of deep RNNs (or hierarchical ones) is older, and both Jurgen Schmidhuber and Yoshua have had papers about it since the 90's.

  • http://arxiv.org/abs/1306.2795

  • http://arxiv.org/abs/1312.6026

  • http://arxiv.org/abs/1308.0850

  • http://papers.nips.cc/paper/5166-training-and-analysing-deep-recurrent-neural-networks.pdf

[–]james_bergstra 2 points 12 months ago

I think having a database of known-configurations that make for good starting points for search is a great way to go.

That's pretty much my vision for the "Hyperopt" sub-projects on github: http://hyperopt.github.io/

The hyperopt sub-projects specialized for nnets, convnets, and sklearn currently define priors over what hyperparameters make sense. Those priors take the form of simple factorized distributions (e.g. number of hidden layers should be 1-3, hidden units per layer should be e.g. 50-5000). I think there's room for richer priors, different parameterizations of the hyperparameters themselves, and better search algorithms for optimizing performance over hyperparameter space. Lots of interesting research possibilities. Send me email if you're interested in working on this sort of thing.
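
[Editor's sketch] A factorized prior of the kind described here, combined with plain random search, fits in a few lines of standard-library Python. This sketch deliberately does not use the Hyperopt API; the function names, the log-uniform choice for layer width, and the toy objective (a stand-in for a real validation error) are assumptions made for illustration. Only the two ranges (1-3 hidden layers, 50-5000 hidden units) come from the comment above.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from a simple factorized prior.

    Each hyperparameter is sampled independently of the others.
    """
    return {
        # Number of hidden layers: uniform over {1, 2, 3}.
        "n_layers": rng.randint(1, 3),
        # Hidden units per layer: log-uniform between 50 and 5000 (an assumed
        # shape; the comment only gives the range).
        "n_hidden": int(round(math.exp(rng.uniform(math.log(50), math.log(5000))))),
    }

def toy_objective(config):
    """Hypothetical stand-in for validation error; lower is better."""
    return ((config["n_layers"] - 2) ** 2
            + (math.log10(config["n_hidden"]) - 3.0) ** 2)

def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials independent samples and keep the best one."""
    rng = random.Random(seed)
    best_score, best_config = None, None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if best_score is None or score < best_score:
            best_score, best_config = score, config
    return best_score, best_config

print(random_search(toy_objective, n_trials=50))
```

A database of known-good configurations, as suggested above, would amount to replacing `sample_config` with a richer prior centred on those starting points.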

[–][deleted] 12 months ago

[deleted]

[–]yoshua_bengio (Prof. Bengio) 4 points 12 months ago

Initially, 90% intuition, 10% math.

Then more math comes. Then you try it out and you find problems and you update your intuition and your math... etc.

And intuition comes from letting a problem sit in your head for a while, reading about it, asking yourself the question, working with it, talking with others about it, etc.

[–]32er234 1 point 12 months ago

Is fluency in French a pre-requisite to becoming your student? Does it matter at all?

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

Not a pre-requisite at all. Most new students know very little or no French when I recruit them.

[–]32er234 1 point 12 months ago

Given three candidates, none of whom has much experience in ML, whom would you rather choose as a potential student (other dimensions being equal):

  • Someone experienced in applied statistics (say, psychology research, or epidemiology), knows R

  • Someone who is very good at software development and knows some numpy/scipy, Matlab

  • Pure math undergrad who has little exposure to either programming or "real world" data

[–]yoshua_bengio (Prof. Bengio) 2 points 12 months ago

I can afford many students. I would evaluate not only based on the above features but also based on an interview, in which all aspects come together. Strength in math is an excellent predictor of success in machine learning research, and so math undergrads with good programming skills are very high on my list of preferences. Strong software development is also very important for many of the projects we have, which involve big data and big models, where computational efficiency and top-notch collective programming are really important.

[–]andrewff 1 point 12 months ago

I know I'm a little late to the party, but I was wondering whether you think there is any room for an evolving-topology algorithm such as NEAT within deep learning? In some ways, techniques like dropout and DropConnect approach an evolving-topology type of methodology, but overall the idea of an evolving topology is not entirely captured by such techniques.

Thanks for doing this AMA!

[–]rishok 1 point 12 months ago

Hello Prof. Bengio. I am a student from Denmark.

I am trying to add your Maxout Networks solution to the sparse autoencoder to see the potential benefits. Do you have any preliminary comments?

Could we see more updates on your DL book? Hehe.

[–]meiyordrummer123 1 point 7 months ago

Hello Professor Bengio, I tried running the Matlab toolbox that you have for DBNs alongside the PLearn app, and I would like to know how I can run a similar process in both. Some options in PLearn are quite different from the Matlab schemes, and matching them would be useful for prototyping an application faster.

Thank you

JMM

[–]sasaram 1 point 1 year ago

Hi Prof. Bengio, very happy to see you here.

  1. What is the difference between learning and deep learning? For example, the neural network language model (using RNNs) published by Mikolov is referred to as a deep learning method. Can you point out the reason, or perhaps explain deep learning using another example of a deep learning method?

[–]yoshua_bengio (Prof. Bengio) 3 points 1 year ago

Recurrent nets are deep in the sense that the computation they perform (when you consider unfolding them in time) corresponds to a very deep network (albeit with shared weights across layers).

My definition of deep is that you have multiple levels of representation, with the i-th level obtained as a learned function of the representations at the lower levels. I also insist that the number of such levels be data-dependent, and I expect that higher-level representations capture more abstract features of the data which can only be obtained by the composition of the features at the lower levels, i.e., they are highly non-linear functions of the raw input.
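
The unfolding argument can be checked with a small sketch. This is only an illustration under stated assumptions: a vanilla tanh recurrence, hand-picked toy weights, and hypothetical helper names (`matvec`, `layer`, `rnn`, `unrolled`). It shows that running a recurrent net for T steps gives the same result as a T-layer feedforward net whose layers all share one set of weights.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def layer(W_h, W_x, h, x):
    """One layer / one time step: tanh(W_h h + W_x x)."""
    a = matvec(W_h, h)
    b = matvec(W_x, x)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


def rnn(W_h, W_x, h0, xs):
    """Recurrent form: loop over time, reusing the same weights."""
    h = h0
    for x in xs:
        h = layer(W_h, W_x, h, x)
    return h


def unrolled(layers, h0, xs):
    """Feedforward form: one explicit (W_h, W_x) pair per layer."""
    h = h0
    for (W_h, W_x), x in zip(layers, xs):
        h = layer(W_h, W_x, h, x)
    return h


# Toy weights and inputs, chosen by hand for illustration only.
W_h = [[0.5, -0.3], [0.1, 0.8]]
W_x = [[1.0], [-1.0]]
xs = [[0.2], [-0.4], [0.9], [0.1]]
h0 = [0.0, 0.0]

# Depth of the unfolded network equals the sequence length.
layers = [(W_h, W_x)] * len(xs)
print(rnn(W_h, W_x, h0, xs) == unrolled(layers, h0, xs))
```

Replacing the shared `(W_h, W_x)` pair with distinct weights per layer would turn `unrolled` into an ordinary deep feedforward net that receives one input at each layer.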

[–]melipone 1 point 1 year ago

Hi! No experience with deep learning, here. The introduction says that deep learning advances in machine learning can be used to solve artificial intelligence problems. Does that mean solving the consciousness/self-awareness problem or is it in a narrow sense?

[–]yoshua_bengio (Prof. Bengio) 5 points 1 year ago

Deep learning is not about consciousness or self-awareness but about something that I consider much more important from a practical point of view as well as much more challenging: allowing computers to understand the world around us. I believe that we will have fairly intelligent machines, that understand the world around us, but have no "consciousness" or rather no "self" in any way close to what humans have. Not because it would be difficult to introduce that, but because it would not be necessary in order to produce a lot of useful technology. Not to speak of the fact that once you introduce self in intelligent machines, you have to worry about Asimov's rules etc.

[–]anne-nonymous 1 point 1 year ago

There are some robots that are self-aware :). Seriously.

http://www.scientificamerican.com/article/automaton-robots-become-self-aware/

[–]dnoup -2 points 1 year ago

What exactly is deep learning, and how does it differ from conventional ML?

[–]yoshua_bengio (Prof. Bengio) 3 points 1 year ago

I have my definition above:

http://www.reddit.com/r/MachineLearning/comments/1ysry1/ama_yoshua_bengio/cfqay3e

Keep in mind that deep learning is part of machine learning.

[–]augustus2010 -3 points 1 year ago

Could you explain the rationale behind sparse and deep learning?

[–]yoshua_bengio (Prof. Bengio) 3 points 1 year ago

I have already explained why deep learning is interesting. It is a broad prior and it brings both statistical and computational advantages, where it is an appropriate prior.

Sparsity is another prior: it assumes that for any given input, only a small subset of all the possible concepts known to the learner are relevant. Again, it is useful where it is applicable.

I believe that both priors are useful for many real-world problems where we want AI.
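
As a rough illustration of the sparsity prior described above, here is a minimal sketch using soft-thresholding, the shrinkage step associated with an L1 penalty. That is one common way to encode sparsity, used here as an assumption rather than as the method meant in the answer. The function name `soft_threshold`, the toy `activations`, and the threshold `lam=0.5` are all hypothetical.

```python
def soft_threshold(values, lam):
    """Shrink each value toward zero by lam.

    Any value with magnitude <= lam becomes exactly 0.0, so only the
    strongest "concepts" stay active for a given input.
    """
    out = []
    for v in values:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out


# Toy pre-activations for six hypothetical concepts on one input.
activations = [0.05, -1.2, 0.3, 2.0, -0.1, 0.02]
code = soft_threshold(activations, lam=0.5)

# Indices of the concepts that remain active after shrinkage.
active = [i for i, c in enumerate(code) if c != 0.0]
print(active)
```

A larger `lam` corresponds to a stronger sparsity prior: fewer concepts survive for any given input.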

[–]melipone -9 points 1 year ago

Did we get just 5 responses to ~100 questions?

[–]Noncomment 4 points 1 year ago

I'm actually very impressed with this AMA. He answered almost all of the questions and put a lot of effort into the responses. Your comment was premature.

[–]vinnl -16 points 1 year ago

Do you know all the terms mentioned in the questions here?
