A Google engineer discusses the importance of a deep learning paper co-authored by Bengio

The paper "Understanding deep learning requires rethinking generalization" has prompted a lot of reflection and left many people puzzled; it has also been discussed on Quora. Google Brain engineer Eric Jang believes that understanding how deep learning works can help bring it into more real-world applications, and that Zhang et al. 2016 may become an important bellwether.
In 2017, many machine learning researchers are trying to answer one question: how do deep neural networks work, and why are they so good at solving practical problems?
Even for people who care little about theoretical analysis and algebra, understanding how deep learning works can help us apply it more widely in real life.
The paper "Understanding deep learning requires rethinking generalization" demonstrates some interesting properties of neural networks. In particular, neural networks have enough capacity to memorize random data: under SGD optimization, training error can be driven to nearly zero even on ImageNet-sized datasets with random labels.
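The paper pairs this finding with a finite-sample expressivity argument: a two-layer ReLU network with 2n + d weights can fit any labels on n points in d dimensions. The pure-Python sketch below follows that style of construction, assuming distinct input points: project every input onto a random direction, place one ReLU bias between each pair of sorted projections, and solve the resulting triangular system. The function names and toy data are hypothetical; the point is only that fitting random labels requires capacity, not structure.

```python
import random


def fit_relu_interpolator(xs, ys, seed=0):
    """Return (a, b, w) such that c(x) = sum_j w[j] * max(<a, x> - b[j], 0)
    reproduces every target ys[i] exactly at the input xs[i]."""
    rng = random.Random(seed)
    d = len(xs[0])
    # Random projection direction; distinct points project to distinct
    # values with probability 1.
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z = [sum(ai * xi for ai, xi in zip(a, x)) for x in xs]
    order = sorted(range(len(xs)), key=lambda i: z[i])
    zs = [z[i] for i in order]
    targets = [ys[i] for i in order]
    if any(lo >= hi for lo, hi in zip(zs, zs[1:])):
        raise ValueError("projections must be distinct")
    # Interleave biases with the sorted projections: b0 < z0 < b1 < z1 < ...
    b = [zs[0] - 1.0] + [(zs[j - 1] + zs[j]) / 2.0 for j in range(1, len(zs))]
    # A[i][j] = max(zs[i] - b[j], 0) is lower triangular with a positive
    # diagonal, so forward substitution solves A w = targets exactly.
    w = []
    for i, zi in enumerate(zs):
        partial = sum(w[j] * max(zi - b[j], 0.0) for j in range(i))
        w.append((targets[i] - partial) / (zi - b[i]))
    return a, b, w


def predict(params, x):
    """Evaluate the two-layer ReLU network at a single input x."""
    a, b, w = params
    zx = sum(ai * xi for ai, xi in zip(a, x))
    return sum(wj * max(zx - bj, 0.0) for wj, bj in zip(w, b))


if __name__ == "__main__":
    rng = random.Random(42)
    n, d = 50, 10
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [float(rng.randint(0, 1)) for _ in range(n)]  # random labels
    params = fit_relu_interpolator(xs, ys)
    worst = max(abs(predict(params, x) - y) for x, y in zip(xs, ys))
    print(f"parameters: {d + 2 * n}, max training error: {worst:.1e}")
```

The parameter count is d for the projection plus n biases and n output weights, and the labels never enter the construction except as the right-hand side of the linear system, so any labeling, random or not, is fit equally well.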
This runs counter to the classic narrative that "deep learning miraculously discovers low-level, mid-level, and high-level features, much as the mammalian V1 visual system does when it learns to compress data."
Between 2012 and 2015, many researchers appealed to "inductive bias" to explain how deep networks reduce test error, implying some form of generalization.
However, if a deep network can memorize random data, then those inductive biases (e.g., convolution/pooling architectures, Dropout, batchnorm) are also compatible with memorization and cannot fully explain generalization.
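The paper's experiments are randomization tests: keep the inputs, replace the labels with random ones, and check whether the same model still fits the training set. The toy sketch below mimics that protocol with a swapped-in model, 1-nearest-neighbour, which memorizes its training set by construction and stands in for a deep network only loosely. The two-blob data, sizes, and function names are hypothetical.

```python
import random


def nearest_neighbor_label(train_x, train_y, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    best = min(range(len(train_x)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(train_x[i], x)))
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of (x, y) pairs that the memorized training set labels correctly."""
    hits = sum(nearest_neighbor_label(train_x, train_y, x) == y
               for x, y in zip(xs, ys))
    return hits / len(xs)


def make_blobs(n, rng):
    """Toy data: two Gaussian blobs centred at (-3, -3) and (3, 3)."""
    xs, ys = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 3.0 if label else -3.0
        xs.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        ys.append(label)
    return xs, ys


def randomization_test(n_train=200, n_test=200, seed=0):
    """Return (train_acc, test_acc) for the true labels and for random labels."""
    rng = random.Random(seed)
    train_x, train_y = make_blobs(n_train, rng)
    test_x, test_y = make_blobs(n_test, rng)
    # Same inputs, but every label replaced by a coin flip.
    rand_train_y = [rng.randint(0, 1) for _ in train_y]
    rand_test_y = [rng.randint(0, 1) for _ in test_y]
    return {
        "true": (accuracy(train_x, train_y, train_x, train_y),
                 accuracy(train_x, train_y, test_x, test_y)),
        "random": (accuracy(train_x, rand_train_y, train_x, rand_train_y),
                   accuracy(train_x, rand_train_y, test_x, rand_test_y)),
    }


if __name__ == "__main__":
    for name, (train_acc, test_acc) in randomization_test().items():
        print(f"{name:>6} labels: train acc {train_acc:.2f}, test acc {test_acc:.2f}")
```

Both label sets are fit perfectly on the training data, so training accuracy alone says nothing about generalization; only test accuracy separates the meaningful labels from the random ones.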
Part of the reason this paper drew so much attention is that it received perfect review scores at ICLR and won an ICLR 2017 Best Paper Award. That sparked heated discussion, which drew still more attention: a small feedback loop. I think it is a good paper because it asks a question no one had asked before and provides strong experimental evidence for some very interesting results.
Still, I think it takes the deep learning community one to two years to agree on whether a paper is important, especially when its conclusions are empirical rather than analytical.
Tapabrata Ghosh pointed out that some researchers believe that although deep networks are capable of memorization, this may not be what they actually do in practice. The reason is that fitting a dataset with semantically meaningful labels takes less time than fitting random data, which suggests that deep networks exploit the semantic regularities already present in the training set.
I think Zhang et al. 2016 may become an important milestone in understanding how deep networks work, but it does not solve the problem of deep network generalization. Someone may well challenge its conclusions soon; that is the nature of experimental science.
In short, the paper is considered important because it shows that deep networks can learn random datasets by memorization. This raises the question of how deep networks learn non-random datasets.
Here are my thoughts on generalization:
A high-capacity parametric model with a good optimization objective absorbs data like a sponge. I think the optimization objective of deep networks is very "lazy" but powerful: given the right model bias and compatibility with the input data, deep networks can learn a semantic feature hierarchy. But if that is not convenient for optimization, the network will settle on a solution that simply memorizes the data.
What we lack now is a way to control the degree of memorization versus generalization; existing tools such as weight regularization and dropout do not give us that control.
