structured_factorization [2017/11/24 00:53]
structured_factorization [2018/10/27 12:07] (current)
http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_On_Compressing_Deep_CVPR_2017_paper.pdf On Compressing Deep Models by Low Rank and Sparse Decomposition
https://arxiv.org/abs/1711.00811 Expressive power of recurrent neural networks
A certain class of deep convolutional networks, namely those that correspond to the Hierarchical Tucker (HT) tensor decomposition, has been proven to have exponentially higher expressive power than shallow networks.
https://arxiv.org/pdf/1712.05134.pdf Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
The paper proposes a more compact structure based on Block-Term tensor decomposition, which greatly reduces the parameters of RNNs and improves their training efficiency. Compared with alternative low-rank approximations, such as tensor-train RNN (TT-RNN), our method, Block-Term RNN (BT-RNN), is not only more concise (when using the same rank), but also able to attain a better approximation to the original RNNs with far fewer parameters. On three challenging tasks, including Action Recognition in Videos, Image Captioning and Image Generation, BT-RNN outperforms TT-RNN and the standard RNN in terms of both prediction accuracy and convergence rate. Specifically, BT-LSTM utilizes 17,388 times fewer parameters than the standard LSTM to achieve an accuracy improvement over 15.6% in the Action Recognition task on the UCF11 dataset.
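To get a feel for where such parameter savings come from, here is a minimal counting sketch. It assumes each block term is a Tucker decomposition with one factor matrix of shape (I_k * J_k) x R per mode and a core with R^d entries. The function names, the mode shapes, the number of blocks and the rank below are illustrative assumptions, not the configuration behind the 17,388x figure.

```python
from math import prod


def dense_params(in_dims, out_dims):
    """Parameters of an ordinary dense weight matrix of shape prod(in_dims) x prod(out_dims)."""
    return prod(in_dims) * prod(out_dims)


def block_term_params(in_dims, out_dims, n_blocks, rank):
    """Parameter count under the assumed layout: n_blocks Tucker terms, each with
    d factor matrices of shape (I_k * J_k) x rank and one core with rank**d entries."""
    if len(in_dims) != len(out_dims):
        raise ValueError("in_dims and out_dims must have the same number of modes")
    d = len(in_dims)
    factors = sum(i * j * rank for i, j in zip(in_dims, out_dims))
    core = rank ** d
    return n_blocks * (factors + core)


# Hypothetical shapes: a 57600-dimensional input mapped to 256 hidden units.
in_dims, out_dims = (8, 20, 20, 18), (4, 4, 4, 4)
dense = dense_params(in_dims, out_dims)
compact = block_term_params(in_dims, out_dims, n_blocks=2, rank=2)
print(f"dense: {dense}, block-term: {compact}, ratio: {dense / compact:.0f}x")
```

The count grows with the sum of the mode sizes rather than their product, which is why the ratio can reach several orders of magnitude for wide input layers.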
https://arxiv.org/pdf/1801.02144v1.pdf Covariant Compositional Networks For Learning Graphs
Most existing neural networks for learning graphs address permutation invariance by conceiving of the network as a message passing scheme, where each node sums the feature vectors coming from its neighbors. We argue that this imposes a limitation on their representation power, and instead propose a new general architecture for representing objects consisting of a hierarchy of parts, which we call Covariant Compositional Networks (CCNs). Here, covariance means that the activation of each neuron must transform in a specific way under permutations, similarly to steerability in CNNs. We achieve covariance by making each activation transform according to a tensor representation of the permutation group, and derive the corresponding tensor aggregation rules that each neuron must implement. Experiments show that CCNs can outperform competing methods on standard graph learning benchmarks.
https://arxiv.org/abs/1804.07090 Low Rank Structure of Learned Representations
In this paper, we study the dimensionality of the representations learned by models that have proved highly successful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that when trained on CIFAR10 or CIFAR100 datasets, the learned representations exhibit a fairly low rank structure. We propose a modification to the training procedure, which further encourages low rank representations of activations at various stages in the neural network. Empirically, we show that this has implications for compression and robustness to adversarial examples.
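One simple way to probe whether a batch of activations is approximately low rank is to look at how quickly its singular values decay. The sketch below is not the paper's procedure. It assumes the activations are already collected into a (samples x features) NumPy array, and `effective_rank` and its 99% energy threshold are hypothetical choices made for illustration.

```python
import numpy as np


def effective_rank(activations, energy=0.99):
    """Smallest k such that the top-k singular values of the centered
    activation matrix capture at least `energy` of the squared spectrum."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    cumulative = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(singular_values))


# Synthetic stand-ins for activations: one low-rank batch, one full-rank noise batch.
rng = np.random.default_rng(0)
low_rank_batch = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 64))
noise_batch = rng.standard_normal((200, 64))
print(effective_rank(low_rank_batch), effective_rank(noise_batch))
```

On real networks the same function could be applied to the flattened output of a chosen layer. The threshold is a judgment call and changes the reported number.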
https://papers.nips.cc/paper/3904-guaranteed-rank-minimization-via-singular-value-projection.pdf Guaranteed Rank Minimization via Singular Value Projection
https://arxiv.org/abs/1805.04582v1 TensOrMachine: Probabilistic Boolean Tensor Decomposition
https://arxiv.org/abs/0909.4061 Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions
https://arxiv.org/abs/1802.05983v2 Disentangling by Factorising
https://arxiv.org/pdf/1810.10531.pdf A mathematical theory of semantic development in deep neural networks
The synaptic weights of the neural network extract from the statistical structure of the environment a set of paired object analyzers and feature synthesizers associated with every categorical distinction. The bootstrapped, simultaneous learning of each pair solves the apparent Gordian knot of knowing both which items belong to a category, and which features are important for that category: the object analyzers determine category membership, while the feature synthesizers determine feature importance, and the set of extracted categories is uniquely determined by the statistics of the environment.
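A rough illustration of the pairing, assuming (as in analyses of deep linear networks) that each pair corresponds to one singular mode of the input-output correlation matrix, with right singular vectors playing the role of object analyzers and left singular vectors the role of feature synthesizers. The items, features and the helper `correlation_modes` are hypothetical toy choices, not data or code from the paper.

```python
import numpy as np

# Hypothetical toy environment: 4 items (one-hot inputs) and 5 binary features.
items = ["canary", "salmon", "oak", "rose"]
features = ["grow", "move", "fly", "swim", "petals"]
Y = np.array([
    [1, 1, 1, 1],  # grow
    [1, 1, 0, 0],  # move
    [1, 0, 0, 0],  # fly
    [0, 1, 0, 0],  # swim
    [0, 0, 0, 1],  # petals
], dtype=float)
X = np.eye(len(items))  # one column per item


def correlation_modes(X, Y):
    """SVD of the input-output correlation matrix (1/P) * Y @ X.T.
    Columns of U act on features, rows of Vt act on items."""
    n_examples = X.shape[1]
    sigma_yx = Y @ X.T / n_examples
    U, S, Vt = np.linalg.svd(sigma_yx, full_matrices=False)
    return U, S, Vt


U, S, Vt = correlation_modes(X, Y)
for mode in range(len(S)):
    print(f"mode {mode}: strength {S[mode]:.3f}")
    print("  items:   ", dict(zip(items, np.round(Vt[mode], 2))))
    print("  features:", dict(zip(features, np.round(U[:, mode], 2))))
```

Each printed mode pairs a weighting over items with a weighting over features, and the singular value gives the strength of that distinction in the toy statistics.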