Convolutional Kernel Networks

we have preferred to use L-BFGS-B on 300 000 randomly selected pairs of training data points, and to initialize W with the K-means algorithm. L-BFGS-B is a parameter-free, state-of-the-art batch method; it is not as fast as SGD, but it is much easier to use. We always run the L-BFGS-B algorithm for 4 000 iterations, which seems to ensure convergence to a stationary point.
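
The procedure described above can be sketched in a few lines with NumPy and SciPy. This is a minimal illustration under stated assumptions, not our actual implementation: we assume the objective is a least-squares fit of a Gaussian kernel on pairs (x_i, y_i) by a sum of products of Gaussians centered at the rows of W with weights eta, and we assume a nonnegativity bound on eta. The function names, the uniform initialization of eta, and the way the K-means seeds are drawn are hypothetical choices made for the example.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import minimize


def loss_and_grad(theta, X, Y, target, n_filters, sigma):
    """Mean squared error of the assumed kernel approximation, and its gradient.

    theta packs W (n_filters x d, flattened) followed by eta (n_filters).
    """
    n, d = X.shape
    W = theta[: n_filters * d].reshape(n_filters, d)
    eta = theta[n_filters * d:]
    dX = X[:, None, :] - W[None, :, :]                  # shape (n, p, d)
    dY = Y[:, None, :] - W[None, :, :]
    # Product of the two Gaussians centered at each w_l, shape (n, p).
    AB = np.exp(-(np.sum(dX ** 2, axis=2) + np.sum(dY ** 2, axis=2)) / sigma ** 2)
    r = AB @ eta - target                               # residuals, shape (n,)
    loss = np.mean(r ** 2)
    g_eta = 2.0 * (AB.T @ r) / n
    # d(AB_il)/dw_l = AB_il * (2 / sigma^2) * (dX_il + dY_il)
    coef = 2.0 * r[:, None] * AB * eta[None, :] / n
    g_W = (2.0 / sigma ** 2) * np.einsum("ip,ipd->pd", coef, dX + dY)
    return loss, np.concatenate([g_W.ravel(), g_eta])


def fit_kernel_approximation(X, Y, n_filters=8, sigma=1.0, max_iter=4000, seed=0):
    """Fit W and eta on the pairs (X[i], Y[i]) with L-BFGS-B."""
    n, d = X.shape
    target = np.exp(-np.sum((X - Y) ** 2, axis=1) / (2.0 * sigma ** 2))

    # K-means initialization of W; the seeding by random data points is an assumption.
    points = np.vstack([X, Y])
    rng = np.random.default_rng(seed)
    init = points[rng.choice(len(points), n_filters, replace=False)]
    W0, _ = kmeans2(points, init, minit="matrix")
    eta0 = np.full(n_filters, 1.0 / n_filters)          # assumption: uniform start
    theta0 = np.concatenate([W0.ravel(), eta0])

    # W is unconstrained; eta >= 0 is an assumed box constraint.
    bounds = [(None, None)] * (n_filters * d) + [(0.0, None)] * n_filters
    result = minimize(
        loss_and_grad, theta0, args=(X, Y, target, n_filters, sigma),
        jac=True, method="L-BFGS-B", bounds=bounds,
        options={"maxiter": max_iter},
    )
    W = result.x[: n_filters * d].reshape(n_filters, d)
    eta = result.x[n_filters * d:]
    return W, eta, result
```

Apart from the iteration budget, the batch solver needs no step size or schedule, which is what makes it easier to use than SGD. The cost is that every iteration touches all sampled pairs, so the three-dimensional arrays above would have to be processed in chunks for a sample as large as the one we use.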