# Publications

A multivariate CSC with rank-1 constrain algorithm designed to study brain activity waveforms

Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternated minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.

Understanding the Learned Iterative Soft Thresholding Algorithm with matrix factorization

Jun 2017,

Jun 2017,

*preprint Arxiv*
This paper aims to extend the results from our previous paper studying the mechanisms of LISTA and gives an acceleration certificate for generic dictionaries.

Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). These methods are optimal in the class of first-order methods for non-smooth, convex functions. However, they do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor and Le Cun (2010), which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

We consider the problem of building shift-invariant representations for long signals in the context of distributed processing. We propose an asynchronous algorithm based on coordinate descent called DICOD.

We consider the problem of building shift-invariant representations for long signals in the context of distributed processing. We propose an asynchronous algorithm based on coordinate descent called DICOD to efficiently solve the l1-minimization problems involved in convolutional sparse coding. This algorithm leverages the weak temporal dependency of the convolution to reduce the interprocess communication to a few local messages. We prove that this algorithm converges to the optimal solution and that it scales with superlinear speedup, up to a certain limit. These properties are illustrated with numerical experiments and our algorithm is compared to the state-of-the-art methods used for convolutional sparse coding

Additional training step for deep networks to optimize the use of the features learnerd during the classical training. A link with existing kernel methods is then discussed.

One of the main challenges of deep learning methods is the choice of an appropriate training strategy. In particular, additional steps, such as unsupervised pre-training, have been shown to greatly improve the performances of deep structures. In this article, we propose an extra training step, called post-training, which only optimizes the last layer of the network. We show that this procedure can be analyzed in the context of kernel theory, with the first layers computing an embedding of the data and the last layer a statistical model to solve the task based on this embedding. This step makes sure that the embedding, or representation, of the data is used in the best possible way for the considered task. This idea is then tested on multiple architectures with various data sets, showing that it consistently provides a boost in performance.

Understanding Trainable Sparse Coding with Matrix Factorization

Sep 2016, In proceedings of

Sep 2016, In proceedings of

*International Conference on Learning Representations (ICLR)*
In this paper we study the mechanisms behind LISTA.

Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, that are optimal in the class of first-order methods for non-smooth, convex functions, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). However, these methods don't exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks was proposed in Gregor10, coined LISTA, which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

Distributed Convolutional Sparse Coding via Message Passing Interface

Dec 2015, In

Dec 2015, In

*NIPS Workshop Nonparametric Methods for Large Scale Representation Learning*
Asynchronous algorithm to solve the convolutional sparse coding. This algorithm can be implemented using the MPI framework.

We consider the problem of building shift-invariant representations of signals from sensors with a large frequency of acquisition. We propose a distributed algorithm for convolutional sparse coding called DICOD that is based on coordinate descent and scales up with a speed up that is quadratic with respect to the number of processing units. Indeed, our implementation avoids sharing variables between cores and does not require any lock or synchronization at every step. We present theoretical results and empirical evidence of convergence of DICOD, and also provide numerical comparisons with respect to widely used algorithms for convolutional sparse coding.

Groupement automatique pour l’analyse du spectre singulier

Sep 2015, In proceedings of

Sep 2015, In proceedings of

*the Groupe de Recherche et d'Etudes en Traitement du Signal et des Images (GRETSI)*
This paper introduces several automatic grouping strategies for Singular Spectrum Analysis (SSA) components in a unified framework.

This paper introduces several automatic grouping strategies for Singular Spectrum Analysis (SSA) components. This step is useful to retrieve meaningful insight about the temporal dynamics of the series. A unifying framework is proposed to evaluate and compare the efficiency of different original methods compared to the existing ones.

Détection de pas à partir de données d'accélérométrie

Sep 2015, In proceedings of

Sep 2015, In proceedings of

*the Groupe de Recherche et d'Etudes en Traitement du Signal et des Images (GRETSI)*
This article presents a method for step detection from accelerometer signals based on template matching.

This article presents a method for step detection from accelerometer signals based on template matching. This method uses a library of step templates extracted from real data in order to not only count the steps but also to retrieve the start and end times of each step. The algorithm is tested on a large database of 300 recordings, composed of healthy patients and patients with various orthopaedic troubles. Simulations show that even with only 20 templates, our method achieves remarkable results with a 97% recall and a 96% precision, is robust and adapts well to pathological subjects.