# Presentations

Unfolding ISTA and learning its weights with a neural network is a practical way to accelerate the resolution of the Lasso. However, the reason why learning the weights of such a network accelerates sparse coding is not clear. In this talk, we look at this problem from the point of view of selecting adapted step sizes for ISTA.

Sparse coding is typically solved by iterative optimization techniques, such as the Iterative Shrinkage-Thresholding Algorithm (ISTA). Unfolding ISTA and learning its weights with a neural network is a practical way to accelerate estimation. However, the reason why learning the weights of such a network accelerates sparse coding is not clear. In this talk, we look at this problem from the point of view of selecting adapted step sizes for ISTA. We show that a simple step size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most large-scale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate that if the learned algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that learning step sizes can effectively accelerate the convergence when the solutions are sparse enough.
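As a rough illustration of the role played by the step sizes, the sketch below runs ISTA with one step size per iteration, which is also the shape of a network where only the step sizes are learned. It is a minimal pure-Python sketch, not the code behind the talk: the function names (`soft_threshold`, `lasso_cost`, `ista`), the toy dictionary `D`, the signal `x` and the value of `lmbd` are all assumptions made for the example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_cost(D, x, z, lmbd):
    """0.5 * ||x - D z||^2 + lmbd * ||z||_1, with D given as a list of rows."""
    residual = [xi - sum(d * zj for d, zj in zip(row, z))
                for row, xi in zip(D, x)]
    return 0.5 * sum(r * r for r in residual) + lmbd * sum(abs(zj) for zj in z)


def ista(D, x, lmbd, step_sizes):
    """Run one ISTA iteration per entry of step_sizes, starting from z = 0."""
    n_samples, n_atoms = len(D), len(D[0])
    z = [0.0] * n_atoms
    for alpha in step_sizes:
        # Gradient of the quadratic term: D^T (D z - x).
        residual = [sum(d * zj for d, zj in zip(row, z)) - xi
                    for row, xi in zip(D, x)]
        grad = [sum(D[i][j] * residual[i] for i in range(n_samples))
                for j in range(n_atoms)]
        # Gradient step followed by soft-thresholding at level alpha * lmbd.
        z = [soft_threshold(zj - alpha * gj, alpha * lmbd)
             for zj, gj in zip(z, grad)]
    return z


# Toy problem, made up for illustration: 2 measurements, 3 atoms.
D = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
x = [1.0, 0.2]
lmbd = 0.1

# Classical choice: a constant step 1 / L, where L bounds the largest
# eigenvalue of D^T D. The squared Frobenius norm of D (2.5) is a crude bound.
alpha = 1.0 / 2.5
z_short = ista(D, x, lmbd, [alpha] * 10)
z_long = ista(D, x, lmbd, [alpha] * 200)
```

Passing a list such as `[0.4, 0.5, 0.6]` instead of a constant one is the only change needed to try iteration-dependent step sizes; how to pick or learn them is the subject of the talk and is not covered by this sketch.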

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

26 Feb 2020, At

*GT PASADENA*
Convolutional dictionary learning algorithm adapted to capture waveforms from MEG signals.

Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high-level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternating minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.

In this talk, we show that adaptive analytic step sizes can be used to improve ISTA. These step sizes can also be learned using a neural network architecture similar to LISTA, which is theoretically the only way to learn to accelerate ISTA asymptotically.

Sparse coding is typically solved by iterative optimization techniques, such as the Iterative Shrinkage-Thresholding Algorithm (ISTA). Unfolding and learning weights of ISTA using neural networks is a practical way to accelerate estimation. In this paper, we study the selection of adapted step sizes for ISTA. We show that a simple step size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most large-scale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate that for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that our method is competitive with state-of-the-art networks when the solutions are sparse enough.

Best Practices & Pitfalls in Applying Machine Learning to Magnetic Resonance Imaging

11 May 2019, At

*Invited talk @ ISMRM - Montreal, Canada*
In this talk, I cover the concept of generalization for supervised learning, with a focus on model selection and the importance of sample size.

Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals

16 Apr 2019, At

*Parietal seminar - INRIA, Saclay*
DiCoDiLe: a distributed and asynchronous algorithm, employing locally greedy coordinate descent and an asynchronous locking mechanism that does not require a central server.

Convolutional dictionary learning (CDL) estimates shift-invariant bases adapted to multidimensional data. CDL has proven useful for image denoising or inpainting, as well as for pattern discovery on multivariate signals. As estimated patterns can be positioned anywhere in signals or images, optimization techniques face the difficulty of working in extremely high dimensions with millions of pixels or time samples, in contrast to standard patch-based dictionary learning. To address this optimization problem, this work proposes a distributed and asynchronous algorithm, employing locally greedy coordinate descent and an asynchronous locking mechanism that does not require a central server. This algorithm can be used to distribute the computation on a number of workers which scales linearly with the encoded signal's size. Experiments confirm the scaling properties, which allow us to learn patterns on large-scale images from the Hubble Space Telescope.

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

28 Mar 2019, At

*Séminaire SMILE - Paris*
Convolutional dictionary learning algorithm adapted to capture waveforms from MEG signals.

Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high-level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternating minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

19 Feb 2019, At

*MLMDA seminar - CMLA, ENS Paris-Saclay*
Rank-1 constrained Convolutional Dictionary Learning algorithm for neural time series and distributed algorithm for CDL (DiCoDiLe).

Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high-level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternating minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.

In a second part, I will present my recent work on a distributed algorithm for CDL, called DiCoDiLe.

Using the Dictionary Structure for efficient Convolutional Dictionary Learning

31 Jan 2019, At

*Séminaire du LTCI - Télécom ParisTech*
Convolutional Dictionary Learning (CDL) is a popular tool in image processing for denoising or inpainting. This technique extends dictionary learning to learn adapted bases that are shift-invariant. It can also be used in the context of large multivariate signals to learn and localize recurring patterns. In this talk, I will focus on how the structure of the dictionary can be leveraged to design efficient CDL algorithms and to improve the interpretability of the recovered patterns in MEG signals.

Gregor and LeCun (2010) have shown empirically that it is possible to estimate the solution of the sparse coding problem with a neural network called LISTA, trained with back-propagation. In the first part of this talk, I will show the link between the performance of this network and the particular structure of the dictionary, in particular the Gram matrix of the problem. This structure is shown to be sufficient to explain the performance of LISTA and numerical experiments show it is also necessary. (Joint work with J. Bruna).

Then, I will focus on convolutional sparse coding (CSC). The particular properties of band-circulant matrices allow us to derive an efficient algorithm, called Locally Greedy Coordinate Descent (LGCD), to solve the CSC problem. This algorithm is very efficient in this context as it prioritizes important updates while keeping a low iteration complexity. It can also be used in an asynchronous and distributed setting. We show that even though the updates are only performed locally, without global communication, this distributed algorithm converges to the optimal solution of the CSC problem. Numerical experiments show that this algorithm has better performance than state-of-the-art algorithms for CSC and that it accelerates super-linearly with the number of cores used. (Joint work with L. Oudre, N. Vayatis and A. Gramfort).

In the final part of my talk, I will introduce a novel rank-1 constraint for the learned atoms for practical problems in neuroscience. This constraint, inspired by the underlying physical model for neurological signals, is then used to highlight relevant structure in MEG signals. (Joint work with T. Dupré la Tour, M. Jas and A. Gramfort).

Learning Recurring Patterns in Large Signals with Convolutional Dictionary Learning

24 Jan 2019, At

*Seminar day - CosmoStat, CEA, Orsay*
How convolutional dictionary learning can be used in the context of large multivariate signals to learn and localize recurring patterns: modeling and computational aspects.

Convolutional dictionary learning has become a popular tool in image processing for denoising or inpainting. This technique extends dictionary learning to learn adapted bases that are shift-invariant. This talk will discuss how this technique can also be used in the context of large multivariate signals to learn and localize recurring patterns. I will discuss both computational aspects, with efficient iterative and distributed convolutional sparse coding algorithms, as well as a novel rank-1 constraint for the learned atoms. This constraint, inspired by the underlying physical model for neurological signals, is then used to highlight relevant structure in MEG signals.
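To make the rank-1 constraint more concrete, the sketch below rebuilds a multichannel signal from atoms that each factor into one spatial pattern (a weight per channel) and one temporal waveform, convolved with a sparse activation signal. It is only a toy illustration under assumptions: the names `convolve_full` and `reconstruct`, the list-of-lists layout and the numerical values are made up for the example and do not come from the talk.

```python
def convolve_full(z, v):
    """Full 1D convolution of an activation signal z with a waveform v."""
    out = [0.0] * (len(z) + len(v) - 1)
    for t, zt in enumerate(z):
        if zt != 0.0:  # activations are sparse, skip the zeros
            for s, vs in enumerate(v):
                out[t + s] += zt * vs
    return out


def reconstruct(activations, spatial, temporal):
    """X[p][t] = sum_k spatial[k][p] * (activations[k] * temporal[k])[t].

    Each atom k is rank-1: the same waveform temporal[k] appears on every
    channel p, scaled by the channel weight spatial[k][p].
    """
    n_channels = len(spatial[0])
    n_times = len(activations[0]) + len(temporal[0]) - 1
    X = [[0.0] * n_times for _ in range(n_channels)]
    for z_k, u_k, v_k in zip(activations, spatial, temporal):
        conv = convolve_full(z_k, v_k)
        for p in range(n_channels):
            for t in range(n_times):
                X[p][t] += u_k[p] * conv[t]
    return X


# One atom, two channels, a waveform of length 3 activated once at t = 1.
spatial = [[1.0, 0.5]]
temporal = [[1.0, -1.0, 0.5]]
activations = [[0.0, 2.0, 0.0, 0.0]]
X = reconstruct(activations, spatial, temporal)
```

With this factorization, an atom over `P` channels and `L` time samples is described by `P + L` numbers instead of `P * L`, and the spatial vector can be read as a map over the sensors.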

DICOD: Distributed Convolutional Coordinate Descent for Convolutional Sparse Coding

12 Jul 2018, At

*ICML - Stockholm, Sweden*
In this talk, we introduce DICOD, a distributed convolutional sparse coding algorithm to build shift-invariant representations for long signals.

In this talk, we introduce DICOD, a distributed convolutional sparse coding algorithm to build shift-invariant representations for long signals. This algorithm is designed to run in a distributed setting, with local message passing, making it communication efficient. It is based on coordinate descent and uses locally greedy updates which accelerate the resolution compared to greedy coordinate selection. We prove the convergence of this algorithm and highlight its computational speed-up, which is super-linear in the number of cores used. We also provide empirical evidence for the acceleration properties of our algorithm compared to state-of-the-art methods.
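The idea of locally greedy updates can be sketched on a small problem. The code below is a sequential, single-process simplification written for a generic Lasso rather than a convolutional one, so it is not DICOD itself: coordinates are split into contiguous segments, and at each iteration only the current segment is searched for the coordinate whose update is largest. The names `lgcd`, `lasso_cost`, `n_segments` and the toy data are assumptions made for the example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_cost(D, x, z, lmbd):
    """0.5 * ||x - D z||^2 + lmbd * ||z||_1, with D given as a list of rows."""
    residual = [xi - sum(d * zj for d, zj in zip(row, z))
                for row, xi in zip(D, x)]
    return 0.5 * sum(r * r for r in residual) + lmbd * sum(abs(zj) for zj in z)


def lgcd(D, x, lmbd, n_segments, n_iter):
    """Coordinate descent with greedy selection restricted to one segment."""
    n_samples, n_atoms = len(D), len(D[0])
    # Squared column norms, assumed non-zero.
    norms = [sum(D[i][j] ** 2 for i in range(n_samples))
             for j in range(n_atoms)]
    z = [0.0] * n_atoms
    residual = list(x)  # x - D z for z = 0
    seg_len = -(-n_atoms // n_segments)  # ceiling division
    for it in range(n_iter):
        start = (it % n_segments) * seg_len
        best_j, best_delta = None, 0.0
        for j in range(start, min(start + seg_len, n_atoms)):
            # Exact minimizer of the cost along coordinate j.
            corr = sum(D[i][j] * residual[i] for i in range(n_samples))
            z_new = soft_threshold(z[j] * norms[j] + corr, lmbd) / norms[j]
            if abs(z_new - z[j]) > abs(best_delta):
                best_j, best_delta = j, z_new - z[j]
        if best_j is not None:
            # Apply the largest update of the segment, keep residual in sync.
            z[best_j] += best_delta
            for i in range(n_samples):
                residual[i] -= best_delta * D[i][best_j]
    return z


# Toy problem, made up for illustration: 2 measurements, 3 atoms.
D = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
x = [1.0, 0.2]
lmbd = 0.1
z_hat = lgcd(D, x, lmbd, n_segments=2, n_iter=50)
```

Searching one segment instead of all coordinates keeps the cost of selecting an update low while still favoring large updates. In a distributed setting, each segment would be handled by its own worker, which this sequential sketch does not attempt to reproduce.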

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

29 May 2018, At

*Parietal seminar - INRIA, Saclay*
A multivariate CSC algorithm with a rank-1 constraint, designed to study brain activity waveforms.

Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high-level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternating minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.

Convolutional Sparse Representations -- application to physiological signals and interpretability for Deep Learning

Convolutional representations extract recurrent patterns which lead to the discovery of local structures in a set of signals. They are well suited to analyze physiological signals, which require interpretable representations in order to understand the relevant information. Moreover, these representations can be linked to deep learning models, as a way to bring interpretability to their internal representations. In this dissertation, we describe recent advances on both computational and theoretical aspects of these models.

Our main contribution in the first part is an asynchronous algorithm, called DICOD, based on greedy coordinate descent, to solve convolutional sparse coding for long signals. Our algorithm has super-linear acceleration. We also explored the relationship of Singular Spectrum Analysis with convolutional representations, as an initialization step for convolutional dictionary learning.

In a second part, we focus on the link between representations and neural networks. Our main result is a study of the mechanisms which accelerate sparse coding algorithms with neural networks. We show that it is linked to a factorization of the Gram matrix of the dictionary. Other aspects of representations in neural networks are also investigated with an extra training step for deep learning, called post-training, to boost the performance of trained networks by improving their last layer's weights.

Finally, we illustrate the relevance of convolutional representations for physiological signals. Convolutional dictionary learning is used to summarize signals from human walking and Singular Spectrum Analysis is used to remove the gaze movement in young infants' oculometric recordings.

Acceleration strategies for sparse coding resolution, using the optimization structure.

Sparse coding is a core building block in many data analysis and machine learning pipelines. Finding good algorithms to accelerate the resolution of such problems is thus critical to many applications.

The first part of this talk is focused on recent acceleration techniques which estimate the sparse code with a trained neural network such as LISTA. Empirical results have shown that they achieve high-quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately. In this talk, I will link the performance of these networks to a factorization of the Gram matrix of the problem which preserves the l1 norm. This mechanism is shown to be sufficient to explain the performance of LISTA and numerical experiments show that it is also necessary. (Joint work with J. Bruna)

In a second part of the talk, I will focus on convolutional sparse coding, with band-circulant matrices. The particular properties of these problems allow us to derive an efficient distributed algorithm based on greedy coordinate descent. It can be shown that this algorithm converges in an asynchronous setting, is communication efficient and has a super-linear speed-up. These different properties are then illustrated with numerical experiments. (Joint work with N. Vayatis and L. Oudre)

Understanding Trainable Sparse Coding with Matrix Factorization

06 Nov 2017, At

*Tech talk - Google, Zurich*
In this talk, we provide elements explaining why Learned ISTA is able to accelerate the resolution of the Lasso. This view of optimization algorithms in the neural network framework could be used to link sparse representations and neural networks.

Optimization algorithms for sparse coding can be viewed in the light of the neural network framework. Using this framework, it is possible to design trainable networks which accelerate the resolution of an optimization problem on a given distribution, as has been shown with the Learned ISTA network, proposed by Gregor & LeCun (2010).

In this talk, we provide elements explaining why the acceleration is possible in the case of ISTA. We show that the resolution of sparse coding can be accelerated compared to ISTA when the design matrix admits a quasi-diagonal factorization with sparse eigenspaces. The resulting algorithm has the same convergence rate but an improved constant factor. Then we show under which conditions such a factorization is possible with high probability for generic Gaussian dictionaries. Finally, we design neural networks which compute this algorithm and show that they are a re-parametrization of LISTA. Thus, the performance of LISTA is at least as good as that of this algorithm. We conclude by designing adversarial examples for our factorization-based algorithm and show that LISTA also fails to accelerate in these cases, which indicates that this mechanism plays a role in the acceleration obtained with LISTA.
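The neural network view of ISTA can be made concrete with a small sketch. A layer of the form `z_next = ST(W_e x + S z, theta)` reproduces exactly one ISTA iteration when its weights are set to `W_e = D^T / L`, `S = I - D^T D / L` and `theta = lmbd / L`. LISTA keeps this layer structure and treats the weights as trainable. The code below only checks this equivalence on a toy problem and does no training. The helper names (`lista_layer`, `ista_weights`, `ista_step`) and the data are assumptions made for the example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(M, v):
    """Matrix-vector product for M given as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def lista_layer(x, z, W_e, S, theta):
    """One network layer: z_next = ST(W_e x + S z, theta)."""
    pre = [a + b for a, b in zip(matvec(W_e, x), matvec(S, z))]
    return [soft_threshold(p, theta) for p in pre]


def ista_weights(D, lmbd, L):
    """Weights for which a layer equals one ISTA step with step size 1 / L."""
    n_samples, n_atoms = len(D), len(D[0])
    W_e = [[D[i][j] / L for i in range(n_samples)] for j in range(n_atoms)]
    S = [[(1.0 if j == k else 0.0)
          - sum(D[i][j] * D[i][k] for i in range(n_samples)) / L
          for k in range(n_atoms)] for j in range(n_atoms)]
    return W_e, S, lmbd / L


def ista_step(D, x, z, lmbd, L):
    """Reference ISTA iteration, written directly from its definition."""
    residual = [sum(d * zj for d, zj in zip(row, z)) - xi
                for row, xi in zip(D, x)]
    grad = [sum(D[i][j] * residual[i] for i in range(len(D)))
            for j in range(len(z))]
    return [soft_threshold(zj - gj / L, lmbd / L) for zj, gj in zip(z, grad)]


# Toy problem, made up for illustration: 2 measurements, 3 atoms.
D = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
x = [1.0, 0.2]
lmbd, L = 0.1, 2.5  # L is an upper bound on the largest eigenvalue of D^T D

W_e, S, theta = ista_weights(D, lmbd, L)
z_net = [0.0] * 3
z_ista = [0.0] * 3
for _ in range(3):  # three layers with shared weights
    z_net = lista_layer(x, z_net, W_e, S, theta)
    z_ista = ista_step(D, x, z_ista, lmbd, L)
```

Starting training from these weights means the network is initially as good as ISTA with the same number of iterations.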

Understanding physiological signals via sparse representations

09 Oct 2017, At

*Journée de rentrée EDMH - IHES, Orsay*
General presentation of time series representations and of the convolutional representation model.

The concurrent.futures module offers an easy-to-use API to parallelize code execution in Python, using threading or multiprocessing primitives. We will begin our talk by presenting this API and the differences between Thread and Process.

For Process-backed executions, useful for pure Python parallelization, several issues can reduce the performance. Spawning new workers for each execution can create a large overhead, but maintaining the pool of workers across the program can quickly become bothersome. We will describe several of the pitfalls that can make using concurrent.futures unstable.

Finally, we will introduce loky, a library providing a robust, reusable pool of workers, handled internally. It uses a customized implementation of ProcessPoolExecutor from concurrent.futures. We will describe its main features and the major technical design choices that helped make it more robust.
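For readers unfamiliar with the API discussed in this talk, here is a minimal sketch of the executor interface from the standard library. It only uses `ThreadPoolExecutor` so that it runs anywhere without a main guard. The helper names `square` and `run_with` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Toy task standing in for real work."""
    return n * n


def run_with(executor_cls, values, max_workers=2):
    """Submit one task per value and collect the results in order."""
    with executor_cls(max_workers=max_workers) as executor:
        # submit returns a Future immediately; result() blocks until done.
        futures = [executor.submit(square, v) for v in values]
        return [f.result() for f in futures]


thread_results = run_with(ThreadPoolExecutor, range(5))

# executor.map is a shorthand for the submit / result pattern above.
with ThreadPoolExecutor(max_workers=2) as executor:
    mapped_results = list(executor.map(square, range(5)))
```

`ProcessPoolExecutor` exposes the same `submit` and `map` methods, so `run_with(ProcessPoolExecutor, range(5))` is the process-backed counterpart. It should be called under an `if __name__ == "__main__":` guard, and the submitted function and its arguments must be picklable. The `with` block creates and tears down the pool on each call, which is the worker start-up overhead mentioned above. As far as I know, loky addresses this with a `get_reusable_executor` function that hands back the same pool across calls.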
