# Publications

Super-efficiency of automatic differentiation for functions defined as a minimum

Feb 2020,

Feb 2020,

*preprint /*
We study the different techniques to differentiate a function defined as a min of an other.

In min-min optimization or max-min optimization, one has to compute the gradient of a function defined as a minimum. In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm. There are two usual ways of estimating the gradient of the function: using either an analytic formula obtained by assuming exactness of the approximation, or automatic differentiation through the algorithm. In this paper, we study the asymptotic error made by these estimators as a function of the optimization error. We find that the error of the automatic estimator is close to the square of the error of the analytic estimator, reflecting a super-efficiency phenomenon. The convergence of the automatic estimator greatly depends on the convergence of the Jacobian of the algorithm. We analyze it for gradient descent and stochastic gradient descent and derive convergence rates for the estimators in these cases. Our analysis is backed by numerical experiments on toy problems and on Wasserstein barycenter computation. Finally, we discuss the computational complexity of these estimators and give practical guidelines to chose between them.

Learning step sizes for unfolded sparse coding

Dec 2019, In proceedings of

Dec 2019, In proceedings of

*Advances in Neural Information Processing Sytems (NeurIPS)*
This paper presents a theoretical study on how LISTA can learn to accelerate computation compared to ISTA based on larger step sizes adapted to the sparsity distribution of the solution estimate. this mechanism is the only one which ensure assymptotic convergence to the Lasso estimator.

Sparse coding is typically solved by iterative optimization techniques, such as the Iterative Shrinkage-Thresholding Algorithm (ISTA). Unfolding and learning weights of ISTA using neural networks is a practical way to accelerate estimation. In this paper, we study the selection of adapted step sizes for ISTA. We show that a simple step size strategy can improve the convergence rate of ISTA by leveraging the sparsity of the iterates. However, it is impractical in most large-scale applications. Therefore, we propose a network architecture where only the step sizes of ISTA are learned. We demonstrate that for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes. Experiments show that our method is competitive with state-of-the-art networks when the solutions are sparse enough.

A Data Set for the Study of Human Locomotion with Inertial Measurements Units

Nov 2019,

Nov 2019,

*Image Processing On Line*
A data set of 1020 multivariate gait signals collected with two inertial measurement units, from 230 subjects undergoing a fixed protocol.

This article thoroughly describes a data set of 1020 multivariate gait signals collected with two inertial measurement units, from 230 subjects undergoing a fixed protocol: standing still, walking 10 m, turning around, walking back and stopping. In total, 8.5~h of gait time series are distributed. The measured population was composed of healthy subjects as well as patients with neurological or orthopedic disorders. An outstanding feature of this data set is the amount of signal metadata that are provided. In particular, the start and end time stamps of more than 40,000 footsteps are available, as well as a number of contextual information about each trial. This exact data set was used in [Oudre et al., Template-based step detection with inertial measurement units, Sensors 18, 2018] to design and evaluate a step detection procedure.

Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals

Nov 2019,

Nov 2019,

*preprint /*
DiCoDiLe: a distributed and asynchronous algorithm, employing locally greedy coordinate descent and an asynchronous locking mechanism that does not require a central server.

Convolutional dictionary learning (CDL) estimates shift invariant basis adapted to multidimensional data. CDL has proven useful for image denoising or inpainting, as well as for pattern discovery on multivariate signals. As estimated patterns can be positioned anywhere in signals or images, optimization techniques face the difficulty of working in extremely high dimensions with millions of pixels or time samples, contrarily to standard patch-based dictionary learning. To address this optimization problem, this work proposes a distributed and asynchronous algorithm, employing locally greedy coordinate descent and an asynchronous locking mechanism that does not require a central server. This algorithm can be used to distribute the computation on a number of workers which scales linearly with the encoded signal's size. Experiments confirm the scaling properties which allows us to learn patterns on large scales images from the Hubble Space Telescope.

Sparsity-based blind deconvolution of neural activation signal in fMRI

May 2019, In proceedings of

May 2019, In proceedings of

*IEEE International Conference on Acoustic Speech and Signal Processing*
In this work, we formulate the joint estimation of the HRF and neural activation signal as a semi blind deconvolution problem.

The estimation of the hemodynamic response function (HRF) in functional magnetic resonance imaging (fMRI) is critical to deconvolve a time-resolved neural activity and get insights on the underlying cognitive processes. Existing methods pro-pose to estimate the HRF using the experimental paradigm(EP) in task fMRI as a surrogate of neural activity. These approaches induce a bias as they do not account for latencies in the cognitive responses compared to EP and cannot be applied to resting-state data as no EP is available. In this work, we formulate the joint estimation of the HRF and neural activation signal as a semi blind deconvolution problem. Its solution can be approximated using an efficient alternate minimization algorithm. The proposed approach is applied to task fMRI data for validation purpose and compared to a state-of-the-art HRF estimation technique. Numerical experiments suggest that our approach is competitive with others while not requiring EP information.

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

Dec 2018, In proceedings of

Dec 2018, In proceedings of

*Advances in Neural Information Processing System (NIPS)*
A multivariate CSC with rank-1 constrain algorithm designed to study brain activity waveforms

Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternated minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.

Step detection in inertial recordings using template based detection.

This article presents a method for step detection from accelerometer and gyrometer signals recorded with Inertial Measurement Units (IMUs). The principle of our step detection algorithm is to recognize the start and end times of the steps in the signal thanks to a predefined library of templates. The algorithm is tested on a database of 1020 recordings, composed of healthy subjects and patients with various neurological or orthopedic troubles. Simulations on more than 40,000 steps show that the template-based method achieves remarkable results with a 98% recall and a 98% precision. The method adapts well to pathological subjects and can be used in a medical context for robust step estimation and gait characterization.

DICOD: Distributed Convolutional Coordinate Descent for Convolutional Sparse Coding

Jul 2018, In proceedings of

Jul 2018, In proceedings of

*International Conference on Machine Learning (ICML)*
In this paper, we introduce DICOD, a distributed convolutional sparse coding algorithm to build shift invariant representations for long signals.

In this paper, we introduce DICOD, a distributed convolutional sparse coding algorithm to build shift invariant representations for long signals. This algorithm is designed to run in a distributed setting, with local message passing, making it communication efficient. It is based on coordinate descent and uses locally greedy updates which accelerate the resolution compared to greedy coordinate selection. We prove the convergence of this algorithm and highlight its computational speed-up which is super-linear in the number of cores used. We also provide empirical evidence for the acceleration properties of our algorithm compared to state-of-the-art methods.

Convolutional Sparse Representations -- application to physiological signals and interpretability for Deep Learning

Dec 2017,

Dec 2017,

*/*
Convolutional representations extract recurrent patterns which lead to the discovery of local structures in a set of signals. In this dissertation, we describe recent advances on both computational and theoretical aspects of these models.

Convolutional representations extract recurrent patterns which lead to the discovery of local structures in a set of signals. They are well suited to analyze physiological signals which requires interpretable representations in order to understand the relevant information. Moreover, these representations can be linked to deep learning models, as a way to bring interpretability in their internal representations. In this dissertation, we describe recent advances on both computational and theoretical aspects of these models. First, we show that the Singular Spectrum Analysis can be used to compute convolutional representations. This representation is dense and we describe an automatized procedure to improve its interpretability. Also, we propose an asynchronous algorithm, called DICOD, based on greedy coordinate descent, to solve convolutional sparse coding for long signals. Our algorithm has super-linear acceleration.In a second part, we focus on the link between representations and neural networks. An extra training step for deep learning, called post-training, is introduced to boost the performances of the trained network by making sure the last layer is optimal. Then, we study the mechanisms which allow to accelerate sparse coding algorithms with neural networks. We show that it is linked to a factorization of the Gram matrix of the dictionary.Finally, we illustrate the relevance of convolutional representations for physiological signals. Convolutional dictionary learning is used to summarize human walk signals and Singular Spectrum Analysis is used to remove the gaze movement in young infant’s oculometric recordings.

Understanding the Learned Iterative Soft Thresholding Algorithm with matrix factorization

Jun 2017,

Jun 2017,

*preprint Arxiv*
This paper aims to extend the results from our previous paper studying the mechanisms of LISTA and gives an acceleration certificate for generic dictionaries.

Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). These methods are optimal in the class of first-order methods for non-smooth, convex functions. However, they do not exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks, coined LISTA, was proposed in Gregor and Le Cun (2010), which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

Understanding Trainable Sparse Coding with Matrix Factorization

Apr 2017, In proceedings of

Apr 2017, In proceedings of

*International Conference on Learning Representations (ICLR)*
In this paper we study the mechanisms behind LISTA.

Sparse coding is a core building block in many data analysis and machine learning pipelines. Typically it is solved by relying on generic optimization techniques, that are optimal in the class of first-order methods for non-smooth, convex functions, such as the Iterative Soft Thresholding Algorithm and its accelerated version (ISTA, FISTA). However, these methods don't exploit the particular structure of the problem at hand nor the input data distribution. An acceleration using neural networks was proposed in Gregor10, coined LISTA, which showed empirically that one could achieve high quality estimates with few iterations by modifying the parameters of the proximal splitting appropriately.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

In this paper we study the reasons for such acceleration. Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalise the kernel with a basis that produces a small perturbation of the ℓ1 ball. When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound with respect to the non-adaptive version. Moreover, our analysis also shows that conditions for acceleration occur mostly at the beginning of the iterative process, consistent with numerical experiments. We further validate our analysis by showing that on dictionaries where this factorization does not exist, adaptive acceleration fails.

Additional training step for deep networks to optimize the use of the features learnerd during the classical training. A link with existing kernel methods is then discussed.

One of the main challenges of deep learning methods is the choice of an appropriate training strategy. In particular, additional steps, such as unsupervised pre-training, have been shown to greatly improve the performances of deep structures. In this article, we propose an extra training step, called post-training, which only optimizes the last layer of the network. We show that this procedure can be analyzed in the context of kernel theory, with the first layers computing an embedding of the data and the last layer a statistical model to solve the task based on this embedding. This step makes sure that the embedding, or representation, of the data is used in the best possible way for the considered task. This idea is then tested on multiple architectures with various data sets, showing that it consistently provides a boost in performance.

Distributed Convolutional Sparse Coding via Message Passing Interface

Dec 2015, In

Dec 2015, In

*NIPS Workshop Nonparametric Methods for Large Scale Representation Learning*
Asynchronous algorithm to solve the convolutional sparse coding. This algorithm can be implemented using the MPI framework.

We consider the problem of building shift-invariant representations of signals from sensors with a large frequency of acquisition. We propose a distributed algorithm for convolutional sparse coding called DICOD that is based on coordinate descent and scales up with a speed up that is quadratic with respect to the number of processing units. Indeed, our implementation avoids sharing variables between cores and does not require any lock or synchronization at every step. We present theoretical results and empirical evidence of convergence of DICOD, and also provide numerical comparisons with respect to widely used algorithms for convolutional sparse coding.

Groupement automatique pour l’analyse du spectre singulier

Sep 2015, In proceedings of

Sep 2015, In proceedings of

*the Groupe de Recherche et d'Etudes en Traitement du Signal et des Images (GRETSI)*
This paper introduces several automatic grouping strategies for Singular Spectrum Analysis (SSA) components in a unified framework.

This paper introduces several automatic grouping strategies for Singular Spectrum Analysis (SSA) components. This step is useful to retrieve meaningful insight about the temporal dynamics of the series. A unifying framework is proposed to evaluate and compare the efficiency of different original methods compared to the existing ones.

Détection de pas à partir de données d'accélérométrie

Sep 2015, In proceedings of

Sep 2015, In proceedings of

*the Groupe de Recherche et d'Etudes en Traitement du Signal et des Images (GRETSI)*
This article presents a method for step detection from accelerometer signals based on template matching.

This article presents a method for step detection from accelerometer signals based on template matching. This method uses a library of step templates extracted from real data in order to not only count the steps but also to retrieve the start and end times of each step. The algorithm is tested on a large database of 300 recordings, composed of healthy patients and patients with various orthopaedic troubles. Simulations show that even with only 20 templates, our method achieves remarkable results with a 97% recall and a 96% precision, is robust and adapts well to pathological subjects.