Research activities

The AROBAS team focuses on three areas: algorithmics and operation research, bioinformatics, and statistical learning. These themes are linked together by a common methodological approach.

Algorithmics and operations research

Research in algorithmics and operations research within the team focuses on the design of algorithms with guaranteed performance (approximation ratio, execution time) in the areas of optimization (mono or multi-criteria), algorithmic game theory and operations research.

From a fundamental point of view, we study models that measure and explain the performance of algorithms from a theoretical standpoint going beyond the analysis of the worst case. A representative example is the resource augmentation model used for the analysis of online algorithms. We also want to develop unified methods based on duality in mathematical programming for classes of problems in online algorithm theory and algorithmic game theory.

From an algorithmic point of view, our prefered problems are graph-related and scheduling problems (possibly multi-periodic), especially those for which some energy minimization is required. Other works pertain to alternative models of computation based on tessellations, the aim being to understand the mechanisms used by nature to assemble complex shapes (crystals and quasicrystals) or to design artificial tiles (DNA tiles) able to perform computations.

In the theme of operations research, we are interested in major issues from the social and economic sectors in the transport, production, and agri-food sectors. These include the design of intelligent transport networks with lane reservation mechanisms, consideration of energy consumption in the manufacturing industry using batch scheduling, and design and optimization of the supply chain in the agri-food sector. The scientific difficulties in solving these problems are numerous: the systems to be studied are diversified, very complex, and large; the problems are often interdisciplinary and multi-criteria; the systems also exhibit aspects of uncertainty.

Bioinformatics

Our work in bioinformatics is mostly related to the development of methods and computer tools for the prediction and analysis of non-coding RNAs (ncRNA). In recent years, ncRNAs have attracted great interest from the scientific community (in biology, bioinformatics and biomedical research), particularly because of their roles in many diseases.

Two major topics are addressed in relation to RNAs: (i) the prediction of RNA structures and their interactions with proteins or other RNAs, mainly using combinatorial optimization methods, and (ii) the identification of ncRNA in genomic sequences, in particular using machine learning.

For the first subject, multi-criteria combinatorial optimization methods based on mathematical programming are developed, as well as methods based on graph theory. The objective of our work is essentially to be able, on the one hand, to combine several models and criteria for the prediction of RNA structures and complexes and, on the other hand, to generate sub-optimal solutions to get closer to the actual structures.

For the second question, several original machine learning approaches are proposed that integrate different heterogeneous sources of data, depending on the particularity of the RNAs being sought and the biological question asked. We are particularly interested in selecting the best sources of data to use and the best possible combination of these sources.

One of our concerns is to design algorithms that can handle large amounts of data and scale up while providing predictions that are as reliable as possible. Another interest is to propose methods that produce adapted visualizations, give interpretations of the results, and allow interactions with the user.

Particular attention is given to the exploitation of the algorithms developed: they all give rise to software made available to the scientific community (web services or download) via the EvryRNA platform.

Statistical learning

Our machine learning work focuses now on deep learning. Our research themes are guided by applications, especially those related to health. We develop predictive models for diagnosis, prognosis, treatment response or medical image analysis, using electronic patient data (EHR), genomics, meta-genomics or medical imaging.

The health sector raises specific problems. The most important is the small number of learning examples available – because of the cost and difficulty of collecting data – while the examples present a large number of variables. One of our main research topics is learning in deep neural networks from small datasets with large dimensions. We use transfer learning to jointly learn several related tasks, semi-supervised learning in order to learn from untagged data a relevant representation of the data, domain transfer learning to transfer data or predictions from one data source to another, and variable selection methods to reduce the size of the data.

Another major challenge is the interpretation of neural networks and their predictions. There is an urgent need to make neural networks interpretable, especially in the medical field for two reasons. First, it is important to ensure that the neural network bases its predictions on a reliable representation of patients and does not focus on irrelevant artifacts present in the dataset used for learning. Without prediction explanation, doctors cannot trust a neural network, regardless of its performance. Second, we must be able to produce for the biologists a biological interpretation of the neural network and its predictions in order to detect when a neural network which is efficient in predicting a certain phenotype has identified a signature in the biological data that could be a line of research. To address this concern, we propose methods of perturbation and back-propagation of the output signal of the neural network that we cross-reference with the biological and medical databases. We are also interested in the extraction of rules from the neural network.

We collaborate on this research with various partners from academic research laboratories (Paris-Dauphine University, IRD, Inserm, LIMICS), companies (Dental Monitoring, SystemX, Visiomed) and hospitals (Pitié-Salpêtrière, CHSF).

 

Last deposits

Chargement de la page

Full-text documents

200

Open Access documents %

48 %

Bibliographical references

398

Keywords

Risk Management Computational results Uncertainty Clustering Service level Optimisation Electricity cost Analyse en moyenne Model interpretation Logistics Programmation linéaire Network inference Non-preemptive scheduling Pseudoknot Optimal solutions Single machine Disassembly line balancing Cut-and-solve method Speed scaling Berth allocation Polynomial-time algorithms Closed-loop supply chain Lagrangean relaxation Supervised learning Integer linear programming Gene expression Gene regulatory networks Food quality Operations Research Distributionally robust Facility location Complexity Transportation planning Algorithmes Network Game theory Approximation algorithms Learning effect Lane reservation Stochastic optimization Automated truck Artificial intelligence Benders decomposition Kernel methods Lagrangian relaxation Mixed integer programming Vertex cover Bi-objective optimization Approximation Graph Bi-objective optimisation Combinatorial optimization Mixed integer linear programming Mathematical model Explainability Energy minimization Transportation Online algorithms Scheduling Machine learning Genetic algorithms Ripple effect Algorithms Gene regulatory network Bioinformatics Phenotype prediction Algorithmic game theory Algorithmes génétiques Makespan Genetic algorithm Argumentation Liveness Integer programming Linear programming Kernel search Interpretability Heuristic Tabu search Dynamic programming Batch scheduling Maillage Apprentissage automatique Operator-valued kernel Energy Approximation algorithm Choice-Free Boolean automata networks Stochastic programming Perishable food Branch and bound Mathematical programming Semi-supervised learning Ε-constraint method Bi-objective Gene Ontology Autoregressive models Inventory routing problem Optimization Heuristics Deep learning