Bioinformatics platform
Director: Emmanuel Barillot
Deputy director: Philippe Hupé
The Bioinformatics platform plays a two-fold role. On the one hand, we integrate the data generated by the Institut Curie's biotechnology platforms: the genome, transcriptome and proteome array platforms, the mass spectrometry proteomics platform, the large-scale sequencing platform and the cell phenotyping platform. To this end, we develop and manage the databases, tools and interfaces required for data integration. On the other hand, we provide collaborative bioinformatics and biostatistics data analysis support for the projects of our biologist and clinician colleagues.
The Institut Curie Bioinformatics platform is located on the Paris campus, in the Developmental Biology and Cancer building. Our work relies on a substantial IT infrastructure, managed by the Institut Curie's Systems team (Jean-Gabriel Dick and Camille Barette). It comprises a 50-terabyte SAN storage system, Sun Opteron 8-processor servers (2 based on dual-core processors with 32 GB RAM and 2 based on quad-core processors with 256 GB RAM), along with workstations with dual quad-core processors and 16 GB RAM, for a total computing capacity of 400 logical processors.
BioIT: database development and maintenance (Philippe La Rosa)
The many high-throughput molecular approaches generate unprecedented data flows that must be structured and presented in a unified view through an integrative bioinformatics platform. This is the mission of the BioIT group, which is in charge of the development, maintenance, administration, management and upgrading of the databases, processing pipelines and interfaces making up the platform.
The integration covers both the clinical and biological data generated by the Institute and the vast amounts of related data that are publicly available within the scientific community.
Our browsing and viewing tools provide a global overview of the data collected, thus facilitating the formulation of working hypotheses, a critical step in the transition from data collection to knowledge discovery. These tools rely on software solutions available within the scientific community or, when necessary, on tools developed by the BioIT group.
We also develop automated data processing pipelines. The systematic nature of this approach facilitates traceability, guarantees homogeneous results and allows analyses to be repeated rapidly. The BioIT group is in charge of developing these processing pipelines.
Biostatistics and data analysis (Philippe Hupé)
This second line of work consists of providing our bioinformatics and biostatistics expertise in collaborations with our biologist and clinician colleagues from the Institute or elsewhere. High-throughput data analysis must be based both on a command of cutting-edge statistics and bioinformatics tools and concepts, and on a thorough understanding of the biological and clinical questions to be answered.
The analysis is conducted at the request of our colleagues, in close collaboration with them, and must start with the definition of the experimental design. Once the data have been generated, the first step consists of quality control and extraction of the biological signal, frequently referred to as normalization. At this stage, it may be necessary to define ad hoc corrective models, and the maturity of the experiment is established. Next comes an exploratory analysis phase, with no prior hypotheses, in which the experiment's main message is sought, for example the biological pathways involved. This stage may lead to the formulation of hypotheses, the identification of experimental perspectives, or the definition of new experiments. Following the exploratory phase, an analysis is carried out to answer the clinical or biological question posed, for example the comparison of two tumour types or the creation of methods capable of predicting the occurrence of metastases.
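As an illustration of the normalization step described above, here is a minimal sketch of one very simple approach: log2-transforming raw intensities and centering each sample on its median. This page does not say which methods the platform actually uses, so the technique, the function name `median_center_log2` and the input layout are all assumptions chosen for the example.

```python
import math
from statistics import median


def median_center_log2(samples):
    """Log2-transform and median-center each sample (illustrative only).

    `samples` maps a sample name to a list of strictly positive raw
    intensities. This layout is an assumption made for the example.
    """
    normalized = {}
    for name, values in samples.items():
        logs = [math.log2(v) for v in values]
        center = median(logs)
        # After subtraction, every sample has a median of zero,
        # which makes samples comparable on a common scale.
        normalized[name] = [x - center for x in logs]
    return normalized


if __name__ == "__main__":
    raw = {"sample_a": [1, 2, 4], "sample_b": [2, 4, 8]}
    print(median_center_log2(raw))
```

In this toy input, `sample_b` is uniformly twice as bright as `sample_a`; median centering removes that global offset so that both samples carry the same normalized values. Real data usually call for more elaborate corrective models, as noted above.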
Pheno-informatics (Alexandre Hamburger)
Many now-standard technologies (microarrays, two-hybrid, MS-MS, etc.) have led to the generation of large volumes of data related to cell components (genes, proteins, RNA, etc.) and their interactions. More recently, major breakthroughs in image analysis and robotics have provided us with the opportunity to observe the cell as a global entity, presenting a "phenotype", rather than a collection of individual elements.
Pheno-informatics concerns the acquisition, manipulation and analysis of such data: the behaviour of a cell, or population of cells, is quantified according to its type (cell line), the perturbations applied and the experimental context.
The resulting data may be used as an additional data source, enriching and complementing existing models, or as an autonomous source that could be used to significantly enhance our understanding of cell behaviour. Many applications can be imagined, both in the context of the development of biological knowledge and from a therapeutic standpoint.
In all cases, a new data type, radically different from the standard ones, requires the implementation of adapted analyses capable of making full use of these data and of intelligently managing their inherent complexity.
Large-scale sequencing data analysis (Emmanuel Barillot)
The new sequencing technologies (454, Solexa, SOLiD) provide the ability to sequence DNA at an unprecedented rate of up to 10 gigabases per week. The Institut Curie recently acquired a SOLiD sequencer, now used for studies involving the sequencing of complete genomes, genetic mutations and transcripts (mRNA and small RNA), and the mapping of genomic rearrangements, protein-DNA binding sites, histone modifications, etc. For each experiment, this technology can produce in excess of 100 million sequences of 35 to 50 bases. It requires new tools for managing the large volumes of data, along with adapted analytical strategies and methods. Within this line of work, we collaborate with the SOLiD platform team and its biologist users to define projects, devise bioinformatics and biostatistical solutions, and conduct the data analyses.
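To give a sense of scale, the figures quoted above (100 million sequences of 35 to 50 bases per experiment) can be turned into a rough data volume with simple arithmetic. The sketch below is illustrative only; the function name `bases_per_experiment` is made up for the example, and the calculation ignores quality values and file-format overhead.

```python
def bases_per_experiment(n_reads: int, read_length: int) -> int:
    """Total number of sequenced bases for one experiment."""
    return n_reads * read_length


# Figures taken from the text: 100 million reads of 35 to 50 bases.
N_READS = 100_000_000

low = bases_per_experiment(N_READS, 35)
high = bases_per_experiment(N_READS, 50)

print(f"{low / 1e9:.1f} to {high / 1e9:.1f} gigabases per experiment")
```

A single experiment therefore yields roughly 3.5 to 5 gigabases, which is why dedicated data management tools are needed.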



