By Juan Ramón González
The INAMAT² Institute is organizing a research seminar titled: Bioinformatic tools for Big Data analyses in R
Date: April 23 at 16:30.
Online presentation via Zoom. The link will be sent to registered participants. Registration is required before April 22 at 14:00.
One of the most important challenges in Big Data is creating efficient, scalable algorithms that can handle the heavy computational cost of Big Data analyses. R is a leading programming language for data science, with powerful packages that address problems in the vast majority of research areas. However, most existing R packages are not designed for Big Data, because their functions 1) run only on data that fit into the computer’s memory and 2) rely on algorithms that are not scalable. To overcome these limitations, we developed a library called BigDataStatMeth, which includes functions for basic matrix operations and linear algebra on big matrices using the HDF5 and DelayedArray infrastructures of Bioconductor. These functions can run on datasets larger than system memory and implement parallel algorithms. We tested the package’s performance by comparing its computation time with that of base R functions. Our results showed that our implementation outperforms the existing functions and that the improvement grows as sample size increases. In conclusion, our package can serve as the basis for implementing the statistical methods required in any research area where Big Data are generated. As a proof of concept, we implemented PCA and multivariate linear regression for Big Data. In this talk, we will review our methodology and show how the implemented methods can be used in genomic data analyses. In particular, we will illustrate how to use PCA to call genotype inversions in more than 400K individuals from the UK Biobank. The BigDataStatMeth package is available at BRGE’s GitHub repository: https://github.com/isglobal-brge.
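The idea of running matrix operations on data larger than memory can be illustrated with a minimal sketch. It is not the BigDataStatMeth implementation: it is written in Python rather than R, uses neither HDF5 nor parallelism, and every name in it (row_blocks, blockwise_crossprod, stream_rows) is hypothetical. It accumulates the cross-product X^T X one block of rows at a time, so only one block has to be held in memory; for column-centered data, the eigenvectors of this matrix are the principal component directions used in PCA.

```python
from itertools import islice
from typing import Iterable, Iterator, List

Row = List[float]


def row_blocks(rows: Iterable[Row], block_size: int) -> Iterator[List[Row]]:
    """Yield consecutive blocks of at most block_size rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def blockwise_crossprod(rows: Iterable[Row], n_cols: int,
                        block_size: int = 2) -> List[Row]:
    """Accumulate X^T X block by block; only one block is in memory at once."""
    gram = [[0.0] * n_cols for _ in range(n_cols)]
    for block in row_blocks(rows, block_size):
        for row in block:
            for i in range(n_cols):
                for j in range(n_cols):
                    gram[i][j] += row[i] * row[j]
    return gram


def stream_rows() -> Iterator[Row]:
    """Hypothetical stand-in for reading rows lazily from a file on disk."""
    for row in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
        yield row


gram = blockwise_crossprod(stream_rows(), n_cols=2)
print(gram)
```

Because the result has only n_cols × n_cols entries, it stays small even when the number of rows (samples) is very large, which is the situation the abstract describes.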
Jeronimo de Ayanz Building
Public University of Navarre
Campus de Arrosadia 31006 - Pamplona
Tel. +34 948 169512
Contact by email