Omics: Biotechnology and AI for health

Code 552EE
Credits 6

Learning outcomes

The student will learn the principles of high-throughput sequencing and data analysis in order to understand the effect of human genetic variation on health and disease and to model the regulation of gene expression in eukaryotic cells. The student will acquire theoretical and practical expertise in transcriptome analysis at the gene, exon and alternative splicing level, starting from a microarray or NGS dataset. In addition, the student will learn current applications of genomic data to human health, such as genome-wide association studies (GWAS), NGS resequencing studies to discover rare variants, and metagenomics. The course is divided into two parts.
In the first part, basic knowledge of omics techniques and an overview of web tools and databases will be provided. The common platforms for DNA and RNA omics analysis, such as microarrays (SNP arrays and expression arrays) and next-generation sequencing platforms, will be examined. The strategies for sequencing the whole transcriptome or sub-transcriptomes (actively transcribed RNA, actively translated RNA or the epitranscriptome) will be described. DNA/RNA extraction, quality control and library preparation (including single-cell omics and low-input analysis) will be considered. Data cleaning and filtering, quality control, mapping against a reference genome (for both genomic and transcriptomic data), gene counting, algorithms for gene count normalization, methods for data plotting and principal component analysis will be discussed.
In the second part, the focus will be on the analytic approaches used for omics data, with particular attention to advanced epidemiological/statistical and Artificial Intelligence methodologies. The interpretation of omics data will be addressed by coupling data analysis with AI algorithmic approaches and functional annotation resources (e.g., gene set enrichment analysis (GSEA), KEGG, WikiPathways).
Feature selection with machine learning algorithms will be used to define the molecular signatures of cells and tissues and to identify biomarkers. ChIP-seq data will be considered for the study of the interactions between regulatory elements (e.g., promoters and enhancers), DNA and proteins, of topologically associating domains (TADs) and of the epigenetic signature, in order to analyze the modulation of gene expression through genome segmentation algorithms. The use of artificial intelligence algorithms will also be considered for the study of the protein interactome and protein-protein interactions. Students will also learn how to use web tools to analyze genomic data and to relate genetic variability to the possible functions of the variants (RegulomeDB, the Genotype-Tissue Expression (GTEx) project, HaploReg, etc.).
All these data will be integrated by AI approaches to build a gene expression regulation model and to explore the possible interaction of genetic variability with environmental variables.
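As an illustration of the gene count normalization step listed in the first part of the course, the following is a minimal sketch of counts-per-million (CPM) scaling followed by a log2 transform. It is only one of several normalization schemes and is not necessarily the one taught in the course; the function names, the pseudocount of 1 and the toy counts are assumptions made for this example.

```python
import math


def counts_per_million(counts):
    """Scale the raw gene counts of one sample so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counted reads")
    return [c * 1_000_000 / total for c in counts]


def log2_cpm(counts, pseudocount=1.0):
    """Log2-transform CPM values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in counts_per_million(counts)]


# Hypothetical toy data: raw counts for three genes in two samples.
raw = {
    "sample_A": [10, 30, 60],     # shallow library (100 reads in total)
    "sample_B": [100, 300, 600],  # ten times deeper, same gene proportions
}

# After CPM scaling, the difference in sequencing depth no longer dominates.
normalized = {name: counts_per_million(c) for name, c in raw.items()}
log_values = {name: log2_cpm(c) for name, c in raw.items()}
```

In this toy case the two samples differ only in sequencing depth, so their CPM profiles coincide; values normalized in this way are a common input for plotting and principal component analysis.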