GeneQC Statistical Model
General Idea • Reads can map to multiple gene loci • This leads to varying degrees of mapping uncertainty • Potentially causes issues with inferences based on read counts, such as differentially expressed genes, co-expression patterns, and various network analyses
Options • Exclude ambiguous reads • Multiple assignment • Random assignment • Probabilistic assignment • Only considering local information
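The options above can be compared on a single multi-mapped read. The following is a minimal sketch, not GeneQC's implementation: the function name `assign_read`, the strategy labels, and the `weights` argument are hypothetical and introduced only for illustration. Each strategy returns the contribution that one read makes to the read count of each candidate gene.

```python
import random


def assign_read(candidates, strategy, weights=None, rng=None):
    """Return {gene: count contribution} for one read.

    candidates: list of gene loci the read maps to.
    strategy:   "exclude", "multiple", "random", or "probabilistic".
    weights:    optional {gene: weight} for the probabilistic strategy
                (hypothetical; e.g. local coverage around each locus).
    """
    if len(candidates) == 1:
        # Unambiguous read: always counts fully toward its only locus.
        return {candidates[0]: 1.0}
    if strategy == "exclude":
        # Ambiguous reads are dropped entirely.
        return {}
    if strategy == "multiple":
        # The read is counted once at every candidate locus.
        return {gene: 1.0 for gene in candidates}
    if strategy == "random":
        # The read goes to one candidate chosen uniformly at random.
        rng = rng or random.Random()
        return {rng.choice(candidates): 1.0}
    if strategy == "probabilistic":
        # The read is split in proportion to the weights (uniform if none given).
        weights = weights or {gene: 1.0 for gene in candidates}
        total = sum(weights[gene] for gene in candidates)
        return {gene: weights[gene] / total for gene in candidates}
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    read = ["geneA", "geneB"]
    for name in ("exclude", "multiple", "random", "probabilistic"):
        print(name, assign_read(read, name, weights={"geneA": 3.0, "geneB": 1.0}))
```

Exclusion discards information, multiple assignment inflates total counts, and random assignment adds noise, which is why the remaining slides focus on the probabilistic option.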
Co-expressed Genes • Co-expressed genes provide an additional level of information • Global data allow a more solid statistical evaluation
Goal • Create a statistically sound model for the assignment of ambiguous reads • Use co-expression of genes • Develop a method that produces a p-value or probability score for each ambiguous read assignment • Provide a p-value signifying the confidence of each gene's read count
Previous Publications • Faulkner, G.J., et al., A rescue strategy for multimapping short sequence tags refines surveys of transcriptional activity by CAGE. Genomics, 2008. 91(3): p. 281-288. • Hashimoto, T., et al., Probabilistic resolution of multi-mapping reads in massively parallel sequencing data using MuMRescueLite. Bioinformatics, 2009. 25(19): p. 2613-2614. • Wang, J., Huda, A., Lunyak, V.V., & Jordan, I.K., A Gibbs sampling strategy applied to the mapping of ambiguous short-sequence tags. Bioinformatics, 2010. 26(20): p. 2501-2508.
Overall Direction • Assign all unambiguous reads • Use co-expression information of unambiguous reads to make first probabilistic assignment of ambiguous reads • Based on assignments, recalculate probabilities for ambiguous reads • Continue iterative procedure until no/minimal changes occur
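The iterative procedure above can be sketched as follows. This is an illustration under stated assumptions, not the GeneQC model itself: the slides do not define a scoring rule, so the `support` function (a gene's current expected count plus co-expression-weighted counts of its partner genes), the `coexpr` dictionary layout, the pseudocount, and the convergence tolerance are all hypothetical choices made for the example.

```python
def support(gene, expected, coexpr, pseudocount=1e-3):
    """Hypothetical evidence score for a candidate gene.

    expected: {gene: current expected read count}
    coexpr:   {gene: {partner_gene: co-expression weight in [0, 1]}}
    """
    partners = coexpr.get(gene, {})
    partner_evidence = sum(w * expected.get(h, 0.0) for h, w in partners.items())
    return pseudocount + expected.get(gene, 0.0) + partner_evidence


def iterative_assignment(unique_counts, ambiguous_reads, coexpr,
                         max_iter=100, tol=1e-6):
    """Assign ambiguous reads probabilistically until counts stabilise.

    unique_counts:   {gene: number of unambiguous reads}
    ambiguous_reads: list of candidate-gene lists, one per ambiguous read
    Returns (per-read probabilities, expected counts, iterations used).
    """
    # Step 1: start from the unambiguous reads only.
    expected = dict(unique_counts)
    probs = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Steps 2-3: (re)compute assignment probabilities for each ambiguous read.
        probs = []
        for candidates in ambiguous_reads:
            scores = {g: support(g, expected, coexpr) for g in candidates}
            total = sum(scores.values())
            probs.append({g: s / total for g, s in scores.items()})
        # Rebuild expected counts from unambiguous reads plus fractional assignments.
        updated = dict(unique_counts)
        for read_probs in probs:
            for gene, p in read_probs.items():
                updated[gene] = updated.get(gene, 0.0) + p
        # Step 4: stop when no expected count moves by more than tol.
        genes = set(updated) | set(expected)
        change = max((abs(updated.get(g, 0.0) - expected.get(g, 0.0)) for g in genes),
                     default=0.0)
        expected = updated
        if change < tol:
            break
    return probs, expected, iteration


if __name__ == "__main__":
    unique = {"geneA": 90, "geneB": 10}
    ambiguous = [["geneA", "geneB"]] * 10
    probs, expected, n_iter = iterative_assignment(unique, ambiguous, coexpr={})
    print(probs[0], expected, n_iter)
```

Because each pass reuses the previous pass's expected counts, a scheme like this can settle on a local optimum and can be expensive for large read sets, which matches the concerns listed later in the deck.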
Additional parameters • Similarity between a given read and each potential gene locus (differences are generally very minute) • Co-expression rate between each candidate gene and its co-expressed genes
Concerns & Limitations • Requires accurate co-expression information • Limited sample size of co-expression information could skew the probability distribution • Potentially highly computationally intensive • The iterative procedure may converge to a local optimum • Does not currently consider dependence between read assignments
Our Future Plans • Collect test data to verify increased performance using the statistical model • Run the model with various validated probability assumptions (Normal, Poisson, etc.) • Develop an R package implementing the statistical model