Robust Dimension Reduction Algorithms to Analyze High-Dimensional Genetic Data
This research uses a new variant of random sample consensus (RANSAC) to perform robust subspace recovery in the presence of outliers. The variant was applied to a genotype dataset of dimension D = 197,466 collected from N = 1,385 individuals of European descent. Whereas principal component analysis solves a least-squares problem, our RANSAC variant seeks a theoretical estimator that maximizes the number of data points lying in a d-dimensional subspace, with the maximization taken over the Grassmannian manifold of all such subspaces. The variant was applied to the complete dataset and to reduced versions of it, but did not yield consistent results. Although this research could not demonstrate that our RANSAC variant performs robust subspace recovery on high-dimensional genetic data, several modifications are proposed that may enable it to do so in future research.
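To make the consensus-maximizing idea concrete, the following is a minimal sketch of a generic RANSAC scheme for subspace recovery. It is not the variant studied in this research. The function name `ransac_subspace` and the parameters `n_trials`, `tol`, and `seed` are illustrative assumptions, and the sketch assumes linear subspaces through the origin. Each trial spans a candidate d-dimensional subspace from d randomly sampled points and counts how many data points lie within `tol` of it; the candidate with the largest count is kept.

```python
import numpy as np

def ransac_subspace(X, d, n_trials=200, tol=1e-6, seed=0):
    """Generic RANSAC sketch (hypothetical helper, not the paper's variant).

    X is an N x D data matrix. Returns (Q, count), where Q is a D x d
    orthonormal basis of the best candidate subspace found and count is
    the number of rows of X within distance tol of that subspace.
    """
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    best_Q, best_count = None, -1
    for _trial in range(n_trials):
        # Sample d distinct points and orthonormalize their span.
        idx = rng.choice(N, size=d, replace=False)
        Q, _R = np.linalg.qr(X[idx].T)  # Q has shape D x d
        # Distance from each point to the candidate subspace.
        residual = X - (X @ Q) @ Q.T
        dist = np.linalg.norm(residual, axis=1)
        count = int(np.sum(dist <= tol))
        if count > best_count:
            best_Q, best_count = Q, count
    return best_Q, best_count
```

On synthetic data in which a majority of points lie exactly in a low-dimensional subspace, a sketch like this typically recovers that subspace once a single trial happens to sample only inliers. The number of trials needed grows quickly as the inlier fraction falls or d increases, and the threshold `tol` would need to be chosen with care for noisy data.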