Summary of research
I’m a PhD student at the Centre for Doctoral Training in Computational Statistics and Data Science at the University of Bristol, supervised by Professor Patrick Rubin-Delanchy and Professor Nick Whiteley.
My research provides a statistical grounding for manifold structure in high-dimensional data and demonstrates that rich topological and geometric structure can emerge from generic and simple statistical assumptions involving correlations and latent variables. The aim of this work is to shed light on the efficacy of PCA for reduction from high to moderate dimensions before clustering, topological data analysis, nonlinear dimension reduction, regression and classification. This led to work showing how hidden tree structure in data can be recovered via hierarchical clustering with dot products.
I have also created a Python package, pyemb. It contains tools for exploratory data analysis of complex data, including embedding, clustering and visualisation.
Papers
- Hierarchical clustering with dot products recovers hidden tree structure Annie Gray, Alexander Modell, Patrick Rubin-Delanchy, Nick Whiteley Advances in Neural Information Processing Systems, 36 (Spotlight - top 3.06%)
- Statistical exploration of the Manifold Hypothesis Nick Whiteley, Annie Gray, Patrick Rubin-Delanchy Under review
- Matrix factorisation and the interpretation of geodesic distance Nick Whiteley, Annie Gray, Patrick Rubin-Delanchy Advances in Neural Information Processing Systems, 34
Other projects
During a collaboration with Microsoft Research Special Projects, working in Human Rights Technology, I have been involved in developing and implementing techniques for detecting and understanding risk from relationships in large-scale datasets. One application for this work is to identify corruption in public procurement data using multiple data sources that describe relationships between companies.
I have also worked on a project with the Smith Institute, where I researched social network resilience to misinformation. My work here explored techniques that can be used to prevent misinformation spread, including education, algorithmic changes, red-teaming and regulation.
Talks
- Workshop on Statistical Network Analysis and Beyond (Best poster award) June 2024
- Workshop on Functional Inference and Machine Intelligence (Poster) March 2024
- Advances in Neural Information Processing Systems, 36 (Poster) Dec 2023
- Compass Annual Conference (Speaker) Sep 2022 and Sep 2023
- Heilbronn Institute Data Science Seminar (Speaker) June 2022
- Advances in Neural Information Processing Systems, 34 (Poster) Dec 2021