Accessible & Inclusive Data sciencE

Priyank Chandra, Keiran Campbell (Lunefield)



Analysis of high-dimensional data plays a key role in generating data-driven insights across a wide range of domains, including the natural sciences, engineering, and business. However, correct analysis often requires experience in specialized programming languages. This presents a significant barrier for users without the required training, including those with accessibility needs. Furthermore, the complexity of such analysis makes the workflows of even experienced users error-prone, costing significant time and resources.


We introduce the Accessible & Inclusive Data sciencE (AIDE) project, which will pair machine learning with user experience research and inclusive design to significantly improve the accessibility of data analysis. We focus on noisy, high-dimensional biomedical data that requires advanced analytic methods to extract biological insight. We build on GPT language models to automatically generate analysis code from natural language commands. While the project initially focuses on biomedical data, there is considerable potential to expand to other application domains that involve high-dimensional, noisy data. The broader impact of this project is to democratize data analysis by making it available to a broad range of users.