Pursuing a master’s in Interdisciplinary Data Science — a field that combines domain knowledge, statistics and machine learning — he plans a career in the tech, financial tech or automotive industry.
This visualization displays satellite data of the 2020 Australian wildfires using a geographical point distribution. It was created with data from kaggle.com using Tableau’s Point Distribution Map tools.
Each point utilizes a temperature variable to display colors in a logical heat-map format. The visualization displays the location of the hotter regions and provides insight to the area being impacted. (In January 2020, this BBC story covered widespread social-media sharing of misleading or inaccurate data visuals related to these fires.)
This is a difference-in-difference plot developed with Python and the Altair Python package. Difference-in-difference plots are typically used as an experimentation tool to estimate causal-treatment effects.
In this case, the plot evaluates the rate of primary-school enrollment of women in rural Pakistan in both Taliban-controlled and non-Taliban-controlled regions before and after major Pakistani-Taliban attacks. The source of the data is the Pakistan Bureau of Statistics. A thorough exploration of this topic is available here.
This radar plot was used during the course of a team machine-learning project focused on an improved approach for classifying music genres. The data was obtained using the Spotify API (Application Programming Interface). We used Python and the Plotly Python package to develop the visualization.
The plot helped evaluate the distribution of songs by feature (loudness, key, energy, etc.). It displays the feature medians; the clusters represent the new groupings of songs — rather than Spotify’s original groupings — after we implemented a K-Means clustering machine-learning algorithm. A full report on the project can be read here.
This second image from that machine-learning project, a correlation matrix, was developed with Python scripting and the Seaborn Python package. Correlation matrices are used to evaluate the correlations between pairs of variables in a given data set. This is usually a step performed in what is called exploratory data analysis, which explores the data with visuals such as tables, plots and charts to get a solid understanding of what is present.
Values nearing +1 indicate the presence of a strong positive relation between X and Y, whereas those nearing -1 indicate a strong negative relation between X and Y.
Advanced Placement Computer Science and Honors Engineering were two courses that played a role in my pursuing a technical degree in college. During my senior year at Ravenscroft, Mollie Ducoste ’13 and I ended up winning a National LabView competition. This is when I realized I was passionate about problem solving.
Huge thanks to Mrs. Kat Belk for the hours spent in tutorial after school every week! It helped develop my love for mathematics!
During my undergraduate studies, I pursued a focus in computer science. I took a course in data mining as an elective, which led me to learning about data science. Data science is interdisciplinary in nature, using a combination of domain knowledge, statistics and machine learning to solve real-world problems.
After learning the power of data-science skills, I knew I wanted to pursue a master’s degree. I’m currently a second-year master’s student at Duke University, pursuing a degree in Interdisciplinary Data Science with plans of working in either the tech, fintech (financial technology) or automotive industry after graduation.
During my challenging master’s program, I’ve also pushed myself to balance marathon, ultramarathon and triathlon training. I’m currently training for a half-Ironman in October 2022 and another marathon in November. I have found that endurance training aids my graduate program studies; the combination of academics and athletics at Ravenscroft helped to steer me in this direction.