Data Scientist
Technical Skills
Python (PyTorch/Matplotlib/NumPy/Pandas/SciPy/scikit-learn/seaborn), SQL, Tableau, Git, HPC, SLURM, Command-line interface, Lightning AI Studio, Visual Studio Code
Education
- M.S., Data Science | New York University (May 2024)
- B.A., Economics | Lawrence University (Jun 2016)
- B.A., Mathematics | Lawrence University (Jun 2016)
Data Science Experience
Memorial Sloan Kettering Cancer Center
Data Science Researcher (Sep. 2023 - Dec. 2023)
Cancer Drug Response Prediction through Knowledge Graph Embedding & Geometric Deep Learning
Report
- Constructed a knowledge graph of 3,217,941 triplets by synthesizing clinical & mutational profiles for 7 cancer types.
- Implemented RotatE, a knowledge graph embedding model, for graph augmentation & drug prediction.
- Executed the TxGNN architecture in PyTorch on NYU High Performance Computing clusters to predict triplets and generate therapeutic treatment recommendations for cancer patients, achieving 88% AUROC.
GitHub Repo
- Used Matplotlib, Pandas, and NumPy in Python to analyze Twitter user behavior, including location, language, and percentage of tweets by user ID.
- Created tweet embeddings using OpenAI’s API and applied K-means clustering to identify themes and extract narratives within state-backed information operations.
- Fine-tuned a pre-trained LLM to classify tweets belonging to such operations, achieving 97% validation accuracy.
Predicting Obesity from Lifestyle Characteristics of Latin American Population (Sep. 2022 - Dec. 2022)
Link to Project
- Conducted data preprocessing and exploratory analysis using Pandas, Matplotlib, and Seaborn to visualize factors affecting obesity, such as age, sex, caloric consumption, and physical activity.
- Applied hypothesis testing using Chi-Square and Mann-Whitney U tests to perform feature selection.
- Achieved 79% accuracy in predicting obesity with logistic regression and identified the critical contributing factors.
Video Frame Prediction & Image Segmentation Using Generative Adversarial Network & U-Net (Mar. 2024 - May 2024)
GitHub Repo
- Trained FutureGAN, a progressively growing generative adversarial network (GAN), on 13K videos to predict video frames on GPU, achieving an MSE of 0.0069 on the validation set.
- Collaborated with teammates on the implementation of U-Net, a convolutional neural network that generates segmentation masks for objects in predicted video frames.
Large Language Models’ Cognitive Capabilities: A Study on OpenAI’s GPT Models (Mar. 2023 - May 2023)
GitHub Repo
- Conducted vignette-based investigations on GPT-3.5 and GPT-4’s cognitive capabilities in decision-making, information search, deliberation, and causal reasoning using canonical scenarios from cognitive psychology.
- Applied prompt engineering to GPT-3.5 and evaluated the performance of GPT-3.5 and GPT-4 against GPT-3.
- Identified the handling of adversaries as a weakness of current GPT models and flagged it for future research.
Work Experience
International Data Team Manager @ Haver Analytics, New York, NY (Mar. 2020 – Present)
- Lead a team of 4 data managers to manage 22 macroeconomic and financial databases covering Asia-Pacific.
- Utilized OpenAI’s API and MySQL to create a Q&A program on internal knowledge using embeddings-based search.
- Developed custom datasets for institutional clients, driving business decisions with accurate data.
- Directed the team’s automation effort, raising the share of automated ETL processes from 3% to 42%.
- Reviewed data additions, cleanups, methodology change implementations, and automation.
Senior International Data Manager @ Haver Analytics, New York, NY (Mar. 2019 – Mar. 2020)
- Undertook team management responsibilities, including assigning and distributing workflow among members.
- Guided the team on complex client inquiries and projects requiring deep macroeconomic subject matter expertise.
- Researched documentation published by economic institutions, gaining insight into economic data-reporting procedures and changes in statistical classifications and methodologies.
- Collaborated closely with the Development team to provide product and program improvement recommendations.
International Data Manager @ Haver Analytics, New York, NY (Nov. 2016 – Mar. 2019)
- Integrated 180+ macroeconomic datasets covering emerging economies in the Asia-Pacific region.
- Ensured data integrity and update timeliness through statistical analysis and automation.
- Created, implemented, and streamlined ETL procedures for data acquisition and updates.
- Utilized Excel, EViews, Python, and proprietary software to process and manipulate time series data.
- Provided clients with technical support and advised them on data analysis functions and optimal methods of tracking data.