Non-Federal Acute Care Hospital Health IT Adoption and Use Analysis
Led data acquisition, preprocessing, and imputation for a graduate-level analysis of hospital EHR adoption trends, using a public dataset from the U.S. Department of Health & Human Services.
Designed and implemented an end-to-end cleaning pipeline in Databricks, covering schema alignment, type correction, and ML-based imputation with three models (Random Forest, Gradient-Boosted Trees, Linear Regression).
(Tools & Stack)
Python · GitHub · SQL · Jupyter Notebook
(Process & Contribution)
Consulted faculty to validate methods and incorporated their statistical feedback into the imputation strategy.
Registered dataset as a SQL view to enable seamless team access for analysis and modeling.
Developed a dataflow diagram to visualize project stages (import → ML → SQL → visualization).
Created a YouTube channel to host the final group presentation and recorded its technical introduction.
Tools used: Python, PySpark, Databricks, SQL, GitHub, Canva
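Registering the cleaned dataset as a SQL view can be sketched as follows. This is an illustrative stand-in that uses Python's built-in `sqlite3` module rather than Databricks, where the comparable step on a Spark DataFrame would typically be `DataFrame.createOrReplaceTempView`. The table name `hospital_it_clean`, the view name `hospital_it`, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the shared workspace (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical cleaned table with made-up rows.
conn.execute(
    "CREATE TABLE hospital_it_clean (year INTEGER, measure TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO hospital_it_clean VALUES (?, ?, ?)",
    [
        (2010, "basic_ehr", 10.0),
        (2011, "basic_ehr", 20.0),
        (2012, "basic_ehr", None),  # still-missing value, excluded by the view
    ],
)

# Expose only usable rows through a named view that teammates can query.
conn.execute(
    "CREATE VIEW hospital_it AS "
    "SELECT year, measure, value FROM hospital_it_clean WHERE value IS NOT NULL"
)

view_rows = conn.execute(
    "SELECT year, value FROM hospital_it ORDER BY year"
).fetchall()
```

A view gives every analyst the same stable name to query, so downstream analysis and modeling do not depend on how the cleaning step is implemented.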