Datacenter Impact Study

research · economics · sustainability · data-science

Energy Institute at Haas — Undergraduate Research, 2025–2026

This research project examines how datacenter construction affects surrounding communities, specifically property values, employment, local tax revenue, and environmental indicators. Working under faculty at Berkeley's Energy Institute at Haas, I assembled and cleaned a multi-source dataset combining DCByte facility data, Business Insider location records, NHGIS/ACS census tract data, and Zillow ZHVI/ZORI housing indices.

Tools & Methods: Python (pandas, NumPy, statsmodels, linearmodels) for data cleaning, merging, and econometric analysis. Difference-in-differences estimation using the Callaway & Sant'Anna and Sun & Abraham estimators for staggered treatment timing. Matplotlib and seaborn for event-study visualizations. Data sourced and cleaned from NHGIS/ACS (census tract demographics), Zillow (ZHVI/ZORI housing indices), and DCByte/Business Insider (facility locations and capacities). Building the control group meant identifying Virginia counties with zero datacenter presence. Fairfax County was initially included as a control, but it hosts 45 datacenters and contaminated the comparison, so it was removed. Final clean controls: Arlington, Stafford, Spotsylvania, Clarke, Warren, Rappahannock, Orange.
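To make the event-study idea concrete, here is a minimal sketch on a synthetic county-by-month panel. It is not the project's code and it is not the Callaway & Sant'Anna or Sun & Abraham estimator: it is a plain two-way fixed effects event study with a single common treatment date, fitted by ordinary least squares. The county labels, the `log_zhvi` column, the `event_study` helper, and all simulated values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic panel: 8 hypothetical counties, 13 months of event time.
# Every number here is simulated, not taken from the study.
counties = [f"c{i}" for i in range(8)]
treated = set(counties[:4])      # first four counties are "treated"
months = np.arange(-6, 7)        # event time; treatment starts at 0

rows = []
for c in counties:
    county_level = rng.normal()  # county-specific baseline
    for m in months:
        effect = 0.5 * (m >= 0) * (c in treated)  # simulated true effect
        y = county_level + 0.1 * m + effect + rng.normal(scale=0.05)
        rows.append((c, m, c in treated, y))
panel = pd.DataFrame(rows, columns=["county", "event_time", "treated", "log_zhvi"])


def event_study(df, ref=-1):
    """County and period fixed effects plus treated-by-event-time dummies.

    The period `ref` is omitted, so each coefficient is measured
    relative to that period.
    """
    county_fe = pd.get_dummies(df["county"], prefix="county", dtype=float)
    time_fe = pd.get_dummies(df["event_time"], prefix="t", drop_first=True, dtype=float)
    leads_lags = {}
    for k in sorted(df["event_time"].unique()):
        if k != ref:
            leads_lags[k] = ((df["event_time"] == k) & df["treated"]).astype(float)
    inter = pd.DataFrame(leads_lags)
    X = pd.concat([county_fe, time_fe, inter], axis=1).to_numpy()
    beta, *_ = np.linalg.lstsq(X, df["log_zhvi"].to_numpy(), rcond=None)
    # The interaction coefficients are the last columns of X.
    return pd.Series(beta[-inter.shape[1]:], index=inter.columns)


coefs = event_study(panel)
print(coefs.round(3))
```

In this simulation the pre-period coefficients should sit near zero and the post-period coefficients near the simulated effect of 0.5, which is the pattern an event-study plot is meant to reveal. With staggered treatment timing this simple specification can be biased, which is the motivation for the heterogeneity-robust estimators named above.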

Key Findings: Treating January 2021 as a structural break (the AI-era acceleration of datacenter construction), early results show a measurable dose-response relationship: when Virginia counties are ranked by permit intensity, housing price effects vary with that intensity. The higher-intensity counties (Loudoun, Prince William) diverge from the control counties by a statistically significant margin in post-treatment periods. The research recasts the traditional environmental-impact lens as an AI-era economic story.
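One way to express a dose-response design in code is to interact a continuous intensity measure with a post-break indicator. The sketch below does this on simulated data with county and month fixed effects. The intensity scores, county labels, `log_zhvi` column, `dose_response_slope` helper, and simulated slope are all hypothetical; only the January 2021 break date comes from the write-up, and none of the numbers are results from the study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical permit-intensity scores; 0.0 marks a control county.
intensity = {"high_a": 1.0, "high_b": 0.8, "mid": 0.4,
             "ctrl_a": 0.0, "ctrl_b": 0.0, "ctrl_c": 0.0}
months = pd.date_range("2019-01-01", periods=48, freq="MS")
BREAK = pd.Timestamp("2021-01-01")  # structural break used in the write-up
TRUE_SLOPE = 0.3                    # simulated effect per unit of intensity

rows = []
for county, dose in intensity.items():
    level = rng.normal()            # county-specific baseline
    for i, month in enumerate(months):
        post = float(month >= BREAK)
        y = level + 0.01 * i + TRUE_SLOPE * dose * post + rng.normal(scale=0.02)
        rows.append((county, month.strftime("%Y-%m"), dose, post, y))
panel = pd.DataFrame(rows, columns=["county", "month", "intensity", "post", "log_zhvi"])


def dose_response_slope(df):
    """OLS with county and month fixed effects; returns the coefficient
    on intensity x post, i.e. the post-break effect per unit of intensity."""
    county_fe = pd.get_dummies(df["county"], dtype=float)
    month_fe = pd.get_dummies(df["month"], drop_first=True, dtype=float)
    dose_post = (df["intensity"] * df["post"]).rename("intensity_x_post")
    X = pd.concat([county_fe, month_fe, dose_post], axis=1).to_numpy()
    beta, *_ = np.linalg.lstsq(X, df["log_zhvi"].to_numpy(), rcond=None)
    return beta[-1]  # intensity_x_post is the last column of X


slope = dose_response_slope(panel)
print(f"estimated slope: {slope:.3f} (simulated truth: {TRUE_SLOPE})")
```

A positive slope here means that counties with more intensity see larger post-break price changes relative to controls. A real analysis would also need standard errors clustered by county, which this sketch leaves out.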