Big Data in the US Consumer Price Index: Experiences and Plans
The Bureau of Labor Statistics (BLS) has generally relied on its own sample surveys to collect the price and expenditure information necessary to produce the Consumer Price Index (CPI). The burgeoning availability of big data has created information that could lead to methodological improvements and cost savings in the CPI. The BLS has undertaken several pilot projects in an attempt to supplement and/or replace its traditional field collection of price data with alternative sources. In addition to cost reductions, these projects have demonstrated the potential to expand sample size, reduce respondent burden, obtain transaction prices more consistently, and improve price index estimation by incorporating real-time expenditure information—a foundational component of price index theory that has not been practical until now. The CPI uses the term alternative data to refer to any data not collected through traditional field collection procedures by CPI staff, including third party datasets, corporate data, and data collected through web scraping or retailer APIs. This paper reviews how the CPI program is adapting to work with alternative data, followed by discussion of the three main sources of alternative data under consideration by the CPI with a description of research and other steps taken to date for each source.