
Although the NBER NPI page is no longer maintained, those files will continue to be available. That offering does not include information about providers that were not current in April of 2019.

CMS offers a complete file of currently eligible providers each month, but does not offer a file that includes the full content of older deactivated records. For those records only the deactivation date is provided, which is not what a serious researcher wants. Here we offer a new dataset created by concatenating 15 monthly files from April 2007 to the present day, with roughly 12-month spacing between files. Our collection of monthly files was not perfectly regular; however, any provider that was active for at least a year since April 2007 will be included, along with some others. We are certainly interested in obtaining files for 2005 and 2006. Although the omission of short-lived providers may be a source of bias in certain applications, such as studies of fraudulent providers, the presence of all reasonably persistent providers with their historical data is an improvement over using only survivors. The most recent file can be downloaded from the CMS website at https://download.cms.gov/nppes/NPI_Files.html

Deduplication

We have elected not to treat this as a rectangular panel, but to include a record only when one of the data items has changed from the previous year. The files are large and many records are essentially duplicates, a fraction that varies from .6 to .9. This does mean that selecting all the facility records for year X is not as simple as "keep if year==X". More on that below.

You might think that we could deduplicate by NPI and date of last update without loss of information, since a record should not change without a change to that date. Nevertheless, there are many changes to provider records without a change to lastupdate. In at least some cases, the only difference from the discarded records was the order of values in the multiplicative variables. The file offered here is therefore deduplicated by all variables except source file name and year.
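For readers working from the .csv files in Python rather than Stata, the following is a minimal sketch of deduplicating on every field except the source file name and year. The column names (`taxonomy1`, `taxonomy2`, `source`, `year`) are hypothetical stand-ins for the real variable names. Sorting the values of a repeated field before comparing is an extra step that is not part of the file offered here; it is shown only to illustrate how records that differ solely in value order could be treated as duplicates.

```python
import csv
import io

# Hypothetical column names; substitute the real variable names.
IGNORED = {"source", "year"}        # bookkeeping fields excluded from the comparison
REPEATED_PREFIXES = ("taxonomy",)   # assumed prefix of one repeated (multi-valued) field


def record_key(row):
    """Build a hashable key from every field except the ignored ones.

    Values of the repeated field are sorted, so two records that differ only
    in the order of those values produce the same key.
    """
    single = []
    repeated = []
    for name, value in row.items():
        if name in IGNORED:
            continue
        if name.startswith(REPEATED_PREFIXES):
            if value:
                repeated.append(value)
        else:
            single.append((name, value))
    return (tuple(sorted(single)), tuple(sorted(repeated)))


def deduplicate(rows):
    """Keep the first occurrence of each distinct record, in input order."""
    seen = set()
    kept = []
    for row in rows:
        key = record_key(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


sample = io.StringIO(
    "npi,lastupdate,taxonomy1,taxonomy2,source,year\n"
    "1000000001,01/15/2010,A,B,file2010.csv,2010\n"
    "1000000001,01/15/2010,B,A,file2011.csv,2011\n"
    "1000000001,01/15/2010,A,C,file2012.csv,2012\n"
)
deduped = deduplicate(csv.DictReader(sample))
print([r["year"] for r in deduped])
```

In this toy input the 2011 row repeats the 2010 row with its two values swapped, so it is dropped. The 2012 row has a real change with the same lastupdate and is kept.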

We did include a variable, source, which gives the filename of the source file for each included record. If our deduplication represents a loss of information, then please contact us with an explanation and we will try to do better.

We have mentioned that there might not be a record for year t, if there was no change that year. To extract all records valid for year 2018, try the following code:

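gen lastupdate = date(lastupdatestr, "MDY")
format lastupdate %td
destring npi, force replace
* drop updates made after 2018, then keep each provider's latest remaining record
keep if lastupdate < td(01jan2019)
sort npi
by npi: egen test = max(lastupdate)
keep if lastupdate == test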
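For readers working in Python, here is a minimal sketch of one way to pick, for each NPI, the most recent record whose update date falls before a cutoff such as January 1, 2019. It assumes `lastupdatestr` is formatted as MM/DD/YYYY and that records are dictionaries keyed by variable name; the `city` field and all values are made up for illustration.

```python
from datetime import date, datetime


def valid_as_of(records, cutoff):
    """Return, per NPI, the record with the latest update date before `cutoff`.

    When two records for one NPI share the same update date, the one that
    appears later in the input wins.
    """
    latest = {}
    for rec in records:
        updated = datetime.strptime(rec["lastupdatestr"], "%m/%d/%Y").date()
        if updated >= cutoff:
            continue  # change happened after the year of interest
        best = latest.get(rec["npi"])
        if best is None or updated >= best[0]:
            latest[rec["npi"]] = (updated, rec)
    return [rec for _, rec in latest.values()]


# Made-up records for illustration only.
records = [
    {"npi": "1", "lastupdatestr": "05/23/2007", "city": "BOSTON"},
    {"npi": "1", "lastupdatestr": "03/02/2016", "city": "CAMBRIDGE"},
    {"npi": "1", "lastupdatestr": "07/11/2019", "city": "SALEM"},
    {"npi": "2", "lastupdatestr": "02/01/2020", "city": "LOWELL"},
]
result = valid_as_of(records, date(2019, 1, 1))
print([(r["npi"], r["city"]) for r in result])
```

Because records can change without a change to the update date, ties are possible. This sketch resolves them by input order, which may or may not suit a given application.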

All files are zipped. They are very fluffy. The full file takes about 86GB in Stata but only 1.8GB compressed. As of 2021 the core variables (all but the multiplicative variables) take 18GB in Stata, but 1.8GB compressed. Merging individual multiplicative datasets with the core dataset may be the most practical way to proceed. You may also find useful information about working with large datasets in Stata.
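Given how much the files expand, it can help to read rows straight out of the archive and keep only the columns needed. The sketch below uses Python's standard `zipfile` and `csv` modules and assumes the archive holds a CSV member; the member name `core.csv` and the column names are hypothetical, and a tiny archive is built in memory so the example runs without a download.

```python
import csv
import io
import zipfile


def stream_columns(zip_source, member, columns):
    """Yield dicts holding only `columns` from a CSV stored inside a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # Decode the binary member stream line by line instead of
            # extracting the whole file to disk first.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                yield {name: row[name] for name in columns}


# Build a small in-memory archive (hypothetical member and column names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr(
        "core.csv",
        "npi,lastupdatestr,city\n1000000001,05/23/2007,BOSTON\n",
    )
buffer.seek(0)

rows = list(stream_columns(buffer, "core.csv", ["npi", "city"]))
print(rows)
```

The same function accepts a path to a zip file on disk in place of the in-memory buffer.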

Downloads

Documentation

 

Notes:

In April of 2020 we were able to obtain weekly files back to March 9, 2015, suggesting that CMS retains these files for 5 years. They are not linked on the CMS website, but can be obtained by guessing the URL (which differs only by the date fields). We did not use the weekly files in this round.
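Since the URLs differ only in their date fields, candidate addresses can be generated by stepping through the weeks. The sketch below is illustrative only: the template is a made-up placeholder (the actual CMS naming pattern should be copied from a current weekly link), and it assumes each file covers seven days starting on the given date.

```python
from datetime import date, timedelta

# Hypothetical placeholder, NOT the real CMS pattern. Copy the actual pattern
# from a current weekly link and keep the two date fields.
TEMPLATE = "https://example.invalid/weekly_{start:%m%d%y}_{end:%m%d%y}.zip"


def weekly_urls(first_start, last_start, template=TEMPLATE):
    """Return one candidate URL per week, from first_start through last_start."""
    urls = []
    start = first_start
    while start <= last_start:
        end = start + timedelta(days=6)  # assumed 7-day coverage window
        urls.append(template.format(start=start, end=end))
        start += timedelta(days=7)
    return urls


urls = weekly_urls(date(2015, 3, 9), date(2015, 3, 23))
for url in urls:
    print(url)
```

Each candidate would still need to be requested to see whether it exists; the sketch only builds the list.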

The original .csv files have a header with variable descriptions that are not suitable as variable names in a database or statistical package. Therefore we have created variable names and turned the supplied header into variable labels.
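As an illustration of the general idea (not the naming scheme actually used for these files), the following Python sketch turns descriptive header cells into short, unique variable names and keeps the original text as the label. The 32-character cap reflects Stata's limit on variable names; the sanitizing rules and the `v_` prefix are assumptions made for this example.

```python
import re


def make_variable_names(header):
    """Map each descriptive header cell to a (name, label) pair.

    Names are lowercased, reduced to letters, digits and underscores,
    truncated to 32 characters, and made unique with a numeric suffix.
    """
    pairs = []
    used = set()
    for label in header:
        name = re.sub(r"[^0-9a-z]+", "_", label.lower()).strip("_")
        if not name or name[0].isdigit():
            name = "v_" + name  # names must not start with a digit
        name = name[:32]
        base, n = name, 1
        while name in used:
            suffix = f"_{n}"
            name = base[: 32 - len(suffix)] + suffix
            n += 1
        used.add(name)
        pairs.append((name, label))
    return pairs


header = [
    "NPI",
    "Provider Business Mailing Address City Name",
    "Healthcare Provider Taxonomy Code_1",
    "Healthcare Provider Taxonomy Code_2",
]
pairs = make_variable_names(header)
for name, label in pairs:
    print(f"{name:34s}{label}")
```

The last two headers collide after truncation, so the second receives a numeric suffix. A hand-built naming scheme would normally choose more readable abbreviations than plain truncation.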

We have not updated the crosswalks; presumably they are not affected by the updates, since UPINs have not been issued for many years.

We understand that providers are not required to provide DEA numbers, even if assigned, so the DEA crosswalk is incomplete. There are apparently other sources of DEA crosswalks, such as Lexis-Nexis, Surescripts, and the DEA itself.

We are very interested in speaking with users of this data, especially users of the older offering. Please write or call Daniel Feenberg (feenberg@nber.org, 617-863-0343). We expect to provide a more comprehensive file once we have discussed with users their needs.

Source Data Files:

  1. Weekly .zip files as downloaded from the CMS website
  2. Monthly .zip files after conversion to .dta.
  3. Monthly .csv files before conversion to .dta.
  4. Our source code
