A housing unit in the CPS is interviewed for four consecutive months, dropped from the sample for the next eight months, and then brought back for the following four months. So, in any given month, one-eighth of the housing units are being interviewed for the first time. Once the system has been in operation for a full year, four of the eight rotation groups in any month were also in the survey in the same month one year earlier. Matching information and Stata .do files from NBER Working Paper T0247 by B. Madrian and L. J. Lefgren are available for March-to-March Annual Demographic File matches, and they can be modified for matching CPS Basic Monthly Data. Census Technical Papers such as 66 and 63 contain more information about design and methodology.
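The rotation pattern can be made concrete with a short Python sketch. It is illustrative only: the function names are our own, and the offsets are simply the four-in, eight-out, four-in schedule described above, counted in months from a unit's first interview.

```python
def interview_offsets():
    """Months after the first interview at which a unit is interviewed."""
    first_spell = list(range(0, 4))     # month-in-sample 1-4
    second_spell = list(range(12, 16))  # month-in-sample 5-8, after eight months out
    return first_spell + second_spell


def month_in_sample(months_since_entry):
    """Return the month-in-sample (1-8), or None if the unit is out of sample."""
    offsets = interview_offsets()
    if months_since_entry in offsets:
        return offsets.index(months_since_entry) + 1
    return None


offsets = interview_offsets()

# Months-in-sample whose interview 12 months earlier was also an interview month.
matched_one_year_back = [
    month_in_sample(offset) for offset in offsets if (offset - 12) in offsets
]
print(matched_one_year_back)  # [5, 6, 7, 8]
```

Under this schedule the units in months-in-sample 5 through 8 are the ones that can be matched to the same calendar month one year earlier, which is the "four of the eight rotation groups" noted above.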
All data files follow the naming convention cpsbYYYYMM, where YYYY is the four-digit year and MM is the two-digit month. Here we offer the original files and documentation from 1978 on, and Stata .dta and .csv files for 1989 on.
The file layouts are basically the same within each of the following periods: 1976-1988, 1989-1993, 1994-1997, 1998-2004, 2017-2020, and 2020+. In March 2021 the following variables had their prefixes changed from 'PE' or 'PR' to 'PT' to reflect that they are top-coded: PTIO1OCD, PTIO2OCD, PTERNHLY, PTERNWA, PTERN2, and PTERNH1C. The layout is otherwise the same. Refer to the CPS Basic Monthly Footnotes for year-specific notes.
Important announcement about changes to the 2023 CPS public-use (PUF) files: As announced in July 2023, the Census Bureau is introducing the changes summarized below starting with the January 2023 data. The changes are phased in over 16 months. In January 2023 only Month-in-Sample (MIS) 1 cases reflect the changes, and the phase-in continues until April 2024, at which time all cases on the CPS microdata files will reflect the revisions. This phase-in procedure allows users to continue to conduct longitudinal analyses without a break in series.
- Starting in January 2023, all MIS 1 cases will be assigned Household Identification Numbers (HRHHID1) using an algorithm different from the one used in the current files. As with the current files, these numbers will remain constant for as long as a household remains in sample, so users can use the same matching identification for the life of the case. HRHHID1, together with HRHHID2, will uniquely identify a household. This is for your information; data users will not have to do anything differently than before.
- All geographies with populations between 100,000 and 249,999 will go through a geographic synthesis for privacy protection. The new synthetic values preserve the level of detail and many of the underlying relationships while providing the level of protection required.
- The Census Bureau will be rounding and dynamically top-coding hourly and weekly wages. Specifically, the upper boundary of the minimum rounding was raised to $29.99 for hourly wages. The weekly rounding was also updated to better align with the hourly wage rounding rules, assuming a traditional 40-hour work week. Wage and earnings data will use a new "dynamic" top-coding approach that will be applied to a weighted average of the top 3% of values on a monthly basis. These changes will not appear until April 2023, when MIS 4 cases are first phased in.
- New flag, PRERNMIN: So that wages originally reported below the federal minimum wage can be identified even with rounding on the PUF, all records originally reported below 725 ($7.25) hourly will be flagged. PRERNMIN is set to 1 if a wage below the minimum was reported.
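
The phase-in schedule described above can be sketched in Python. This is our own reading of the announcement combined with the four-in, eight-out, four-in rotation, not Census Bureau code: a case is assumed to reflect the revisions if its first interview (MIS 1) fell in January 2023 or later. The function names are hypothetical.

```python
def entry_month(year, month, mis):
    """Calendar (year, month) of a case's first interview, given its month-in-sample."""
    if not 1 <= mis <= 8:
        raise ValueError("month-in-sample must be between 1 and 8")
    # MIS 1-4 are consecutive months; MIS 5-8 follow an eight-month gap.
    months_back = mis - 1 if mis <= 4 else mis + 7
    index = year * 12 + (month - 1) - months_back
    return index // 12, index % 12 + 1


def reflects_2023_changes(year, month, mis):
    """Assumption: a case reflects the revisions if it entered in Jan 2023 or later."""
    return entry_month(year, month, mis) >= (2023, 1)


print(reflects_2023_changes(2023, 1, 1))  # True: first cases under the new rules
print(reflects_2023_changes(2023, 1, 2))  # False: entered in December 2022
print(reflects_2023_changes(2023, 4, 4))  # True: MIS 4 cases first phased in
print(all(reflects_2023_changes(2024, 4, mis) for mis in range(1, 9)))  # True
```

Under this assumption the last cases to be phased in are the MIS 8 cases of April 2024, which matches the 16-month schedule in the announcement.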
The 1976-1988 data documentation calls every group of six characters a Word. To convert a Word-and-Character location into a plain character position, multiply the number of preceding Words by six and add the character position within the designated Word. That gives the location of the first character of the variable of interest. For example, State is in Word 3, Characters 5-6. There are 2 preceding Words, so the starting location of State is 2 * 6 + 5 = 17.
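The conversion can be written as a small Python helper. This is a minimal sketch: the function name is our own, positions are 1-based as in the documentation, and the variable is assumed to lie within a single Word.

```python
WORD_LENGTH = 6  # the 1976-1988 documentation groups characters into six-character Words


def word_char_to_columns(word, first_char, last_char=None):
    """Convert a Word/Character location to 1-based (start, end) character positions."""
    if last_char is None:
        last_char = first_char
    if word < 1 or not 1 <= first_char <= last_char <= WORD_LENGTH:
        raise ValueError("expected word >= 1 and 1 <= first_char <= last_char <= 6")
    preceding = (word - 1) * WORD_LENGTH  # characters in all preceding Words
    return preceding + first_char, preceding + last_char


# State is in Word 3, Characters 5-6.
print(word_char_to_columns(3, 5, 6))  # (17, 18)
```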
Weekly hours/earnings are not included in these files for 1976-1983. For 1976-1978, these variables are available in the May Extracts. From 1979 on, these variables are available in the Merged Outgoing Rotation Groups.
Usually, the documentation from January applies to an entire year. Exceptions are 1984-1985 and 1994-1995. The January 1984 documentation is used through to June 1985. The July 1985 documentation applies to the remainder of 1985. For 1994-1995, the January 1994 documentation is used through August 1995. The September 1995 documentation serves for the rest of the year.
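As a quick reference, the rule and its two exceptions can be expressed as a lookup. This is a sketch under the assumption that no other exceptions apply; the function name is hypothetical, and the year-specific footnotes remain the authority.

```python
def documentation_month(year, month):
    """Return the (year, month) of the documentation that covers a data month."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    period = (year, month)
    if (1984, 1) <= period <= (1985, 6):
        return (1984, 1)  # January 1984 documentation runs through June 1985
    if year == 1985:
        return (1985, 7)  # July 1985 documentation covers the rest of 1985
    if (1994, 1) <= period <= (1995, 8):
        return (1994, 1)  # January 1994 documentation runs through August 1995
    if year == 1995:
        return (1995, 9)  # September 1995 documentation covers the rest of 1995
    return (year, 1)      # usual case: January documentation applies all year


print(documentation_month(1985, 6))   # (1984, 1)
print(documentation_month(1995, 10))  # (1995, 9)
print(documentation_month(2001, 5))   # (2001, 1)
```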
All variables are numeric (mostly byte) EXCEPT identifier variables, hrsample, and hrsersuf.
Programs and Data Downloads
We provide Stata, SAS, and SPSS programs to convert the raw data files into their respective statistical package file formats. Each set of programs corresponds to a set of variable layouts and definitions for a specified time period. (The variable layouts and definitions change 21 times from 1989 to 2023.) We also provide processed versions of the raw files in Stata, CSV, SAS, and SPSS formats for ease of use.
The programs are named using the cpsbYYYYMM format, where YYYYMM is the four-digit year and two-digit month of the first data file they are used to process. For example, the do-file named cpsb201401.do is used to process raw data starting in January 2014. Similarly, the do-file named cpsb201404.do is used starting in April 2014. You will need to update certain parts of each program to reflect the month and year you are processing.
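The selection rule implied by this naming scheme (use the program with the latest start month that is not after the data month) can be sketched in Python. The list of start months below contains only the two programs named in the text; the complete list and the function name `program_for` are assumptions for illustration.

```python
from bisect import bisect_right


def program_for(data_yyyymm, program_starts):
    """Pick the do-file whose start month is the latest one not after the data month."""
    if len(data_yyyymm) != 6 or not data_yyyymm.isdigit():
        raise ValueError("data month must be a six-digit YYYYMM string")
    starts = sorted(program_starts)  # fixed-width YYYYMM strings sort chronologically
    position = bisect_right(starts, data_yyyymm)
    if position == 0:
        raise ValueError("no program in the list covers " + data_yyyymm)
    return f"cpsb{starts[position - 1]}.do"


# Illustrative list: only the two start months mentioned in the text.
starts = ["201401", "201404"]

print(program_for("201403", starts))  # cpsb201401.do
print(program_for("201404", starts))  # cpsb201404.do
```

With a complete list of start months, the same lookup would also tell you when a later layout change supersedes cpsb201404.do.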
Raw and processed data files are named similarly after the month and year of the data they contain. We provide both compressed (zip) and uncompressed versions of these files.
- Documentation
- Raw files from Census Bureau
- Stata, SAS, and SPSS programs for reading raw files
- Processed data files
Thanks to David Card at Berkeley for providing the 1989-1993
NBER internal users can access the data from a UNIX shell at /homes/data/cps-basic3
Contact data@nber.org with questions, comments, or suggestions.