The IHPI Data & Methods Hub (DMH) maintains a repository of high-value datasets that IHPI members can use to advance their research. Many of these datasets are free for IHPI members. If you have questions about any of the datasets listed below, please contact our team at firstname.lastname@example.org. We will be happy to talk with you and help determine which data resource is right for you.
IHPI Data Assets
Administrative Claims Databases
The SID includes inpatient discharge records from community hospitals in a particular state. The SID files encompass all patients, regardless of payer, providing a unique view of inpatient care in a defined market or state over time.
HCUP Nationwide Inpatient Sample (NIS—2003 through 2018)
The NIS is the largest publicly available all-payer inpatient care database in the United States, containing data on more than seven million hospital stays annually. Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, uncommon treatments, and special populations.
HCUP Nationwide Emergency Department Sample (NEDS—2006 through 2018)
The NEDS is the largest all-payer emergency department (ED) database in the United States, yielding national estimates of hospital-based ED visits. Unweighted, it contains data from approximately 31 million ED visits each year. Weighted, it estimates roughly 143 million ED visits.
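The relationship between the unweighted and weighted figures can be sketched in Python. This is a minimal illustration, not HCUP's estimation procedure: it assumes each sampled record carries a visit-level sampling weight (HCUP documentation names this variable DISCWT; confirm against the file's data dictionary), and the three toy records below are invented values, not real NEDS data.

```python
def weighted_estimate(records, weight_key="DISCWT"):
    """Sum the sampling weights to estimate the number of visits nationally."""
    return sum(record[weight_key] for record in records)


# Toy records with invented weights (assumption: not real NEDS data).
sample = [
    {"visit_id": 1, "DISCWT": 4.5},
    {"visit_id": 2, "DISCWT": 4.7},
    {"visit_id": 3, "DISCWT": 4.6},
]

unweighted_count = len(sample)                 # number of sampled records
national_estimate = weighted_estimate(sample)  # visits those records represent

# Average weight implied by the figures quoted above (both in millions of visits).
implied_mean_weight = 143 / 31

print(unweighted_count, round(national_estimate, 1), round(implied_mean_weight, 2))
```

In other words, each sampled visit stands in for roughly four to five visits nationally. Variance estimation for survey data needs the sample design variables as well, which this sketch leaves out.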
HCUP Nationwide Readmissions Database (NRD—2013 through 2018)
The NRD is a unique and powerful database designed to support various types of analyses of national readmission rates for all payers and the uninsured. This database addresses a large gap in health care data: the lack of nationally representative information on hospital readmissions for all ages.
The CMS Medicaid Analytic eXtract (MAX) data include enrollment information for all Medicaid enrollees within a state, regardless of whether the beneficiary receives all services under fee-for-service (FFS) or is enrolled in a managed care organization (MCO). All FFS utilization and any reported MCO utilization are also included in MAX files.
Data from 55 million enrollees, including enrollment data for all members and inpatient/nursing facility data, as well as Part D prescription-related data for a large sample across multiple years. Data include:
- 100% sample—Medicare Provider Analysis and Review (MedPAR—Inpatient and Skilled Nursing Facility claims)
- 100% sample—Master Beneficiary Summary File (MBSF—Base enrollment)
- 20% sample—Carrier File (Professional claims)
- 20% sample—Outpatient File
- 20% sample—Home Health Agency
- 20% sample—Hospice
- 20% sample—Part D Event and Drug Characteristics files
- Cohorts for various conditions and procedures
Commercial and Medicare Advantage data from ~83 million insured individuals, including inpatient, outpatient, prescription, geographic and socioeconomic details, and month of death for patients.
- Month of Death View - Dictionary
- Socioeconomic Status View - Dictionary
- Zip Code View - Dictionary
- Optum User Manual
- SES Demographics
Commercial and Medicare Advantage data from ~140 million employees and dependents covered by the health benefit programs of large employers. These claims data are collected from several hundred insurance providers, Blue Cross Blue Shield plans, and third-party administrators.
Contains the pooled healthcare experience of Medicaid enrollees from multiple states. It includes inpatient services and prescription drug claims, as well as information on enrollment, long-term care, and other medical care.
A collection of key metrics from a cross-sectional study of 6,500 U.S. hospitals including bed size, physician arrangements, IT indicators, community health partnerships, etc. The data represents the most credible, consistent and comprehensive data provided by nearly 6,300 hospitals and more than 400 health care systems.
This dataset includes current and historical records for more than 1.4 million physicians, residents, and medical students in the United States. This figure includes approximately 449,000 graduates of foreign medical schools who reside in the United States and who have met the educational and credentialing requirements necessary for recognition.
Torch is a robust market-centric analytics solution that provides the most comprehensive and accurate data on the unique attributes of ACOs, bundled payments, hospitals, physician groups, insurance carriers, and more.
The Data & Methods Hub staff have also worked with internal Michigan Medicine EHR data and can help you determine whether medical record data is the correct choice for your project.
Data sources not managed by IHPI
The following are public use datasets that are relevant for public health and policy research. IHPI members can click on the links below to learn more about individual data sources, or contact the Data & Methods Hub (email@example.com) about accessing our data resources and services.
- Health and Retirement Study: https://hrs.isr.umich.edu/data-products
- Behavioral Risk Factor Surveillance System: https://www.cdc.gov/brfss/index.html
- Hospital Compare: https://www.nber.org/data/cms-hospital-compare-data.html
- Physician Compare: https://data.medicare.gov/data/physician-compare
- CMS Impact File: https://www.nber.org/data/cms-impact-file-hospital-inpatient-prospective-payment-system-ipps.html
- CMS Regional Variation public use file: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Geographic-Variation/GV_PUF.html
- Other health data from NBER: https://www.nber.org/data/
- Data available through ICPSR: https://www.icpsr.umich.edu/index.html
- National health expenditure accounts: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/index.html
- CDC queryable health statistics: https://wonder.cdc.gov/
- HCUP queryable health statistics: https://hcupnet.ahrq.gov/
- OECD health statistics: https://stats.oecd.org/Index.aspx?ThemeTreeId=9
- Human Mortality Database: https://www.mortality.org/
- National Vital Statistics System mortality data: https://www.cdc.gov/nchs/nvss/deaths.htm
How do the administrative claims databases differ?
Choosing the right data source can be difficult, especially when several options offer similar types of data. This is particularly true for administrative claims, where the variables offered often overlap across databases. The table below compares three of our most popular datasets: Medicare, Optum, and Truven MarketScan. Note that Optum provides three separate "views" of the data (SES, DOD, ZIP). Each view includes some variables and omits others in order to maintain patient anonymity. Differences between the views are noted in the table. Please remember that we are always available to meet with you to discuss any of our available datasets.
| Data Characteristic | Medicare FFS (Public Insurance) | OptumInsight (Private Insurance) | Truven MarketScan (Private Insurance) |
| --- | --- | --- | --- |
| Data years | 2003 through 2016 | 2001 through 2018 | 2009 through 2018 |
| Number of covered lives | 55M | 83M | 140M |
| Date of Birth | ✔ | Birth Year | Birth Year |
| Ages Covered | 65+ or disabled | 0-90 | 0-100+ |
| Race | ✔ | ✔ (SES, DOD) | - |
| Additional Socioeconomic Info | - | ✔ (SES) | - |
| Death Date | ✔ | ✔ (DOD, month and year only) | - |
| Cause of Death | ✔ (through 2016 only) | - | - |
| Census Region | ✔ | (SES, DOD) | ✔ |
| ZIP Code | ✔ | (ZIP - 5 digit) | ✔ (only 3 digit, 2009-2010) |
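
As a rough aid for reading the comparison table above, the sketch below encodes a few of its rows in Python and lists the datasets that cover a study period and offer every required characteristic. The dictionary keys and the `candidates` helper are hypothetical names chosen for this illustration. The Optum ZIP Code cell, which names a view without a check mark, is read here as "available in that view", which is an assumption.

```python
# Availability flags transcribed from the comparison table.
# Field names are hypothetical labels chosen for this sketch.
DATASETS = {
    "Medicare FFS": {
        "years": (2003, 2016),
        "race": True,
        "ses_info": False,
        "death_date": True,
        "cause_of_death": True,   # through 2016 only
        "zip_code": True,
    },
    "OptumInsight": {
        "years": (2001, 2018),
        "race": True,             # SES and DOD views only
        "ses_info": True,         # SES view only
        "death_date": True,       # DOD view, month and year only
        "cause_of_death": False,
        "zip_code": True,         # ZIP view, 5 digit (assumed available)
    },
    "Truven MarketScan": {
        "years": (2009, 2018),
        "race": False,
        "ses_info": False,
        "death_date": False,
        "cause_of_death": False,
        "zip_code": True,         # 3 digit only, 2009-2010
    },
}


def candidates(required, first_year, last_year):
    """Return datasets spanning the study years that offer every required field."""
    matches = []
    for name, info in DATASETS.items():
        start, end = info["years"]
        covers_years = start <= first_year and last_year <= end
        if covers_years and all(info[field] for field in required):
            matches.append(name)
    return matches


print(candidates(["race", "death_date"], 2010, 2015))
```

The boolean flags flatten the view-level caveats. For Optum, for example, race and 5-digit ZIP code sit in different views, so a match here is only a starting point for a conversation with the Data & Methods Hub, not a guarantee that the variables can be used together.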