The IHPI Data & Methods Hub (DMH) maintains a repository of high-value datasets that IHPI members can use to advance their research. Many of these datasets are available free of charge to IHPI members. If you have questions about any of the datasets listed below, please contact our team at email@example.com. We will be happy to talk with you and help determine which data resource is right for you.
IHPI would like to thank the following center and program partners, whose financial contributions help make our data available to the U-M research community:
- Ann Arbor VA Center for Clinical Management Research
- Center for Healthcare Outcomes & Policy
- Center for Evaluating Health Reform
- Center for Eye Policy & Innovation
- Dow Division for Urologic Health Services Research
- Kidney Epidemiology and Cost Center
- Michigan Medicine Department of Neurology
- Michigan Opioid Prescribing Engagement Network
- School of Public Health Department of Health Management and Policy
- Susan B. Meister Child Health Evaluation and Research Center
IHPI Data Assets
Administrative Claims Databases
The State Inpatient Databases (SID) include inpatient discharge records from community hospitals in a given state. Because the SID files cover all patients regardless of payer, they provide a unique view of inpatient care in a defined market or state over time.
The National Inpatient Sample (NIS) is the largest publicly available all-payer inpatient care database in the United States, containing data on more than seven million hospital stays annually. Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, uncommon treatments, and special populations.
The Nationwide Emergency Department Sample (NEDS) is the largest all-payer emergency department (ED) database in the United States and yields national estimates of hospital-based ED visits. Unweighted, it contains data from approximately 31 million ED visits each year; weighted, it estimates roughly 143 million ED visits.
The Nationwide Readmissions Database (NRD) is designed to support a variety of analyses of national readmission rates for all payers and the uninsured. It addresses a large gap in health care data: the lack of nationally representative information on hospital readmissions for all ages.
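The gap between the unweighted and weighted NEDS counts comes from sampling weights: each sampled visit carries a weight, and summing the weights (rather than counting records) produces the national estimate. The sketch below is a minimal illustration, assuming a per-record weight field called `discwt`, which is a hypothetical name loosely modeled on HCUP's discharge-weight variable. Check the NEDS documentation for the actual variable name, and note that standard errors also require the design variables (strata and clusters), not just the weights.

```python
from dataclasses import dataclass


@dataclass
class EDVisit:
    visit_id: str
    discwt: float  # hypothetical per-visit sampling weight (assumed field name)


def unweighted_count(visits):
    # Number of sampled records actually present in the file.
    return len(visits)


def weighted_estimate(visits):
    # Each record stands in for `discwt` visits nationally, so the sum of
    # weights is the national estimate of total ED visits.
    return sum(v.discwt for v in visits)


sample = [EDVisit("a", 4.5), EDVisit("b", 4.7), EDVisit("c", 4.6)]
print(unweighted_count(sample))             # 3
print(round(weighted_estimate(sample), 1))  # 13.8
```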
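To make "readmission rate" concrete, here is a simplified sketch of a 30-day readmission rate: group stays by patient, order them by admission, and count an index stay as readmitted if the same patient's next admission begins within 30 days of discharge. The field names (`patient_link`, `admit_day`, `discharge_day`) are hypothetical; the NRD supplies its own patient-linkage and timing variables, and published readmission measures apply exclusion rules that this sketch omits (here every stay counts as an index stay).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stay:
    patient_link: str   # hypothetical de-identified patient linkage key
    admit_day: int      # hypothetical day counter for the admission
    discharge_day: int  # hypothetical day counter for the discharge


def readmission_rate(stays, window_days=30):
    """Share of stays followed by another admission within `window_days` of discharge."""
    by_patient = defaultdict(list)
    for s in stays:
        by_patient[s.patient_link].append(s)

    index_stays = 0
    readmitted = 0
    for patient_stays in by_patient.values():
        patient_stays.sort(key=lambda s: s.admit_day)
        for i, s in enumerate(patient_stays):
            index_stays += 1
            # Compare against the same patient's next admission, if there is one.
            if i + 1 < len(patient_stays):
                gap = patient_stays[i + 1].admit_day - s.discharge_day
                if 0 <= gap <= window_days:
                    readmitted += 1
    return readmitted / index_stays if index_stays else 0.0


stays = [Stay("p1", 0, 5), Stay("p1", 20, 25), Stay("p2", 10, 12)]
print(readmission_rate(stays))  # one of three stays is followed by a readmission
```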
The CMS Medicaid Analytic eXtract (MAX) data include enrollment information for all Medicaid enrollees within a state, regardless of whether the beneficiary receives all services under fee for service (FFS) or is enrolled in a managed care organization (MCO). All FFS utilization and any reported MCO utilization are also included in the MAX files.
Medicare FFS data cover 55 million enrollees, including enrollment data for all members, inpatient and nursing facility claims, and Part D prescription data for a large sample across multiple years. The data include:
- 100% sample: Medicare Provider Analysis and Review (MedPAR; inpatient and skilled nursing facility claims)
- 100% sample: Master Beneficiary Summary File (MBSF; base enrollment)
- 20% sample: Carrier File (professional claims)
- 20% sample: Outpatient File
- 20% sample: Home Health Agency File
- 20% sample: Hospice File
- 20% sample: Part D Event and Drug Characteristics files
- Cohorts for various conditions and procedures
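Counts drawn from the 20% files describe the sample, not all Medicare FFS beneficiaries. A common rough approach is to divide a sample count by the sampling fraction, which for a 20% sample means multiplying by 5. The sketch below assumes the sample behaves like a simple random sample of beneficiaries; `scale_to_population` is a hypothetical helper, and formal estimates should also account for sampling error.

```python
def scale_to_population(sample_count: int, sample_percent: int = 20) -> float:
    """Approximate a population total from a count observed in a percent sample."""
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in (0, 100]")
    # Dividing by the sampling fraction (20% -> x5) inflates the count
    # to an approximate full-population total.
    return sample_count * 100 / sample_percent


print(scale_to_population(1200))       # 6000.0
print(scale_to_population(1200, 100))  # 1200.0 (100% files need no scaling)
```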
OptumInsight provides commercial and Medicare Advantage data on about 83 million insured individuals, including inpatient, outpatient, and prescription claims; geographic and socioeconomic details; and month of death.
- Month of Death View - Dictionary
- Socioeconomic Status View - Dictionary
- Zip Code View - Dictionary
- Optum User Manual
- SES Demographics
Truven MarketScan provides commercial and Medicare Advantage data on about 140 million employees and dependents covered by the health benefit programs of large employers. These claims are collected from several hundred insurance providers, Blue Cross Blue Shield plans, and third-party administrators.
This database contains the pooled health care experience of Medicaid enrollees from multiple states. It includes inpatient services and prescription drug claims, as well as information on enrollment, long-term care, and other medical care.
A collection of key metrics from a cross-sectional study of 6,500 U.S. hospitals, including bed size, physician arrangements, IT indicators, and community health partnerships. The data represent the most credible, consistent, and comprehensive information provided by nearly 6,300 hospitals and more than 400 health care systems.
The Data & Methods Hub staff have also worked with internal Michigan Medicine EHR data and can help you determine whether medical record data are the right choice for your project.
Data sources not managed by IHPI
The following are public use datasets that are relevant for public health and policy research. Click on the links below to learn more about these valuable data sources.
- Health and Retirement Study: https://hrs.isr.umich.edu/data-products
- Behavioral Risk Factor Surveillance System: https://www.cdc.gov/brfss/index.html
- Hospital Compare: https://www.nber.org/data/cms-hospital-compare-data.html
- Physician Compare: https://data.medicare.gov/data/physician-compare
- CMS Impact File: https://www.nber.org/data/cms-impact-file-hospital-inpatient-prospective-payment-system-ipps.html
- CMS Regional Variation public use file: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Geographic-Variation/GV_PUF.html
- Other health data from NBER: https://www.nber.org/data/
- Data available through ICPSR: https://www.icpsr.umich.edu/index.html
- National Health Expenditure Accounts: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/index.html
- CDC queryable health statistics: https://wonder.cdc.gov/
- HCUP queryable health statistics: https://hcupnet.ahrq.gov/
- OECD Health Statistics: https://stats.oecd.org/Index.aspx?ThemeTreeId=9
- Human Mortality Database: https://www.mortality.org/
- National Vital Statistics System mortality data: https://www.cdc.gov/nchs/nvss/deaths.htm
How do the administrative claims databases differ?
Choosing the right data source can be difficult, especially when several options offer similar types of data. This is particularly true for administrative claims, where the variables offered often overlap across databases. The table below compares three of our most popular datasets: Medicare, Optum, and Truven MarketScan. Note that Optum has three separate "views" of its data: SES (socioeconomic status), DOD (date of death), and ZIP (ZIP code). Each view includes some variables and omits others in order to protect patient anonymity; the table notes where the views differ. We are always available to meet with you to discuss any of our datasets.
| Data characteristic | Medicare FFS (public insurance) | OptumInsight (private insurance) | Truven MarketScan (private insurance) |
|---|---|---|---|
| Number of covered lives | 55M | 83M | 140M |
| Date of birth | ✔ | Birth year only | Birth year only |
| Ages covered | 65+ or disabled | 0-90 | 0-100+ |
| Race | ✔ | ✔ (SES, DOD views) | Not available |
| Additional socioeconomic info | Not available | ✔ (SES view) | Not available |
| Death date | ✔ | ✔ (DOD view; month and year only) | Not available |
| Cause of death | ✔ (through 2016 only) | Not available | Not available |
| Census region | ✔ | ✔ (SES, DOD views) | ✔ |
| ZIP code | ✔ | ✔ (ZIP view; 5-digit) | ✔ (3-digit only, 2009-2010) |
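Because each Optum view trades some variables for others, it can help to check up front which view, if any, covers every variable a project needs. The sketch below encodes only the view-specific rows of the comparison table (race, additional socioeconomic info, death date, census region, 5-digit ZIP); the dictionary keys and the `views_covering` helper are hypothetical, and the real views contain many more fields, so confirm against the Optum data dictionaries before choosing.

```python
# View-specific variables per Optum view, as summarized in the comparison table.
# Field names are hypothetical labels, not actual Optum column names.
OPTUM_VIEWS = {
    "SES": {"race", "socioeconomic", "census_region"},
    "DOD": {"race", "death_month_year", "census_region"},
    "ZIP": {"zip5"},
}


def views_covering(required):
    """Return the views (sorted by name) that include every required variable."""
    required = set(required)
    return sorted(view for view, fields in OPTUM_VIEWS.items() if required <= fields)


print(views_covering({"race", "census_region"}))  # ['DOD', 'SES']
print(views_covering({"race", "zip5"}))           # [] (no single view has both)
```

An empty result signals that the project's variable needs span more than one view, which is a good point to consult the DMH team.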