Lesson 22 – How to use CMS data for medical marketing

Not many healthcare marketers can afford to buy healthcare data from the big guys like definitivehc.com, IQVIA etc. There’s no substitute for the clean, organized data that they have (of course, they have spent millions on preparing data that way).

The is part of a series of lessons in our medical marketing course. For the entire course, please check out our medical marketing lessons.

Not many healthcare marketers can afford to buy healthcare data from the big guys like definitivehc.com, IQVIA etc. There’s no substitute for the clean, organized data that they have (of course, they have spent millions on preparing data that way). However, you can “sort of” get by with CMS NPI database (available for download). Here are the steps that healthcare marketers in USA can use to get a workable database. Some of these a wee bit technically advanced and you might need a healthcare IT person to help out for a little bit.

  1. Download the NPI database (NPPES Downloadable File)

    Go to CMS data dissemination website here. Find this link to download the data (NPI_Files).
    Keep in mind that the data is large. Moreover, when you expand the zip file, it will become even larger, so be prepared for that.

    Once you download and extract the zip file, you will have these files (keep in mind that this can change at any point)

    npidata_pfile_xxxx.csv
    npidata_pfile_xxxx_FileHeader.csv
    pl_pfile_xxxx.csv
    pl_pfile_xxxx_FileHeader.csv
    endpoint_pfile_xxxx_FileHeader.csv
    endpoint_pfile_xxxx.csv
    othername_pfile_xxxx_FileHeader
    othername_pfile_xxxx.csv

    One thing to keep in mind before proceeding is that NPI database is not really that great. A lot of the information is old and has not been updated in years (even though CMS asks providers to keep their information up to date). In addition to that, you will notice that providers are not the best at clean, error free data entry either.

    Deal with it – this is one of the best shots you have got (there’s another way, of course, wherein you can get even better information.. from payers’ member directories, but a tad more technically advanced).

  2. How to import this NPI data into a database


    This is a bit of a tough one. For this, we would recommend that you download the free MySQL database. After this, create a schema e.g.

    CREATE DATABASE npi-20050523-20200607 /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */;

    Once you do that, just copy paste the following SQL statements to create the tables (otherwise, it is a bit of a nightmare). Create the npidata table like this (Download the txt file from here and then save it with a .sql extension)

    Now that you have created the table, then, create the Other Name Reference table like this (this table contains the “other names” that a practice does business as). Download the file from here

    Next, create the practice locations table (this gives you all the practice locations of a company that’s not listed as the main location. Download from here.


    Finally, create the endpoint (email address etc) table. Download the file from here.

  3. Load the data into each one of these tables like this (change it to the file names you have)


    LOAD DATA INFILE 'NPPES_Data_Dissemination_June_2020/pl_pfile_20050523-20200607.csv' INTO TABLE pl FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;

    LOAD DATA LOCAL INFILE 'NPPES_Data_Dissemination_June_2020/endpoint_pfile_20050523-20200607.csv' INTO TABLE endpoint_pfile FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;

    LOAD DATA LOCAL INFILE 'NPPES_Data_Dissemination_June_2020/othername_pfile_20050523-20200607.csv' INTO TABLE othername FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;

    LOAD DATA LOCAL INFILE 'NPPES_Data_Dissemination_June_2020/npidata_pfile_20050523-20200607.csv' INTO TABLE npidata FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;

  4. How to find providers to market to

    First, read this file NPPES_Data_Dissemination_Readme.pdf .. This gives you a LOT of information and helps you understand how to search the data you just imported.

    Understand that if you want to market to providers directly, you will need to search for Entity_Type_Code = 1. Keep in mind that while there will be a phone number, the fax number might not be present. Also, you need to look at when the provider last updated their own CMS data. Based on when the provider last updated their data, what you are looking at, might be quite old (still does not mean that it is not valid).

    Let’s say you want to find all the providers in Queens, NY. You could very well find out all the providers in NYS first like this

    select * from npidata where<br>Provider_Business_Practice_Location_Address_State_Name like 'NY'<br>and Entity_Type_Code = 1<br>order by Provider_Business_Practice_Location_Address_City_Name;

    Let’s see how many providers practice in NYS first

    select count(*) from npidata where Provider_Business_Practice_Location_Address_State_Name like 'NY' and Entity_Type_Code = 1;

    You will find that count = 415,960

    After this, you can very easily download all the results as a CSV file and work further on this.
    Here’s what you would get if you were to download this as a CSV file. You can import this into google sheets and work on it further for your marketing goals.

  5. How to find providers of a certain specialist to market to?

    For this, you are going to need a little bit more data – on Taxonomy codes. Head over to this page from NUCC and download the latest CSV. After that, you can import this CSV straight away into your database OR, you can simply import this into Google Sheets (or open it in Microsoft excel).

    Click here for the taxonomy spreadsheet from NUCC. nucc_taxonomy_201

    You are going to need to filter the data a little bit for the specialist you are looking for. E.g let’s say that you are looking to market to all dental providers, you can filter and use these codes


    125K00000X – Dental Providers, Advanced Practice Dental Therapist
    126800000X – Dental Providers, Dental Assistant
    124Q00000X – Dental Providers, Dental Hygienist
    292200000X – Laboratories, Dental Laboratory
    126900000X- Dental Providers, Dental Laboratory Technician
    125J00000X – Dental Providers, Dental Therapist
    122300000X – Dental Providers, Dentist
    1223D0001X – Dental Providers, Dentist, Dental Public Health
    1223D0004X – Dental Providers, Dentist, Dentist Anesthesiologist
    1223E0200X – Dental Providers, Dentist, Endodontics
    1223G0001X – Dental Providers, Dentist, General Practice
    1223P0106X – Dental Providers, Dentist, Oral and Maxillofacial Pathology
    1223P0221X – Dental Providers, Dentist, Pediatric Dentistry
    1223P0300X – Dental Providers, Dentist, Periodontics
    1223P0700X – Dental Providers, Dentist, Prosthodontics
    1223S0112X – Dental Providers, Dentist, Oral and Maxillofacial Surgery
    1223X0008X – Dental Providers, Dentist, Oral and Maxillofacial Radiology
    1223X0400X – Dental Providers, Dentist, Orthodontics and Dentofacial Orthopedics
    1223X2210X – Dental Providers, Dentist, Orofacial Pain
    122400000X – Dental Providers, Denturist

  6. Filter specialists you want to market to using taxonomy code

    Now that you have the taxonomy codes, you can filter your provider data based on that taxonomy code. That’s how you would create a list of all providers serving a specialty to market to. Here’s a sample SQL to run:

    select * from npidata where
    Provider_Business_Practice_Location_Address_State_Name like ‘NY’
    and Entity_Type_Code = 1
    and Healthcare_Provider_Taxonomy_Code_1 in (‘125K00000X’, ‘126800000X’, ‘124Q00000X’, ‘292200000X’, ‘126900000X’, ‘125J00000X’, ‘122300000X’, ‘1223D0001X’, ‘1223D0004X’, ‘1223E0200X’, ‘1223G0001X’, ‘1223P0106X’, ‘1223P0221X’, ‘1223P0300X’, ‘1223P0700X’, ‘1223S0112X’, ‘1223X0008X’, ‘1223X0400X’, ‘1223X2210X’,’122400000X’)
    order by Provider_Business_Practice_Location_Address_City_Name
    ;

  7. How to find practices/clinics to market to (not providers)

    If you are marketing to hospitals, you will need Entity_Type_Code=2.

    So, your first step is to get the taxonomy codes from your taxonomy table. e.g
    Code, Grouping, Classification, Specialization
    273100000X, Hospital Units, Epilepsy Unit
    275N00000X, Hospital Units, Medicare Defined Swing Bed Unit
    273R00000X, Hospital Units, Psychiatric Unit
    273Y00000X, Hospital Units, Rehabilitation Unit
    276400000X, Hospital Units, Rehabilitation, Substance Use Disorder Unit
    287300000X, Hospitals, Christian Science Sanitorium
    281P00000X, Hospitals, Chronic Disease Hospital
    281PC2000X, Hospitals, Chronic Disease Hospital, Children
    282N00000X, Hospitals, General Acute Care Hospital
    282NC0060X, Hospitals, General Acute Care Hospital, Critical Access
    282NC2000X, Hospitals, General Acute Care Hospital, Children
    282NR1301X, Hospitals, General Acute Care Hospital, Rural
    282NW0100X, Hospitals, General Acute Care Hospital, Women
    282E00000X, Hospitals, Long Term Care Hospital
    286500000X, Hospitals, Military Hospital
    2865C1500X, Hospitals, Military Hospital, Community Health
    2865M2000X, Hospitals, Military Hospital, Military General Acute Care Hospital
    2865X1600X, Hospitals, Military Hospital, Military General Acute Care Hospital. Operational (Transportable)
    283Q00000X, Hospitals, Psychiatric Hospital
    283X00000X, Hospitals, Rehabilitation Hospital
    283XC2000X, Hospitals, Rehabilitation Hospital, Children
    282J00000X, Hospitals, Religious Nonmedical Health Care Institution
    284300000X, Hospitals, Special Hospital

    Now, just run the query

    select * from npidata where Entity_Type_Code = 2 and Provider_Business_Practice_Location_Address_State_Name like 'NY' and Healthcare_Provider_Taxonomy_Code_1 in ('273100000X', '275N00000X', '273R00000X', '273Y00000X', '276400000X', '287300000X', '281P00000X', '281PC2000X', '282N00000X', '282NC0060X', '282NC2000X', '282NR1301X', '282NW0100X', '282E00000X', '286500000X', '2865C1500X', '2865M2000X', '2865X1600X', '283Q00000X', '283X00000X', '283XC2000X', '282J00000X', '284300000X' );

    Hopefully this helps you out.

  8. Rinse repeat clean your data daily

    Keep in mind that CMS data is notoriously bad. These steps get you started. Your mileage may vary. If you cannot afford to buy your data, keep on conducting your outreach and keep on cleaning your data daily.


Learn more