Physician liaisons are tasked with introducing their employer practice to neighboring physicians. They are measured on how many new patient referrals the practice is getting each week/month/quarter. But, most of them are not equipped with the right tools to be able to do their job.
Of course, if you use our physician healthcare CRM, this task would be easy to do. However, you don’t necessarily need to, if you have your own IT team. Following are the steps you can take.
Table of contents
- Step 1 – Download and extract NPI data
- Step 2 – Add these files to AWS S3
- Step 3 – Add a crawler using AWS Glue
- Step 4 – Add a table using AWS Athena
- Step 5 – Query your NPI table using AWS Athena
- How to find all doctors in the USA
- How to find all healthcare businesses in the USA
- How to find referring practices near your practice locations
- How to find practices that are associated with hospitals or owned by other practices
You are going to consolidate quite a few sources of information (publicly available from CMS). The first one is the NPI database, available from CMS Data dissemination page. Note that this data (zipped) contains all the practitioner information across the USA.
As per CMS website, The zipped NPPES Downloadable file will include the following 3 reference files:
- Other Name Reference File – this file contains additional Other Names associated with Type 2 NPIs
- Practice Location Reference File – this file contains all of the non-primary Practice Locations associated with Type 1 and Type 2 NPIs
- Endpoint Reference File – this file contains all Endpoints associated with Type 1 and Type 2 NPIs.
This is in addition to the “Full Replacement Monthly NPI File”. In addition to this, you can also get the “Weekly Incremental NPI Files”. These files are delivered in CSV format and you can download them, then extract them to your system (WARNING – these are pretty large files).
We recommend taking one of these two approaches of loading up and exploring this data.
We have already explained how to process this data using Amazon RDS in this blog about “how to use CMS data for medical marketing”. The following information is for Amazon Athena.
Step 1 – Download and extract NPI data
You will get the necessary files from NPI files link here. Extract each file. They should all be CSV files once you extract them (that’s how CMS distributes them).
Step 2 – Add these files to AWS S3
AWS S3 allows you to store enormous amounts of data at dirt cheap prices. We use it quite heavily (encrypted, fully secured, of course). This data set that you are downloading, however, is public data and you don’t have to worry about encryption and HIPAA related headaches as much.
Create folders in S3 to hold your NPI data. Make sure that you have one directory per large file (it truly does make it easier to manage). E.g. see below
Step 3 – Add a crawler using AWS Glue
Read up about AWS Glue on their website. “AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all of the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.”
We use AWS Glue for “Automatic schema discovery”. This allows us to not waste time in loading, extracting, transforming the large healthcare data files that we typically have to work with.
Initially, you will run it “On demand” because you want to test how things are working. Go ahead and do so, then, you can run this crawler on a schedule (matching the schedule where you download the NPI data files – weekly and/or monthly).
Step 4 – Add a table using AWS Athena
Read up about AWS Athena on their website. “Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL.”
That’s pretty much exactly what you are going to do anyway. You have uploaded the file to Amazon S3 and now you are going to query it using Amazon Athena.
Notice that once AWS Glue crawler ran, it created a table for you already. You save a lot of money when you use Athena’s Parquet format (columnar vs flat). Go ahead and use the Athena’s CREATE TABLE AS (CTAS) query to create a parquet table for you. Here’s the syntax (from documentation)
CREATE TABLE new_table WITH ( format = 'Parquet', parquet_compression = 'SNAPPY') AS SELECT * FROM old_table;
Step 5 – Query your NPI table using AWS Athena
You can run a quick test (maybe for your own doctor) using Athena query console. E.g. let’s say you want to market to pediatrics providers
select np.NPI , Healthcare_Provider_Taxonomy_Code_1 as code , t.Classification , COALESCE(NULLIF(np.Provider_Other_Organization_Name, ''), np.Provider_Organization_Name_Legal_Business_Name) AS Organization_Name , Provider_Business_Practice_Location_Address_State_Name as state , Provider_Business_Practice_Location_Address_Telephone_Number as phone , Provider_Business_Practice_Location_Address_Fax_Number as fax , Authorized_Official_Last_Name AS ao_lname , Authorized_Official_First_Name AS ao_fname , Authorized_Official_Title_or_Position as AO_Title , Authorized_Official_Telephone_Number as ao_phone from npi_data_ctas_parquet np left join taxonomies t on np.Healthcare_Provider_Taxonomy_Code_1 = t.Code where np.Provider_Business_Practice_Location_Address_State_Name = 'NY' AND (Healthcare_Provider_Taxonomy_Code_1 = '208000000X' OR Healthcare_Provider_Taxonomy_Code_1 = '2080A0000X') limit 100;
You will get results back like thus
Which, you can save as a spreadsheet if you’d like.
How to find all doctors in the USA
Keep in mind that CMS data is VERY dirty.
However, there are several ways of enhancing the CMS data that you have uploaded using S3 and Athena.
Now that you have the table ready to go, you can make your life a little easier with Amazon Athena Views.
So, you can filter out all the medical doctors (i.e. individuals) from the practices / organizations with something as simple as this.
So, that gives you a view with only the doctors in the USA. You can query it as per your liking.
Again, feel free to download the information as a CSV for your healthcare CRM or wherever else you are using it.
If you use our physician healthcare CRM, this data is available for you to use.
How to find all healthcare businesses in the USA
So, you can filter out all the practices / organizations from the medical doctors (i.e. individuals) from the same dataset with something as simple as this.
Create a view and then query it however you want to.
You can filter out however you want to.
In all probability, you are not getting referrals from the dialysis centers 🙂 Just kidding.
Let’s narrow it down to all all referring providers that are relevant to you.
How to find referring practices near your practice locations
Let’s see what you get in the Bronx (as an example) by running a query like this below.
How about we narrow this list down to zip codes of your practice locations? What are the zip codes of your practice locations? Pick one or more of your zip codes.
Then, let’s run this query (example zip codes used here = 10451)
Great, you get some pretty good results (see below)
If you use our physician healthcare CRM, this data is available for you to use.
OK, so that’s interesting. Now, how about we get more specific about which kinds of providers we want to market to? Let’s run a query to find practices and their specialties, then limit them to containing “Medicine”. Keep in mind that you could also have queried using the taxonomy codes directly as well (much more precise).
How to find practices that are associated with hospitals or owned by other practices
You and your physician liaisons know very well that in some cases, all decisions are made by the “corporate locations” and sometimes decisions are decentralized and made by the “sites”. It is a good idea to have an idea about the organization structure before you market to a practice.
For this, let’s look at the following query – specifically, parent_organization_lbn
At least now you can decide whether you want to market top down or bottom up. If you use our physician healthcare CRM, this data is available for you to use.
Let’s look at the independent practices around 10451 zip code
Interesting – didn’t expect so many of these practices to be independent.
Let’s find all practices that are owned by the big guns (If you use our physician healthcare CRM, this data is available for you to use)
Here’s what the output looks like.
Another way of looking at this same data would be to run this query