By Shashank Shekhar

How deep learning can help us understand physician specialties from billions of insurance claims

Finding the right care is complicated, especially when you’re looking for doctors with unique expertise. For example, imagine searching for a pediatric cardiologist: should you look among pediatricians or among cardiologists?

At Amino, we care about helping people find the right doctor. That's why we took on an exploratory project: representing physician data in a way that reveals how specialties relate to each other and lets us assign one or more specialties to each physician.

We found that a deep learning technique helped us solve this problem effectively.

The problem

Specialty labels are used as a simple way to summarize a physician’s expertise—most doctors have a primary specialty and some have additional secondary specialties.

When searching for physicians, people typically look within specialty groups. The problem with constraining the search to a single label is that people can miss the relationships between specialties. This is especially important in understanding the experience of physicians with multiple specialties, or physicians with a very broad primary specialty (such as internal medicine).

Figure 1. Physicians can have experience across multiple specialities

An alternative to describing each physician by a single specialty label is to think of physicians as having degrees of multiple specialties. This makes it possible to infer how similar a physician is to each specialty group.

Mathematically, this suggests representing each physician's entire practice as a point in a continuous space, where physicians with similar expertise lie close to each other and specialties correspond to regions of the space that may sometimes overlap.

Amino’s secret sauce

Amino has billions of health insurance claims across 928,000 physicians. Each claim records the procedures performed, the diagnoses made, and information about the patient. Claims data is extremely high resolution and standardized across physicians, so we can group claims by physician to build one vector per physician.

One way to build a vector from all of a physician's claims is to count how often each claim feature (such as a procedure or diagnosis code) occurs and store that count in the vector dimension assigned to the feature. The result is a single vector that represents all of the physician's claims.
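A minimal sketch of this counting scheme follows. It assumes each claim arrives as a (physician ID, list of feature codes) pair; that input schema, the "CPT:"/"ICD:" prefixes, and the physician IDs are hypothetical illustrations, not Amino's actual format. Storing only the non-zero counts in a dict keeps each vector small.

```python
from collections import Counter, defaultdict


def build_physician_vectors(claims):
    """Group claims by physician and count how often each feature occurs.

    claims: iterable of (physician_id, feature_codes) pairs (hypothetical schema).
    Returns (feature_index, vectors): feature_index maps each code to a vector
    dimension; vectors maps each physician to a sparse {dimension: count} dict.
    """
    feature_index = {}
    counts = defaultdict(Counter)
    for physician_id, codes in claims:
        for code in codes:
            # The first time a code is seen, it gets the next free dimension.
            dim = feature_index.setdefault(code, len(feature_index))
            counts[physician_id][dim] += 1
    vectors = {pid: dict(c) for pid, c in counts.items()}
    return feature_index, vectors


def to_dense(sparse_vector, n_dims):
    """Expand a {dimension: count} vector into a plain list of length n_dims."""
    dense = [0] * n_dims
    for dim, count in sparse_vector.items():
        dense[dim] = count
    return dense


# Usage with made-up claims:
claims = [
    ("dr_a", ["CPT:84443", "ICD:E03.9"]),
    ("dr_a", ["CPT:84443"]),
    ("dr_b", ["CPT:93000"]),
]
feature_index, vectors = build_physician_vectors(claims)
# feature_index == {"CPT:84443": 0, "ICD:E03.9": 1, "CPT:93000": 2}
# vectors == {"dr_a": {0: 2, 1: 1}, "dr_b": {2: 1}}
```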

If we can connect these vectors to one or more specialty labels, we can find correlations between specialties.

Figure 2. Physician experience can be represented as vectors in a high dimensional space

Unfortunately, this simple representation has too high a resolution: with 130,657 dimensions, it is impractical as input to a classifier that maps a vector to a specialty.

Fortunately, the vectors are sparse: on average, 99.98% of their entries are zero.
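To put that sparsity in perspective: sparsity is the fraction of zero entries, and averaging is linear, so the average number of non-zero entries per vector follows directly from the two figures above.

```latex
\bar{n}_{\text{nonzero}} = 130{,}657 \times (1 - 0.9998) = 130{,}657 \times 0.0002 \approx 26.1
```

So, on average, a physician vector has only about 26 non-zero features out of 130,657, which is why storing just the non-zero entries is far cheaper than storing dense arrays.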

The solution

One promising approach to embedding high-dimensional vectors, which has already been applied to patient records that include clinical notes, is a neural network model called a Stacked Denoising Autoencoder (SDA) [1]. An SDA learns lower-dimensional embeddings of its input vectors.

Figure 3. Compressed data is stored in the middle layer of a Stacked Denoising Autoencoder

The SDA corrupts each input vector by randomly setting feature values to zero, each with a fixed corruption probability. This closely resembles a common problem in claims, data missing at random, so the model learns to cope with noise in the vectors. The SDA learns its weights by training the network to reconstruct the original, uncorrupted vector from the embedding, which keeps the embedding approximately invertible. This lets the model reduce dimensionality without losing too much information. The quality of the embeddings depends on the number of layers, the output dimension, and the corruption probability; using grid search, we set these to 2, 500, and 0.05, respectively.
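The corruption step can be sketched in a few lines of plain Python. This is an illustration of "masking noise" under stated assumptions, not the post's TensorFlow code: the toy vector is made up, and the squared-error objective is an assumed choice of reconstruction loss. The key point is that the encoder sees the corrupted vector while the loss compares against the clean one.

```python
import random


def mask_corrupt(vector, corruption_prob, rng):
    """Masking noise: set each entry to zero independently with probability
    corruption_prob; all other entries pass through unchanged."""
    return [0 if rng.random() < corruption_prob else value for value in vector]


def squared_error(target, reconstruction):
    """Assumed denoising objective: compare the reconstruction with the CLEAN vector."""
    return sum((t - r) ** 2 for t, r in zip(target, reconstruction))


rng = random.Random(42)
clean = [3, 0, 1, 5, 2, 0, 4, 1]       # toy claim-count vector (hypothetical)
noisy = mask_corrupt(clean, 0.05, rng)  # what the encoder actually sees
# A trained SDA would embed `noisy`, decode the embedding, and be scored with
# squared_error(clean, decoded); the corrupted vector is never the target.
```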

Each layer is first trained on its own as a denoising autoencoder whose decoder is the transpose of that layer's encoding weight matrix (tied weights). The optimizer computes the loss between the previous layer's output (for the first layer, the input vector) and its decoded reconstruction, and updates the weights to minimize it. The same routine is repeated each time a new layer is stacked. Finally, in a fine-tuning step, all the encoding layers and a decoding layer are trained together, with batch normalization used to counter covariate shift.
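The greedy, tied-weight pretraining described above can be sketched in dependency-free Python. This is a toy illustration at tiny sizes, not the TensorFlow implementation: the sigmoid encoder, linear decoder, squared-error loss, learning rate, and initialization range are all assumptions, and the fine-tuning step with batch normalization is not shown. Pure Python would be far too slow for 130,657-dimensional inputs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TiedDenoisingLayer:
    """One denoising autoencoder layer with tied weights: the decoder reuses
    the transpose of the encoder matrix W.
    Encoder: h = sigmoid(W x + b). Decoder (linear, an assumption): r = W^T h + c.
    """

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # encoder bias
        self.c = [0.0] * n_in      # decoder bias

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + bi)
                for row, bi in zip(self.W, self.b)]

    def decode(self, h):
        # Output j uses column j of W, i.e. row j of W^T.
        return [sum(hi * row[j] for hi, row in zip(h, self.W)) + cj
                for j, cj in enumerate(self.c)]

    def loss(self, x):
        """Squared reconstruction error on an uncorrupted input."""
        r = self.decode(self.encode(x))
        return 0.5 * sum((ri - xi) ** 2 for ri, xi in zip(r, x))

    def train_step(self, x, corruption_prob, lr, rng):
        # Corrupt the input with masking noise, but reconstruct the clean x.
        x_noisy = [0.0 if rng.random() < corruption_prob else v for v in x]
        h = self.encode(x_noisy)
        d_r = [ri - xi for ri, xi in zip(self.decode(h), x)]  # dLoss/dr
        # Error at each hidden pre-activation, routed back through tied W.
        d_a = [sum(w * d for w, d in zip(row, d_r)) * hi * (1.0 - hi)
               for row, hi in zip(self.W, h)]
        for i, row in enumerate(self.W):
            for j in range(len(row)):
                # Tied weights get gradient from both decoder and encoder paths.
                row[j] -= lr * (h[i] * d_r[j] + d_a[i] * x_noisy[j])
            self.b[i] -= lr * d_a[i]
        for j in range(len(self.c)):
            self.c[j] -= lr * d_r[j]


def pretrain_stack(data, layer_sizes, corruption_prob=0.05, lr=0.05, epochs=200, seed=0):
    """Greedy layer-wise pretraining: train one layer, then train the next
    layer on the clean encodings produced by the layers below it."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        layer = TiedDenoisingLayer(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for x in inputs:
                layer.train_step(x, corruption_prob, lr, rng)
        layers.append(layer)
        inputs = [layer.encode(x) for x in inputs]
    return layers, inputs  # `inputs` now holds the final embeddings
```

With the settings reported above, a call would use two layers ending in a 500-dimensional embedding and corruption_prob=0.05; the post does not state the first hidden layer's size.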

The vectors are now tractable

In the study "Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records" [1], classifiers that mapped SDA embeddings of patient vectors to likely future diagnoses achieved higher accuracy than those built on other dimensionality reduction techniques. So we used our favorite classifier, XGBoost, to map the embeddings to specialties.

It works!

Using the latest Centers for Medicare & Medicaid Services (CMS) Physician Compare dataset [2] to train a one-versus-rest classifier for each specialty, we achieved an average accuracy of 95% across all the classifiers.
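A minimal sketch of the one-versus-rest setup and the averaged accuracy follows. The fit_binary(X, y) -> predict interface is a hypothetical design chosen so any binary classifier can plug in; the nearest-centroid function is a toy stand-in for XGBoost so the example runs without dependencies, and the embeddings and labels are made up.

```python
def one_vs_rest_labels(physician_specialties, specialty):
    """1 if the physician carries `specialty`, else 0 (the "rest")."""
    return [1 if specialty in specs else 0 for specs in physician_specialties]


def train_one_vs_rest(embeddings, physician_specialties, fit_binary):
    """Train one binary classifier per specialty.

    fit_binary(X, y) must return a predict(X) -> list-of-0/1 callable
    (a hypothetical interface, so any classifier can be wrapped).
    """
    specialties = sorted({s for specs in physician_specialties for s in specs})
    return {s: fit_binary(embeddings, one_vs_rest_labels(physician_specialties, s))
            for s in specialties}


def average_accuracy(models, embeddings, physician_specialties):
    """Mean of the per-specialty accuracies."""
    accuracies = []
    for specialty, predict in models.items():
        y = one_vs_rest_labels(physician_specialties, specialty)
        correct = sum(p == t for p, t in zip(predict(embeddings), y))
        accuracies.append(correct / len(y))
    return sum(accuracies) / len(accuracies)


def fit_nearest_centroid(X, y):
    """Toy stand-in for XGBoost: predict 1 when a point is closer to the
    mean of the positive examples than to the mean of the negatives."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:  # degenerate case: only one class present
        constant = 1 if pos else 0
        return lambda rows: [constant] * len(rows)
    mu_pos, mu_neg = mean(pos), mean(neg)
    return lambda rows: [1 if sq_dist(r, mu_pos) < sq_dist(r, mu_neg) else 0 for r in rows]
```

In a real pipeline, fit_binary would wrap XGBoost instead, for example by fitting an xgboost.XGBClassifier on (X, y) and returning its predict method. Note that averaging per-class accuracy weights every specialty equally, regardless of how many physicians carry it.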

For a few specialties, such as clinical nurse specialist, gynecological oncology, diagnostic radiology, and physician assistant, classifiers trained on the embeddings outperformed those trained on the 130k-dimensional vectors, even though much smaller sample sizes were used to train XGBoost.

Figure 4. tSNE visualization of cardiologists and pediatricians

Now let’s look at the embeddings together with their specialty labels. In Figure 4, we used tSNE to project the embeddings into a 2D space.

Returning to the pediatric cardiologist example from the beginning of this post, Figure 4 shows no major overlap between the two specialties. To find the cardiologists whose practice is most similar to a pediatrician's, we sorted cardiologist embeddings by cosine distance to the average pediatrician embedding, smallest first. Two embeddings had a markedly smaller distance than the rest. They belonged to physicians whose vectors contain 33 and 27 claims, respectively, for hypothyroidism screening of newborns (CPT code 84443). (Hypothyroidism is related to cardiac disease [3].)
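The ranking step can be sketched as follows: average one group's embeddings into a centroid, then sort the other group's embeddings by cosine distance to it. The toy vectors and physician IDs below are made up for illustration; real SDA embeddings would have 500 dimensions.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank_by_similarity(candidates, reference_group):
    """Sort candidate IDs (id -> embedding) by ascending cosine distance
    to the average embedding of the reference group."""
    centroid = mean_vector(reference_group)
    return sorted(candidates, key=lambda pid: cosine_distance(candidates[pid], centroid))


# Toy data (hypothetical): two pediatricians, three cardiologists.
pediatricians = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
cardiologists = {
    "card_1": [0.0, 1.0, 0.0],
    "card_2": [0.8, 0.2, 0.1],
    "card_3": [0.1, 0.9, 0.3],
}
ranking = rank_by_similarity(cardiologists, pediatricians)
# ranking == ["card_2", "card_3", "card_1"]
```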

Figure 5. tSNE visualization of rheumatologists and physician assistants

Figure 5 is a tSNE visualization of physician assistants and rheumatologists. There is a large cluster of physicians with a mixture of the two labels. We found [4] that incorporating physician assistants into rheumatology practices is a growing trend.


Applying unsupervised learning techniques to the embeddings makes it possible to break down broad specialties such as internal medicine into finer groups. We can also measure the strength of a similarity and characterize what drives it by examining the corresponding decoded physician vectors.
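The post does not name a specific unsupervised technique. As one common choice, here is a minimal k-means (Lloyd's algorithm) sketch that could split embeddings of, say, internal medicine physicians into subgroups. The choice of k-means, the deterministic first-k initialization, and the toy points are all assumptions for illustration; real use would typically prefer a better initialization such as k-means++.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm). Centroids start at the first k
    points so the result is deterministic."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                       for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Toy 2D "embeddings" (hypothetical) forming two obvious subgroups.
points = [[0, 0], [10, 10], [0, 1], [10, 11], [0.5, 0.5], [10.5, 10]]
assignments, centroids = kmeans(points, k=2)
# assignments == [0, 1, 0, 1, 0, 1]
```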

Because the SDA compresses the very high-resolution claims data into compact embeddings, we can now use conventional classification techniques to uncover previously inaccessible connections between physicians and specialties.

For reference, here is the TensorFlow implementation of the Stacked Denoising Autoencoder.

Get in touch

We would love to hear feedback and your experiences with using deep learning in healthcare (and with insurance claims data). Please email to get in touch.