Incident Timeline:

Date	Event
March 24, 2019	Open ElasticSearch database discovered.
March 24, 2019	stepstorecovery.com emailed via their published email address.
March 24, 2019	Hosting provider for ElasticSearch database notified.
March 25, 2019	Hosting provider confirms server owner has taken down the exposed server.
March 28, 2019	A follow-up email sent to stepstorecovery.com, asking if they intended to notify their impacted users. No reply.
April 15, 2019	A follow-up email sent. No reply.

Summary:

Recently I discovered an improperly secured ElasticSearch database that contained personally identifiable information (PII) related to individuals who had received medical treatment at an addiction treatment center. The data appears to cover patients from mid-2016 to late 2018, and amounts to roughly 4.9 million rows. Following notification, the hosting provider of the database took prompt action to notify the owner of the database, but Steps to Recovery has yet to reply to any inquiries. To the best of my knowledge, the treatment center has not notified its patients about this leak of their PII.

Investigation:

While searching Shodan, I recently discovered yet another ElasticSearch database that was exposed to the Internet without any form of authentication. A quick review of the data made it apparent that the database contained medical information and PII related to patients of some type of rehab center. Based on the name of the database and additional information in it, this appeared to be patient data from Steps to Recovery, an addiction treatment center located in Levittown, PA. I initially notified Steps to Recovery regarding the data leak, but also notified the hosting provider given the sensitivity of the data. To date I have not received any reply from Steps to Recovery, but the hosting provider notified their customer, who then promptly took action to disable access to the database. It is unclear if Steps to Recovery took this action, or if someone may have been running this database on their behalf.

The Data:

The ElasticSearch database contained two indexes, roughly 1.45GB in total size, holding 4.91 million documents between them. These are not large numbers, but given the sensitivity of any PII leak I treated this as an urgent issue.

Index        Size	Documents
infcharges   906Mi	2.74M
infpayments  549Mi	2.17M
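As a quick sanity check, the totals quoted above can be reproduced from the two rows of the listing. This is only a sketch of the arithmetic: the dictionary layout and variable names are my own, and the inputs are the rounded figures shown in the listing, not exact counts.

```python
# Rounded figures copied from the index listing (assumed layout, not tool output).
indexes = {
    "infcharges": {"size_mib": 906, "docs_millions": 2.74},
    "infpayments": {"size_mib": 549, "docs_millions": 2.17},
}

# Sum the per-index sizes and document counts.
total_mib = sum(ix["size_mib"] for ix in indexes.values())
total_docs_millions = round(sum(ix["docs_millions"] for ix in indexes.values()), 2)

print(f"total size: {total_mib} MiB")
print(f"total documents: {total_docs_millions} million")
```

The sum of the sizes is 1455 MiB, which is where the "roughly 1.45GB" figure comes from, and the document counts add up to 4.91 million.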

Data related to multiple distinct patients was observed, though (luckily) it did appear that the number of unique patients was likely far fewer than the number of documents in the database would suggest. As demonstrated by the screenshot below, a single PatientID could have multiple rows of data for different medical procedures. Based on a random sample of 5,000 rows of data from the "infcharges" index, I observed 267 unique patients, or roughly 5.34% of the sampled rows. Assuming this ratio holds across the 2.74 million documents in that index, the database would contain roughly 146,316 unique patients. To be clear, it's entirely possible this sample of 5,000 rows was not representative of the entire index.
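The extrapolation above is simple enough to write down. The sketch below only reproduces the arithmetic from the figures in this post; the variable names are mine, and the 2.74 million document count is the rounded value from the index listing.

```python
# Figures taken from the post; names are illustrative only.
sample_rows = 5_000            # rows sampled from the "infcharges" index
unique_in_sample = 267         # distinct patients seen in that sample
infcharges_docs = 2_740_000    # 2.74M documents (rounded) in "infcharges"

# Share of sampled rows that introduced a distinct patient.
unique_ratio = unique_in_sample / sample_rows

# Linear scale-up to the whole index, done in integer arithmetic
# to avoid floating-point rounding surprises.
estimated_unique = unique_in_sample * infcharges_docs // sample_rows

print(f"unique ratio: {unique_ratio:.2%}")
print(f"estimated unique patients: {estimated_unique:,}")
```

Because each patient can appear in many rows, scaling a distinct count up linearly like this tends to overestimate, so the result is probably best read as a rough upper bound rather than a precise figure.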

Impact:

A leak of PII related to 146,316 unique patients would be bad on any day. It's particularly bad when it involves something as sensitive as an addiction rehab center. Given the stigma that surrounds addiction, this is almost certainly not information the patients want easily accessible.

What could a malicious user do with this data? Based on the patient name it was simple to locate all medical procedures a specific person received, when they received those procedures, how much they were billed, and at which specific facility they received treatment.

That's just the tip of the iceberg though.

If you search Google for the patient name together with "Ohio" (where the addiction recovery center in the example above was located), it becomes trivial to locate more information about this patient.

Sidenote: The connection between Steps to Recovery in Levittown, PA and this Ohio Addiction Recovery Center is unclear. My best guess is that the patient either lived near Levittown and had visited Ohio, or vice versa. Based on the additional information I was able to easily locate, I can say with confidence the patient almost certainly lives in Ohio.

I've heavily redacted the Google search below – but you can still get a sense for the extent of the information that was immediately located.

This is a creepy Google search.

I did not pay for any of these background reports. I had no interest in going that far.

After briefly reviewing just the freely available information, though, I could still tell you, with reasonably high confidence, the patient's age, birthdate, address, past addresses, the names of the patient's family members, their political affiliation, and potential phone numbers and email addresses.

In conclusion:

Please, please, please secure your data.
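If you run an ElasticSearch cluster yourself, one low-effort check is to send it a single request without credentials and see how it answers. The sketch below is a minimal illustration under a few assumptions: the function names are hypothetical, the cluster speaks plain HTTP at a URL you supply, and a cluster with authentication enabled rejects credential-free requests with a 401 or 403. Only point it at systems you operate.

```python
import urllib.error
import urllib.request


def classify_status(status_code: int) -> str:
    """Interpret the HTTP status returned for a request sent without credentials."""
    if status_code == 200:
        return "open"        # answered without asking for credentials
    if status_code in (401, 403):
        return "protected"   # credentials or permissions were required
    return "unknown"         # anything else needs a closer look


def check_unauthenticated_access(base_url: str, timeout: float = 5.0) -> str:
    """Send one credential-free GET to a cluster you operate and classify the reply."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still informative.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

For example, check_unauthenticated_access("http://localhost:9200/") returning "open" would mean anyone who can reach that address can read the cluster. An "unreachable" result from your own network says nothing about whether the service is reachable from the Internet.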

I hope that Steps to Recovery will acknowledge this leak of sensitive patient data. I hope they will promptly (it's not prompt any more – it's been a month) notify all of the patients they determine were impacted. I found this data leak purely by accident, but a malicious person could have also found this same data, and potentially used it as part of identity theft.