S3 Leak: 3,000 Permanent Account Number (PAN) cards and National ID (Aadhaar) cards from India

Summary

In the normal course of scanning for open/exposed/vulnerable Amazon S3 buckets, I discovered a bucket containing roughly 3,000 images of Permanent Account Number (PAN) cards and National ID (Aadhaar) cards from India.

After concluding that the images and PDFs appeared to be sensitive identification documents, I immediately reached out to India's CERT team to notify them of this data leak. I can confirm that, roughly 3 weeks after notice was provided, action was taken to secure the S3 bucket.

It is unclear who owns (or owned) the Amazon S3 bucket. Based on the types of documents being added, how frequently they were added, and how some of the images look, my best guess is that some type of scanner was configured to upload (or back up) files to this S3 bucket.

I would like to thank India's CERT team for their assistance in securing this S3 bucket.

Incident Timeline:

Date Event
December 1, 2018 Open bucket discovered.
December 2, 2018 Email sent to India's CERT team.
December 2, 2018 Acknowledgement email from India's CERT team with a reference number.
December 5, 2018 Follow up email sent.
December 5, 2018 CERT team indicated they are in touch with Amazon.
December 21, 2018 Asked a contact for assistance with resolving this issue.
December 22, 2018 CERT team confirms the S3 bucket is now secured.

What's in the bucket:

The S3 bucket contained 4,800+ files, mostly images but also some PDFs.

An example file path is:
/private-documents/absolutegenericdocument-1/2018/7/7/JPEG_2018_06_29_17_20_27_-90041938.jpg

At the "absolutegenericdocument-" level there were 4 folders ("absolutegenericdocument-1" through "absolutegenericdocument-4"), each containing a distinctly different type of document.

Within each of these "absolutegenericdocument-" folders, the documents were organized by year/month/day/file.
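Because the layout is so regular, each file path can be split mechanically into its parts and tallied per folder. Below is a minimal Python sketch, assuming every key follows exactly the six-level pattern of the example path above (root, folder, year, month, day, file). The names `DocKey`, `parse_key`, and `count_by_folder` are hypothetical and used only for illustration. This is not a description of the tooling used for this report.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class DocKey(NamedTuple):
    """Components of a key laid out as <root>/<folder>/<year>/<month>/<day>/<file>."""
    root: str
    folder: str
    year: int
    month: int
    day: int
    filename: str


def parse_key(key: str) -> DocKey:
    # Tolerate a leading "/" as in the example path; raw S3 keys usually lack one.
    parts = key.strip("/").split("/")
    if len(parts) != 6:
        raise ValueError(f"unexpected key layout: {key!r}")
    root, folder, year, month, day, filename = parts
    # Month and day are not zero-padded in the example ("2018/7/7"), so convert with int().
    return DocKey(root, folder, int(year), int(month), int(day), filename)


def count_by_folder(keys: Iterable[str]) -> Counter:
    """Tally how many keys fall under each "absolutegenericdocument-N" folder."""
    return Counter(parse_key(k).folder for k in keys)


if __name__ == "__main__":
    example = "/private-documents/absolutegenericdocument-1/2018/7/7/JPEG_2018_06_29_17_20_27_-90041938.jpg"
    print(parse_key(example))
    # The second key is hypothetical, made up only to demonstrate the tally.
    print(count_by_folder([example, "private-documents/absolutegenericdocument-4/2018/7/8/slip.pdf"]))
```

A strict six-part check is used so that any key that does not match the assumed layout raises an error instead of being silently miscounted.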

Side note: The irony of an S3 bucket full of sensitive data being named things like "private-documents" and "absolutegenericdocument" is not lost on me. They could have named the folders something like "totally-meant-to-leak-this".

The "absolutegenericdocument-1" folder appeared to contain primarily images of Permanent Account Number (PAN) cards, along with some PDFs of images.
Document count: 1,142 images.

[Image: absolutegenericdocument-1 (redacted)]

The "absolutegenericdocument-2" folder appeared to contain images of India's national ID card, known as "Aadhaar".
Document count: 1,109 images.

[Image: absolutegenericdocument-2 (redacted)]

The "absolutegenericdocument-3" folder appeared to contain miscellaneous images of people, mostly "head shots" like the type of photo you'd use for an ID.
Document count: 1,082 images.

The "absolutegenericdocument-4" folder appeared to contain "pay-in slip" PDFs.
Document count: 942 PDFs.

[Image: absolutegenericdocument-4]

The S3 bucket was still being actively updated with new images as of December 21st, the date I received assistance from a contact who helped me escalate this report.