Incident Timeline:

Date            Event
May 22, 2019    Open ElasticSearch database discovered.
May 23, 2019    Shanghai Jiao Tong University notified.
May 24, 2019    ElasticSearch database secured.

Summary:

While searching Shodan, I recently discovered an ElasticSearch database that required no authentication. The database contained metadata for a very large number of emails. It was eventually confirmed that the server and the email metadata were controlled by a large university located in China. I would like to thank the university's security team for acting promptly to secure this data once notified. As far as I am aware, however, they have not notified the impacted students.

2019-06-18 UPDATE

Additional context shared by Shanghai Jiao Tong University:

"According to our investigation, this service was mostly hit by internet scanning, and the total data transfered over the entire month is less than 60 megabytes (including the traffic caused by your research). Also, the access log shows no evidence of bulk data acquisition."

About Shanghai Jiao Tong University:

Wikipedia notes that Shanghai Jiao Tong University has been known as "The MIT of the East" since the 1930s. The university has roughly 41,000 students across its undergraduate, master's, and Ph.D. programs.

How much data?

The database held 9.5 billion rows, which translates to 8.4TB of data. This was email metadata that appears to have come from Zimbra, a popular self-hosted email platform. The database was also growing significantly at the time it was secured: on May 23rd I observed that it was only 7TB in size, and by May 24th it had grown to 8.4TB.

High level database stats.
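
As a rough sanity check on the figures above, the following sketch works out the average row size and the growth between the two observations. It assumes decimal units (1TB = 10^12 bytes), which the reported sizes do not specify, so the results are approximate.

```python
# Back-of-the-envelope arithmetic on the reported figures.
# Assumption: decimal units, 1 TB = 10**12 bytes.
TB = 10**12

rows = 9.5e9             # rows reported in the database
size_may_23 = 7.0 * TB   # size observed on May 23rd
size_may_24 = 8.4 * TB   # size observed on May 24th

bytes_per_row = size_may_24 / rows
growth_bytes = size_may_24 - size_may_23
growth_pct = growth_bytes / size_may_23 * 100

print(f"average row size: {bytes_per_row:.0f} bytes")
print(f"growth between observations: {growth_bytes / TB:.1f} TB ({growth_pct:.0f}%)")
```

Under that assumption, this works out to roughly 884 bytes per row and about 1.4TB (20%) of growth between the two observations.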

What was exposed?

Based on the metadata, I was able to locate all email sent or received by a specific person. The data also included the IP address and user agent of the person checking their email. As such, I could locate all the IP addresses and device types used by a specific person.

Email related to a specific person.
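
To illustrate why metadata alone is sensitive, here is a minimal sketch of the kind of per-person aggregation described above. The record layout, the field names (`account`, `from`, `to`, `ip`, `user_agent`), and every sample value are hypothetical and invented for illustration; they are not taken from the exposed database.

```python
# Hypothetical records: field names and values are made up for illustration
# and do not reflect the real schema of the exposed database.
records = [
    {"account": "alice@example.edu", "from": "alice@example.edu",
     "to": "bob@example.edu", "ip": "192.0.2.10",
     "user_agent": "ExampleMail/1.0 (Android)"},
    {"account": "alice@example.edu", "from": "carol@example.edu",
     "to": "alice@example.edu", "ip": "192.0.2.44",
     "user_agent": "ExampleMail/1.0 (Windows)"},
    {"account": "bob@example.edu", "from": "alice@example.edu",
     "to": "bob@example.edu", "ip": "198.51.100.7",
     "user_agent": "ExampleMail/1.0 (iOS)"},
]

def profile(records, address):
    """Collect what the metadata reveals about one email account."""
    result = {"correspondents": set(), "ips": set(), "user_agents": set()}
    for r in records:
        # Assumption: ip and user_agent describe the client of `account`.
        if r["account"] != address:
            continue
        other = r["to"] if r["from"] == address else r["from"]
        result["correspondents"].add(other)
        result["ips"].add(r["ip"])
        result["user_agents"].add(r["user_agent"])
    return result

alice = profile(records, "alice@example.edu")
print(sorted(alice["correspondents"]))
print(sorted(alice["ips"]))
print(sorted(alice["user_agents"]))
```

Even without subject lines or message bodies, a simple grouping like this yields a list of who someone corresponds with, where they connect from, and which devices they use.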

Using this metadata, I could see the high-level details of a specific email exchange, such as which email address was sending email to, or receiving it from, another email address.

Specific email thread between two users.

Note: this database did not contain the subject line information or the body of these emails.