This will be a brief summary.

Summary

kanopy.com has been leaking huge volumes of website access logs and API logs via an exposed ElasticSearch database. The database was publicly accessible without any type of authentication. Despite numerous emails beginning Sunday, March 17th, and messages on social media, they have yet to reply. I ultimately went directly to their hosting provider, who, I can only presume, promptly notified the company; only then did they take action to secure the server. The ElasticSearch database was last seen online March 18th, but I have not received any word from Kanopy regarding my report or my other communications to them.
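
For anyone who runs their own cluster and wants to check for the same kind of misconfiguration, here is a minimal sketch in Python. It is not the method used to find the Kanopy database; it only sends a single unauthenticated GET to a URL you supply and interprets the HTTP status. The function names (classify_status, probe) and the localhost URL in the usage comment are illustrative assumptions, and a real check should also cover firewall rules and network exposure.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Interpret the HTTP status returned to an unauthenticated request."""
    if status in (401, 403):
        # The server asked for credentials or refused access.
        return "auth required"
    if 200 <= status < 300:
        # The server answered without asking for credentials.
        return "open"
    return "inconclusive"


def probe(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to a cluster you operate yourself."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Example (assumed local test cluster on the default HTTP port 9200):
#   print(probe("http://localhost:9200"))
```

An "open" result from a machine outside your own network would mean the cluster can be read by anyone who finds it, which is the situation described above.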

They've been leaking roughly 26-40 million log lines per day beginning March 7th.

What did those logs contain?

The website access logs included the following columns of data for every visitor to the kanopy.com website:

host
referrer
pop
user_agent
city
hits
req_body_size
device_type
resp_header_size
geoip.country_name
geoip.region_name
geoip.continent_code
geoip.timezone
geoip.location.lon
geoip.location.lat
geoip.latitude
geoip.postal_code
geoip.city_name
geoip.country_code3
geoip.region_code
geoip.country_code2
geoip.ip
geoip.dma_code
geoip.longitude
resp_body_size
@version
@timestamp
country_code
response
XXXXXXX_id
req_header_size
content_type
request
client_ip
info_state
tls_version
status
port
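
To make it concrete why these fields are sensitive in combination, here is a minimal sketch in Python that groups log records by client_ip and orders each visitor's requests by time. Only the field names come from the list above; every value in the sample records (IP addresses from documentation ranges, timestamps, request paths, city) is invented for illustration, and the function name requests_by_client is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records. Field names match the leaked columns;
# all values are made up and do not come from the actual logs.
records = [
    {"client_ip": "192.0.2.10", "@timestamp": "2000-01-01T10:05:00Z",
     "request": "/video/example-b", "device_type": "desktop",
     "geoip.city_name": "Exampleville"},
    {"client_ip": "203.0.113.7", "@timestamp": "2000-01-01T10:01:00Z",
     "request": "/search?q=example", "device_type": "mobile",
     "geoip.city_name": "Exampleville"},
    {"client_ip": "192.0.2.10", "@timestamp": "2000-01-01T10:00:00Z",
     "request": "/video/example-a", "device_type": "desktop",
     "geoip.city_name": "Exampleville"},
]


def requests_by_client(rows):
    """Return {client_ip: [(timestamp, request), ...]} sorted by timestamp."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["client_ip"]].append((row["@timestamp"], row["request"]))
    for entries in grouped.values():
        # ISO 8601 timestamps in the same format and zone sort correctly as strings.
        entries.sort()
    return dict(grouped)


history = requests_by_client(records)
for ip, entries in history.items():
    print(ip, entries)
```

Even this trivial grouping yields a per-visitor browsing timeline, before the geo and device fields are taken into account.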

An Excel file containing website access logs showing my own visit to the kanopy.com website can be found here: https://www.dropbox.com/s/4deceg8pqyodlqy/Kanopy_mylogs.xlsx?dl=0

If you look closely you'll even see when I went to /contact to notify them of this security issue.

The API logs included similar information, with some references to specific libraries, organizations, and other institutions that Kanopy appears to have partnered with. I did observe references to something that looked like a user_id value (more on that below). To be clear, though, I did not observe anything resembling leaked API keys.

What could a bad actor do with these logs?

Based on the client IP, a bad actor could have used the API logs or the web server logs to identify all videos searched for and/or watched from that IP address. In combination with the geo information, timestamp, and device type, it likely would have been possible to identify the person behind that client IP (in the case of a static IP from their ISP). D