This will be a brief summary.
kanopy.com has been leaking huge volumes of website access logs and API logs via an exposed ElasticSearch database. The database was publicly accessible without any type of authentication. I sent numerous emails beginning Sunday, March 17th, as well as messages on social media, and they have yet to reply. I ultimately went directly to their hosting provider, who I can only presume promptly notified the company; only then did they take action to secure the server. The ElasticSearch database was last seen online March 18th, but I have not received any word from Kanopy regarding my report or my other communications to them.
They've been leaking roughly 26-40 million log lines per day beginning March 7th.
What did those logs contain?
The website access logs included the following columns of data for every visitor to the kanopy.com website:
host referrer pop user_agent city hits req_body_size device_type resp_header_size geoip.country_name geoip.region_name geoip.continent_code geoip.timezone geoip.location.lon geoip.location.lat geoip.latitude geoip.postal_code geoip.city_name geoip.country_code3 geoip.region_code geoip.country_code2 geoip.ip geoip.dma_code geoip.longitude resp_body_size @version @timestamp country_code response XXXXXXX_id req_header_size content_type request client_ip info_state tls_version status port
An Excel file containing website access logs showing my own visit to the kanopy.com website can be found here: https://www.dropbox.com/s/4deceg8pqyodlqy/Kanopy_mylogs.xlsx?dl=0
If you look closely you'll even see when I went to /contact to notify them of this security issue.
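To make that concrete, here is a minimal sketch of how rows like these can be grouped per visitor so that a single request such as /contact stands out. The field names client_ip, request, @timestamp, and device_type come from the column list above. Everything else is an assumption for illustration: the sample rows are invented placeholders (documentation-only IP addresses and dummy timestamps, not data from the exposed logs), the helper names are hypothetical, and I am assuming the request column holds the requested path.

```python
from collections import defaultdict

# Invented placeholder rows. Only the field NAMES come from the column list;
# the values are made up and use documentation-only IP ranges.
SAMPLE_ROWS = [
    {"client_ip": "203.0.113.7", "@timestamp": "2000-01-01T14:02:11Z",
     "request": "/", "device_type": "desktop"},
    {"client_ip": "198.51.100.23", "@timestamp": "2000-01-01T14:03:40Z",
     "request": "/", "device_type": "mobile"},
    {"client_ip": "203.0.113.7", "@timestamp": "2000-01-01T14:05:52Z",
     "request": "/contact", "device_type": "desktop"},
]


def requests_by_client(rows):
    """Group (timestamp, request) pairs by client_ip, oldest first."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["client_ip"]].append((row["@timestamp"], row["request"]))
    for visits in grouped.values():
        # ISO 8601 UTC strings in the same format sort chronologically.
        visits.sort()
    return dict(grouped)


def visited(rows, client_ip, path):
    """Return True if the given client_ip requested the given path."""
    visits = requests_by_client(rows).get(client_ip, [])
    return any(request == path for _, request in visits)


if __name__ == "__main__":
    for ip, visits in requests_by_client(SAMPLE_ROWS).items():
        print(ip)
        for timestamp, request in visits:
            print("   ", timestamp, request)
```

Nothing here is specific to Kanopy's setup; it only shows that once client_ip sits next to the request and timestamp in every row, reconstructing one visitor's activity is a few lines of grouping.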
The API logs included similar information, along with references to specific libraries, organizations, and other institutions that Kanopy appears to have partnered with. I did observe references to something that looked like a user_id value (more on that below). To be clear, though, I did not observe anything resembling leaked API keys.
What could a bad actor do with these logs?
Based on the client IP, a bad actor (via the API logs or the web server logs) could have identified all videos searched for and/or watched from that IP address. In combination with the geo information, timestamp, and device type, it likely would have been possible to determine the identity of the person behind that client IP (in the case of a static IP from their ISP). D