What follows is a story about a customer database from an up-and-coming utility company, which I discovered via advanced querying in Google; what I did when I found it; and how the company responded afterward. And when I say I found a customer database, the find was quite significant, considering each customer record contained some combination of the following:
- First and last name
- Phone number
- Email address
- Social Security number
- Credit card number (including expiration date and security code)
That's quite a staggering amount of information about oneself to end up in Google's index, isn't it? Whether you're a business owner who maintains customer data, a third-party entity who gathers customer information to pass along, or a consumer who is simply entrusting your data to an online exchange, you owe it to yourself to read about this.
Just to give you a quick, concise background to serve as the platform for this story: I enjoy performing advanced search queries in Google to find information that would blow most of you away to know exists in a search engine's index. This includes Social Security numbers, credit card information, passport information, and far more.
Well, one night about three weeks ago, I decided to search specifically for credit card information. Needless to say, I found what I sought and far more: not just one, but TWO customer database files -- one 4 MB .txt file and one 13 MB .bak file that was also plain text -- containing roughly 11,000 customer records, each with varying amounts of confidential information as noted above.
What do you do when you find something like this? Well, that depends on your ethical stance, but I like to inform the companies/individuals so they can remove the files immediately. But instead of ending it there, this time, I decided it was worth it for me to document everything and write a story about it after all was said and done.
With that in mind, I set out to find the contact information of the CEO, and after a bit of investigation, I managed to find his email address and phone number. Not even 10 minutes after I sent a carefully crafted email detailing who I was, what I had found, how I found it, and how he could reach me if he wanted to discuss matters further, the files were pulled from the directory on their Web site and the directory locked down.
Now, while that's a great first step, once something lands in Google, cached copies and snippets can linger in the index for some time even after the source files have been removed. Luckily, that, too, was taken care of within a couple of days, no doubt thanks to Google's online tool for requesting content removal from its index.
Returning to the day I sent the email: the CEO of the company actually called me later that night, and we spoke for around 30-45 minutes about a multitude of things -- what the files I discovered were, why they were sitting unencrypted in an open directory, and the course(s) of action they planned to take with the affected customers, amongst other things.
To start, the company uses a third-party entity to take orders for them. That company then sends them the information submitted by prospective customers, which they verify before ultimately providing service. According to the CEO, a certain amount of this data is fake -- provided by people who either don't want to supply real information at the time, or who are trying to get away with false information in order to receive service. More on this in a bit.
As for why the data was being transferred to them unencrypted by the third party order-taking entity, he had no explanation and sounded thoroughly disappointed. Likewise, the reason I was able to find the files in the first place was apparently due to their moving files around on servers internally and placing those particular files in that directory, while forgetting to restrict access to it. How Google's bot located the directory in the first place is another story which remains told only to them through their traffic logs.
Back to the earlier point about the integrity of the data: the CEO described the impending task of verifying the data and discerning the real information from the fake within the database files. After that, they would need to comb their logs to see how many IP addresses had hit those files within the directory, so as to establish how far the files had spread. He already knew Google's bot first hit the directory about two weeks prior to my contacting them, so the files effectively sat in Google's index for about two weeks.
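That kind of log review is straightforward to sketch. Assuming a standard Common Log Format access log, something like the following would count the unique client IPs that requested the exposed files -- the file paths here are made up for illustration, since the real names were never disclosed:

```python
import re

# Hypothetical paths standing in for the exposed database files.
SENSITIVE_PATHS = {"/data/customers.txt", "/data/customers.bak"}

def unique_ips(log_lines, paths=SENSITIVE_PATHS):
    """Collect distinct client IPs that fetched any of the given paths.

    Assumes Common Log Format, where each line starts with the client IP
    and the request appears in quotes, e.g. "GET /path HTTP/1.1".
    """
    ips = set()
    for line in log_lines:
        m = re.match(r'(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)', line)
        if m and m.group(2) in paths:
            ips.add(m.group(1))
    return ips
```

Every distinct address in that set, minus known crawler IPs like Google's, represents a party who may now hold a copy of the data.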
Once all of that is accomplished, remediation begins. But how, exactly, the CEO planned to contact affected individuals was unknown at the time of our conversation. Additionally, he was unsure what they specifically planned to offer as compensation for the error on their part, though he mentioned credit/identity monitoring for a year or two as a consideration.
So, after all of that, what's the lesson? Well, there are a number of them:
For starters, as a customer, you just never know how your data may be mishandled, and it's scary to think you could become a victim of identity theft essentially because of a Google query and the gross mishandling of your data. As we become more connected, social, and reliant upon the Internet, I expect these types of occurrences to eventually become primary explanations for identity theft rather than the obscure ones seen in cautionary tales like this.
As a trusted entity holding confidential information, at the very least, you should encrypt your data. You would think encryption would be a standard feature enabled by default in any order-taking solution these days, but apparently not. Make sure you check into enabling it -- or adding it, if you run your own custom platform.
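Even before full file or database encryption is in place, a small amount of field-level masking at the point where order data is exported would have blunted this kind of exposure considerably. Here's a minimal Python sketch; the record layout and field names are hypothetical, not taken from the actual files:

```python
import re

def mask_record(record):
    """Return a copy of a customer record with sensitive fields masked.

    The field names below are hypothetical examples, not the ones from
    the database files described in this story.
    """
    masked = dict(record)
    if masked.get("ssn"):
        digits = re.sub(r"\D", "", masked["ssn"])
        # Keep only the last four digits, the standard display form.
        masked["ssn"] = "***-**-" + digits[-4:]
    if masked.get("card_number"):
        digits = re.sub(r"\D", "", masked["card_number"])
        # PCI guidance permits at most first six / last four in the clear.
        masked["card_number"] = "*" * (len(digits) - 4) + digits[-4:]
    if "cvv" in masked:
        # The card security code should never be stored at all.
        masked["cvv"] = None
    return masked
```

Had the exported files been masked this way, a leak would have cost far less: names and partial numbers instead of full Social Security and card numbers with security codes.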
Lastly, as a service provider, in addition to keeping customer information encrypted, make sure your Web site is properly configured -- especially if you plan on using it for any sort of sensitive data handling. Restrict access to private directories, and add them to the disallow section of robots.txt to keep them from being indexed by search engines. I listed those two measures in that particular order deliberately: if search engine spiders can read robots.txt, then so can anyone curious enough to pull up the file manually and see exactly which directories you'd prefer search engines didn't index.
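To make that ordering concrete, here's what the combination might look like; the directory name is a hypothetical example. The robots.txt entry only asks well-behaved crawlers to stay out, while the server-level rule (shown here in Apache 2.4 .htaccess form) actually denies access:

```
# robots.txt -- a polite request, readable by anyone who asks for it
User-agent: *
Disallow: /internal-data/

# .htaccess inside /internal-data/ -- actual access control (Apache 2.4)
Require all denied
```

With the access control in place first, the robots.txt line reveals the directory's name but nothing inside it; without it, robots.txt is little more than a signpost.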
To note, I (obviously) decided not to go public with who this company is or the search queries that led me to the database files in the first place. I want to leave it to the company to get things sorted out and to contact all affected individuals, and I don't want to empower the wrong people with search queries that aren't readily available on the Web -- even to advanced searchers.
In closing, I'd like to note that I plan on touching base with the CEO again in the near future to see how things have panned out since our initial conversation. Hopefully, I'll have a follow-up post to bring you -- perhaps even an interview with him -- depending on how that conversation transpires.
Thanks for reading.
*Special thanks to my esteemed colleague, Ed Bott, who offered sound guidance to me in the early stages of this affair.