Cautionary tale: Do you know where your customer database is?

Summary: Social Security numbers, credit card information, and more were indexed in Google because of careless human error. Read my first-hand account of finding an entire database of this type of information in Google.

What follows is a story involving a customer database from an up-and-coming utility company that I discovered via advanced querying in Google, what I did when I found it, and how the company responded afterward. The find was significant: each customer record contained some combination of the following:

  • First and last name
  • Address
  • Phone number
  • Email address
  • Social Security number
  • Credit card number (including expiration date and security code)

That's quite a staggering amount of personal information to end up in Google's index, isn't it? Whether you're a business owner who maintains customer data, a third-party entity that gathers customer information to pass on, or a consumer who simply entrusts your data to an online exchange, you owe it to yourself to read about this.

For a bit of background to this story: I enjoy performing advanced search queries in Google to find information that most of you would be astonished to learn exists in a search engine's index. This includes Social Security numbers, credit card information, passport information, and far more.

Well, one night about three weeks ago, I decided to search specifically for credit card information. Needless to say, I found what I was looking for and far more: not just one, but TWO customer database files -- one 4 MB .txt file and one 13 MB .bak file that was also plain text -- covering ~11,000 users, with each record containing varying amounts of the confidential information noted above.

What do you do when you find something like this? Well, that depends on your ethical stance, but I like to inform the companies/individuals so they can remove the files immediately. But instead of ending it there, this time, I decided it was worth it for me to document everything and write a story about it after all was said and done.

With that in mind, I set out to find the contact information of the CEO, and after a bit of investigation, I managed to find his email address and contact number. I sent him an email detailing who I was, what I had found, how I found it, and how he could get in touch with me if he wanted to speak further. Not even 10 minutes later, the files in the directory on their Web site were pulled and the directory locked down.

Now, while that's a great first step, finding something in Google means that a certain amount of information can still be found in its index for a certain amount of time even after source files have been removed. Luckily, that, too, was taken care of within a couple of days, no doubt thanks to Google's online request tool to remove content from their index.

Later on the night I sent the email, the CEO of the company called me and we spoke for around 30-45 minutes about a multitude of things, such as what the files I discovered were; why they were there, unencrypted, in an open directory; and the course(s) of action they planned to take with the affected customers.

To start, the company uses a third-party entity to take orders for them. That company then sends them the information submitted by prospective customers, and they verify the information before ultimately providing service. According to the CEO, a certain amount of this data is fake, provided by people who either don't want to give real information at the time or who are trying to get away with providing false information in order to receive service. More on this in a bit.

As for why the data was being transferred to them unencrypted by the third-party order-taking entity, he had no explanation and sounded thoroughly disappointed. Likewise, the reason I was able to find the files in the first place was apparently that they had been moving files around on servers internally and placed those particular files in that directory while forgetting to restrict access to it. How Google's bot located the directory in the first place is another story, one known only to them through their traffic logs.
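One way to catch this kind of slip before a crawler does is to periodically scan the Web server's document root for large data-like files. The following is only a minimal Python sketch under stated assumptions: the function name, the extension list, and the size threshold are illustrative choices, not anything the company described using. The demo builds a throwaway directory so the example is self-contained.

```python
import tempfile
from pathlib import Path

# Extensions chosen for illustration; .txt and .bak match the files in this story.
RISKY_SUFFIXES = {".bak", ".sql", ".csv", ".txt", ".dump"}

def find_risky_files(web_root, min_bytes=1_000_000):
    """Yield (path, size) for data-like files of at least min_bytes under web_root."""
    for path in Path(web_root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() not in RISKY_SUFFIXES:
            continue
        size = path.stat().st_size
        if size >= min_bytes:
            yield path, size

# Demo on a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    exports = Path(root) / "exports"
    exports.mkdir()
    (exports / "customers.bak").write_bytes(b"x" * 20)
    (exports / "customers.txt").write_bytes(b"x" * 20)
    (Path(root) / "index.html").write_bytes(b"x" * 20)  # ignored: not a risky suffix
    (Path(root) / "notes.txt").write_bytes(b"x" * 2)    # ignored: below the threshold
    found = sorted(path.name for path, _ in find_risky_files(root, min_bytes=10))

print(found)
```

A scan like this only flags candidates for a human to review; it does not tell you whether the directory is actually reachable from the outside.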

Returning to the point about the integrity of the data: the CEO indicated that their next task was to verify the records and separate the real information from the fake within the database files. After that, they would need to review their server logs to see how many IP addresses had requested those files, so as to establish how far the files had spread. He already knew the directory was first hit by Google's bot about two weeks before I contacted them, so the files effectively sat in Google's index for about two weeks.
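The log review described here can be sketched in a few lines of Python. This is an illustration, not the company's actual process: it assumes the server writes access logs in the widely used Common/Combined Log Format, and the `/exports/` directory, file names, dates, and IP addresses in the sample are all hypothetical.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line:
# client IP, two ignored fields, [timestamp], "METHOD path PROTOCOL", status
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def hits_by_ip(log_lines, exposed_prefix):
    """Count fulfilled (2xx) requests per client IP under exposed_prefix."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if not match["path"].startswith(exposed_prefix):
            continue
        if not match["status"].startswith("2"):
            continue  # only count requests the server actually fulfilled
        counts[match["ip"]] += 1
    return counts

# Hypothetical sample data; the addresses come from documentation-only ranges.
sample_log = [
    '192.0.2.10 - - [01/Mar/2011:02:14:07 +0000] "GET /exports/customers.txt HTTP/1.1" 200 4194304',
    '192.0.2.10 - - [01/Mar/2011:02:15:40 +0000] "GET /exports/customers.bak HTTP/1.1" 200 13631488',
    '198.51.100.7 - - [09/Mar/2011:18:30:12 +0000] "GET /exports/customers.txt HTTP/1.1" 200 4194304',
    '203.0.113.5 - - [10/Mar/2011:09:00:00 +0000] "GET /exports/customers.txt HTTP/1.1" 403 199',
    '203.0.113.9 - - [10/Mar/2011:09:01:00 +0000] "GET /index.html HTTP/1.1" 200 512',
]

counts = hits_by_ip(sample_log, "/exports/")
for ip, n in counts.most_common():
    print(ip, n)
```

On the sample data this reports two addresses: the request that was refused with a 403 and the request for an unrelated page are both excluded. A distinct-IP count is only a rough lower bound on exposure, since it says nothing about copies served from a search engine's cache.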

Once all of that is accomplished, remediation begins. But how, exactly, the CEO planned on contacting affected individuals was unknown at the time of our conversation. Additionally, he was unsure what they specifically planned to offer in the way of compensation for the error on their part, but he mentioned credit/identity monitoring for a year or two as a consideration.

So, after all of that, what's the lesson? Well, there are a number of them:

For starters, as a customer, you never know how your data may be mishandled, and it's scary to think that you could become a victim of identity theft because of a Google query and the gross mishandling of your data. As we become more connected, social, and reliant upon the Internet, I see these types of occurrences eventually becoming primary explanations for identity theft instead of obscure ones seen in cautionary tales like this.

As a trusted entity with confidential information, at the very least, you should encrypt your data. You would think this to be a standard implementation enabled by default within any type of solution used to take orders these days, but I guess not. Make sure you check into enabling this or adding it, if you have your own custom platform.

Lastly, as a service provider, in addition to keeping customer information encrypted, make sure your Web site is properly configured -- especially if you plan on using it for any sort of sensitive data handling. Restrict access to private directories and add them to the disallow section of robots.txt so as to keep them from being indexed in search engines. I noted those two facets in that particular order, because if search engine spiders can read robots.txt, then so can people who are interested in manually going to that file to see which directories you're interested in preventing search engines from indexing.
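To make the ordering of those two steps concrete, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the bot name, and the directory names are hypothetical, and `disallowed_paths` is a deliberately simplistic helper written for this illustration (it ignores comments and per-agent groups).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /exports/
"""

def disallowed_paths(robots_txt):
    """Return every path named in a Disallow rule, as any visitor could read it."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler consults the rules and skips the listed directories...
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/exports/customers.txt")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
print(blocked, allowed)

# ...but the same file hands the directory names to anyone who reads it.
print(disallowed_paths(ROBOTS_TXT))
```

The rules are honored only by crawlers that choose to honor them, which is why access restriction on the server has to come first and robots.txt second.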

To note, I (obviously) decided not to go public with who this company is or the search queries that led me to the database files in the first place. I want to leave it to the company to get things sorted out and to contact all affected individuals, and I don't want to empower the wrong people with search queries that aren't readily available on the Web -- even to advanced searchers.

In closing, I'd like to note that I plan on touching base with the CEO again in the near future to see how things have panned out since our initial conversation. Hopefully, I'll have a follow-up post to bring you -- perhaps even an interview with him -- depending on how that conversation transpires.

Thanks for reading.

*Special thanks to my esteemed colleague, Ed Bott, who offered sound guidance to me in the early stages of this affair.

-Stephen Chapman

Topics: Government US, Browser, CXO, Collaboration, Google, Government, Security

Talkback

7 comments
  • RE: Cautionary tale: Do you know where your customer database is?

    "Do you know where your customer database is?"

    Good question. The problem with cloud computing right now is that there's no good way to tell what's happening with a cloud provider.

    So - why are we rushing forward with cloud computing without fixing this issue first? Why not build a solid foundation first before pushing forward?
    CobraA1
    • RE: Cautionary tale: Do you know where your customer database is?

      @CobraA1
      +1
      hkommedal
  • The problem is bigger than the "cloud" - It's about outsourcing

    If you outsource, and the cloud is about outsourcing plain and simple, then you have a responsibility for the consequences.

    That's the dark side of the whole thing. Outsourcing is always pitched as "get something without all the work"... well, in this case the company didn't have to run its own SQL server or whatever, but the "work" didn't disappear. It was supposed to be auditing what its contracted service provider was doing, doing spot checks, etc.

    There is no free lunch. Ever.
    croberts
  • Encryption is not always sufficient

    Certainly encryption has its place, and it's especially useful when no access control exists, such as on a backup tape.
    However using encryption on a website is not effective if the encrypted data is meant to be accessed by a user's web browser. That's because the website application will transparently decrypt the data for all *authorized* users. It all comes back to access control.
    The case you describe employed neither an effective data transfer solution nor effective access control.
    Spatha@...
  • RE: Cautionary tale: Do you know where your customer database is?

    When you give personal information to a business, or even to government, there is no guarantee that it will be securely handled. There have been a number of laws created to address this issue, but oversight and auditing seem to be almost nonexistent.

    What is really frustrating is when a business is required by law to take critical info, like SSN, DL #, name, etc. and then they transmit this info, in the clear, to a government entity.

    You would think the people in charge would be too worried about lawsuits and/or bad PR to let this type of stuff happen; but in my experience, those same people either seem ignorant of the issue or just don't care.
    aJollyGeek
  • No, no, no, and no

    There must be so many PCI violations here, it makes my head hurt.

    It would be bad enough if there was no third party involved and their database was visible on Google. But it is especially egregious that the 3rd party was transferring data to them, either not over an encrypted VPN, or if over VPN, that the company was stashing the received file on a machine that hosts their website. No files (especially customer data files) should reside on the machine that hosts any decent-sized company's website, besides those files absolutely required for website functionality. And the website machine should be firewalled as much as possible from the rest of the company's internal network.

    "Restrict access to private directories and add them to the disallow section of robots.txt so as to keep them from being indexed in search engines."

    This shouldn't be required in the first place, because there should be no private directories to advertise in robots.txt. In fact it would be terribly bad to advertise there what you don't want hackers accessing. Because then you are giving them the address to find sensitive files if they can find a way around the rest of your network defences.

    Absolutely terrible all the way around.
    colinnwn
  • RE: Cautionary tale: Do you know where your customer database is?

    It would be interesting to set up a web server with non-existent private directories listed in the robots.txt to see how many attempts are made against it, and/or to see how long it takes to bring the server down. Just to see if this is really a pressing concern, or just a bad practice. (It would be slightly amusing to see the hackers sweating it out trying to break into something that isn't there.)
    mlashinsky@...