Porn, piracy, and personal data: Universities providing more than just education...

Summary: Illegal downloads being provided by schools on behalf of students and faculty? Private data being stored unknowingly in the open? All of this and much more in the second and final part of my .edu exposé!

UPDATE, 5/19/11 2:13PM EST: It appears word is getting around to some of the schools I've linked to in this post, as links are beginning to die off (which is a good thing for the universities taking care of business). For the moment, the respective Google queries still yield results that reflect what once was...

Directly on the heels of my latest post (where I exposed pornography residing on university Web sites like Harvard.edu, Yale.edu, and MIT.edu), it's now time to show just how much extracurricular content many universities are inadvertently making available to the world via their Web sites. Get comfy and settle in as I continue my exposé on .edu domains and guide you through methods to find all sorts of content to download from them.

To start, if you think finding pornography, spam, spam supportive of hate speech, etc. plastered across prestigious university Web sites is bad, then take into consideration the following list, which showcases a larger picture of just how much more there actually is to be found:

  • MP3s
  • Movies/Documentaries
  • Applications/Software
  • Games/Roms
  • Ebooks
  • Whole curriculums
  • Personal data (tax documents, grade documents, etc.)
  • Intellectual property

Via the list above, you can see there is much more at stake than simply reputational damage to a school's image for having a little bit of pornography -- as noted in my previous post. There are concerns of potential legal woes, identity theft, theft of intellectual property (as in a body of work by a student or faculty), whole curriculums being made available for free, and more -- all touching various aspects ranging from just one individual to an entire university!

After giving plenty of thought to how I want to approach this post, I've decided to take each of the items on the list above and elaborate on specific scenarios for each, respectively. In case you wouldn't have otherwise, I'd like you to take notice of the similarities/patterns within the search queries I utilize for each scenario as they're a real testament to how one can fine-tune a specific set of search queries to use over and over again with very minor changes. Finally, I will conclude the post with some ideas for remedy and prevention.

If you want to read a really good primer (if I do say so myself, *mustache twirl*) on advanced Google searching before we get started, then have a look at this post. Many of the concepts I delve into below can be found explained within that post, should you find yourself completely lost with exactly what it is I'm doing.

MP3s

Interestingly enough, I have sat back for years and watched as P2P software has pigeonholed the attention of the RIAA, MPAA, and other lawsuit-seekers of the intellectual property/copyright variety. In this section, I'm going to show you how much music is out there just waiting to be downloaded straight from .edu domains this very moment. And my primary target? Harvard.

Let's assume you really like Kanye West, so you decide you want to find his music residing in a student/faculty directory such that you can freely nab it at fast download speeds. You now head on over to Google and formulate a nice little query that you think will get the job done -- initially, at least. You decide to go with the following: site:edu intitle:index.of "Kanye West"

Well, at first glance, it appears there isn't much... but looks can be deceiving and I'll show you why. Let's click on this directory found within those results. As we can see, the song "Gold Digger" is located there. Now, look up in your address bar in that page and take a look at the actual URL: http://www.people.hbs.edu/ffrei/MSOMaterials/iTunes/iTunes%20Music/Kanye%20West/Late%20Registration/

I don't know about you, but I'm thinking this looks pretty darn promising for there to be at least one other band located in the /iTunes%20Music/ directory! Let's remove the Kanye%20West/Late%20Registration/ bit from the URL and see what it looks like: http://www.people.hbs.edu/ffrei/MSOMaterials/iTunes/iTunes%20Music/

"Dear diary... JACKPOT!"

Just look at all the music sitting in that directory, freely available for the taking. Now, if you actually do some digging around, you'll notice that there aren't only singles in those directories; there are entire albums as well -- such as "Lovers Rock" by Sade and "Trouble" by Ray LaMontagne.

Perhaps worst of all (and the part that makes this kind of thing feel more "real") is that this directory belongs to a professor at Harvard Business School; yes, of the very same Harvard I made the primary focus of my previous post. How do we know this? Well, by continuing on up the list of directories until we arrive at http://www.people.hbs.edu/ffrei/. The final nail in the coffin is visiting http://www.hbs.edu/ and finding that it is indeed the site of Harvard Business School.
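For admins who want to see how far a single indexed URL can expose a directory tree, the "walk up the path" step can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the function name parent_directories is made up for this example, and it does nothing but string manipulation on the URL (no requests are sent).

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directories(url):
    """Return each ancestor directory URL, from the nearest parent up to the site root."""
    parts = urlsplit(url)
    # Split the path into its non-empty segments, e.g. ["ffrei", "MSOMaterials", ...].
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time until only the root "/" remains.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents
```

Feeding it the "Late Registration" URL from earlier returns the /Kanye%20West/ directory first, then /iTunes%20Music/, and so on up to http://www.people.hbs.edu/. Every one of those is a candidate for an open listing that an admin should check.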

Your mileage may vary, depending on which artist you choose to search for, but it's when you start traversing directories that you often find a rather startling collection of MP3s. As seen in the screen shot below, this entire collection is all indexed in Google -- just waiting for someone to come along and do just the right search.

And that's all just one person's contents on Harvard! Need the album "Viva La Vida" by Coldplay? Rice.edu has you covered. What about the song "One" by Metallica? MIT's got your back. And isn't the new Pirates of the Caribbean movie coming out soon? Perhaps you're in the mood to hear the soundtrack to "Curse of the Black Pearl!" Aye, Johns Hopkins University won't leave ye stranded! We could continue this all day; just don't forget to have a look around other directories on those sites -- especially Johns Hopkins.

Movies/Documentaries

Admittedly, movies are somewhat of a crap shoot; sometimes, I can find them like there's no tomorrow; other times, not so much. This just happens to be one of those "not so much" times. Finally, though, perseverance paid off with the following query: site:edu intitle:index.of avi 700M

My thoughts behind that query were to search for avi (movie) files (there's a filetype: operator in Google, but it's a finicky beast that I rarely utilize) that are 700MB in size (file size is a value typically displayed in a textual format in file indexes, which search engine spiders can easily pick up on and index). Though there are only two results from the query, we are met with yet another prime example of looks being deceiving. When exploring this particular result from the Milwaukee School of Engineering and stepping two directories up, we end up here, where we see many interesting documentary and directory names. Upon investigating all directories, luck would have it that this directory contains a DVD rip of the Will Smith action thriller, Enemy of the State. Though I didn't set out to find it in particular, there it is -- and this type of "dumb luck" happens far more frequently than you would think when you start traversing file directories.
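To make the "700M" part of that query a little more concrete from the defensive side, here is a rough Python sketch of how an admin might flag large files in a listing they have already scraped from their own server. It assumes sizes show up as a number with an optional K/M/G suffix (as in "700M"); real listings vary by server and configuration, and the names size_to_bytes and large_entries are invented for this example.

```python
import re

# Assumption: sizes appear as a number plus an optional K/M/G suffix, as in "700M".
SIZE_TOKEN = re.compile(r"^(\d+(?:\.\d+)?)([KMG])?$")
MULTIPLIER = {None: 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(token):
    """Convert a listing size such as "700M" into an approximate byte count."""
    match = SIZE_TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognized size token: {token!r}")
    number, suffix = match.groups()
    return int(float(number) * MULTIPLIER[suffix])


def large_entries(rows, threshold="500M"):
    """Return the names from (name, size_token) rows at or above the threshold."""
    limit = size_to_bytes(threshold)
    return [name for name, token in rows if size_to_bytes(token) >= limit]
```

The 500M default threshold is arbitrary; the point is simply that the same plain-text size values a search engine indexes are just as easy for you to scan for first.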

Since this one was such a pain in the butt to dig up, I'll just leave this scenario as-is -- especially since the example directory I provided has both documentaries (which are much easier to find) and movies. I was just focusing on Hollywood, though. For whatever field of study you're in or whatever your unique passion/expertise may be, try searching for educational/instructional videos that would typically cost you money and see what you come up with -- especially if you're into something like programming, neuroscience, astrophysics, etc. To give you a head-start, try the following query to build some ideas from: site:edu intitle:index.of DVD|avi|mpg "C++"

Applications/Software

Obtaining licenses to ubiquitous products -- like Microsoft products -- from universities is much easier than it should be. There was a time years ago where I ran fully-licensed copies of Windows and Office using licenses I obtained from exported MSDNAA key lists (lists that can contain hundreds to thousands of licenses for innumerable Microsoft products) residing on .edu domains. Yes, I've been doing this type of searching for many years now and my findings are never any less interesting, even if I do nothing with them these days.

Remember earlier when I mentioned "dumb luck?" Well, such was the case when searching for movies earlier and I instead ran across something Office 2010-related. Yes, while randomly doing some directory surfing within an employee directory at Brigham Young University Marriott School, I stumbled upon an Office 2010 Professional license contained within a document found in this directory. Truly amazing. And I haven't even tried one of my usual search queries along the lines of site:edu intitle:index.of MSDN | MSDNAA or site:edu intitle:index.of "Office 2010" | Office2010 | "Windows 7" | Windows 7 +key yet!

Giving the former of the previous two queries a shot, I was able to quickly flush out two directories containing relevant results: one directory from Bossier Parish Community College containing a whole slew of ISO files (and, in some cases, licenses) for Windows 7, Windows Vista, Windows XP, VMWare, etc., and another directory from an Indonesian university, PENS-ITS, with licenses to more dated versions of Windows -- but licenses nonetheless.

Getting away from Microsoft now, even such generic searches as site:edu intitle:index.of crack can yield interesting results -- like this directory at MIT which contains a program (along with a crack) that normally retails for ~$600. Additionally, we can find directories like this (from Case Western Reserve University) which have been around for years housing cracked software.

And with that, I would like to direct you to page two, where I cover games/roms, ebooks, curriculums, and personal data. Then, on page three, I'll delve into intellectual property, solutions/preventative measures, and finally, the conclusion to my .edu exposé.


Games/Roms

As with movies, finding games can be a bit of a challenge without either putting in some time to formulate proper queries or just searching for the right thing on any given day. For the moment, any really recent games that I'd like to find (like Crysis 2, Portal 2, etc.) just aren't yielding any results; however, there are plenty of older WoW installs with cracks (from Case Western Reserve University), directories containing archives of Popcap games -- such as Bejeweled, Peggle, etc. -- (from Rutgers Institute), and directories with "The Sims" files scattered throughout (from Brandeis University).

With the issue of finding more recent games aside, some of you are probably familiar with emulators and roms. If not, emulators are programs that were written to emulate something; in this case, they emulate video game systems like the NES, SNES, Sega Genesis, etc. While emulators themselves are completely legal and free (for the most part; some cost money and others have seen court battles), roms are not. Roms are essentially game dumps and Nintendo is pretty strict about trying to keep their old games off of the Internet. While it may not seem like a big deal since NES/SNES/etc. games are dated, you must take into consideration that Nintendo decided to try to monetize their old works via the Virtual Console of the Wii and via DSiWare downloads on the Nintendo DS. So, in this context, directories like this (from Union College) and this (from Galileo University) contain files that are a no-no to share or possess -- unless one possesses the actual cartridge of the game, which then only makes it okay to possess a rom; not share it.

There is much to be found game-wise scattered throughout .edu sites, but digging up more recent titles is a very hit-or-miss affair. Try starting with something as basic as site:edu intitle:index.of Games | Gamez and see what you can find. You may be surprised!

Ebooks

Simple enough, copyrighted books are aplenty these days -- though you rarely (if ever) hear of a publisher or author pursuing someone for pirating their content. As such, ebook directories pretty much litter the Internet -- .edu domains included. Proof of this can be found by exercising a very simple query like site:edu intitle:index.of ebooks.

From those results, we find directories like this one from the University of Tennessee Martin that is chock-full of O'Reilly books. There is also plenty to be seen here (from UMass Boston), here (from Northeastern Illinois University), here (from an Indonesian university, PENS-ITS), here (from Galileo University), and here (from Milwaukee School of Engineering).

Remember that with everything I have written in this article thus far, I'm not at all implying that any given person is themselves thieving/pirating content; I'm making a statement to the effect of all of these things being made freely available for download to any random passerby.

Whole Curriculums

Basically, the concept here is that people can garner an education from an accredited university without ever having to pay a dime for it. Granted, they won't get the piece of paper that says they graduated, but with a desire to learn accompanied by some advanced Google search querying, an unbelievable amount of material can be scored. For instance, check out the following 2 queries and try interchanging some of the words with your own to see what you can dig up:

site:edu "Introduction to C++" Course

site:edu Neuroscience 101

This all equates to money potentially being lost by a school or simply having their content stolen if they didn't intend for it to all be public. Even worse, I once found a curriculum which contained all of the course files for the whole span of the class -- including quizzes, tests, and the answers to go along with them! And no, this wasn't a free course, either. Quite amazing...

Personal Data

Unfortunately, my evidence for this portion of the article is going to have to be strictly anecdotal since I don't want to link to any information that could be used to the detriment of those I found it from. Long story short, I have unearthed tax documents (which include Social Security numbers along with names and addresses), grade documents, mortgage documents, bills (including names, addresses, and account numbers), receipts for software purchases (including licenses, totals, names, numbers, addresses, and email addresses), family photos with documents containing information related to other family members, and much more.

In my humble opinion, documents containing Social Security numbers, personal addresses, monetary status, etc. have no value being posted here for the purpose of proving they exist. I understand this will be perceived by some of you as the spreading of FUD, but I'm okay with that. The part I'm interested in is creating awareness, just as I did earlier this year via an SSN write-up. This section is primarily for those who may be storing this information openly, whether they realized it prior to now or not.

With all that said, here's a tiny nibble of the type of query that needs to be formulated to start seeking out information as mentioned above (don't expect to find anything substantial with this query, but you will clearly get an idea for what's going on): taxreturn site:edu filetype:pdf

And with that, it's on to the third and final page where I discuss ideas of intellectual property being stolen, provide some solutions/preventive measures, and draw my final conclusion. Hang in there; it's almost over! ;)


Intellectual Property

If you're a student or faculty member, imagine working on an idea for weeks/months/years and storing all of that data in your university-designated directory... only to have someone roll by one day and steal it, thanks to Google having indexed all of your hard work within your wide-open directory. Wouldn't that be a disaster if you had no idea and/or had no protection (patent, copyright, etc.) put in place to safeguard your work?

It really doesn't take much for someone to find their way to your work if they're interested in the same thing you're researching/working on and they search for just the right keywords based on what Google has indexed. Who's to say some student at some other school just can't come up with the right search query to find information for a report, so they go messing about for others' research/reports to steal ideas from? Naturally, this type of thing is certainly not new, but as Google and search engines in general get more and more powerful, it only becomes that much more accessible.

Ultimately, this all just gets around to the simple notion of being cautious with what you place on your school's Web site -- especially if your school's directories are unprotected and wide-open to any Web surfer. You will have to be the one to decide if what you're working on is worth safeguarding or not, insofar as what I'm implying.

Solutions/Preventative Measures

For this section, I'm simply going to reiterate most of the last point I wrote in the previous post, since all of that information there absolutely applies here.

Student/Faculty Directories: As I've demonstrated throughout this post, it's all about student/faculty directories, what's stored in them, and whether or not they're wide-open to the public. Obviously, they're convenient; they serve a purpose; they're perceived to be safe; and it's clear they are well-utilized throughout the .edu world. Given everything I've covered, below are some solutions and preventative measures to take into consideration, whether you're a student/faculty member or an administrator.

To start, admins should disallow publicly-viewable student/faculty directories and require authentication for a student or faculty member to access shared drive data. This will prevent someone from being able to link directly to a file or image, as well as prevent people from directory-surfing. Additionally, you can make use of robots.txt to disallow Web spiders from crawling directories used for students/faculty. Just be aware that anyone can access your robots.txt file directly to see what kind of directories you're hiding from search engines, so take that into account. At the very least, a blank index.html file placed in the root of a directory will stop the server from listing the contents of that directory, though the files themselves remain reachable by anyone who already knows their URLs. While that solution isn't scalable whatsoever, it's worth the mention should you find it applicable.
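As a rough illustration of checking those last two measures, here is a small Python sketch using only the standard library. It assumes you have file system access to the Web root and that your server treats the names in INDEX_NAMES as index files (that list is my assumption; check your own server configuration). The function names are invented for this example.

```python
import os
from urllib.robotparser import RobotFileParser

# Assumption: these are the file names the web server treats as a directory index.
INDEX_NAMES = {"index.html", "index.htm", "index.php"}


def directories_without_index(root):
    """Return directories under root that contain no index file of their own."""
    exposed = []
    for current, _subdirs, files in os.walk(root):
        if not INDEX_NAMES.intersection(name.lower() for name in files):
            exposed.append(current)
    return sorted(exposed)


def is_crawlable(robots_txt, url, agent="*"):
    """Report whether the given robots.txt text lets a crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

A directory returned by directories_without_index would only be listed publicly if automatic listings are enabled on the server, so treat the output as a list of places to look rather than proof of exposure. Likewise, is_crawlable only tells you what well-behaved spiders are asked to do; robots.txt is not access control.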

As for students and faculty members, the much simpler message is to just be cautious of what you decide to store in your directories, because you just never know who's watching. Don't even rely on the fact that the general public can't see your directory if your school is currently set up as such, because things can change without you even being aware. I hope I have done a good enough job of making my case as to why you should be extremely cautious with what you choose to populate your school directory with. It would be bad enough to have your hard-earned work thieved, but the wrath you could face for storing things like applications, games, music, and other similar items may just be to the detriment of your longevity where you are!

Lastly, for any administrators or faculty interested in seeing what might be cached in Google, read this Google primer of mine and learn how to be proactive with seeking out loose ends to shore up. There are plenty of examples for you throughout this post, too! Simply replace the site:edu portion of a search query with your school of interest, e.g., site:harvard.edu.
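If you'd rather not retype each query by hand, the site:edu swap can be sketched in Python. The queries in the list are taken from this post; the function name scope_to_domain and the list name are my own inventions for illustration, and the script only builds query strings for you to paste into a search engine.

```python
import re

# Queries used earlier in this post, still scoped to every .edu domain.
ARTICLE_QUERIES = [
    "site:edu intitle:index.of avi 700M",
    "site:edu intitle:index.of MSDN | MSDNAA",
    "site:edu intitle:index.of crack",
    "site:edu intitle:index.of Games | Gamez",
    "site:edu intitle:index.of ebooks",
    "taxreturn site:edu filetype:pdf",
]


def scope_to_domain(query, domain):
    """Swap the broad site:edu operator for a single school's domain."""
    # The lookahead avoids touching an operator that is already more specific.
    return re.sub(r"\bsite:edu(?![\w.])", lambda _match: f"site:{domain}", query)


if __name__ == "__main__":
    for query in ARTICLE_QUERIES:
        print(scope_to_domain(query, "harvard.edu"))
```

Swap "harvard.edu" for your own school's domain and work through the resulting queries one at a time.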

Last Words

I would like to begin my conclusion by stating that I checked the validity of all of my links using the open source LinkChecker program. This allowed me to not download any files that might have otherwise had some kind of risk associated with them. After all, just because you download from HTTP doesn't mean there aren't logs with your IP address -- especially if a site monitors its traffic and keeps thorough records. Additionally, I sought the advice of legal on our end here to make sure it was fine to post links to directories in the manner that I did; so long as I didn't link directly to any given file, I've made it safe both for myself and for you, the reader!
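For the curious, the general idea of confirming a link is alive without pulling down the file behind it can be sketched with an HTTP HEAD request in Python. To be clear, this is not how I'm claiming LinkChecker works internally; it's just a minimal illustration, and the function names are made up for this example. The request still shows up in the server's logs like any other.

```python
import urllib.error
import urllib.request


def head_request(url):
    """Build a request that asks only for headers, never the file body."""
    return urllib.request.Request(url, method="HEAD")


def link_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    try:
        with urllib.request.urlopen(head_request(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a link that has died off
    except urllib.error.URLError:
        return None
```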

Just to create a topic of discussion for the comments section: theoretically speaking, do you think there should be a course of action taken against schools that have -- knowingly or not -- allowed students and/or faculty to store music, movies, software, passwords, software licenses, etc. in their own directories while having those directories open to the whole of the Internet -- effectively making them file sharers sharing potentially-pirated content? What about the students/faculty themselves? How do you feel they should be reprimanded -- if at all -- for storing contents within their directories that do not abide by school rules or standards (provided there are rules/standards that have been set along these lines in the first place)?

And with that, I will conclude my lengthy 2-post exposé on the wide array of content that resides on educational establishment Web sites. While this type of information may be old news to some, the need for additional awareness is made painfully evident through all that we are able to dig up via simple search engine queries I've outlined -- and then some. While I doubt the message my posts carry will last very long or have the full reach I would like for it to, I hope I've created enough awareness to at least make an impact -- even if it's just to educate/inspire people to be much more cautious about what they place on the Web.

In the future, I plan to bring all of this full-circle and show you the true depths of what you can find when utilizing the search techniques I've touched on throughout these posts, and more. I know it may seem as though there isn't a whole lot more to be found out there than the types of content I've outlined in these posts, but trust me; there is -- especially when you step outside the walls of .edu domains, which are only a drop in the bucket in comparison to all other domains combined!

Thanks for reading and please share this post with people who you feel could benefit from its contents.

-Stephen Chapman SEO Whistleblower

Topics: CXO, Browser, Enterprise Software, IT Employment, Piracy, Security, Software

About

Stephen is a freelance writer and blogger based in Charlotte, NC. His contributions to ZDNet cover topics related to security, gaming, Microsoft, Apple, and other topics of interest with a tech/SMB skew.
