So long, Thomas.gov: Inside the retirement of a classic Web 1.0 application

We talk to the Library of Congress' Chief of Web Services about the challenges and technologies of Thomas.gov, the government's early and important online resource.


The Library of Congress in Washington. Image: iStock

When I first set foot inside the United States Library of Congress, sometime in the late 1970s, I was awestruck. Within the main buildings (and stored in a variety of secondary facilities) was the largest curated collection of books in the world. You could request almost any book, on almost any topic, and within a relatively short time, the knowledge of the universe would be placed in your hands.

Back in 1783, shortly after the 13 American colonies united into a nation, James Madison had an idea: perhaps it would be good for those serving in Congress to be able to find information as they were evaluating the choices and methods of governing the new nation. It took another 17 years for the idea to germinate, but in 1800, John Adams allocated about $5,000 for the purchase of roughly 750 books, which were then stored in an apartment in the new capital city of Washington, D.C.

Since that time, two fires destroyed the Library's collections, and each time, the Library came back stronger and with a more complete collection, until it became the largest collection of knowledge in the world. Today, of course, we simply yell at the slabs in our pocket and the knowledge of the universe (and an infinite supply of kitten videos) is delivered to us anywhere we happen to be.

The early days of Thomas

In 1995 -- when Newt Gingrich was Time's Man of the Year (seriously, he was), Seinfeld's "No Soup For You" was the hot catchphrase, and the Web began to take real form -- the Library of Congress set out to create a powerful new online resource: Thomas.gov (then located at thomas.loc.gov).


An early image of Thomas, captured by Library of Congress officials, from a newspaper print of the screenshot.

Newt Gingrich is particularly relevant to our story of Thomas, as he was a proponent of the service. Recall that Bill Clinton was president and a Gingrich-led "Contract with America" had catapulted Gingrich and his party to the first Republican majority in Congress in 40 years.

Back then, Gingrich was particularly visible in his rivalry with the Democratic-held White House, and this rivalry now extended to cyberspace (commonplace today, but more analogous to the space race back then).

In a January 1995 article entitled "Mr. Smith goes to cyberspace," The New York Times compared Thomas.gov to the White House's early online presence, saying, "Today, Mr. Gingrich proudly unveiled a new system that will allow Congress to match electronic wits."

Named after Thomas Jefferson, the U.S. president who sold his vast collection of books to the Library after the first fire, Thomas was a repository of congressional legislative information, including the full texts of bills and resolutions.

Interestingly, a strong criticism at the time of the plan to put congressional records online was the cost of access. Thomas.gov itself was going to be free, but the big complaint was that accessing the Internet required a computer costing $1,000 or more -- and Web access was far from the ubiquitous thing it is today.

In fact, the Web was so new that the Times article had to define it for readers:

The World Wide Web allows people to use a new generation of software to jump from site to site and subject to subject just by clicking on icons or words.

According to the Times article, Gingrich had a plan for the cost of access, a plan that today would seem to be ripped from a Bernie Sanders playbook, rather than from the first Republican Speaker of the House in decades:

"Maybe we need a tax credit for the poorest Americans to buy a laptop," he suggested. "Now, maybe that's wrong, maybe it's expensive, maybe we can't do it, but I'll tell you, any signal we can send to the poorest Americans that says, 'We're going into a 21st century, third-wave information age, and so are you, and we want to carry you with us,' begins to change the game."

Back then, another concern about putting Thomas.gov online was the slow speed of access. Remember, these were the days of landlines and 14.4 kbps modems. In fact, the entire Library of Congress web presence was fed with a 1.5 Mbps uplink. Most of us can upload data an order of magnitude faster from our iPhone and Android devices. The Times reports a Library of Congress official as stating, "You could take a shower while you wait for some of this to come in over a regular line."
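To put those numbers in perspective, here's a rough back-of-the-envelope sketch. The 14.4 kbps modem and 1.5 Mbps uplink figures come from the story above; the 100 KB page size is purely my own assumption for illustration, and the math ignores protocol overhead, compression, and line noise, so treat the results as ballpark figures only.

```python
# Rough transfer-time arithmetic for mid-1990s links.
# Link speeds are from the article; PAGE_BYTES is a hypothetical page size.

def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second

MODEM_BPS = 14_400        # 14.4 kbps dial-up modem
UPLINK_BPS = 1_500_000    # 1.5 Mbps Library of Congress uplink
PAGE_BYTES = 100_000      # assumed page weight (hypothetical)

modem_seconds = transfer_seconds(PAGE_BYTES, MODEM_BPS)
uplink_seconds = transfer_seconds(PAGE_BYTES, UPLINK_BPS)

# How many modems running flat out would it take to fill the uplink?
simultaneous_modems = UPLINK_BPS / MODEM_BPS

print(f"Over a modem:  {modem_seconds:.1f} seconds")
print(f"Over the uplink: {uplink_seconds:.2f} seconds")
print(f"Modems needed to saturate the uplink: {simultaneous_modems:.0f}")
```

Under those assumptions, a single page takes the better part of a minute to arrive over dial-up, and only about a hundred busy modem users could have saturated the Library's whole connection.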

Launching Thomas

Thomas got off to what back then seemed like a heck of a pace. More than 36,000 people accessed the system in the first nine days of activity. Today, that's normal traffic to a single article on any moderately popular tech site.

Interest in the service picked up quickly. Within 38 days of launch, Thomas had answered a million queries, a particularly impressive feat given the lackluster bandwidth and relatively low penetration of web-capable computers at the time. (Remember, even Windows 95 had yet to be released.)
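For a sense of scale, here's a quick sketch that turns those launch numbers into averages. The totals (36,000 people in nine days, a million queries in 38 days) are from the story; the assumption that traffic was spread evenly around the clock is mine, and real traffic would certainly have been burstier.

```python
# Average early Thomas traffic, assuming perfectly even load (an idealization).

VISITORS = 36_000          # people in the first nine days (from the article)
VISITOR_DAYS = 9
QUERIES = 1_000_000        # queries answered within 38 days (from the article)
QUERY_DAYS = 38
SECONDS_PER_DAY = 24 * 60 * 60

visitors_per_day = VISITORS / VISITOR_DAYS
queries_per_day = QUERIES / QUERY_DAYS
queries_per_second = queries_per_day / SECONDS_PER_DAY

print(f"Visitors per day: {visitors_per_day:.0f}")
print(f"Queries per day:  {queries_per_day:.0f}")
print(f"Queries per second: {queries_per_second:.2f}")
```

Averaged out, that is well under one query per second, which seems tiny now but was serious load for a single server behind a 1.5 Mbps line.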

Flyer describing Thomas

Note the explanation describing the World Wide Web.

The following flyer, used to describe Library of Congress services overall, specifically notes accessing LoC resources using Lynx and a VT100 terminal!

Accessing the Library of Congress

I actually had a VT100, and it was a heck of a terminal. Remember, these were the days of Windows and Windows for Workgroups (WFW) 3.1. Windows 95 had not yet been released.

The middle years

Today, we're sadly used to constant talk of hacking and penetration. But in the year 2000, hacking incidents weren't nearly as common -- and the hackers were still motivated more by amusement than data exfiltration, extortion, and profit. On January 17, 2000, Thomas was hacked by a group claiming to be "four hackers from a little country in Europe." The home page of Thomas was defaced for about an hour and a half.

Here's what Thomas looked like in 2004:


Thomas in 2004, before a "modernization" facelift.

After 10 years online, Thomas got a major look and feel upgrade. Announced in November 2005, Thomas updated its home page to "increase visual appeal." The site also got a left-of-page menu to provide better access to sections of the site. Thomas users could browse through legislation by the sponsor of the bill, plus the site added links to more government resources.

Thomas 2005 facelift

Thomas after upgrading in 2005.

2010, the site's 15th anniversary, marked the last year of substantive upgrades. In June, the site added bookmarking and sharing widgets for dynamic pages, more consistent headers, and improvements to searching. In August, working hard through the congressional recess, Thomas' developers extended support for mobile devices. Smartphones had not yet become ubiquitous, so the focus was on heavy BlackBerry support. Finally, by December, Thomas added support for additional indexing and searching tools related to congressional bills.

Thomas in 2011

Thomas, as it was shortly before migration began to Congress.gov.

Migrating to Congress.gov

By 2012, though, the writing was on the wall. Thomas had a "fragile infrastructure," according to Andrew Weber, Legislative Information Systems Manager at the Library. The move was afoot to a new site, Congress.gov, which would be more maintainable and use more modern technology.

Introducing Congress.gov

A first view of Congress.gov

Thomas served Congress and American citizens well for the past two decades, providing legislative information to anyone who had online access, but it reached the point where retirement became necessary. Its services have been migrated to a much more modern system available now through Congress.gov. Originally intended for retirement in 2014, Thomas got an additional few years of life, but attempts to access the site are now being redirected to Congress.gov.

Interview with LoC's Chief of Web Services

I recently had the opportunity to speak with Jim Karamanis, Chief of Web Services at the Library of Congress, about the IT considerations involved in managing this well-loved legacy system. I wanted to know the architecture of the old Thomas.gov system. Here's what he said:

Thomas is based on the no longer supported Inquiry search engine. It sits on a single large legacy web server with no redundancy and no ability to create permanent URLs. Simple maintenance of the legacy application required significant downtime to end users. The original code base has been deemed so unstable that we are no longer able to upgrade any functionality. The authors of the original Thomas code base were Federal FTEs [full-time equivalents, or employees] that are now retired.

Thomas supported millions of users on an annual basis. The user base has almost completely switched over to Congress.gov. The majority of usage of Thomas at this point is machines that are scraping the site for content. Now that GPO is making all of the data available in bulk XML, Thomas is no longer needed and thus obsolete.

Karamanis described how Congress.gov is picking up the torch:

Congress.gov is built based on modern open standards for code development and modern web delivery. There is virtualized redundancy at all tiers of the web application. Modern standards mean we can deliver an interface that is responsive and thus very mobile friendly. Faceted search has greatly improved the user experience. We continue to grow Congress.gov to meet the needs of the Congress and U.S. citizens.

You have to wonder what Thomas Jefferson would have made of the Internet, Thomas.gov, and Congress.gov. Considering how much of an innovator, man of curiosity, and scholar old TJ was, I think he'd have been very proud.

By the way, I'm doing more updates on Twitter and Facebook than ever before. Be sure to follow me on Twitter at @DavidGewirtz and on Facebook at Facebook.com/DavidGewirtz.
