
Part of a ZDNET Special Feature: Coronavirus: Business and technology in a pandemic


Verizon introduces open-source, big data coronavirus search engine

So much sickness, so much data, so little time. To help make sense of coronavirus research, Verizon Media has built CORD-19 Search on top of Vespa, its open-source big data search engine.

Written by Steven Vaughan-Nichols, Senior Contributing Editor | April 13, 2020 at 5:39 a.m. PT

As we struggle to get a grip on exactly how COVID-19 makes us ill and what we can do about it, researchers have published over 50,000 articles. That's a lot of information! So, how do you make sense of it all? Verizon Media is doing it with Vespa, an open-source, big data processing and search program, which it has used to create a coronavirus academic research search engine: CORD-19 Search.


This engine works on top of the COVID-19 Open Research Dataset (CORD-19). This dataset should help medical researchers find and create new insights in the fight against SARS-CoV-2. The documents within it are updated weekly as new research appears in peer-reviewed publications and in archival services such as bioRxiv (biological sciences preprints) and medRxiv (health sciences preprints). It also includes document links to PubMed, Microsoft Academic, and the WHO COVID-19 database of publications.

What sets it apart from other search engines is that it combines several methods to find the best answers. Vespa combines text and structured search with exploration by semantic similarity, using the scibert-nli model. This is a pre-trained language model for scientific text, which lets the engine match documents by meaning rather than by exact wording alone.
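To make the idea of combining text search with semantic similarity concrete, here is a minimal Python sketch. It is not Vespa's ranking code: the function names (`keyword_score`, `cosine`, `hybrid_rank`), the `alpha` weight, and the tiny two-dimensional "embeddings" are all hypothetical stand-ins. In a real system, the vectors would come from a model such as scibert-nli.

```python
import math


def keyword_score(query, text):
    """Toy lexical score: fraction of query terms that occur in the text."""
    q_terms = query.lower().split()
    doc_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return sum(1 for term in q_terms if term in doc_terms) / len(q_terms)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank docs by a weighted mix of lexical and semantic scores.

    docs is a list of (title, text, embedding) tuples. alpha is an
    assumed weight: 1.0 means keywords only, 0.0 means embeddings only.
    """
    scored = []
    for title, text, vec in docs:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec)
        scored.append((title, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Made-up documents and vectors, purely for illustration.
docs = [
    ("A", "coronavirus incubation period estimates", [1.0, 0.0]),
    ("B", "unrelated topic entirely", [0.0, 1.0]),
]
ranking = hybrid_rank("incubation period", [1.0, 0.0], docs)
```

With these made-up inputs, document A ranks first because it matches both the query terms and the query vector. A production engine would use a far more sophisticated text score and learned embeddings.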

Usually, Verizon uses Vespa for applications such as article recommendations, user personalization, and ad targeting. Now, by indexing COVID-19 research articles, it makes searching the flood of coronavirus literature much easier for researchers.

More technically advanced researchers can access the data via the CORD-19 application programming interface (API). If you want, you can even download the code and run the application on your own server.
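For readers who want a feel for programmatic access, the following Python sketch builds a query URL in the general style of Vespa's HTTP query API (a `/search/` path with `yql`, `query`, and `hits` parameters). The host name `cord19.example.org` is a placeholder and the helper `build_query_url` is hypothetical. Check the project's own API documentation for the real endpoint and supported parameters before relying on any of this.

```python
from urllib.parse import urlencode


def build_query_url(base_url, text, hits=5):
    """Build a search URL for a Vespa-style query endpoint.

    base_url is a placeholder for wherever the application is served;
    the parameter set here is an assumption, not the project's documented API.
    """
    params = {
        "yql": "select * from sources * where userQuery();",
        "query": text,  # free-text user query
        "hits": hits,   # number of results to request
    }
    return f"{base_url.rstrip('/')}/search/?{urlencode(params)}"


# Placeholder host, for illustration only.
url = build_query_url("https://cord19.example.org/", "incubation period", hits=3)
```

The resulting URL could then be fetched with any HTTP client. The sketch stops short of making a request, because the live endpoint and its response format are not described here.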

This is very much a work in progress. You can expect daily updates to the documentation and query features. Verizon welcomes help with both the code and the data; check out its contributing guide to see how you can pitch in. You can also reach the project's developers on Twitter at @vespaengine.
