Salesforce's AI research team has developed a new system that promises to let business users query databases without knowing the languages, such as SQL, typically used to do so.
The CRM giant's Seq2SQL system -- laid out in an academic paper [PDF] -- is a deep neural network that translates natural language questions into corresponding SQL queries. Users type in a question such as "which accounts have the lowest customer satisfaction score?", and the system queries the appropriate database and returns the results.
Salesforce said in its paper that its model is inspired by pointer networks, which generate output by selecting words from the input sequence rather than from a fixed vocabulary, as the attentional sequence-to-sequence model does.
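To make the pointer idea concrete, here is a toy Python sketch of a single decoding step that can only "point" at a position in the input. It is not Salesforce's model: the vectors are hand-picked rather than learned, the dot-product scoring is a simplification chosen for brevity, and the names `point`, `encoder_states`, and `decoder_state` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def point(decoder_state, encoder_states, input_tokens):
    """Pick an output token by pointing at one input position.

    The distribution is over input positions, not over a fixed
    vocabulary, so the output can only be a word from the input.
    """
    # Simplified scoring: dot product between the decoder state and
    # each encoder state (an assumption made for this sketch).
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return input_tokens[best], probs


# Hand-made toy values, not learned representations.
input_tokens = ["which", "accounts", "have", "the", "lowest", "score"]
encoder_states = [
    [0.1, 0.0], [0.9, 0.2], [0.0, 0.1],
    [0.0, 0.0], [0.2, 0.8], [0.3, 0.9],
]
decoder_state = [1.0, 0.0]

token, probs = point(decoder_state, encoder_states, input_tokens)
print(token)
```

Because the candidates are limited to what the user typed, a step like this cannot emit a word that never appeared in the input, which is the contrast with a fixed-vocabulary decoder.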
Salesforce said that applying reinforcement learning has allowed its model to generate more accurate results than attentional sequence-to-sequence models. According to its paper, Seq2SQL improved execution accuracy from 35.9 percent to 60.3 percent and logical form accuracy from 23.4 percent to 49.2 percent.
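The two metrics measure different things. As commonly understood, execution accuracy asks whether the generated query returns the same result as the reference query, while logical form accuracy asks whether the query text itself matches. The Python sketch below illustrates the difference on a made-up table, using the standard-library sqlite3 module; the table, the queries, and the helper names are hypothetical, and the normalisation is deliberately crude.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same rows (order ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to run counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows) == sorted(gold_rows)


def logical_form_match(predicted_sql, gold_sql):
    """True if the query strings match after crude normalisation."""
    def normalise(query):
        # Lowercase and collapse whitespace; a real comparison would
        # need to treat string literals more carefully.
        return " ".join(query.lower().split())
    return normalise(predicted_sql) == normalise(gold_sql)


# Made-up demo data, not taken from WikiSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, satisfaction INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("Acme", 3), ("Globex", 7), ("Initech", 3)],
)

gold = "SELECT name FROM accounts WHERE satisfaction = 3"
predicted = "SELECT name FROM accounts WHERE satisfaction < 4"

print(execution_match(conn, predicted, gold))   # same rows returned
print(logical_form_match(predicted, gold))      # different query text
```

Here the predicted query is written differently but happens to return the same rows, so it counts as correct on execution and incorrect on logical form. That is one reason the execution figure can be the higher of the two.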
Salesforce said its model does not require access to the table content during inference.
The company has also announced the launch of WikiSQL, an open-source dataset of more than 87,000 natural language questions, SQL queries, and SQL tables drawn from 26,000-plus HTML tables from Wikipedia.
The HTML tables extracted from Wikipedia became the basis for randomly generated SQL queries. The queries were used to form questions, which were then handed off to workers on Amazon Mechanical Turk for paraphrasing. Two other workers were asked to verify that each paraphrase had the same meaning as the generated question, Salesforce explained in its paper.
The CRM giant is not the first company attempting to dumb down database querying; Tableau subsidiary ClearGraph's technology is similarly designed to make it easy for users to access and analyse data without any technical training.
ClearGraph stores semantic data in knowledge graphs, which can expand and learn over time. For example, a user can ask for "total sales by customers who purchased staples in New York", then filter the results to "orders in the last 30 days", then group the results by "project owner's department".
Quepy, a Python framework, also transforms natural language questions into semantic database queries that can be used with databases such as DBpedia. Quepy currently provides support for SPARQL and MQL query languages.
Earlier this year, Austin, Texas-based startup Pilosa announced the launch of the community edition of its "distributed bitmap index" aimed at dramatically improving querying speeds on datasets greater than 1TB without purchasing additional hardware.
Typically, databases have two components: storage and retrieval. What Pilosa has done is "liberate" the index -- which is used to run queries on datasets -- from the storage, creating a new type of bitmap index that runs in-memory rather than on-disk.
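For readers unfamiliar with the term, a bitmap index keeps one bit per row for each attribute value, so a query becomes a bitwise operation held in memory rather than a scan of stored records. The Python sketch below shows the general idea only. It is not Pilosa's implementation or API; the `BitmapIndex` class and the sample orders are invented for illustration, and Python integers stand in for the bitmaps.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy in-memory bitmap index: one integer bitset per (field, value)."""

    def __init__(self):
        self._bitmaps = defaultdict(int)

    def add(self, row_id, field, value):
        # Set bit number row_id in the bitmap for this field/value pair.
        self._bitmaps[(field, value)] |= 1 << row_id

    def bitmap(self, field, value):
        # Unknown pairs yield an empty bitmap.
        return self._bitmaps.get((field, value), 0)

    @staticmethod
    def row_ids(bitmap):
        """List the positions of the set bits, lowest first."""
        ids = []
        position = 0
        while bitmap:
            if bitmap & 1:
                ids.append(position)
            bitmap >>= 1
            position += 1
        return ids


# Invented sample rows; the stored records stay separate from the index.
orders = [
    {"city": "New York", "product": "staples"},
    {"city": "Austin", "product": "staples"},
    {"city": "New York", "product": "paper"},
    {"city": "New York", "product": "staples"},
]

index = BitmapIndex()
for row_id, order in enumerate(orders):
    for field, value in order.items():
        index.add(row_id, field, value)

# "staples purchased in New York" is a single bitwise AND.
matches = index.bitmap("city", "New York") & index.bitmap("product", "staples")
matching_rows = BitmapIndex.row_ids(matches)
print(matching_rows)
```

The query touches only the index, and the matching row numbers can then be used to fetch the full records from wherever they are stored.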