The combination of big data and (relatively) cheap cloud storage has led to an explosion in efforts to make stored files findable. Google has Cloud Search, Elastic recently scored a $4B IPO, and other startups like Coveo and butter.ai are targeting search as well.
Cloudtenna is another startup in the space that is announcing an expansion of their cloud search engine, called DirectSearch. As I wrote three years ago about the company:
Cloudtenna's software scoops up your file server's metadata - not the files - and stores it in the cloud. Users can search for files from anywhere, on any device, and when they find the ones they want, they download them from the company's file servers over a secure SSL connection.
The new product adds a machine learning platform that finds files across disparate platforms, including Dropbox, Box, Microsoft OneDrive, Google Drive, Outlook, Gmail, Slack, Atlassian JIRA and Confluence, and local file servers. You can search on name, sender, date, file type, keyword, content, and other attributes regardless of where the file is located.
That's a lot, but it's not the hard part. Nor is respecting file permissions, meaning that users can't access files they aren't supposed to. The hard part is doing all this while delivering sub-second response times, even when thousands of users are searching across billions of files stored on dozens of repositories.
The secret sauce
Cloudtenna doesn't go into detail about how they achieve these results, except to say:
It uses real-time binding to build its file index and then performs consistency checks to capture deltas, such as a security change or a deleted file. File deduplication and ACL crunching reduces data required by the index, significantly reducing storage costs and requirements.
They've focused on making their crawler lightweight, so they can keep the index updated at low overhead.
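Since Cloudtenna doesn't publish its design, the following is only a rough sketch of the general ideas named above, not the company's implementation. It assumes a metadata-only index keyed by path, a consistency check that diffs the index against a fresh crawl to find deltas, permission filtering at query time, and deduplication by content hash. Every name in it (FileMeta, diff_snapshots, search, the sample paths and users) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Metadata only; the file contents never enter the index."""
    path: str
    content_hash: str          # assumed fingerprint used to spot duplicates
    allowed_users: frozenset   # assumed, simplified stand-in for an ACL


def diff_snapshots(indexed, crawled):
    """Consistency check: compare the index with a fresh crawl.

    Returns (added, deleted, changed) path lists. A security change shows
    up as 'changed' because the ACL is part of the metadata record.
    """
    added = sorted(p for p in crawled if p not in indexed)
    deleted = sorted(p for p in indexed if p not in crawled)
    changed = sorted(
        p for p in crawled if p in indexed and crawled[p] != indexed[p]
    )
    return added, deleted, changed


def search(index, user, term):
    """Return paths matching term that user may see, one per unique file."""
    seen_hashes = set()
    results = []
    for path in sorted(index):
        meta = index[path]
        if user not in meta.allowed_users:
            continue  # respect permissions before anything else
        if term.lower() not in path.lower():
            continue
        if meta.content_hash in seen_hashes:
            continue  # same content already listed from another repository
        seen_hashes.add(meta.content_hash)
        results.append(path)
    return results


# Hypothetical sample data: the index's view versus a fresh crawl.
indexed = {
    "box/report.doc": FileMeta("box/report.doc", "h1", frozenset({"ann", "bob"})),
    "drive/old.txt": FileMeta("drive/old.txt", "h2", frozenset({"ann"})),
}
crawled = {
    "box/report.doc": FileMeta("box/report.doc", "h1", frozenset({"ann"})),
    "dropbox/report-copy.doc": FileMeta("dropbox/report-copy.doc", "h1", frozenset({"ann"})),
}

added, deleted, changed = diff_snapshots(indexed, crawled)
print(added, deleted, changed)
print(search(crawled, "ann", "report"))
print(search(crawled, "bob", "report"))
```

In this toy version the diff walks every record, which would not scale to billions of files. A real system would need incremental change feeds or similar, which is presumably where the lightweight crawler comes in.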
It seems everyone in tech is smartwashing their products by claiming AI and/or machine learning features, but in Cloudtenna's case it appears they are actually incorporating ML into their product. The difficult part of ML is training the model, which takes a large corpus of data.
Cloudtenna today announced its new OEM program, which allows partners to easily embed Cloudtenna into their existing platforms. Through working with large OEM partners, Cloudtenna will be able to quickly gain access to a massive amount of metadata to help train the model they use, so that ML will be able to rank and present the search results that are most useful to the user.
Cloudtenna offers a free account for three months - with no credit card - so you can try it out, risk free. They don't currently handle one of my clouds (I currently use about a half dozen), but I'm trying it out anyway to see how it works, hoping they'll add it shortly.
The Storage Bits take
Just as Google search revolutionized our web access, universal search will revolutionize our file access. If it works as advertised, Cloudtenna's search will enable us to forget where our files are, and focus instead on using them to get work done.
This is part of the larger transition to a data-centric world. We are producing more data every day, and finding more uses for that data. Just as web search makes the massive web accessible, cross-cloud file search makes our massive data stores accessible - and usable - as well.
Multi-repository search will remain a hot area for years to come, as cloud storage continues to build out, and we develop more ways to harness the data we collect. I'm excited to see what developers invent to meet the need over the next decade.
Courteous comments welcome, of course. Updated Wednesday, November 13, 2018 to reflect the fact that Cloudtenna, whose current products have millions of users, will be asking its OEMs for metadata access for ML training purposes.