With its growing emphasis on all things AI -- coupled with its history as a tool vendor -- it's not surprising that Microsoft is working on tools not just for traditional programmers, but also data scientists.
According to a Microsoft Research presentation from earlier this year, data scientists currently spend 80 percent of their time extracting and cleaning data -- AKA "data wrangling." Microsoft wants to fix this.
Enter "Project Pendleton."
A year ago, I first heard from a contact of mine about a new machine-learning-related tool under development by Microsoft that was codenamed "Pendleton." But it wasn't until The Walking Cat (@h0x0d on Twitter) unearthed more details and documents that I had enough information to write about Pendleton.
"Pendleton provides a set of flexible and scalable tools to help you explore, discover, understand and fix problems in your data. It allows you to consume data in many forms and to transform that data into new forms that are better suited for your usage."
Pendleton is a client app that works on Windows and OS X/macOS. Its design runtime uses Python and depends on various Python libraries.
As one of my contacts described it, Pendleton is a tool aimed at data scientists that is designed for data preparation and cleaning. The tool can do things like remove errant columns, change formatting in columns, handle missing data and the like. It also includes analytics tools to help data scientists figure out what's included in a dataset. Pendleton can read data from SQL Server, Azure Blobs, and Data Lakes. It can also read files stored locally on a PC, my contact said.
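For readers who haven't done this kind of work, here is a rough idea of what those cleaning steps look like when written by hand. To be clear, this is not Pendleton's API, which I haven't seen. It is a minimal sketch in plain Python, and the sample data, the column names and the `wrangle` function are all made up for illustration. It drops an errant column, tidies the formatting in two columns and fills a missing value with the column average.

```python
import csv
import io

# Hypothetical sample data: stray whitespace, mixed date separators,
# a missing score, and an "unused" column nobody wants.
RAW = """id,name,signup_date,score,unused
1, Alice ,2017-01-05,90,x
2,BOB,2017/02/10,,x
3,carol,2017-03-15,70,x
"""

def wrangle(text):
    """Return cleaned rows as a list of dicts (illustrative only)."""
    rows = list(csv.DictReader(io.StringIO(text)))

    for row in rows:
        # Remove the errant column.
        row.pop("unused", None)
        # Change formatting: trim and normalize names, unify date separators.
        row["name"] = row["name"].strip().title()
        row["signup_date"] = row["signup_date"].replace("/", "-")

    # Handle missing data: replace empty scores with the mean of known scores.
    known = [float(r["score"]) for r in rows if r["score"]]
    mean = sum(known) / len(known)
    for row in rows:
        row["score"] = float(row["score"]) if row["score"] else mean

    return rows

cleaned = wrangle(RAW)
for row in cleaned:
    print(row)
```

Even in a toy case like this, most of the code is housekeeping rather than analysis, which is the chore a tool like Pendleton is apparently meant to take off data scientists' hands.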
Microsoft has been privately testing Pendleton for nearly a year, maybe longer. I haven't heard how the company plans to release the tool, but releasing it still seems to be the plan.