An October 29 post on the Azure blog described Data Factory as "a managed service to compose data storage, processing, and movement services into managed data production pipelines." Testers can create new data factories and link them to various data and processing resources. They can obtain a visual layout of all their pipelines and data inputs/outputs through the Azure Preview Portal, as well as get a historical account of job execution, data production status and system health.
"The just-released public preview provides access to on-premises data in SQL Server and cloud data in Azure Blob, Table and Database services," officials said. Access to on-premises data is provided through a data management gateway that connects to on-premises SQL Server databases. Additional sources will be added based on customer feedback during the preview.
"The cloud ecosystem is made up of a bunch of workloads that need to integrate," noted Microsoft Azure Corporate Vice President Jason Zander. Using the Azure Data Factory, users will be able to get more insights by piping raw data, such as Internet of Things (IoT) log entries, into Microsoft's HDInsight Hadoop-on-Azure service or MapReduce and then integrate directly with other services like Azure Machine Learning.
"Data processing is enabled initially through Hive, Pig and custom C# activities. Such activities can be used to clean data, mask data fields, and transform data in a wide variety of complex ways. The Hive and Pig activities can be run on an HDInsight cluster you create or you can allow Data Factory to fully manage the Hadoop cluster lifecycle on your behalf. Author your activities, combine them into a pipeline, set an execution schedule and you’re done – no Hadoop cluster setup or management. Data Factory also provides an up-to-the-moment monitoring dashboard, which means you can deploy your data pipelines and immediately begin to view them as part of your monitoring dashboard."
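To make the "author your activities, combine them into a pipeline" idea concrete, here is a minimal Python sketch of the general pattern. It is not the Data Factory API, which in the preview is built around Hive, Pig and custom C# activities. The names `Pipeline`, `clean` and `mask_email`, along with the sample records, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not part of any Azure SDK.
Record = Dict[str, str]
Activity = Callable[[List[Record]], List[Record]]


def clean(records: List[Record]) -> List[Record]:
    """Drop records with no device id and strip whitespace from all values."""
    return [
        {key: value.strip() for key, value in record.items()}
        for record in records
        if record.get("device_id", "").strip()
    ]


def mask_email(records: List[Record]) -> List[Record]:
    """Hide the local part of each email address, keeping the domain."""
    masked_records = []
    for record in records:
        masked = dict(record)  # copy so the input records are not mutated
        email = masked.get("email", "")
        if "@" in email:
            _, domain = email.split("@", 1)
            masked["email"] = "***@" + domain
        masked_records.append(masked)
    return masked_records


@dataclass
class Pipeline:
    """An ordered list of activities, each fed the previous one's output."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def run(self, records: List[Record]) -> List[Record]:
        for activity in self.activities:
            records = activity(records)
        return records


# Assumed sample input resembling raw IoT log entries.
raw = [
    {"device_id": " sensor-1 ", "email": "alice@example.com"},
    {"device_id": "", "email": "bob@example.com"},
]

pipeline = Pipeline("iot-logs", [clean, mask_email])
result = pipeline.run(raw)
print(result)
```

In this sketch each activity is a plain function from a list of records to a list of records, so cleaning and masking steps can be reordered or swapped without touching the pipeline itself. Scheduling, cluster management and monitoring, which the quote says Data Factory handles, are deliberately left out.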
Microsoft already offers a number of Azure data and analytics services, including Azure SQL Database (managed relational database as a service); HDInsight (managed Hadoop clusters); cache; machine learning; Apache Storm analytics processing; DocumentDB (its recently introduced NoSQL document database as a service); and Azure Search (its new full-text search as a service).
In other Azure news, Microsoft officials also made available a preview of a new Stream Analytics service, which provides insights in real time from devices, sensors, infrastructure, applications and other data sources. And the company made generally available Azure Event Hubs, a publish-subscribe ingestor that allows users to process and analyze data from connected devices and sensors.