Microsoft expands Azure Data Lake with new big data tools

Ngoc Huynh

A new, dynamically scalable analytics service is built on Apache YARN

Microsoft had its sights set squarely on big data when it introduced its Azure Data Lake earlier this year, and on Monday it broadened that effort with new tools designed to make big data processing and analytics simpler and more accessible.

First, the service Microsoft originally called Azure Data Lake has been renamed Azure Data Lake Store. It offers a single repository for data of any size and type (unstructured, semi-structured and structured) without requiring application changes as data scales.

Data can be securely shared there and made accessible for processing and analytics. It can be acquired in real time from sensors and devices for Internet of Things (IoT) applications, for example, or from online shopping websites, all without restrictions on account or file size.

Available in preview later this year, the store is compatible with the Hadoop Distributed File System (HDFS), so Hadoop distributions such as Hortonworks, MapR and Cloudera can readily access the data for processing and analytics, Microsoft said.

Second, Azure Data Lake Analytics complements the storage portion of Azure Data Lake with a new, dynamically scalable analytics service built on Apache YARN. It, too, will be available in preview later this year.

The new analytics service includes the U-SQL query language, whose scalable and distributed query capability allows users to efficiently analyze data in the Azure Data Lake Store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse, Microsoft said.

Finally, Microsoft’s Azure HDInsight is now included in Azure Data Lake as well, offering a fully managed Apache Hadoop cluster service with open-source analytics engines including Hive, Spark, HBase and Storm. As of Monday, managed clusters on Linux are generally available with a service-level agreement (SLA) specifying 99.9 percent uptime.
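
To put the 99.9 percent figure in concrete terms, the short Python sketch below converts an uptime percentage into the downtime it permits. It assumes the SLA is measured over a 30-day month, which the announcement does not specify, and the function name is purely illustrative.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the maximum downtime, in minutes, permitted by an uptime SLA."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Assumption: a 30-day measurement window; the article does not state one.
print(round(downtime_budget_minutes(99.9, 30), 1))   # 43.2 minutes per 30 days
print(round(downtime_budget_minutes(99.9, 365), 1))  # 525.6 minutes per year
```

Under that assumption, a 99.9 percent SLA leaves room for roughly 43 minutes of downtime a month.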

Two other offerings also support Azure Data Lake, Microsoft said. Azure Data Lake Tools for Visual Studio provide an integrated development environment that spans the Azure Data Lake, and leading Hadoop applications from independent software vendors cover security, governance, data preparation and analytics.

Pricing details were not immediately available.

Source: http://www.pcworld.com/