
Hands-on with Azure Data Lake: How to get productive fast

  • A data lake, in the cloud

    A data lake, in the cloud

    Azure Data Lake is Microsoft's cloud-based mashup of Apache Hadoop, Azure Storage, SQL and .NET/C#. It gives developers an extensible SQL syntax for querying huge data sets stored in files of various formats, and it's accessible from the browser, from Visual Studio and even from BI tools.

    In this gallery, I'll show you the ins and outs of the service's two main components: Azure Data Lake Store (ADLS) and Azure Data Lake Analytics (ADLA). I'll also introduce you to the provisioning process; the relevant tooling; the SQL dialect and its .NET integration; and how data in Azure Data Lake can be queried from external tools like Power BI.

    Published: January 11, 2017 -- 18:17 GMT (10:17 PST)

    Photo by: iStock

    Caption by: Andrew Brust

  • Pick your service

    Pick your service

Getting started with Azure Data Lake is pretty easy. Just hit the "+" button in the upper-left corner, choose "Intelligence + analytics," scroll down, and you'll see both relevant services: Data Lake Store and Data Lake Analytics. Set up Data Lake Store first.


  • Open the store

    Open the store

    Setting up Azure Data Lake Store (ADLS) is also pretty easy. Enter a name for the account (it will have to be unique across all ADLS accounts), then the name of a resource group that will contain it. That's it! Now you can check "Pin to dashboard" and click the blue Create button.


  • Deploying

    Deploying

    If you checked "Pin to dashboard" then you should see a tile added to your dashboard indicating that your ADLS account is being deployed, as shown here.


  • Deployment complete

    Deployment complete

    Deployment won't take long. Once complete, you'll see this "blade" in the Azure portal. Note the button along the top that opens a file explorer-like interface for ADLS. This will be useful once you have some data in there.


  • Analytics provisioning

    Analytics provisioning

    With your Azure Data Lake Store account now set up, you can go back to the step we showed in slide 2 ("Pick Your Service") and choose Azure Data Lake Analytics (ADLA).

    Once you do, you'll see this provisioning form. As before, you'll need to supply a name and resource group name, but you'll also need to click on the Data Lake Store option and pick the ADLS account that you just provisioned.

    In other words, you'll do the analysis work with ADLA, and it will retrieve data from, and save data to, the ADLS account.

    As before, check "Pin to dashboard" and click the blue Create button to provision an ADLA tenant.


  • Get ready to analyze

    Get ready to analyze

Once provisioning is complete, this is the screen you'll see. Note the ability to add additional users (and grant them access to your data). The "Sample Scripts" button at the top can get you querying sample data in almost no time flat. Click it to start.


  • So close yet so far

    So close yet so far

    Now you'll see options to query a tab-separated values (TSV) file, create a database and table, populate a table and query a table. But you can't do any of that until you click through the two items above those options, which will load the sample data and the U-SQL extensions (more on U-SQL shortly) into your ADLA tenant. So click each option and let both processes complete.


  • Query a file directly

    Query a file directly

Now you can click the "Query a TSV file" option. Once you do, a full-fledged U-SQL script will come up.

    U-SQL is a dialect of the popular Structured Query Language that you can use with ADLA to query data in ADLS. This particular U-SQL script pulls data out of a TSV file using U-SQL's EXTRACT command, and it pushes the result of the query into another TSV file using the OUTPUT command.

Notice that the U-SQL editor, even though it's embedded in the Azure portal, features syntax highlighting. There's even better editor support in Visual Studio, which we'll see shortly. But this isn't bad for the browser!

    Clicking "Submit Job" (highlighted) at the top will execute the query. We'll explore that in a bit.
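The shape of that script is roughly as follows (a sketch only; the file paths and column names here are illustrative, not the exact sample code):

```sql
// Read a TSV file into a rowset with the built-in Tsv() extractor.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Write the rowset back out to another TSV file.
OUTPUT @searchlog
    TO "/output/SearchLog_output.tsv"
    USING Outputters.Tsv();
```

Note that while the keywords are SQL-style, the types and expressions follow C# rules.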


  • Create database and table

    Create database and table

    Some users may find the familiar concept of tables more comfortable than keeping everything in files.

    So rather than working exclusively with text files for input and output, this U-SQL script creates an ADLA database, called SampleDBTutorials, and an empty table within it, called SearchLog, into which data can be inserted later.
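In sketch form, the pattern looks like this (the schema shown is illustrative; note that U-SQL tables also require a clustered index and a distribution scheme):

```sql
CREATE DATABASE IF NOT EXISTS SampleDBTutorials;
USE DATABASE SampleDBTutorials;

CREATE TABLE IF NOT EXISTS SearchLog
(
    UserId int,
    Start DateTime,
    Region string,
    Query string,
    INDEX sl_idx CLUSTERED (UserId ASC)
    DISTRIBUTED BY HASH (UserId)
);
```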


  • Populate data

    Populate data

    This U-SQL script queries data out of a TSV file with EXTRACT and then uses INSERT INTO to populate the SearchLog table with its contents.
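A sketch of that pattern (paths and columns again illustrative):

```sql
// Pull rows out of the TSV file...
@data =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// ...then append them to the table, referenced by its three-part name.
INSERT INTO SampleDBTutorials.dbo.SearchLog
SELECT * FROM @data;
```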


  • Query the database

    Query the database

    Now that the table is populated, this U-SQL script uses the standard SELECT command to query it. No more using EXTRACT and flat files! The result set gets persisted into a variable called @athletes.

Next, the OUTPUT command is used to export the @athletes result set, and then the entire SearchLog table, to files.
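Sketched out, the script follows this pattern (the filter condition and output paths are illustrative):

```sql
// Query the table like any SQL table; no EXTRACT needed.
@athletes =
    SELECT Region, Query
    FROM SampleDBTutorials.dbo.SearchLog
    WHERE Query == "athletics";  // comparisons use C# operators

OUTPUT @athletes
    TO "/output/Athletes.tsv"
    USING Outputters.Tsv();

// Exporting the whole table works the same way, via a rowset.
@log = SELECT * FROM SampleDBTutorials.dbo.SearchLog;

OUTPUT @log
    TO "/output/SearchLog_table.tsv"
    USING Outputters.Tsv();
```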


  • 3...2...run!

    3...2...run!

    Now go back to the first script (Query a TSV file) and click Submit to run the job.

    The job will go through a preparation phase, then a queuing phase, shown here. The work to be done in the job is represented by the tasks in the job graph, to the right. Bear in mind, these graphs can get much more complex than this one.

    Running and Finalizing phases follow, after which the output of the job will exist as a file called SearchLog_output.tsv.


  • Preview the data

    Preview the data

    When the job is finished, click on the shape representing the SearchLog_output.tsv file (at the bottom of the job graph). This will let you preview its contents right in the Azure portal, as shown here.


  • Getting visual

    Getting visual

As slick as it is to be able to do all that work in the browser, many pros will prefer a full-fledged integrated development environment.

    To that end, Microsoft has created Azure Data Lake Tools for Visual Studio, a free download. They also have a great sample Azure Data Lake project for Visual Studio that you can download from GitHub. (And if you don't have Visual Studio, the full-featured Community Edition is a free download too.)

    Do all that, bring up the project, log in to your ADLA tenant, open up the first script and click its Submit button. The job will run right in Visual Studio, as shown here.

When job execution is complete, right-click the job graph shape for the output file, and you'll see options to preview and download the data, and more.


  • Show me the files

    Show me the files

If you choose the "Open Folder" option from the menu mentioned in the previous slide, a full-fledged ADLS file explorer comes up right inside Visual Studio, as shown here.


  • What's inside

    What's inside

    Drill down to the /Output/TweetAnalysis folder, then open MyTwitterAnalysis1.tsv, the output file from the job.

The file will come up in the viewer shown here, which displays the file's metadata and several rows of its data in a tabular view, and provides options to save the data locally in CSV format or preview it in Excel.


  • Run local

    Run local

    Visual Studio's U-SQL script editor has a drop-down at the top that lets you select "(Local)" as your ADLA tenant. Pick that and you can run jobs locally, without incurring any cloud usage fees.

    Shown here is an execution of the same script as in slide 15 ("Getting visual"), but locally this time. Note that when execution is complete, you can hover over the job graph shape for the output file to see its location on your local disk.


  • Find it, open it

    Find it, open it

Go to the AppData\Local\USQLDataRoot folder in your user folder to get to the local-disk equivalent of the ADLS root folder in the cloud. Then drill down to the Output\TweetAnalysis sub-folder within it and you'll find the output file, as shown here. Double-click the file to open it in the default editor for TSV files (Excel 2016 in the case of this screenshot).


  • Nothing but .NET

    Nothing but .NET

    Doing Azure Data Lake work in Visual Studio is about more than using a desktop app. That's because U-SQL queries can call functions written in C#, either in code-behind files or in separate .NET assembly projects.

    The source code for a C# function called get_mentions is shown here. Look at the Solution Explorer window at the upper-right and you'll see that the class file containing this code is in its own C# project, separate from the Azure Data Lake project, but in the same Visual Studio solution.

    C# expressions can also be used inline, in U-SQL scripts. This includes scripts built in the Azure portal. The C# code-behind and assembly techniques require Visual Studio though.
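As a sketch of the inline technique (the column names here are illustrative), any C# expression can appear directly in the SELECT list:

```sql
@rs =
    SELECT Region.ToUpper() AS RegionUpper,
           (Query.Length > 10 ? "long" : "short") AS QuerySize
    FROM SampleDBTutorials.dbo.SearchLog;
```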


  • Polyglot

    Polyglot

    This U-SQL script uses the get_mentions function in a SELECT query.

    Note, at the top of the script, the REFERENCE ASSEMBLY command used to link to the .NET assembly created by compiling the C# project shown in the previous slide.

    The CREATE ASSEMBLY command must be run once before REFERENCE ASSEMBLY can be used. The code for the former is also shown, though it is commented out.
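The combination looks roughly like this (the assembly, namespace, and rowset names here are illustrative, not the exact sample's):

```sql
// One-time registration of the compiled DLL in the database:
// CREATE ASSEMBLY IF NOT EXISTS TweetAnalysis FROM "/Assemblies/TweetAnalysis.dll";

// In any later script, link the registered assembly...
REFERENCE ASSEMBLY TweetAnalysis;

// ...and call its C# functions from U-SQL.
@mentions =
    SELECT TweetAnalysis.Udfs.get_mentions(tweet) AS mentions
    FROM @tweets;  // @tweets assumed extracted earlier in the script
```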


  • Menu, sir?

    Menu, sir?

    Look carefully at Visual Studio's (admittedly crowded) main menu. If you're in an Azure Data Lake project, you'll see a "Data Lake" item, approximately 7th or 8th from the left. Click it and you'll see the options shown here.


  • You've got the Power (BI)

    You've got the Power (BI)

    Did you know that Microsoft's Power BI can connect directly to data stored in ADLS? It's true, and shown here.


  • Let's get visual

    Let's get visual

You wouldn't connect to that data unless you wanted to visualize it, right? Shown here are several visualizations against the influencer.csv output file from the 8-TweetAnalysis-WindowingExpr.usql script in the Visual Studio sample project. This is the finish line!

    You've now gone end-to-end with Azure Data Lake: provisioning ADLS and ADLA, using the Azure portal's U-SQL tooling, working with ADL Tools for Visual Studio in both U-SQL and C#, and even integrating the data into Power BI.

This experience will all be very intuitive to .NET and SQL Server developers. And the fact that these services run Hadoop and YARN under the hood? It's just an implementation detail now.


By Andrew Brust for Big on Data | January 11, 2017 -- 18:17 GMT (10:17 PST)


Microsoft's Azure Data Lake is now generally available, but what does it do, and how does it work? Here's a tour around the service's tooling and capabilities, to help you understand it and get productive with it, quickly.

