Tutorial: Modern Spark DataFrame and Dataset


Apache Spark has changed dramatically in the past year, from new APIs in Spark 1.4 to major execution improvements and even better APIs in 2.0. In this intermediate-level tutorial, I'll address the question of which Spark APIs to use through a series of brief technical explanations and demos that highlight best practices, the latest APIs, and new features.

We'll see how Dataset and DataFrame behave in Spark 2.0, examine Whole-Stage Code Generation, and walk through a simple example of Spark 2.0 Structured Streaming (streaming with DataFrames) that you can run in your own free instance of Databricks.

Follow Along: You can run all the examples in this tutorial yourself. Just register for a free instance of Databricks Community Edition, and import this notebook.


