Apache Spark 2.1 for Machine Learning and Data Science

This 3-day training course for data scientists and analysts teaches you how to harness Apache Spark 2.1 for large-scale data analysis, predictive modeling, and machine learning tasks. You will learn to program Spark efficiently and effectively by targeting the latest version of the platform and learning the modern approach needed to take full advantage of it.

The entire course is taught hands-on, using real code and interactive examples. In addition, longer labs let attendees work together and apply their growing Spark knowledge to common challenges faced by organizations running complex Big Data applications in production.

Use Spark to Solve Real-World Challenges

Both lectures and lab activities use real-world datasets, so that you can practice getting Apache Spark to work well in spite of real-world challenges. You’ll also gain hands-on experience with performance tuning and troubleshooting.

Built Entirely for Apache Spark 2

Apache Spark 2 brings a suite of new features and speed improvements, but it also works differently under the hood and requires a slightly different approach to programming in order to get the most out of it.

This course focuses entirely on Spark 2 and will teach you how to program for the latest version of Spark (currently Spark 2.1) in the most performant, most effective, and easiest way possible.

Duration: 3 Days


  • Program Apache Spark in the most performant, easy, modern, and effective ways possible to perform data wrangling, feature selection, model building, validation, tuning, and serving, and extend Spark ML with your own feature processing tools and new parallel ML algorithms.
  • Learn exactly what Apache Spark's strengths and limitations are, so you can get the most out of Spark together with other tools.
  • Learn how Apache Spark processes your jobs so that you can troubleshoot, analyze, and improve performance if they don't run well.
  • See important patterns, tricks, tips, and gotchas so that you don't have to learn them the hard way.


Data scientists or analysts involved in predictive modeling, who want to explore machine learning where data is too large for single-machine tools.
