Using Apache Spark 2.0 to Analyze the City of San Francisco's Open Data

In this hands-on tutorial, presented at Code for San Francisco, we'll look at how to use Apache Spark to analyze datasets published by the City of San Francisco through SF Open Data.

The workshop will focus on how to use Spark SQL and DataFrames to derive insights and build visualizations from fire service calls made to the San Francisco Fire Department on July 4th of this year. The demos and labs are aimed at an audience with some general programming or SQL query experience, but little to no experience with Spark.
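To give a flavor of the DataFrame API used in the labs, here is a minimal PySpark sketch. The file path, the date string, and the column names (`Call Date`, `Call Type`) are assumptions based on the public fire-calls dataset, not the exact lab code:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Spark 2.0 entry point (on Databricks a `spark` session is already provided).
    spark = SparkSession.builder.appName("sf-fire-calls").getOrCreate()

    # Hypothetical path; the labs load the fire-calls CSV from Databricks storage.
    fire_df = (spark.read
               .option("header", "true")
               .option("inferSchema", "true")
               .csv("/path/to/Fire_Department_Calls_for_Service.csv"))

    # Most common call types on July 4th (date string format assumed to be MM/dd/yyyy).
    (fire_df
     .filter(col("Call Date") == "07/04/2016")
     .groupBy("Call Type")
     .count()
     .orderBy(col("count").desc())
     .show(10, truncate=False))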

We'll begin with a brief lecture on Spark theory before diving into several demos where we'll analyze and visualize the data.
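For the SQL-oriented parts of the demos, the same DataFrame can be registered as a temporary view and queried with plain SQL. This sketch assumes the `fire_df` DataFrame and `spark` session from the snippet above, plus a hypothetical `Neighborhood District` column; substitute the actual column names from the dataset:

    # createOrReplaceTempView is the Spark 2.0 way to expose a DataFrame to SQL.
    fire_df.createOrReplaceTempView("fireCalls")

    # Hypothetical neighborhood column; backticks are needed for names with spaces.
    spark.sql("""
        SELECT `Neighborhood District` AS neighborhood,
               COUNT(*)                AS num_calls
        FROM fireCalls
        WHERE `Call Date` = '07/04/2016'
        GROUP BY `Neighborhood District`
        ORDER BY num_calls DESC
        LIMIT 10
    """).show(truncate=False)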

Try the labs yourself:
  1. Sign up for Databricks Community Edition (free)
  2. Download the labs
  3. Log in to Databricks CE and import the labs

To follow along, view the static HTML version of the material covered.


To dive deeper into Apache Spark, check out NewCircle's upcoming public classes and private training courses, delivered onsite for your team.
