NewCircle Developer Stream
Stream is a constantly updated source of free, educational content on open source development. Also, check out our bookshelf for in-depth tutorials.
In this article, I show how to identify some common Spark issues the easy way: by reading a particularly informative graphical report built into the Spark Web UI, the Stage Detail view.
A close look at the ways Spark ML models can be put into production, which patterns work best in which situations, and why.
Learn some key performance patterns and anti-patterns that will help you get the most out of Spark 2.0.
A hands-on tutorial using Spark SQL and DataFrames to retrieve insights and visualizations from datasets published by the City of San Francisco.
In this intermediate-level tutorial, I'll address the question of which Apache Spark APIs to use, with a series of brief technical explanations and demos that highlight best practices, latest APIs, and new features in Spark 2.0.
In this tour from QCon SF, I’ll show you Spark's ability to rapidly process Big Data. I'll demonstrate extracting information with RDDs, querying data using DataFrames, and visualizing and plotting data, and I'll show you how to create a machine-learning pipeline with Spark ML and MLlib. We'll also discuss the internals that make Spark 10-100 times faster than Hadoop MapReduce and Hive.
Video covering Spark Streaming from my presentation at the Philly Area Scala Meetup.
Video and slides from my full-day Apache Spark training workshop at Spark Summit 2015.
The Observer pattern works a bit differently in Python than it does in other languages. This short tutorial introduces how it works in Python and gets you started using it.
Today, according to Dean Wampler, Scala has successfully taken over the Big Data world. This is a talk about why.
Let's understand the basics of how Hadoop and HDFS work with the help of one of our favorite childhood toys.
Pattern matching is a killer feature in Scala. Those of you coming from a Java background might find this particularly interesting, because even with Java 8, there’s nothing like this in Java.
Blaze is an open source project from Continuum Analytics. It’s an evolving project, "an ambitious effort to provide uniform, Pythonic interface to modern datasets and computation platforms."
Bokeh is a data visualization library that lets Python programmers and data scientists create interactive, novel plots for the web. This talk gives an overview of its capabilities and demos its latest features.
Software peer review is essential on a modern development team. Learn how to keep your code healthy and your people happy in this 15-minute talk from Forward JS.
What's new in the Python packaging community? Noah Kantrowitz outlines what's happened, what's going to happen, and how to incorporate the latest techniques into your Python environment.
Noah Kantrowitz overviews the various tools available for application deployment today, discusses their tradeoffs, and helps shine a light on which might be the appropriate platform for your project.
Andrew Godwin discusses the reasons behind Lanyrd's decision to move from MySQL to PostgreSQL, then from AWS to Softlayer, and what their team learned along the way.
The Django Debug Toolbar can be extremely helpful, but the interesting bugs only happen in production. Simon Willison offers advice on asking “What went wrong?” and “What’s going to go wrong?”
The story of taking two APIs, each with its own issues, and merging them into a single API for the modern era.
Nathan Yergler, Principal Engineer at Eventbrite, talks about how his team took a code base that had been around for quite some time and built a culture of testing around it.
Releasing a new feature means taking into consideration how it will interact with all of your existing features. Feature flags are a tool to help confront this issue.
Nathan Yergler explains how Eventbrite adapted their code base for internationalization and discusses some of the unique challenges they faced along the way.
A series of 15-minute talks on Eventbrite and Lanyrd, two large-scale, layered sites built on Python and Django.
Greg Sadetsky delivers an introduction for anyone interested in getting started with Python. He begins by setting up the environment, then demonstrates the power of a few simple lines of code.