PySpark HDFS Notebook

A simple PySpark word-count app that reads its input from HDFS

What does it do?

Connects to a specified Spark cluster
Reads the file at a specified HDFS URL
Splits each line into words on spaces and counts how often each word occurs
Prints the counts for up to 20 words
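
The steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `split_words` and `run_wordcount`, the app name `hdfs-wordcount`, and the choice to drop empty tokens produced by consecutive spaces are all assumptions made for the example.

```python
from operator import add


def split_words(line):
    """Split a line on single spaces, dropping empty tokens (an assumption)."""
    return [word for word in line.split(" ") if word]


def run_wordcount(master_url, hdfs_url, limit=20):
    """Count words in the file at hdfs_url; return up to `limit` (word, count) pairs."""
    # Imported lazily so split_words can be used without a Spark installation.
    from pyspark.sql import SparkSession

    # Connect to the specified Spark cluster.
    spark = (
        SparkSession.builder
        .master(master_url)
        .appName("hdfs-wordcount")  # hypothetical app name
        .getOrCreate()
    )
    try:
        # Read the file as an RDD of lines, then count each word.
        lines = spark.sparkContext.textFile(hdfs_url)
        counts = (
            lines.flatMap(split_words)
            .map(lambda word: (word, 1))
            .reduceByKey(add)
        )
        # take() returns at most `limit` pairs, in no particular order.
        return counts.take(limit)
    finally:
        spark.stop()
```

A call might look like `run_wordcount("spark://spark-master:7077", "hdfs://namenode:8020/path/to/file.txt")`, followed by a loop that prints each pair. Both URLs here are placeholders and must be replaced with the addresses of a real Spark master and HDFS namenode.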

Notes on permissions

This example uses an unsecured HDFS cluster (simple authentication, no Kerberos)
The file must be readable by nbuser (1011:root)