PySpark HDFS Notebook
A simple PySpark word-count app that reads its input from HDFS
What does it do?
- Connects to a specified Spark cluster
- Reads a file specified by an HDFS URL
- Splits each line into words on spaces and counts the occurrences of each word
- Prints the counts for up to 20 words
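The steps above map onto a short RDD pipeline. The sketch below is illustrative and is not the notebook's own code. The names `tokenize`, `run`, the app name `hdfs-wordcount`, and the example URLs in the usage message are hypothetical placeholders. It also assumes you want the empty strings produced by consecutive spaces dropped. PySpark is imported inside `run()` so that `tokenize()` can be exercised without a Spark installation.

```python
"""Minimal PySpark word count over a file in HDFS (illustrative sketch)."""
import sys


def tokenize(line):
    # Split on single spaces, as described above; drop the empty strings
    # that consecutive spaces would otherwise produce (an assumption).
    return [word for word in line.split(" ") if word]


def run(master_url, hdfs_url, limit=20):
    # Imported here so tokenize() can be used without PySpark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("hdfs-wordcount").setMaster(master_url)
    sc = SparkContext(conf=conf)
    try:
        counts = (
            sc.textFile(hdfs_url)             # one RDD element per line
            .flatMap(tokenize)                # line -> words
            .map(lambda word: (word, 1))      # word -> (word, 1)
            .reduceByKey(lambda a, b: a + b)  # sum the 1s for each word
        )
        # take() brings at most `limit` (word, count) pairs back to the driver.
        for word, count in counts.take(limit):
            print(f"{word}: {count}")
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) != 3:
        # Placeholder example: spark://master:7077 hdfs://namenode:8020/path/to/file.txt
        print("usage: wordcount.py <spark-master-url> <hdfs-file-url>")
    else:
        run(sys.argv[1], sys.argv[2])
```

`take(limit)` returns an arbitrary subset of the counted words, not the most frequent ones. To print the top 20 instead, one option is `counts.takeOrdered(limit, key=lambda kv: -kv[1])`.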
Notes on permissions
- This example assumes an unsecured HDFS cluster (no Kerberos authentication)
- The file must be readable by nbuser (UID 1011, group root)
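You can check whether nbuser can read the input file by applying HDFS's permission rules to the mode string shown by `hdfs dfs -ls`. HDFS tests the owner bits if the user owns the file. Otherwise it tests the group bits if the file's group is one of the user's groups. In all other cases it tests the "other" bits. The helper `can_read` below is a hypothetical sketch of that rule only. It ignores the HDFS superuser and ACLs, and it assumes the NameNode resolves nbuser's groups as `{"root"}`.

```python
def can_read(mode, owner, group, user, user_groups):
    """Return True if `user` may read a file with the given HDFS mode string.

    mode: the permission string from `hdfs dfs -ls`, e.g. "-rw-r-----".
    Only the first matching class (owner, then group, then other) is
    checked, mirroring HDFS/POSIX semantics. Superuser and ACLs are ignored.
    """
    perms = mode[1:]  # drop the file-type character ('-' or 'd')
    if user == owner:
        return perms[0] == "r"  # owner read bit
    if group in user_groups:
        return perms[3] == "r"  # group read bit
    return perms[6] == "r"      # other read bit


# Hypothetical checks for nbuser, assuming its only group is root.
print(can_read("-rw-r--r--", "hdfs", "supergroup", "nbuser", {"root"}))
print(can_read("-rw-r-----", "hdfs", "root", "nbuser", {"root"}))
```

In practice, making the file world-readable (for example with `hdfs dfs -chmod o+r <path>`) or giving it group `root` with group read permission satisfies the requirement above.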