The community has several ongoing projects with frequent releases, all collected in our GitHub organization. Each project addresses a specific concern within the OpenShift realm and provides solid solutions for your own data-driven applications.

GitHub Organization


The following presentations cover the technologies used in, and related to, these projects. We love our community and the passion they have for these technologies. If you know of a presentation that would fit in here, please open a pull request and add it to the list!


Why Data Scientists Love Kubernetes

Sophie Watson & William Benton

Building an Implicit Recommendation Engine with Spark

Sophie Watson

Extending Structured Streaming Made Easy with Algebra

Erik Erlandson

Apache Spark for Library Developers (Deep Dive Part 2)

Erik Erlandson & William Benton

Apache Spark for Library Developers (Deep Dive Part 1)

Erik Erlandson & William Benton

From Research to Production: What they didn’t teach you in Grad School

Sophie Watson

Building Streaming Recommendation Engines on Spark

Rui Vieira

Apache Spark from notebook to cloud native application

Rebecca Simmonds

Intelligent applications on OpenShift from prototype to production

Rebecca Simmonds and Michael McCune

Pythonic Apache Spark app patterns for the cloud

Michael McCune

Probabilistic Structures for Scalable Computing

William Benton

Collaborative Filtering Microservices on Spark

Rui Vieira, Sophie Watson


Containerizing TensorFlow Applications on OpenShift

Subin Modeel

One-Pass Data Science in Apache Spark with Generative T-Digests

Erik Erlandson

Fire in the Sky: An Introduction to Monitoring Apache Spark in the Cloud

Michael McCune

Building Machine Learning Algorithms on Apache Spark

William Benton

Analyzing Blockchain transaction graph with Spark

Jirka Kremser

From notebooks to cloud native: a modern path for data driven applications

Michael McCune

The Revolution Will Be Containerized • Architecting the Intelligent Applications of Tomorrow

William Benton

Smart Scalable Feature Reduction With Random Forests

Erik Erlandson

Converging insightful, data-led applications with traditional web applications

Michael McCune, Steve Pousty

Sketching Data with T-Digest In Apache Spark

Erik Erlandson

Optimizing Spark Deployments for Containers: Isolation, Safety, and Performance

William Benton

Teaching Apache Spark Clusters to Manage Their Workers Elastically

Erik Erlandson, Trevor McKay

Big Data In Production: Bare Metal to OpenShift

William Benton

Insightful Apps with Apache Spark and OpenShift

William Benton, Michael McCune

Building My Own Little World with Open Data

Steven Pousty

Building Cloud Native Apache Spark Applications with OpenShift

Michael McCune


Building Apache Spark Application Pipelines for the Kubernetes Ecosystem

Michael McCune

Converging Big Data and Application Infrastructure

Steve Pousty

Running Apache Spark Natively on Kubernetes with OpenShift

Erik Erlandson

Containerized Spark on Kubernetes

William Benton

Big Data and Apache Spark on OpenShift Pt. II

William Benton

Big Data and Apache Spark on OpenShift Pt. I

William Benton

Analyzing Log Data With Apache Spark

William Benton


Diagnosing Open-Source Community Health with Spark

William Benton


Analyzing endurance-sports activity data with Spark

William Benton