Tutorials

These tutorials have been designed to showcase technologies and design patterns that can be used to begin creating intelligent applications on OpenShift. We have split them into two broad categories: applications and examples.

Applications are fully integrated packages that illustrate how an idea, methodology, or technology can be developed and deployed on OpenShift, letting users experience the underlying analytics in a convenient manner.

Examples are small code samples or notebook workflows that demonstrate how you can integrate a specific technology or technique into your projects. Unlike applications, they are not built around a full user experience; they speak directly to architects, developers, and technologists.

All of these tutorials contain instructions for installation and usage as well as open source code artifacts that you are welcome to clone and use in your own projects and presentations. Some of these tutorials also contain videos and slide decks that can be helpful when presenting or demonstrating them to your peers and colleagues.

Applications

SparkPi: an introduction to applications using Apache Spark with radanalytics.io

Java Python Scala

In this tutorial you will build a microservice that calculates an approximate value of Pi on request over HTTP. You will learn how to create applications that use Apache Spark and are deployed directly from a source repository to OpenShift with a single command.
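
A minimal sketch of the computation at the heart of SparkPi, assuming PySpark; in the tutorial this estimate is wrapped in an HTTP microservice, and all names here are illustrative:

```python
# Monte Carlo Pi estimation: sample points in the unit square and count
# how many land inside the quarter circle.
import random
from operator import add

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkPi").getOrCreate()

NUM_SAMPLES = 1_000_000

def inside(_):
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0

count = spark.sparkContext.parallelize(range(NUM_SAMPLES)).map(inside).reduce(add)
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))

spark.stop()
```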


Natural language processing with Apache Spark

Python MongoDB

Ophicleide is an application that can ingest text data from URL sources and process it with Word2vec to create data models. The resulting models can then be queried for word similarity. It contains a REST-based training server and a browser-based front end for user interaction.
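
As a hedged illustration of the training and query steps, the following sketch uses Spark's pyspark.ml Word2Vec; the inline documents and column names are made-up stand-ins for the URL-sourced text that Ophicleide ingests:

```python
from pyspark.ml.feature import Word2Vec
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word2vec-example").getOrCreate()

# Each row holds one tokenized document.
docs = spark.createDataFrame([
    ("spark makes distributed computing simple".split(" "),),
    ("word2vec learns vector representations of words".split(" "),),
], ["text"])

w2v = Word2Vec(vectorSize=50, minCount=1, inputCol="text", outputCol="features")
model = w2v.fit(docs)

# Query the trained model for words similar to a given term.
model.findSynonyms("spark", 3).show()
```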


Recommendation engine service with Apache Spark

Python Java S2I MongoDB PostgreSQL

Project Jiminy is an implementation of a recommendation system based on collaborative filtering. It demonstrates how to build machine learning application pipelines composed of several microservices. The application contains a web server, a model training service, a REST-based prediction service, and a few data storage services.
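
A minimal sketch of the collaborative filtering step, assuming Spark's ALS implementation; the ratings data and column names are illustrative, and the real pipeline loads its data from the storage services:

```python
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jiminy-als-sketch").getOrCreate()

# (user, product, rating) triples; made up for illustration.
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 5.0), (2, 0, 1.0)],
    ["userId", "productId", "rating"],
)

als = ALS(maxIter=5, regParam=0.1, userCol="userId",
          itemCol="productId", ratingCol="rating")
model = als.fit(ratings)

# Top-3 product recommendations per user, the kind of result a
# prediction service would serve.
model.recommendForAllUsers(3).show(truncate=False)
```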


Batch filtering using distributed business rules

Java Drools Spark PostgreSQL

The Bad Apples tutorial shows you how to integrate the distributed processing features of Apache Spark with the business rules capabilities of Drools. Through the example use case of filtering fraudulent credit card transactions, you will learn how to combine automated analytics with human domain expertise.
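
Drools itself is a Java library, so the following PySpark sketch only illustrates the overall pattern: broadcast human-authored rules to the cluster and filter transactions against them in parallel. The looks_fraudulent function is a hypothetical stand-in for a Drools rule session:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rules-filter-sketch").getOrCreate()
sc = spark.sparkContext

# Human-authored domain rules; in Drools these would live in .drl files.
rules = sc.broadcast({"max_amount": 1000.0, "blocked_countries": {"XX"}})

def looks_fraudulent(txn):
    # Stand-in for evaluating a transaction against a rules engine.
    r = rules.value
    return txn["amount"] > r["max_amount"] or txn["country"] in r["blocked_countries"]

transactions = sc.parallelize([
    {"id": 1, "amount": 25.0, "country": "US"},
    {"id": 2, "amount": 5000.0, "country": "US"},
    {"id": 3, "amount": 40.0, "country": "XX"},
])

# Distributed batch filtering: each executor evaluates the rules locally.
print(transactions.filter(looks_fraudulent).collect())
```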


Scalable trending item detection on streaming data

Scala S2I Infinispan Spark Artemis

Equoid is an implementation of a top-k (also known as heavy hitters) tracking system built on a Count-Min Sketch. The project demonstrates the utility of microservice-based data streaming pipelines coupled with a time- and space-efficient approach to a common use case. The application contains a web server, a web UI, a caching layer, and an Apache Artemis broker with associated data publishers and receivers.
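
A self-contained sketch of a Count-Min Sketch, the structure Equoid builds on; the width, depth, and hashing choices here are illustrative:

```python
# A fixed grid of counters indexed by independent hashes gives approximate
# counts in constant space, no matter how many distinct items arrive.
import hashlib

class CountMinSketch:
    def __init__(self, width=1000, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def estimate(self, item):
        # The minimum across rows bounds the overcount from hash collisions.
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for word in ["apple", "apple", "pear", "apple"]:
    cms.add(word)
print(cms.estimate("apple"))  # -> 3 (an upper-biased estimate)
```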


Streaming query processing with Apache Kafka and Apache Spark (Python)

Python Kafka S2I

Graf Zahl is a demonstration application that uses Spark's Structured Streaming feature to read data from an Apache Kafka topic. It presents a web UI for viewing the top-k words found on the topic.
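
A hedged sketch of the core streaming query, assuming PySpark with the spark-sql-kafka connector available; the broker address and topic name are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("graf-zahl-sketch").getOrCreate()

# Read the Kafka topic as an unbounded table of messages.
lines = (spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder address
         .option("subscribe", "words-topic")               # placeholder topic
         .load()
         .selectExpr("CAST(value AS STRING)"))

# Split messages into words and maintain a running count per word.
counts = (lines
          .select(explode(split(lines.value, " ")).alias("word"))
          .groupBy("word")
          .count())

# Graf Zahl serves the top words through a web UI; here we just print them.
query = (counts.orderBy("count", ascending=False)
         .writeStream.outputMode("complete").format("console").start())
query.awaitTermination()
```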


Streaming query processing with Apache Kafka and Apache Spark (Java)

Java Kafka S2I

jGraf Zahl is a Java implementation of the Graf Zahl application. It demonstrates the use of Spark's Structured Streaming feature to read data from an Apache Kafka topic. It presents a web UI for viewing the top-k words found on the topic.


Handwriting recognition with TensorFlow

TensorFlow S2I

This demo shows how to use a source-to-image (S2I) TensorFlow Serving build to deploy a TensorFlow Serving prediction endpoint on OpenShift. The S2I build provides a gRPC microservice endpoint that web applications can use to send queries for evaluation against the TensorFlow model.
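
A hedged sketch of a gRPC client for such an endpoint, using the tensorflow-serving-api package; the host, model name, signature, and input shape are all placeholders:

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to the serving endpoint (placeholder route and port).
channel = grpc.insecure_channel("model-server:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"                    # placeholder model name
request.model_spec.signature_name = "serving_default"
# A single flattened 28x28 image of zeros, just to exercise the endpoint.
request.inputs["images"].CopyFrom(
    tf.make_tensor_proto([[0.0] * 784], shape=[1, 784]))

response = stub.Predict(request, timeout=10.0)
print(response.outputs)
```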


Geographic data analytics with PostgreSQL and Apache Spark

Python S2I Spark PostgreSQL

This application brings together three microservices to show how to use a PostgreSQL database for analyzing data within an Apache Spark cluster.
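
A minimal sketch of the PostgreSQL-to-Spark step, reading a table over JDBC; the connection details, table, and column names are placeholders, and the PostgreSQL JDBC driver must be available to Spark:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("postgres-spark-sketch").getOrCreate()

# Load a PostgreSQL table into a Spark DataFrame over JDBC.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://postgresql:5432/geodata")  # placeholder
      .option("dbtable", "observations")                           # placeholder
      .option("user", "dbuser")
      .option("password", "dbpass")
      .option("driver", "org.postgresql.Driver")
      .load())

# Once loaded, the table can be analyzed with ordinary DataFrame operations.
df.groupBy("region").count().show()
```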


Examples

Exploring value-at-risk analysis with Apache Spark and Jupyter

Python Jupyter

The value-at-risk notebook is a simple example of how to run Jupyter notebooks on OpenShift, how to run Monte Carlo simulations in Spark, and how to interactively explore data to find better ways to model it.
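
The notebook distributes its simulations with Spark; the condensed NumPy sketch below shows the core value-at-risk idea, with the normal-return assumption and all parameters chosen purely for illustration:

```python
import numpy as np

portfolio_value = 1_000_000.0
mu, sigma = 0.0005, 0.02      # illustrative daily return mean / volatility
num_simulations = 100_000

# Simulate one-day returns and the resulting profit and loss.
returns = np.random.normal(mu, sigma, num_simulations)
pnl = portfolio_value * returns

# 95% VaR: the loss exceeded in only 5% of simulated scenarios.
var_95 = -np.percentile(pnl, 5)
print(f"1-day 95% VaR: ${var_95:,.0f}")
```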


Integrating AMQP with Apache Spark

Scala ActiveMQ

This demo shows how to integrate AMQP-based products with Apache Spark Streaming. It uses the AMQP Spark Streaming connector, which can receive messages from an AMQP source and push them to the Spark engine as micro-batches for real-time analytics.


Deploying Apache Zeppelin notebooks

Zeppelin Spark Python

This is an example of how to deploy and use Apache Zeppelin notebooks on OpenShift.


Accessing data in Ceph with Apache Spark

Python Ceph S3 Jupyter

This is an example of how to connect your application to data in Ceph using the S3 API.
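
A hedged sketch of pointing Spark's S3A connector at a Ceph object gateway; the endpoint, credentials, and bucket are placeholders, and the hadoop-aws package must be available on the cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ceph-s3a-sketch").getOrCreate()

# Point the S3A filesystem at the Ceph RADOS Gateway instead of AWS.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.s3a.endpoint", "http://ceph-gateway:8000")  # placeholder endpoint
hconf.set("fs.s3a.access.key", "ACCESS_KEY")
hconf.set("fs.s3a.secret.key", "SECRET_KEY")
hconf.set("fs.s3a.path.style.access", "true")

# Read an object from a Ceph bucket exactly as if it lived in S3.
df = spark.read.text("s3a://my-bucket/data.txt")
df.show()
```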


Deploying Apache Spark clusters with Fabric8

Java Kafka

This demo shows how to use the Fabric8 Maven Plugin to deploy a Spark cluster on Openshift.


Accessing data in HDFS with Apache Spark

Python HDFS Jupyter

This is a simple Jupyter notebook application that runs on OpenShift. It shows how to read a file from a remote HDFS filesystem with PySpark.
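
A minimal sketch of such a read, with the namenode host, port, and path as placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-read-sketch").getOrCreate()

# Spark reads HDFS paths natively via the hdfs:// scheme.
df = spark.read.text("hdfs://namenode:8020/user/data/sample.txt")
print(df.count())
df.show(5, truncate=False)
```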


Accessing data in S3 with Apache Spark

Python S3 Jupyter

This is an example of how to connect your application to data in S3 with Apache Spark.


Analyzing blockchain graphs with Apache Spark and Jupyter

Python Jupyter spark-notebook

These blockchain notebooks are examples of how to explore graph data with GraphX and GraphFrames using Apache Spark on OpenShift. They use real Bitcoin blockchain data to create a transaction graph for the analysis.
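
A hedged sketch in the style of these notebooks, assuming the GraphFrames package; the addresses and amounts below are made up, whereas the real notebooks derive them from Bitcoin transactions:

```python
from graphframes import GraphFrame
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("blockchain-graph-sketch").getOrCreate()

# Vertices are addresses; edges are transactions between them.
vertices = spark.createDataFrame([("a1",), ("a2",), ("a3",)], ["id"])
edges = spark.createDataFrame(
    [("a1", "a2", 0.5), ("a2", "a3", 1.2), ("a3", "a1", 0.1)],
    ["src", "dst", "amount"])

g = GraphFrame(vertices, edges)

# Typical explorations: connectivity and influence of addresses.
g.inDegrees.show()
g.pageRank(resetProbability=0.15, maxIter=10).vertices.show()
```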