Follow Our Progress

These projects represent the core of the community's efforts. Several types of projects are represented here, but all focus on improving your experience creating insightful, data-driven applications.

We invite you to take a look, test drive everything, and hopefully join us by contributing a comment, issue, or code!

OpenShift Spark Infrastructure

The openshift-spark repository contains all the files necessary to create Apache Spark images for OpenShift. These images can then be used to deploy Spark clusters within an OpenShift project, or for creating applications which utilize Spark in a standalone fashion.

Oshinko Infrastructure

Oshinko is a top-level namespace covering several individual projects focused on delivering Apache Spark clusters inside OpenShift. The individual repositories, labelled oshinko-*, provide differing levels of interaction with an OpenShift deployment.

The individual components of Oshinko are:

  • oshinko-cli

    The cli repository contains a command-line tool for managing clusters, a Go language library encompassing the business logic of managing clusters, and a REST server which uses that library

  • oshinko-console

    An extension to the OpenShift console which enables integrated support for managing clusters

  • oshinko-s2i

    Source-to-image tooling for creating Spark applications with the ability to deploy transient per-application clusters

  • oshinko-specs

    Documents containing feature proposals for implementation in the Oshinko namespace

  • oshinko-webui

    An HTML server which provides a container-based browser interface for cluster management

Scorpion Stare Spark Extension

This project provides Apache Spark backend plug-ins for awareness of Kubernetes, OpenShift, Oshinko and the like. With these plug-ins, your Spark applications can take advantage of elastic scaling.

Silex Spark Extension

This is a library of reusable code for Spark applications, factored out of applications that have been built at Red Hat. It will grow over time, but for now it includes an application skeleton, some useful extensions to data frames and RDDs, utility functions for handling time and stemming text, and helpers for feature extraction.

Spark AMQP Connector Spark Extension

This project provides an AMQP (Advanced Message Queuing Protocol) connector for Apache Spark Streaming, which ingests data as a stream from any AMQP-based source.