These projects represent the core of the radanalytics.io community's efforts. There are several different types of projects represented here, but all are focused on improving your experience creating insightful, data-driven applications.
We invite you to take a look, test drive everything, and hopefully join us by contributing a comment, issue, or code!
The openshift-spark repository contains all the files necessary to create Apache Spark images for OpenShift. These images can then be used to deploy Spark clusters within an OpenShift project, or for creating applications which utilize Spark in a standalone fashion.
Oshinko is a top-level namespace that covers several individual projects
which are focused on delivering Apache Spark clusters inside OpenShift. The
individual repositories, labelled oshinko-*, provide differing levels of
interaction with an OpenShift deployment.
The individual components of Oshinko are:
The cli repository contains a command line tool for managing clusters. It also contains a Go language library encompassing the business logic of managing clusters, and a REST server which uses that library
An extension to the OpenShift console which enables integrated support for managing clusters
Source-to-image tooling for creating Spark applications with the ability to deploy transient per-application clusters
Documents containing feature proposals for implementation in the Oshinko namespace
An HTML server which provides a container based browser interface for cluster management
This project provides Apache Spark backend plug-ins for awareness of Kubernetes, OpenShift, Oshinko and the like. With this scheduler your Spark applications can utilize elastic scale capabilities.
This is a library of reusable code for Spark applications, factored out of applications that have been built at Red Hat. It will grow in the future, but for now it has an application skeleton, some useful extensions to data frames and RDDs, utility functions for handling time and stemming text, and helpers for feature extraction.
This project provides an AMQP (Advanced Message Queuing Protocol) connector for Apache Spark Streaming, used to ingest data as a stream from AMQP-based sources.