SparkPi: an introduction to applications using Apache Spark with radanalytics.io

This tutorial walks you through the steps of creating, deploying, and operating an Apache Spark driver application on OpenShift. You will build a microservice that calculates an approximate value of Pi when requested over HTTP. Along the way you will learn how to create applications that use Apache Spark and are deployed directly from a source repository to OpenShift with a single command. You should expect to spend 30 to 60 minutes on these exercises.

Prerequisites

  • You have access to an OpenShift cluster.

  • A terminal with the oc command is available.

  • A new project in OpenShift is available to store your work.

  • The Oshinko templates are available in your project (See the How do I install radanalytics.io? article for instructions.)

  • An editor to work on your source files is available.

  • An online space for a Git repository is available to store your code, such as GitHub, Bitbucket, or GitLab.

Application overview

The SparkPi microservice that you will build during this tutorial is quite simple in its design and functionality. SparkPi is deployed as a single server pod alongside several Apache Spark pods. The microservice provides an HTTP server that accepts GET requests and responds with an estimation of Pi, which it calculates using Apache Spark and a Monte Carlo method.

Note that you would not normally run an extended calculation like this during an HTTP request; it is done here because it simplifies the implementation details. As you build further applications from the material provided here, keep this design simplification in mind.
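To make the Monte Carlo method concrete, here is a minimal sketch of the kind of calculation the microservice performs, written in Python with the PySpark API. The function name and sample count are illustrative rather than taken from the tutorial sources:

import random

from pyspark import SparkContext

def estimate_pi(sc, samples):
    """Estimate Pi by sampling random points in the unit square."""
    def inside(_):
        # A point (x, y) falls inside the unit quarter circle
        # when x^2 + y^2 <= 1.
        x, y = random.random(), random.random()
        return x * x + y * y <= 1.0
    count = sc.parallelize(range(samples)).filter(inside).count()
    # The fraction of points inside approaches Pi/4 as samples grow,
    # so multiplying by four yields the estimate.
    return 4.0 * count / samples

if __name__ == "__main__":
    sc = SparkContext(appName="SparkPi")
    print("Pi is roughly %f" % estimate_pi(sc, 100000))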

Regardless of the implementation language or the HTTP framework, all of the tutorial applications linked from this page share the same general architecture. The major components involved in deploying and using the microservice are the following:

Component key

  1. Computer with a web browser or other method for making HTTP requests

  2. The "SparkPi" microservice

  3. An Apache Spark cluster

The SparkPi component is the main component that you will be concerned with throughout this tutorial. You should already have a computer with the necessary tools installed for making requests to the microservice, editing your source files, and commanding OpenShift to deploy your microservice. The Apache Spark cluster is created automatically for you by the Oshinko source-to-image tools.

Source-to-image technology is fundamental to the operation of this tutorial. Source-to-image provides a convenient way for you to produce container images directly from your source repositories. This work is handled by language-specific builder images which can ingest your source code, assemble it and then produce an application image which is ready to be deployed. In the case of the Oshinko source-to-image builders, an Apache Spark cluster is also created dynamically for your application when it runs.
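For example, once the Oshinko templates are installed in a project, a build and deployment can typically be launched with a single command along these lines. The template name, application name, and repository URL below are placeholders; substitute the ones that match your chosen language and your own repository:

oc new-app --template oshinko-python-spark-build-dc \
    -p APPLICATION_NAME=sparkpi \
    -p GIT_URI=https://github.com/yourname/sparkpi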

Set up your coding environment

To use the source-to-image workflow you will need a Git repository to store your work. This repository is used by the source-to-image builders to create the final application image that runs on OpenShift. For the purposes of this tutorial your repository must be reachable by the OpenShift cluster; the easiest way to ensure this is to use an external service like GitHub, Bitbucket, or GitLab.

Create a new repository on your preferred platform and make a local clone of it. For more information see Bitbucket’s documentation and GitHub’s documentation.
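For example, assuming you created an empty repository named sparkpi on GitHub (the URL below is a placeholder for your own), the local clone and first push might look like this:

$ git clone https://github.com/yourname/sparkpi.git
$ cd sparkpi
# ... create your source files, then publish them so OpenShift can fetch them
$ git add .
$ git commit -m "Add SparkPi microservice"
$ git push origin master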

With your freshly minted repository you are ready to start coding!

Choose your language and technology stack

This tutorial was developed for multiple languages and HTTP frameworks. Functionally, all of these microservices work in the same manner and produce similar results for an end user. In this way each represents a black-box microservice that can be replaced with another component satisfying the same input and output requirements.

Each of these language-specific tutorials provides instructions on how to structure your repository and which files to create. You will also learn how each microservice works and the commands necessary to build and launch it in OpenShift. Because the source-to-image templates you will use are language specific, you must choose a single implementation to exist in your repository. When you are finished, return here to learn about interacting with your microservice and exposing it outside of OpenShift. Choose the language and framework combination that suits you best; the example output later in this tutorial comes from the Python and Flask implementation.
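To illustrate that common interface, here is a minimal sketch of what the Python and Flask variant might look like. The structure is illustrative rather than a copy of the tutorial sources, and estimate_pi stands in for a Monte Carlo helper like the one sketched in the application overview:

from flask import Flask, request
from pyspark import SparkContext

# Hypothetical module holding the estimate_pi helper sketched earlier.
from sparkpi_job import estimate_pi

app = Flask(__name__)
sc = SparkContext(appName="SparkPi")

@app.route("/")
def index():
    return ("Python Flask SparkPi server running. "
            "Add the 'sparkpi' route to this URL to invoke the app.")

@app.route("/sparkpi")
def pi():
    # An optional scale parameter adjusts how many samples are used.
    scale = int(request.args.get("scale", 2))
    return "Pi is roughly %f\n" % estimate_pi(sc, scale * 100000)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)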

Become a user

At this point you should have created a code repository for your microservice, populated the repository with source files, built an application image, and launched that image on OpenShift. The final stage in this tutorial is to expose your microservice outside of OpenShift and begin interacting with it as a user.

The first step in this process is to expose a route to your microservice. OpenShift includes an edge router that associates externally visible hostnames with services. By default, the route for a source-to-image application gets a hostname built from the service name, the project name, and a domain suffix for the OpenShift server. Create the route with the following command:

oc expose svc/sparkpi

To see the routes available in your project, run oc get routes. This command should return output similar to the following example:

$ oc get route
NAME                      HOST/PORT                                      PATH      SERVICES            PORT       TERMINATION   WILDCARD
cluster-uo7wa9-ui-route   cluster-uo7wa9-ui-route-pi.10.0.1.109.xip.io             cluster-uo7wa9-ui   <all>                    None
sparkpi                   sparkpi-pi.10.0.1.109.xip.io                             sparkpi             8080-tcp                 None

Notice that in addition to the sparkpi route you just created, there is also a route prefixed with cluster-. This route is created automatically by the Oshinko source-to-image tooling and provides access to the Apache Spark web interface. You can visit either of these routes in your web browser to inspect the results.

In addition to accessing the SparkPi microservice through a web browser, you can use the curl tool with some command scripting. The following commands substitute in the hostname for your service; your results should look similar:

$ curl http://`oc get routes/sparkpi --template='{{.spec.host}}'`
Python Flask SparkPi server running. Add the 'sparkpi' route to this URL to invoke the app.

$ curl http://`oc get routes/sparkpi --template='{{.spec.host}}'`/sparkpi
Pi is roughly 3.140480

You can see that this rough approximation is close, but perhaps not close enough. Try adding the scale parameter to your URL to see how it affects the outcome; in the tutorial implementations it increases the number of samples used in the Monte Carlo estimate, which improves accuracy at the cost of a longer calculation.

curl http://`oc get routes/sparkpi --template='{{.spec.host}}'`/sparkpi?scale=5

Continue exploring

You have created and deployed your first radanalytics.io application onto OpenShift. At this point you are beginning to understand the core concepts behind the Oshinko source-to-image tooling. Investigate the other applications and examples in the Tutorials section and also revisit the Get Started page to learn how you can use the Oshinko WebUI to control the Apache Spark clusters in your projects.