This application characterizes historical returns of stock data and then simulates many possible outcomes for a given portfolio over a given time horizon. The idea is that you can characterize the value at risk for a given percentage of simulated outcomes (e.g., “there is a 5% chance we’d lose more than $1M USD in the next two weeks”). While you wouldn’t want to use the techniques in this application to guide real investment decisions, it demonstrates how to use a Jupyter notebook with Apache Spark on OpenShift, and you can modify the basic simulation to model security returns in a more sophisticated way.
The driver for the value-at-risk application is a Jupyter notebook. This notebook connects to a Spark cluster and schedules jobs to perform simulations.
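To give a sense of the pattern the notebook follows, here is a minimal sketch (not the notebook’s actual code; the random-walk model, trial count, and parameter values are placeholder assumptions) of fanning simulation trials out to a Spark cluster and reading a value-at-risk figure off the simulated outcomes:

import numpy as np
from pyspark.sql import SparkSession

# Connect to the cluster; the master URL matches the sparky cluster used
# throughout this walkthrough.
spark = SparkSession.builder.master("spark://sparky:7077").getOrCreate()
sc = spark.sparkContext

def simulate_once(seed, initial_value=1000000.0, days=10,
                  mean_daily_return=0.0005, daily_volatility=0.01):
    # Placeholder model: a random walk over normally distributed daily
    # returns. The real notebook characterizes returns from historical data.
    rng = np.random.default_rng(seed)
    daily_returns = rng.normal(mean_daily_return, daily_volatility, days)
    return initial_value * float(np.prod(1.0 + daily_returns))

# Fan the trials out to the cluster as a Spark job and collect the outcomes.
num_trials = 10000
outcomes = sc.parallelize(range(num_trials)).map(simulate_once).collect()

# The 5th percentile of simulated portfolio values: in 95% of trials the
# portfolio ends the horizon above this value.
fifth_percentile = np.percentile(outcomes, 5)
print("5% value at risk: %.2f" % (1000000.0 - fifth_percentile))

The loss figure printed at the end is the kind of statement quoted above: in only 5% of simulated outcomes does the portfolio lose more than that amount over the horizon.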
Installing the value-at-risk notebook is straightforward; you simply need to create an OpenShift project, deploy a Spark cluster in that project, and install and run the notebook image.
The easiest way to get a Spark cluster going in your OpenShift project is to use oshinko-deploy.sh. This script provides a streamlined installation of the Oshinko cluster manager into your project. Oshinko is a service that manages Spark clusters for OpenShift projects.
We’ll need to use a special Spark worker image for this application. This worker image has a small amount of historical stock return data stored in its filesystem. For a real application, you’d want to get data from persistent storage, from a database, or from an object store service. For a demo, though, it’s much easier to package up a small amount of data where each worker can get it.
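Just to illustrate what that means in practice, here is a hypothetical sketch of reading such bundled data; the file path, format, and header are assumptions rather than the actual layout of the willb/var-spark-worker image, and the SparkSession setup is covered later in this walkthrough:

from pyspark.sql import SparkSession

# Hypothetical: connect to the cluster and read a file that each worker has
# on its local filesystem, so no shared storage is required.
spark = SparkSession.builder.getOrCreate()
returns = spark.read.csv("file:///data/stock-returns.csv",
                         header=True, inferSchema=True)
returns.show(5)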
In order to get this special Spark worker image, we’ll pass its name to oshinko-deploy.sh with the -s flag:
oshinko-deploy.sh -s willb/var-spark-worker
From the OpenShift developer console, visit the Oshinko web interface. Use the interface to create a new cluster, and take note of what you’ve called this cluster. The rest of this documentation will assume that your cluster is named sparky, and so your master URL is spark://sparky:7077.
The next step is to add the notebook application. From the OpenShift developer console, select “Add to Project” and then “Deploy Image.” We’ll be deploying an image with a particular name, so select “Image Name” and then enter willb/var-notebook.
Once the var-notebook service is visible within the developer console, select “Create Route” and create a public route to the var-notebook service, targeting port 8888. The public route you create will depend on your DNS configuration, but you can use the xip.io service for development and testing.
Verify that the var-notebook service and your Spark cluster are both running by looking at the developer console. Once each is up and functional, you can use the notebook by visiting the route you defined earlier in your web browser.
Interacting with the Jupyter notebook is very simple. Select a cell with Python code and execute it by pressing the “run cell” button in the toolbar. Before you get started, you’ll need to make sure you’re able to connect to your Spark cluster: in the very first code cell, find the line that looks like this:
spark = SparkSession.builder.master("spark://sparky:7077").getOrCreate()
Replace sparky with whatever name you chose for your Spark cluster.
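Once you’ve updated the master URL, a quick way to confirm that the notebook can actually reach the cluster is to run a trivial job in a new cell, for example:

# If this small job completes, the notebook is talking to the Spark master
# configured above.
print(spark.version)
print(spark.range(1000).count())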
You can edit the code in the notebook cells and re-run them, and results from cells you’ve already run will be available to new cells. The notebook interface provides a great way to experiment with new techniques. Try it out!
There are lots of ways to extend this application. Here are a few suggestions:
Using the distplot function and the distribution models from SciPy, find a distribution that is a better fit for the stock returns data (a rough sketch of this idea follows below).
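As a rough sketch of that suggestion (the synthetic sample here stands in for the notebook’s actual returns data, and the candidate distributions are just examples), you could fit a few distributions from scipy.stats and compare their log-likelihoods:

import numpy as np
from scipy import stats

# Placeholder data: substitute the observed daily stock returns from the
# notebook for this synthetic heavy-tailed sample.
daily_returns = 0.01 * np.random.default_rng(42).standard_t(df=4, size=2500)

# Fit each candidate distribution and compare log-likelihoods; a higher
# value indicates a better fit to the data.
candidates = {"normal": stats.norm, "student-t": stats.t, "laplace": stats.laplace}
for name, dist in candidates.items():
    params = dist.fit(daily_returns)
    loglik = np.sum(dist.logpdf(daily_returns, *params))
    print(name, "log-likelihood:", loglik)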
You can see a demo of installing and running the notebook application in the following video: