How do I choose my own Apache Spark distribution?

When you install radanalytics.io, your project is set up to use a pre-determined distribution of Apache Spark. You can choose your own Apache Spark distribution with a few additional steps.

Prerequisites
  1. A terminal shell and OpenShift oc tool available with an active login to OpenShift.

  2. An OpenShift project with the radanalytics.io manifest installed, see the Get Started instructions for more help.

  3. An Apache Spark tarball for the release you would like to deploy, like this one. You can download it or use the url. The tarball should have this basic structure:

    • a top-level directory we’ll call TOP

    • binaries under TOP/bin

    • jar files under TOP/jars

    • configuration files under TOP/conf

Procedure
  1. Get the rad-image script from this link, or download it in your shell using curl.

    curl -o rad-image https://radanalytics.io/assets/tools/rad-image
    chmod +x rad-image

    Use the -h option for details on commands and options (hint: not everything is covered here!)

    ./rad-image -h
  2. Build images using the Spark distribution you’ve selected. This will create new complete tags on the radanaltyics.io image streams (it may take a while since it will build multiple images).

    ./rad-image build https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz

    To build images with a tag other than complete, specify the -t TAG option on the build command.

    ./rad-image build -t 2.3.0 https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
  3. Make the radanalytics.io templates and configmaps use the new tag (if you used the -t option, specify that tag instead of complete).

    ./rad-image use complete

    To use the default tags again, specify the -d option.

    ./rad-image use -d

    If you create clusters from the Oshinko WebUI, you can use the avanced configuration tab to specify a custom Spark image that you built with rad-image. Just use the pull spec for your image as the value for the Apache Spark image input box.