DC/OS Apache Spark includes the Spark History Server. Because the history server requires HDFS, it is not enabled by default; you must enable it explicitly:

  1. Install HDFS:

    dcos package install hdfs
    

    Note: HDFS requires 5 private nodes.

  2. Create an HDFS directory for the history data (default is /history). SSH into your cluster, start an HDFS client container, and create the directory from the shell inside it:

    docker run -it mesosphere/hdfs-client:1.0.0-2.6.0 bash
    ./bin/hdfs dfs -mkdir /history
    
  3. Create spark-history-options.json:

     {
       "service": {
         "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
       }
     }
    
  4. Install the Spark History Server:

     dcos package install spark-history --options=spark-history-options.json
    
  5. Create spark-dispatcher-options.json:

     {
       "service": {
         "spark-history-server-url": "http://<dcos_url>/service/spark-history"
       },
       "hdfs": {
         "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
       }
     }
    
  6. Install the Spark dispatcher:

    dcos package install spark --options=spark-dispatcher-options.json
    
  7. Run jobs with the event log enabled:

    dcos spark run --submit-args="--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history ... --class MySampleClass http://external.website/mysparkapp.jar"
    
  8. Visit your job in the dispatcher at http://<dcos_url>/service/spark/. The job's entry includes a link to the history server entry for that job.
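
The two options files from steps 3 and 5 share the same HDFS endpoint URL, so it can be convenient to generate them together. The following is a minimal sketch, not part of the official tooling: the `DCOS_URL` variable and its `dcos.example.com` default are hypothetical placeholders for your cluster address, and the install commands from steps 4 and 6 are only printed, not executed, so the script can be tried without a cluster.

```shell
#!/bin/sh
# Sketch: write the options files described in steps 3 and 5.
# DCOS_URL is a hypothetical placeholder; set it to your cluster's address.
DCOS_URL="${DCOS_URL:-dcos.example.com}"

# HDFS endpoint URL, copied from the steps above.
HDFS_CONFIG_URL="http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"

# Step 3: options for the Spark History Server package.
cat > spark-history-options.json <<EOF
{
  "service": {
    "hdfs-config-url": "${HDFS_CONFIG_URL}"
  }
}
EOF

# Step 5: options for the Spark dispatcher package.
cat > spark-dispatcher-options.json <<EOF
{
  "service": {
    "spark-history-server-url": "http://${DCOS_URL}/service/spark-history"
  },
  "hdfs": {
    "config-url": "${HDFS_CONFIG_URL}"
  }
}
EOF

# Steps 4 and 6: print the install commands for review instead of running them.
echo "dcos package install spark-history --options=spark-history-options.json"
echo "dcos package install spark --options=spark-dispatcher-options.json"
```

Run it as, for example, `DCOS_URL=my-cluster.example.org sh make-options.sh` (the script name is arbitrary), check the generated files, and then run the two printed commands yourself.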