Welcome to the documentation for DC/OS Apache HDFS. DC/OS Apache HDFS is a managed service that makes it easy to deploy and manage a highly available (HA) Apache HDFS cluster on Mesosphere DC/OS. Apache HDFS (Hadoop Distributed File System) is an open-source distributed file system based on the design described in Google’s GFS (Google File System) paper. It provides a replicated, distributed file system interface for use with “big data” and “fast data” applications.

DC/OS HDFS offers the following benefits:

  • Easy installation
  • Multiple HDFS clusters
  • Elastic scaling of data nodes
  • Integrated monitoring

Features

DC/OS HDFS provides the following features:

  • Single-command installation for rapid provisioning
  • Persistent storage volumes for enhanced data durability
  • Runtime configuration and software updates for high availability
  • Health checks and metrics for monitoring
  • Distributed storage scale out
  • HA name service with Quorum Journaling and ZooKeeper failure detection

Install and Customize

HDFS is available in the Universe and can be installed by using either the web interface or the DC/OS CLI.

Uninstall

If you are using DC/OS 1.10 and the installed service has a version greater than 2.0.0-x: …

Quickstart

This tutorial will get you up and running in minutes with HDFS. You will install and configure the DC/OS HDFS package and retrieve the core-site.xml and hdfs-site.xml files. These XML files are used to configure client nodes of the HDFS cluster.

Connecting Clients

Applications interface with HDFS like they would any POSIX file system. However, applications that will act as client nodes of the HDFS deployment require hdfs-site.xml and core-site.xml files that provide the configuration information necessary to communicate with the cluster.
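
As a rough illustration of how a client might inspect these files, the sketch below parses a Hadoop-style configuration document with Python's standard library and looks up one property. The sample XML and the value `hdfs://hdfs` are placeholders invented for this example, not output from a real cluster, and `load_hadoop_properties` is a hypothetical helper. Only the `<configuration>`/`<property>` layout is taken from the standard Hadoop configuration file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content; a real file is retrieved from the service.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdfs</value>
  </property>
</configuration>
"""


def load_hadoop_properties(xml_text):
    """Return a dict mapping each property <name> to its <value>."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> element is treated as an empty string.
            props[name] = prop.findtext("value", default="")
    return props


properties = load_hadoop_properties(SAMPLE_CORE_SITE)
print(properties["fs.defaultFS"])
```

The same helper could be pointed at hdfs-site.xml, since both files share this property layout.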

Managing

You can make changes to the service after it has been launched. Configuration management is handled by the scheduler process, which in turn handles deploying the DC/OS HDFS Service itself.

API Reference

The DC/OS HDFS Service implements a REST API that may be accessed from outside the cluster. The parameter referenced below indicates the base URL of the DC/OS cluster on which the HDFS Service is deployed. …
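
A minimal sketch of how a request URL could be assembled from the cluster's base URL. The host `dcos.example.com`, the service name `hdfs`, and the `v1/plans` path are assumptions made for illustration (they follow the common DC/OS convention of exposing services under `/service/<name>/`), and `service_url` is a hypothetical helper. Consult the API reference for the actual endpoints and authentication requirements.

```python
def service_url(dcos_url, path, service_name="hdfs"):
    """Join a DC/OS base URL, a service name, and an API path.

    The /service/<name>/ prefix is an assumed routing convention,
    not something stated on this page.
    """
    base = dcos_url.rstrip("/")
    return f"{base}/service/{service_name}/{path.lstrip('/')}"


# Hypothetical cluster address and endpoint path.
print(service_url("https://dcos.example.com/", "v1/plans"))
```

Stripping the slashes on both sides keeps the result stable whether or not the base URL is written with a trailing slash.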

Troubleshooting

The DC/OS HDFS Service is resilient to temporary node failures. However, if a DC/OS agent hosting an HDFS node is permanently lost, manual intervention is required to replace the failed node. The following command should be used to replace the node residing on the failed server. …

Limitations

Out-of-band configuration modifications are not supported. The service’s core responsibility is to deploy and maintain the service with a specified configuration. To do this, the service assumes that it has ownership of task configuration. If an end user modifies individual tasks through out-of-band configuration operations, the service will override those modifications at a later time. For example: …

Supported Versions


Release Notes


Upgrade

Upgrade and rollback are supported between adjacent versions only. For example, to upgrade from version 2 to version 4, you must first upgrade from 2 to 3, and then from 3 to 4.
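
The adjacent-version rule can be sketched as a small helper that lists the individual hops between two versions. This is an illustrative sketch only: it assumes versions are plain integers, as in the example above, whereas real package versions are strings, and `upgrade_steps` is a hypothetical name rather than part of the service.

```python
def upgrade_steps(current, target):
    """List the adjacent (from, to) hops between two integer versions.

    Works in both directions: ascending for upgrades, descending for rollbacks.
    """
    if current == target:
        return []
    step = 1 if target > current else -1
    return [(v, v + step) for v in range(current, target, step)]


print(upgrade_steps(2, 4))  # [(2, 3), (3, 4)]
print(upgrade_steps(4, 2))  # [(4, 3), (3, 2)]
```

Each tuple corresponds to one separate upgrade or rollback operation that must complete before the next one starts.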