Get Started

Last updated 5 days ago

Get started with Clusterone

This guide will get you started with the Clusterone SaaS platform at clusterone.com.

In the coming minutes, we'll walk you through setting up your account, linking your code and data, and training your model.

To keep things simple, we'll demonstrate Clusterone with a ready-to-run demo: a self-driving car simulation.

You can learn more about this example in this tutorial.

What is Clusterone?

Clusterone is a deep learning platform that allows you to train your models on distributed GPUs and CPUs without setup or maintenance. Think of it as the operating system for deep learning. Clusterone runs in the cloud, in on-premises installations, or in a combination of the two. We offer a SaaS platform as well as dedicated enterprise installations.

Set up

Before we begin, make sure you have your gear ready:

  • A Clusterone account. Register on clusterone.com if you do not have an account.

  • Python 2.7 or 3.5+

  • The Clusterone Python package. Install it with pip install clusterone.

  • (Optional) A GitHub account. You can register here. In Clusterone, you can create projects based on public repositories without a GitHub account. However, you'll need to connect your GitHub account to Clusterone to use code from your own private repositories.

The Clusterone command line interface, just, is installed automatically with the Clusterone Python package. Clusterone also provides a graphical web interface called the Matrix.
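You can run a quick check like the one below to confirm that your machine meets the requirements above. This is a minimal sketch written for Python 3. The version rule mirrors the list above. The sketch assumes that the package installed by pip install clusterone can be imported as clusterone; the helper names are our own.

```python
import sys


def meets_python_requirement(version_info=sys.version_info):
    """Return True for Python 2.7 or Python 3.5+, the versions listed above."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    return major == 3 and minor >= 5


def clusterone_installed():
    """Return True if the clusterone package can be imported.

    Assumption: the pip package installs a module named clusterone.
    """
    try:
        import clusterone  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


print("Python version supported:", meets_python_requirement())
print("clusterone package installed:", clusterone_installed())
```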

Using GitHub with Clusterone

Using GitHub anonymously

Clusterone allows you to use public GitHub repositories in your project out of the box. Linking your own GitHub account isn't necessary in that case.
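If you're not sure whether a repository is public, you can ask the GitHub REST API without logging in. The sketch below uses only the Python 3 standard library and reads the private field of the GET /repos/{owner}/{repo} response. GitHub answers anonymous requests for private or nonexistent repositories with HTTP 404, so in that case the function returns None. The helper names are illustrative and are not part of Clusterone.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def repo_is_public(repo_metadata):
    """Return True if GitHub repository metadata marks the repo as public.

    A missing "private" field is treated as private, to be safe.
    """
    return repo_metadata.get("private", True) is False


def fetch_repo_metadata(full_name):
    """Fetch metadata for 'owner/repo' anonymously from the GitHub REST API.

    Returns None on HTTP 404, which GitHub sends to anonymous callers for
    repositories that are private or don't exist.
    """
    url = "https://api.github.com/repos/" + full_name
    try:
        with urlopen(url, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))
    except HTTPError as err:
        if err.code == 404:
            return None
        raise
```

For example, calling fetch_repo_metadata("clusterone/self-driving-demo") and passing the result to repo_is_public() tells you whether the repository can be used without linking your account. Anonymous GitHub API calls are rate-limited, so avoid calling this in a loop.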

Linking your GitHub account

Linking your GitHub account allows you to access your private GitHub repositories from within Clusterone. To do this, you need to create a GitHub access token and add it to your Clusterone account.

On GitHub

Log into your GitHub account and navigate to the Personal Access Tokens page in the developer settings. Generate a new token and grant it the repo scope.

Copy the token when it's created.
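Before you paste the token into Clusterone, you can verify that it actually carries the repo scope. For classic personal access tokens, GitHub lists the token's scopes in the X-OAuth-Scopes response header of authenticated API calls. The minimal sketch below reads that header from GET /user. The function names are illustrative. Fine-grained tokens don't report their permissions through this header.

```python
from urllib.request import Request, urlopen


def scopes_from_header(header_value):
    """Split GitHub's X-OAuth-Scopes header (e.g. "repo, user") into a set."""
    if not header_value:
        return set()
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def token_has_repo_scope(token):
    """Return True if a classic personal access token includes the repo scope."""
    request = Request(
        "https://api.github.com/user",
        headers={"Authorization": "token " + token},
    )
    with urlopen(request, timeout=10) as response:
        header = response.headers.get("X-OAuth-Scopes")
    return "repo" in scopes_from_header(header)
```

For example, token_has_repo_scope(os.environ["GITHUB_TOKEN"]) reads the token from an environment variable instead of hard-coding it. The variable name is only an example.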

On Clusterone

Log into your Clusterone account and open the Matrix. Click the Clusterone icon in the top right corner of the screen and select Access Keys from the menu.

On the Access Keys page, add your GitHub username and the access token you created above into the respective fields.

Perfect, you have successfully linked your GitHub account to Clusterone.

For more information on linking GitHub to Clusterone, see here.

Create a Project and Run Code

Create a project

Log into the Matrix and navigate to the Projects page, either by clicking the scratchpad icon on the left or the green Projects field on the dashboard. Click the Add New Project button and select Link GitHub repository.

On the next screen, start typing clusterone/self-driving-demo to find the repository. Select it and click Add project to create the project.

To learn more about other ways to create a project, see here.

Create a dataset

For the self-driving car example, you don't have to worry about creating a dataset. We've already uploaded the data for you.

To learn more about how to use data with Clusterone, see here.

In this example, we use the comma-as dataset, which is loaded from the Clusterone public datasets at /public.
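The job created below passes this dataset's location to the training code as --absolute_data_path /public/self-driving-demo-data/. The demo's own main.py is not reproduced here. Instead, the following hypothetical sketch uses Python's argparse to show how an entrypoint run with python -m main could accept that flag. When the flag is omitted, for example when you run outside Clusterone, the sketch falls back to a local directory (./data, our own placeholder).

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the data-location flag passed on the job's --command line.

    Only --absolute_data_path comes from this guide; the rest is illustrative.
    """
    parser = argparse.ArgumentParser(description="Hypothetical training entrypoint")
    parser.add_argument(
        "--absolute_data_path",
        default=None,
        help="Directory containing the training data, e.g. /public/self-driving-demo-data/",
    )
    return parser.parse_args(argv)


def resolve_data_dir(args, local_fallback="./data"):
    """Prefer the path given on the command line, else a local directory.

    local_fallback is a hypothetical default for running outside Clusterone.
    """
    path = args.absolute_data_path or local_fallback
    return os.path.normpath(path)
```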

Create a job and run it

Open a command line and log into your Clusterone account:

just login

If this command fails, or your command line doesn't recognize the just command, make sure the Clusterone Python package is installed and that the directory containing its scripts has been added to your PATH.
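To narrow down a PATH problem, you can ask Python whether, and where, it finds the just executable. This is a small standard-library sketch with illustrative helper names. The scripts directory it reports is where pip places console scripts for the current interpreter's default install scheme. A --user install may put just somewhere else. Note also that an unrelated command-line tool is named just, so if the command exists but behaves unexpectedly, check which executable the reported path points to.

```python
import shutil
import sys
import sysconfig


def find_just_cli():
    """Return the full path of the first `just` executable on PATH, or None."""
    return shutil.which("just")


def path_hint():
    """Suggest a fix when `just` is missing, using this interpreter's paths."""
    scripts_dir = sysconfig.get_path("scripts")
    return (
        "'just' was not found on PATH. Reinstall with "
        "'{python} -m pip install clusterone' and make sure "
        "{scripts} is on your PATH.".format(python=sys.executable, scripts=scripts_dir)
    )


location = find_just_cli()
print("just found at " + location if location else path_hint())
```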

Next, create a job:

just create job distributed \
--project <your username>/self-driving-demo \
--name sdc-first-job \
--docker-image tensorflow-1.11.0-cpu-py35 \
--ps-docker-image tensorflow-1.11.0-cpu-py35 \
--time-limit 1h \
--ps-type t2.small \
--worker-type t2.small \
--command "python -m main --absolute_data_path /public/self-driving-demo-data/" \
--setup-command "pip install -r requirements.txt"

Let's go over the parameters:

  • Here we are creating a distributed job, meaning that we're using multiple nodes of specific types in parallel. If you'd rather run your code on a single machine, use just create job single ... instead.

  • The --project parameter determines the project you want to run code from. You can only run code from one project per job. Be sure to replace <your username> with your Clusterone username.

  • The --command parameter defines the bash entrypoint that Clusterone executes. If this parameter is not provided, Clusterone assumes the command is python -m main. In the self-driving car example, our module is called main.py, so the default would run it. We set --command explicitly to pass the --absolute_data_path argument that points to the dataset.

  • The --setup-command parameter defines the bash command that is executed on every node before the --command entrypoint runs. If this parameter is not provided, Clusterone assumes the setup command is pip install -r requirements.txt.

  • The --docker-image parameter determines the Docker image containing the framework that will be used to run the experiment on worker nodes. See here for a list of available frameworks.

  • The --ps-docker-image parameter determines the Docker image containing the framework that will be used to run the experiment on parameter servers.

  • Clusterone offers a variety of machine types to run jobs on. The --ps-type and --worker-type parameters set the machine type for parameter servers and workers, respectively. To run on a single machine, use the --instance-type parameter instead. See here for a list of available instance types. If no machine type is specified, Clusterone uses t2.small by default.

  • The --name parameter is used to give the job a name. Use this name to refer to the job in the just start job command below.
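If you create jobs from a script rather than typing the command by hand, it can help to assemble the arguments programmatically. The hypothetical helper below only knows the flags described above and rejects flags that don't apply to the chosen job type. We assume that the non-machine flags (--docker-image, --time-limit, --command, --setup-command) also apply to single jobs. Any option you leave unset is omitted, so Clusterone's defaults (python -m main, pip install -r requirements.txt, t2.small) apply. The helper only builds the argument list; you could pass that list to subprocess.run yourself.

```python
import shlex

# Flags used in this guide, keyed by the Python keyword name used for them.
COMMON_FLAGS = {
    "docker_image": "--docker-image",
    "time_limit": "--time-limit",
    "command": "--command",
    "setup_command": "--setup-command",
}
DISTRIBUTED_FLAGS = {
    "ps_docker_image": "--ps-docker-image",
    "ps_type": "--ps-type",
    "worker_type": "--worker-type",
}
SINGLE_FLAGS = {"instance_type": "--instance-type"}


def build_create_job_args(project, name, mode="distributed", **options):
    """Return the argument list for `just create job <mode> ...`.

    Options set to None (or left out) are not passed, so Clusterone falls
    back to its own defaults for them.
    """
    if mode == "distributed":
        allowed = dict(COMMON_FLAGS, **DISTRIBUTED_FLAGS)
    elif mode == "single":
        allowed = dict(COMMON_FLAGS, **SINGLE_FLAGS)
    else:
        raise ValueError("mode must be 'distributed' or 'single'")

    args = ["just", "create", "job", mode, "--project", project, "--name", name]
    for key, value in options.items():
        if key not in allowed:
            raise ValueError("%r is not a valid option for a %s job" % (key, mode))
        if value is not None:
            args.extend([allowed[key], value])
    return args


def to_shell(args):
    """Quote an argument list so it can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in args)
```

For instance, build_create_job_args("<your username>/self-driving-demo", "sdc-first-job", mode="single", instance_type="t2.small") describes a single-machine run of the same project.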

Finally, all that's left to do is to start the job:

just start job <your username>/self-driving-demo/sdc-first-job
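The identifier passed to just start job is your username, the project name, and the job name joined by slashes. The small hypothetical helper below composes that identifier and can optionally invoke the CLI through subprocess. By default it performs a dry run and only returns the argument list, so nothing starts by accident.

```python
import subprocess


def job_path(username, project, job_name):
    """Join the parts into the <username>/<project>/<job name> identifier."""
    for part in (username, project, job_name):
        if not part or "/" in part:
            raise ValueError("each part must be non-empty and contain no '/': %r" % part)
    return "/".join((username, project, job_name))


def start_job(username, project, job_name, dry_run=True):
    """Build (and, unless dry_run is True, run) `just start job <path>`."""
    args = ["just", "start", "job", job_path(username, project, job_name)]
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(args, check=True)
    return args
```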

View your Job and Results

As soon as the job is started, it will gather the necessary resources and run once all resources are available.

If there are no available machines of your selected type, it will take around 2-3 minutes to provision new machines.

Follow Job Progress

You can follow the progress of your job on the Matrix. Navigate to the self-driving-demo project. Under the Jobs tab, you can see the job you just started. The State column shows the job status.

For more details, click the little plus symbol next to the job. The "Events" tab provides a graphical representation of the startup progress of the job. Four circles allow you to see at a glance if your job has gathered all the resources it needs, or what is still missing.

  • The Pods circle shows how many of the required computational resources have been secured. In the image above, the job is waiting for resources to become available.

  • The Code Cloning circle represents the status of the code copy operation from the source (GitHub in this case).

  • The Data Cloning circle represents the progress of the data cloning operation.

  • The Process Start-Up circle shows you the overall status of the job.

Below the circles, you can find a list of worker machine statuses showing each machine's type, state, and name. Click the [+] button on a row to expand it and display detailed information about that worker's status.

The bottom of this view displays a list of events connected to the job, allowing you to track the steps of the experiment execution.

Connect to TensorBoard

Clusterone provides direct access to TensorBoard, TensorFlow's suite of visualization tools.

To add your running job to TensorBoard, toggle the "In TensorBoard" switch. Your job is now available on TensorBoard.

To access TensorBoard, click the TensorBoard button on the top bar. You can observe how well the model trains using the Training_Loss and Validation_Loss curves on the "Scalars" page of TensorBoard.

You can further examine a graph representation of the model on the "Graph" page.

Please note that TensorBoard only officially supports Chrome. If you have trouble displaying TensorBoard in Firefox, Safari, or another browser, try using Chrome instead.

Learn More

In this guide, you have learned how to set up your first project on Clusterone, how to run it, and how you can examine its results. What's next?

If you're looking for other use case examples, check out our tutorials page, where you'll find many more interesting examples.

If you want to learn more about a specific part of Clusterone, check out our Documentation Homepage with articles on all the details of running state-of-the-art distributed machine learning models on Clusterone.

Or jump right in and run your own project. If you have any comments, questions, or concerns, please don't hesitate to contact us. We'd love to hear from you!

Join our Slack to get support and tips from the community.