This guide will get you started with the Clusterone platform on an on-premise, VPC, or other type of enterprise installation. If you're using the SaaS version of Clusterone at clusterone.com, see this page.
Before you start, make sure you have the following things ready to go:
The web address to your Clusterone installation
The username and password of your Clusterone account
The AWS credentials related to your Clusterone account. You can find them on the Accounts page in the Matrix (if available).
The Clusterone Python package. Install it with `pip install clusterone`.
The Clusterone command line interface, called `just`, is installed automatically with the Clusterone Python package. Clusterone also provides a graphical web interface, the Matrix.
Linking your GitHub account allows you to access GitHub repositories from within Clusterone. To do this, you need to create a GitHub access token and add it to your Clusterone account.
Log into your GitHub account and navigate to the Personal Access Tokens page in the developer settings. Generate a new token and grant it the required scope.
Copy the token when it's created.
Log into your Clusterone account and open the Matrix. On the Account page, select the Keys tab. Click the Add GitHub OAuth Token button and paste the access token you created above. Click Save to store the token.
Perfect! You have successfully linked your GitHub account to Clusterone.
For more information on linking GitHub to Clusterone, see here.
If your installation address starts with `http` instead of `https`, run this command first:
just config tls disable
Now, point the `just` command line interface to your Clusterone installation:
just config endpoint https://example.clusterone.io/
Finally, log into your account:
just login
The pre-uploaded datasets on the SaaS version of Clusterone are not available on your installation. Instead, we'll explain how to upload your own dataset here.
The easiest way to use data with Clusterone is to upload it to an AWS S3 bucket.
First, create a dataset on Clusterone:
just create dataset s3 example-dataset-name
Note that the dataset name can only include letters, numbers, and hyphens. Because the name is also used for the underlying S3 bucket, it has to be unique across all of AWS.
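Since the dataset name doubles as an S3 bucket name, you can check a candidate name locally before creating the dataset. Here is a minimal sketch; it covers only the rules stated above (letters, numbers, and hyphens), while global uniqueness can only be verified by AWS itself:

```python
import re

def is_valid_dataset_name(name: str) -> bool:
    """Check the naming rules stated above: only letters, numbers, and hyphens.

    Uniqueness across all of AWS cannot be checked locally, so this
    validation is necessarily partial.
    """
    return bool(re.fullmatch(r"[A-Za-z0-9-]+", name))

print(is_valid_dataset_name("example-dataset-name"))  # True
print(is_valid_dataset_name("my_dataset"))            # False: underscore not allowed
```
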
Next, configure the AWS CLI:
aws configure --profile example-profile-name
Use the AWS credential keys you obtained from the Matrix. Leave the other fields blank.
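For reference, `aws configure --profile example-profile-name` stores the keys in a profile section of `~/.aws/credentials` that looks like the following (the values shown are placeholders, not real credentials):

```ini
[example-profile-name]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```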
Now, upload your dataset to the S3 bucket:
aws s3 sync /local/path/ s3://example-dataset-name/ --profile example-profile-name
Log into the Matrix and toggle the switch on the left to show your projects. Click the Add Project button and select Link GitHub Repository in the wizard:
On the next screen, type the name of the repository you want to use. Click the button at the bottom right to create the project.
To learn more about other ways to create a project, see here.
To create a job, use the following command:
just create job distributed --project project-name --dataset example-dataset-name --module script-name --name job-name
Let's go over the parameters:
Here we are creating a distributed job, using multiple GPUs in parallel. If you'd rather run your code on a single machine, use `just create job single ...` instead.
The `--project` parameter determines the project you want to run code from. You can only run code from one project per job.
The `--dataset` parameter tells Clusterone which dataset to mount. You can add multiple datasets if your code uses them.
The `--module` parameter defines which Python file Clusterone should execute. If this parameter is not provided, Clusterone assumes the module is called `main`.
The `--name` parameter gives the job a name. Use this name to refer to the job in the `just start job` command below.
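To see how these parameters fit together, here is a small Python sketch that assembles the job-creation command from its parts. The parameter names are the ones listed above; `build_job_command` is only an illustrative helper, not part of the `just` CLI:

```python
def build_job_command(project, name, datasets, module=None, distributed=True):
    """Assemble a `just create job` command from its parameters.

    Illustrative helper only; run the resulting string in your shell.
    """
    cmd = ["just", "create", "job", "distributed" if distributed else "single"]
    cmd += ["--project", project]
    for dataset in datasets:       # multiple datasets are allowed
        cmd += ["--dataset", dataset]
    if module is not None:         # optional; omit to run the default module
        cmd += ["--module", module]
    cmd += ["--name", name]
    return " ".join(cmd)

print(build_job_command("project-name", "job-name",
                        ["example-dataset-name"], module="script-name"))
# just create job distributed --project project-name --dataset example-dataset-name --module script-name --name job-name
```
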
To learn more about available parameters, see here.
Finally, all that's left to do is start the job:
just start job -p project-name/job-name
The `-p` parameter determines which job to start.
Once started, the job gathers the necessary resources and begins running as soon as they are all available.
You can follow the progress of your job on the Matrix. Click the "See Details" button under the name of your job to see how it's doing.
The "Events" tab provides a graphical representation of the startup progress of the job. Four circles allow you to see at a glance if your job has gathered all the resources it needs, or what is still missing.
The Creation Status circle tells you whether the job has been created.
The Computational Requirements circle lists all required workers and parameter servers. It also shows whether the workers are running or whether your job is still waiting for workers to become available.
The Code Cloning circle tells you whether the repository code has been successfully cloned onto the worker machines.
The Process Start-Up circle represents the overall status of the job. Once the job has started, it will say "Running".
The "Outputs" tab contains a list of all raw output files that are generated while running the job. Here you can find the log files for each worker, event logs, and more. Click on each file to open it, or follow the download link on the right to download the file.
Clusterone provides direct access to TensorBoard, TensorFlow's suite of visualization tools.
To add your running job to TensorBoard, click the "Add to TensorBoard" button. Your job is now available on TensorBoard.
To access TensorBoard, click the TensorBoard button on the top bar. You can observe how well the model trains using the `Validation_Loss` curves on the "Scalars" page of TensorBoard.
You can further examine a graph representation of the model on the "Graph" page.
In this guide, you have learned how to set up your first project on Clusterone, how to run it, and how you can examine its results. What's next?
If you're looking for another use case example, you can follow our DCGAN tutorial. In this more complex example, we run a Deep Convolutional GAN and generate artificial celebrity faces based on the CelebA dataset.
If you want to learn more about a specific part of Clusterone, check out our Documentation Homepage with articles on all the details of running state-of-the-art distributed machine learning models on Clusterone.
For any questions that come up, don't hesitate to contact us; we're here to help!