Clusterone offers a variety of instance types to run your models on. On the public platform, Clusterone provides AWS instances. Clusterone Enterprise supports instances from any cloud provider.
There are two main compute options on Clusterone:
blessed instances (default): reliable, with a 99.9% uptime guarantee
spot instances: can be interrupted at any time. Clusterone automatically restarts interrupted jobs.
Instances are named after their type. Spot instances are identified by a trailing "-spot":
<instance-type>[-spot]. Naming examples:
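As a minimal sketch of this naming pattern, the Python helpers below split a name into its instance type and a spot flag, and build a name from those two parts. The function names and the placeholder type "example-type" are hypothetical and used only for illustration; they are not part of the Clusterone API or its instance catalog.

```python
from typing import NamedTuple

SPOT_SUFFIX = "-spot"  # trailing marker described in the naming rule above


class InstanceName(NamedTuple):
    instance_type: str
    is_spot: bool


def parse_instance_name(name: str) -> InstanceName:
    """Split '<instance-type>[-spot]' into the type and a spot flag."""
    if name.endswith(SPOT_SUFFIX):
        return InstanceName(name[: -len(SPOT_SUFFIX)], True)
    return InstanceName(name, False)


def format_instance_name(instance_type: str, spot: bool = False) -> str:
    """Build a name following '<instance-type>[-spot]'."""
    return instance_type + (SPOT_SUFFIX if spot else "")


# "example-type" is a hypothetical placeholder, not a real instance type.
print(parse_instance_name("example-type-spot"))
print(format_instance_name("example-type", spot=True))
```

Checking only the end of the string means that instance types containing hyphens of their own are still handled correctly.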
The table below provides an overview of the instance types Clusterone offers on its public platform, along with the framework versions each one supports. For pricing information on these instances, see our pricing page.
PyTorch, TensorFlow up to v1.4
NVIDIA Tesla V100
PyTorch, TensorFlow 1.5 and above
For further information about each instance type, visit the AWS website.
Spot instances are spare capacity that is sold at a discount but can be interrupted at any time.
On AWS, a spot instance is lost when it is drained, and you have to bid for a new one. Clusterone bids for spot instances automatically and continuously to keep capacity available for you.
Interrupted jobs on spot instances are restarted automatically, so you can run jobs on spot instances without constantly monitoring them. When a spot instance is drained, Clusterone procures additional instances and resumes the job.
Provided you are using checkpoints, even large-scale, long-running workloads do not require any extra setup or monitoring when running on spot instances.
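To illustrate why checkpoints matter here, the following is a framework-agnostic sketch of a training loop that saves its progress periodically and picks up from the last saved step after a restart. It is an assumption-laden illustration, not Clusterone's mechanism: the function names, the JSON checkpoint format, and the `stop_at` parameter (which only simulates an interruption) are all hypothetical. In practice you would use your framework's own checkpointing facilities.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the target.

    The rename is atomic, so an interruption never leaves a
    half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def train(path, num_steps, checkpoint_every=10, stop_at=None):
    """Run a toy 'training' loop that resumes from the last checkpoint.

    stop_at is a hypothetical hook that simulates a spot interruption.
    """
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        if stop_at is not None and step == stop_at:
            return state  # simulated interruption: unsaved progress is lost
        state["total"] += step  # stand-in for one real training step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "checkpoint.json")
        train(ckpt, num_steps=100, stop_at=55)  # interrupted mid-run
        final = train(ckpt, num_steps=100)      # restarted job resumes
        print(final)
```

In this sketch, only the work done since the last checkpoint is repeated after a restart, so the checkpoint interval trades checkpointing overhead against the amount of recomputed work.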