MosaicML Explorer

Filters (601 total runs)

Baseline: GPT2-125m (A100)
Filter options: Dataset, Model, Recipe, Cloud, Hardware, Methods, Perplexity, Time, Cost

Training efficiency frontier

[Figure: two frontier plots. Perplexity (axis ticks 24 to 32) against Cost (axis ticks $50 to $250), and Perplexity against Time (axis ticks 26m 28s to 7h 47m). The baseline is marked on each axis. Legend: Frontier, Selected, Baseline.]
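A frontier of this kind is commonly read as a Pareto frontier: a run is on it when no other run is both cheaper and lower in perplexity. The page does not say how the Explorer computes its frontier, so the following is only a minimal sketch under that assumption. The `Run` type and `pareto_frontier` function are hypothetical names, the first four data points are taken from runs listed on this page, and the last point is invented purely to show a dominated run.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    cost_usd: float
    perplexity: float


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Keep runs that no other run beats on both cost and perplexity.

    Lower is better on both axes. Assumption: this is the usual Pareto
    definition, not necessarily the Explorer's own rule.
    """
    frontier = []
    best_perplexity = float("inf")
    # Walk from cheapest to most expensive; a run is kept only if it
    # improves on the best perplexity seen at any lower cost.
    for run in sorted(runs, key=lambda r: (r.cost_usd, r.perplexity)):
        if run.perplexity < best_perplexity:
            frontier.append(run)
            best_perplexity = run.perplexity
    return frontier


runs = [
    Run(145.83, 24.11),  # from this page
    Run(148.86, 23.94),  # from this page
    Run(136.21, 24.14),  # from this page
    Run(135.23, 24.17),  # from this page
    Run(200.00, 25.00),  # hypothetical: costs more and scores worse
]

frontier = pareto_frontier(runs)
for run in frontier:
    print(f"${run.cost_usd:.2f}  perplexity {run.perplexity}")
```

The same function can be applied to time instead of cost by swapping the first field, which corresponds to the second plot.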

Methods  Cost     Perplexity  Time     Model         Cloud                  Instance              Scale Schedule Ratio
3        $145.83  24.11       4h 27m   GPT-2 (125M)  Amazon Web Services    p4d.24xlarge          0.81
3        $148.86  23.94       5h 00m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.86
2        $149.83  23.84       5h 02m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.79
2        $163.46  23.61       5h 29m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.86
2        $177.08  23.51       5h 57m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.93
2        $190.70  23.49       6h 24m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  1.00
2        $136.21  24.14       4h 34m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.71
3        $135.23  24.17       4h 33m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.79
3        $121.60  24.51       4h 05m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.71
3        $107.97  24.95       3h 37m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.64
3        $94.35   25.53       3h 10m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.57
3        $80.72   26.33       2h 42m   GPT-2 (125M)  Google Cloud Platform  a2-highgpu-8g-a100-8  0.50
3        $76.29   26.7        2h 19m   GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.82
2        $70.59   26.96       2h 09m   GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.91
3        $65.04   27.13       1h 59m   GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.73
2        $61.00   27.18       1h 51m   GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.82
2        $51.42   27.63       1h 34m   GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.73
2        $41.84   28.36       1h 16m   GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.64
2        $32.25   29.35       59m 02s  GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.55
3        $31.30   29.76       57m 18s  GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.45
2        $22.67   30.6        41m 30s  GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.45
3        $20.23   31.31       37m 02s  GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.36
2        $14.46   32.51       26m 28s  GPT-2 (83M)   Amazon Web Services    p4d.24xlarge          0.36
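
The page reports a cost and a wall-clock time for each run but no hourly instance price. Dividing one by the other gives a rough implied rate, which can help when comparing runs across the two clouds. This is a sketch only: `parse_duration_hours` and `implied_hourly_rate` are hypothetical helper names, the displayed times appear to be rounded, and the result is an estimate rather than a published price.

```python
import re

# Seconds per unit for the duration strings shown on the page,
# such as "4h 27m" or "59m 02s".
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration_hours(text: str) -> float:
    """Convert a duration like '4h 27m' or '59m 02s' to hours."""
    parts = re.findall(r"(\d+)\s*([hms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    seconds = sum(int(number) * _UNIT_SECONDS[unit] for number, unit in parts)
    return seconds / 3600


def implied_hourly_rate(cost_usd: float, duration: str) -> float:
    """Estimate dollars per hour from a run's cost and displayed time."""
    hours = parse_duration_hours(duration)
    if hours == 0:
        raise ValueError("duration must be greater than zero")
    return cost_usd / hours


# Values copied from two runs listed on this page.
aws_rate = implied_hourly_rate(145.83, "4h 27m")
gcp_rate = implied_hourly_rate(148.86, "5h 00m")
print(f"p4d.24xlarge: about ${aws_rate:.2f}/h")
print(f"a2-highgpu-8g-a100-8: about ${gcp_rate:.2f}/h")
```

Because the times are shown to the minute (or second for short runs), the implied rate can differ slightly from run to run on the same instance type.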

© 2022 MosaicML
