I've been using Amazon's GPU instances to run deep neural networks, and have been quite impressed by the ease and cost. Spot instances, where your instance is terminated if the spot price rises above your maximum bid, often go for 5-10x less than on-demand instances. Recent advances in deep learning have relied on large neural networks running on high-end GPUs, but buying your own hardware is expensive and a big lock-in. GPU instances are quite affordable and scalable, and spot prices are reliable enough to minimize the risk of having a training run cut short.

One mystery struck me when I was looking for the best region to run these instances in. Who is spending 100x the going spot price on g2.2xlarge instances in US East?

g2.2xlarge spot price in US East

Beginning on May 7, and ending on May 29, the spot price for g2.2xlarge instances in us-east-1e was $6.00.

I tried restricting my bid to other availability zones:

aws ec2 request-spot-instances --spot-price 0.50 --launch-specification "{\"KeyName\": \"my-key\", \"SecurityGroups\": [\"myip\"],\"ImageId\": \"ami-0f53a04b\",\"InstanceType\": \"g2.2xlarge\", \"Placement\":{\"AvailabilityZone\":\"us-east-1a\"}}"

I found they had no instances at all.
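To compare zones before bidding, the recent spot price in each availability zone can also be pulled programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The function names `latest_price_by_zone` and `fetch_spot_history` are my own for illustration, and the fetch only reads the first page of results.

```python
from datetime import datetime, timedelta, timezone


def latest_price_by_zone(records):
    """Return {zone: price} using the most recent record per availability zone."""
    latest = {}
    for rec in records:
        zone = rec["AvailabilityZone"]
        if zone not in latest or rec["Timestamp"] > latest[zone]["Timestamp"]:
            latest[zone] = rec
    # The API reports SpotPrice as a string, so convert it to a float.
    return {zone: float(rec["SpotPrice"]) for zone, rec in latest.items()}


def fetch_spot_history(region, instance_type="g2.2xlarge", hours=24):
    """Fetch recent spot price records for one region (first page only)."""
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("ec2", region_name=region)
    response = client.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc) - timedelta(hours=hours),
    )
    return response["SpotPriceHistory"]
```

Calling `latest_price_by_zone(fetch_spot_history("us-east-1"))` should give one price per zone, which makes an outlier zone easy to spot.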

Compare the $6.00 price to the going price of $0.065 in us-west-1 and other regions. On May 29, the price dropped to $2.60, still forty times the rate in us-west-1. On July 23, the price briefly spiked back up to $5.00, then dropped to the on-demand price, $0.65.

Why would anyone pay ten times the on-demand rate, and one hundred times the going spot price? On-demand instances would be much more reliable, as Amazon can drop your spot instance whenever it wants. Moreover, this customer is taking the entire available supply for US East, Amazon's largest region. Such a huge customer would also be able to make a long-term deal for a lower rate.
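The "ten times" and "one hundred times" figures are rounded. Here is a quick check of the arithmetic, using only the prices quoted above. The helper name `price_ratio` is just for illustration.

```python
SPIKE_PRICE = 6.00       # us-east-1e spot price during the spike, $/hour
ON_DEMAND_PRICE = 0.65   # on-demand price, $/hour
NORMAL_SPOT_PRICE = 0.065  # going spot price in us-west-1, $/hour


def price_ratio(price, baseline):
    """How many times larger `price` is than `baseline`."""
    return price / baseline


print(f"{price_ratio(SPIKE_PRICE, ON_DEMAND_PRICE):.1f}x on demand")      # prints 9.2x on demand
print(f"{price_ratio(SPIKE_PRICE, NORMAL_SPOT_PRICE):.1f}x normal spot")  # prints 92.3x normal spot
```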

Looking at these charts, you might notice occasional spikes to $5.00 or $6.00. As this forum thread indicates, these price spikes have been going on for a while. Apparently they are caused by companies that don't want to lose their instances, and are willing to occasionally pay significantly more than the market rate in exchange for a solution that is cheaper in the long term. Sometimes the spot supply shrinks, and all that is left is a few high-end bidders. In many cases they would have saved money, but a spike lasting ten weeks makes this strategy financially inefficient. So why did this happen? Here are a few theories.

  • Some company or government set their bid far higher than they expected the price to ever reach (>=$6.00) to avoid being outbid and losing their instance.
    • Then someone else did the same thing (=$6.00).
    • They accidentally bid against themselves.
  • They simply didn't notice that the price never came back down after the spike, and have been paying enormous prices for months.
  • The spot instance availability for this entire time period dropped so much that only a few instances are being paid for. Still unnecessarily expensive, but if they are running something that can't be interrupted, it might make sense. I still recommend not running critical, month-long operations on spot instances.
  • Someone thought they were bidding in cents, not dollars.
  • They really are so price-insensitive that they don't care about expanding to other regions, using on-demand, reserved, or g2.8xlarge instances, or buying their own hardware.

  • It's something Amazon is doing.

    • Maybe Amazon sources on-demand instances from the spot pool by simply bidding very high. This would be an unusual hack, but it's not unthinkable.
    • Maybe Amazon doesn't actually have any g2.2xlarge instances in the US East spot pool. They did have them in the on-demand pool.
    • It's a bug.
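For the high-bid theories above, a rough break-even calculation shows how little spike time it takes before plain on-demand instances would have been cheaper. This is a sketch under simplifying assumptions that are mine, not Amazon's: the price sits at only two levels (normal or spike), and the bidder pays the market price for each hour. The function names are hypothetical.

```python
def blended_hourly_cost(normal, spike, spike_fraction):
    """Average $/hour when `spike_fraction` of the hours are billed at the spike price."""
    return (1 - spike_fraction) * normal + spike_fraction * spike


def break_even_spike_fraction(normal, spike, on_demand):
    """Fraction of hours at the spike price where spot cost equals on-demand cost."""
    return (on_demand - normal) / (spike - normal)


# Prices quoted in this post, in $/hour.
fraction = break_even_spike_fraction(normal=0.065, spike=6.00, on_demand=0.65)
print(f"break-even spike fraction: {fraction:.1%}")
```

With these prices the break-even point is a little under 10% of running time, so a bidder who sits through a ten-week spike needs a very long stretch at the normal price to come out ahead.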

If you have any insight, let me know. I'm asking Amazon on Twitter, and I'll update this post if I learn something new.