💥 GPT Excitement and AI Costs

data-meets-product
Author

Elias Nema

Published

July 20, 2020

A Week of GPT-3, Obviously

There were so many tweets, articles and general excitement that it became too much even for Sam Altman himself:

I’m, of course, impressed too. In fact, it was able to produce my most liked tweet 😞 (and the same happened to some better-known folks):

There are many great articles covering GPT-3 and its meaning for the future of humanity. You can easily find them, but probably don’t need to, because they pop up everywhere.

Hype aside, it’s a huge model and probably costs a fortune to run, but the product side is done in a very lean way. It’s at the market-validation phase, which it has nailed perfectly.

Probably, the next step would be to optimize costs and get the economics right for the rest of the internet. Which brings us to costs, a topic that is becoming more relevant than ever for the AI/ML community.

💰 Costs in AI

  • A very interesting conversation on last week’s TWIML AI podcast about model design optimization for hardware.
  • Another article describes a European retailer that cut prediction errors in its logistics optimization by three quarters with the help of deep learning, yet chose not to deploy the model because of costs.

Until very recently, DL has been driven by research at big companies, which means almost unlimited resources. That is great for validating and/or winning a market, but with time you need to get the unit economics right: for training, for serving, and for smaller devices (in 2017, a smartphone could run AlexNet for only about an hour).

Basically, there are several directions in ML optimization:

  • Incorporating power/energy/latency constraints into network architecture search. This “can bring 5-10x improvement in energy or latency with minimal loss in accuracy or can satisfy real-time constraints for inference”. Basically, by thinking about hardware constraints in advance, you can reach almost the same accuracy while saving an order of magnitude in energy or latency, an amazing trade-off for most businesses (a toy scoring loop is sketched after this list).
  • Quantizing neural networks. The idea here is to round model weights to the nearest power of 2, so that cheap shift-and-add operations can replace multiplications. This improves speed and lowers energy consumption: a very smart approach and, again, a trade-off well worth making (see the second sketch below).
  • Energy-aware pruning of NNs. Often both accuracy and latency matter to the application. This work lets you quickly iterate over the accuracy-vs.-speed trade-off to find the sweet spot for a particular application using model compression (the basic loop is sketched below).
  • Discretizing vectors over a d-dimensional sphere. A super-smart approach where, instead of adapting an index to the data, the data is adapted to the index itself: “We learn a neural network that aims at preserving the neighborhood structure in the input space while best covering the output space (uniformly)” (the discretization step is sketched below).
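
To make the first direction concrete, here is a minimal sketch of folding a latency budget into an architecture-search objective: candidates that overshoot the budget get penalized regardless of accuracy, so the search naturally prefers models that fit the target device. The candidate list, the accuracy and latency numbers, and the penalty weight are all invented for illustration, not taken from the referenced work.

```python
# Toy illustration: scoring architecture candidates under a latency budget.
# The candidates, their numbers, and the penalty weight are made up; a real
# NAS system would measure or predict accuracy and on-device latency.

LATENCY_BUDGET_MS = 20.0
PENALTY = 0.05  # accuracy points sacrificed per millisecond over budget

candidates = [
    # (name, validation accuracy, estimated latency on target device in ms)
    ("wide-deep", 0.78, 45.0),
    ("narrow-deep", 0.76, 22.0),
    ("narrow-shallow", 0.74, 12.0),
]

def score(accuracy: float, latency_ms: float) -> float:
    """Accuracy, minus a penalty for exceeding the latency budget."""
    overshoot = max(0.0, latency_ms - LATENCY_BUDGET_MS)
    return accuracy - PENALTY * overshoot

best = max(candidates, key=lambda c: score(c[1], c[2]))
print(best[0])  # -> "narrow-shallow": slightly lower accuracy, fits the device
```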
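
The power-of-2 quantization idea is also easy to sketch: each weight is rounded to the nearest signed power of two, so multiplying an activation by a weight reduces to a bit shift. A minimal NumPy illustration (real schemes also handle zero weights, scaling, and fine-tuning after quantization):

```python
import numpy as np

def quantize_pow2(w: np.ndarray) -> np.ndarray:
    """Round each weight to the nearest signed power of two.

    Multiplication by 2**k can then be implemented as a k-bit shift,
    which is far cheaper in hardware than a full multiply.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    mag = np.where(mag == 0, 1e-12, mag)   # avoid log(0); zeros stay zero via sign
    exponent = np.round(np.log2(mag))
    return sign * (2.0 ** exponent)

weights = np.array([0.37, -0.12, 0.9, 0.051])
print(quantize_pow2(weights))  # -> [ 0.5    -0.125   1.      0.0625]
```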

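For the pruning direction, exploring the accuracy-vs.-speed trade-off can be as simple as sweeping sparsity levels and measuring accuracy at each point; the energy-aware work replaces the naive magnitude criterion below with an energy model, but the shape of the loop is the same. The evaluate_accuracy function here is only a stand-in for a real validation run:

```python
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def evaluate_accuracy(weights: np.ndarray) -> float:
    """Stand-in for a real validation pass; here it just reports the
    surviving-weight fraction so the script runs end to end."""
    return float(np.mean(np.abs(weights) > 0))

weights = np.random.randn(1000)
for sparsity in (0.5, 0.7, 0.9):
    pruned = prune_by_magnitude(weights, sparsity)
    # In practice you would also measure latency/energy of the pruned model here
    print(sparsity, evaluate_accuracy(pruned))
```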
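
The last direction flips classical indexing around: rather than fitting a quantizer to the data, a learned network reshapes the data so that a fixed, uniform quantizer on the sphere works well. The learned network itself is beyond a snippet, but the discretization step can be illustrated: project vectors onto the unit sphere and snap each one to the nearest codeword of a fixed spherical code. The sign-pattern codebook below is chosen purely for simplicity and is not the lattice used in the paper:

```python
import numpy as np

d = 8  # dimension of the (already transformed) vectors

def to_sphere(x: np.ndarray) -> np.ndarray:
    """Project vectors onto the unit d-dimensional sphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def discretize(x: np.ndarray) -> np.ndarray:
    """Snap each unit vector to the nearest point of a fixed spherical code.

    Here the code is simply the normalized sign patterns {-1,+1}^d / sqrt(d),
    a stand-in for the lattice quantizer used in the actual work.
    """
    return np.sign(x) / np.sqrt(x.shape[-1])

vectors = np.random.randn(5, d)
codes = discretize(to_sphere(vectors))
print(codes[0])  # each vector is now one of 2**d points spread over the sphere
```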

These are just the directions I’ve seen recently, but the topic is becoming more and more important. If AI aims to turn into a new cloud, the industry needs to figure out how to scale the “state of the art” to the rest of the internet. And it looks like we are finally getting there.