Without data

You have the most fantastic, revolutionary idea, one that is going to change the world. Then you get stuck! It dawns on you that the data to train your model simply doesn’t exist.


This paradox of being without data will sound familiar to many. Why is this the case?

You’ve completed the training and done a tonne of experiments with known data. Everything seems fine, and ideas for how to use this newly acquired knowledge have been bubbling away in your subconscious.

Now you are ready to apply your idea to the real world.

Then reality hits! You simply don’t have the data you need to train a model that will enable your idea to flourish, and maybe even become your next startup.

The biggest challenge many AI startups face is finding the data they need.

There are basically three ways to get data:

  • download or acquire it from an external source;
  • generate it from a simulator; or
  • use sensors in the real world to build up your own source of data.

The last two are very expensive options. Simulators are essential for Deep Reinforcement Learning. Data collected from sensors still, for the most part, requires augmentation to make it useful in most Deep Learning scenarios, not to mention the need for labelling.
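To make "augmentation" concrete, here is a minimal sketch using only the Python standard library. It assumes each sample is a small grayscale image stored as a list of rows with pixel values between 0 and 1, paired with a label. The function names (`hflip`, `add_noise`, `augment`) and the choice of transforms are illustrative assumptions, not a recommended pipeline; in particular, a horizontal flip only preserves the label for some tasks.

```python
import random


def hflip(image):
    """Mirror a 2D grayscale image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def add_noise(image, rng, sigma=0.05):
    """Add Gaussian noise to every pixel, clamped to the [0, 1] range."""
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def augment(dataset, copies=2, seed=0):
    """Return the original (image, label) pairs plus `copies` noisy,
    randomly flipped variants of each one. Labels are carried over unchanged."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    out = list(dataset)
    for image, label in dataset:
        for _ in range(copies):
            variant = hflip(image) if rng.random() < 0.5 else image
            out.append((add_noise(variant, rng), label))
    return out


# Hypothetical one-sample data set: a 2x2 image labelled "cat".
sample = [([[0.0, 0.5], [1.0, 0.25]], "cat")]
augmented = augment(sample, copies=3)
print(len(augmented))  # 4: the original plus three variants
```

Augmentation like this stretches the data you already have, but it does nothing for labelling: every variant simply inherits the label of the sample it came from.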

So the first is the cheapest: find an existing source of labelled data. There are plenty of known data sets out there.

It all seemed to work so well in the training programs. But why is there not more data? Why is the set of available data sets so limited?

I’ve been asking myself those very questions of late. The best answer I’ve found is that academic researchers mostly use the existing sources of labelled data and make incremental improvements to deep learning algorithms. If they invested in creating new sources of data for new deep learning techniques, they probably wouldn’t get their research done. That is why everything you’ve learnt just works effortlessly. They’ve been at it for a while now.

So the next question becomes: how do I get the data?

You have to create your own data sets! Don’t underestimate the effort needed. Without data, you just have an idea!
