Cheap Data

I went to a tech talk at Counsyl last night called “Genomics at Web Scale.” [SV Angel is an investor in Counsyl.]

The following is above my pay grade to critique but here goes. There’s a strain of thought in computer science that “data beats algorithms.” That is, having more data is better than having a superior algorithm. For example, some contend that the winners of the Netflix Prize had a relatively simple algorithm but won because they had more data.

Imran Haque, Counsyl’s Director of Research, tweaked this claim. He said that cheap, abundant data leads algorithms: the data comes first and the algorithms follow, and that in turn can lead to technological (mainly software) breakthroughs.

In the mid-1990s, cheap text created by the Web led to better machine learning, machine translation, AI technologies, PageRank and other breakthroughs. In the mid-2000s, cheap image data created by cell phones, digital cameras and the like helped actualize arithmetic coding and effective computer vision, and quite possibly breakthroughs like self-driving cars.

Imran said that the next breakthrough is cheap molecular data. Sequencing one human genome (i.e., 6 billion bits!) cost $9MM in 2007 and $7K in 2013. Since 2007, these costs have been falling almost 5x faster than Moore’s Law. Because of this and advances in other fields like robotics, Imran’s claim is that this will be the most interesting area of computer science for the next 10 years.
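For anyone who wants to play with those numbers, here is a rough back-of-the-envelope sketch. It takes the two cost figures quoted above, assumes a steady exponential decline between them, and compares the implied halving time with Moore's Law. The 2-year doubling period is my assumption (1.5 years is also commonly quoted), and the ratio you get depends on how you define "faster," so it may not land exactly on the figure from the talk.

```python
import math

# Cost figures quoted in the post.
COST_2007 = 9_000_000  # dollars per genome in 2007
COST_2013 = 7_000      # dollars per genome in 2013
YEARS = 2013 - 2007

# Assumption, not from the talk: Moore's Law as a doubling every 2 years.
MOORE_DOUBLING_YEARS = 2.0


def halving_time_years(start_cost, end_cost, years):
    """Years for cost to halve, assuming a steady exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


# Total fold drop in sequencing cost over the period.
fold_drop = COST_2007 / COST_2013

# Implied halving time for sequencing cost.
seq_halving = halving_time_years(COST_2007, COST_2013, YEARS)

# Improvement Moore's Law alone would give over the same period.
moore_fold = 2 ** (YEARS / MOORE_DOUBLING_YEARS)

# One way to define "faster": ratio of doubling period to halving time.
speedup = MOORE_DOUBLING_YEARS / seq_halving

print(f"Sequencing cost fell {fold_drop:,.0f}-fold in {YEARS} years")
print(f"Implied halving time: {seq_halving * 12:.1f} months")
print(f"Moore's Law over the same period: {moore_fold:.0f}-fold")
print(f"Halving-rate ratio vs. Moore's Law: {speedup:.1f}x")
```

Changing MOORE_DOUBLING_YEARS to 1.5 shows how sensitive the comparison is to the baseline you pick.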

Would love any thoughts/comments here.