Computers are learning to do more than drive cars and predict your next online purchase; they are learning to compose music, translate languages in real time, invest your money, diagnose disease, read MRIs and ultrasounds, and even perform surgeries. The possibilities are truly incredible. Many of the brightest minds in engineering, computer science, medicine, and finance are engaged in one of these fields through academia, startup companies, and big corporations such as IBM, Google, and Amazon. Regardless of the field or forum, the limiting factor to the success of any innovation is the same: access to data.
“Machine learning,” a broad term that encompasses artificial intelligence, deep learning, and data mining, is “a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.” For any new machine learning application to become useful, it needs a massive amount of data to train the machine. This is where big tech firms like Facebook, Google, and Microsoft, which have access to abundant data, gain a competitive edge over smaller startups. “Smaller startups might have good ideas…they might be revolutionary, but without the data to make them work, we’ll never know.” Therefore, success often goes not to the individual with the best algorithm or model, but to the individual with the most data available.
Some have referred to data as “digital gold” because, in certain fields like medicine, it is more than a competitive edge; it is a barrier to market entry. Before a healthcare startup can raise funds, it needs data (thousands upon thousands of medical records, X-rays, MRIs, and ultrasounds) to determine whether its method or algorithm works. Unfortunately, the majority of data, especially sensitive data like medical records, is not public: IBM estimates that 70 percent of the world’s data reside in private databases to protect privacy or trade secrets. With small startups unable to test their ideas and train their machines, many do not get off the ground, and the potentially groundbreaking idea remains nothing more than an idea.
Understanding the value of data, IBM has engaged in a series of billion-dollar acquisitions and now effectively owns the health-related data of approximately 300 million patients. Also in the game, Google has acquired many biotech startups and, most recently, struck a deal with the UK’s National Health Service to gain access to medical information on 1.6 million people. In addition to concerns about the security of sensitive patient information, some regulators and academics have also recognized the possible anti-competitive market implications of a few large companies owning data to the exclusion of others.
German, French, and British authorities have issued reports explaining that competition law may be violated when companies exclude competitors by refusing or limiting access to data that are an “essential facility” to the activities of competitors. Authorities also identified other anti-competitive practices, such as the use of exclusive contracts with third-party data providers or the tying of access to data to the use of the company’s own data analytics services. While EU officials believe that their current competition laws are sufficient to address anti-competitive effects relating to data collection and use, they continue to investigate and have not ruled out additional government action in the future. Although the United States tends to take a free-market approach to these issues, experts from both continents have begun to grapple with the possibility of developing a harmonized transatlantic approach to managing big data in order to optimize opportunities for scientific innovation and progress.