Why A.I. Needs Blockchain To Accelerate Its Growth

Innovation, Opinion | July 9, 2018

Data is so important to the development of artificial intelligence that some have posited that progress in A.I. has accelerated recently because there are finally large enough datasets on which to hone the programs. At a technical level, the data requirements for training are enormous: even a relatively simple technique like a decision tree usually needs at least 10 training examples for every decision node, and there can be tens of thousands of decision nodes, so a single model can require hundreds of thousands or even millions of datapoints.
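The arithmetic behind that claim can be sketched in a few lines. The numbers below are illustrative assumptions built on the article's rule of thumb (roughly 10 examples per decision node), not measurements from any real system:

```python
# Back-of-the-envelope estimate of training-data needs for a decision tree,
# using the rule of thumb from the text: ~10 examples per decision node.
EXAMPLES_PER_NODE = 10  # rule-of-thumb minimum; an assumption, not a law

def min_training_examples(num_decision_nodes: int,
                          examples_per_node: int = EXAMPLES_PER_NODE) -> int:
    """Minimum training examples for a tree with the given node count."""
    return num_decision_nodes * examples_per_node

# A large tree with 100,000 decision nodes already demands a
# million-example dataset under this heuristic.
print(min_training_examples(100_000))  # → 1000000
```

The point is not the exact multiplier but the scaling: data requirements grow linearly with model complexity, which is why only dataset-rich companies have kept pace.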

Such voluminous data requirements are reflected in the industry’s leaders. Small A.I. startups are commonly limited in scope; meanwhile, the most astonishing recent advances in A.I. have been driven by the largest tech companies, such as Google and Facebook. Part of the reason is that A.I. development is very expensive and these companies have deep pockets, but just as important is the amount of data produced by search, social, and e-commerce businesses. Google and Facebook (and Microsoft and Amazon) have access to huge quantities of data from their users.

And yet, despite the importance of so-called Big Data for the development of A.I., the quality of the data is even more important than the quantity, and there’s one reason: bias. If the data isn’t high-quality, the conclusions drawn from it won’t be, either. Even in these nascent stages, we’ve already seen examples of data-related misinterpretations, which have resulted in headlines like: “Google Mistakenly Tags Black People as ‘Gorillas,’ Showing Limits of Algorithms” or “Twitter taught Microsoft’s AI chatbot to be a racist ***hole in less than a day.”

A.I. developers must be vigilant about several aspects of data quality, and perhaps the most important pertains to unstructured data. A recent, significant evolution in data management has been the ability to teach programs how to understand “messy” information like voice messages, Tweets, or pictures, rather than data contained in a format the program already knows, such as an Excel file. The goal is that a machine will see a picture of the Empire State Building, for instance, and hear it mentioned on an audio file, and understand it’s the same thing.

However, we are only in the beginning phases, and while progress has been made on the meaning of unstructured data, machines remain unsophisticated about judging data’s origins. Put it this way: would you give the same weight to a study about smoking by a cigarette company as to one by a hospital? More abstractly, asking why the information has been presented is often just as crucial to reaching a decision. When a person is presented with certain choices, he or she can question not only which choice is best, but why these are the choices in the first place.

Trusting large datasets further requires a thorough vetting of the underlying methodology. For example, if a learning machine were to study polling data, but the polling questions were biased and influenced the respondents’ answers, the patterns inferred by the machine could become biased as well. This is supremely important because we are trusting our health, wealth, homes – and soon, everything else – to decision-making machines.

Given how vital large amounts of good data are to the creation of A.I., I think it’s worth discussing whether that data should be the domain of a single entity like a corporation. As we’ve already seen, even well-intentioned engineering teams can miss small data miscues that scale into costly mistakes. “Costly” is the key word, too, because A.I. tends to be a bit of a “black box”: if a problem is found, often the only fix is to start from scratch.

Hence, for proper growth, it’s crucial that we endeavor to meld A.I. with the blockchain, as the latter has two big advantages: 1) it’s transparent, so not just a team but the entire world can understand what data is being presented and how; 2) blockchain is decentralized, so there can be no manipulation, intentional or unintentional, during the training process. A decentralized approach seems especially appropriate because when we say data, we’re often talking about the data of people, who deserve to know how their information is being used.
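To make the transparency point concrete, here is a minimal sketch of how training-data provenance could be made auditable by chaining content hashes, the core mechanism behind a blockchain ledger. This is an illustration of the idea only; the record fields and function names are hypothetical and do not correspond to any specific blockchain platform:

```python
# Illustrative sketch: chain the hashes of training-data records so that
# any later tampering with a record breaks every subsequent link.
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Deterministic SHA-256 hash of one data record."""
    return hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()

def append_block(chain: list, record: dict) -> None:
    """Append a block linking this record's hash to the previous block."""
    prev = chain[-1]["block_hash"] if chain else "0" * 64
    record_hash = fingerprint(record)
    chain.append({
        "record_hash": record_hash,
        "prev_hash": prev,
        "block_hash": hashlib.sha256((record_hash + prev).encode()).hexdigest(),
    })

chain = []
append_block(chain, {"image": "empire_state.jpg", "label": "building"})
append_block(chain, {"image": "skyline.jpg", "label": "city"})

# Anyone can re-verify the chain: altering one record changes its hash
# and invalidates every block after it.
for i, block in enumerate(chain):
    expected_prev = chain[i - 1]["block_hash"] if i else "0" * 64
    assert block["prev_hash"] == expected_prev
```

On a public ledger this verification can be run by anyone, which is what makes the training data auditable by "the entire world" rather than a single team.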

Encouragingly, it looks like this grand melding might naturally transpire. While we are entrusting A.I. with so many important functions, so too are we discovering ever-new uses for blockchain. Data-intensive blockchain efforts are underway globally: the U.S. state of Delaware is studying how to record securities issuances on the blockchain; the Republic of Georgia is using blockchain to certify and transfer property ownership. With each passing day, a synthesis of the two technologies seems less fantastical and more inevitable.

Even in our early attempts at A.I., we’ve made amazing progress, but as humans hand off more essential tasks, the sausage-making behind machines’ decisions will become more important. A.I. not recognizing someone during a test simulation is one thing; misprescribing medicine, or misdirecting air traffic, because of data-related errors, is quite different. What you put in is what you get out – in this respect, A.I. and people are the same.

About the author: Dr. Dongyan Wang is Chief A.I. Officer at DeepBrain Chain, the world’s first A.I. computing platform powered by the blockchain. Dr. Wang has almost 20 years of experience in A.I. and data science, including at several Fortune 500 companies. Among other accomplishments, Dr. Wang has received numerous industry awards, and has dozens of A.I.-related patents granted or pending.