Privacy-Preserving Computation – Making Data A Factor Of Production In The Information Era
Data is an increasingly valuable asset for both businesses and individuals. Data is reshaping every aspect of human life. IDC Research estimates the revenue for global data analytics to reach 187 billion USD. In the past two years, the data industry has seen an unprecedented increase in demand for joint analysis and modeling across organizations.
However, because data is easy to copy, it is hard to trace once it has been shared.
As a result, the willingness to share and monetize data is severely restricted. In traditional big-data analysis, the centralized collection of data can also leak private information and create other risks.
The alternative, on-premises deployment, requires developers to deploy the model on the data source's servers on site, which is time-consuming and labor-intensive, and the algorithm itself may be leaked. Additionally, data privacy regulations are tightening, and the central government of China has issued a data regulatory statute that, for the first time, essentially treats data as a factor of production.
In the second half of 2019, the Chinese regulatory authorities issued several consultation drafts, such as Data Security Management Measures, Standards on Illegal Collection and Use of Personal Information on Apps, Trial Measures for the Protection of Personal Financial Information (Data), and several others.
To address the problems faced by data sharing, we believe that privacy-preserving computation technology can separate data ownership from data use and achieve the goal of “available but invisible.” Imagine that the parties involved in data analysis cannot see each other’s data, but can still perform analysis and even train models together. They then send only the final results to the party that pays for the data usage, which greatly reduces the risk of data leakage.
Privacy-preserving computation technologies such as secure multi-party computation (sMPC) and federated learning are maturing. They allow for closer and more secure data cooperation among institutions in finance, healthcare, and government affairs, where users’ privacy is involved. In addition, combining privacy-preserving computation with blockchain technology can ensure the credibility of the input data while keeping the calculation process invisible to the public. In this article, we will briefly introduce the principles of secure multi-party computation and then elaborate on some potential applications.
Secure multi-party computation (sMPC), also known as secure computation or privacy-preserving computation, is a significant branch of modern cryptography. It goes beyond encrypting data in storage and transmission and constructs operations directly on ciphertext data.
Specifically, sMPC is performed by multiple parties who do not trust each other. They jointly compute a function they have agreed on, while ensuring that no participant's private input is revealed to the others. The paper “Protocols for Secure Computations,” published by Andrew C. Yao in 1982, first posed the Millionaires' Problem and, with it, the concept of sMPC.
Today, sMPC technology is divided into two branches based on the underlying algorithms: one based on Garbled Circuit and the other based on Secret Sharing.
Professor Andrew Yao proposed the Garbled Circuit, which is used together with Oblivious Transfer. Put simply, a computation protocol based on Garbled Circuit is well suited to two-party operations and needs only a fixed number of communication rounds, but it is less scalable. In the other type of secure multi-party computation, based on secret sharing, both the input data and the intermediate values exist in the form of secret shares.
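As a rough illustration of the Garbled Circuit branch, the sketch below garbles a single AND gate. It is a teaching toy under stated assumptions, not a description of any production protocol: the function names (`garble_and_gate`, `evaluate`), the 16-byte labels, and the use of SHA-256 as the row cipher are all choices made for this example. The Oblivious Transfer step, through which the evaluator would normally receive the label for its own input bit, is left out.

```python
import hashlib
import secrets

LABEL_BYTES = 16
TAG = b"\x00" * 16  # zero padding that lets the evaluator recognise the valid row


def pad(label_a, label_b):
    """Derive a 32-byte one-time pad from the two input-wire labels."""
    return hashlib.sha256(label_a + label_b).digest()


def xor(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and_gate():
    """Garbler: pick a random label for each value (0 or 1) of each wire
    and encrypt the matching output label under every input combination."""
    wires = {
        name: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
        for name in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = wires["out"][bit_a & bit_b]
            key = pad(wires["a"][bit_a], wires["b"][bit_b])
            table.append(xor(key, out_label + TAG))
    # Shuffle so that a row's position does not reveal the input bits.
    secrets.SystemRandom().shuffle(table)
    return wires, table


def evaluate(table, label_a, label_b):
    """Evaluator: holds one label per input wire and learns only one output label."""
    key = pad(label_a, label_b)
    for row in table:
        plain = xor(key, row)
        if plain[LABEL_BYTES:] == TAG:
            return plain[:LABEL_BYTES]
    raise ValueError("no row could be decrypted with these labels")
```

The evaluator sees only random-looking labels; the garbler, who knows which output label stands for 0 and which for 1, is needed to translate the result back into a bit.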
Secret sharing divides the private data into two or more random shares and distributes them to the participating parties. This protects data privacy while still allowing multiple parties to compute on the data jointly.
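One common way to split a value is additive secret sharing, sketched below. This is a minimal example, assuming shares are integers modulo a fixed prime; the modulus and the function names `split_secret` and `reconstruct` are illustrative choices, not taken from any particular sMPC product.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative prime; all shares are integers modulo this value


def split_secret(secret, n_parties):
    """Split `secret` into n random shares that add up to it modulo MODULUS."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret.
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any smaller subset looks like random numbers."""
    return sum(shares) % MODULUS
```

For instance, `reconstruct(split_secret(42, 3))` gives back 42, while each of the three shares on its own is a uniformly random number.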
After that, participants can use the homomorphic properties of the secret shares to compute on them, and then reconstruct the shares to obtain the result of the computation on the private data. sMPC based on secret sharing is currently led by cryptographers in Europe, such as Professors Nigel Smart and Ivan Damgård. It is highly scalable and theoretically supports computation among an unlimited number of parties. Its computational efficiency is relatively high, but its communication overhead is heavy.
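The simplest case of computing on shares is addition: with additive sharing, adding shares locally is the same as adding the underlying secrets. The sketch below simulates several parties in one process to show the data flow; `secure_sum` and the modulus are assumptions made for this example, and a real deployment would send the shares over a network.

```python
import secrets

MODULUS = 2**61 - 1  # illustrative prime modulus


def split_secret(secret, n_parties):
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    return shares + [(secret - sum(shares)) % MODULUS]


def secure_sum(private_inputs):
    """Each party shares its input; the total is computed without revealing any input."""
    n = len(private_inputs)
    # Row i holds the shares produced by party i; column j is what party j receives.
    dealt = [split_secret(x, n) for x in private_inputs]
    # Each party j locally adds up the shares it received.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published; together they reconstruct the total.
    return sum(partial_sums) % MODULUS
```

With inputs 120, 75, and 305 held by three institutions, `secure_sum([120, 75, 305])` returns 500. Multiplication on shares needs extra interaction between the parties, which is one source of the communication overhead mentioned above.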
In June 2019, the China Academy of Information and Communication Technology, a unit directly under the Chinese Ministry of Industry and Information Technology, officially released the industry standard “Technical Requirements and Test Methods for Data Circulation Products Based on Secure Multiparty Computation,” with ARPA, Alibaba, Ant Financial, Baidu, and other companies participating in its formulation. Internationally, the establishment of the IEEE international standard for Secure Multiparty Computing and of the MPC Alliance also demonstrates the extension of privacy-preserving computation from academia into industry.
Below, we share a few application settings for privacy-preserving computation that are worth exploring.
Secure blacklist queries in finance and insurance
A blacklist is primarily used to record the bad behavior of individuals or corporate customers.
Each institution maintains a blacklist covering multiple businesses, from small businesses to multinational institutions, and a range of aspects from financial transactions to credit records. Sharing and querying blacklists between organizations helps them avoid risks such as long-term borrowing and lending and long-term insurance fraud.
However, sharing blacklists in clear text not only compromises user privacy but also reveals business secrets. Blacklist queries based on privacy-preserving computation can reduce institutional risk while protecting privacy. This type of computation compares two lists and finds the overlapping entries. The process should ensure that no participant obtains information other than the result, and that the queried party does not learn the query conditions.
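One well-known way to compare two lists without exchanging them in clear text is private set intersection based on Diffie-Hellman-style blinding: each side raises hashed identifiers to its own secret exponent, and identifiers match only after both exponents have been applied. The sketch below is a toy under stated assumptions: the 127-bit modulus is far too small for real use, both parties run in one process, and `private_intersection` and the other names are made up for this example.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime used as a toy group modulus (NOT secure)


def hash_to_group(item):
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1  # value in [1, P-1]


def random_exponent():
    """Secret exponent coprime to P-1, so that blinding never merges distinct items."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def private_intersection(query_list, blacklist):
    key_q = random_exponent()  # known only to the querying institution
    key_b = random_exponent()  # known only to the blacklist holder
    # Querier -> holder: query identifiers blinded with key_q.
    q_once = [pow(hash_to_group(x), key_q, P) for x in query_list]
    # Holder -> querier: the same values blinded again (order kept),
    # plus the holder's own blacklist blinded with key_b.
    q_twice = [pow(v, key_b, P) for v in q_once]
    b_once = [pow(hash_to_group(x), key_b, P) for x in blacklist]
    # Querier: finish blinding the blacklist and compare doubly blinded values.
    b_twice = {pow(v, key_q, P) for v in b_once}
    return [x for x, v in zip(query_list, q_twice) if v in b_twice]
```

Calling `private_intersection(["id-1", "id-2", "id-3"], ["id-3", "id-9", "id-1"])` returns `["id-1", "id-3"]`. In this variant the querying side learns which of its entries are blacklisted, while the blacklist holder sees only blinded values and the number of queries.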
Marketing conversion rate calculation
Conversion rate calculation, like the blacklist query above, is a private set intersection problem. Two companies hold data sets of active users in their respective business spheres. One side has a list of users related to a first activity, such as users who viewed advertisements on the Internet. The other side has a list of users who made transactions in a second activity.
For example, imagine a scenario where some of these users have purchased the advertised merchandise, and the value associated with each user (such as the user’s expenditure) needs to be shared anonymously. One of the parties wants to know the number of users the two lists have in common and the sum of the associated values, but does not wish to share any further data. An advertiser might ask, for instance:
“What is the total consumption of men under the age of 30?”
They can use privacy-preserving computation to obtain the intersection without revealing their data or query conditions, and to measure indicators such as conversion rates.
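A question like the one above can be answered with an "intersection-sum" style protocol: blinded identifiers find the overlap, and additively homomorphic encryption (here a toy Paillier scheme) lets one side add up encrypted spend values without seeing them. The sketch below is only an illustration under stated assumptions: the primes are far too small for real use, both parties run in one process, and all names (`intersection_sum`, `encrypt`, `decrypt`) are invented for this example. The advertiser is assumed to have already filtered its list locally to the segment of interest.

```python
import hashlib
import math
import secrets

P = 2**127 - 1                      # toy group modulus for blinding IDs (not secure)
PRIME_1, PRIME_2 = 2**61 - 1, 2**89 - 1   # toy Paillier primes (far too small)
N = PRIME_1 * PRIME_2
N2 = N * N
PHI = (PRIME_1 - 1) * (PRIME_2 - 1)
MU = pow(PHI, -1, N)                # modular inverse, needs Python 3.8+


def hash_to_group(item):
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent():
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def encrypt(m):
    """Paillier encryption with generator N+1; only the merchant can decrypt."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * MU % N


def intersection_sum(ad_viewers, purchases):
    """ad_viewers: IDs held by the advertiser; purchases: ID -> spend held by the merchant."""
    k_adv, k_mer = random_exponent(), random_exponent()
    # Advertiser -> merchant: viewer IDs blinded once.
    viewers_once = [pow(hash_to_group(u), k_adv, P) for u in ad_viewers]
    # Merchant -> advertiser: viewer IDs blinded twice (as an unordered set),
    # plus its own buyers blinded once, each paired with the encrypted spend.
    viewers_twice = {pow(v, k_mer, P) for v in viewers_once}
    buyers_once = [(pow(hash_to_group(u), k_mer, P), encrypt(spend))
                   for u, spend in purchases.items()]
    # Advertiser: finish blinding, then multiply ciphertexts of matching rows,
    # which adds the underlying plaintexts.
    count, enc_total = 0, encrypt(0)
    for blinded_id, enc_spend in buyers_once:
        if pow(blinded_id, k_adv, P) in viewers_twice:
            count += 1
            enc_total = enc_total * enc_spend % N2
    # Merchant: decrypts the aggregate only, never an individual match.
    return count, decrypt(enc_total)
```

With viewers `["u1", "u2", "u3", "u4"]` and purchases `{"u2": 30, "u4": 45, "u7": 100}`, the function returns `(2, 75)`: two converted users and a combined spend of 75. The conversion rate then follows by dividing the count by the number of ad viewers.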
Multi-dimensional risk control model
Credit risk control needs to collect data from multiple data sources in order to run models such as decision trees, logistic regression, and random forests.
Since the second half of 2019, China has been rectifying big data companies that illegally collect and sell personal data, and raw data is becoming harder to obtain. Privacy-preserving computation can connect peers and companies across sectors in a compliant way to form a data alliance, allowing them to perform distributed model inference or training without leaking any party's input data. This effectively reduces the risks of long-term credit and fraud. At the same time, the parameters of the risk control model are not exposed during the calculation, which protects the intellectual property rights of the model provider.
If the information era is a sturdy building resting on a foundation of data, then privacy-preserving computation is the elevator of this building.
The author believes that privacy-preserving computation is still in its infancy in China.
It is foreseeable that, as the country's regulation of private data is strengthened, enterprises and individuals will pay more attention to the value of their own data. Privacy-preserving computation is poised for rapid growth over the next decade.
At present, the privacy-preserving computation industry includes not only large enterprises such as Ant Financial, Baidu, and Weizhong Bank (WeBank) but also technically strong startups such as ARPA and Huakong Qingjiao. National research institutions, such as the China Academy of Information and Communication Technology and the central bank, also play critical roles.
At ARPA, we look forward to taking part in the future development of privacy-preserving computation!