Why Privacy-Preserving Data Mining
The application of machine learning (ML) to important problems in medicine and finance often results in an apparent contradiction: Training the models requires access to large and varied data sets under the industry or regulatory expectation that security and privacy will be preserved, even though the size and scope of the data collected make it attractive to hackers and increase the likelihood of malicious or even unintended privacy breaches. Recent news reports have highlighted data security and privacy failures (Armeding, 2018; Cameron, 2017; Subramanian & Malladi, 2020). To resolve this seeming contradiction and limit data leaks, a popular scheme obfuscates the raw data and applies machine learning to the transformed data, enabling data-driven discovery ("mining") of insights while ensuring that the data remain private. This scheme, which preserves privacy while maintaining data utility and modeling accuracy, is called privacy-preserving data mining (Thuraisingham, 2005).
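The obfuscate-then-mine idea can be illustrated with a minimal sketch. The example below uses Laplace-noise perturbation, one common obfuscation mechanism in the privacy-preserving data mining literature; it is not the specific scheme proposed in this article, and the function names and parameter values are illustrative assumptions. The data owner perturbs the raw features before release, and the analyst fits a model to the noisy data, recovering the underlying relationship approximately.

```python
import numpy as np

def perturb(data, epsilon=1.0, sensitivity=1.0):
    """Obfuscate raw values by adding Laplace noise (a common PPDM step).

    Smaller epsilon means stronger privacy but noisier data; the
    parameter names follow the usual differential-privacy convention.
    """
    rng = np.random.default_rng(0)
    scale = sensitivity / epsilon
    return data + rng.laplace(0.0, scale, size=data.shape)

# Toy data set held by the data owner: y = 2x + 1 plus small noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0.0, 0.05, size=200)

# The owner releases only the obfuscated features.
X_priv = perturb(X, epsilon=5.0)

# The analyst "mines" the transformed data: ordinary least squares
# on the noisy features, never seeing the raw values.
A = np.hstack([X_priv, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # fitted slope and intercept
```

With this noise level the fitted slope and intercept stay close to the true values (2 and 1), illustrating the utility-preserving side of the trade-off; shrinking `epsilon` degrades the fit, illustrating its cost.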
Given the popularity of AI (Siau & Wang, 2020; Wang & Siau, 2019), it is attractive to conceptualize a blockchain's proof-of-work mathematical problem as a data mining problem. However, proof-of-work is most compelling for blockchain use cases in which proof of access to resources serves as a proxy for proof of incorruptibility among untrusted potential validators (Nakamoto, 2018). Bitcoin and Ethereum are blockchain networks that exemplify this "trustless," "permissionless" context. Clearly, raw, un-obfuscated data cannot be provided to third-party validators (cryptocurrency miners) to do data mining on such open blockchains; miners may be trusted to perform transparent, straightforward validation, but they cannot be trusted with raw data. Hence, our proof-of-useful-work (PoUW) solves a privacy-preserving data mining problem, not a generic data mining problem using raw data.