When it comes to machine learning, is privacy possible?

We’re seeing the advance of AI everywhere we look—from ordinary applications like our voice-operated virtual assistants, to extraordinary innovations that are changing how we optimize operations, predict behaviour, and even diagnose disease.

This presents an unprecedented opportunity for Canada to become a global leader in technology. Canada already produces some of the most sought-after AI talent in the world and has committed upwards of $1 billion in investment.

However, along with opportunity, AI comes with many risks. Among the major concerns is how AI technologies will require new ways of thinking about privacy and policy. Failure to adapt not only puts our personal privacy rights at risk, but could also stifle innovation in AI and, by extension, high-value job creation and economic growth for Canada.

I have thought a lot about the emerging challenges introduced by machine learning, how we can protect privacy while still enabling innovation, and how we might even leverage AI to improve privacy protection. After watching this industry closely for some time and speaking to my colleagues at length, here are three lessons I’ve learned.

1. AI amplifies existing privacy issues and creates an entirely new set of concerns

Too often we see that the privacy challenges cited for AI are the same ones identified for big data: the ability to re-identify personal information with large data sets; the challenge of only using the minimum personal information required, in a way that is consistent with the purposes for which it was collected; and the lack of transparency around how consumer data is used, to name a few.

What is so different about these privacy challenges in the context of AI? They are the same privacy problems, but on a much bigger scale. The amount of data that machine learning algorithms can now process is unprecedented for three key reasons:

- New algorithms read far more data when making predictions. Where traditional models can typically handle 10-50 carefully selected variables, new machine learning algorithms can sort through tens of thousands of variables for a single prediction.
- The quantity of data that these models learn from can be extraordinarily large. For example, Baidu uses over 10 decades of audio data to train its voice recognition algorithm.
- Deep learning has opened the door to mining unstructured data (e.g. images, audio, text, video). Unstructured data accounts for an estimated 80% of enterprise data, making it extremely difficult, yet imperative, for companies to ensure the security and confidentiality of the information they collect.

As algorithms make inferences based on such large samples, there is a higher likelihood of sensitive information being unintentionally included. It is also much more likely that AI, having access to such massive stores of data, will be capable of discerning that information.

At the same time, it is challenging to review and de-identify all of the data the algorithms are privy to, and it is proving difficult to audit the decisions and conclusions that algorithms derive, as the most complex models operate as a “black box”. Policy is already notorious for lagging behind innovation; now, with AI development moving so quickly, it risks being left in the dust, along with the people who are supposed to hold these systems to account.

While these problems become increasingly delicate and complicated, new privacy issues are arising that have never been dealt with before. Matthew Killi, CCO at Dessa, describes these as falling into two major buckets:

Inferring sensitive personal information from seemingly harmless data. The hidden layers of a neural network can derive potentially sensitive and discriminatory personal information even when the data set seems innocuous, limited, and de-identified. Killi has witnessed this firsthand: “I’ve seen examples where you’re looking at résumés you want to de-identify, so you want to make sure sensitive information like gender is removed from it. But the machine will learn to pick up on subtle nuances in the language, and be able to infer the gender of the candidate. So it becomes very difficult to say that you’ve actually stripped that sensitive information out.”
Transfer learning. An AI model may be trained on personal information that is later deleted, yet that information can still “live on” when the model is passed to another user. It is now common practice to share model components that were pre-trained on one data set and incorporate them into other models. Developers who use this technique can unwittingly carry sensitive information into a new model, or even use it deliberately to sidestep regulation.
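
One way to make the résumé example concrete is a simple leakage test: train a small "attacker" model to predict the removed attribute from the text that remains, then compare its accuracy with blind guessing. The sketch below is only an illustration under stated assumptions. The résumé snippets, the labels, and the function names are all hypothetical, and the word-overlap scorer is a deliberately crude stand-in for a real classifier, not the method Killi or Dessa used.

```python
from collections import Counter, defaultdict


def train_proxy_model(texts, labels):
    """Count how often each word appears under each sensitive label."""
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(word_counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


def leakage_report(train, test):
    """Compare attacker accuracy with always guessing the majority label."""
    texts, labels = zip(*train)
    model = train_proxy_model(texts, labels)
    hits = sum(predict(model, text) == label for text, label in test)
    accuracy = hits / len(test)
    majority = Counter(label for _, label in test).most_common(1)[0][1]
    baseline = majority / len(test)
    return accuracy, baseline


# Hypothetical "de-identified" resume text: the gender field is gone,
# but club and team names still act as proxies for it.
train_data = [
    ("software engineer captain womens soccer team", "f"),
    ("data analyst member womens chess club", "f"),
    ("software engineer member mens rowing club", "m"),
    ("data analyst captain mens chess team", "m"),
]
test_data = [
    ("project manager womens rowing club", "f"),
    ("project manager mens soccer team", "m"),
    ("software engineer womens chess team", "f"),
    ("data analyst mens chess club", "m"),
]

accuracy, baseline = leakage_report(train_data, test_data)
print(f"attacker accuracy {accuracy:.2f} vs. baseline {baseline:.2f}")
```

If the attacker's accuracy sits well above the baseline, the "removed" attribute is still recoverable from the remaining data, which is exactly the situation Killi describes.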

2. We can protect privacy and enable AI innovation through good governance and getting technical on privacy

The challenges may seem significant, but it’s entirely possible to ensure protection and compliance while leveraging machine learning. Part of this will involve a shift in mindset: we need more quantitative and precise definitions of what counts as personal information and therefore requires protection.

As VP of Data Governance and Chief Privacy Officer at Symcor, Della Shea understands that good governance is the critical success factor in ensuring that AI is created and used ethically. AI needs to be built with “privacy by design” as a core focus from the beginning. Privacy by design is a Canadian concept pioneered by Ann Cavoukian, former Information and Privacy Commissioner of Ontario.

It is not good enough to leave policy enforcement to external authorities only. Developers and deployers of this technology have to consider the specific types of use cases their algorithms will apply to, and put rules and structures in place at the outset that will actually be followed. At the other end of this process, organizations need internal centres of excellence to keep track of how their AI is being used.

“The whole point of artificial intelligence is to create new knowledge that is not necessarily possible through regular human design or intervention,” explains Shea, “so ensuring you have governance end-to-end across the entire spectrum is really, really critical.”

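Two technology solutions are already paving the way. The first is differential privacy, which adds carefully calibrated statistical noise to data or query results so that the output reveals almost nothing about any single individual while aggregate patterns remain usable. This technology is not just in the ivory tower of academia; it is being used by the US Census Bureau, Apple, and other companies. The second is computing on protected data, either through homomorphic encryption, which allows calculations to be performed directly on encrypted data, or through secure multi-party computation (SMPC), which lets several parties jointly compute a result without revealing their individual inputs to one another.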
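
To give a feel for how differential privacy works in practice, here is a minimal sketch of the Laplace mechanism, one common building block, applied to a simple counting query. It is an illustration only: the `ages` list, the function names, and the choice of `epsilon` are hypothetical, and production systems such as those at the US Census Bureau or Apple are considerably more involved.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution centred on zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that match `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon. A smaller
    epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of eight people in a data set.
ages = [34, 29, 41, 52, 38, 27, 45, 61]

rng = random.Random()  # pass a seeded Random only for reproducible demos
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

Each run returns a slightly different answer near the true count of 4. Averaged over a large population the statistics stay useful, but no single published figure pins down whether a particular person is in the data.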

3. Privacy professionals need to use AI or risk disruption

In an almost paradoxical twist, AI technologies can themselves be used to protect privacy, as Shea’s team does at Symcor, where an automated system called PACT conducts real-time data inventory. PACT has become a part of her team, and she supervises and manages it the same way she would an actual team member. By being accountable and answerable to a human superior while handling a range of compliance processes, PACT is a real-world instance of what a responsible AI bot can look like.

As Shea notes, “There’s no humanly possible way we can operate as privacy professionals and data professionals without understanding how to leverage the technology to be able to automate processes, and then use it for that compliance purpose.”

What’s next?

The challenges we’re facing with machine learning aren’t the same as the ones we faced with big data—but too often, they’re approached the same way. We need to find solutions that address the specific privacy risks associated with our latest technological innovations. At PwC, we’ve come up with a three-layer approach to help our clients understand and prepare for the privacy implications of AI: awareness, assessment, and action.

Awareness: It’s important for companies to understand the privacy challenges for AI—from the possibility of transfer learning, to the sensitive inferences that can potentially be derived from little-to-no personal information. When businesses are informed, they are better prepared to mitigate risk, manage their AI, and respect the right to privacy.
Assessment: Trust but verify. Companies leveraging this technology need to address key questions both before and after AI deployment, including:
- Was the dataset the model was trained on used legally, ethically, and fairly (e.g. with consent)?
- What is the potential for the AI to infer sensitive information and how can you test this?
- Under what use cases can the trained model be shared within and across companies?
Action: Once organizations are aware of the concerns and have assessed them, they can enable solutions such as differential privacy, homomorphic encryption, or human oversight to ensure that they are doing their part to develop discerning, socially-responsible AI.
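
As a small illustration of what computing on protected data can look like, the sketch below uses additive secret sharing, a basic building block of secure multi-party computation (the technique mentioned earlier alongside homomorphic encryption, not homomorphic encryption itself). Three hypothetical organizations learn the sum of their private figures without any of them seeing another's input. The figures, the modulus, and the function names are assumptions made for the example.

```python
import secrets

# Hypothetical modulus; it only needs to exceed any possible total.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares into the value they jointly encode."""
    return sum(shares) % MODULUS


# Each organization holds one private number (hypothetical values).
private_inputs = [120, 75, 310]
n = len(private_inputs)

# Every organization splits its input and hands one share to each party.
all_shares = [share(value, n) for value in private_inputs]

# Party j adds up the j-th share from everyone. On its own, each share
# is a uniformly random number and says nothing about the inputs.
partial_sums = [
    sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
]

# Only the combined partial sums reveal the total, never the inputs.
total = reconstruct(partial_sums)
print(total)  # 505
```

Real deployments add authentication, protection against dishonest participants, and support for operations beyond addition, so this should be read as a sketch of the idea rather than a usable protocol.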

I cannot stress enough that these issues need to be tackled now. Part of this will involve facilitating better dialogue between privacy professionals, data scientists, AI creators, and regulators, so that we can start to interpret not only what our privacy requirements are, but also how we can leverage technology to comply with them.

The stakes are high—in Canada, we have earned our stripes as a world leader in AI and machine learning technologies. But to be a leader in AI, we also need to be a leader in privacy for AI. Without it, we risk losing the significant investment we’ve made in the AI sector and the benefits that come with it.
