Ensuring the privacy of users whose data are used to train Natural Language
Processing (NLP) models is necessary to build and maintain customer trust.
Differential Privacy (DP) has emerged as the most successful method to protect
the privacy of individuals. However, applying DP to the NLP domain comes with
unique challenges. The most successful previous methods use a generalization of
DP for metric spaces, and apply the privatization by adding noise to inputs in
the metric space of word embeddings. However, these methods assume that one
specific distance measure is being used, ignore the density of the space around
the input, and assume the embeddings used have been trained on non-sensitive
data.
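
For context, this metric-space generalization is commonly known as metric DP (or $d_\chi$-privacy); a sketch of the standard definition, where $M$ is a randomized mechanism, $d$ the chosen metric, and $\varepsilon$ the privacy parameter:

\[
\Pr[M(x) \in S] \;\le\; e^{\varepsilon\, d(x, x')}\, \Pr[M(x') \in S]
\quad \text{for all inputs } x, x' \text{ and output sets } S,
\]

so inputs that are close in the metric (small $d(x, x')$) must produce nearly indistinguishable output distributions.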

In this work we propose Truncated Exponential Mechanism (TEM), a general
method that allows the privatization of words using any distance metric, on
embeddings that can be trained on sensitive data. Our method makes use of the
exponential mechanism to turn the privatization step into a selection
problem. This allows the applied noise to be calibrated to the density of the
embedding space around the input, and makes domain adaptation possible for the
embeddings. In our experiments, we demonstrate that our method significantly
outperforms the state-of-the-art in terms of utility for the same level of
privacy, while providing more flexibility in the metric selection.
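
To make the selection view concrete, here is a minimal, hypothetical sketch, not the authors' exact TEM algorithm, of privatizing a word with the exponential mechanism, scoring candidates by negative distance in a toy embedding space. The embedding table, metric choice, and function names are illustrative assumptions:

```python
import numpy as np

# Toy embedding table; real systems would use embeddings trained on the
# task domain (possibly on sensitive data, as the paper allows).
EMB = {
    "good":  np.array([0.9, 0.1]),
    "great": np.array([0.8, 0.2]),
    "fine":  np.array([0.7, 0.3]),
    "bad":   np.array([0.1, 0.9]),
}

def privatize(word: str, epsilon: float, rng=None) -> str:
    """Pick a replacement for `word` via the exponential mechanism.

    Each candidate's utility is its negative distance to the input, so
    closer words are exponentially more likely to be selected, and the
    output distribution automatically adapts to how densely the
    embedding space is populated around the input.
    """
    rng = rng or np.random.default_rng()
    vocab = list(EMB)
    # Euclidean distance here, but any metric on the space works.
    scores = np.array([-np.linalg.norm(EMB[word] - EMB[w]) for w in vocab])
    # Gumbel-max trick: argmax of (eps/2)*score + Gumbel noise samples
    # proportionally to exp((eps/2) * score) in a numerically stable way.
    noisy = (epsilon / 2.0) * scores + rng.gumbel(size=len(vocab))
    return vocab[int(np.argmax(noisy))]

print(privatize("good", epsilon=4.0))  # usually "good" or "great"
```

Higher values of epsilon concentrate the output distribution on the input word itself; lower values spread probability mass onto more distant neighbors, trading utility for privacy.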
