Database Anonymization Techniques with Focus on Uncertainty and Multi-Sensitive Attributes

DOI: 10.4018/978-1-4666-2518-1.ch014

Abstract

The publication of data owned by various organizations for scientific research carries the danger that sensitive information about respondents will be disclosed. Removing or encrypting identifiers cannot prevent the leakage of information through quasi-identifiers. Several anonymization techniques, such as k-anonymity, l-diversity, and t-closeness, have therefore been proposed. However, these algorithms cannot handle uncertainty in data. One solution is to develop anonymization algorithms that use rough set based clustering algorithms such as MMR, MMeR, SDR, SSDR, and MADE at the clustering stage of existing algorithms. Some of these algorithms handle both numerical and categorical data. In this chapter, the author addresses the database anonymization problem and briefly discusses k-anonymization methods. The primary focus is on algorithms dealing with l-diversity of databases having single or multiple sensitive attributes. The author also proposes certain algorithms for anonymizing databases that involve uncertainty. A further aim is to draw researchers' attention to the various open problems in this direction.
Chapter Preview

Literature Survey

To handle linking disclosure while preserving the integrity of the released data, Samarati and Sweeney proposed the concept of k-anonymity (Samarati et al., 1998). In this approach, data privacy is guaranteed by ensuring that any record in the released data is indistinguishable from at least (k-1) other records with respect to a set of attributes called the quasi-identifiers. The concept was later expanded by Sweeney (1998, 2002a, 2002b) to the context of table releases. While k-anonymity protects against identity disclosure, it does not provide sufficient protection against attribute disclosure. Although the idea of k-anonymity is conceptually straightforward, finding an optimal solution to the k-anonymity problem has been shown to be NP-hard (Meyerson et al., 2004), even when one considers only the technique of suppression of values (Agrawal et al., 2005; Chiu et al., 2007). Several algorithms have been introduced in recent times to obtain k-anonymity (R. Agrawal et al., 2005; Bayardo et al., 2005; Byun et al., 2007; LeFevre et al., 2005; Li et al., 2006; Lin et al., 2008; Nergiz et al., 2007; Ng et al., 2009; Samarati et al., 2007; Sweeney, 2002). The basic idea in most of these algorithms is that the k-anonymization problem can be viewed as a clustering problem: we want to find a set of clusters, each of which contains at least k records. To maximize data quality, we also want the records in a cluster to be as similar to each other as possible. This ensures that less distortion is required when the records in a cluster are modified to have the same quasi-identifier value. Some significant contributions to the design of k-anonymization algorithms are as follows.
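Before turning to specific algorithms, the k-anonymity requirement stated above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function name `is_k_anonymous`, the toy table, and the attribute names `age`, `zip`, and `disease` are all hypothetical. The sketch only checks whether an already generalized table satisfies the requirement; it does not perform the clustering itself.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check that every combination of quasi-identifier values
    that occurs in the table is shared by at least k records."""
    # Count how many records fall into each quasi-identifier group.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical released table: age and zip are already generalized,
# disease plays the role of the sensitive attribute.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True
print(is_k_anonymous(records, ["age", "zip"], 3))  # False
```

Each group of records sharing the same quasi-identifier values corresponds to one cluster in the clustering view: the table is k-anonymous exactly when every such cluster has at least k members.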

Key Terms in this Chapter

Quasi-Identifiers: A set of attributes whose values, taken together, can uniquely identify an individual record.

P-Sensitive Databases: Databases having p sensitive attributes, where p is assumed to be greater than 1.

Adversary: A user of a database who is not authorized to know the details of the information in the database.

Respondents: Individuals whose data are represented in a released database.

Sensitive Attribute: An attribute in a database whose values the respondents do not want disclosed.

Partition: A partition of a set U decomposes it into disjoint subsets whose union is the set U.

Indiscernibility Relation: A relation that groups data items together. Items related by it are regarded as similar with respect to the relation.
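The last two terms can be connected with a short Python sketch. It assumes the usual rough set convention, which the definitions above do not spell out, that two objects are indiscernible when they agree on a chosen set of attributes. The function names, the toy universe, and the attribute names `color` and `size` are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(universe, attributes):
    """Group object ids that agree on every attribute in `attributes`.
    Assumption: indiscernibility means equality on the chosen attributes."""
    classes = defaultdict(set)
    for obj_id, row in universe.items():
        key = tuple(row[attr] for attr in attributes)
        classes[key].add(obj_id)
    return list(classes.values())

def is_partition(blocks, universe_ids):
    """Check that the blocks are non-empty, pairwise disjoint,
    and that their union is the whole universe."""
    seen = set()
    for block in blocks:
        if not block or seen & block:
            return False
        seen |= block
    return seen == set(universe_ids)

# Hypothetical universe U of three objects described by two attributes.
universe = {
    1: {"color": "red", "size": "S"},
    2: {"color": "red", "size": "L"},
    3: {"color": "blue", "size": "S"},
}

blocks = indiscernibility_classes(universe, ["color"])
print(sorted(sorted(block) for block in blocks))  # [[1, 2], [3]]
print(is_partition(blocks, universe.keys()))      # True
```

Under this assumption the classes produced by the relation always form a partition of U, and choosing more attributes can only split the blocks further.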
