Privacy Preserving Data Mining Models And Algorithms-PDF Free Download

PRIVACY PRESERVING DATA MINING MODELS AND ALGORITHMS

2019 | 1 views | 76 Pages | 365.12 KB

PRIVACY-PRESERVING DATA MINING: MODELS AND ALGORITHMS Edited by CHARU C. AGGARWAL IBM T. J. Watson Research Center, Hawthorne, NY 10532 PHILIP S. YU



PRIVACY PRESERVING DATA MINING
MODELS AND ALGORITHMS
CHARU C AGGARWAL
IBM T J Watson Research Center Hawthorne NY 10532
PHILIP S YU
University of Illinois at Chicago Chicago IL 60607
Kluwer Academic Publishers
Boston Dordrecht London
List of Figures xv
List of Tables xx
Preface xxi
An Introduction to Privacy Preserving Data Mining 1
Charu C Aggarwal Philip S Yu
1 Introduction 1
2 Privacy Preserving Data Mining Algorithms 3
3 Conclusions and Summary 7
References 8
A General Survey of Privacy Preserving Data Mining Models and Algorithms 11
Charu C Aggarwal Philip S Yu
1 Introduction 11
2 The Randomization Method 13
2 1 Privacy Quantification 15
2 2 Adversarial Attacks on Randomization 17
2 3 Randomization Methods for Data Streams 18
2 4 Multiplicative Perturbations 18
2 5 Data Swapping 19
3 Group Based Anonymization 20
3 1 The k Anonymity Framework 20
3 2 Personalized Privacy Preservation 23
3 3 Utility Based Privacy Preservation 24
3 4 Sequential Releases 25
3 5 The l diversity Method 26
3 6 The t closeness Model 27
3 7 Models for Text Binary and String Data 27
4 Distributed Privacy Preserving Data Mining 28
4 1 Distributed Algorithms over Horizontally Partitioned Data
4 2 Distributed Algorithms over Vertically Partitioned Data 31
4 3 Distributed Algorithms for k Anonymity 31
5 Privacy Preservation of Application Results 32
5 1 Association Rule Hiding 32
5 2 Downgrading Classifier Effectiveness 34
vi PRIVACY PRESERVING DATA MINING MODELS AND ALGORITHMS
5 3 Query Auditing and Inference Control 34
6 Limitations of Privacy The Curse of Dimensionality 37
7 Applications of Privacy Preserving Data Mining 38
7 1 Medical Databases The Scrub and Datafly Systems 38
7 2 Bioterrorism Applications 40
7 3 Homeland Security Applications 40
7 4 Genomic Privacy 42
8 Summary 42
References 43
A Survey of Inference Control Methods for Privacy Preserving Data Mining 53
Josep Domingo Ferrer
1 A classification of microdata protection methods 55
2 Perturbative masking methods 58
2 1 Additive noise 58
2 2 Microaggregation 59
2 3 Data swapping and rank swapping 61
2 4 Rounding 62
2 5 Resampling 62
2 6 PRAM 62
2 7 MASSC 63
3 Non perturbative masking methods 63
3 1 Sampling 64
3 2 Global recoding 64
3 3 Top and bottom coding 65
3 4 Local suppression 65
4 Synthetic microdata generation 65
4 1 Synthetic data by multiple imputation 65
4 2 Synthetic data by bootstrap 66
4 3 Synthetic data by Latin Hypercube Sampling 66
4 4 Partially synthetic data by Cholesky decomposition 67
4 5 Other partially synthetic and hybrid microdata approaches 67
4 6 Pros and cons of synthetic microdata 68
5 Trading off information loss and disclosure risk 69
5 1 Score construction 69
5 2 R U maps 71
5 3 k anonymity 71
6 Conclusions and research directions 72
References 73
Measures of Anonymity 81
Suresh Venkatasubramanian
1 Introduction 81
1 1 What is privacy 81
1 2 Data Anonymization Methods 83
1 3 A Classification Of Methods 84
2 Statistical Measures of Anonymity 85
Contents vii
2 1 Query Restriction 85
2 2 Anonymity via Variance 85
2 3 Anonymity via Multiplicity 86
3 Probabilistic Measures of Anonymity 86
3 1 Measures Based on Random Perturbation 87
3 2 Measures Based on Generalization 90
3 3 Utility vs Privacy 93
4 Computational Measures Of Anonymity 94
4 1 Anonymity via Isolation 96
5 Conclusions And New Directions 97
5 1 New Directions 98
References 98
k Anonymous Data Mining A Survey 103
V Ciriani S De Capitani di Vimercati S Foresti and P Samarati
1 Introduction 103
2 k Anonymity 105
3 Algorithms for Enforcing k Anonymity 108
4 k Anonymity Threats from Data Mining 115
4 1 Association Rules 115
4 2 Classification Mining 116
5 k Anonymity in Data Mining 118
6 Anonymize and Mine 120
7 Mine and Anonymize 123
7 1 Enforcing k Anonymity on Association Rules 124
7 2 Enforcing k Anonymity on Decision Trees 127
8 Conclusions 130
Acknowledgments 131
References 131
A Survey of Randomization Methods for Privacy Preserving Data Mining 135
Charu C Aggarwal Philip S Yu
1 Introduction 135
2 Reconstruction Methods for Randomization 137
2 1 The Bayes Reconstruction Method 137
2 2 The EM Reconstruction Method 139
2 3 Utility and Optimality of Randomization Models 141
3 Applications of Randomization 142
3 1 Privacy Preserving Classification with Randomization 142
3 2 Privacy Preserving OLAP 143
3 3 Collaborative Filtering 143
4 The Privacy Information Loss Tradeoff 144
5 Vulnerabilities of the Randomization Method 147
6 Randomization of Time Series Data Streams 149
7 Multiplicative Noise for Randomization 150
7 1 Vulnerabilities of Multiplicative Randomization 151
viii PRIVACY PRESERVING DATA MINING MODELS AND ALGORITHMS
7 2 Sketch Based Randomization 151
8 Conclusions and Summary 152
References 152
A Survey of Multiplicative 155
Perturbation for
Privacy Preserving Data Mining
Keke Chen and Ling Liu
1 Introduction 156
1 1 Data Privacy vs Data Utility 157
1 2 Outline 158
2 Definition of Multiplicative Perturbation 159
2 1 Notations 159
2 2 Rotation Perturbation 159
2 3 Projection Perturbation 160
2 4 Sketch based Approach 162
2 5 Geometric Perturbation 162
3 Transformation Invariant Data Mining Models 163
3 1 Definition of Transformation Invariant Models 163
3 2 Transformation Invariant Classification Models 164
3 3 Transformation Invariant Clustering Models 165
4 Privacy Evaluation for Multiplicative Perturbation 166
4 1 A Conceptual Multidimensional Privacy Evaluation Model 166
4 2 Variance of Difference as Column Privacy Metric 167
4 3 Incorporating Attack Evaluation 168
4 4 Other Metrics 168
5 Attack Resilient Multiplicative Perturbations 169
5 1 Naive Estimation to Rotation Perturbation 169
5 2 ICA Based Attacks 171
5 3 Distance Inference Attacks 172
5 4 Attacks with More Prior Knowledge 174
5 5 Finding Attack Resilient Perturbations 174
6 Conclusion 176
References 176
A Survey of Quantification of Privacy Preserving Data Mining Algorithms 181
Elisa Bertino and Dan Lin and Wei Jiang
1 Metrics for Quantifying Privacy Level 184
1 1 Data Privacy 184
1 2 Result Privacy 189
2 Metrics for Quantifying Hiding Failure 190
3 Metrics for Quantifying Data Quality 191
3 1 Quality of the Data Resulting from the PPDM Process 191
3 2 Quality of the Data Mining Results 196
4 Complexity Metrics 198
5 How to Select a Proper Metric 199
6 Conclusion and Research Directions 200
References 200
Contents ix
A Survey of Utility based 205
Privacy Preserving Data
Transformation Methods
Ming Hua and Jian Pei
1 Introduction 206
1 1 What is Utility based Privacy Preservation 207
2 Types of Utility based Privacy Preservation Methods 208
2 1 Privacy Models 208
2 2 Utility Measures 210
2 3 Summary of the Utility Based Privacy Preserving Methods 212
3 Utility Based Anonymization Using Local Recoding 212
3 1 Global Recoding and Local Recoding 213
3 2 Utility Measure 214
3 3 Anonymization Methods 215
3 4 Summary and Discussion 217
4 The Utility based Privacy Preserving Methods in Classification Prob
4 1 The Top Down Specialization Method 218
4 2 The Progressive Disclosure Algorithm 222
4 3 Summary and Discussion 226
5 Anonymized Marginal Injecting Utility into Anonymized Data Sets 226
5 1 Anonymized Marginal 227
5 2 Utility Measure 228
5 3 Injecting Utility Using Anonymized Marginals 229
5 4 Summary and Discussion 231
6 Summary 232
References 232
Mining Association Rules under Privacy Constraints 237
Jayant R Haritsa
1 Problem Framework 238
2 Evolution of the Literature 244
3 The FRAPP Framework 249
4 Sample Results 257
5 Closing Remarks 261
References 261
A Survey of Association Rule Hiding Methods for Privacy 265
Vassilios S Verykios and Aris Gkoulalas Divanis
1 Introduction 265
2 Terminology and Preliminaries 267
3 Taxonomy of Association Rule Hiding Algorithms 268
4 Classes of Association Rule Algorithms 269
4 1 Heuristic Approaches 270
4 2 Border based Approaches 275
4 3 Exact Approaches 276
x PRIVACY PRESERVING DATA MINING MODELS AND ALGORITHMS
5 Other Hiding Approaches 277
6 Metrics and Performance Analysis 279
7 Discussion and Future Trends 282
8 Conclusions 283
References 284
A Survey of Statistical 289
Approaches to Preserving
Confidentiality of Contingency
Table Entries
Stephen E Fienberg and Aleksandra B Slavkovic
1 Introduction 289
2 The Statistical Approach Privacy Protection 290
3 Datamining Algorithms Association Rules and Disclosure Limita
4 Estimation and Disclosure Limitation for Multi way Contingency
Tables 293
5 Two Illustrative Examples 299
5 1 Example 1 Data from a Randomized Clinical Trial 299
5 2 Example 2 Data from the 1993 U S Current Population
Survey 303
6 Conclusions 306
References 307
A Survey of 311
Privacy Preserving Methods
Across Horizontally Partitioned
Murat Kantarcioglu
1 Introduction 311
2 Basic Cryptographic Techniques for Privacy Preserving Distributed
Data Mining 313
3 Common Secure Sub protocols Used in Privacy Preserving Distributed
Data Mining 316
4 Privacy preserving Distributed Data Mining on Horizontally Parti
tioned Data 321
5 Comparison to Vertically Partitioned Data Model 324
6 Extension to Malicious Parties 325
7 Limitations of the Cryptographic Techniques Used in Privacy Preserving
Distributed Data Mining 327
8 Privacy Issues Related to Data Mining Results 328
9 Conclusion 330
References 330
A Survey of 335
Privacy Preserving Methods
across Vertically Partitioned
Contents xi
Jaideep Vaidya
1 Classification 337
1 1 Na ve Bayes Classification 340
1 2 Bayesian Network Structure Learning 341
1 3 Decision Tree Classification 342
2 Clustering 344
3 Association Rule Mining 345
4 Outlier detection 347
4 1 Algorithm 349
4 2 Security Analysis 350
4 3 Computation and Communication Analysis 352
5 Challenges and Research Directions 353
References 354
A Survey of Attack Techniques on Privacy Preserving Data Perturbation 357
Kun Liu Chris Giannella and Hillol Kargupta
1 Introduction 358
2 Definitions and Notation 358
3 Attacking Additive Data Perturbation 359
3 1 Eigen Analysis and PCA Preliminaries 360
3 2 Spectral Filtering 361
3 3 SVD Filtering 362
3 4 PCA Filtering 363
3 5 MAP Estimation Attack 364
3 6 Distribution Analysis Attack 365
3 7 Summary 366
4 Attacking Matrix Multiplicative Data Perturbation 367
4 1 Known I O Attacks 368
4 2 Known Sample Attack 371
4 3 Other Attacks Based on ICA 372
4 4 Summary 373
5 Attacking k Anonymization 374
6 Conclusion 374
Acknowledgments 375
References 375
Private Data Analysis via 381
Output Perturbation
Kobbi Nissim
1 The Abstract Model Statistical Databases Queries and Sanitizers 383
2 Privacy 386
2 1 Interpreting the Privacy Definition 388
3 The Basic Technique Calibrating Noise to Sensitivity 392
3 1 Applications Functions with Low Global Sensitivity 394
4 Constructing Sanitizers for Complex Functionalities 398
4 1 k Means Clustering 399
xii PRIVACY PRESERVING DATA MINING MODELS AND ALGORITHMS
4 2 SVD and PCA 401
4 3 Learning in the Statistical Queries Model 401
5 Beyond the Basics 403
5 1 Instance Based Noise and Smooth Sensitivity 403
5 2 The Sample Aggregate Framework 405
5 3 A General Sanitization Mechanism 406
6 Related Work and Bibliographic Notes 407
References 408
A Survey of Query Auditing Techniques for Data Privacy 413
Shubha U Nabar Krishnaram Kenthapadi Nina Mishra and Rajeev Motwani
1 Auditing Aggregate Queries 414
1 1 Offline Auditing 415
1 2 Online Auditing 416
2 Auditing Select Project Join Queries 424
3 Challenges in Auditing 425
4 Reading 427
References 428
Privacy and the Dimensionality Curse 431
Charu C Aggarwal
1 Introduction 431
2 The Dimensionality Curse and the k anonymity Method 434
3 The Dimensionality Curse and Condensation 439
4 The Dimensionality Curse and the Randomization Method 444
4 1 Effects of Public Information 445
4 2 Effects of High Dimensionality 448
4 3 Gaussian Perturbing Distribution 448
4 4 Uniform Perturbing Distribution 454
5 The Dimensionality Curse and l diversity 457
6 Conclusions and Research Directions 458
References 458
Personalized Privacy Preservation 461
Yufei Tao and Xiaokui Xiao
1 Introduction 461
2 Formalization of Personalized Anonymity 464
2 1 Personal Privacy Requirements 464
2 2 Generalization 465
3 Combinatorial Process of Privacy Attack 466
3 1 Primary Case 468
3 2 Non primary Case 469
4 Theoretical Foundation 470
4 1 Notations and Basic Properties 471
4 2 Derivation of the Breach Probability 472
5 Generalization Algorithm 473
Contents xiii
5 1 The Greedy Framework 474
5 2 Optimal SA generalization 476
6 Alternative Forms of Personalized Privacy Preservation 478
6 1 Extension of k anonymity 479
6 2 Personalization in Location Privacy Protection 480
7 Summary and Future Work 482
References 484
Privacy Preserving Data Stream Classification 487
Yabo Xu Ke Wang Ada Wai Chee Fu Rong She and Jian Pei
1 Introduction 487
1 1 Motivating Example 488
1 2 Contributions and Paper Outline 490
2 Related Works 492
3 Problem Statement 493
3 1 Secure Join Stream Classification 493
3 2 Naive Bayesian Classifiers 494
4 Our Approach 495
4 1 Initialization 496
4 2 Bottom Up Propagation 496
4 3 Top Down Propagation 498
4 4 Using NBC 499
4 5 Algorithm Analysis 500
5 Empirical Studies 501
5 1 Real life Datasets 502
5 2 Synthetic Datasets 504
5 3 Discussion 506
6 Conclusions 507
References 508
List of Figures
5 1 Simplified representation of a private table 106
5 2 An example of domain and value generalization hierarchies 107
5 3 Classification of k anonymity techniques 11 108
5 4 Generalization hierarchy for QI Marital status Sex 110
5 5 Index assignment to attributes Marital status and Sex 110
5 6 An example of set enumeration tree over set I 1 2 3
of indexes 111
5 7 Sub hierarchies computed by Incognito for the table in
Figure 5 1 112
5 8 Spatial representation a and possible partitioning b
d of the table in Figure 5 1 113
5 9 An example of decision tree 117
5 10 Different approaches for combining k anonymity and
data mining 118
5 11 An example of top down anonymization for the private
table in Figure 5 1 123
5 12 Frequent itemsets extracted from the table in Figure 5 1 124
5 13 An example of binary table 125
5 14 Itemsets extracted from the table in Figure 5 13 b 126
5 15 Itemsets with support at least equal to 40 a and corre
sponding anonymized itemsets b 126
5 16 3 anonymous version of the tree of Figure 5 9 128
5 17 Suppression of occurrences in non leaf nodes in the tree
in Figure 5 9 130
5 18 Table inferred from the decision tree in Figure 5 17 130
5 19 11 anonymous version of the tree in Figure 5 17 130
5 20 Table inferred from the decision tree in Figure 5 19 131
6 1 Illustration of the Information Loss Metric 147
7 1 Using known points and distance relationship to infer the
rotation matrix 173
xvi PRIVACY PRESERVING DATA MINING MODELS AND ALGORITHMS
9 1 A taxonomy tree on categorical attribute Education 219
9 2 A taxonomy tree on continuous attribute Age 219
9 3 Interactive graph 230
9 4 A decomposition 230
10 1 CENSUS 19 259
10 2 Perturbation Matrix Condition Numbers 19 260
13 1 Relationship between Secure Sub protocols and Privacy
Preserving Distributed Data Mining on Horizontally Par
titioned Data 321
14 1 Two dimensional problem that cannot be decomposed
into two one dimensional problems 339
15 1 Wigner s semi circle law a histogram of the eigenvalues
for a large randomly generated A 361
17 1 Skeleton of a simulatable private randomized auditor 421
18 1 Some Examples of Generalization for 2 Anonymity 433
18 2 Upper Bound of 2 anonymity Probability in an Non
Empty Grid Cell 433
18 3 Fraction of Data Points Preserving 2 Anonymity with
Data Dimensionality Gaussian Clusters 438
18 4 Minimum Information Loss for 2 Anonymity Gaussian
Clusters 444
18 5 Randomization Level with Increasing Dimensionality
Perturbation level 8 o U niDis 456
19 1 Microdata and generalization 463
19 2 The taxonomy of attribute Disease 463
19 3 A possible result of our generalization scheme 467
19 4 The voter registration list 468
19 5 Algorithm for computing personalized generalization 474
19 6 Algorithm for finding the optimal SA generalization 478
19 7 Personalized k anonymous generalization 480
20 1 Related streams tables 489
20 2 The join stream 489
20 3 Example with 3 streams at initialization 496
20 4 After bottom up propagations 498
20 5 After top down propagations 499
20 6 UK road accident data 2001 502
20 7 Classifier accuracy 502


Related Books

Privacy Preserving Data Mining: Techniques, Classification ...

Privacy Preserving Data Mining: Techniques, Classification ...

International Journal of Computer Applications (0975 – 8887) Volume 137 – No.12, March 2016 42 Figure 1: PPDM Classification Hierarchy 4. PPDM TECHNIQUES

Continue Reading...
Classification and Evaluation the Privacy Preserving Data ...

Classification and Evaluation the Privacy Preserving Data ...

based Framework for classification and evaluation of the privacy preserving data mining techniques. Based on our framework the techniques are divided into two major groups, namely perturbation approach and anonymization approach. Also in proposed framework, eight functional criteria will be

Continue Reading...
Privacy Preserving Data Mining - researchgate.net

Privacy Preserving Data Mining - researchgate.net

• The problem of privacy-preserving data mining has become more important in recent years because of the increasing ability to store personal data about users, and the increasing sophistication ...

Continue Reading...
Multiple Sensitive Attributes based Privacy Preserving ...

Multiple Sensitive Attributes based Privacy Preserving ...

privacy preserving data mining algorithms and analyses the representative techniques for privacy preserving data mining, and points out their merits and demerits. They also discuss present problems and directions for future research. Batya Kenig, Tamir Tassa [5] proposed modified k-anonymity method. The process of anonymizing a database

Continue Reading...
Swarm Optimization Algorithm for Privacy Preserving in ...

Swarm Optimization Algorithm for Privacy Preserving in ...

Das [19] proposed a scalable, local privacy-preserving algorithm for distributed peer-to-peer (P2P) data aggregation for advanced data mining/analysis like . IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 2, No 3, March 2013 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 47

Continue Reading...
PRIVACY PRESERVING DATA MINING - ethesis

PRIVACY PRESERVING DATA MINING - ethesis

the privacy preserving mining methods modify the original data in some way, so that the privacy of the user data is preserved and at the same time the mining models can be reconstructed from the modified data with reasonably accuracy. Various approaches have been proposed in the existing literature for privacy-preserving data mining which differ

Continue Reading...
Privacy Preserving Data Mining - Stanford University

Privacy Preserving Data Mining - Stanford University

What’s New Here? Common Question: Hasn’t this problem been studied before? 1. Census Bureau has privacy methods. Ad hoc, ill-understood. 2. DB interest recently rekindled, but weak results / de?nitions.

Continue Reading...
Privacy-Preserving Data Mining through Knowledge Model Sharing

Privacy-Preserving Data Mining through Knowledge Model Sharing

Although data mining is typically performed within a single organization (data source), new applications in healthcare, medical research, fraud detection, decision making, national secu-rity, etc., also need to explore data over multiple autonomous data sources. A major barrier to such a distributed data mining is the concern of privacy: data

Continue Reading...
Privacy Preserving Data Mining - Technion

Privacy Preserving Data Mining - Technion

uals’ privacy, can be reconciled, is the focus of this research. We seek ways to improve the tradeo between privacy and utility when mining data. In this work we address the privacy/utility tradeo problem by considering the privacy and algorithmic requirements simultaneously. We take data mining algorithms, and investi-

Continue Reading...
Privacy Preserving Data Mining - pinkas.net

Privacy Preserving Data Mining - pinkas.net

In this paper we address the issue of privacy preserving data mining. Speci?cally, we consider a scenario in which two parties owning con?dential databases wish to run a data mining algorithm on the union of their databases, without revealing any unnecessary information. Our work is motivated

Continue Reading...
Privacy Preserving Data Mining - InTech - Open

Privacy Preserving Data Mining - InTech - Open

and security and data mining research. In privacy-preserving data mining (PPDM), data mining algorithms are analyzed for the side-effects they incur in data privacy, and the main objective in privacy preserving data mining is to develop algorithms for modifying the original data in some way, so that the

Continue Reading...