Advanced Usage

Welcome to the advanced usage guide for anonymize-data.

Installation

pip

pip install anonymize-data

uv

uv add anonymize-data

Architecture Overview

Note

Under the hood, the library uses a cascading architecture: each value is dispatched by its data type to the matching class (MaskDict, MaskList, or MaskStr), and container classes hand their items back to the same dispatch.

graph TD;
    MaskDict-->MaskDict;
    MaskDict-->MaskList;
    MaskDict-->MaskStr;

graph TD;
    MaskList-->MaskStr;
    MaskList-->MaskList;
    MaskList-->MaskDict;
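
The dispatch shown in the graphs can be illustrated with a minimal, library-independent sketch. The names `mask_str` and `mask_any` are hypothetical and are not part of the package's API. Leaving unsupported types untouched is an assumption made for this example only.

```python
from typing import Any


def mask_str(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


def mask_any(value: Any) -> Any:
    """Dispatch on the value's type, recursing into containers."""
    if isinstance(value, dict):
        # Dict values may be dicts, lists, or strings again.
        return {key: mask_any(item) for key, item in value.items()}
    if isinstance(value, list):
        # List items go through the same dispatch.
        return [mask_any(item) for item in value]
    if isinstance(value, str):
        return mask_str(value)
    # Assumption for this sketch: other types (int, None, ...) pass through.
    return value


record = {"user": "ana", "tags": ["a1", {"pin": "0000"}], "age": 30}
masked = mask_any(record)
print(masked)
# {'user': '***', 'tags': ['**', {'pin': '****'}], 'age': 30}
```

The recursion mirrors the self-referencing edges in the graphs: a dictionary can contain another dictionary, and a list can contain another list.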

Fluent API for Dictionaries

When working with dictionaries, you may not want all fields to be masked. You can use the Fluent API method .with_keys() to select precisely which keys should be evaluated.

from anonymizer_data import MaskDict

dict_data = MaskDict({
    "username": "JhonDoe",
    "password": "123Change",
    "roles": ['Admin', 'developer'],
    "contact": {
        "number": "+55 (99) 99999-9999"
    }
}).with_keys(['password', 'number'])

dict_data.anonymize()

print(dict_data)  
# {'username': 'JhonDoe', 'password': '*********', 'roles': ['Admin', 'developer'], 'contact': {'number': '*******************'}}
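
To make the key selection concrete, here is a rough standalone sketch of the behavior, not the library's implementation. The function `mask_selected` is hypothetical. It also assumes that selecting a key masks every string nested beneath it, which the example above does not confirm.

```python
from typing import Any, Iterable


def mask_selected(value: Any, keys: Iterable[str]) -> Any:
    """Return a copy where only strings under the selected keys are masked."""
    selected = set(keys)

    def walk(node: Any, active: bool) -> Any:
        if isinstance(node, dict):
            # A selected key switches masking on for everything beneath it.
            return {k: walk(v, active or k in selected) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(item, active) for item in node]
        if isinstance(node, str) and active:
            return "*" * len(node)
        return node

    return walk(value, False)


data = {
    "username": "JhonDoe",
    "password": "123Change",
    "roles": ["Admin", "developer"],
    "contact": {"number": "+55 (99) 99999-9999"},
}
result = mask_selected(data, ["password", "number"])
print(result)
```

Because the walk descends into nested dictionaries, "number" is found inside "contact" even though it is not a top-level key, which matches the output shown above.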

Unique Data Masking by Keys

The library has a built-in registry of sensitive data types (like CPF, CNPJ, Email, Phone, etc.). By passing key_with_type_mask=True, MaskDict will automatically apply specific format-preserving masks to known sensitive keys.

from anonymizer_data import MaskDict

dict_data = MaskDict({
    "username": "JhonDoe",
    "password": "123Change",
    "roles": ['Admin', 'developer'],
    "contact": {
        "number": "+55 (99) 99999-9999",
        "email": "jhondoe.09@example.com"
    }
}, key_with_type_mask=True)

dict_data.anonymize()

print(dict_data)  
# {'username': '*******', 'password': '*********', 'roles': ['Admin', 'developer'], 'contact': {'number': '*******************', 'email': '*********9@example.com'}}

Type-specific masks apply validation logic suited to each data type before masking the value.

For example, using the MaskStr class explicitly with the "cpf" mask:

from faker import Faker
from anonymizer_data import MaskStr

fake = Faker('pt_BR')
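cpf_mask = MaskStr(fake.cpf(), type_mask='cpf')
cpf_mask.anonymize()  # same call pattern as the other examples in this guide

print(cpf_mask)
# Example result (the visible digits vary because the CPF is randomly generated): ***.739.***-**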

With key_with_type_mask=True, each dictionary key is passed as the type_mask for its value, so the value is masked by MaskStr just as if you had called it directly:

from anonymizer_data import MaskStr

string = MaskStr("+55 (11) 91234-5678", type_mask="phone")
string.anonymize()

print(string)  
# Result: +** (**) *****-*678
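
The key-to-mask lookup described above can be pictured as a small registry. The sketch below is hypothetical and independent of the library: `MASKERS`, `mask_by_key`, and `mask_email` are illustrative names. The email rule (keep the last character of the local part) is inferred from the example outputs in this guide, not taken from the library's source.

```python
from typing import Callable, Dict


def mask_full(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


def mask_email(value: str) -> str:
    """Mask the local part except its last character; keep the domain."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        # Not shaped like an email address: fall back to a full mask.
        return mask_full(value)
    return "*" * (len(local) - 1) + local[-1] + sep + domain


# Illustrative registry: dictionary key name -> masking function.
MASKERS: Dict[str, Callable[[str], str]] = {
    "email": mask_email,
    "mail": mask_email,
    "username": mask_full,
    "password": mask_full,
}


def mask_by_key(key: str, value: str) -> str:
    """Use the key as the mask type; unknown keys leave the value as-is."""
    masker = MASKERS.get(key)
    return masker(value) if masker else value


print(mask_by_key("email", "jhondoe.09@example.com"))  # *********9@example.com
print(mask_by_key("username", "JhonDoe"))              # *******
print(mask_by_key("nickname", "jd"))                   # jd
```

Keeping the lookup in one table means several key spellings, such as "email" and "mail", can share a single masking function.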

Warning

The size_anonymization parameter is only used by the "string" mask type. This parameter has no effect if you pass a specific type_mask like "phone" or "cpf".

Cascading Contexts

The type_mask context cascades to inner structures. For example, a type_mask passed to a MaskList is applied to each of its elements:

from anonymizer_data import MaskList

phones = MaskList(["+55 (11) 91234-5678", "123-456-7890", "9876543210"], type_mask="phone")
phones.anonymize()

print(phones)  
# Result: ['+** (**) *****-*678', '***-***-*890', '*******210']
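
As an illustration of how one mask type can travel down through nested containers, the sketch below pairs a format-preserving phone mask with a recursive walker. Both `mask_phone` and `mask_nested` are hypothetical helpers, not library functions. The rule of keeping the last three digits is inferred from the results above.

```python
from typing import Any, Callable


def mask_phone(value: str, keep_last: int = 3) -> str:
    """Mask every digit except the last keep_last; punctuation stays in place."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def mask_nested(value: Any, masker: Callable[[str], str] = mask_phone) -> Any:
    """Carry the same masker down through nested lists and dicts."""
    if isinstance(value, list):
        return [mask_nested(item, masker) for item in value]
    if isinstance(value, dict):
        return {key: mask_nested(item, masker) for key, item in value.items()}
    if isinstance(value, str):
        return masker(value)
    return value


phones = ["+55 (11) 91234-5678", ["123-456-7890"], {"home": "9876543210"}]
masked_phones = mask_nested(phones)
print(masked_phones)
# ['+** (**) *****-*678', ['***-***-*890'], {'home': '*******210'}]
```

Passing the masker as an argument is what lets the context cascade: each recursive call receives the same mask type as its parent.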

Data Mask Types

The following mask types are supported out-of-the-box:

type_mask	Anonymized example
string	Hello Word -> **** Word
cpf	123.456.789-10 -> ***.456.***-**
cpfs	123.456.789-10 -> ***.456.***-**
cnpj	12.345.678/0001-95 -> ..678/*-
rg	12.345.678-9 -> .345.*-
cep	12345-678 -> ****-678
pis	123.45678.90-1 -> *.678.*-
phone, smartphone, cell_phone, cell_phone_number, celular, telefone, telefone_fixo	+** (**) *****-*678, ***-***-*890, *******210
email, mail	*******e@gmail.com
username	********
first_name	********
name	********
nome	********
numero	********
number	********
endereco	********
address	********
bairro	********
neighborhood	********
district	********
suburb	********
quarter	********
sexo	********
sex	********
gender	********
raça	********
raca	********
race	********
cor	********
color	********
senha	********
password	********
tipo_sanguineo	********
blood_type	********