Advanced Usage
Welcome to the advanced usage guide for anonymize-data.
Installation
pip
pip install anonymize-data
uv
uv add anonymize-data
Architecture Overview
Note
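Under the hood, the library uses a cascading architecture: each container class inspects the type of every value it holds and delegates masking to the matching class (MaskDict for dictionaries, MaskList for lists, MaskStr for strings).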
graph TD;
MaskDict-->MaskDict;
MaskDict-->MaskList;
MaskDict-->MaskStr;
graph TD;
MaskList-->MaskStr;
MaskList-->MaskList;
MaskList-->MaskDict;
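The cascade in the diagrams can be pictured as a recursive dispatch on the runtime type. The following is a minimal, library-independent sketch, not the library's actual implementation: `mask_any` and `mask_str` are hypothetical names, and leaving non-string scalars untouched is an assumption made for illustration.

```python
from typing import Any


def mask_str(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


def mask_any(value: Any) -> Any:
    """Dispatch on the runtime type and recurse into containers."""
    if isinstance(value, dict):
        return {key: mask_any(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_any(item) for item in value]
    if isinstance(value, str):
        return mask_str(value)
    # Assumption: anything else (numbers, None, ...) is returned unchanged.
    return value


record = {"user": "alice", "tags": ["a", "bc"], "meta": {"city": "Rio"}, "age": 30}
masked = mask_any(record)
print(masked)
# {'user': '*****', 'tags': ['*', '**'], 'meta': {'city': '***'}, 'age': 30}
```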
Fluent API for Dictionaries
When working with dictionaries, you may not want every field to be masked. Use the Fluent API method .with_keys() to select exactly which keys are masked. Keys are matched at any nesting depth, so the nested number key in the example is masked as well.
from anonymizer_data import MaskDict
dict_data = MaskDict({
    "username": "JhonDoe",
    "password": "123Change",
    "roles": ['Admin', 'developer'],
    "contact": {
        "number": "+55 (99) 99999-9999"
    }
}).with_keys(['password', 'number'])
dict_data.anonymize()
print(dict_data)
# {'username': 'JhonDoe', 'password': '*********', 'roles': ['Admin', 'developer'], 'contact': {'number': '*******************'}}
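To make the key-selection behaviour concrete, here is a small standalone sketch that masks only the string values stored under selected keys, at any depth. It does not use the library, and `mask_selected` is a hypothetical helper written for illustration, so the real `.with_keys()` may differ in details.

```python
from typing import Any, Iterable


def mask_selected(data: Any, keys: Iterable[str]) -> Any:
    """Mask string values whose key is in `keys`; leave everything else as-is."""
    selected = set(keys)

    def walk(value: Any) -> Any:
        if isinstance(value, dict):
            return {
                key: "*" * len(item)
                if key in selected and isinstance(item, str)
                else walk(item)
                for key, item in value.items()
            }
        if isinstance(value, list):
            return [walk(item) for item in value]
        return value

    return walk(data)


data = {
    "username": "JhonDoe",
    "password": "123Change",
    "roles": ["Admin", "developer"],
    "contact": {"number": "+55 (99) 99999-9999"},
}
result = mask_selected(data, ["password", "number"])
print(result["username"], result["password"], result["roles"])
# JhonDoe ********* ['Admin', 'developer']
```

Only `password` and the nested `number` change; `username` and `roles` pass through untouched.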
Unique Data Masking by Keys
The library has a built-in registry of sensitive data types (such as CPF, CNPJ, email, and phone). With key_with_type_mask=True, MaskDict automatically applies the matching format-preserving mask to every key it recognizes. Values under keys that are not in the registry, such as roles in the example, are left unchanged.
from anonymizer_data import MaskDict
dict_data = MaskDict({
    "username": "JhonDoe",
    "password": "123Change",
    "roles": ['Admin', 'developer'],
    "contact": {
        "number": "+55 (99) 99999-9999",
        "email": "jhondoe.09@example.com"
    }
}, key_with_type_mask=True)
dict_data.anonymize()
print(dict_data)
# {'username': '*******', 'password': '*********', 'roles': ['Admin', 'developer'], 'contact': {'number': '*******************', 'email': '*********9@example.com'}}
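One way to picture the key registry is a plain mapping from key names to mask functions, with aliases pointing at the same function. The sketch below is an assumption-laden illustration rather than the library's code: `REGISTRY`, `mask_all`, `mask_email`, and `mask_by_key` are hypothetical names, and the email rule (keep the last character of the local part) is inferred from the output shown above.

```python
from typing import Any, Callable, Dict


def mask_all(value: str) -> str:
    """Replace every character with an asterisk."""
    return "*" * len(value)


def mask_email(value: str) -> str:
    """Mask the local part except its last character; keep the domain."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        # Assumption: malformed addresses are masked entirely.
        return mask_all(value)
    return "*" * (len(local) - 1) + local[-1] + sep + domain


# Hypothetical registry: aliases such as "mail" share one mask function.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "email": mask_email,
    "mail": mask_email,
    "username": mask_all,
    "password": mask_all,
    "number": mask_all,
}


def mask_by_key(data: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the registered mask for each known key, recursing into dicts."""
    result: Dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, dict):
            result[key] = mask_by_key(value)
        elif isinstance(value, str) and key in REGISTRY:
            result[key] = REGISTRY[key](value)
        else:
            result[key] = value  # unknown keys are left unchanged
    return result


masked = mask_by_key({
    "username": "JhonDoe",
    "roles": ["Admin", "developer"],
    "contact": {"email": "jhondoe.09@example.com"},
})
print(masked["contact"]["email"])
# *********9@example.com
```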
Each type-specific mask applies its own validation logic to the value before masking it.
The same masks are available directly through the MaskStr class, for example with the "cpf" type:
from faker import Faker
from anonymizer_data import MaskStr
fake = Faker('pt_BR')
cpf_mask = MaskStr(fake.cpf(), type_mask='cpf').anonymize()
print(cpf_mask)
# Result: ***.739.***-**
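As a rough illustration of "validate, then mask", the sketch below checks the CPF layout with a regular expression and keeps only the middle group of digits. It is a standalone approximation: `mask_cpf` is a hypothetical function, the check covers only the punctuation layout (not the check digits), and masking everything when validation fails is an assumption, not documented behaviour.

```python
import re

# Layout check only: three groups of three digits and a two-digit suffix.
CPF_PATTERN = re.compile(r"(\d{3})\.(\d{3})\.(\d{3})-(\d{2})")


def mask_cpf(value: str) -> str:
    """Keep the middle digit group of a formatted CPF and mask the rest."""
    match = CPF_PATTERN.fullmatch(value)
    if match is None:
        # Assumption: values that do not look like a CPF are masked entirely.
        return "*" * len(value)
    return f"***.{match.group(2)}.***-**"


print(mask_cpf("123.456.789-10"))  # ***.456.***-**
print(mask_cpf("not a cpf"))       # *********
```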
In MaskDict, each dictionary key is passed as the type_mask for its value, so the masking itself is always performed by MaskStr. The direct equivalent for a phone number looks like this:
from anonymizer_data import MaskStr
string = MaskStr("+55 (11) 91234-5678", type_mask="phone")
string.anonymize()
print(string)
# Result: +** (**) *****-*678
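The phone result above keeps the punctuation and the last three digits. A format-preserving mask of that shape can be sketched in a few lines; `mask_phone` and its `keep_last` parameter are hypothetical names chosen for this illustration, not part of the library's API.

```python
def mask_phone(value: str, keep_last: int = 3) -> str:
    """Mask every digit except the last `keep_last`; keep all other characters."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(ch)  # punctuation, spaces, and the trailing digits
    return "".join(out)


print(mask_phone("+55 (11) 91234-5678"))  # +** (**) *****-*678
print(mask_phone("123-456-7890"))         # ***-***-*890
print(mask_phone("9876543210"))           # *******210
```

Because only digits are replaced, the output keeps the length and layout of the input, which is what makes the mask format-preserving.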
Warning
The size_anonymization parameter is only used by the "string" mask type. This parameter has no effect if you pass a specific type_mask like "phone" or "cpf".
Cascading Contexts
The type_mask context cascades to inner structures. For example, a type_mask passed to a MaskList is applied to every string the list contains:
from anonymizer_data import MaskList
phones = MaskList(["+55 (11) 91234-5678", "123-456-7890", "9876543210"], type_mask="phone")
phones.anonymize()
print(phones)
# Result: ['+** (**) *****-*678', '***-***-*890', '*******210']
Data Mask Types
The following mask types are supported out-of-the-box:
| type_mask | Anonymized example |
|---|---|
| string | Hello Word -> **** Word |
| cpf | 123.456.789-10 -> ***.456.***-** |
| cpfs | 123.456.789-10 -> ***.456.***-** |
| cnpj | 12.345.678/0001-95 -> **.***.678/****-** |
| rg | 12.345.678-9 -> **.345.****-* |
| cep | 12345-678 -> ****-678 |
| pis | 123.45678.90-1 -> ***.**678.**-* |
| phone, smartphone, cell_phone, cell_phone_number, celular, telefone, telefone_fixo | *** (***) ******-*678, ***-***-*890, *******210 |
| email, mail | *******e@gmail.com |
| username | ******** |
| first_name | ******** |
| name | ******** |
| nome | ******** |
| numero | ******** |
| number | ******** |
| endereco | ******** |
| address | ******** |
| bairro | ******** |
| neighborhood | ******** |
| district | ******** |
| suburb | ******** |
| quarter | ******** |
| sexo | ******** |
| sex | ******** |
| gender | ******** |
| raça | ******** |
| raca | ******** |
| race | ******** |
| cor | ******** |
| color | ******** |
| senha | ******** |
| password | ******** |
| tipo_sanguineo | ******** |
| blood_type | ******** |