Data Anonymiser – The road towards Privacy

Sanjay Kumar Das

Information can be categorised as identifiable or non-identifiable, depending on whether it can be used to identify a person. Data is a fact or statement; when value is added to it, the data becomes information. One can casually say that data is the smallest unit of information, though the relationship may not be so linear in practice.

Colloquially, one can broadly distinguish between information and data with the following equation: Data + Value = Information. That is why one is often found saying, “You have given me an invaluable piece of information.”

The portion that turns data into information is therefore value, and it is this valued portion that is bought and sold. A name cannot be unique, as it is open to adoption by one and all. That is why a name on its own is never bought or sold, i.e., transacted against consideration. That very name, however, creates value when it is laced with other attributes that make it unique. The following depiction gives a clear idea.

The common name in the first slot is a valueless fact. It acquires value when laced with a mobile number, because the combination becomes unique. Lastly, adding another unique attribute, viz. AADHAR etc., makes it invaluable. Sharing one’s name is etiquette, while sharing identifiable information that creates unique attributes is frightening. A casual share of data can therefore become a devastating disclosure of identity.
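The idea that a name alone singles out almost no one, while a name combined with a mobile number singles out nearly everyone, can be checked on a toy dataset. The sketch below is only an illustration: the records, the field names and the helper `unique_fraction` are all made up for this example and do not come from any real system.

```python
from collections import Counter

# Hypothetical records; every name and number here is invented for illustration.
RECORDS = [
    {"name": "A. Roy", "mobile": "90000-00001"},
    {"name": "A. Roy", "mobile": "90000-00002"},
    {"name": "A. Roy", "mobile": "90000-00003"},
    {"name": "S. Sen", "mobile": "90000-00004"},
]


def unique_fraction(records, attributes):
    """Return the share of records that the given attributes single out."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    singled_out = sum(1 for key in keys if counts[key] == 1)
    return singled_out / len(records)


# A name on its own is shared by several people.
name_only = unique_fraction(RECORDS, ["name"])

# The same name laced with a mobile number points to exactly one person.
name_and_mobile = unique_fraction(RECORDS, ["name", "mobile"])

print(f"name only:       {name_only:.2f}")
print(f"name and mobile: {name_and_mobile:.2f}")
```

In this toy data, only one record in four is singled out by name alone, whereas every record is singled out once the mobile number is added.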

Thus, information is classified into Personally Identifiable Information (PII) and Sensitive Personally Identifiable Information (SPII).

The only locker that needs no lock is one that is empty, or one that contains nothing of value. Similarly, if information is stripped of its valuable attributes, it becomes valueless and requires no lock or encryption. Alternatively, if information is shared or known only when there is a need, the chances of losing it become much smaller.

Normally, information is acquired for the purposes of identification, verification and promotion.

One’s name does not identify oneself until one discloses other attributes, viz. residence, the organisation one belongs to, etc. Secondly, a unique identity proof, viz. AADHAR etc., allows one to be verified before accessing a restricted service. Lastly, this information is often used for promoting a product or a service. Identifiable information can therefore be restricted by practising Need to Know: not knowing any information unless it is needed. Personally Identifiable Information turns into Sensitive Personally Identifiable Information when it is combined with sensitive data pertaining to health, finance etc. If a patient’s medicine and its mode of acquisition are known, the patient’s life could easily become vulnerable to compromise, leading to unthinkable consequences.

The two most logical ways to avoid all these possibilities are to understand that privacy is a fundamental right and to enforce that right by practising privacy hygiene.

Everyone has the right to know why they are being identified, the limits of that identification and the authority of the identifier. These three correlate with one’s fundamental right to privacy, enshrined in MY INFORMATION – MY RIGHT and the RIGHT TO BE FORGOTTEN.

The road ahead lies in technological intervention coupled with process reform, and it is called the DATA ANONYMISER.

An Anonymiser is a strainer that strains attributable data out of personally identifiable information, so that the information loses its value but not its context. It is NEITHER mere redaction NOR masking of data, because both redaction and masking can be reversed.
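One way to picture the strainer is as a function that discards the attributes which make a record unique and coarsens the rest, while leaving the context untouched. The sketch below is a minimal illustration under stated assumptions, not the Anonymiser envisaged here: the field names, the set `DIRECT_IDENTIFIERS` and the helpers `age_band` and `anonymise` are all hypothetical, and dropping or generalising fields in this way does not by itself guarantee that nobody can be re-identified.

```python
# Hypothetical field names; a real dataset would need its own classification.
DIRECT_IDENTIFIERS = {"name", "mobile", "id_number"}


def age_band(age, width=10):
    """Generalise an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(record):
    """Strain out direct identifiers and coarsen the remaining attributes."""
    # Direct identifiers are dropped outright, not hidden behind a mask,
    # so nothing is left in the output that could be unmasked later.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = age_band(out["age"])
    if "pin_code" in out:
        # Keep only the first three digits, i.e. a broad postal region.
        out["pin_region"] = out.pop("pin_code")[:3]
    return out


# An invented record, used only to show the shape of the output.
record = {
    "name": "A. Roy",
    "mobile": "90000-00001",
    "id_number": "ID-0001",
    "age": 34,
    "pin_code": "700091",
    "service": "ration card renewal",
}

anonymised = anonymise(record)
print(anonymised)
```

The output keeps the context (which service was used, by someone of roughly what age, in roughly which region) while the attributes that gave the record its value as an identity are gone.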

An Anonymiser can be an application, a process or a hybrid of both. For example, a person keys in their login credentials, oblivious to the CCTV camera behind their back. The feed stored from that CCTV clearly contains the person’s SPII, and using it the person’s privacy can easily be violated. How can that be avoided? The answer could be to reposition the CCTV. But is that possible on the fly? No. The simplest solution is to reposition the person. In the same way, an Anonymiser could be a process reform too. Hence, a single Anonymiser may not be the solution for every Problem Statement. There could be many Anonymiser Solutions.

The State Government of West Bengal, through the Department of Information Technology & Electronics, has rolled out THE ANONYMISER HACKATHON. Through its IDEATHON, any individual can contribute to the ever-growing REPOSITORY of problem statements. Participants then come together to solve each of these problem statements by innovative methods comprising applications, process reforms and the like, which will be known as the ANONYMISER.

These solutions will then be run against the live transactional data held by the Government to find out whether they can really strain out the valuable attributes laced into the PII/SPII and create an ANONYMISED DATA LAKE.

Anonymised data had a market value of $220 billion in 2021, which is expected to grow to a whopping $343 billion in 2030. The tons of data generated by public services on a daily basis in this country could be a game changer for academia, industry and professionals alike, who will be able to create affordable and accessible solutions for the masses.


Views expressed by: Shri Sanjay Kumar Das, Joint Secretary, Department of Information Technology and Electronics, Government of West Bengal.