Safe and secure data use

Last updated 4 February 2021

Guidance for tertiary education organisations (TEOs) on anonymising and de-identifying data for safe and secure use of personal information.

Data anonymisation and de-identification

Anonymised data means that all the links between a person and the person's record have been irreversibly broken so that it would be impossible to identify the person in the original record.

De-identification of data means that the personal identifiers in a record have been removed so that it would be difficult to identify the person in the original record.

De-identified data can be re-identified. Anonymised data cannot.

Where possible, anonymise information before use, and ensure that the data cannot later be reverse-engineered or combined with other datasets to disclose personal information.

Where anonymisation or de-identification is not possible, a case-by-case decision should be made, balancing the rights of the individuals concerned against the organisation’s needs.

All data analytics activities must be carried out in compliance with the Privacy Act and with the student’s best interests in mind. If in doubt, seek advice from the data governance board or your privacy officer.

De-identification

Once information is de-identified, it is not ‘personal information’. However, de-identification may not completely remove the risk that an individual can be re-identified. For example, another dataset or other information could be matched with the de-identified information.

Generally, de-identification includes three steps:

  1. removing personal identifiers, such as an individual’s name, address, date of birth or other identifying information
  2. removing or altering other information that may allow an individual to be identified, for example, because of a rare characteristic that could enable identification
  3. putting controls and safeguards in place to appropriately manage the risk of re-identification.
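
The first two steps above can be sketched in code. This is a minimal illustration only, not a prescribed method: the field names (name, address, date_of_birth, programme), the sample records and the threshold of two are all hypothetical. Step 3 (controls and safeguards) is an organisational measure and is not shown.

```python
from collections import Counter

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def remove_identifiers(records):
    """Step 1: drop direct identifiers from every record."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def suppress_rare_values(records, field, min_count=2, placeholder="Other"):
    """Step 2: replace values of `field` that occur fewer than `min_count` times."""
    counts = Counter(record[field] for record in records)
    return [
        {**record, field: record[field] if counts[record[field]] >= min_count else placeholder}
        for record in records
    ]


# Made-up sample data.
students = [
    {"name": "A. Example", "address": "1 Sample St", "date_of_birth": "2000-01-01", "programme": "Nursing"},
    {"name": "B. Example", "address": "2 Sample St", "date_of_birth": "1999-05-17", "programme": "Nursing"},
    {"name": "C. Example", "address": "3 Sample St", "date_of_birth": "1985-11-30", "programme": "Marine Archaeology"},
]

deidentified = suppress_rare_values(remove_identifiers(students), "programme")
```

In this sketch the rare programme is replaced with a broader placeholder value, so that it no longer singles out one person. A suitable threshold depends on the dataset and on who will have access to it.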

De-identification techniques

When choosing a de-identification technique, consider all relevant factors, including:

  • the kind of information or data that is to be de-identified
  • who will have access to the information, and for what purpose
  • whether it contains unique or uncommon characteristics that could enable re-identification
  • whether it could be targeted for re-identification because of who or what it relates to
  • whether other information or data could be matched up or used to re-identify the de-identified data
  • what harm may result if the information or data is re-identified.
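
One of the factors above, unique or uncommon characteristics, can be checked roughly in code by counting how many records share each combination of values. The sketch below is an illustration under stated assumptions: the field names, the sample records and the minimum group size of two are hypothetical. The underlying idea is commonly known as k-anonymity, a term this guidance does not itself use.

```python
from collections import Counter

# Hypothetical quasi-identifier fields, chosen for illustration only.
QUASI_IDENTIFIERS = ("age_band", "region", "programme")


def group_sizes(records, fields=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(record[field] for field in fields) for record in records)


def records_at_risk(records, fields=QUASI_IDENTIFIERS, min_group_size=2):
    """Return records whose combination of values is shared by too few records."""
    sizes = group_sizes(records, fields)
    return [
        record
        for record in records
        if sizes[tuple(record[field] for field in fields)] < min_group_size
    ]


# Made-up sample data.
records = [
    {"age_band": "20-29", "region": "Otago", "programme": "Nursing"},
    {"age_band": "20-29", "region": "Otago", "programme": "Nursing"},
    {"age_band": "50-59", "region": "Otago", "programme": "Nursing"},
]

at_risk = records_at_risk(records)
```

Here the third record is flagged because no other record shares its combination of values. A check like this covers only the dataset in hand. It cannot show whether other information held elsewhere could be matched against it.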

There is sometimes a trade-off between protecting privacy and keeping the data useful: modifying the data may reduce its usability. Nevertheless, this may be necessary to minimise the risk of disclosing personal or confidential information.

Examples of de-identification techniques include:

  • sampling — providing access to only some of the total records or data
  • choice of variables — removing quasi-identifiers that are unique to an individual or are likely to identify them when combined with other information
  • rounding — combining information that could identify an individual into categories, eg, expressing ages in ranges (25–35 years) rather than single years (27, 28)
  • perturbation — making small changes to information that is likely to identify an individual, so that the aggregate information is not significantly affected
  • swapping — exchanging information that may identify an individual with that of another person with similar characteristics to hide its uniqueness
  • manufacturing synthetic data — creating new values from original data so that the overall totals, values and patterns are preserved, but the new values do not relate to any particular individual
  • encryption or ‘hashing’ of identifiers — obscuring the original identifier, rather than removing it altogether, usually for the purposes of linking different datasets together.
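
Three of the techniques above (rounding, perturbation and hashing of identifiers) can be sketched with the Python standard library. This is an illustration only: the function names, band width, noise range, sample values and secret key are all hypothetical. The use of a keyed hash (HMAC-SHA256) is an assumption, not something this guidance specifies.

```python
import hashlib
import hmac
import random


def age_band(age, width=10):
    """Rounding: express an age as a range rather than a single year."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def perturb(values, max_shift=2, seed=None):
    """Perturbation: shift each value by a small random whole number."""
    rng = random.Random(seed)
    return [value + rng.randint(-max_shift, max_shift) for value in values]


def pseudonymise(identifier, secret_key):
    """Hashing: replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


ages = [27, 28, 41]
bands = [age_band(age) for age in ages]          # ['20-29', '20-29', '40-49']
noisy_ages = perturb(ages, max_shift=2, seed=1)  # each age moves by at most 2 years

key = b"hypothetical-secret-key"  # placeholder only; keep a real key stored securely
token_a = pseudonymise("student-0001", key)
token_b = pseudonymise("student-0001", key)      # same input and key give the same token
```

Because the same identifier and key always produce the same token, records can still be linked across datasets without exposing the original identifier. The key needs to be protected: anyone who holds it can recompute tokens for known identifiers and so re-identify records.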