OAIC Establishes Guidelines for Data Usage in Generative AI Training

iTnews

Details

Date Published
22 Oct 2024
Priority Score
4
Australian
Yes
Created
8 Mar 2025, 12:37 pm

Authors (1)

Description

Clarifies interpretations of legislation.

Summary

The Office of the Australian Information Commissioner (OAIC) has issued new guidelines on the use of personal data in training generative AI models. These guidelines clarify the application of the Privacy Act's existing legislation to ensure that organizations in Australia adhere to privacy obligations, particularly concerning the use of sensitive information like biometrics and health data. The OAIC emphasizes the need for explicit consent and warns against vague data collection terms and practices like web scraping. This is significant for AI governance, as it reinforces data privacy in AI model development, highlighting Australia's proactive approach to managing AI's potential risks.

Body

Australia's privacy watchdog has drawn lines around how it intends to adjudicate the mass ingestion of data into AI models, both by developers and end users.

New guidelines issued by the Office of the Australian Information Commissioner (OAIC) clarify how existing Privacy Act legislation applies to the use of personal - and especially sensitive - information when training or fine-tuning generative AI models or systems.

Although the Privacy Act only applies to businesses with annual turnover above $3 million, the OAIC told iTnews the guidelines would cover “all organisations that are using or developing AI products involving personal information in Australia”.

An OAIC spokesperson said the regulator is already in the “initial stages of assessing a number of practices and entities for compliance” relating to generative AI, but has yet to formally launch any investigations.

“If non-compliance comes to our attention, we will consider taking regulatory action,” the spokesperson said. “Whether we proceed to take action will depend on a number of factors, as articulated in our statement of regulatory approach.”

Under the Privacy Act, organisations may “collect or use only the personal information that is reasonably necessary for their purpose”. They may then use it, or share it with third parties, only for that stated purpose unless “consent or an exception applies”.

Organisations that intend to use data for training AI models should “explicitly refer” to that purpose when collecting consent, “rather than relying on broad or vague purposes such as research”, the guidelines state. Directly informing customers or changing privacy terms is unlikely to count as a “sufficient” means of establishing consent to use previously acquired data to train AI.

Additionally, “developers must consider whether data they intend to use or collect (including publicly available data) contains personal information and comply with their privacy obligations,” the guidance states. “Even if it was made public by the individual themselves, they may not expect it to be collected and used to train an AI model.”

For sensitive information, such as biometrics and health data, tighter scrutiny will apply. Under the Privacy Act, sensitive information may only be used in limited circumstances and with consent. As such, organisations risk using sensitive information without having established consent when data is scraped from the web, particularly from photographs and recordings.

The guidelines also spell out that creating a dataset through web scraping “may constitute a covert and therefore unfair means of collection”, which could breach the Privacy Act’s stipulation that data be “collected by lawful and fair means”.

“Where a third-party dataset is being used, developers must consider information about the data sources and compilation process for the dataset. They may need to take additional steps (for example, deleting information) to ensure they are complying with their privacy obligations,” the OAIC added.

As guidance for future data collection, the OAIC suggested organisations “delete or de-identify personal information or provide individuals with control over the use of their personal information”.