Australian authors have been caught up in a major controversy over the data used to train large language models, after an investigation found that 183,000 pirated books were included in a commonly used dataset called Books3. Booker Prize-winning Australian author Richard Flanagan called it the “biggest act of copyright theft in history”.
The Australian Society of Authors (ASA) described the revelations as “horrifying” and noted that the opacity with which AI systems had been trained made it nearly impossible to know how much copyrighted material they contained. “Tech companies will charge the end user of their products but will not pay for the labour that enabled it,” said ASA CEO Olivia Lanchester.
Read the Australian Computer Society article