A Case For Harmonizing Gen AI And The Copyright Regime
Nikhil Narendran
24 March 2025 11:33 AM IST
Globally, Generative AI (Gen-AI) developers are facing lawsuits from publishers, and India is no exception. Gen-AI developers have been accused of using content created by publishers and authors to train their AI models and then profiting from those models. At first glance, this may look like copyright infringement, but there are deeper questions worth considering here.
Copyright Is Not Absolute
One crucial aspect often overlooked in these discussions is that copyright is not an inherent natural right. It is an artificial right granted by law to promote innovation and access. Copyright exists not merely to protect authors and publishers but also to ensure that knowledge is disseminated. It has limitations, including time limits, fair dealing, and the doctrine of first sale, all designed to strike a balance between protection and innovation.
For instance, fair dealing permits the use of copyrighted material without permission for specified purposes such as research, education, and commentary. The doctrine of first sale ensures that once a copy of a book is sold, the copyright owner's control over the further distribution of that specific copy is exhausted.
Such exceptions allow books to be read, referenced and shared, which in turn enables new works and new technologies. They ensure access and prevent the monopolization of information, while still allowing creators to receive due recognition and compensation.
Importantly, information and ideas themselves are not protected by copyright; only their specific expression is. This means anyone who reads a book can use the information in it to create another publication without fear of infringement, provided the new work is not substantially similar to the original.
Why Should Large Language Models (LLMs) Be Treated Differently?
Gen-AI models learn in a way broadly similar to how our brains learn. Once trained, a model can understand the hierarchy of and relationships between pieces of information and reproduce them in a new form. If a student can synthesize information from multiple sources and create something new, why should an AI model be restricted from doing the same?
Courts have seldom blocked the adoption of a new technology in order to protect copyright. Had they done so, we would not have technologies such as video cassette recorders or even camera phones. At the same time, courts have curbed illegal copying and sharing, as in the Napster case, while allowing the underlying technology to be used in non-infringing ways.
Therefore, the legal and policy questions in these matters will likely shift from whether AI-generated content is infringing to whether the AI developer used a 'legally obtained copy' to train the model. This is a question of fact and should not prevent AI developers from training their models on Indian data. If it does, it will not only prevent the adoption of AI in India but also keep AI models from being trained on uniquely Indian datasets, including those in our regional languages.
The Way Forward
If we value copyright more than the development of Gen AI, we will stifle innovation and hold back progress. If LLMs are denied copyrighted materials, it will only worsen misinformation, increase bias and deny our society the value of AI. If Indian content, including content in various regional languages, is excluded from Gen AI training, we will be excluded from technological progress.
India stands at a unique crossroads where it can pioneer a balanced approach that promotes Gen AI model training while safeguarding publishers' rights. Publishers worldwide are already struggling to adapt to evolving news distribution models in the digital era. The rise of AI-generated content only adds to their challenges, making it imperative to establish a framework that works for AI developers and content creators.
Most importantly, we should introduce a fair dealing exception for AI training under our copyright laws. But such an exception, if unconditional, will no doubt undermine the labour and efforts of authors, journalists and publishers. Hence, it should be subject to the condition that the copy used to train LLMs was obtained legally. Such a move would open up vast amounts of Indian knowledge for AI training without infringing upon the intellectual property rights of authors and publishers.
Another key point of contention concerns reproduction. If an AI model generates text that closely resembles or directly reproduces parts of a news article, does that amount to intellectual property (IP) infringement? Policymakers must tread carefully here. AI-generated responses should ideally be transformative rather than verbatim reproductions.
The two industries, publishing and technology, will also need to develop the right licensing model for AI training, one that allows AI developers to compensate publishers for access to high-quality datasets. One way to achieve this is through close collaboration under a copyright society model for licensing such works to AI developers.
With its booming AI ecosystem and a rich publishing industry spanning multiple languages, India can, by devising the right model for both innovation and fair compensation, strengthen both industries and go a long way toward promoting access to knowledge, not just in English but also in its regional languages.
The author is an Advocate. Views are personal.