Read It (Aloud) and Weep: Controversy Surrounds Text-To-Speech Feature of Amazon’s Kindle Reader

By Jack Schecter.

March 2009 IP Update

In February, Amazon unveiled the Kindle 2, the newest version of its popular electronic-book reader. Though the improvements over the first version are mostly incremental, one notable exception has drawn much attention: The Kindle 2 offers a text-to-speech feature that allows the device to convert the text of an e-book into audio. In other words, the new Kindle can read to you.

But will it? The day after the text-to-speech feature of the Kindle 2 was announced, the executive director of the Authors Guild, Paul Aiken, challenged its legality in an interview with the Wall Street Journal: “They don’t have the right to read a book out loud. That’s an audio right, which is derivative under copyright law.”

Mr. Aiken’s comments on behalf of the Authors Guild drew withering criticism from a number of sources, including the Electronic Frontier Foundation (EFF), a prominent non-profit group that advocates for consumers’ free speech rights in the digital arena. In an analysis published on its website, the EFF warned that, if the Authors Guild’s legal position held sway, “Parents everywhere should be on the look out for legal papers haling them into court for reading to their kids.”

Amazon responded that “Kindle 2’s experimental text-to-speech feature is legal: no copy is made, no derivative work is created, and no performance is being given.” But, to the surprise of many, in that same statement and with little explanation, Amazon reversed course and agreed to “modify our systems so that rightsholders can decide on a title by title basis whether they want text-to-speech enabled or disabled.”

So, if the Authors Guild was blatantly overreaching, why did Amazon back off? As it turns out, the legality of the Kindle 2’s text-to-speech feature is not as clear-cut as Amazon or the EFF suggest.

At first blush, the law offers some support for the Authors Guild’s contention that the text-to-speech feature encroaches on the exclusive right of copyright holders to prepare derivative works [1]. When the Kindle 2 reads aloud the text of a copyrighted e-book, the e-book has arguably been “recast, transformed, or adapted” by the device [2].

The EFF attacked this theory of infringement, arguing that reproducing the text of an e-book in audio using text-to-speech software does not result in a copyrightable derivative work, for two reasons: One, because it fails to add anything original to the underlying work, and, two, because the audio is not stored in a concrete or permanent form. The EFF concluded that the copyright holders’ exclusive rights to prepare derivative works are therefore not infringed.

Despite these worthy arguments, the Authors Guild’s derivative-work theory of infringement is not so easily dismissed.

While the EFF focuses on whether the audio produced by the Kindle 2’s text-to-speech feature would itself qualify for copyright protection, commentators and courts have noted that the act of preparing a derivative work may well infringe a copyright holder’s exclusive rights under the Copyright Act regardless of whether the derivative work itself is entitled to protection under the Act. See Lewis Galoob Toys, Inc. v. Nintendo of America, Inc., 964 F.2d 965, 968 (9th Cir. 1992) (“the Act does not require that the derivative work be protectable for its preparation to infringe.”) (quoting Paul Goldstein, Derivative Rights and Derivative Works in Copyright, 30 J. Copyright Soc’y U.S.A. 209, 231 n. 75 (1983)).

As for the EFF’s argument that the Kindle 2’s text-to-speech functionality adds nothing original to the text of the underlying e-books when it reads them aloud, this is a subjective judgment tied to the nature and quality of the text-to-speech software being used. While such an argument might have been persuasive a decade ago when text-to-speech software was largely limited to flat, robotic speech, the quality of today’s text-to-speech software has progressed far beyond those early limits.

The Kindle 2 offers both male and female voices that are quite advanced compared to early text-to-speech offerings, and it is foreseeable that Amazon will upgrade the Kindle 2’s software to include even better voice options in the future. There are numerous software programs currently in development which will offer greater control over the characteristics of the speech generated [3].

The more advanced text-to-speech software becomes, the less persuasive the EFF’s contention will be that using that technology to read aloud copyrighted works adds nothing original to those underlying works.

Likewise, the Authors Guild has a plausible argument that the Kindle 2 creates unauthorized reproductions of copyrighted e-books in violation of the Copyright Act [4].

The EFF dismisses this argument, insisting that any such copy must be “fixed,” but it concedes that the Kindle 2 invariably copies e-book text and stores some portion of the audio it generates in one or more buffers in RAM. Whether or not such storage of portions of a work in RAM meets the fixation requirement is an open question.

The EFF relies on Cartoon Network LP, et al. v. CSC Holdings, Inc., 536 F.3d 121 (2d Cir. 2008), in which the Second Circuit held that a copy stored in RAM must remain there for a period of more than a transitory duration, and therefore data stored in a RAM buffer for no more than 1.2 seconds fails to meet the fixation requirement. See our September 2008 IP Update article on this decision.

The Second Circuit’s holding in Cartoon Network, however, is a departure from prior cases that defined a work as “fixed” so long as its embodiment endures long enough to permit it to be perceived, reproduced, or communicated. See MAI Systems Corp. v. Peak Computer, Inc., 991 F.2d 511, 517-18 (9th Cir. 1993).

The Cartoon Network plaintiffs petitioned the Supreme Court for certiorari in October 2008, seeking review of the Second Circuit’s holding on the durational aspect of the fixation requirement. On January 9, 2009, the Supreme Court invited the Solicitor General to file a brief stating the position of the United States.

Given the split of authority, the numerous amicus briefs filed in support of the Cartoon Network petition, and the invitation to the Solicitor General to file a brief, there is a fair chance that the Supreme Court will soon weigh in on this issue.

If the Supreme Court grants certiorari to review Cartoon Network and reverses the Second Circuit’s holding that buffer copies stored in RAM fail to meet the fixation requirement, the Amazon and EFF arguments in favor of the Kindle 2 would be significantly weakened.

While potential fair use arguments would survive, such a ruling would likely cut off arguments that the Kindle 2 does not violate the exclusive reproduction rights of copyright holders when it reads e-books aloud and places unauthorized copies of e-book text and audio in RAM.

Over and above the legal bases for the Authors Guild’s challenge to the Kindle 2 text-to-speech feature, rights holders also have a potent equitable argument. Authors, who indisputably control the rights to electronic text copies (e-books) and audio copies (audio books) of their copyrighted works, have argued that equity demands that Amazon be prevented from transforming the licensed electronic-text version of an e-book into an unlicensed audio version.

On average, audio books are sold to consumers at much higher prices than e-books, and, correspondingly, the amounts paid to publishers and authors for audio book rights are substantially greater than the compensation received for the e-book rights. By introducing a text-to-speech feature on the Kindle 2 without providing additional compensation to copyright holders, Amazon would have arguably secured for itself – to the detriment of authors and publishers – expensive audio book rights at cheap e-book prices.

In the end, however, the likeliest explanation for Amazon’s decision to reverse course on the Kindle 2’s text-to-speech feature is found neither in the legal merits of the Authors Guild’s claims nor in an altruistic belief that copyright holders should not have their audio book rights devalued.

Instead, the real reason appears in Amazon’s press release announcing its change of position: “We ourselves are a major participant in the professionally narrated audiobooks business through our subsidiaries Audible and Brilliance.” Amazon professes that, given the significant difference in quality between e-books read aloud by the Kindle 2 and professionally produced audio books, it has no fear that audio book sales may suffer in the near term as a result of the Kindle 2’s text-to-speech functionality.

Yet, Amazon is well aware of the rapidly developing state of text-to-speech technology. As a major player in the audio book business, Amazon itself surely has no desire to see its growing e-book business cannibalize its more lucrative audio book business.

[1] See 17 U.S.C. § 106(2) (“the owner of copyright under this title has the exclusive rights to . . . prepare derivative works based upon the copyrighted work”).

[2] The Copyright Act defines “derivative works” to include “any other form in which a work may be recast, transformed, or adapted.” 17 U.S.C. § 101.

[3] Text-to-speech software in development by Loquendo, for example, creates speech complete with contextually appropriate emotion and emphasis. See http://www.loquendo.com/en/demos/demo_tts.htm.

[4] See 17 U.S.C. § 106(1) (“the owner of copyright under this title has the exclusive rights to . . . reproduce the copyrighted work in copies”). The Copyright Act defines “copies” to include “material objects . . . in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device.” 17 U.S.C. § 101.