What is text and data mining?

Text and data mining (TDM) is defined by the UK Intellectual Property Office (IPO) as:

“The use of automated analytical techniques to analyse text and data for patterns, trends and other useful information”

Text and data mining usually requires copying works for analysis.

The Law

Reforms to UK copyright law in June 2014 introduced a ‘TDM Copyright Exception’ (section 29A of the Copyright, Designs and Patents Act 1988 (CDPA)):

“An exception to copyright exists which allows researchers to make copies of any copyright material for the purpose of computational analysis if they already have the right to read the work (that is, they have “lawful access” to the work). This exception only permits the making of copies for the purpose of text and data mining for non-commercial research.”

“Researchers will still have to buy subscriptions to access material; this could be from many sources including academic publishers.”

The exception permits any published and unpublished in-copyright works to be copied for the purpose of text mining for non-commercial research. This includes sound, film/video, artistic works, tables and databases, as well as data and text, as long as the researcher has lawful access.

Researchers who are based outside the UK and are not affiliated with a UK institution need to refer to the copyright law of their own jurisdiction for the equivalent exception. If an individual outside the UK is affiliated with a UK institution and has lawful access to the original content being mined, this activity is permitted under the exception.

Copies made for the purpose of commercial TDM must have the permission of the rights holder.

Within the context of research projects involving groups of people across institutions, sharing access to a lawfully mined copy is likely to be acceptable as long as each member of the group has lawful access to the original content being mined.

Sharing and Publishing results of TDM

Whether outputs can be shared depends on the extent to which copyright or database rights exist in the derived materials being shared. Database rights can arise where data is arranged in a specific way.

Copyright covers original work that has been created as a result of an intellectual process. There is no copyright in a fact or a collection of facts unless some intellectual rigour has been applied in the interpretation or presentation of those facts.

For example, a list of numbers reflecting probabilities against certain key terms, or a count of how often specific words appear in a film/song/text is highly unlikely to contain any copyright or database right from the original dataset.

In such instances, the data can be shared with anyone, irrespective of whether they have access to the original work or what country they are based in.
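As an illustration of the kind of derived output described above, the following is a minimal Python sketch that counts how often chosen terms appear in a text. The function name word_counts, the sample sentence and the tokenisation rule are assumptions made for this example and are not part of the IPO or Jisc guidance. Whether a particular output is free of copyright or database right still depends on the circumstances.

```python
import re
from collections import Counter


def word_counts(text: str, terms: list[str]) -> dict[str, int]:
    """Return how often each term appears in the text (case-insensitive)."""
    # Split the text into lowercase word tokens (letters and apostrophes only).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Keep only the numbers for the requested terms, not the source text itself.
    return {term: counts[term.lower()] for term in terms}


# Hypothetical sample text, used for illustration only.
sample = "The cat sat on the mat. The dog sat too."
print(word_counts(sample, ["the", "sat", "bird"]))
# prints {'the': 3, 'sat': 2, 'bird': 0}
```

The result holds only term counts and reproduces none of the wording of the source.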

It is lawful to share the results/conclusions of the text and data mining process if the material contains facts only. If the results/conclusions also contain copyright material from the original, it may still be possible in some circumstances to share the outputs. For example, under another copyright exception, the quotation exception (section 30 of the CDPA), it is possible to share results with individuals who were not lawfully involved in the original computational analysis.

This “fair dealing” exception allows limited quotation of copyright works as long as the length of the quote used does not undermine the legitimate (often commercial) interests of the rights holders. Please see the section on ‘sharing outputs’.

Acknowledgement

The TDM exception requires sufficient acknowledgement of the copied works unless an acknowledgement is impractical. For example, if the TDM process involves the works of many hundreds, or thousands, of contributors, it may be impractical to acknowledge all sources.

Personal Data

This guide deals only with the copyright aspects of TDM. When sharing data that can be considered personal data, contact the Library Research Support Team at iss-research@swansea.ac.uk for further assistance.

TDM Copyright Checklist

1. Research groups undertaking TDM should ensure that all individuals have lawful access to the original work, either through their own institution or via registration at the institution where the mining takes place.
2. The TDM exception in UK law permits the act of copying copyright material. In order for the individual researcher to be covered by the TDM exception, the act of copying would need to take place in the UK.
3. Copied works should be acknowledged. If defined databases or datasets are used, the researcher should reference them to show where the works were obtained.

 

(Source: This article draws heavily on the full JISC advice regarding UK Text and Data Mining Copyright Exception at: https://www.jisc.ac.uk/guides/text-and-data-mining-copyright-exception)



© Swansea University

Hosted by Information Services and Systems, Swansea University