Rights and Licenses
License Categories
The Language Bank of Finland uses the three license categories defined by CLARIN: PUB, ACA and RES. These divide the data sets into three coarse categories based on how widely the resources are made available and how easy it is to get access to them.
Each category can contain resources with a wide range of specific licenses, so it is always necessary to read the actual license text in addition to checking the category. For example, CC0, GNU GPL and CC BY-NC-SA could all fall under the PUB category, but they are different licenses, and the latter two impose some restrictions on how the data can be used and shared.
In addition to the license terms, laws and regulations must of course be followed. For example, if the data set contains personally identifiable information, researchers must follow the GDPR, the Data Protection Act and other relevant legislation in their region.
PUB Licenses
PUB is the license category for resources that can be made freely available online. This means that users do not need to log in or otherwise authenticate themselves to access the resource.
Even though a resource is freely available, its license can still restrict what can be done with the data set. For example, many of the PUB licenses request attribution, require sharing any derivatives under the same license or forbid commercial use. There might also be further obligations: for example, the license for the Parliament of Finland session video data set allows redistributing the videos but forbids using them in a way that implies endorsement from the Parliament of Finland for the context in which they are placed.
ACA Licenses
ACA is the license category used when a resource cannot be shared with the general public but should be relatively easy for the academic community to access. In the Language Bank, this means that the resource is available to all users who have logged in with an account that is affiliated with a known research institute.
In this category, further restrictions on using the resource and on distributing it or derivatives made using it are common.
RES Licenses
RES is the most restrictive license category. Resources in this category often contain copyrighted material or personally identifiable information that limits sharing and using them. Thus, these resources can only be accessed after an application process.
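The access rules of the three categories can be summarized in a short sketch. This is only an illustration of the rules described above, not how the Language Bank implements them: the `User` fields and the `can_access` function are hypothetical names invented for this example. Passing the check only means the resource can be reached; the specific license terms still apply.

```python
from dataclasses import dataclass, field
from enum import Enum


class LicenseCategory(Enum):
    PUB = "PUB"
    ACA = "ACA"
    RES = "RES"


@dataclass
class User:
    # Hypothetical fields for illustration; not part of any Language Bank API.
    logged_in: bool = False
    # True if the account is affiliated with a known research institute.
    academic_affiliation: bool = False
    # Identifiers of RES resources for which an application has been approved.
    approved_applications: set = field(default_factory=set)


def can_access(user: User, resource_id: str, category: LicenseCategory) -> bool:
    """Return True if the user can reach the resource under its category."""
    if category is LicenseCategory.PUB:
        # PUB: no login or other authentication is needed.
        return True
    if category is LicenseCategory.ACA:
        # ACA: logged in with an account affiliated with a research institute.
        return user.logged_in and user.academic_affiliation
    if category is LicenseCategory.RES:
        # RES: access only after an approved application for this resource.
        return resource_id in user.approved_applications
    raise ValueError(f"Unknown license category: {category}")


anonymous = User()
researcher = User(logged_in=True, academic_affiliation=True)

print(can_access(anonymous, "corpus-a", LicenseCategory.PUB))
print(can_access(anonymous, "corpus-b", LicenseCategory.ACA))
print(can_access(researcher, "corpus-b", LicenseCategory.ACA))
print(can_access(researcher, "corpus-c", LicenseCategory.RES))
```

In this sketch, the anonymous user reaches only the PUB resource, while the affiliated researcher also reaches the ACA resource but not the RES resource, because no application for it has been approved.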
Language Bank Rights (LBR)
Applications for RES-licensed resources are made and processed in the LBR service. New applications can be created in two ways: by browsing the resource list in the service, adding the desired resource(s) to the cart and creating an application, or directly from the resource list in kielipankki.fi by clicking the small icon in the “apply” column of a RES-licensed resource.
During the application process, the applicant is, at a minimum, authenticated, accepts the license terms and submits a research plan. Depending on the applicant’s position and the properties of the data set, the application process might also include other steps, such as getting approval from the applicant’s supervisor or submitting a privacy statement.
Privacy Statement
When the data set contains personally identifiable information, you need to submit a privacy statement. The official, detailed and up-to-date information on the required contents of the statement can be obtained from the Office of the Data Protection Ombudsman, but the statement must include, for example:
- purpose of the processing of the personal data
- who processes the data
- whether the data is transferred outside the European Economic Area
- how long the data is retained
Many universities and other research institutes publish instructions and/or templates for writing privacy statements for research. For example, the University of Helsinki data protection instructions, the University of Turku research data guide and the University of Jyväskylä data privacy page provide such information.
If you need further help, your institute likely offers data support that can assist you. The Language Bank offers guidance in some very basic matters but cannot give legal aid. Data support at universities and other research institutes can help you, for example, to determine whether your research plan is in agreement with the license for the intended data set and to fill in the paperwork. For example, the University of Helsinki data support and the University of Turku OpenUTU project list email addresses from which you can get help with your data-related questions.
Exercise
Locate your organization’s data support (web page and/or email) and enter it in the interactive course document.
Sensitive Data
By definition, any data that needs to be protected against unauthorized access is sensitive. The reasons for this need can be, for example, legal and/or ethical, and the data can contain, for example:
- Sensitive personal data (health or genetic data, trade union membership, religion, racial or ethnic origin…)
- Sensitive ecological data (exact locations of very old specimens, details about conservation efforts…)
- Confidential data (patented or copyrighted data, trade secrets, national secrets…)
For some data sets, it might be sufficient that access is allowed only for a limited set of people and that the data is processed in a reasonably secure environment and not transferred abroad. For other data sets, this might not be sufficient, and stricter measures need to be in place, such as limiting access geographically, preventing copying of the data or encrypting the data.
At the time of writing, the Language Bank does not provide resources so sensitive that special measures such as encryption at rest would be needed. That will likely change in the future, and users might also have sensitive data sets of their own, meaning any data that needs to be protected for legal and/or ethical reasons, for example.
CSC’s Sensitive Data (SD) Services provide storage and sharing of sensitive data as well as a computing environment for carrying out research with it. Additional services for publishing the existence of such data sets and allowing reuse based on applications are being piloted.
This material does not go into detail about working with sensitive data, but CSC does provide services for working with data that cannot be processed in normal computational environments. For more information on processing sensitive research data, see docs.csc.fi and research.csc.fi.