AI language models can be used in a variety of ways in research, teaching and studies. On this page, we address legal aspects from the perspective of teaching to help you find your way around.
The FAQs listed here are an excerpt from our handout "Text-generating AI: Legal aspects of use at LUH". They are neither comprehensive nor legally binding. A legal review is recommended in specific individual cases.
As of: September 2024
FAQ
General
How is AI defined and why are we talking mainly about language models here?
Large language models are a form of artificial intelligence that mimics human intelligence in the form of language production. They use statistical models to analyse large amounts of data and to recognise patterns and connections between words and sentences. Language models like ChatGPT are trained on large amounts of text and are able to interact with users, answer questions and write longer texts. For practical reasons, this guide is limited to so-called language models, i.e. AI applications that can generate text.
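To illustrate the principle, here is a minimal, purely illustrative Python sketch (a toy bigram table; real language models such as ChatGPT use large neural networks, not a lookup table) of how a statistical model can generate text by predicting each next word from patterns in its training text:

```python
import random
from collections import defaultdict

# Toy training text; real models are trained on billions of words.
training_text = "the model predicts the next word and the model learns patterns"

# Record which words follow which word (a simple bigram table).
follows = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word].append(next_word)

def generate(start: str, length: int = 6) -> str:
    """Generate text by repeatedly sampling a statistically likely next word."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:  # no known continuation
            break
        out.append(random.choice(candidates))
    return " ".join(out)

print(generate("the"))  # e.g. "the model predicts the next word and"
```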
How should teachers deal with students' use of AI applications?
Text-generating AI applications are expected to play an increasingly important role in scientific work in the near future; tools relevant to the subject and the task at hand should therefore be introduced during the course of study, and their use should be practised.
- As teachers, you introduce students to the rules of good scientific practice.
- Students should use the current version of their institute's declaration of originality (Salden & Leschke, 2023, p. 32 f.).
- When supervising scientific work, it is particularly important to make very clear to students at the beginning of the work in what form and in how much detail the use of tools must be documented, and whether there are whitelists of applications that do not require separate documentation (e.g. MS Office up to version X).
Is a language model suitable as a research tool?
Language models are not knowledge databases and do not work like search engines, i.e. they do not search the internet for possible sources using keywords or entire questions. Rather, the AI derives semantic relationships between words from its text material, which lead to grammatically (morphologically, syntactically and semantically) correct language output and give the impression that human communication is taking place.
Using ChatGPT as an example, here are some current limitations to consider when using language models for scientific work:
- State of knowledge: The model's training data currently ends in January 2022, so no current information can be researched. This time limit on training data is also relevant for other LLMs. Exceptions are systems that also have access to the internet and can include search results in their answers, such as Microsoft Bing.
- References: The chatbot's answers often include no references, or false or non-existent ones. Given references should be checked, even if they seem plausible (plausible DOIs, real journal titles, etc.); a minimal automated check is sketched at the end of this answer.
- Hallucinations: The AI does not guarantee factual correctness; it compiles answers based on probabilities. Since it does not reproduce knowledge, it cannot answer that it does not know something, and a real understanding of contexts cannot be assumed. Instead, texts are generated by recombining data learned during training. Such factually incorrect answers are called hallucinations.
It is therefore important that all information provided by language models is carefully checked and substantiated by reliable sources before it is used in scientific papers. Language models can imitate human communication, but they cannot replace expertise (Rouse, 2023 and Wikipedia, 2023).
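Part of the reference check mentioned above can be automated. The following sketch assumes the third-party requests library and uses the fact that every genuine DOI resolves at doi.org; a 404 response is a strong hint that a cited reference was hallucinated (some publishers block automated requests, so a failed check calls for manual verification rather than proving fabrication):

```python
import requests  # third-party: pip install requests

def doi_resolves(doi: str) -> bool:
    """Return True if the DOI resolves at doi.org; non-existent DOIs give 404."""
    response = requests.head(
        f"https://doi.org/{doi}", allow_redirects=True, timeout=10
    )
    return response.status_code == 200

print(doi_resolves("10.1038/nature14539"))    # real DOI -> True
print(doi_resolves("10.9999/invented.2023"))  # invented DOI -> False
```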
European law & GDPR
What are the points of friction with regard to the General Data Protection Regulation (GDPR)?
Or to put it another way: How do service providers like OpenAI handle the data entered? And to what extent does the data processing comply with the provisions of the GDPR?
From a data protection perspective, a number of questions about the use of text-generating AI remain unanswered. Because service providers such as OpenAI lack transparency about data processing for purposes that are not precisely defined, the state data protection officers have initiated a comprehensive audit of this provider and asked it to answer their questions. As long as this audit is ongoing and the questions remain unanswered, a conclusive assessment is difficult.
Personal rights and academic freedom
Why is the use of these services voluntary for students?
Ultimately, everyone who uses such a service (especially the free version) must be aware that all data provided to the service provider can be used for the provider's own business purposes. This may also include (personal) login data.
If such services must be used as a compulsory part of lectures or courses, requiring registration with real personal data should be viewed critically. Whether the contact data is business or private is irrelevant.
What should be considered with regard to confidentiality clauses and company secrets?
Even within the freedom of science, there are limits when confidentiality of research data and/or findings has been agreed upon in the context of cooperation with companies or institutions. Such confidentiality clauses are contravened if corresponding information is entered into an AI language model.
Copyright law
Who is the author when texts are generated with an AI tool?
- Texts from ChatGPT are newly generated and in the public domain (Salden & Leschke, 2023, p. 26).
- The copyright for generated texts may lie with those who generate the prompt (the input) – provided that there is a sufficiently high level of creative design (Salden & Leschke, 2023, p. 25 f.).
- It should be noted that plagiarism may occur unintentionally when ChatGPT reproduces text sequences from its underlying sources verbatim, which can happen especially with very specific word sequences. This also applies to copyrighted texts.
Who owns the copyright of the data entered or generated?
- Copyright is tied to a creative or inventive human achievement. Users therefore retain the copyright to the prompts they enter. This is separate from the difficulties that arise from the lack of transparency of many language models regarding the further use of prompts as training data.
- Users must check whether data they enter, such as prompts or texts, may be used by the AI providers for their own purposes.
How should AI-generated texts be cited or labelled?
- "The extent to which texts that have been created using an AI tool must be labelled accordingly in an academic context depends on whether the examination candidates would otherwise have attempted to deceive or whether scientific misconduct would have occurred. At this point, the examination regulations, statutes or other framework regulations of the universities must be observed." (Salden & Leschke, 2023, p. 29)
- "The licence or usage conditions of the respective software may also be relevant at this point. If these stipulate that reference must be made to the use of AI-generated texts, users are obliged to comply with the conditions." (Salden & Leschke, 2023, p. 29)
- "How such labelling is to be done regularly depends on the individual case. In this context, it is crucial that it is recognisable to third parties which parts of the text were generated by an AI and to what extent. If the texts have been adopted word for word, it is recommended that the passage be treated similarly to a ‘classic’ quotation. If, on the other hand, the AI programme was used as a source of inspiration or food for thought, a reference at the beginning or end could suffice." (Salden & Leschke, 2023, p. 29)
- "The extent to which texts that have been created using an AI tool must be labelled accordingly in an academic context depends on whether the examination candidates would otherwise have attempted to deceive or whether scientific misconduct would have occurred. At this point, the examination regulations, statutes or other framework regulations of the universities must be observed." (Salden & Leschke, 2023, p. 29)
Do terms such as plagiarism and ghostwriting apply when content is generated using AI applications?
Can texts generated using content-generating AI applications be plagiarised?
The spheres of copyright, good scientific practice and examination regulations are not congruent (Salden & Leschke, 2023, p. 34). Plagiarism is difficult to define in legal terms and cannot be identified with certainty.
The term ghostwriting is just as difficult to apply, since at least the prompts originate from the users themselves (Salden & Leschke, 2023, p. 35).
Examination regulations & good scientific practice
What is the connection between text-generating AI tools and scientific misconduct?
How should one deal with the suspicion that a text comes from a language model and that submitting it as one's own constitutes scientific misconduct? Texts generated by language models cannot be reliably identified as plagiarism.
- "Within the framework of the usual rules of good scientific practice, a violation is regularly defined as follows: 'Scientific misconduct occurs when, in a scientifically relevant context, false information is provided deliberately or through gross negligence, intellectual property of others is violated or their research activities are otherwise impaired.' (HRK)." (Salden & Leschke, 2023, p. 31)
- Here, the authors refer to the "Recommendation for Dealing with Scientific Misconduct in Higher Education Institutions" ("HRK Recommendation" 1998, p. 3) and the "Code of Procedure for Suspected Scientific Misconduct" ("MPG Code of Procedure" 1997, amended 2000, p. 4).
- "Misrepresentation is (...) understood to mean falsifying or inventing data, but not omitting to state that the text was generated by an AI." (Salden & Leschke, 2023, p. 31)
- "the unmarked adoption of AI-generated texts from ChatGPT nevertheless violates the rules of good scientific practice" (Salden & Leschke, 2023, p. 31), because it should be stated in a comprehensible way for third parties "which content arises from one's own thoughts and which sentences have been taken from external sources" (Salden & Leschke, 2023, p. 31)
- "It is also conceivable that corresponding sets of rules could declare a certain way of using AI tools compatible with scientific behaviour, for example if a significant amount of intellectual input has been incorporated into the work with the tool." (Salden & Leschke, 2023, p. 32)
- "Within the framework of the usual rules of good scientific practice, a violation is regularly defined as follows: 'Scientific misconduct occurs when, in a scientifically relevant context, false information is provided deliberately or through gross negligence, intellectual property of others is violated or their research activities are otherwise impaired.' (HRK)." (Salden & Leschke, 2023, p. 31)
What can be deduced from this for the cooperation between students, teachers and departments?
- Students should be made aware that such misconduct quickly becomes apparent in normal scientific discourse and causes lasting damage to their career opportunities and reputation as scientists.
- Teachers and departments should be advised to provide students with an appropriate and transparent system for documenting the use of content-generating AI applications, so that such tools can be used in a legally compliant manner (a possible log format is sketched after this list).
- Teachers and departments should be advised that, without labelling, the use of content-generating AI applications is difficult to prove, and that any suspicion must be well founded.
- Appropriate supervision of students, especially while they write their theses, ensures not only that a thesis is produced, but also that the necessary knowledge and skills are acquired.
- The use of plagiarism software at Leibniz University is regulated in circular 26/2021. See also https://www.luis.uni-hannover.de/en/services/applications/applikations-hosting/plagiarism-search
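No official format for such documentation is prescribed; as one possibility, the hypothetical Python sketch below defines a machine-readable log entry that students could keep for each AI use and submit alongside their declaration of originality:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AIUsageEntry:
    """One documented use of a content-generating AI application."""
    tool: str     # e.g. "ChatGPT (GPT-4)"
    date: str     # date of use
    section: str  # affected part of the paper or thesis
    purpose: str  # what the tool was used for
    prompt: str   # the input given to the tool

entry = AIUsageEntry(
    tool="ChatGPT (GPT-4)",
    date="2024-03-12",
    section="Chapter 2.1",
    purpose="Rewording of two paragraphs",
    prompt="Rephrase the following paragraph in formal academic English: ...",
)
print(json.dumps(asdict(entry), indent=2))  # entry for the usage log
```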
What rights do students have when they submit written assignments they have authored themselves as part of their coursework or as an assessment?
- Assessments created by students are "usually protected by copyright. The moment the examiner copies the examination paper into the AI software, duplication takes place." (Salden & Leschke, 2023, p. 37)
- "Most examination regulations stipulate that the 'assessment of examination results must be carried out by each examiner (individual assessment) [...] and justified in writing.' " (Salden & Leschke, 2023, p. 36)
- "The assessment must be associated with an individual contribution." (Salden & Leschke, 2023, p. 36)
The right to a non-automated assessment (usually derived from the examination regulations: "assessment is carried out by the examiner") still leaves room for examiners to have the wording of their own assessment formulated by entering keywords as prompts. The student's name and enrolment number must not be included (data protection); a minimal sketch of such a redaction step follows below. It goes without saying that the generated text must be checked to ensure it actually expresses the examiner's own assessment.
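What such a redaction step could look like is sketched here; the helper function and the number pattern are hypothetical and must be adapted to the formats actually used at the institution:

```python
import re

def redact(notes: str, student_name: str) -> str:
    """Remove the student's name and matriculation-style numbers from notes."""
    redacted = notes.replace(student_name, "[NAME]")
    # Hypothetical pattern: treats any 7-10 digit sequence as a
    # matriculation number; adjust to the real local format.
    return re.sub(r"\b\d{7,10}\b", "[MATR. NO.]", redacted)

notes = "Jane Doe (matriculation no. 10012345): coherent argument, weak methods."
print(redact(notes, "Jane Doe"))
# -> "[NAME] (matriculation no. [MATR. NO.]): coherent argument, weak methods."
```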
Authors and sources
Who compiled the handout?
Compiled by the "AI in Teaching" working group with the participation of Melanie Bartell (Dez. 2 / SG23), Sylvia Feil (ZQS/elsa), Kati Koch (TIB), Jens Krey (Dez. 1 / SG11), Prof. Dr. Marius Lindauer (Institute of Artificial Intelligence), Dr. Katja Politt (German Department), Dr. Inske Preißler (Faculty of Electrical Engineering and Computer Science), Dr. Klaus Schwienhorst (Leibniz Language Center), Felix Schroeder (ZQS/elsa), Prof. Dr. Henning Wachsmuth (Institute of Artificial Intelligence).
This guide will need to be revised in line with ongoing legal and judicial developments. Any questions can be asked in the Stud.IP course "LUH-Forum: Lehre", which contains an internal forum where answers have already been collected and further questions can be posted.
Which sources are cited?
- EU Commission (2018) "Artificial Intelligence for Europe". (Retrieved: 16.08.2023)
- HRK Recommendation (1998) "On dealing with academic misconduct at universities". (Retrieved: 16.08.2023)
- MPG Rules of Procedure (1997, amended 2000) "Rules of Procedure in cases of suspected scientific misconduct". (Retrieved: 16.08.2023)
- Rouse, R. (2023) "Large Language Model (LLM)". In: Techopedia. (Retrieved: 28.07.2023)
- Salden, P., Leschke, J. (eds., 2023) "Didactic and legal perspectives on AI-supported writing in higher education". (Retrieved: 16.08.2023)
- Wikipedia (2023) "ChatGPT". In: Wikipedia - The free encyclopedia. Editing status: 27.07.2023, 17:25 UTC. (Retrieved: 28.07.2023)
- Stöcker, C. (2023) "Machines that read stolen books". In: Der Spiegel (column). (Retrieved: 20.09.2023)