
Artificial Intelligence Literacy

Hallucinations and Misinformation

AI hallucinations occur when generative AI produces wrong responses to a prompt or query. This could involve:

  • Mixing true and false statements
  • Misstating a detail
  • Making entirely false claims and fabricating information (Morris, 2024)

IBM describes an AI hallucination as when a large language model (LLM) - often a chatbot - perceives patterns or objects that are nonexistent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate (IBM, n.d.).

  • The way an AI model is trained has a lot to do with whether it hallucinates.
    • AI trained on enormous bodies of text with no labeled "correct" answers can end up hallucinating.
    • Generative AI operates using a sophisticated algorithm that examines how humans arrange words online. It doesn't determine what is true or false. 
    • By identifying patterns in the data, a large language model (LLM) learns to predict the next word in a sequence, functioning similarly to an advanced autocomplete tool.
    • Since the internet contains false information, the model can also pick up and replicate those inaccuracies. Additionally, chatbots sometimes fabricate content, creating new text by blending billions of patterns in unpredictable ways. As a result, even if the model were trained solely on accurate data, it might still generate incorrect information (Weise et al., 2023).
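The "advanced autocomplete" idea above can be sketched in a few lines of code. This is a toy illustration only (real LLMs use neural networks over billions of parameters, not simple counts), but it shows the key point: the model predicts whatever continuation was most common in its training text, with no notion of whether that continuation is true. The training text below is a made-up example.

```python
from collections import Counter, defaultdict

# Toy "autocomplete" model: count which word follows which in some
# training text, then predict the most frequent continuation.
# The counts reflect only how often people wrote something,
# not whether it is accurate.
training_text = (
    "the moon is made of rock . "
    "the moon is made of cheese . "
    "the moon is made of cheese . "  # a falsehood, repeated more often
)

next_word_counts = defaultdict(Counter)
words = training_text.split()
for current, following in zip(words, words[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the training text."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("of"))  # prints "cheese" - the more common, false claim wins
```

Because the false statement appeared more often in the training data, the model confidently completes "made of" with "cheese." Scaled up across billions of patterns, this is one way inaccurate or fabricated text can emerge even from a fluent model.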

Generative AI is great at pattern detection and mimicry. It is trained to make predictions and then generate a response to a prompt based on its training data. Since the internet is filled with people making unfounded and outrageous claims, confidently stating falsehoods, and making things up, it is no surprise that AI trained on the internet does the same (Morris, 2024).

 

IBM. (n.d.). AI hallucinations. https://www.ibm.com/topics/ai-hallucinations

Morris, S. (2024). Exploring AI with critical information literacy. ALA Learning.  

Weise, K., Metz, C., Lohr, S., & Grant, N. (2023, May 8). When chatbots "hallucinate." New York Times, 172(59782), B4.

Copyright

The main concerns about AI and copyright revolve around issues of ownership, infringement, and the use of copyrighted material to train AI models. Here are some key concerns:

 

1.  Use of Copyrighted Material for AI Training: Many AI models, especially generative ones, are trained on vast amounts of data that may include copyrighted works. The unauthorized use of such material in training can raise legal and ethical questions. Creators may argue that AI companies are benefiting from their work without permission or compensation.


2.  Ownership of AI-Generated Content: AI-generated content blurs the lines of authorship. Since copyright typically protects works created by human authors, there are questions about whether AI-generated works can be copyrighted, and if so, who would own the copyright—the user, the AI developer, or someone else.

3.  Potential for Infringement: AI tools can generate content that closely mimics or copies existing works, leading to potential copyright infringement. If an AI model recreates parts of a copyrighted work without significant transformation, it may violate copyright law.

4.  Fair Use Doctrine: There are debates about whether the use of copyrighted works to train AI falls under "fair use." While some argue that training an AI model on large datasets for research or transformative purposes could qualify as fair use, others contend that large-scale commercial AI projects do not meet this standard.

5.  Right to Protect and Monetize Creations: Artists, writers, and creators are concerned that AI-generated content could undermine their ability to monetize their own work, as AI can produce similar works more quickly and cheaply. This may impact creative industries economically.

6.  Data Mining and Licensing: AI companies often rely on data scraping and mining to gather vast amounts of content for training purposes, which may include copyrighted material. Some argue that content creators should be compensated or have the option to license their works for AI training purposes.

Chaudhary, A., & Zhao, J. (2023, April 19). Generative AI has an intellectual property problem. Harvard Business Review. https://hbr.org/2023/04/generative-ai-has-an-intellectual-property-problem

Crawford, L., & Schultz, J. (2023, October 11). Generative AI: Challenges for copyright law. Issues in Science and Technology. https://issues.org/generative-ai-copyright-law-crawford-schultz/

OpenAI. (2024). ChatGPT (Oct 22 version) [Large language model].

World Economic Forum. (2024, January). Cracking the code: Generative AI and intellectual property. https://www.weforum.org/agenda/2024/01/cracking-the-code-generative-ai-and-intellectual-property/