Stepping Stones for Self-Learning: Exploring the Use of Multimodal Text- and Image-Making Generative AI Tools

Shalin Hai-Jew
Copyright: © 2024 | Pages: 58
DOI: 10.4018/979-8-3693-0074-9.ch005

Abstract

One of the themes in the emergence of text- and image-making (multimodal) generative AIs is their value in the learning space, a vast potential that mass humanity is just beginning to explore. This chapter explores the potential and early use of large language models (LLMs), harnessed for their mass-scale learning, human-friendly conversations, and efficacies, in support of self-learning for individuals and groups, based on a review of the literature, system constraints and affordances, and abductive logic. Insights are shared about longitudinal and lifelong learning, with a focus on co-evolving processes between the human learner and the computing machines and large language models.
Chapter Preview

1. Introduction

In late November 2022, OpenAI shocked the world with its personable and informative chatbot, ChatGPT, dubbed “the poster child of generative AI” (Schäfer, 2023, p. 1). Then, Google rolled out its Bard AI in March 2023. ChatGPT, with its Generative Pretrained Transformer (GPT) aspect encoded in its name, attracted a million users in its first five days of public release and 100 million in its first two months. The tools went truly global, with usage in both developed and developing countries (Kshetri, 2023, p. 16). With web-facing user interfaces and free access (for these initial versions), these were two of the most popular text- and image-making generative artificial intelligence (GAI) tools based on large language models (LLMs). The LLMs do not only create conversation; they have also been harnessed to create various types of digital content (from generic to novel). The advent of “neural networks and learning systems” enables current iterations of machine creativity (Franceschelli & Musolesi, 2023, p. 3). The speed of the emergence of text-to-image generative AI and other artmaking generative AI tools has been explained in terms of three macro “moments”: “the advent of AI in the middle of the last century; the second at the ‘reawakening’ of a specific approach to machine learning at the turn of this century; the third that documents a rapid sequence of innovations, dubbed ‘clever little tricks,’ that occurred just across 18 months” leading up to the summer of 2022 (Steinfeld, 2023, p. 211). There have been complex advances by generations of computer scientists to enable arrival at the present moment.

Generative AI is defined as “the group of technologies that automatically generate visual or written content based on text prompts” (Inie, Falk, & Tanimoto, Apr. 2023, p. 1) in ways that mimic human creativity and innovation. Another definition of generative AI is “a class of machine learning (ML) algorithms that can learn from content such as text, images, and audio in order to generate new content” (Sun, Liao, Muller, Agarwal, Houde, Talamadupula, & Weisz, Mar. 2022, p. 212). There is a sense that generative AIs may provide inspiration and “higher quality output” when human inputs are combined with the capabilities of the AI (Inie, Falk, & Tanimoto, Apr. 2023, p. 2).

ChatGPT and Bard AI brought global attention to the remarkable achievement of computer scientists in creating artificial intelligences with smooth, human-like language abilities (combined with mass-scale learning). The year 2023 “is characterized by a dizzying array of landmark AI developments” (Leong, 2023, p. 52). In terms of generative AIs, people are trying to anticipate “who will win the race” (Aydın & Karaarslan, 2023, p. 1) for “supremacy in the market” (Rahaman, Ahsan, Anjum, Rahman, & Rahman, 2023, n.p.). Meanwhile, a great deal of investor money is being poured into this “AI gold rush” (Rudolph, Tan, & Tan, 2023, p. 1). Many see this moment as nothing less than an earthshaking one, with anticipated changes to “many aspects of our lives” (Rathore, 2023, p. 63).

Key Terms in this Chapter

GPT (Generative Pre-Trained Transformer): A type of AI language model, built on the transformer architecture and pre-trained on large text corpora, designed to communicate textually and verbally with humans.

Text-Making Generative AI: Language models that produce text in various styles in response to human-written text prompts.

Zero-Shot Behavior: The ability of a generative AI model to create new outputs from a statistical distribution that it has not been directly trained on.

Memorization: “The ability of a language model to generate the true continuation when choosing the most likely token at every step of decoding” (Carlini, Ippolito, Jagielski, Lee, Tramer, & Zhang, 2022, p. 6), which can lead to verbatim output of original text from the training data used to train the large language model.

Generative Search Engines: Online tools that output responses (with in-line source citations) based on user queries.

Parameters (in LLMs): A numerical value that “defines the behavior of the model,” learned from text, code, images, or other training data (such as a word probability or the weight of the link between two neurons in a neural network); larger models have more parameters and so capture more of the complexity in the training data (GPT-3, for example, has roughly 175 billion parameters).

Pre-Trained Language Models (PLMs): Language models trained in advance on large, often curated datasets of natural language data.

Chain-of-Thought Prompting: A technique for assisting LLMs with complex reasoning problems in which humans provide a chain of thought, or intermediate reasoning steps, that decomposes the problem for the model.
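
As a minimal illustrative sketch in Python (the worked exemplar, its numbers, and the function name are hypothetical, not drawn from the chapter), a chain-of-thought prompt pairs an exemplar whose answer spells out its intermediate steps with the new question to be answered:

def build_cot_prompt(question: str) -> str:
    # One worked exemplar whose answer spells out the intermediate reasoning steps.
    exemplar = (
        "Q: A library holds 120 books and lends out 45. How many remain?\n"
        "A: Start with 120 books. 45 are lent out, so 120 - 45 = 75. The answer is 75.\n"
    )
    # The new question follows the exemplar; the trailing cue nudges the model to
    # continue the step-by-step pattern rather than answering in a single leap.
    return exemplar + "\nQ: " + question + "\nA: Let's think step by step."

print(build_cot_prompt("A class of 32 students splits into teams of 4. How many teams?"))

The assembled text can be submitted to any LLM interface; it is the decomposition into intermediate steps, not any particular model or API, that characterizes the technique.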

Prompt Programming (or Prompt Engineering): The fashioning of instructions (whether single-modal or multi-modal) to use with generative AI models to elicit particular human-desired outputs.

Hallucination (in LLM Systems): The generation of textual contents that may be inaccurate or “based on false assumptions” even if syntactically and semantically correct; considered a bug, not a feature.

Natural Language Processing (NLP): Computational means to understand, translate, and extract meaning from natural human language.

Prompting: Also known as “prompt engineering” in a more formal context, this is the submission of an initial or starting instructional input (such as text, visuals, or other inputs); an emergent way for people to work with complex AI systems, including generative AI ones, to acquire particular digital outcomes.

Generative Artificial Intelligence (Generative AI): Technologies that enable the generation of text, still visuals, sound effects, music, video, multimedia, and other digital contents based on prompts (whether mono-modal or multi-modal).

Whisper APIs: Application programming interfaces (APIs) that integrate artificial intelligence (AI) in quiet or more hidden ways.

Visual-Making Generative AI: Visual models that produce visuals in various styles in response to text, visual, and multimodal prompts.

Natural Language Generation (NLG) Systems: A category of tools that automatically produce natural language text from data or other inputs; the category includes AI-powered tools such as LLM-based generators.

Perplexity: A measure of how (in)accurately a language model predicts the next word(s) in a given sequence; lower perplexity indicates higher predictive capability, and higher perplexity indicates less accuracy in word prediction.
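
As a point of reference (this is the standard formulation from the language modeling literature, not a formula given in the chapter), perplexity is commonly computed as the exponentiated average negative log-likelihood that the model assigns to each token of a sequence:

\mathrm{PPL}(w_1, \dots, w_N) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p\left(w_i \mid w_{<i}\right) \right)

A model that assigns high probability to each actual next token yields low perplexity; a model that is frequently “surprised” yields high perplexity.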

Attention Mechanism: An aspect of some generative AIs that enables the model to focus selectively on particular parts of its input data when producing target outputs.
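
As one common concrete form (the scaled dot-product attention of the transformer literature, cited here for illustration rather than drawn from the chapter), attention weights the value vectors V by the softmax-normalized similarity between query vectors Q and key vectors K:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V

where d_k is the dimensionality of the key vectors; dividing by \sqrt{d_k} keeps the softmax from saturating as that dimensionality grows.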

Supercopying: Occasions when large language models reproduce entire passages of 1,000 words or more verbatim from the training data (as compared to the rarer cases of copied n-grams longer than 10 tokens); perceived as a form of plagiarism or illegal copying.

Misprime: Queries sent to LLMs that can mislead the output or result, such as opening with a (one-word) question followed by a prompt unrelated to the initial question (the mispriming).

Artificial Intelligence Generated Content (AIGC): Digital contents of various modalities (text, image, video, multimedia, and others) that are produced by generative artificial intelligence (GAI), often based on a human prompt (which may be single-modal or multi-modal).
