# 20.1 AI Terminology and Concepts

~~The following is all nonsense; here lies the essence of artificial intelligence and large models: Super Invincible Big Open Door. What radish tissue? \[EB/OL]. (2025-12-21)\[2026-05-11].~~ [~~https://www.bilibili.com/video/BV1Kvq2BiEqT/~~](https://www.bilibili.com/video/BV1Kvq2BiEqT/)~~.~~

## Artificial Intelligence Terminology

The field of Artificial Intelligence (AI) encompasses robotics, speech recognition, image recognition, Natural Language Processing (NLP), and expert systems, among other areas. It is a branch of computer science that is also studied in the humanities. John McCarthy coined the term "artificial intelligence" in the proposal for the 1956 Dartmouth Conference.

Strong Artificial Intelligence refers to intelligent machines capable of realizing all human cognitive abilities: they can truly reason and solve problems, and possess perception and self-awareness. Weak Artificial Intelligence, on the other hand, relies on human intervention to set learning algorithm parameters and provide training data to ensure accuracy; it only appears intelligent on the surface and does not truly possess perception or self-awareness.

An Automaton is an abstract machine that models computation. As a language recognizer, it determines whether an object belongs to a given set, for example whether a string belongs to a given language. The simplest automaton consists of an input tape and a finite-state controller, sometimes with auxiliary storage attached.
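The recognizer role can be sketched with a small deterministic finite automaton. This is only an illustration: the function name `accepts`, the state names, and the example language (binary strings with an even number of 1s) are assumptions chosen for the sketch, not something defined in this chapter.

```python
def accepts(transitions, start, accepting, string):
    """Run a deterministic finite automaton over `string`; return True if it is accepted."""
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined for this symbol: reject
        state = transitions[key]
    return state in accepting


# Hypothetical example language: binary strings containing an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

print(accepts(EVEN_ONES, "even", {"even"}, "1001"))  # True: two 1s
print(accepts(EVEN_ONES, "even", {"even"}, "1011"))  # False: three 1s
```

The transition table plays the part of the finite-state controller, and the loop over `string` plays the part of reading the input tape one symbol at a time.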

A Turing Machine is an enhanced form of automaton, consisting of a finite-state controller and a read-write tape of unbounded length. The central thesis of computability theory is the Church-Turing Thesis: any function that a person can compute by following an algorithm can also be computed by a Turing machine.
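As a rough sketch of the controller-plus-tape idea, the following simulator runs a single-tape machine described by a rule table. The rule format `(state, symbol) -> (write, move, next_state)`, the state names, and the binary-increment machine are all assumptions made for this example.

```python
def run_turing_machine(rules, tape, head, state, blank="_", max_steps=1000):
    """Simulate a single-tape Turing machine until it enters the 'halt' state."""
    cells = dict(enumerate(tape))  # sparse tape: cells never written read as blank
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)


# Hypothetical machine: add 1 to a binary number, starting at the rightmost bit.
INCREMENT = {
    ("carry", "1"): ("0", "L", "carry"),  # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("1", "N", "halt"),   # 0 + carry = 1, done
    ("carry", "_"): ("1", "N", "halt"),   # ran off the left end: new leading 1
}

print(run_turing_machine(INCREMENT, "1011", head=3, state="carry"))  # 1100
print(run_turing_machine(INCREMENT, "111", head=2, state="carry"))   # 1000
```

The dictionary-backed tape grows on demand, which stands in for the unbounded tape. The `max_steps` guard exists only because a simulator cannot know in advance whether a given machine halts.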

Machine Learning (ML) is a subset of artificial intelligence in which computers acquire knowledge by automatically analyzing and synthesizing data, facts, or their own experience. Its algorithms learn patterns from training data and then make accurate **inferences** on new data. Machine learning, especially deep learning (DL), is the core technology of modern AI systems.

* In Supervised Learning, the expected outputs of training samples are known, and the learning objective is to predict the outputs of new samples. Typical tasks include Classification and Regression.
* In Semi-supervised Learning, the learner combines a small number of labeled samples with a large number of unlabeled samples, without further human intervention.
* Unsupervised Learning refers to learning from samples without class labels, aiming to discover the intrinsic structure of data. Typical tasks include Clustering and Dimensionality Reduction:
  * Clustering partitions a set of unlabeled samples into several clusters so that samples within a cluster are more similar to each other than to samples in other clusters;
  * Dimensionality Reduction reduces the number of variables under consideration, using strategies such as Feature Extraction and Feature Selection.
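The contrast between supervised and unsupervised learning can be sketched on one-dimensional data. The two algorithms used here, a 1-nearest-neighbor classifier and k-means with k = 2, are illustrative choices that the text does not name, and the data values are invented for the example.

```python
def nearest_neighbor_predict(train, x):
    """Supervised: `train` holds (feature, label) pairs whose labels are known."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two clusters (k-means, k = 2)."""
    centers = [min(points), max(points)]  # simple deterministic initialization
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
unlabeled = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]

print(nearest_neighbor_predict(labeled, 2.0))  # small
print(two_means(unlabeled))                    # [[1.0, 1.2, 0.8], [9.0, 9.5, 8.7]]
```

The classifier needs the labels in `labeled` to say anything about a new sample, while `two_means` recovers the two groups from the values alone and never learns what the groups are called.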

Self-supervised Learning uses unsupervised methods to accomplish tasks that typically rely on supervised learning. Self-supervised models do not depend on manually labeled datasets; instead, they generate implicit labels from Unstructured Data. Self-supervised learning involves two types of tasks: **Pretext Tasks** and **Downstream Tasks**. In the pretext task, the model learns meaningful representations of unstructured data; these representations can then be used as input for downstream tasks (such as supervised learning or Reinforcement Learning (RL) tasks). The practice of reusing a pre-trained model on a new task is called "Transfer Learning."
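One way to picture "implicit labels" is next-token prediction, where the label for each position is simply the token that follows it in the raw text. The sketch below assumes whitespace tokenization and a fixed context size, both simplifications made for illustration; the function name `next_token_pairs` is hypothetical.

```python
def next_token_pairs(tokens, context_size=2):
    """Turn an unlabeled token sequence into (context, target) training pairs."""
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))  # the "label" is just the next token
    return pairs


tokens = "the cat sat on the mat".split()
for context, target in next_token_pairs(tokens):
    print(context, "->", target)
# ('the', 'cat') -> sat
# ('cat', 'sat') -> on
# ('sat', 'on') -> the
# ('on', 'the') -> mat
```

No human annotated anything here: every training pair was derived from the unlabeled text itself, which is the property that makes the pretext task self-supervised.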

Self-supervised learning is particularly important in fields such as Computer Vision (CV) and Natural Language Processing, the latter encompassing Natural Language Understanding (NLU), natural language generation, and technologies derived from them. Training AI models in these fields requires massive amounts of annotated data, yet building annotated datasets is time-consuming and collecting enough data is difficult. Self-supervised methods save time and cost because they can partially or entirely replace the manual annotation of training data.

To train a deep learning model to perform high-precision tasks such as classification or regression, it is necessary to compare the model's output **prediction** for a given input against the "correct" **annotation** for that input, commonly known as the Ground Truth. Manually annotated training data serves as the ground truth: this method requires direct human intervention, hence the term "supervised" learning. In self-supervised learning, tasks are designed so that ground truth can be inferred from unlabeled data.
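Comparing predictions against ground truth usually comes down to computing a score or an error. The sketch below uses two common measures, accuracy for classification and mean squared error for regression; these are illustrative choices rather than the only options, and the sample values are invented.

```python
def accuracy(predictions, ground_truth):
    """Fraction of predicted labels that match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


def mean_squared_error(predictions, ground_truth):
    """Average squared difference between predicted and true values."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, ground_truth)]
    return sum(squared) / len(ground_truth)


predicted_labels = ["cat", "dog", "cat", "bird"]
true_labels = ["cat", "dog", "dog", "bird"]
print(accuracy(predicted_labels, true_labels))  # 0.75

predicted_values = [2.5, 0.0, 2.0]
true_values = [3.0, -0.5, 2.0]
print(mean_squared_error(predicted_values, true_values))
```

During training, a measure like the second one serves as the loss that the model tries to reduce.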

Reinforcement learning is a learning process that maps environmental states to actions, with the objective of maximizing the cumulative reward value that actions obtain from the environment. Reinforcement Learning from Human Feedback (RLHF) further introduces human preference signals to guide the learning direction. Reinforcement learning translates learning into action: it assumes that input data consists of interdependent tuples, i.e., ordered data sequences, and organizes data in the form of "state-action-reward." Reinforcement learning operates through repeated trial-and-error and reward functions, with many of its applications aiming to mimic real-world biological learning through positive reinforcement.
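The "state-action-reward" loop can be sketched with tabular Q-learning on a tiny corridor world. Q-learning is one specific algorithm that the text does not name, and the environment, the reward of 1.0 at the goal, the discount factor `gamma`, and the other hyperparameters are all assumptions made for this example.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Environment: move within the corridor; reward 1.0 only on reaching the last cell."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from (state, action, reward, next state) experience."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # trial and error: explore at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """For each non-terminal state, pick the action with the highest learned value."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


print(greedy_policy(train()))  # [1, 1, 1, 1]: always step right, toward the reward
```

No example ever tells the agent which action is correct. It only observes rewards, and the value table gradually encodes which actions lead to a larger cumulative reward.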

Reinforcement learning differs from each of the other paradigms. Supervised learning uses manually annotated data to generate **predictions** or classifications, whereas reinforcement learning does not use labeled examples to indicate correct or incorrect behavior. Unsupervised learning discovers hidden patterns in unlabeled data, whereas reinforcement learning is action-oriented. Self-supervised learning derives pseudo labels from unlabeled training data as a basis for measuring model accuracy, whereas reinforcement learning is not a classification method but an action-learning method: it neither produces pseudo labels nor measures against ground truth.

Deep learning is a machine learning method and the core technology of modern AI systems. In 2006, research on deep belief networks by Hinton et al. drove the resurgence of deep learning. Deep learning aims to study the optimal representation of information and how to acquire it. In neural networks and Belief Networks, deep learning learns input-output mappings based on deep structures or network representations.

A Neural Network (NN) is a nonlinear system that simulates the structure of the human brain, composed of a large number of Artificial Neurons (abstracted and simplified from biological neurons) interconnected according to a topology defined by circuits and mathematical models. Unlike the explicitly defined mathematical logic in traditional machine learning, the neural networks of deep learning models consist of multiple interconnected layers of "neurons," each performing specific mathematical operations. By adjusting the connection strengths (weights and biases) between neurons in adjacent layers, the network is progressively optimized to produce more accurate results. Neural networks did not achieve breakthrough progress until the late 2000s to early 2010s.
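The idea of adjusting weights and biases step by step can be sketched with the smallest possible case: a single linear neuron fitted by gradient descent. A real deep network stacks many such units with nonlinear activations, so this is a deliberate simplification; the learning rate, epoch count, and data are assumptions chosen for the example.

```python
def train_neuron(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit one linear neuron y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # d(loss)/dw
        grad_b = 2 * sum(errors) / n                             # d(loss)/db
        w -= learning_rate * grad_w  # nudge the weight against the gradient
        b -= learning_rate * grad_b  # nudge the bias against the gradient
    return w, b


# Data generated from y = 2x + 1, so training should recover w close to 2 and b close to 1.
w, b = train_neuron([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Each pass computes how wrong the current output is and shifts `w` and `b` slightly in the direction that reduces the error, which is the same principle a multi-layer network applies to all of its connections.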

The Transformer model excels at processing sequential data. The paper "Attention is All You Need," published by Vaswani et al. in 2017, first proposed this architecture. It was originally intended to replace Recurrent Neural Network (RNN) based Sequence-to-Sequence (Seq2Seq) models in Machine Translation (MT). Since then, the Transformer has driven significant progress across various subfields of machine learning.

The Attention Mechanism is a machine learning technique that guides deep learning models to prioritize the most relevant parts of the input data. Mathematically, the attention weights it computes reflect the relative importance of each segment of the input sequence to the current task. The core value of the Transformer lies in the Self-Attention Mechanism, which enables the model to "attend to" tokens at different positions. Its key advantage is that it can compute relationships and dependencies between tokens, especially between distant tokens in a text. Furthermore, the Transformer architecture supports parallelized processing, making it far more efficient than earlier methods. These characteristics enable large language models (LLMs) to process datasets of unprecedented scale.
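Attention weights can be illustrated with a stripped-down version of scaled dot-product self-attention. The sketch assumes that queries, keys, and values are all the raw input vectors; an actual Transformer first applies learned projection matrices and uses several attention heads, which are omitted here. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)  # how much this token attends to every token
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors)) for i in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every row of `weights` sums to 1 and says how strongly one token attends to each token in the sequence, regardless of how far apart they are. Because each row is computed independently of the others, the rows can be computed in parallel.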

**Generative Artificial Intelligence** (GenAI) refers to models and related technologies capable of generating content such as text, images, audio, and video. Generative AI relies on complex machine learning models (i.e., deep learning models, algorithms that simulate the human brain's learning and decision-making processes). These models work by identifying and encoding patterns and relationships in massive amounts of data, then using this information to understand users' natural language requests or questions and generating relevant new content as responses. A Language Model (LM) is a **probability** distribution over natural language sentences or word strings, **estimated** from language samples.
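The definition of a language model as a probability distribution estimated from samples can be made concrete with a bigram model, which conditions each word only on the previous one. That restriction, the `<s>` and `</s>` boundary markers, and the three-sentence corpus are simplifying assumptions for this sketch; large neural models estimate such probabilities very differently.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next word | current word) from sample sentences by counting."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return {
        current: {w: c / sum(followers.values()) for w, c in followers.items()}
        for current, followers in counts.items()
    }


def sentence_probability(model, sentence):
    """Multiply the conditional probabilities of each word given its predecessor."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for current, following in zip(words, words[1:]):
        probability *= model.get(current, {}).get(following, 0.0)
    return probability


corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram_model(corpus)
print(model["the"])                                # 'cat' about 0.667, 'dog' about 0.333
print(sentence_probability(model, "the dog ran"))  # 0.0: "dog ran" was never observed
```

The zero probability for an unseen word pair shows why estimates from limited samples need smoothing or a more expressive model.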

A Large Language Model (LLM) is a type of foundation model trained on massive amounts of data, capable of understanding and generating natural language and other types of content to perform various tasks. The initial training of LLMs employs self-supervised learning. LLMs that use the Decoder-Only architecture have driven the contemporary development of generative AI. A typical representative is GPT (Generative Pre-trained Transformer), whose third version GPT-3 and its subsequent improvement GPT-3.5 directly led to OpenAI's release of ChatGPT in November 2022. The Context Window is the maximum number of tokens that a model can use at once when generating text: early LLMs had short windows, while some newer LLMs offer context windows of a million tokens or more.
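The effect of a context window can be sketched as a limit on how many tokens of a conversation are passed to the model. The function name `fit_to_context_window` is hypothetical, whitespace-separated words stand in for real tokens, and dropping the oldest tokens is only one possible strategy for staying within the limit.

```python
def fit_to_context_window(tokens, max_tokens):
    """Keep only the most recent tokens that fit in the context window."""
    if max_tokens <= 0:
        return []
    return tokens[-max_tokens:]


history = "a b c d e f".split()  # stand-in for a tokenized conversation
print(fit_to_context_window(history, 4))   # ['c', 'd', 'e', 'f']
print(fit_to_context_window(history, 10))  # the whole history fits unchanged
```

Anything cut off by the window is invisible to the model when it generates the next token, which is why larger windows allow longer documents and conversations to be handled in one pass.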

## References

* Vaswani A, et al. Attention is All You Need\[C]//Advances in Neural Information Processing Systems 30 (NIPS 2017). Red Hook: Curran Associates, Inc., 2017.
* IBM. 2026 Machine Learning Guide\[EB/OL]. \[2026-05-10]. <https://www.ibm.com/think/topics/machine-learning>.

## Exercises

1. Briefly describe the difference between strong AI and weak AI. What scenarios are supervised learning, unsupervised learning, and reinforcement learning each suited for? Please give one example for each.

