
The Development History of AI

[TOC]

Res

Related Topics

↗ History of Computing

Intro

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#

The history of artificial intelligence (AI) began in antiquity, with myths, stories, and rumors of artificial beings endowed with intelligence or consciousness by master craftsmen. The study of logic and formal reasoning from antiquity to the present led directly to the invention of the programmable digital computer in the 1940s, a machine based on abstract mathematical reasoning. This device and the ideas behind it inspired scientists to begin discussing the possibility of building an electronic brain.

The field of AI research was founded at a workshop held on the campus of Dartmouth College in 1956. Attendees of the workshop became the leaders of AI research for decades. Many of them predicted that machines as intelligent as humans would exist within a generation. The U.S. government provided millions of dollars with the hope of making this vision come true.

Eventually, it became obvious that researchers had grossly underestimated the difficulty of this feat. In 1974, criticism from James Lighthill and pressure from the U.S. Congress led the U.S. and British governments to stop funding undirected research into artificial intelligence. Seven years later, a visionary initiative by the Japanese Government and the success of expert systems reinvigorated investment in AI, and by the late 1980s, the industry had grown into a billion-dollar enterprise. However, investors' enthusiasm waned in the 1990s, and the field was criticized in the press and avoided by industry (a period known as an "AI winter"). Nevertheless, research and funding continued to grow under other names.

In the early 2000s, machine learning was applied to a wide range of problems in academia and industry. The success was due to the availability of powerful computer hardware, the collection of immense data sets, and the application of solid mathematical methods. Soon after, deep learning proved to be a breakthrough technology, eclipsing all other methods. The transformer architecture debuted in 2017 and was used to produce impressive generative AI applications, amongst other use cases.

Investment in AI boomed in the 2020s. The recent AI boom, initiated by the development of the transformer architecture, led to the rapid scaling and public releases of large language models (LLMs) like ChatGPT. These models exhibit human-like traits of knowledge, attention, and creativity, and have been integrated into various sectors, fueling exponential investment in AI. However, concerns about the potential risks and ethical implications of advanced AI have also emerged, causing debate about the future of AI and its impact on society.

Review: Modern AI Fields

↗ Academics šŸŽ“ (In CS) ↗ 🌲 Road To CS

ACM CCS 2012 — Artificial intelligence āœ…
https://en.wikipedia.org/wiki/ACM_Computing_Classification_System
https://www.acm.org/publications/class-2012
https://dl.acm.org/ccs

  • Natural language processing
    • Information extraction
    • Machine translation
    • Discourse, dialogue and pragmatics
    • Natural language generation
    • Speech recognition
    • Lexical semantics
    • Phonology / morphology
    • Language resources
  • Knowledge representation and reasoning
    • Description logics
    • Semantic networks
    • Nonmonotonic, default reasoning and belief revision
    • Probabilistic reasoning
    • Vagueness and fuzzy logic
    • Causal reasoning and diagnostics
    • Temporal reasoning
    • Cognitive robotics
    • Ontology engineering
    • Logic programming and answer set programming
    • Spatial and physical reasoning
    • Reasoning about belief and knowledge
  • Planning and scheduling
    • Planning for deterministic actions
    • Planning under uncertainty
    • Multi-agent planning
    • Planning with abstraction and generalization
    • Robotic planning
      • Evolutionary robotics
  • Search methodologies
    • Heuristic function construction
    • Discrete space search
    • Continuous space search
    • Randomized search
    • Game tree search
    • Abstraction and micro-operators
    • Search with partial observations
  • Control methods
    • Robotic planning
      • Evolutionary robotics
    • Computational control theory
    • Motion path planning
  • Philosophical/theoretical foundations of artificial intelligence
    • Cognitive science
    • Theory of mind
  • Distributed artificial intelligence
    • Multi-agent systems
    • Intelligent agents
    • Mobile agents
    • Cooperation and coordination
  • Computer vision
    • Computer vision tasks
      • Biometrics
      • Scene understanding
      • Activity recognition and understanding
      • Video summarization
      • Visual content-based indexing and retrieval
      • Visual inspection
      • Vision for robotics
      • Scene anomaly detection
    • Image and video acquisition
      • Camera calibration
      • Epipolar geometry
      • Computational photography
      • Hyperspectral imaging
      • Motion capture
      • 3D imaging
      • Active vision
    • Computer vision representations
      • Image representations
      • Shape representations
      • Appearance and texture representations
      • Hierarchical representations
    • Computer vision problems
      • Interest point and salient region detections
      • Image segmentation
      • Video segmentation
      • Shape inference
      • Object detection
      • Object recognition
      • Object identification
      • Tracking
      • Reconstruction
      • Matching
  • Machine learning āœ…
    • Learning paradigms
      • Supervised learning
        • Ranking
        • Learning to rank
        • Supervised learning by classification
        • Supervised learning by regression
        • Structured outputs
        • Cost-sensitive learning
      • Unsupervised learning
        • Cluster analysis
        • Anomaly detection
        • Mixture modeling
        • Topic modeling
        • Source separation
        • Motif discovery
        • Dimensionality reduction and manifold learning
      • Reinforcement learning
        • Sequential decision making
        • Inverse reinforcement learning
        • Apprenticeship learning
        • Multi-agent reinforcement learning
        • Adversarial learning
      • Multi-task learning
        • Transfer learning
        • Lifelong machine learning
        • Learning under covariate shift
    • Learning settings
      • Batch learning
      • Online learning settings
      • Learning from demonstrations
      • Learning from critiques
      • Learning from implicit feedback
      • Active learning settings
      • Semi-supervised learning settings
    • Machine learning approaches
      • Classification and regression trees
      • Kernel methods
        • Support vector machines
        • Gaussian processes
      • Neural networks
      • Logical and relational learning
        • Inductive logic learning
        • Statistical relational learning
      • Learning in probabilistic graphical models
        • Maximum likelihood modeling
        • Maximum entropy modeling
        • Maximum a posteriori modeling
        • Mixture models
        • Latent variable models
        • Bayesian network models
      • Learning linear models
        • Perceptron algorithm
      • Factorization methods
        • Non-negative matrix factorization
        • Factor analysis
        • Principal component analysis
        • Canonical correlation analysis
        • Latent Dirichlet allocation
      • Rule learning
      • Instance-based learning
      • Markov decision processes
      • Partially-observable Markov decision processes
      • Stochastic games
      • Learning latent representations
        • Deep belief networks
      • Bio-inspired approaches
        • Artificial life
        • Evolvable hardware
        • Genetic algorithms
        • Genetic programming
        • Evolutionary robotics
        • Generative and developmental approaches
    • Machine learning algorithms
      • Dynamic programming for Markov decision processes
        • Value iteration
        • Q-learning
        • Policy iteration
        • Temporal difference learning
        • Approximate dynamic programming methods
      • Ensemble methods
        • Boosting
        • Bagging
      • Spectral methods
      • Feature selection
      • Regularization
    • Cross-validation

Precursors & Foundations

Mythical, Fictional, and Speculative Precursors

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Mythical,_fictional,_and_speculative_precursors

Formal Reasoning

↗ Mathematical Logic (Foundations of Mathematics) ↗ Mechanized (Formal) Reasoning & Automated Reasoning (Inference)

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Formal_reasoning

Artificial intelligence is based on the assumption that the process of human thought can be mechanized. The study of mechanical—or "formal"—reasoning has a long history. Chinese, Indian and Greek philosophers all developed structured methods of formal deduction by the first millennium BCE. Their ideas were developed over the centuries by philosophers such as Aristotle (who gave a formal analysis of the syllogism), Euclid (whose Elements was a model of formal reasoning), al-Khwārizmī (who developed algebra and gave his name to the word algorithm) and European scholastic philosophers such as William of Ockham and Duns Scotus.

Spanish philosopher Ramon Llull (1232–1315) developed several logical machines devoted to the production of knowledge by logical means. Llull described his machines as mechanical entities that could combine basic and undeniable truths by simple logical operations, performed by the machine by mechanical means, in such ways as to produce all possible knowledge. Llull's work had a great influence on Gottfried Leibniz, who redeveloped his ideas.

In the 17th century, Leibniz, Thomas Hobbes and René Descartes explored the possibility that all rational thought could be made as systematic as algebra or geometry. Hobbes famously wrote in Leviathan: "For reason ... is nothing but reckoning, that is adding and subtracting". Leibniz envisioned a universal language of reasoning, the characteristica universalis, which would reduce argumentation to calculation so that "there would be no more need of disputation between two philosophers than between two accountants. For it would suffice to take their pencils in hand, to sit down to their slates, and to say to each other (with a friend as witness, if they liked): Let us calculate." These philosophers had begun to articulate the physical symbol system hypothesis that would guide AI research.

The study of mathematical logic provided the essential breakthrough that made artificial intelligence seem plausible. The foundations had been set by such works as Boole's The Laws of Thought and Frege's Begriffsschrift. Building on Frege's system, Russell and Whitehead presented a formal treatment of the foundations of mathematics in their masterpiece, the Principia Mathematica in 1913. Inspired by Russell's success, David Hilbert challenged mathematicians of the 1920s and 30s to answer this fundamental question: "can all of mathematical reasoning be formalized?" His question was answered by Gödel's incompleteness proof, Turing's machine and Church's Lambda calculus.

Their answer was surprising in two ways. First, they proved that there were, in fact, limits to what mathematical logic could accomplish. But second (and more important for AI) their work suggested that, within these limits, any form of mathematical reasoning could be mechanized. The Church-Turing thesis implied that a mechanical device, shuffling symbols as simple as 0 and 1, could imitate any conceivable process of mathematical deduction. The key insight was the Turing machine—a simple theoretical construct that captured the essence of abstract symbol manipulation. This invention would inspire a handful of scientists to begin discussing the possibility of thinking machines.
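The symbol shuffling that the Turing machine formalizes can be sketched in a few lines of code. The transition table below is an illustrative toy (it flips every bit until it reaches a blank), not any historical machine:

```python
# A minimal one-tape Turing machine: a state, a head position, and a
# transition table mapping (state, symbol) -> (new_state, new_symbol, move).

def run_turing_machine(tape, rules, state="start", blank="_", max_steps=1000):
    cells = dict(enumerate(tape))  # sparse tape indexed by position
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        state, cells[head], move = rules[(state, symbol)]
        head += move
    return "".join(cells[i] for i in sorted(cells))

# Toy transition table: invert bits left to right, halt at the first blank.
flip_rules = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", 0),
}

print(run_turing_machine("1011", flip_rules))  # -> 0100_ (blank marks the end)
```

Even this tiny simulator shows the point of the Church-Turing thesis: everything the machine does reduces to reading a symbol, writing a symbol, and moving one cell.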

Information Technology & Computer Science

↗ 🌲 Road To CS ↗ History of Information Systems & Security Systems ↗ History of Computer Evolution & Devt. of Computer Org. & Arch.

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Computer_science

Calculating machines were designed or built in antiquity and throughout history by many people, including Gottfried Leibniz, Joseph Marie Jacquard, Charles Babbage, Percy Ludgate, Leonardo Torres Quevedo, Vannevar Bush, and others. Ada Lovelace speculated that Babbage's machine was "a thinking or ... reasoning machine", but warned "It is desirable to guard against the possibility of exaggerated ideas that arise as to the powers" of the machine.

The first modern computers were the massive machines of the Second World War (such as Konrad Zuse's Z3, Alan Turing's Heath Robinson and Colossus, Atanasoff and Berry's ABC, and ENIAC at the University of Pennsylvania). ENIAC was based on the theoretical foundation laid by Alan Turing and developed by John von Neumann, and proved to be the most influential.

šŸ‘‰ Birth of Artificial Intelligence (1941–1956)

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Birth_of_artificial_intelligence_(1941%E2%80%931956)

The earliest research into thinking machines was inspired by a confluence of ideas that became prevalent in the late 1930s, 1940s, and early 1950s. Recent research in neurology had shown that the brain was an electrical network of neurons that fired in all-or-nothing pulses. Norbert Wiener's cybernetics described control and stability in electrical networks. Claude Shannon's information theory described digital signals (i.e., all-or-nothing signals). Alan Turing's theory of computation showed that any form of computation could be described digitally. The close relationship between these ideas suggested that it might be possible to construct an "electronic brain".

In the 1940s and 50s, a handful of scientists from a variety of fields (mathematics, psychology, engineering, economics and political science) explored several research directions that would be vital to later AI research. Alan Turing was among the first people to seriously investigate the theoretical possibility of "machine intelligence". The field of "artificial intelligence research" was founded as an academic discipline in 1956.

Imitation Game & Turing Test

↗ Computability (Recursion) Theory - Turing Machine and R.E. Language (turing complete)

šŸ”— https://en.wikipedia.org/wiki/Turing_test

The Turing test, originally called the imitation game by Alan Turing in 1949, is a test of a machine's ability to exhibit intelligent behaviour equivalent to that of a human. In the test, a human evaluator judges a text transcript of a natural-language conversation between a human and a machine. The evaluator tries to identify the machine, and the machine passes if the evaluator cannot reliably tell them apart. The results would not depend on the machine's ability to answer questions correctly, only on how closely its answers resembled those of a human. Since the Turing test is a test of indistinguishability in performance capacity, the verbal version generalizes naturally to all of human performance capacity, verbal as well as nonverbal (robotic).

The test was introduced by Turing in his 1950 paper "Computing Machinery and Intelligence" while working at the University of Manchester. It opens with the words: "I propose to consider the question, 'Can machines think?'" Because "thinking" is difficult to define, Turing chooses to "replace the question by another, which is closely related to it and is expressed in relatively unambiguous words". Turing describes the new form of the problem in terms of a three-person party game called the "imitation game", in which an interrogator asks questions of a man and a woman in another room in order to determine the correct sex of the two players. Turing's new question is: "Are there imaginable digital computers which would do well in the imitation game?" This question, Turing believed, was one that could actually be answered. In the remainder of the paper, he argued against the major objections to the proposition that "machines can think".

Since Turing introduced his test, it has been highly influential in the philosophy of artificial intelligence, resulting in substantial discussion and controversy, as well as criticism from philosophers like John Searle, who argue against the test's ability to detect consciousness.

Neuroscience and Hebbian theory

Artificial Neural Networks

Cybernetic Robots

Game AI

Symbolic Reasoning and The Logic Theorist

Dartmouth Workshop

Cognitive Revolution

šŸ‘‰ Early Successes (1956–1974)

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Early_successes_(1956%E2%80%931974)

There were many successful programs and new directions in the late 1950s and 1960s. The most influential are described here:

Reasoning, Planning and Problem Solving as Search

Natural Language

An important goal of AI research is to allow computers to communicate in natural languages like English. An early success was Daniel Bobrow's program STUDENT, which could solve high school algebra word problems.

A semantic net represents concepts (e.g. "house", "door") as nodes, and relations among concepts as links between the nodes (e.g. "has-a"). The first AI program to use a semantic net was written by Ross Quillian, and the most successful (and controversial) version was Roger Schank's Conceptual dependency theory.

(Figure: example of a semantic network)
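A semantic net of this kind can be sketched as a labeled directed graph; the concepts and relations below are illustrative, not taken from Quillian's or Schank's systems:

```python
# A semantic net as a labeled directed graph: concepts are nodes,
# relations ("is-a", "has-a") are labeled edges between them.
from collections import defaultdict

class SemanticNet:
    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, node), ...]

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def is_a(self, concept, category):
        """Follow 'is-a' links transitively (simple inheritance)."""
        for rel, parent in self.edges[concept]:
            if rel == "is-a" and (parent == category or self.is_a(parent, category)):
                return True
        return False

net = SemanticNet()
net.add("house", "has-a", "door")
net.add("house", "is-a", "building")
net.add("building", "is-a", "structure")

print(net.is_a("house", "structure"))  # -> True, via building -> structure
```

The transitive `is-a` lookup is the classic payoff of the representation: facts attached to "structure" are inherited by "house" without being stored twice.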

Joseph Weizenbaum's ELIZA could carry out conversations that were so realistic that users occasionally were fooled into thinking they were communicating with a human being and not a computer program (see ELIZA effect). But in fact, ELIZA simply gave a canned response or repeated back what was said to it, rephrasing its response with a few grammar rules. ELIZA was the first chatbot.
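ELIZA's trick—canned responses plus pattern-matched reflection of the user's own words—can be sketched roughly as follows. The patterns and reflections here are invented for illustration, not Weizenbaum's actual DOCTOR script:

```python
# A toy ELIZA-style responder: regex rules capture part of the input,
# pronouns are "reflected", and the fragment is echoed back in a template.
import re

RULES = [
    (r"I am (.*)", "Why do you say you are {0}?"),
    (r"I feel (.*)", "How long have you felt {0}?"),
    (r".*\bmother\b.*", "Tell me more about your family."),
]
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "am": "are"}

def respond(sentence):
    for pattern, template in RULES:
        match = re.match(pattern, sentence, re.IGNORECASE)
        if match:
            if match.groups():
                words = [REFLECTIONS.get(w.lower(), w) for w in match.group(1).split()]
                return template.format(" ".join(words))
            return template
    return "Please go on."  # canned fallback when nothing matches

print(respond("I am worried about my exams"))
# -> Why do you say you are worried about your exams?
```

The fallback line is the whole secret of the ELIZA effect: when no rule fires, a contentless prompt keeps the "conversation" going.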

Micro-Worlds

Perceptrons and Early Neural Networks

šŸ‘‰ 1st AI Winter (1974–1980)

šŸ‘‰ 1st AI Boom (1980–1987)

šŸ‘‰ New Directions In The 1980s

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#New_directions_in_the_1980s

Although symbolic knowledge representation and logical reasoning produced useful applications in the 80s and received massive amounts of funding, it was still unable to solve problems in perception, robotics, learning and common sense. A small number of scientists and engineers began to doubt that the symbolic approach would ever be sufficient for these tasks and developed other approaches, such as "connectionism", robotics, "soft" computing and reinforcement learning. Nils Nilsson called these approaches "sub-symbolic".

Revival of Neural Networks: "Connectionism"

Robotics and Embodied Reason

Soft Computing and Probabilistic Reasoning

Reinforcement Learning

šŸ‘‰ 2nd AI Winter (1990s)

šŸ‘‰ Big Data, Deep Learning, AGI (2005–2017)

↗ Deep Learning (Neural Networks) /The Technical Evolution of Neural Networks

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Big_data,_deep_learning,_AGI_(2005%E2%80%932017)

In the first decades of the 21st century, access to large amounts of data (known as "big data"), cheaper and faster computers and advanced machine learning techniques were successfully applied to many problems throughout the economy. A turning point was the success of deep learning around 2012, which improved the performance of machine learning on many tasks, including image and video processing, text analysis, and speech recognition. Investment in AI increased along with its capabilities, and by 2016, the market for AI-related products, hardware, and software reached more than $8 billion, and the New York Times reported that interest in AI had reached a "frenzy".

In 2002, Ben Goertzel and others became concerned that AI had largely abandoned its original goal of producing versatile, fully intelligent machines, and argued in favor of more direct research into artificial general intelligence (AGI). By the mid-2010s several companies and institutions had been founded to pursue artificial general intelligence, such as OpenAI and Google's DeepMind. During the same period, new insights into superintelligence raised concerns that AI was an existential threat. The risks and unintended consequences of AI technology became an area of serious academic research after 2016.

Big Data and Big Machines

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Big_data_and_big_machines

The success of machine learning in the 2000s depended on the availability of vast amounts of training data and faster computers. Russell and Norvig wrote that the "improvement in performance obtained by increasing the size of the data set by two or three orders of magnitude outweighs any improvement that can be made by tweaking the algorithm." Geoffrey Hinton recalled that back in the 90s, the problem was that "our labeled datasets were thousands of times too small. [And] our computers were millions of times too slow." This was no longer true by 2010.

The most useful data in the 2000s came from curated, labeled data sets created specifically for machine learning and AI. In 2007, a group at UMass Amherst released Labeled Faces in the Wild, an annotated set of images of faces that was widely used to train and test face recognition systems for years afterward. Fei-Fei Li developed ImageNet, a database of three million images captioned by volunteers using the Amazon Mechanical Turk. Released in 2009, it was a useful body of training data and a benchmark for testing for the next generation of image processing systems. Google released word2vec in 2013 as an open source resource. It used large amounts of text scraped from the internet and word embedding to create a numeric vector to represent each word. Users were surprised at how well it was able to capture word meanings; for example, ordinary vector addition would give equivalences like China + River = Yangtze or London āˆ’ England + France = Paris. This resource in particular would be essential for the development of large language models in the late 2010s.
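The analogy arithmetic can be illustrated with hand-made stand-in vectors. Real word2vec embeddings have hundreds of dimensions and are learned from text; the 3-d vectors below are contrived so that the London āˆ’ England + France analogy holds:

```python
# Toy word-embedding analogy: answer "London - England + France = ?" by
# vector arithmetic plus a nearest-neighbor (cosine similarity) lookup.
import math

emb = {
    "London":  [1.0, 1.0, 0.0],  # roughly: capital + English
    "England": [0.0, 1.0, 0.0],  # English
    "France":  [0.0, 0.0, 1.0],  # French
    "Paris":   [1.0, 0.0, 1.0],  # capital + French
    "Berlin":  [1.0, 0.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(vec, exclude=()):
    """Vocabulary word with the highest cosine similarity to vec."""
    return max((w for w in emb if w not in exclude), key=lambda w: cosine(emb[w], vec))

query = [l - e + f for l, e, f in zip(emb["London"], emb["England"], emb["France"])]
print(nearest(query, exclude={"London", "England", "France"}))  # -> Paris
```

Excluding the query words is standard practice in analogy evaluation, since one of them is often the raw nearest neighbor of the result vector.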

The explosive growth of the internet gave machine learning programs access to billions of pages of text and images that could be scraped. And, for specific problems, large privately held databases contained the relevant data. McKinsey Global Institute reported that "by 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data". This collection of information was known in the 2000s as big data.

In a Jeopardy! exhibition match in February 2011, IBM's question answering system Watson defeated the two best Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin. Watson's expertise would have been impossible without the information available on the internet.

Deep Learning - 2012 AlexNet

↗ Artificial Neural Networks (ANN) & Deep Learning Methods

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Deep_learning

In 2012, AlexNet, a deep learning model developed by Alex Krizhevsky, won the ImageNet Large Scale Visual Recognition Challenge with significantly fewer errors than the second-place winner. Krizhevsky worked with Geoffrey Hinton at the University of Toronto. This was a turning point in machine learning: over the next few years, dozens of other approaches to image recognition were abandoned in favor of deep learning.

Deep learning uses a multi-layer perceptron. Although this architecture has been known since the 1960s, getting it to work requires powerful hardware and large amounts of training data. Before these became available, improving the performance of image processing systems required hand-crafted, ad hoc features that were difficult to implement. Deep learning was simpler and more general.
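A multi-layer perceptron reduces to repeated matrix multiplies with a nonlinearity between layers. A minimal forward pass is sketched below with fixed toy weights; real deep learning fits those weights to data, which is what demanded the hardware and data sets described above:

```python
# Forward pass of a small multi-layer perceptron using NumPy:
# each layer computes W @ x + b, with ReLU between hidden layers.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, layers):
    """layers is a list of (weights, bias) pairs; linear final layer."""
    *hidden, last = layers
    for W, b in hidden:
        x = relu(W @ x + b)
    W, b = last
    return W @ x + b

# Toy network: 2 inputs -> 3 hidden units -> 1 output, hand-picked weights.
layers = [
    (np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]), np.zeros(3)),
    (np.array([[1.0, 1.0, 1.0]]), np.zeros(1)),
]
print(mlp_forward(np.array([2.0, 1.0]), layers))  # -> [2.5]
```

Stacking more `(W, b)` pairs deepens the network; nothing in the forward pass changes, which is part of why the approach scaled so cleanly once hardware allowed it.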

Deep learning was applied to dozens of problems over the next few years (such as speech recognition, machine translation, medical diagnosis, and game playing). In every case it showed enormous gains in performance. Investment and interest in AI boomed as a result.

The Alignment Problem

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#The_alignment_problem

Artificial General Intelligence Research

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Artificial_general_intelligence_research

šŸ‘‰ From NLP to AGI: Boom of LLM (2017~)

↗ Deep Learning (Neural Networks) /The Technical Evolution of Neural Networks ↗ Natural Language Processing (NLP) /šŸ“œ A Brief History of The Technical Evolution Of Language Models ↗ LLM (Large Language Model) / LLM Milestone Papers ↗ Transformers

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Large_language_models,_AI_boom_(2017%E2%80%93present)

The AI boom started with the development of key architectures and algorithms such as the transformer architecture in 2017, which led to the scaling and development of large language models exhibiting human-like traits of knowledge, attention, and creativity. A new AI era began in 2020 with the public release of scaled large language models (LLMs) such as ChatGPT.

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv. https://doi.org/10.48550/arXiv.2303.18223

In Figure 2, we describe the evolution process of language models in terms of task-solving capacity. At first, statistical language models mainly assisted in some specific tasks (e.g., retrieval or speech tasks), in which the predicted or estimated probabilities could enhance the performance of task-specific approaches. Subsequently, neural language models focused on learning task-agnostic representations (e.g., features), aiming to reduce the effort of human feature engineering. Furthermore, pre-trained language models learned context-aware representations that can be optimized according to downstream tasks. For the latest generation of language models, LLMs are enhanced by exploring the scaling effect on model capacity, and can be considered general-purpose task solvers. To summarize, over this evolution, the scope of tasks that language models can solve has been greatly extended, and the task performance they attain has been significantly enhanced.
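The "statistical language model" stage the excerpt begins with can be illustrated by a minimal bigram model, which estimates P(word | previous word) by counting—the kind of component that once boosted speech-recognition and retrieval pipelines. The corpus below is invented:

```python
# A bigram statistical language model: count word pairs, then estimate
# P(cur | prev) as count(prev, cur) / count(prev, *).
from collections import Counter, defaultdict

def train_bigram(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split()  # <s> marks sentence start
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts

def prob(counts, prev, cur):
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

corpus = ["the cat sat", "the cat ran", "a dog sat"]
counts = train_bigram(corpus)
print(prob(counts, "the", "cat"))  # -> 1.0 ("the" is always followed by "cat")
print(prob(counts, "cat", "sat"))  # -> 0.5 ("cat" is followed by "sat" or "ran")
```

Each later stage in the excerpt replaces these counted probabilities with learned representations—neural features, then pre-trained contextual embeddings, then scaled LLMs.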


Ref