
The Development History of AI

[TOC]

Res

Related Topics

↗ History of Computing

Intro

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#

The history of artificial intelligence (AI) began in antiquity, with myths, stories, and rumors of artificial beings endowed with intelligence or consciousness by master craftsmen. The study of logic and formal reasoning from antiquity to the present led directly to the invention of the programmable digital computer in the 1940s, a machine based on abstract mathematical reasoning. This device and the ideas behind it inspired scientists to begin discussing the possibility of building an electronic brain.

The field of AI research was founded at a workshop held on the campus of Dartmouth College in 1956. Attendees of the workshop became the leaders of AI research for decades. Many of them predicted that machines as intelligent as humans would exist within a generation. The U.S. government provided millions of dollars with the hope of making this vision come true.

Eventually, it became obvious that researchers had grossly underestimated the difficulty of this feat. In 1974, criticism from James Lighthill and pressure from the U.S. Congress led the U.S. and British governments to stop funding undirected research into artificial intelligence. Seven years later, a visionary initiative by the Japanese Government and the success of expert systems reinvigorated investment in AI, and by the late 1980s, the industry had grown into a billion-dollar enterprise. However, investors' enthusiasm waned in the 1990s, and the field was criticized in the press and avoided by industry (a period known as an "AI winter"). Nevertheless, research and funding continued to grow under other names.

In the early 2000s, machine learning was applied to a wide range of problems in academia and industry. The success was due to the availability of powerful computer hardware, the collection of immense data sets, and the application of solid mathematical methods. Soon after, deep learning proved to be a breakthrough technology, eclipsing all other methods. The transformer architecture debuted in 2017 and was used to produce impressive generative AI applications, amongst other use cases.

Investment in AI boomed in the 2020s. The recent AI boom, initiated by the development of the transformer architecture, led to the rapid scaling and public releases of large language models (LLMs) like ChatGPT. These models exhibit human-like traits of knowledge, attention, and creativity, and have been integrated into various sectors, fueling exponential investment in AI. However, concerns about the potential risks and ethical implications of advanced AI have also emerged, causing debate about the future of AI and its impact on society.

Review: Modern AI Fields

↗ Academics šŸŽ“ (In CS) ↗ 🌲 Road To CS

ACM CCS 2012 — Artificial intelligence āœ…
https://en.wikipedia.org/wiki/ACM_Computing_Classification_System
https://www.acm.org/publications/class-2012
https://dl.acm.org/ccs

  • Natural language processing
    • Information extraction
    • Machine translation
    • Discourse, dialogue and pragmatics
    • Natural language generation
    • Speech recognition
    • Lexical semantics
    • Phonology / morphology
    • Language resources
  • Knowledge representation and reasoning
    • Description logics
    • Semantic networks
    • Nonmonotonic, default reasoning and belief revision
    • Probabilistic reasoning
    • Vagueness and fuzzy logic
    • Causal reasoning and diagnostics
    • Temporal reasoning
    • Cognitive robotics
    • Ontology engineering
    • Logic programming and answer set programming
    • Spatial and physical reasoning
    • Reasoning about belief and knowledge
  • Planning and scheduling
    • Planning for deterministic actions
    • Planning under uncertainty
    • Multi-agent planning
    • Planning with abstraction and generalization
    • Robotic planning
      • Evolutionary robotics
  • Search methodologies
    • Heuristic function construction
    • Discrete space search
    • Continuous space search
    • Randomized search
    • Game tree search
    • Abstraction and micro-operators
    • Search with partial observations
  • Control methods
    • Robotic planning
      • Evolutionary robotics
    • Computational control theory
    • Motion path planning
  • Philosophical/theoretical foundations of artificial intelligence
    • Cognitive science
    • Theory of mind
  • Distributed artificial intelligence
    • Multi-agent systems
    • Intelligent agents
    • Mobile agents
    • Cooperation and coordination
  • Computer vision
    • Computer vision tasks
      • Biometrics
      • Scene understanding
      • Activity recognition and understanding
      • Video summarization
      • Visual content-based indexing and retrieval
      • Visual inspection
      • Vision for robotics
      • Scene anomaly detection
    • Image and video acquisition
      • Camera calibration
      • Epipolar geometry
      • Computational photography
      • Hyperspectral imaging
      • Motion capture
      • 3D imaging
      • Active vision
    • Computer vision representations
      • Image representations
      • Shape representations
      • Appearance and texture representations
      • Hierarchical representations
    • Computer vision problems
      • Interest point and salient region detections
      • Image segmentation
      • Video segmentation
      • Shape inference
      • Object detection
      • Object recognition
      • Object identification
      • Tracking
      • Reconstruction
      • Matching
  • Machine learning āœ…
    • Learning paradigms
      • Supervised learning
        • Ranking
        • Learning to rank
        • Supervised learning by classification
        • Supervised learning by regression
        • Structured outputs
        • Cost-sensitive learning
      • Unsupervised learning
        • Cluster analysis
        • Anomaly detection
        • Mixture modeling
        • Topic modeling
        • Source separation
        • Motif discovery
        • Dimensionality reduction and manifold learning
      • Reinforcement learning
        • Sequential decision making
        • Inverse reinforcement learning
        • Apprenticeship learning
        • Multi-agent reinforcement learning
        • Adversarial learning
      • Multi-task learning
        • Transfer learning
        • Lifelong machine learning
        • Learning under covariate shift
    • Learning settings
      • Batch learning
      • Online learning settings
      • Learning from demonstrations
      • Learning from critiques
      • Learning from implicit feedback
      • Active learning settings
      • Semi-supervised learning settings
    • Machine learning approaches
      • Classification and regression trees
      • Kernel methods
        • Support vector machines
        • Gaussian processes
      • Neural networks
      • Logical and relational learning
        • Inductive logic learning
        • Statistical relational learning
      • Learning in probabilistic graphical models
        • Maximum likelihood modeling
        • Maximum entropy modeling
        • Maximum a posteriori modeling
        • Mixture models
        • Latent variable models
        • Bayesian network models
      • Learning linear models
        • Perceptron algorithm
      • Factorization methods
        • Non-negative matrix factorization
        • Factor analysis
        • Principal component analysis
        • Canonical correlation analysis
        • Latent Dirichlet allocation
      • Rule learning
      • Instance-based learning
      • Markov decision processes
      • Partially-observable Markov decision processes
      • Stochastic games
      • Learning latent representations
        • Deep belief networks
      • Bio-inspired approaches
        • Artificial life
        • Evolvable hardware
        • Genetic algorithms
        • Genetic programming
        • Evolutionary robotics
        • Generative and developmental approaches
    • Machine learning algorithms
      • Dynamic programming for Markov decision processes
        • Value iteration
        • Q-learning
        • Policy iteration
        • Temporal difference learning
        • Approximate dynamic programming methods
      • Ensemble methods
        • Boosting
        • Bagging
      • Spectral methods
      • Feature selection
      • Regularization
    • Cross-validation

Precursors & Foundations

Mythical, Fictional, and Speculative Precursors

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Mythical,_fictional,_and_speculative_precursors

Formal Reasoning

↗ Mathematical Logic (Foundations of Mathematics) ↗ Mechanized (Formal) Reasoning & Automated Reasoning (Inference)

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Formal_reasoning

Artificial intelligence is based on the assumption that the process of human thought can be mechanized. The study of mechanical—or "formal"—reasoning has a long history. Chinese, Indian and Greek philosophers all developed structured methods of formal deduction by the first millennium BCE. Their ideas were developed over the centuries by philosophers such as Aristotle (who gave a formal analysis of the syllogism), Euclid (whose Elements was a model of formal reasoning), al-Khwārizmī (who developed algebra and gave his name to the word algorithm) and European scholastic philosophers such as William of Ockham and Duns Scotus.

Spanish philosopher Ramon Llull (1232–1315) developed several logical machines devoted to the production of knowledge by logical means. Llull described his machines as mechanical entities that could combine basic and undeniable truths by simple logical operations, performed by the machine by mechanical means, in such ways as to produce all possible knowledge. Llull's work had a great influence on Gottfried Leibniz, who redeveloped his ideas.

In the 17th century, Leibniz, Thomas Hobbes and René Descartes explored the possibility that all rational thought could be made as systematic as algebra or geometry. Hobbes famously wrote in Leviathan: "For reason ... is nothing but reckoning, that is adding and subtracting". Leibniz envisioned a universal language of reasoning, the characteristica universalis, which would reduce argumentation to calculation so that "there would be no more need of disputation between two philosophers than between two accountants. For it would suffice to take their pencils in hand, to sit down to their slates, and to say to each other (with a friend as witness, if they liked): Let us calculate." These philosophers had begun to articulate the physical symbol system hypothesis that would guide AI research.

The study of mathematical logic provided the essential breakthrough that made artificial intelligence seem plausible. The foundations had been set by such works as Boole's The Laws of Thought and Frege's Begriffsschrift. Building on Frege's system, Russell and Whitehead presented a formal treatment of the foundations of mathematics in their masterpiece, the Principia Mathematica in 1913. Inspired by Russell's success, David Hilbert challenged mathematicians of the 1920s and 30s to answer this fundamental question: "can all of mathematical reasoning be formalized?" His question was answered by Gödel's incompleteness proof, Turing's machine and Church's Lambda calculus.

Their answer was surprising in two ways. First, they proved that there were, in fact, limits to what mathematical logic could accomplish. But second (and more important for AI) their work suggested that, within these limits, any form of mathematical reasoning could be mechanized. The Church-Turing thesis implied that a mechanical device, shuffling symbols as simple as 0 and 1, could imitate any conceivable process of mathematical deduction. The key insight was the Turing machine—a simple theoretical construct that captured the essence of abstract symbol manipulation. This invention would inspire a handful of scientists to begin discussing the possibility of thinking machines.
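The symbol shuffling that the Turing machine formalizes can be sketched in a few lines of code. The transition table below is an illustrative toy (it flips every bit until it reaches a blank), not any historical machine:

```python
# A minimal one-tape Turing machine: a state, a head position, and a
# transition table mapping (state, symbol) -> (new_state, new_symbol, move).

def run_turing_machine(tape, rules, state="start", blank="_", max_steps=1000):
    cells = dict(enumerate(tape))  # sparse tape indexed by position
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        state, cells[head], move = rules[(state, symbol)]
        head += move
    return "".join(cells[i] for i in sorted(cells))

# Toy transition table: invert bits left to right, halt at the first blank.
flip_rules = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", 0),
}

print(run_turing_machine("1011", flip_rules))  # -> 0100_ (blank marks the end)
```

Even this tiny simulator shows the point of the Church-Turing thesis: everything the machine does reduces to reading a symbol, writing a symbol, and moving one cell.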

Information Technology & Computer Science

↗ 🌲 Road To CS ↗ History of Information Systems & Security Systems ↗ History of Computer Evolution & Devt. of Computer Org. & Arch.

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Computer_science

Calculating machines were designed or built in antiquity and throughout history by many people, including Gottfried Leibniz, Joseph Marie Jacquard, Charles Babbage, Percy Ludgate, Leonardo Torres Quevedo, Vannevar Bush, and others. Ada Lovelace speculated that Babbage's machine was "a thinking or ... reasoning machine", but warned "It is desirable to guard against the possibility of exaggerated ideas that arise as to the powers" of the machine.

The first modern computers were the massive machines of the Second World War (such as Konrad Zuse's Z3, Alan Turing's Heath Robinson and Colossus, Atanasoff and Berry's ABC, and ENIAC at the University of Pennsylvania). ENIAC was based on the theoretical foundation laid by Alan Turing and developed by John von Neumann, and proved to be the most influential.

šŸ‘‰ Birth of Artificial Intelligence (1941–1956)

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Birth_of_artificial_intelligence_(1941%E2%80%931956)

The earliest research into thinking machines was inspired by a confluence of ideas that became prevalent in the late 1930s, 1940s, and early 1950s. Recent research in neurology had shown that the brain was an electrical network of neurons that fired in all-or-nothing pulses. Norbert Wiener's cybernetics described control and stability in electrical networks. Claude Shannon's information theory described digital signals (i.e., all-or-nothing signals). Alan Turing's theory of computation showed that any form of computation could be described digitally. The close relationship between these ideas suggested that it might be possible to construct an "electronic brain".

In the 1940s and 50s, a handful of scientists from a variety of fields (mathematics, psychology, engineering, economics and political science) explored several research directions that would be vital to later AI research. Alan Turing was among the first people to seriously investigate the theoretical possibility of "machine intelligence". The field of "artificial intelligence research" was founded as an academic discipline in 1956.

Imitation Game & Turing Test

↗ Computability (Recursion) Theory - Turing Machine and R.E. Language (turing complete)

šŸ”— https://en.wikipedia.org/wiki/Turing_test

The Turing test, originally called the imitation game by Alan Turing in 1949, is a test of a machine's ability to exhibit intelligent behaviour equivalent to that of a human. In the test, a human evaluator judges a text transcript of a natural-language conversation between a human and a machine. The evaluator tries to identify the machine, and the machine passes if the evaluator cannot reliably tell them apart. The results would not depend on the machine's ability to answer questions correctly, only on how closely its answers resembled those of a human. Since the Turing test is a test of indistinguishability in performance capacity, the verbal version generalizes naturally to all of human performance capacity, verbal as well as nonverbal (robotic).

The test was introduced by Turing in his 1950 paper "Computing Machinery and Intelligence" while working at the University of Manchester. It opens with the words: "I propose to consider the question, 'Can machines think?'" Because "thinking" is difficult to define, Turing chooses to "replace the question by another, which is closely related to it and is expressed in relatively unambiguous words". Turing describes the new form of the problem in terms of a three-person party game called the "imitation game", in which an interrogator asks questions of a man and a woman in another room in order to determine the correct sex of the two players. Turing's new question is: "Are there imaginable digital computers which would do well in the imitation game?" This question, Turing believed, was one that could actually be answered. In the remainder of the paper, he argued against the major objections to the proposition that "machines can think".

Since Turing introduced his test, it has been highly influential in the philosophy of artificial intelligence, resulting in substantial discussion and controversy, as well as criticism from philosophers like John Searle, who argue against the test's ability to detect consciousness.

Neuroscience and Hebbian theory

Artificial Neural Networks

Cybernetic Robots

Game AI

Symbolic Reasoning and The Logic Theorist

Dartmouth Workshop

Cognitive Revolution

šŸ‘‰ Early Successes (1956–1974)

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Early_successes_(1956%E2%80%931974)

There were many successful programs and new directions in the late 1950s and 1960s. The most influential are described here:

Reasoning, Planning and Problem Solving as Search

Natural Language

An important goal of AI research is to allow computers to communicate in natural languages like English. An early success was Daniel Bobrow's program STUDENT, which could solve high school algebra word problems.

A semantic net represents concepts (e.g. "house", "door") as nodes, and relations among concepts as links between the nodes (e.g. "has-a"). The first AI program to use a semantic net was written by Ross Quillian, and the most successful (and controversial) version was Roger Schank's Conceptual dependency theory.

(Figure: example of a semantic network)
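A semantic net of this kind can be sketched as a labeled directed graph; the concepts and relations below are illustrative, not taken from Quillian's or Schank's systems:

```python
# A semantic net as a labeled directed graph: concepts are nodes,
# relations ("is-a", "has-a") are labeled edges between them.
from collections import defaultdict

class SemanticNet:
    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, node), ...]

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def is_a(self, concept, category):
        """Follow 'is-a' links transitively (simple inheritance)."""
        for rel, parent in self.edges[concept]:
            if rel == "is-a" and (parent == category or self.is_a(parent, category)):
                return True
        return False

net = SemanticNet()
net.add("house", "has-a", "door")
net.add("house", "is-a", "building")
net.add("building", "is-a", "structure")

print(net.is_a("house", "structure"))  # -> True, via building -> structure
```

The transitive `is-a` lookup is the classic payoff of the representation: facts attached to "structure" are inherited by "house" without being stored twice.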

Joseph Weizenbaum's ELIZA could carry out conversations that were so realistic that users occasionally were fooled into thinking they were communicating with a human being and not a computer program (see ELIZA effect). But in fact, ELIZA simply gave a canned response or repeated back what was said to it, rephrasing its response with a few grammar rules. ELIZA was the first chatbot.
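ELIZA's trick—canned responses plus pattern-matched reflection of the user's own words—can be sketched roughly as follows. The patterns and reflections here are invented for illustration, not Weizenbaum's actual DOCTOR script:

```python
# A toy ELIZA-style responder: regex rules capture part of the input,
# pronouns are "reflected", and the fragment is echoed back in a template.
import re

RULES = [
    (r"I am (.*)", "Why do you say you are {0}?"),
    (r"I feel (.*)", "How long have you felt {0}?"),
    (r".*\bmother\b.*", "Tell me more about your family."),
]
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "am": "are"}

def respond(sentence):
    for pattern, template in RULES:
        match = re.match(pattern, sentence, re.IGNORECASE)
        if match:
            if match.groups():
                words = [REFLECTIONS.get(w.lower(), w) for w in match.group(1).split()]
                return template.format(" ".join(words))
            return template
    return "Please go on."  # canned fallback when nothing matches

print(respond("I am worried about my exams"))
# -> Why do you say you are worried about your exams?
```

The fallback line is the whole secret of the ELIZA effect: when no rule fires, a contentless prompt keeps the "conversation" going.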

Micro-Worlds

Perceptrons and Early Neural Networks

šŸ‘‰ 1st AI Winter (1974–1980)

šŸ‘‰ 1st AI Boom (1980–1987)

šŸ‘‰ New Directions In The 1980s

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#New_directions_in_the_1980s

Although symbolic knowledge representation and logical reasoning produced useful applications in the 80s and received massive amounts of funding, it was still unable to solve problems in perception, robotics, learning and common sense. A small number of scientists and engineers began to doubt that the symbolic approach would ever be sufficient for these tasks and developed other approaches, such as "connectionism", robotics, "soft" computing and reinforcement learning. Nils Nilsson called these approaches "sub-symbolic".

Revival of Neural Networks: "Connectionism"

Robotics and Embodied Reason

Soft Computing and Probabilistic Reasoning

Reinforcement Learning

šŸ‘‰ 2nd AI Winter (1990s)

šŸ‘‰ Big Data, Deep Learning, AGI (2005–2017)

↗ Deep Learning (Neural Networks) /The Technical Evolution of Neural Networks

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Big_data,_deep_learning,_AGI_(2005%E2%80%932017)

In the first decades of the 21st century, access to large amounts of data (known as "big data"), cheaper and faster computers and advanced machine learning techniques were successfully applied to many problems throughout the economy. A turning point was the success of deep learning around 2012, which improved the performance of machine learning on many tasks, including image and video processing, text analysis, and speech recognition. Investment in AI increased along with its capabilities, and by 2016, the market for AI-related products, hardware, and software reached more than $8 billion, and the New York Times reported that interest in AI had reached a "frenzy".

In 2002, Ben Goertzel and others became concerned that AI had largely abandoned its original goal of producing versatile, fully intelligent machines, and argued in favor of more direct research into artificial general intelligence (AGI). By the mid-2010s several companies and institutions had been founded to pursue artificial general intelligence, such as OpenAI and Google's DeepMind. During the same period, new insights into superintelligence raised concerns that AI was an existential threat. The risks and unintended consequences of AI technology became an area of serious academic research after 2016.

Big Data and Big Machines

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Big_data_and_big_machines

The success of machine learning in the 2000s depended on the availability of vast amounts of training data and faster computers. Russell and Norvig wrote that the "improvement in performance obtained by increasing the size of the data set by two or three orders of magnitude outweighs any improvement that can be made by tweaking the algorithm." Geoffrey Hinton recalled that back in the 90s, the problem was that "our labeled datasets were thousands of times too small. [And] our computers were millions of times too slow." This was no longer true by 2010.

The most useful data in the 2000s came from curated, labeled data sets created specifically for machine learning and AI. In 2007, a group at UMass Amherst released Labeled Faces in the Wild, an annotated set of images of faces that was widely used to train and test face recognition systems for years afterward. Fei-Fei Li developed ImageNet, a database of three million images captioned by volunteers using the Amazon Mechanical Turk. Released in 2009, it was a useful body of training data and a benchmark for testing for the next generation of image processing systems. Google released word2vec in 2013 as an open source resource. It used large amounts of text scraped from the internet and word embedding to create a numeric vector to represent each word. Users were surprised at how well it was able to capture word meanings; for example, ordinary vector addition would give equivalences like China + River = Yangtze or London āˆ’ England + France = Paris. This resource in particular would be essential for the development of large language models in the late 2010s.
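The analogy arithmetic can be illustrated with hand-made stand-in vectors. Real word2vec embeddings have hundreds of dimensions and are learned from text; the 3-d vectors below are contrived so that the London āˆ’ England + France analogy holds:

```python
# Toy word-embedding analogy: answer "London - England + France = ?" by
# vector arithmetic plus a nearest-neighbor (cosine similarity) lookup.
import math

emb = {
    "London":  [1.0, 1.0, 0.0],  # roughly: capital + English
    "England": [0.0, 1.0, 0.0],  # English
    "France":  [0.0, 0.0, 1.0],  # French
    "Paris":   [1.0, 0.0, 1.0],  # capital + French
    "Berlin":  [1.0, 0.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(vec, exclude=()):
    """Vocabulary word with the highest cosine similarity to vec."""
    return max((w for w in emb if w not in exclude), key=lambda w: cosine(emb[w], vec))

query = [l - e + f for l, e, f in zip(emb["London"], emb["England"], emb["France"])]
print(nearest(query, exclude={"London", "England", "France"}))  # -> Paris
```

Excluding the query words is standard practice in analogy evaluation, since one of them is often the raw nearest neighbor of the result vector.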

The explosive growth of the internet gave machine learning programs access to billions of pages of text and images that could be scraped. And, for specific problems, large privately held databases contained the relevant data. McKinsey Global Institute reported that "by 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data". This collection of information was known in the 2000s as big data.

In a Jeopardy! exhibition match in February 2011, IBM's question answering system Watson defeated the two best Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin. Watson's expertise would have been impossible without the information available on the internet.

Deep Learning - 2012 AlexNet

↗ Artificial Neural Networks (ANN) & Deep Learning Methods

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Deep_learning

In 2012, AlexNet, a deep learning model developed by Alex Krizhevsky, won the ImageNet Large Scale Visual Recognition Challenge with significantly fewer errors than the second-place winner. Krizhevsky worked with Geoffrey Hinton at the University of Toronto. This was a turning point in machine learning: over the next few years, dozens of other approaches to image recognition were abandoned in favor of deep learning.

Deep learning uses a multi-layer perceptron. Although this architecture has been known since the 1960s, getting it to work requires powerful hardware and large amounts of training data. Before these became available, improving the performance of image processing systems required hand-crafted, ad hoc features that were difficult to implement. Deep learning was simpler and more general.
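A multi-layer perceptron reduces to repeated matrix multiplies with a nonlinearity between layers. A minimal forward pass is sketched below with fixed toy weights; real deep learning fits those weights to data, which is what demanded the hardware and data sets described above:

```python
# Forward pass of a small multi-layer perceptron using NumPy:
# each layer computes W @ x + b, with ReLU between hidden layers.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, layers):
    """layers is a list of (weights, bias) pairs; linear final layer."""
    *hidden, last = layers
    for W, b in hidden:
        x = relu(W @ x + b)
    W, b = last
    return W @ x + b

# Toy network: 2 inputs -> 3 hidden units -> 1 output, hand-picked weights.
layers = [
    (np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]), np.zeros(3)),
    (np.array([[1.0, 1.0, 1.0]]), np.zeros(1)),
]
print(mlp_forward(np.array([2.0, 1.0]), layers))  # -> [2.5]
```

Stacking more `(W, b)` pairs deepens the network; nothing in the forward pass changes, which is part of why the approach scaled so cleanly once hardware allowed it.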

Deep learning was applied to dozens of problems over the next few years (such as speech recognition, machine translation, medical diagnosis, and game playing). In every case it showed enormous gains in performance. Investment and interest in AI boomed as a result.

The Alignment Problem

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#The_alignment_problem

Artificial General Intelligence Research

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Artificial_general_intelligence_research

šŸ‘‰ From NLP to AGI: Boom of LLM (2017~)

↗ Deep Learning (Neural Networks) /The Technical Evolution of Neural Networks ↗ Natural Language Processing (NLP) /šŸ“œ A Brief History of The Technical Evolution Of Language Models ↗ LLM (Large Language Model) / LLM Milestone Papers ↗ Transformers

šŸ”— https://en.wikipedia.org/wiki/History_of_artificial_intelligence#Large_language_models,_AI_boom_(2017%E2%80%93present)

The AI boom started with the development of key architectures and algorithms such as the transformer architecture in 2017, which led to the scaling and development of large language models exhibiting human-like traits of knowledge, attention, and creativity. A new AI era began in 2020 with the public release of scaled large language models (LLMs) such as ChatGPT.

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., … Wen, J.-R. (2025). A Survey of Large Language Models (arXiv:2303.18223). arXiv. https://doi.org/10.48550/arXiv.2303.18223

In Figure 2, we describe the evolution process of language models in terms of task-solving capacity. At first, statistical language models mainly assisted in some specific tasks (e.g., retrieval or speech tasks), in which the predicted or estimated probabilities could enhance the performance of task-specific approaches. Subsequently, neural language models focused on learning task-agnostic representations (e.g., features), aiming to reduce the effort of human feature engineering. Furthermore, pre-trained language models learned context-aware representations that can be optimized according to downstream tasks. For the latest generation of language models, LLMs are enhanced by exploring the scaling effect on model capacity, and can be considered general-purpose task solvers. To summarize, over this evolution, the scope of tasks that language models can solve has been greatly extended, and the task performance they attain has been significantly enhanced.
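The "statistical language model" stage the excerpt begins with can be illustrated by a minimal bigram model, which estimates P(word | previous word) by counting—the kind of component that once boosted speech-recognition and retrieval pipelines. The corpus below is invented:

```python
# A bigram statistical language model: count word pairs, then estimate
# P(cur | prev) as count(prev, cur) / count(prev, *).
from collections import Counter, defaultdict

def train_bigram(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split()  # <s> marks sentence start
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return counts

def prob(counts, prev, cur):
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

corpus = ["the cat sat", "the cat ran", "a dog sat"]
counts = train_bigram(corpus)
print(prob(counts, "the", "cat"))  # -> 1.0 ("the" is always followed by "cat")
print(prob(counts, "cat", "sat"))  # -> 0.5 ("cat" is followed by "sat" or "ran")
```

Each later stage in the excerpt replaces these counted probabilities with learned representations—neural features, then pre-trained contextual embeddings, then scaled LLMs.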


Ref