Q13: Neurally_Inspired_AI

Corpus Findings Informing AI Architectures

(a) Biological Timescale Ratios: Naturalistic neuroscience indicates that the brain processes continuous sensory input through a hierarchy of intrinsic timescales that mirrors the temporal structure of the natural world [1-3]. Sensory areas (e.g., primary visual cortex) operate on fast timescales to track rapidly changing inputs, while higher-order associative areas, such as the default mode network (DMN) and prefrontal cortex, integrate information over slow timescales spanning seconds to minutes [1, 2, 4]. AI systems could approximate this mechanism with multi-scale recurrent or attention architectures whose temporal receptive windows lengthen from layer to layer. Intrinsic timescales are also modulated by cognitive state: in primate V4, spatial attention lengthens the slow timescale of local population activity, an effect attributed to more effective recurrent interactions, and this slow timescale correlates with reaction times [5-8].
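A minimal sketch of a timescale hierarchy, assuming a stack of leaky integrators in which each level reads the level below it. The function name `leaky_integrator_hierarchy` and the time constants are illustrative choices, not values from the cited studies.

```python
def leaky_integrator_hierarchy(signal, taus):
    """Pass a signal through stacked leaky integrators.

    signal: list of floats (the fastest, "sensory" input).
    taus:   time constants in samples, one per level, each >= 1
            (hypothetical values; larger means slower integration).
    Returns one trace per level; level k integrates the output of level k-1.
    """
    traces = []
    level_input = list(signal)
    for tau in taus:
        alpha = 1.0 / tau  # update rate of this level
        state = 0.0
        trace = []
        for x in level_input:
            state += alpha * (x - state)  # exponential moving average step
            trace.append(state)
        traces.append(trace)
        level_input = trace  # the next level sees a smoothed signal
    return traces


# A step input: fast levels follow it quickly, slow levels lag behind.
step = [1.0] * 20
fast, medium, slow = leaky_integrator_hierarchy(step, taus=[2, 8, 32])
```

With a step input, the level with the smallest time constant approaches the input value first, while the slower levels are still accumulating evidence, which is the qualitative behavior the hierarchy of temporal receptive windows describes.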

(b) Event-Segmented Processing: The brain does not process information as a continuous, undifferentiated stream; it chunks experience into discrete, meaningful "events" [9]. According to Event Segmentation Theory, segmentation is an automatic side effect of prediction during ongoing perception: event boundaries are perceived when predictions fail (i.e., when prediction error spikes) because the situation has changed, for example through shifts in character, location, or goals [10, 11]. These events are also stored as interconnected narrative networks. Events with high "centrality" (those with the most semantic or causal connections to other events) are associated with stronger hippocampal encoding signals at their boundaries and are more likely to be recalled later [12-14]. AI systems could adopt this by replacing fixed-size context windows with dynamic, prediction-error-triggered context chunking, and by using graph-based memory structures that weight nodes by their centrality.
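The chunking idea can be sketched as follows. This is a toy illustration, not an implementation of Event Segmentation Theory: the "model" simply predicts that the next value equals the running mean of the current event, and both the function name and the threshold are assumptions made for the example.

```python
def segment_by_prediction_error(stream, threshold):
    """Split a numeric stream into events wherever prediction error spikes.

    Prediction (a deliberately trivial stand-in for a learned model):
    the next value equals the mean of the current event so far.
    A boundary is placed when |observed - predicted| exceeds `threshold`.
    """
    events = []
    current = []
    for x in stream:
        if current:
            prediction = sum(current) / len(current)
            error = abs(x - prediction)
            if error > threshold:
                events.append(current)  # close the event at the boundary
                current = []
        current.append(x)
    if current:
        events.append(current)
    return events


stream = [0.0, 0.1, 0.0, 5.0, 5.1, 4.9, 0.0, 0.1]
events = segment_by_prediction_error(stream, threshold=1.0)
# events == [[0.0, 0.1, 0.0], [5.0, 5.1, 4.9], [0.0, 0.1]]
```

In a real system the running mean would be replaced by the model's own next-step prediction, and each closed event could be compressed into a single memory entry.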

(c) Hippocampal Replay-Inspired Memory & Memory Updating: The brain dynamically integrates past and present knowledge, so memories can be restructured after the fact. When individuals encounter new information that changes the meaning of a past event (such as a plot twist in a movie), neural representations of the previously encoded event in DMN regions are updated to align with the new interpretation [15-17]. This suggests that memory in an AI system should not be a static, append-only database, but a malleable store in which retrieved representations are transformed and reintegrated when new information contradicts the earlier interpretation [18].
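As a rough illustration of a memory that is revised rather than only appended to, the sketch below re-interprets stored events when new information arrives about an entity they involve. The class `EventMemory`, its fields, and the example labels are all hypothetical; a practical system would update latent vectors rather than strings.

```python
from dataclasses import dataclass, field


@dataclass
class EventMemory:
    """Toy mutable memory: each event stores entities and an interpretation."""
    events: dict = field(default_factory=dict)

    def encode(self, event_id, entities, interpretation):
        """Store a new event with its initial interpretation."""
        self.events[event_id] = {
            "entities": set(entities),
            "interpretation": interpretation,
            "revisions": 0,
        }

    def update_on_new_information(self, entity, new_interpretation):
        """Rewrite the interpretation of every stored event involving `entity`.

        Returns the ids of the events that were actually changed.
        """
        updated = []
        for event_id, event in self.events.items():
            if entity in event["entities"] and event["interpretation"] != new_interpretation:
                event["interpretation"] = new_interpretation
                event["revisions"] += 1
                updated.append(event_id)
        return updated


memory = EventMemory()
memory.encode("scene_1", ["doctor", "patient"], "helpful")
memory.encode("scene_2", ["patient"], "neutral")
memory.encode("scene_3", ["doctor"], "helpful")
changed = memory.update_on_new_information("doctor", "deceptive")
# changed == ["scene_1", "scene_3"]; scene_2 is left untouched
```

The design choice worth noting is that the update touches earlier entries in place, so later retrieval returns the revised interpretation instead of a stale one alongside a correction.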

(d) Attention-Modulated Feedback & Cross-Modal Interaction: The human brain uses top-down signals from dorsal attention networks to modulate ventral sensory streams, allocating processing according to volitional goals rather than bottom-up stimulus salience alone [19, 20]. Cross-modal integration also happens early in the processing hierarchy; for example, acoustic information is represented in early visual regions [21]. Two-branch deep neural networks trained on audiovisual data typically fuse modalities only at the highest layers and therefore fail to capture the early cross-modal interactions observed in human brains [22]. AI architectures could benefit from early, low-level cross-attention between modality-specific encoders.
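One way to picture early fusion is a cross-attention step applied to low-level features before any deep unimodal processing. The sketch below uses plain Python lists and standard scaled dot-product attention; the function names, the residual addition, and the toy feature vectors are assumptions for illustration, not an architecture taken from the cited work.

```python
import math


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


def early_fusion(visual, audio):
    """Add attended audio context to each low-level visual feature."""
    attended = cross_attention(queries=visual, keys=audio, values=audio)
    return [[v + a for v, a in zip(vis, att)] for vis, att in zip(visual, attended)]


visual = [[1.0, 0.0], [0.0, 1.0]]  # toy low-level visual features
audio = [[1.0, 0.0], [0.0, 1.0]]   # toy low-level audio features
fused = early_fusion(visual, audio)
```

Late fusion would instead run each encoder to its final layer and combine the outputs once; the early variant lets every subsequent visual layer see audio-informed features.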

Robust Computational Principles vs. Species/Tissue-Specific Limitations

Robust, Transferable Principles:

  • Hierarchical Predictive Coding: Current language models are typically trained to predict only the very next word [23-25]. Brain mapping during speech listening instead suggests that the human cortex continuously predicts a hierarchy of representations, with higher-level cortical areas forecasting further into the future and at higher levels of semantic abstraction [23, 26].
  • Performance-Optimized Network Topologies: Hierarchical processing appears to be a robust, transferable principle. When artificial hierarchical neural networks are optimized purely for performance on challenging, high-variation categorization tasks, their internal layer representations come to predict neural response patterns in primate inferior temporal (IT) cortex and V4, even though the networks are never trained on neural data [27-30].
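The contrast between next-word training and multi-horizon prediction can be made concrete with a loss that sums errors over several forecast distances. This is a simplified sketch under stated assumptions: targets are scalars rather than latent representations, and the name `multi_horizon_loss` and the horizon weights are hypothetical.

```python
def multi_horizon_loss(predictions, targets, weights):
    """Weighted mean squared error over several prediction horizons.

    predictions: dict mapping horizon h -> list where entry t is the
                 forecast made at step t for the value at step t + h.
    targets:     list of observed values.
    weights:     dict mapping horizon h -> weight of that horizon.
    Forecasts that point past the end of `targets` are ignored.
    """
    total = 0.0
    for horizon, preds in predictions.items():
        errors = []
        for t, pred in enumerate(preds):
            if t + horizon < len(targets):
                errors.append((pred - targets[t + horizon]) ** 2)
        if errors:
            total += weights[horizon] * sum(errors) / len(errors)
    return total


targets = [0.0, 1.0, 2.0, 3.0]
predictions = {
    1: [1.0, 2.0, 3.0, 4.0],  # one step ahead
    2: [2.0, 3.0, 0.0, 0.0],  # two steps ahead (last two entries unused)
}
loss = multi_horizon_loss(predictions, targets, weights={1: 1.0, 2: 0.5})
# loss == 0.0 because every usable forecast matches its target
```

Standard next-token training corresponds to using only horizon 1; adding longer horizons, ideally over more abstract targets, is the change the predictive coding findings point toward.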

Species/Tissue-Specific Constraints (Non-Transferable):

  • Biological Hardware Gradients: The physical mechanisms associated with timescale hierarchies in the biological brain (such as spine density on pyramidal dendritic trees, gray matter myelination, and the expression gradients of NMDA and GABA receptor genes) are tissue-specific and have no direct counterpart in silicon hardware [31].
  • Developmental Anatomical Constraints: The infant brain lacks the hierarchical gradient of timescales seen in adults, defaulting instead to longer, coarser event segmentation even in early visual cortices, likely due to immature neural pathways [9, 32, 33]. This reflects biological developmental limitations rather than an optimal computational end-state.
  • Hemodynamic and Physiological Artifacts: Properties of the BOLD signal itself, such as the sluggishness of the blood-oxygen-level-dependent response or contamination from cardiac and respiratory noise, reflect vascular and physiological processes rather than cognitive computation, and should not be read as design principles [34, 35].

The "Naturalistic-Inspired Foundation Model" vs. Brain-Agnostic Large Models

A naturalistic-inspired foundation model would fundamentally shift away from standard transformer-based, next-token prediction over fixed-size context windows. Instead, it would feature a continuous, multi-scale recurrent architecture that generates parallel predictions at multiple levels of abstraction (from immediate phonetic/visual frames to long-range semantic outcomes) [23, 36].

  • Efficiency: Instead of maintaining dense attention across uniformly sampled tokens, this model would rely on event-segmented processing. By monitoring its own prediction errors, the model would dynamically chunk sequences, discarding redundant frames during highly predictable stretches and committing compressed event representations to memory only at event boundaries [10, 11, 14]. Compared with standard attention mechanisms, this could substantially reduce compute costs for long, unbounded streams (e.g., video or agentic environments).
  • Alignment: Alignment in brain-agnostic models often relies on late-stage RLHF, which can be superficial. A naturalistic model would feature retroactive memory updating, rewriting the latent representations of its past context when it receives corrective instructions or encounters a "plot twist" [15, 16]. This could make alignment more consistent across the model's entire temporal context and reduce hallucinations caused by conflicting prior context.
  • Interpretability: Because the model would encode memory as a semantic and causal network of discrete events (loosely mirroring the roles of the DMN and hippocampus), researchers could inspect its memory as a graph [12, 13, 37]. Analyzing a node's "centrality" would help developers trace which past events most heavily influence the model's current decisions, offering a degree of transparency that black-box LLMs lack.
  • Out-of-Distribution (OOD) Generalization: Brain-agnostic models often fail OOD because they overfit to surface-level statistics [38]. A naturalistic model, driven by long-range hierarchical predictive coding [23, 26] and early cross-modal fusion [21, 22], would be anchored in deeper causal representations of the environment. By matching the multiscale statistics of the natural world [3], such a model might approach the adaptability seen in biological brains and generalize better across dynamic, noisy, and unconstrained real-world environments.
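The graph-based memory mentioned under Interpretability can be sketched with a simple centrality measure. The example below counts distinct neighbors per event (degree centrality) over an undirected edge list; the event names are invented, and the cited narrative-network work may define centrality differently.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count how many distinct events each event is linked to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {event: len(linked) for event, linked in neighbors.items()}


def most_central(edges, k=1):
    """Return the k most connected events, ties broken alphabetically."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=lambda e: (-centrality[e], e))[:k]


# Hypothetical semantic/causal links between four events.
edges = [
    ("arrival", "argument"),
    ("argument", "departure"),
    ("argument", "reunion"),
    ("departure", "reunion"),
]
top = most_central(edges, k=2)
# top == ["argument", "departure"]
```

Ranking events this way gives a cheap first answer to "which stored events are most connected to the rest of the context", which is the kind of inspection the interpretability argument relies on.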

References

[1] (src:d22f0100) The brain processes information and coordinates behavioral sequences over a wide range of timescales1–3. While sensory inputs can be processed as fast as tens of milliseconds4–7, cognitive processes such as decision-making or working memory require integrating information over slower timescales fr...

[2] (src:d22f0100) association cortex, and slower in prefrontal cortical areas11. The hierarchy of intrinsic timescales is observed across different recording modalities including spiking activity11,12, intracranial electrocorticography (ECoG)13,14, and functional magnetic resonance imaging (fMRI)15,16. The hierarchy...

[3] (src:d830a9d5) Naturalistic stimuli, such as movies3 and spoken narratives17, offer the constraint and replicability that resting state acquisitions lack while adding greater ecological validity than traditional task designs18. Recent analyses of movie viewing fMRI data using the HMM have revealed a hierarchy of t...

[4] (src:80911ca8) Functional MRI (fMRI) has proved effective at capturing event representations in adults during continuous, naturalistic experience (27). In one fMRI approach, behavioral boundaries from an explicit parsing task are used as event markers to model fMRI activity during passive movie watching. Regions s...

[5] (src:d22f0100) Article https://doi.org/10.1038/s41467-023-37613-7 Intrinsic timescales in the visual cortex change with selective attention and reflect spatial connectivity Roxana Zeraati 1,2, Yan-Liang Shi 3,4, Nicholas A. Steinmetz 5, Marc A. Gieselmann6, Alexander Thiele6, Tirin Moore 7, Anna Levina 2,8,9,10 & ...

[6] (src:d22f0100) We examined how the intrinsic timescales of spiking activity in visual cortex were affected by the trial-to-trial alterations in the cognitive state due to visual spatial attention. We analyzed spiking activity recorded from local neural populations within cortical columns in primate area V4 during...

[7] (src:d22f0100) prediction in our V4 recordings. In contrast, heterogeneous biophysical properties of individual neurons alone cannot account for both temporal and spatial structure of V4 correlations. Thus, the V4 timescales arise from spatiotemporal population dynamics shaped by the local spatial connectivity s...

[8] (src:d22f0100) relevant neurons. Our results further show that the modulation of timescales also occurs in sensory cortical areas and cognitive processes other than memory maintenance13 which explicitly requires temporal integration of information. The correlation of slow timescales with reaction times during atte...

[9] (src:80911ca8) Neural event segmentation of continuous experience in human infants Tristan S. Yatesa, Lena J. Skalabana, Cameron T. Ellisb , Angelika J. Bracherc,d , Christopher Baldassanoe, and Nicholas B. Turk-Brownea,f,1 Edited by Linda Smith, Indiana University Bloomington, Bloomington, IN; received January 9,...

[10] (src:f176890b) Some psychological theories of event segmentation have proposed that segmentation is automatic and ongoing. In Newtson’s (1976) account, behavior perception is a feature monitoring process in which perceivers monitor for changes in some criterial set of features. A change in one or more of these fe...

[11] (src:f176890b) A recent account of event segmentation predicts both that segmentation is automatic and that it depends on processing situational changes. According to Event Segmentation Theory (EST; Zacks et al., 2007), the perception of event boundaries is a side effect of prediction during ongoing perception. ...

[12] (src:5d9910b7) Inter-event connections could benefit both memory encoding and retrieval. At encoding, events with strong connections to numerous other events might be frequently reactivated by these links to form robust and integrated representations19,20. At retrieval, events with many connections might be more l...

[13] (src:5d9910b7) events25–27. These non-causal (semantic) relations, based on shared meaning and overlapping components between events, may constitute a previously underexplored pathway through which inter-event connections enhance memory. Here, we propose that when people view and recall realistic, continuous audio...

[14] (src:5d9910b7) In this and all the above analyses involving pISC during recall, twelve events recalled by fewer than five participants were excluded. However, our main pISC analysis results remained qualitatively identical when all events were included in the analysis (Supplementary Fig. 13). Narrative network cen...

[15] (src:4fbb427f) Neural representations of naturalistic events are updated as our understanding of the past changes Asieh Zadbood1*, Samuel Nastase2, Janice Chen3, Kenneth A Norman2, Uri Hasson2 1Department of Psychology, Columbia University, New York, United States; 2Princeton Neuroscience Institute and Department ...

[16] (src:4fbb427f) In addition, we qualitatively reproduced our results by performing an ROI-based whole brain analysis (Appendix 1—figure 3, p<0.01 uncorrected). This analysis confirmed the importance of DMN regions for updating neural event representations. However, strong differences in pISC in the hypothesized...

[17] (src:4fbb427f) The default mode network, traditionally known to support internally oriented processes, is now considered a major hub for actively processing incoming external information and integrating it with prior knowledge in the social world (Yeshurun et al., 2021). Our experimental design targets natural- is...

[18] (src:4fbb427f) Our findings are consistent with the view that DMN synthesizes incoming information with one’s prior beliefs and memories (Yeshurun et al., 2021). We add to this framework by providing evidence for the involvement of DMN regions in updating prior beliefs in light of new knowledge. Across our differe...

[19] (src:a5c01860) When reading a narrative text, both the dorsal and ventral visual systems are activated. To illustrate the patterns of interactions between the dorsal and ventral visual systems in text reading, we conducted analyses of functional connectivity (FC) and effective connectivity (EC) in a left-hemispher...

[20] (src:a5c01860) A further important aim of the current study was to ask how the regions in the dorsal and ventral visual systems causally affect each other in text reading. The directional influence (i.e., top– down versus bottom–up) between the dorsal and ventral visual streams is a hotly debated issue in the disc...

[21] (src:b0437311) https://doi.org/10.1038/s42003-024-07434-5 Neural processing of naturalistic audiovisual events in space and time Yu Hu 1,2 & Yalda Mohsenzadeh 1,2,3 Our brain seamlessly integrates distinct sensory information to form a coherent percept. However, when real-world audiovisual events...

[22] (src:b0437311) Currently, DNN models serve as the best models of the human visual or auditory system140–144. However, their similarity with human brain responses in multisensory perception is less explored145. Generally, the match between DNN models and the brain depends on multiple factors, such as the training da...

[23] (src:a24af881) nature human behaviour Article https://doi.org/10.1038/s41562-022-01516-2 Evidence of a predictive coding hierarchy in the human brain listening to speech Charlotte Caucheteux 1,2, Alexandre Gramfort 1,2 & Jean-Rémi King 1,3 Considerable progress has recently been made in natural language proc...

[24] (src:a24af881) Predictive coding theory25–27 offers a potential explanation to these shortcomings; while deep language models are mostly tuned to predict the very next word, this framework suggests that the human brain makes predictions over multiple timescales and levels of representations across the cortical hi...

[25] (src:a24af881) we highlight that this issue also prevails in language models, where word sequences, but arguably not their meaning, rapidly become unpredictable. Our results suggests that predicting multiple levels of representations over multiple temporal scopes may be critical to address the indeterminate nature...

[26] (src:a24af881) The time range of predictions varies along the brain hierarchy Both anatomical and functional studies have shown that the cortex is organized as a hierarchy28,45: for example, low-level acoustics, phonemes and semantics are primarily encoded in Heschl’s gyrus, the superior temporal gyrus and the as...

[27] (src:4927f9ca) Explaining the neural encoding in these higher ventral areas thus remains a fundamental open question in systems neuroscience. As with V1, models of higher ventral areas should be neurally predictive. However, because the higher ventral stream is also believed to underlie sophisticated behavioral ob...

[28] (src:4927f9ca) Significance Humans and monkeys easily recognize objects in scenes. This ability is known to be supported by a network of hierarchically interconnected brain areas. However, understanding neurons in higher levels of this hierarchy has long remained a major challenge in visual systems neuroscience. W...

[29] (src:4927f9ca) dictivity in all three selection regimes. Models that performed better on the categorization task were also more likely to produce outputs more closely aligned to IT neural responses. Although the class of HLN-consistent architectures contains many neurally inconsistent architectures with low IT p...

[30] (src:4927f9ca) Discussion Here, we demonstrate a principled method for achieving greatly improved predictive models of neural responses in higher ventral cortex. Our approach operationalizes a hypothesis for how two biological constraints together shaped visual cortex: (i) the functional constraint of recognition ...

[31] (src:d22f0100) The mechanism underlying the diversity of intrinsic timescales across cortical areas can be related to differences in the connectivity. The hierarchical organization of timescales correlates with the gradients in the strength of neural connections in different cortical areas24,25. These gradients e...

[32] (src:80911ca8) With this adult comparison in hand, we tested three hypotheses about event segmentation in the infant brain. The first hypothesis is that infants possess an adult-like hierarchy of event timescales across the brain. This would fit with findings that aspects of adult brain function, including resting...

[33] (src:80911ca8) In adults, we replicated previous work showing a hierarchical gradient of event timescales across cortex, with more/shorter events in early visual compared to higher-order associative regions (Fig. 2A). Qualitative inspection revealed that boundaries in EVC seemed to correspond to multiple types of ...

[34] (src:02e01963) relates to the complexity of the neural hemodynamic responses and the heterogeneous vascular network topology of the cerebral cortex (Duvernoy et al., 1981; Havlicek & Uludag, 2020). Neuronal activity is coupled to an increase in the cerebral metabolic rate of oxygen, which, under conditions of norm...

[35] (src:7b279216) BOLD signal decomposition and the rationale behind ISFC. We model the measured BOLD signal in each voxel as a sum of three components (Fig. 1a): stimulus-induced signal (S), intrinsic neural signal (I) and non-neuronal (for example, physiological) noise signal (N)21–25. The stimulus-induced signal (...

[36] (src:a24af881) Discussion In the present study, we put specific hypotheses of predictive coding theory to the test25–27. While deep language algorithms are typically trained to make nearby and word-level predictions1–3,53–55, we assessed whether cortical hierarchy predicts multiple levels of representations, spann...

[37] (src:5d9910b7) Discussion In this study, we found that the structure of inter-event connections in complex naturalistic experiences predicts the behavioral and neural signatures of their memory traces. We applied an approach of transforming audiovisual movies into networks, whose nodes are events and whose edges...

[38] (src:4927f9ca) observed heterogeneities in higher ventral cortex areas (13, 32), but much work remains to be done to confirm such a hypothesis. Top-Down Approach to Understanding Cortical Circuits. A common assumption in visual neuroscience is that understanding the tuning curves of ...

Sources used: 12 documents

  • d22f0100-a0f2-418b-ae2b-e8d706d88c08
  • d830a9d5-a8b2-4a71-9182-b901f9aacbc9
  • 80911ca8-f159-4124-afe5-b08b499b5af1
  • f176890b-672c-4728-b2be-d46fba35b3a0
  • 5d9910b7-d433-4338-8101-eaf16bd78a22
  • 4fbb427f-36b7-4589-8a01-c9721889a8b5
  • a5c01860-d259-45c9-baf6-048efa6fd684
  • b0437311-9af3-4fdc-8d71-9967fa6e1efa
  • a24af881-c199-40d5-81a3-cf38ddba7e81
  • 4927f9ca-1d17-4fe4-8e75-86ba9a89c908