The adjacency matrices for the least visited states (1, 2, and 4), which appeared only in the TBI sample, are presented in Fig 6. Given the finding that states 5 and 6 differentiated the groups, we correlated the mean frequency of these two states with three behavioral outcome variables in TBI. In addition, given the high frequency of state 5 in HCs and that this state was an important differentiator between groups, we examined the relationship between the mean cost for state 5 hubs (see Fig 11) and the frequency of state 5.
That is, as part of the second hypothesis, we aimed to determine whether state-level dynamics could be driven by the connectivity of the most highly connected nodes. This hypothesis received only partial support. Of the three hubs of interest, nodes in the SN showed some relationship with state 5 frequency, whereas nodes in the DMN and ECN were not strong predictors or drivers of state 5 frequency (Table 8).

[Figure captions: cost probability distributions for states 3, 5, and 6 (distributions based upon the cost for all nodes collapsed across runs for each group); hubs identified for each of the samples for states 3, 5, and 6; average cost for hubs for run 1, run 2, and the average between runs.]

To determine if the network cost associated with state 5 hubs predicted behavior, a correlational analysis was conducted using the three simple visual scanning tasks (mDSST, Trails A, VSAT) and the hubs established for state 5.
These three hubs showed inconsistent relationships, with comparisons revealing small to medium effect sizes: the hubs of the SN and ECN showed some prediction of behavior, whereas there was no relationship between the DMN and behavior. Overall, network hubs in the DMN, SN, and ECN were generally poor to modest predictors of performance on tests of information processing efficiency collected inside and outside the scanner (see Table 9).
[Table caption: Examining the relationship between cost in three hubs in a priori networks and behavioral variance on the in-scanner task (mDSST) in the TBI sample.]

One important goal in this study was to determine the predictors of network variability, or state transitions, and the relationship between network variability and performance. Subtle differences were evident when comparing the number of state transitions between the TBI and HC groups. To examine the consistency of these results, we directly examined the influence of k on state transitions, and the TBI sample consistently showed reduced network dynamics (see Fig 5).
We interpret these data to be indicative of reliably reduced state transitions in TBI. This finding does not support the hypothesis that brain injury results in increased network variability. The findings are presented in Table 10 (top), but overall, the number of run 1 transitions was a modest predictor of variance in performance for both behavioral runs. That is, early network transitions predicted variance in performance for run 1 and run 2.

[Figure captions: the relationship between the number of state transitions and variability in in-scanner task performance; the relationship between state transitions and mean nodal degree (total network connectivity).]

Transitions during the first run were a predictor of variance for both runs, whereas state transitions during the second run did not predict behavioral variance. Lastly, it was a goal to determine if local or global connectivity predicted the number of transitions. Based upon the static network cost, we examined the mean connectivity of the top five nodes (i.e., …). Regional influences maintained no relationship with state transitions (r-values ranging from 0.…). The current study used a graph theoretical framework and dynamic connectivity modeling to examine the functioning of large-scale neural networks after head trauma.
We used two runs of task-related data collection in order to examine the reliability of the findings. The following discussion is focused on several key points. Third, it appears that the TBI sample may be less likely to transition between states, and the number of transitions during run 1 was a modest predictor of performance variability for both runs.
Finally, the network variance identified as state transitions was predicted by global connectivity, indicating that the increased connectivity commonly observed in TBI may be a source of reduced network variability. We discuss the implications these findings have for understanding large-scale neural network changes occurring after significant neurological disruption. There is a growing literature demonstrating the unique role of network hubs as driving brain dynamics toward specific states [ 15 , 23 , 54 , 55 ].
In Hypothesis one, we anticipated that network hubs would predict the frequency of dynamic connectivity states. We focused the analysis on state 5 given its relatively high frequency and that it was one state that differentiated the groups. Five hubs were examined and collapsed into three distinct networks. These findings revealed that, within individuals with TBI, mean degree for state 5 hubs maintained a positive correlation with state 5 frequency for run 1 but not for run 2. Given our goal to use the two runs of data to demonstrate the reliability of findings, interpreting this finding is difficult.
If we focus on possible effects of chronology, hubs may serve as the backbone for information transfer during run 1, thus driving activity in state 5, but the network requirement for this influence may diminish over time. Similar network dynamics have been observed in schizophrenia, where measures of network flexibility during early runs predict concurrent and later network functioning, including behavioral performance [56]. While our goal in this study was to focus on cortical hubs as drivers of network states post injury, one unexpected finding worth revisiting was the repeated observation that Crus I and II functioned as a hub within the network.
This was evident for all three states showing the highest frequency post TBI, and in state 6 this finding was unique to the TBI sample. The cerebellum is involved in a number of functions, including timing and circadian rhythms, associative learning mechanisms, and higher-level cognitive processing (see [21, 57] for meta-analytical and theoretical reviews). In particular, Crus I and II have been consistently linked to roles in information processing (e.g., …). We anticipate that further investigation here is worthwhile given the history of findings of enhanced cerebellar response in recent network connectivity modeling in both moderate and severe TBI [58] and mild TBI [17, 59], and even altered cerebellar response as a primary indicator of response to methylphenidate intervention to improve cognition in TBI [32, 60].
The primary prediction, that brain injury results in greater variability (i.e., …), was not supported. Therefore, the range of network expression was more restricted in the TBI sample with respect to: …. Given the established literature documenting increased variation in behavioral performance during tasks of cognition [61] and motor functioning [6] after injury, we expected that these behavioral deficits would be mirrored by more variable network dynamics, operationalized here as transitions. Instead, even as variability in performance increases after injury, individuals with TBI were less likely to show state transitions.
Other investigators have reported similar results, demonstrating that after TBI, higher brain signal variability may be predictive of cognitive recovery [62] and that brain injury results in reduced network variability [63]. Recent work based in control theory used simulations to demonstrate a limited dynamic range of states available to individuals with mild TBI [64]. The loss of nodal specificity may result in incorporation of additional resources or engagement of alternative auxiliary pathways [24, 66] or hubs (see Fig 10) in TBI, resulting in less freedom for expression of network dynamics and greater susceptibility to neural noise [64].
Finally, while there were subtle relationships between the states differentiating the two groups (states 5 and 6), the relationship between network dynamics and cognitive outcome does not appear to be straightforward. Future work should be organized around modulating variation in performance using task-load manipulations to determine the contributors to network states arising during task that may account for performance variability.
Similar analyses might be extended to other clinical disorders, such as multiple sclerosis, where the diffuse effects of pathophysiology have been shown to result in increased performance variability. One final observation is that four of the 23 subjects with TBI showed zero frequency of time spent in states 5 and 6 and very little to no frequency in state 3.
These subjects occupied states 1, 2, and 4, and three of the four showed zero transitions between states. In each case, occupation of these rare brain states held for both runs, so the finding was reliable; but given the small numbers observed in these states, it is possible that these three rare states were occupied solely by TBI cases (as opposed to HC cases) only by chance. If we venture to interpret the networks for these rare cases, states 1 and 4 were characterized by high overall network cost, with several regions showing very high connectivity including the ECN, SN, and visual networks (see Fig 6).
The reason for the emergence of these states in this TBI sub-group is not clear, but based upon clinical MRI at the time of injury, there was significant disruption of frontal systems in these four cases. Additional work will be needed to determine if state-level analyses reveal sub-types within TBI that emerge as distinct network responses to injury or if these states are not related to pathology and are also visible in HCs.
The current approach provides an important opportunity to examine whole-brain connectivity over multiple time scales. While we report the first set of findings related to the local and global changes in network connectivity and cost in moderate and severe TBI, this study is not without limitations. First, TBI is a heterogeneous disorder, and, as in most studies in this literature, we ideally would have a sample size that permitted subgroup analysis. While the current data demonstrated the within-subject reliability of network dynamics, and the sample size here is comparable to prior graph theory analyses examining static networks after moderate and severe TBI [4, 6, 7, 68], the sample size for this study does preclude direct examination of the reliability of these findings with respect to the groups (e.g., …).
For this reason, replication of the current findings is needed in a separate group of individuals with moderate and severe TBI, with focus on the primary findings: …. Second, ICA has distinct advantages both with respect to reduction of the influence of nuisance signal [17] and avoiding signal averaging across heterogeneous signals. However, there are inherent limitations to a brain parcellation of 44 network nodes, in particular with respect to subtle effects within subnetworks (e.g., …).
To the degree that BOLD fMRI is sensitive to these subtle differences, group-level ICA may not detect more nuanced effects where the network nodes themselves are changing as well as the between-node interactions. However, we anticipate that ICA remains an ideal statistical application given the focus in this study on large-scale network dynamics, including observing the most reliable networks (e.g., …). In summary, this study supports the reliability of examining dynamic network states after neurological disruption. It also supports prior work demonstrating the potentially unique role of some network hubs in motivating network states.
Finally, the possible loss of network dynamics after TBI may be a predictor of greater behavioral variability. This link between network dynamics and behavioral outcome in TBI appears to be an important future line of investigation. We want to thank Dr. Arnab Roy for his support of this work and for his mentorship of the first author during analysis and manuscript preparation.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Published online Jun 8. Received Oct 24; accepted May 2. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract

Over the past decade there has been increasing enthusiasm in the cognitive neurosciences around using network science to understand the system-level changes associated with brain disorders.

Study goals and hypotheses

Our objective was to understand the influence TBI has on large-scale neural networks, with focus on distinct brain states assessed using a dynamic functional connectivity (dFC) approach.
Materials and methods

Subjects included 23 individuals who sustained moderate to severe TBI and 19 healthy controls of comparable age and education (see Table 1 for demographic information).

Table 1. Demographic information.

Table 2. Injury severity, mechanism of injury and brain imaging findings for all TBI subjects (S = subject number; GCS = Glasgow Coma Scale score).

S1 (GCS 7, fall): Edema in frontal pole bilaterally, paracingulate gyrus, and bilateral cerebral white matter.
S2 (GCS unknown, MVA): No acute findings; a few weeks post injury, SPECT and PET scans revealed bilateral frontal and temporal lobe and cerebellar findings.
S3 (GCS unknown, mechanism unknown): Small to moderate left temporo-parietal subdural hematoma with pneumocephalus. Diffuse left cerebral edema and small subarachnoid blood along the superior interhemispheric falx.
(Tractor trailer): Intraventricular hemorrhage in bilateral lateral ventricles (right greater than left), the left temporal horn, and the third and the right fourth ventricles.
Diminished neural network dynamics after moderate and severe traumatic brain injury
Punctate bifrontal contusions; hyperdensity adjacent to the temporal horn and adjacent to the falx consistent with contusion. No evidence of acute bony fracture.
Intraventricular hemorrhage in occipital horns bilaterally, left greater than right; small subdural along the posterior interhemispheric fissure and tentorium; mild sulcal enlargement advanced for the age of the patient.
Subarachnoid hemorrhage in right posterior frontal and temporal lobes; cerebral contusion in bilateral frontal lobes, left parietal lobe, and left temporal lobe; nearly complete effacement of the convexity sulci and sylvian fissures resulting in subtle shift to the left.
Bifrontal subcortical foci of hyperdensity suggesting DAI; left intraventricular choroid acute hematoma.
A small left frontal parasagittal hyperdensity also suggestive of an acute subdural hematoma; small scattered punctate foci of intraparenchymal hypodensities throughout the cerebral hemispheres (e.g., …); midline shift measuring approximately 4 mm; left temporal bone fracture.

Behavioral data

All study participants were administered a neuropsychological battery of tests to assess level of cognitive functioning.

Table 3. Neuropsychological performance of the TBI group.
Mean (sd) raw score. Modified digit symbol modalities task.

Table 4. Motion detection summary.

Volumetric values

To help constrain the interpretation of neural network connectivity, we integrated information about brain volume change over time.

Table 5. FreeSurfer analysis: mean (sd) volume ratio.

Analytic approach and pipeline

To test our hypotheses, a functional connectivity network for each subject was developed, and a subject-specific connectivity profile was created based upon an edge-list comparing the correlations of all pairs of viable components outputted by ICA (below).
Data processing pipeline for functional data.

Whole-brain mask

We used a whole-brain mask from all participants.

Static and dynamic functional connectivity analysis

Preprocessed data for both groups and both sessions were organized into spatially independent components (ICs) using group-level spatial ICA in the GIFT toolbox (http:…). Components from spatially constrained ICA.
Static network analysis

To provide context for the dynamic analysis, we first conducted a static graph theoretical analysis using the components selected during the ICA output from GIFT (i.e., …).

Dynamic network analysis

Following the initial ICA step, which determined the subject-wise static maps, we aimed to assess the dynamic properties of run 1 and run 2 for all subjects.
Metrics from graph analysis

In order to provide context regarding the network topology, several standard graph metrics were analyzed.

Total number of functional connections

The total number of functional connections in the brain for each subject at each time point was evaluated by counting the total number of statistically significant (FDR criterion) functional edges.
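As a sketch of this edge-counting step (our own minimal implementation, not the authors' code), a Benjamini-Hochberg FDR criterion over edge-wise p-values might look like:

```python
def count_significant_edges(p_values, q=0.05):
    """Count connectivity edges surviving Benjamini-Hochberg FDR at level q.

    p_values: one p-value per node pair (upper triangle of the
    connectivity matrix). Returns how many edges pass the step-up
    criterion: the largest k with p_(k) <= (k/m) * q sets the cutoff.
    """
    p_sorted = sorted(p_values)
    m = len(p_sorted)
    cutoff = 0.0
    for k, p in enumerate(p_sorted, start=1):
        if p <= (k / m) * q:
            cutoff = p  # largest sorted p-value passing the criterion
    return sum(1 for p in p_values if p <= cutoff)
```

Note that the BH cutoff is determined on the sorted p-values but then applied to all edges, so edges with p-values below the cutoff all count as significant even if an intermediate sorted value failed its own threshold.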
Network strength

Network strength was defined as the total sum of the absolute values of the network weights.

Clustering coefficient

In order to examine local communication efficiency within the network, we examined the local clustering coefficient. The local clustering coefficient (CC) of vertex i is defined as CC_i = 2*E_i / (k_i * (k_i - 1)), where k_i is the degree of vertex i and E_i is the number of edges among its neighbors.

Average shortest path length

In order to examine global network efficiency, the average shortest path length (PL) was computed between all pairs of nodes.
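These two metrics can be sketched in a few lines on an unweighted adjacency structure (a minimal illustration; function and variable names are ours, and the paper's analysis may differ in details such as weighting):

```python
from collections import deque

def clustering_coefficient(adj, i):
    """Local clustering coefficient of node i: fraction of realized links
    among i's neighbours, CC_i = 2*E_i / (k_i * (k_i - 1))."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_shortest_path(adj):
    """Mean BFS distance over all ordered node pairs (connected graph)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for node, d in dist.items():
            if node != src:
                total += d
                pairs += 1
    return total / pairs
```

Here `adj` maps each node id to the set of its neighbours; for a triangle graph every node has CC = 1, while the middle node of a three-node path has CC = 0.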
PL is defined as PL = (1 / (N(N - 1))) * Σ_{i≠j} d(i, j), where d(i, j) is the shortest path length between nodes i and j and N is the number of nodes.

Network cost

Consistent with Roy and colleagues [53], we define the cost of each functional edge as the product of the Euclidean distance between the ROI-pair it connects and the absolute weight of the connection.

Results

Static connectivity analysis

Fig 3 illustrates the components included during the static and dynamic network analysis.

Table 6. Global graph metrics for the static analysis.

Table 7. Dynamic connectivity network states.

Reliability of dynamic analysis

To examine the reliability of dynamic meta-states after injury (Hypothesis 1), we correlated the frequency and number of state transitions between run 1 and run 2 in each sample.
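The network cost defined above (Euclidean distance between an ROI pair times the absolute connection weight, after Roy and colleagues) reduces to a short computation; the coordinates and names below are illustrative, not the study's actual ROI centroids:

```python
import math

def edge_cost(coord_a, coord_b, weight):
    """Cost of one functional edge: Euclidean distance between the
    ROI-pair centroids times the absolute connection weight."""
    return math.dist(coord_a, coord_b) * abs(weight)

def network_cost(coords, edges):
    """Total cost over (i, j, weight) edges; coords maps node id to an
    (x, y, z) centroid, e.g. in MNI millimetres."""
    return sum(edge_cost(coords[i], coords[j], w) for i, j, w in edges)
```

A usage sketch: two ROIs 5 mm apart connected with weight -0.5 contribute a cost of 2.5, since the sign of the correlation is discarded.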
[Figure captions: test-retest reliability across runs for state frequency and transitions for both samples; examining the influence of the number of states (k) on state transitions between groups.]

Comparing dynamic states between groups

The adjacency matrices for the least visited states (1, 2, and 4) are presented in Fig 6.

[Figure captions: adjacency matrices for network states 1, 2, and 4; adjacency matrices and connectograms highlighting the connectivities across subnetworks for states 3, 6, and 5; cost probability distributions and hubs for state 3 for both samples.]

Table 8. Correlation between state 5 hubs and state 5 frequency in TBI.

Table 9. Correlation between state 5 hubs and cognition in TBI.

Examining network dynamics (transitions)

One important goal in this study was to determine the predictors of network variability, or state transitions, and the relationship between network variability and performance.
Table 10. Examining network dynamics (transitions) after TBI.

Discussion

The current study used a graph theoretical framework and dynamic connectivity modeling to examine the functioning of large-scale neural networks after head trauma.

Hubs as network drivers

There is a growing literature demonstrating the unique role of network hubs in driving brain dynamics toward specific states [15, 23, 54, 55].
State transitions and behavioral variability

The primary prediction, that brain injury results in greater variability (i.e., …), was not supported.

High-cost network states in TBI

One final observation is that four of the 23 subjects with TBI showed zero frequency of time spent in states 5 and 6 and very little to no frequency in state 3.

Study conclusions and limitations

The current approach provides an important opportunity to examine whole-brain connectivity over multiple time scales.
Acknowledgments

We want to thank Dr. Arnab Roy for his support of this work.

Data Availability

Our data are freely available on OpenNeuro.

References

Functional magnetic resonance imaging of working memory impairment after traumatic brain injury. J Neurol Neurosurg Psychiatry [Internet]. Brain activation during working memory 1 month after mild traumatic brain injury: a functional MRI study.
AAN Enterprises; ; 53 6: Resting network plasticity following brain injury [Internet]. The Rich Get Richer: Default mode network functional and structural connectivity after traumatic brain injury. Graph analysis of functional brain networks for cognitive control of action in traumatic brain injury. Topological correlations of structural and functional networks in patients with traumatic brain injury.
Frontiers ; ; 7: Examining working memory task acquisition in a disrupted neural network. White matter integrity related to functional working memory networks in traumatic brain injury. Abnormal connectivity in the sensorimotor network predicts attention deficits in traumatic brain injury.
Springer; ; 3: Menon V, Uddin LQ. Saliency, switching, attention and control: Brain Struct Funct [Internet]. Default mode network connectivity predicts sustained attention deficits after traumatic brain injury. Salience network integrity predicts default mode network function after traumatic brain injury. Proc Natl Acad Sci [Internet]. Mapping the functional connectome in traumatic brain injury: Elsevier; ; [ PubMed ]. Functional Neuroimaging in Traumatic Brain Injury: From Nodes to Networks. Frontiers Media SA; ; 8.
Poldrack RA, Yarkoni T. From brain maps to cognitive ontologies: Annual Reviews ; ; The effect of preprocessing pipelines in subject classification and detection of abnormal resting state functional network connectivity using group ICA. Graph theory approaches to functional network organization in brain disorders: A critique for a brave new small-world bioRxiv. Cold Spring Harbor Laboratory; ; Test—retest reliability of resting-state connectivity network characteristics using fMRI and graph theoretical measures.
Elsevier; ; 59 2: Wiley Online Library; ; 30 2: Functional topography in the human cerebellum: Elsevier; ; 44 2: Elsevier; ; 84 2: High-cost, high-capacity backbone for global brain communication. Proc Natl Acad Sci. National Acad Sciences; ; Injured Brains and Adaptive Networks: Changes in resting connectivity during recovery from severe traumatic brain injury.
Int J Psychophysiol [Internet]. Chronology and chronicity of altered resting-state functional connectivity after traumatic brain injury. Mary Ann Liebert, Inc.

Stated like this, FEM relates to other models of cognition such as predictive coding [11, 23, 24], Bayesian population coding [19, 25, 26] or active inference [27-29]. These frameworks promote a hierarchical organization of coupled systems, based on feedback error prediction. At the brain level, this paradigm is argued to occur at all scales and with different mechanisms, always having an afferent system (e.g., …). In a model of visual processing, Rao and Ballard proposed that the visual cortex is organized hierarchically for encoding natural images based on feedback connections that carry predictions of lower-level neural activities [23].
In this formulation, the supervising system (AM) attempts to learn a model of the afferent network (RNN) in accordance with the evidence that a particular policy is being pursued, in order to control it for generating long sequences. This may correspond to different places in the brain for decision-making and perceptual inference [30].
This architecture combines model-free reinforcement learning for exploratory behaviors in a recurrent working memory (WM) of spiking neurons and model-based reinforcement learning in a short-term memory (STM) with a reward signal. The latter memory model corresponds to the Basal Ganglia (BG), where simple signal-response rules are learned by an associative map (AM) to trigger one spatio-temporal sequence in the working memory.
The dopaminergic signal supervises both the exploratory search in the WM and the learning in the STM when the goal has been retrieved. RNNs, once unfolded in time, can be seen as a virtually deep feed-forward network in which all the layers share the same weights [ 33 ].
The reinforcement signal on the output dynamics can serve to control the input dynamics with noise, searching stochastically for the inputs that diminish the error on the output dynamics. In studies on habit formation, several researchers advocate for a dynamic role of the basal ganglia (BG) when interacting with other areas [31, 32].
Although the role of the striatum is commonly focused on the encoding and control of stimulus-response mappings based on dopaminergic reward, Yin and Knowlton [31] see the BG as a generator of dynamics that selects and amplifies certain dynamics while eliciting others. In their model, information flows from cortex to the basal ganglia to thalamus and back to cortex, but each system is dynamic. INFERNO may relate to these features of the cortico-basal system as it exploits noise as a generator of diversity and prediction-error minimization for goal-directed behavior. In line with this, [16] proposes that the flexible processing of contextual situations done in the neo-cortex (CX) is driven by a sub-cortical controller, the basal ganglia (BG), toward a targeting goal provided by the prefrontal cortex (PFC).
We will discuss the relevance of our model based on neurobiological considerations in the next section. In order to demonstrate the capabilities of our model for recursivity and boot-strapping, we will design several experimental setups for the habit learning (top-down control) and retrieval phases (bottom-up self-organization) of spiking neuron sequences, and its application to the sequential planning of arm movements. We will then discuss the relevance of our model with respect to neurobiological data, its computational power for robotics and AI, neuromorphic hardware implementations, and its affiliation to certain computational principles of the brain proposed by [21, 33-35].
These structures are hypothesized to form a working memory of action-perception rules [29]. Recently, they have been identified as serving sequence generation [9] and self-generated thought [17]. In line with these proposals, we see the spiking RNN in our framework as playing the role of the IPL working memory, the associative map as playing the role of the BG, the PFC as providing the goal task, and the reinforcement signal as corresponding to a dopaminergic signal; see Fig 2.
Following this, the IPL cortical neuronal chains can be assembled dynamically and recursively toward higher-level actions and functions depending on the targeted goal furnished by other brain structures, supposedly the Pre-Frontal Cortex (PFC) and the Basal Ganglia (BG). This architecture appears important for reaching and grasping [40, 41], arithmetic operations [16, 42], as well as language formation.
For instance, in the language domain, lexical chains are hypothesized to be constructed dynamically based on a global context and a set of grammatical rules. Here, branching is done by the BG, entering a new state in the cortical working memory until completion of the task given by the PFC using a dopaminergic reinforcement signal. Moreover, our model is greatly in line with recurrent spiking neural network models using reinforcement signals for sequential planning [9, 10]. Its capabilities to boot-strap clusters recursively and to retrieve ordinal sequences also make it compatible with reservoir computing methods [44, 45] such as echo-state networks [46], RNNPB [47] or dynamical neural fields [48].
Its properties of dynamically assembling neural chunks are further reminiscent of genetic programming optimization of neural networks such as NEAT and others [49]. Interestingly, once unfolded in time, its structure can also be seen as a virtually deep feed-forward network in which all the layers share the same weights [33]. Rolfe and LeCun proposed a similar architecture called DrSAE, in which auto-encoders evaluate and minimize the function given by the recurrent map [50]. Here, the reinforcement signal on the output dynamics can serve to control the input dynamics, searching stochastically for the inputs that diminish the error.
The stochastic gradient descent that we employed in the RNN is reminiscent of the evidence-accumulation process, sampled continuously over time, of LIP neurons [51-53]. These neurons show ramping responses inferring latent decision making, such that the better the evidence, the larger the amplitude. The decision making can be seen as a random fluctuation (Wiener process) pressured by time constraints and decision thresholds [54]. We used in the recurrent neural network a variant of the Hebbian equations, the Rank-Order Coding (ROC) algorithm, which captures well the structure of the Spike Timing-Dependent Plasticity (STDP) algorithm and of the classical Delta rule in the spatio-temporal domain [55].
STDP has been found to modulate the neural activity of temporally related neurons in many brain regions by reinforcing their links. The Rank-Order Coding algorithm was proposed by Thorpe and colleagues as a discrete and faster model of the derivative integrate-and-fire neuron and of the standard STDP reinforcement learning algorithm [56]. The rationale is that ROC neurons are sensitive to the sequential order of the incoming signals; that is, to their rank code.
If the rank code of the input signal matches perfectly that of the synaptic weights, then the neuron fully integrates this activity over time and fires. On the contrary, if the rank code of the signal vector does not match the ordinal sequence of the synaptic weights, then integration is weak and the neuron discharges proportionally. In this respect, this mechanism captures an intrinsic property of cortical spiking neurons. For an input vector signal I of dimension M and for a population of N neurons (M afferent synapses each), we have: …
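The displayed equation did not survive extraction; as a hedged sketch (our own notation, following Thorpe-style rank-order coding rather than necessarily the authors' exact formulation), each afferent's weight can be scaled by a modulation factor mod in (0, 1) raised to that afferent's firing rank:

```python
def roc_activation(input_ranks, weights, mod=0.8):
    """Rank-Order Coding activation (Thorpe-style sketch): afferent j
    contributes weights[j] * mod**rank, where rank is j's position in
    the input firing order (0 = first afferent to fire, 1 = next, ...).
    The value of mod here is an illustrative choice, not from the paper.
    """
    return sum((mod ** r) * weights[j] for j, r in enumerate(input_ranks))
```

This makes the order sensitivity explicit: when the earliest-firing afferents carry the largest weights the sum is maximal, and any permutation of the firing order lowers the activation, so the neuron "discharges proportionally" to the rank match.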
Viewed as an optimization problem, the control of the RNN dynamics consists in retrieving the most salient inputs that will drive the neural units to specific amplitude values. This is an inverse problem and can be solved with gradient descent. In order to better explain the mechanism, we can reduce the control of the RNN dynamics to its simplest case, the control of one neuron only; see Figs 1a, 3a and 3b. We can consider the control of the amplitude level V of one neuron as an optimization problem. The exploration of I_search can be done by stochastic gradient descent and meta-heuristic methods.
As a meta-heuristic method, retrieving I_search can be done with a stochastic gradient descent (greedy search) by injecting some noise into I while using V as a metric distance. This optimization technique can be extended to a population of neurons and applied to distant rewards; in these cases the terms I, I_search, E and V are vectors (see Table 1). The number of iterations necessary for the WM to converge is not bounded, so the recurrent map will explore several solutions in an unlimited amount of time until convergence.
One common solution is to use a threshold value to stop the search. This problem is known in neuroscience as the credit assignment problem [58]: … In its present form, the reinforcement signal algorithm corresponds in AI to a classical meta-heuristic method with random walks, which does not prevent falling into local minima.
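A minimal sketch of this greedy random-walk search (our own illustrative code, not the paper's implementation; `evaluate` stands in for the neuron's response V to an input vector I, and the tolerance plays the role of the stopping threshold just mentioned):

```python
import random

def search_input(evaluate, dim, target, sigma=0.1, max_iters=2000, tol=1e-3):
    """Greedy random-walk search: perturb the input vector with Gaussian
    noise and keep a perturbation only when it moves the amplitude
    V = evaluate(I) closer to the target, using |V - target| as the
    metric distance. A tolerance threshold stops the search."""
    best = [random.uniform(0.0, 1.0) for _ in range(dim)]
    best_err = abs(evaluate(best) - target)
    for _ in range(max_iters):
        if best_err <= tol:
            break  # threshold criterion: close enough to the target
        cand = [x + random.gauss(0.0, sigma) for x in best]
        err = abs(evaluate(cand) - target)
        if err < best_err:
            best, best_err = cand, err  # accept improving moves only
    return best, best_err
```

Because only improving moves are accepted, this is exactly the kind of random walk that can stall in a local minimum when `evaluate` is non-monotonic, which is the limitation the text points out.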
It may correspond in neurocomputational theory to dopaminergic modulation and to model-free reinforcement learning [ 59 ]. However, it does not take into account more sophisticated types of signals, which could further be provided by other types of neuromodulators [ 60 ]. The neural architecture consists of one recurrent neural network arranged as in Fig 1a. Its value is chosen with respect to the average synaptic time found in the neurons of the cortical maps, about 50 ms [ 61 ].
The function f is the inverse function explained in section 2. The buffer is used to model the recurrent activity of the neural network over time: after each iteration, the buffer that retranscribes the neural activity over time is shifted and presented again to the neural population. We summarize in Table 3 below the different experiments conducted to present our model. The first experiment corresponds to the study of the RNN optimization with stochastic gradient descent toward goal-driven control. The second experiment presents its application to the control of a 3 degrees-of-freedom robotic arm.
The fourth and fifth experiments describe the ability of the AM-RNN working memory to generate long-range learned spatio-temporal sequences, in a flexible and in a controlled way, respectively. Description of the different experiments and their corresponding sections. In this section, we study the RNN alone, decoupled from the associative map, in order to explain its behavior during goal-driven control. First, the recurrent map learns some spatio-temporal rules over several iterations until convergence of its dynamics.
This is done using the reinforcement mechanism presented in the previous section. The first graph shows how well all trajectories of the network converge to a global minimum. This convergence is also fast, as it requires at most 20 iterations. After the initial conditions, the input and output vectors both converge rapidly to a stable pattern, for which neuron 24 is the most active neuron (indicated by an arrow). The amplitude level of the neurons in the RNN converges rapidly to the desired output vector, within a dozen iterations; some solutions are more precise than others due to local minima.
The goal-directed behavior of the working memory is also exemplified in Fig 6a and 6b, in which the neuron dynamics at several time steps are plotted for the input and output vectors, respectively. The super-imposed activity level in black for the input and output vectors corresponds to small variations of the input vector, controlled by the reinforcement signal (dashed line), that induce the convergence of the output dynamics to the desired vector in b (plain line).
We observe that a small amplitude variation in the input dynamics is sufficient to produce a large amplitude variation in the RNN, as the output dynamics in blue gradually converge to the desired goal. This shows that the working memory can be controlled as a dynamical or chaotic system, whose sensitivity to initial conditions can be used to retrieve any spatio-temporal pattern, as it would be for an attractor [ 62 ]. The small amplitude variations added to the input dynamics (dashed line) suffice to induce large output changes in the RNN, with the triggering of the desired spikes (plain black line).
The amplitude levels of the recurrent map dynamics for the four maps are different, yet they all converge to the triggering of the same neuron. We note that the neural activity at the population level remains sub-threshold until the activation of the desired neuron at the end.
Although the network and the learning process are based on spikes, the inter-dependency among the neurons is enough to produce weakly coordinated dynamics, which can have a strong effect. The four trajectories show some similar sub-threshold patterns, although they also exhibit high variability, in the temporal delays as well as in the amplitude levels. The causal chaining in the neural network is not straightforward to observe. We therefore plot the spatio-temporal trajectory within the working memory for ten solutions found; see Fig 8.
We plot the neural trajectory until the goal vector is reached by selecting, at each iteration, the most active neuron. We emphasize that the most active neuron at each iteration is also the most influential in driving the neural activity over the next steps. We can observe from the graph that all trajectories have different lengths, although on average they converge after ten iterations. At the same time, the spatio-temporal trajectories present similar patterns within their dynamics, placed coherently at the beginning, middle and end of the sequence, which we retrieve across different trials.
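The winner-take-all readout described above (picking the most active neuron at each iteration until the goal neuron wins) amounts to an argmax over the population activity at each step; a minimal sketch over a hypothetical activity matrix:

```python
import numpy as np

def winner_trajectory(activity, goal_neuron):
    """Trace the spatio-temporal trajectory by selecting, at each
    iteration, the index of the most active neuron, stopping once the
    goal neuron wins (the end of the retrieved sequence)."""
    path = []
    for step in activity:               # activity: (iterations, neurons)
        winner = int(np.argmax(step))
        path.append(winner)
        if winner == goal_neuron:
            break
    return path

# Toy activity (not from the paper) ramping toward neuron 2 as the goal.
acts = np.array([[0.9, 0.1, 0.0],
                 [0.2, 0.8, 0.1],
                 [0.1, 0.2, 0.7],
                 [0.0, 0.1, 0.9]])
path = winner_trajectory(acts, goal_neuron=2)
```

The resulting `path` is the sequence of winners plotted in trajectories such as Fig 8.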
These patterns come from the learned short-range synaptic rules and represent one chunk, or unit, that is combined with others to constitute a longer chain, up to sixteen elements in our case. We stress that these chunks are dynamically assembled and not learned in a predefined way, although they present one stable shape.
The trajectories are created by picking, at each iteration, the most active neuron. The ten trajectories present a mix of neural chunks common to all trajectories and of novel patterns found solely in some of them. Each trajectory is retrieved dynamically (novelty), although the solutions appear similar (redundancy) due to the constrained dynamics in the RNN. We use the RNN as a working memory for controlling the motion of a three-joint robotic arm in a 2D space, see Fig 9a. We exploit the goal-directed behavior of the recurrent network for sequential planning and for reaching five positions in space.
The three angles of the robotic arm are coupled to the dynamics of three neurons of the recurrent network, with the same properties as those presented in the previous section, and the reinforcement signal is simply the Euclidean distance of the end-effector to the goal. The neural activity between [0; 0. The resulting arm trajectory is presented in Fig 9a, and the output dynamics of the neural network is shown in Fig 9b. The network easily retrieves the different positions in several iterations and updates its dynamics by exploiting the reinforcement signal.
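The coupling can be sketched as planar forward kinematics plus the Euclidean reinforcement signal named in the text. The unit link lengths and the linear scaling of neuron amplitudes to joint angles are assumptions of this sketch, not the paper's parameters.

```python
import numpy as np

def end_effector(angles, lengths=(1.0, 1.0, 1.0)):
    """Planar forward kinematics for a three-joint arm:
    joint angles accumulate along the kinematic chain."""
    x = y = 0.0
    total = 0.0
    for a, l in zip(angles, lengths):
        total += a
        x += l * np.cos(total)
        y += l * np.sin(total)
    return np.array([x, y])

def reinforcement(neuron_amplitudes, goal):
    """Reinforcement signal as in the text: the Euclidean distance of
    the end-effector to the goal. Mapping the three neuron amplitudes
    linearly to joint angles is a (hypothetical) choice of this sketch."""
    angles = np.pi * np.asarray(neuron_amplitudes, dtype=float)
    return float(np.linalg.norm(end_effector(angles) - goal))

goal = np.array([3.0, 0.0])             # arm fully stretched along x
err_at_goal = reinforcement([0.0, 0.0, 0.0], goal)
err_off_goal = reinforcement([0.5, 0.0, 0.0], goal)
```

Driving the three neuron amplitudes with the stochastic search then reduces to minimizing this scalar distance, exactly as for the neural targets.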
In order to better understand the organization of the spiking recurrent network, we analyze its functional properties at the population level and its dynamics at the neuron level. First, we analyze the redundant clusters found within the optimal sequences and the processing time necessary to discover them, resp. Fig 10a and 10b. In Fig 10a, we have counted the occurrence of clusters (neural pairs, triplets, etc.) retrieved over a long period of spontaneous activity, with respect to their size. These clusters are not orthogonal to each other but are combined into longer-range patterns, so that their frequency is inversely proportional to their length; ordinal neural pairs and triplets are proportionally easier to trigger and retrieve than longer clusters.
Meanwhile, the log-curve histogram and cluster coefficients indicate the hierarchical structure of the sequences, which corresponds to scale-free dynamics and small-world properties of the recurrent network [ 63 ]. Thus, the reaction time necessary to retrieve one goal depends on the problem complexity, e.g. In a, the number of clusters found within an optimal sequence with respect to their length. This histogram shows that for any optimal sequence, repetitive clusters are found, which are more often present when they are small than when they are big; this shows some hierarchy within the RNN and the property of scale-free dynamics.
In b, the average processing time necessary for the RNN to retrieve the goal dynamics. Over one hundred trials, it requires on average a dozen iterations until convergence. In Fig 10b, the reaction time depends mostly on the initial conditions of the recurrent network and on the explorative search. For solutions that are difficult to retrieve, the map requires an explorative search above ten iterations. This variance can be compared with the probability density found in real IPL neurons during visual search, which shows similar trends [ 52 ].
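The inverse relation between cluster length and occurrence count noted above can be checked by counting contiguous sub-sequences (n-grams) of winner trajectories; a minimal sketch over a toy sequence:

```python
from collections import Counter

def cluster_counts(sequence, max_len=4):
    """Count occurrences of every contiguous cluster (n-gram) of
    winning neurons up to max_len; in scale-free sequences, short
    clusters dominate and frequency falls with cluster length."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(sequence) - n + 1):
            counts[tuple(sequence[i:i + n])] += 1
    return counts

# Toy winner sequence with a repeated motif (1, 2, 3).
seq = [1, 2, 3, 1, 2, 3, 1, 2, 4]
counts = cluster_counts(seq)
pair_hits = counts[(1, 2)]              # short cluster, frequent
triple_hits = counts[(1, 2, 3)]         # longer cluster, rarer
```

Histogramming `counts` by cluster length reproduces the qualitative shape of Fig 10a: small, repeated chunks are far more common than long ones.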
The histogram presents a log-curve distribution, with two-thirds of the neurons having low or weak amplitude variability (variance) and one third of the neurons having high amplitude variation, see Fig . This result indicates how exploration is done: one third of the neurons are really effective for the neural map to converge to the output dynamics and to generate a spatio-temporal pattern, whereas the rest of the neurons are not.
These graphs attempt to explain how exploration is done. In a and b, during the solution search, two-thirds of the neurons are rapidly placed within the optimal sequence, while one third of the neurons are highly variable and can change position by up to twelve locations within the sequence. One third of the neurons interact with each other, so that weak amplitude variations in a small set of neurons are enough to interact with another subset and control its activity. This feature has been emphasized in nonlinear mixed-selective neurons [ 64 ]. In the previous section, we investigated the control of a recurrent network by a reinforcement signal mechanism, driving its output dynamics to a desired goal as in Fig 1a.
By directly learning the inputs that produce a high-valued reinforcement signal, we can reduce the exploration phase and, at the same time, bootstrap the working memory dynamics toward the goal trajectory. By doing so, we expect the two interacting learning systems to generate longer spatio-temporal sequences of sub-goals. This role is assumed to be played by the Basal Ganglia, which learns simple stimulus-response rules rapidly, and the IPL-like RNN, working at a slower temporal rate [ 43 ]. As an analogy with reinforcement learning, this corresponds to learning the rewarding Q-values associated with an action [ 60 ].
In our framework, the Q-values correspond to the activity level of the AM neurons. This optimization technique can, in our case, be viewed as model-based reinforcement learning [ 65 ]. The bi-directional coupling between the two systems can be done in two ways to generate longer spatio-temporal sequences. In our example, this second memory contains twenty neurons, so that each neuron can trigger a specific spatio-temporal sequence of the RNN. These two ways are explained hereafter.
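The Q-value analogy above can be sketched as a tabular associative memory that caches, for each goal, the best input found so far together with its value, so later searches start from this cached input rather than from scratch. The tabular form and the `update`/`recall` interface are assumptions of this sketch, not the paper's neural implementation.

```python
import numpy as np

class AssociativeMap:
    """Toy BG-like associative memory: for each goal index it caches
    the best input found so far and its reward (the analogue of a
    Q-value), so the working-memory search can be bootstrapped from
    this input instead of exploring from scratch."""
    def __init__(self, n_goals, input_dim):
        self.best_input = np.zeros((n_goals, input_dim))
        self.value = np.full(n_goals, -np.inf)

    def update(self, goal, candidate_input, reward):
        if reward > self.value[goal]:   # keep only improving inputs
            self.value[goal] = reward
            self.best_input[goal] = candidate_input

    def recall(self, goal):
        return self.best_input[goal]

am = AssociativeMap(n_goals=2, input_dim=3)
am.update(0, np.array([1.0, 0.0, 0.0]), reward=0.3)
am.update(0, np.array([0.0, 1.0, 0.0]), reward=0.7)
am.update(0, np.array([0.0, 0.0, 1.0]), reward=0.1)   # worse, ignored
```

Recalling goal 0 then returns the highest-rewarded input, which is what shortens the convergence time across repeated exposures, as shown in the next experiment.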
We propose to re-use the experiment done on arm control in section 3. We present in Fig 12 the averaged learning rate and convergence time when the BG-like associative network is exposed to several presentations of the same goals, respectively in a and b. We can observe that the average time interval required by the associative map to bring the IPL map to convergence decreases with each exposure to the target goal. Sometimes, however, the error level appears unrelated to the number of exposures (as for the blue curve around iteration , for example) because we might be in a local minimum, which makes the error correction slow.
Nonetheless, the recurrent network trains the associative network faster, and the response time to retrieve any sequence becomes quicker, see Fig 12b. Without the BG network, the response would have been slower, similar to the level found at its slowest performance, as during the first exposure. In a, time duration and error rate for the IPL network to reach the assigned goals iteratively (arm posture).
In b, as the associative map learns the recurrent map inputs, the convergence time decreases on average with the number of exposures to the goals. The result is the autonomous recall, in a self-organized fashion, of spatio-temporal patterns by the AM BG-like neurons of the exact RNN ordinal sequence, in our case of thirty steps, so that when one BG neuron is activated, its corresponding sequence is observed; e.g.
Each neuron of the BG-like associative memory learns a stimulus-response pattern that triggers a specific spatio-temporal pattern in the RNN IPL-like working memory. Stable spatio-temporal clusters as long as 28 iterations can be retrieved. Similar to the RNN neurons, the BG-like neurons can also form spatio-temporal sequences to create longer patterns. When the same pair is activated, as in Fig 13b and 13c (red and green traces), nearly the same sequences in the RNN are reproduced.
The activation of these two chunks can be considered as part of one integrated sequence over an interval spanning forty steps. In certain situations, when the two maps have a very stable bi-directional coupling, the coupled systems can generate even longer sequences, above iterations, see Fig . In this figure, the raster plots taken at two different periods of time are almost aligned with each other within the black dashed lines. The associative map has generated a sequence over ten neurons.
Presentation of the amplitude dynamics of the recurrent map for a sequence length of two hundred iterations, between and in a, and between and in b. The amplitude level of the recurrent map is almost identical within the interval between the two black dashed lines. The self-driven activity shown in the previous section can generate long-range episodes, but can we generate even longer ones by forcing the temporal order of AM neuron activation?
This experiment differs from the previous one in that we externally force the BG-like neurons to fire in a specific order. This feature will not be investigated in this paper. At each retrieval of one RNN episodic memory, which can be more or less rapid, the next BG neuron in the sequence is selected when its activity level reaches a threshold value; the temporal interval can therefore fluctuate for each episode.
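The threshold-gated serial selection described above can be sketched as follows; the per-neuron charging rates are hypothetical, chosen only to show how the interval between episodes fluctuates.

```python
def forced_serial_order(order, charge_rates, threshold=1.0, max_steps=1000):
    """Externally force BG-like neurons to fire in a given serial order:
    the next neuron in the sequence is selected only once its activity
    reaches the threshold, so the interval between episodes fluctuates
    with how fast each neuron charges (charge_rates are hypothetical)."""
    firing_times = []
    t = 0
    for neuron in order:
        activity = 0.0
        while activity < threshold and t < max_steps:
            activity += charge_rates[neuron]   # integrate until threshold
            t += 1
        firing_times.append(t)
    return firing_times

# Three neurons charging at different speeds produce fluctuating intervals.
times = forced_serial_order(order=[0, 1, 2],
                            charge_rates={0: 0.5, 1: 0.25, 2: 1.0})
```

With these rates the neurons fire at steps 2, 6 and 7: the serial order is preserved exactly while the episode durations vary, which is the property exploited in Fig 15.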
Fig 15 presents the forced RNN spatio-temporal sequences at two different temporal intervals for the same serial-order activation of the BG neurons. In this figure, the spatio-temporal patterns produced span a very long interval, over several hundred iterations, which is longer than in the self-driven activity presented previously.
The comparison between the two dynamical systems shows an extreme stability in driving the RNN dynamics over long periods of time, even without feedback, see the similarity measure at the top. Fig 15 presents the activity control of the AM neurons at two different time intervals (bottom and top charts). This result shows how the spiking order can be stabilized over long spatio-temporal patterns ( iterations), even within a recurrent map, for generating neural chains proper to the configuration of the RNN. The similarity measure computed above is based on a co-variation measure that detects the relative temporal displacements between the patterns of the two intervals.
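One standard way to implement such a co-variation measure is a normalized cross-correlation between the activity traces of the two intervals, where the lag of the correlation peak gives the relative temporal displacement. The sinusoidal toy trace below is an assumption of this sketch, not data from the paper.

```python
import numpy as np

def similarity_shift(trace_a, trace_b):
    """Covariation-based similarity: normalized cross-correlation of
    two activity traces; the lag of the correlation peak estimates the
    relative temporal displacement between the two intervals."""
    a = (trace_a - trace_a.mean()) / trace_a.std()
    b = (trace_b - trace_b.mean()) / trace_b.std()
    corr = np.correlate(a, b, mode="full") / len(a)
    lag = int(np.argmax(corr)) - (len(b) - 1)
    return corr.max(), lag

t = np.arange(200)
pattern = np.sin(2 * np.pi * t / 50.0)  # toy periodic activity trace
shifted = np.roll(pattern, 5)           # same pattern displaced by 5 steps
peak, lag = similarity_shift(pattern, shifted)
```

A peak value near 1 with a small lag indicates that the two intervals contain the same pattern up to a small temporal displacement, which is how the aligned raster plots of Fig 15 can be quantified.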
Fig 16 presents this result with the probability density of the number of clusters found during the self-driven case (plotted in blue, left axis) and during the controlled case (plotted in green, right axis). Although robust, the working memory in the self-driven case presents more variability and flexibility, which is more advantageous in unexpected situations.
Besides, the external control of the associative map (green line) strongly limits the variability of the RNN dynamics and induces the reproduction of long-range spatio-temporal sequences, as noise is reduced. We counted the number of temporal sequences found over time and computed their probability distribution with respect to their length. In the self-driven condition (blue, left axis) as done in section 3. In the controlled condition (green, right axis) as done in section 3. We propose a framework based on a coupled recurrent spiking neuronal system that performs long sequential planning by controlling the amplitude level of the spiking neurons through reinforcement signals.
The control is weak, so that the propagated reinforcement signals leave the working memory plastic enough to converge to the desired internal states from various trajectories. Used in a robotic simulation, the neural dynamics can drive a three d.o.f. robotic arm. In this respect, our framework embodies some aspects of the free-energy optimization principle proposed by [ 24 ] as an optimization technique, and some aspects of chaos control of neural dynamics, like chaotic itinerancy [ 6 , 62 , 67 ], in which small fed-back perturbations can give rise to large amplitude variations and permit transitions from one memory to another [ 68 , 69 ].
It also shows the importance of slow dynamics that persist for a long period of time, which links to critical slowing, a necessary aspect of free-energy minimisation, and relates usefully to self-organised criticality [ 70 - 74 ]. The free-energy optimization process has been proposed to drive flexible neural dynamics in a seemingly coherent manner following the Bayesian paradigm [ 21 , 29 ]. The functioning of our architecture is also partially similar to recent proposals for sequence generation by [ 9 ] and [ 10 ], to reservoir computing and echo-state methods [ 44 , 45 , 75 ], and to the DrSAE model used for classification, where auto-encoders iterate a recurrent map using gradient descent [ 50 ].
These orders of magnitude are empirical; however, adding more AM neurons should greatly increase the length of the RNN sequences produced and the number of possible combinations. There is evidence suggesting that although single actions can be selected without basal ganglia involvement, chains of actions seem to require the basal ganglia [ 77 ]. The BG and the parietal cortex are found to be complementary for action planning [ 41 , 45 , 78 , 79 ], motor simulation [ 66 ] and thought generation [ 17 ].
The parietal cortex, involved in implementing complex predictive models such as multi-step state-action-state maps (model-based RL), and the BG (model-free RL) form a cooperative system driving online behavior [ 15 , 18 , 80 ]. The numerical limit to subsuming new memory maps, one layer on top of another, is not clear in our model, but a third complementary memory, the PFC, could play this role by learning and directing the BG sequences at a higher level, see Fig . This architecture can be replicated hierarchically in INFERNO, with many maps inter-connected through continuous feedback control with top-down and bottom-up dynamics [ 21 , 29 , 81 ].
In a, INFERNO generates, selects and stores a set of rules to dynamically assemble a neuronal sequence from a reservoir of dynamics toward a desired goal, based on free-energy minimization. It has some similarities with a Turing machine, which has a table of instructions and Write and Read heads to generate a code on an infinite tape. We super-impose with different colors the clusters of four optimal trajectories found in Fig 8.
In our model, we have limited the function of the PFC to providing one goal at a time, so that AM sequences can be formed dynamically in a self-organized fashion along with the RNN, see section 3. Learning this temporal sequence in a top layer would permit generating an even longer plan execution, as done in section 3. While the IPL working memory provides, stores, and manipulates representations, the basal ganglia model maps current states to courses of action [ 83 ]. The BG can serve for the selection of complex, sequenced actions at the cortical map level [ 13 ]. Thus, it can be interpreted as a repertoire of if-then rules, or a set of stimulus-response associations, used to select appropriate cortical chains.
This kind of cortical architecture has been emphasized as possibly supporting multi-step computation; i.e. Making an analogy with Turing machines, we can see the AM as an instruction table, its operations as the inputs injected into the RNN, the RNN as the infinite tape, and their respective neural activities as symbols and states, see Fig 17a. These meta-heuristics are optimization techniques that let the recurrent spiking neural network converge to specific trajectories with some flexibility, see the schema in Fig 17, which is directly taken from the trajectories found in Fig 8.
On the one hand, all the trajectories derive from the spatio-temporal primitives learned by the RNN. On the other hand, they are assembled flexibly to reach one goal. Therefore, for each specific goal, the trajectories found possess roughly the same structure and prototype (global coherence), see Fig 17a, while the structure within each sub-cluster is different (internal variability), see Fig 17b.
This shows the capabilities of the RNN to produce hierarchical plans and tree structures, which are found important for human language and cognition [ 1 , 87 , 88 ]. Our optimization technique is based on the control of the sub-threshold activity of the neurons. We propose that this mechanism can be one candidate for flexible neural coordination, along with phase synchrony and spike timing-dependent plasticity.
Iterative free-energy optimization for recurrent neural networks (INFERNO)
For instance, sub-threshold activity optimization is similar to the phenomenon known as gain-modulation [ 89 , 90 ]. This mechanism describes how the activity level of gain-field neurons can be modulated by the amplitude level of several neurons sensitive to different variables, which makes it interesting for neural control [ 91 ] and context switching [ 11 ].
Gain-modulation is found to be important for neural processing in the parieto-motor cortices [ 92 ] and may provide a hint on how generative causal chains are formed in a neural population for planning in the PFC, as proposed by [ 1 ]. Gain-modulation has recently been proposed to control the amplitude level of a neural population (its local field potential).
It conveys contextual information in a complex form of propagated neural activity; a mechanism coined as nonlinear mixed selectivity [ 64 ]. Furthermore, Botvinick and Watanabe proposed a prefrontal model based on gain-field neurons showing their ability to recall serial order information [ 90 ].