MARBLE: interpretable representations of neural population dynamics using geometric deep learning


MARBLE is a representation learning framework that works by decomposing a set of vector fields, representing possibly different dynamical systems or the same dynamical system under different latent parameters, into local flow fields (LFFs) and then learning a similarity-preserving mapping from LFFs to a shared latent space. In latent space, the dynamical systems can be compared even if they were originally measured differently.

Mathematical setup

As input, MARBLE takes a set of discrete vector fields Fc = (f1(c), …, fn(c)) supported over the point clouds Xc = (x1(c), …, xn(c)), which describe a set of smooth, compact m-dimensional submanifolds \({{\mathcal{M}}}_{c}\) of the state space \({{\mathbb{R}}}^{d}\). Here, c indexes an experimental condition or a dynamical system, and \({{\bf{f}}}_{i}(c)\in {{\mathbb{R}}}^{d}\) is a vector signal anchored at a point \({{\bf{x}}}_{i}(c)\in {{\mathbb{R}}}^{d}\) and represented in absolute (world) coordinates in neural space or some d-dimensional representation space that preserves the continuity of the data. Given time series x(t; c), the vectors fi(c) can be obtained, for example, by taking first-order finite differences fi(c) = x(t + 1; c) − x(t; c). Given this input, MARBLE jointly represents the vector fields as empirical distributions \(P({{\bf{Z}}}_{c})=\mathop{\sum }\nolimits_{i = 1}^{n}\delta ({{\bf{z}}}_{i}(c))\) of latent vectors Zc = (z1(c), …, zn(c)). The mapping Fc ↦ Zc is point-by-point and local in the sense that \(\{{{\bf{f}}}_{j};\,j\in i\cup {\mathcal{N}}(i)\}\mapsto {{\bf{z}}}_{i}\), where \({\mathcal{N}}(i)\) denotes the neighborhood of i. This locality means that it suffices to describe MARBLE applied to a single vector field. We therefore drop the subscript c below for concise notation. The generalization of the method to the joint latent representation of multiple vector fields is immediate.

We now introduce an unsupervised geometric deep learning architecture for performing this embedding. The pseudocode for the MARBLE algorithm can be found in Supplementary Note 1.

Data subsampling

We first subsample the data to ensure that LFFs are not overrepresented in any given region of the vector field due to sampling bias. We use farthest point sampling, a well-established technique for processing point clouds56, which controls the spacing of the points relative to the diameter of the manifold, \(\mathop{\max }\limits_{ij}(| | {{\bf{x}}}_{i}-{{\bf{x}}}_{j}| {| }_{2}) < \alpha \,\,\text{diam}\,({\mathcal{M}})\), where \(\alpha \in [0,1]\) is a parameter setting the spacing, with α = 0 corresponding to no subsampling.
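As a rough illustration of this step, the sketch below implements greedy farthest point sampling; the stopping rule is one plausible reading of the spacing criterion above, and the O(n²) distance computation and function name are simplifications for clarity rather than the exact MARBLE code.

```python
import numpy as np

def farthest_point_sampling(X, alpha):
    """Greedy farthest point sampling over an n x d point cloud X.
    Stops when every remaining point lies within alpha * diam of the selected set,
    so alpha = 0 keeps (almost) all points and larger alpha coarsens the sampling."""
    n = len(X)
    diam = np.max(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1))  # O(n^2); fine for a sketch
    selected = [0]                                     # start from an arbitrary point
    dist = np.linalg.norm(X - X[0], axis=1)            # distance of each point to the selected set
    while len(selected) < n:
        idx = int(np.argmax(dist))                     # farthest remaining point
        if dist[idx] <= alpha * diam:                  # spacing tolerance reached
            break
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(X - X[idx], axis=1))
    return np.array(selected)
```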

Approximating the manifold by a proximity graph

We define locality on \({\mathcal{M}}\) by fitting a proximity graph to the point cloud X. We use the continuous k-nearest neighbor (ck-NN) algorithm57, which, contrary to the classical k-NN graph algorithm, can be interpreted as a local kernel density estimate and accounts for sampling density variations over \({\mathcal{M}}\). The ck-NN algorithm connects i and j whenever \(| | {{\bf{x}}}_{i}-{{\bf{x}}}_{j}| {| }_{2}^{2} < \delta | | {{\bf{x}}}_{i}-{{\bf{x}}}_{u}| {| }_{2}\,| | {{\bf{x}}}_{j}-{{\bf{x}}}_{v}| {| }_{2}\), where u, v are the k-th nearest neighbors of i, j, respectively, and \(| | \cdot | {| }_{2}\) is the Euclidean norm. The scaling parameter δ can be used to control the number of nearest neighbors and thus the size of the neighborhood.

This proximity graph endows \({\mathcal{M}}\) with a geodesic structure: for any \(i,j\in {\mathcal{M}}\), there is a shortest path with distance d(i, j). We can then define the LFF at i as the p-hop geodesic neighborhood \({\mathcal{N}}(i,p)\).
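A minimal sketch of the ck-NN rule above, assuming a Euclidean point cloud X; the dense pairwise-distance computation and the scipy/numpy helpers are illustrative choices, not necessarily those of the MARBLE implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial import cKDTree

def cknn_graph(X, k=15, delta=1.0):
    """Continuous k-NN: connect i and j whenever
    ||x_i - x_j||^2 < delta * d_k(i) * d_k(j),
    where d_k(i) is the distance from i to its k-th nearest neighbor."""
    tree = cKDTree(X)
    dk = tree.query(X, k=k + 1)[0][:, -1]              # k-th NN distance (column 0 is the point itself)
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # dense squared distances (sketch only)
    A = (D2 < delta * np.outer(dk, dk)).astype(float)
    np.fill_diagonal(A, 0.0)                           # no self-loops
    return sp.csr_matrix(A)                            # adjacency of the proximity graph
```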

Parametrizing the tangent spaces

To define trainable convolution filters over \({\mathcal{M}}\), we define tangent spaces \({{\mathcal{T}}}_{i}{\mathcal{M}}\) at each point i over the manifold that linearly approximate \({\mathcal{M}}\) within a neighborhood. Specifically, we assume that the tangent space at a point i, \({{\mathcal{T}}}_{i}{\mathcal{M}}\), is spanned by the edge vectors \({{\bf{e}}}_{ij}\in {{\mathbb{R}}}^{d}\) pointing from i to K nodes j in its neighborhood on the proximity graph. We pick the K closest nodes to i on the proximity graph, with K > deg(i), where K is a hyperparameter. Larger K increases the overlap between nearby tangent spaces, and we find that \(K=1.5| {\mathcal{N}}(i,1)|\) is often a good compromise between locality and robustness to noise of the tangent space approximation. The left singular vectors \({{\bf{t}}}_{i}^{(\cdot )}\in {{\mathbb{R}}}^{d}\) corresponding to the m largest singular values of the matrix formed by column-stacking eij yield the orthonormal basis

$${{\mathbb{T}}}_{i}\in {{\mathbb{R}}}^{d\times m}=\left({{\bf{t}}}_{i}^{(1)},\ldots {{\bf{t}}}_{i}^{(m)}\right)$$

(1)

spanning \({{\mathcal{T}}}_{i}{\mathcal{M}}\). As a result, \({{\mathbb{T}}}_{i}^{T}{{\bf{f}}}_{i}\) acts as a projection of the signal onto the tangent space in the \({\ell }_{2}\) sense. We perform these computations using a modified parallel transport unfolding package58. We illustrate the computed frames on a spherical manifold (Supplementary Fig. 1a–c).
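The frame computation of equation (1) can be sketched as follows; the argument names (X, neighbors_i, m) are assumptions for illustration rather than the exact MARBLE interface.

```python
import numpy as np

def tangent_frame(X, neighbors_i, i, m):
    """Orthonormal basis of the tangent space at point i (equation (1)):
    left singular vectors of the column-stacked edge vectors e_ij
    pointing from i to its K nearest graph neighbors."""
    E = (X[neighbors_i] - X[i]).T                  # d x K matrix of edge vectors
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    return U[:, :m]                                # d x m frame T_i spanning T_i M
```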

Connections between tangent spaces

Having defined the local frames, we next define the parallel transport map \({{\mathcal{P}}}_{j\to i}\) aligning the local frame at j to that at i, which is necessary to define convolution operations in a common space (Supplementary Fig. 1d). While parallel transport is generally path dependent, we assume that adjacent nodes i, j are close enough to consider the unique smallest rotation, known as the Levi-Civita connection. Thus, for adjacent nodes, \({{\mathcal{P}}}_{j\to i}\) can be represented by the matrix Oji, the orthogonal transformation (rotation and reflection)

$${{\bf{O}}}_{ji}=\arg \mathop{\min }\limits_{{\bf{O}}\in O(m)}| | {{\mathbb{T}}}_{i}-{{\mathbb{T}}}_{j}{\bf{O}}| {| }_{F},$$

(2)

where \(| | \cdot | {| }_{F}\) is the Frobenius norm. The unique solution (up to a change of orientation) to equation (2) is found by the Kabsch algorithm59. See Supplementary Note 2 for further details.
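Equation (2) is an orthogonal Procrustes problem, so a sketch of the connection computation can lean on scipy; the actual implementation may differ.

```python
from scipy.linalg import orthogonal_procrustes

def connection(Ti, Tj):
    """O_ji of equation (2): the orthogonal matrix that best aligns the frame at j
    to the frame at i in the Frobenius sense (Kabsch / orthogonal Procrustes)."""
    O, _ = orthogonal_procrustes(Tj, Ti)   # minimizes ||Tj @ O - Ti||_F over O(m)
    return O
```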

Vector diffusion

We use a vector diffusion layer to denoise the vector field (Fig. 1c). Vector diffusion generalizes scalar (heat) diffusion and can be expressed as a kernel associated with the equation60

$${\rm{vec}}({\bf{F}}(\tau ))={e}^{-\tau {\mathcal{L}}}{\rm{vec}}({\bf{F}}).$$

(3)

Here, \(\,\text{vec}\,({\bf{F}})\in {{\mathbb{R}}}^{nm\times 1}\) is the row-wise concatenation of the vector-valued signals, τ is a learnable parameter that controls the scale of the LFFs and \({\mathcal{L}}\) is the random-walk normalized connection Laplacian, defined as a block matrix whose nonzero blocks are given by

$${\mathcal{L}}(i,j)=\left\{\begin{array}{ll}{{\bf{I}}}_{m\times m}\quad &{\rm{for}}\,i=j\\ -\,\text{deg}\,{(i)}^{-1}{{\bf{O}}}_{ij}\quad &{\rm{for}}\,j\in {\mathcal{N}}(i,1).\end{array}\right.$$

(4)

See ref. 61 for further details on vector diffusion. The advantage of vector diffusion over scalar (heat) diffusion is that it uses a notion of smoothness over the manifold defined by parallel transport and thus preserves the fixed point structure. The learnable parameter τ balances the expressivity of the latent representations with the smoothness of the vector field.
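As an illustration, the following sketch assembles the connection Laplacian of equation (4) as a dense matrix and applies equation (3) directly with a matrix exponential; a practical implementation would use sparse matrices and approximations, so this is only a readable reference, with argument names assumed for illustration.

```python
import numpy as np
from scipy.linalg import expm

def vector_diffusion(F, O, neighbors, tau):
    """Equations (3)-(4): diffuse the stacked vector field vec(F) with the
    random-walk normalized connection Laplacian. F is n x m (local coordinates),
    O[(i, j)] holds the m x m connection matrices O_ij, neighbors[i] lists N(i, 1)."""
    n, m = F.shape
    L = np.zeros((n * m, n * m))
    for i in range(n):
        L[i*m:(i+1)*m, i*m:(i+1)*m] = np.eye(m)                           # diagonal blocks I_m
        for j in neighbors[i]:
            L[i*m:(i+1)*m, j*m:(j+1)*m] = -O[(i, j)] / len(neighbors[i])  # -deg(i)^-1 O_ij
    vecF = F.reshape(-1)                                                  # row-wise concatenation
    return (expm(-tau * L) @ vecF).reshape(n, m)                          # e^{-tau L} vec(F)
```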

Approximating local flow fields

We now define convolution kernels on \({\mathcal{M}}\) that act on the vector field to represent its variation within LFFs. We first project the vector signal onto the manifold, \({{\bf{f}}}_{i}^{{\prime} }={{\mathbb{T}}}_{i}^{T}{{\bf{f}}}_{i}\). This reduces the dimension of fi from d to m without loss of information because fi already lies in the tangent space. We drop the prime in the sequel, with the understanding that all vectors are expressed in local coordinates. In this local frame, the best polynomial approximation of the vector field around i is given by the Taylor-series expansion of each component fi,l of fi

$${f}_{j,l}\approx {f}_{i,l}+\nabla {f}_{i,l}({{\bf{x}}}_{j}-{{\bf{x}}}_{i})+\frac{1}{2}{({{\bf{x}}}_{j}-{{\bf{x}}}_{i})}^{T}{\nabla }^{2}{f}_{i,l}({{\bf{x}}}_{j}-{{\bf{x}}}_{i})+\ldots .$$

(5)

We construct gradient filters to numerically approximate the gradient operators of increasing order in the Taylor expansion (see Supplementary Note 3 for details). In brief, we implement the first-order gradient operator as a set of m directional derivative filters \(\{{{\mathcal{D}}}^{(q)}\}\) acting along unit directions \(\{{{\bf{t}}}_{i}^{(q)}\}\) of the local coordinate frame,

$$\nabla {f}_{i,l}\approx {\left({{\mathcal{D}}}^{(1)}(\,{f}_{i,l}),\ldots ,{{\mathcal{D}}}^{(m)}(\,{f}_{i,l})\right)}^{T}.$$

(6)

The directional derivative \({{\mathcal{D}}}^{(q)}(\,{f}_{i,l})\) is the l-th component of

$${{\mathcal{D}}}^{(q)}({{\bf{f}}}_{i})=\mathop{\sum }\limits_{j=1}^{n}{{\mathcal{K}}}_{j}^{(i,q)}{{\mathcal{P}}}_{j\to i}({{\bf{f}}}_{j}),$$

(7)

where \({{\mathcal{P}}}_{j\to i}={{\bf{O}}}_{ij}\) is the parallel transport operator that takes the vector fj from the adjacent frame j to a common frame at i. \({{\mathcal{K}}}^{(i,q)}\in {{\mathbb{R}}}^{n\times n}\) is a directional derivative filter39 expressed in local coordinates at i and acting along \({{\bf{t}}}_{i}^{(q)}\). See Supplementary Note 3 for details on the construction of the directional derivative filter. As a result of the parallel transport, the value of equation (7) is independent of the local curvature of the manifold.

Following this construction, the p-th order gradient operators can be defined by the iterated application of equation (6), which aggregates information in the p-hop neighborhood of points. Although increasing the order of the differential operators increases the expressiveness of the network (Supplementary Fig. 3), second-order filters (p = 2) were sufficient for the application considered in this paper.
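To make equation (7) concrete, the sketch below computes one directional derivative at node i with a simple finite-difference kernel; the exact kernel weights \({{\mathcal{K}}}_{j}^{(i,q)}\) used by MARBLE are constructed as in Supplementary Note 3 and may differ from this illustrative choice.

```python
import numpy as np

def directional_derivative(i, q, X, F_local, T, O, neighbors):
    """Hedged sketch of equation (7): derivative of the vector field at node i along
    the frame direction t_i^(q). The weights below are a plain finite-difference
    choice, not necessarily the filter of the MARBLE implementation."""
    ti_q = T[i][:, q]                                    # unit direction in world coordinates
    grad = np.zeros_like(F_local[i])
    for j in neighbors[i]:
        e_ij = X[j] - X[i]                               # edge vector from i to j
        w = np.dot(e_ij, ti_q) / (np.dot(e_ij, e_ij) * len(neighbors[i]))  # kernel weight K_j^(i,q)
        fj_at_i = O[(i, j)] @ F_local[j]                 # parallel transport P_{j->i}(f_j)
        grad += w * (fj_at_i - F_local[i])               # finite difference along t_i^(q)
    return grad                                          # m-vector whose l-th entry is D^(q)(f_{i,l})
```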

The expansion in equation (5) suggests augmenting the vectors fi by the derivatives (equation (6)), to obtain a matrix

$${{\bf{f}}}_{i}\mapsto {{\bf{f}}}_{i}^{\,{\mathcal{D}}}=\left({{\bf{f}}}_{i},\nabla {f}_{i,1},\ldots ,\nabla {f}_{i,m},\nabla {(\nabla {f}_{i,1})}_{1},\ldots ,\nabla {(\nabla {f}_{i,m})}_{m}\right),$$

(8)

of dimensions m × c whose columns are the gradients of the signal components up to order p, giving a total of \(c=(1-{m}^{\,p+1})/(1-m)\) vectorial channels.
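For example, with m = 2 and p = 2, \({{\bf{f}}}_{i}^{\,{\mathcal{D}}}\) has c = 1 + 2 + 4 = 7 columns: the vector itself, its two first-order and its four second-order directional derivatives.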

Inner product for embedding invariance

Deformations on the manifold have the effect of introducing rotations into the LFFs. In embedding-agnostic mode, we can achieve invariance to these deformations by making the learned features rotation-invariant. We do so by first transforming the m × c matrix \({{\bf{f}}}_{i}^{{\mathcal{D}}}\) to a 1 × c vector as

$${{\bf{f}}}_{i}^{\,{\mathcal{D}}}\mapsto {{\bf{f}}}_{i}^{\,{\mathsf{ip}}}=\left({{\mathcal{E}}}^{(1)}({{\bf{f}}}_{i}^{\,{\mathcal{D}}}),\ldots ,{{\mathcal{E}}}^{(c)}({{\bf{f}}}_{i}^{\,{\mathcal{D}}})\right).$$

(9)

Each entry is obtained by taking, for channel r, the inner product against all other channels, weighted by a dense learnable matrix \({{\bf{A}}}^{(r)}\in {{\mathbb{R}}}^{m\times m}\), and summing,

$${{\mathcal{E}}}^{(r)}({{\bf{f}}}_{i}^{\,{\mathcal{D}}})={{\mathcal{E}}}^{(r)}\left({{\bf{f}}}_{i}^{\,{\mathcal{D}}};\,{{\bf{A}}}^{(r)}\right):= \mathop{\sum }\limits_{s=1}^{c}\left\langle {{\bf{f}}}_{i}^{\,{\mathcal{D}}}(\cdot ,r),{{\bf{A}}}^{(r)}{{\bf{f}}}_{i}^{\,{\mathcal{D}}}(\cdot ,s)\right\rangle ,$$

(10)

for r = 1, …, c (Fig. 1f). Taking inner products is valid because the columns of \({{\bf{f}}}_{i}^{\,{\mathcal{D}}}\) all live in the tangent space at i. Intuitively, equation (10) achieves coordinate independence by learning rotation and scaling relationships between pairs of channels.
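A literal (unoptimized) sketch of equation (10), with A assumed to be an array of c learnable m × m matrices; in practice this is implemented as a vectorized, trainable layer.

```python
import numpy as np

def inner_product_features(fD, A):
    """Equation (10): rotation-invariant readout of the m x c feature matrix f_i^D.
    fD has shape (m, c); A has shape (c, m, m) and holds the learnable matrices A^(r)."""
    m, c = fD.shape
    fip = np.zeros(c)
    for r in range(c):
        for s in range(c):
            fip[r] += fD[:, r] @ (A[r] @ fD[:, s])    # <f^D(., r), A^(r) f^D(., s)>
    return fip                                         # 1 x c vector f_i^ip
```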

Latent space embedding with a multilayer perceptron

To embed each local feature, \({{\bf{f}}}_{i}^{\,{\mathsf{ip}}}\) or \({{\bf{f}}}_{i}^{\,{\mathcal{D}}}\) (depending on whether inner product features are used, equation (9)), we use a multilayer perceptron (Fig. 1g)

$${{\bf{z}}}_{i}={\rm{MLP}}\left({{\bf{f}}}_{i}^{\,{\mathsf{ip}}};\,\omega\right),$$

(11)

where ω are trainable weights. The multilayer perceptron is composed of L linear (fully connected) layers interspersed by rectified linear unit (ReLU) nonlinearities. We used L = 2 with a sufficiently high output dimension to encode the variables of interest. The parameters were initialized using the Kaiming method62.

Loss function

Unsupervised training of the network is possible due to the continuity in the vector field over \({\mathcal{M}}\), which causes nearby LFFs to be more similar than distant ones. We implement this via negative sampling40, which uses random walks sampled at each node to embed neighboring points on the manifold close together while pushing points sampled uniformly at random far away. We use the following unsupervised loss function40

$${\mathcal{J}}({\bf{Z}})=-\log\left(\sigma\left({{\bf{z}}}_{i}^{T}{{\bf{z}}}_{j}\right)\right)-Q{{\mathbb{E}}}_{k \sim U(n)}\log\left(\sigma\left(-{{\bf{z}}}_{i}^{T}{{\bf{z}}}_{k}\right)\right),$$

(12)

where \(\sigma (x)={(1+{e}^{-x})}^{-1}\) is the sigmoid function and U(n) is the uniform distribution over the n nodes. To compute this function, we sample one-step random walks from every node i to obtain ‘positive’ node samples j, whose LFFs we expect to be similar to that at node i. The first term in equation (12) seeks to embed these nodes close together. At the same time, we sample nodes uniformly at random to obtain ‘negative’ node samples k, whose LFFs are likely different from that at node i. The second term in equation (12) seeks to embed these nodes far away. We set Q = 1, which weights the contribution of the negative samples.

We optimize the loss in equation (12) by stochastic gradient descent. For training, the nodes from all manifolds were randomly split into training (80%), validation (10%) and test (10%) sets. The optimizer was run until convergence of the validation loss, and the final results were evaluated on the test set with the optimized parameters.
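A compact PyTorch sketch of the loss in equation (12); the positive pairs from one-step random walks are assumed to have been sampled beforehand, and names such as pos_pairs are illustrative rather than the MARBLE API.

```python
import torch
import torch.nn.functional as F

def marble_loss(z, pos_pairs, Q=1):
    """Equation (12): negative-sampling loss. z is an n x out_dim tensor of latent
    vectors; pos_pairs is a (num_pairs, 2) tensor of (i, j) indices obtained from
    one-step random walks; negatives k are drawn uniformly at random."""
    i, j = pos_pairs[:, 0], pos_pairs[:, 1]
    k = torch.randint(0, z.shape[0], (len(i),))            # 'negative' samples ~ U(n)
    pos = F.logsigmoid((z[i] * z[j]).sum(-1))              # log sigma(z_i^T z_j)
    neg = F.logsigmoid(-(z[i] * z[k]).sum(-1))             # log sigma(-z_i^T z_k)
    return -(pos + Q * neg).mean()
```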

Distance between latent representations

To test whether shifts in the statistical representation of the dynamical system can predict global phenomena in the dynamics, we define a similarity metric between pairs of vector fields F1, F2 with respect to their corresponding latent vectors \({{\bf{Z}}}_{1}=({{\bf{z}}}_{1,1},\ldots ,{{\bf{z}}}_{{n}_{1},1})\) and \({{\bf{Z}}}_{2}=({{\bf{z}}}_{1,2},\ldots ,{{\bf{z}}}_{{n}_{2},2})\). We use the optimal transport distance between the empirical distributions \({P}_{1}=\mathop{\sum }\nolimits_{i}^{{n}_{1}}\delta ({{\bf{z}}}_{i,1}),{P}_{2}=\mathop{\sum }\nolimits_{i}^{{n}_{2}}\delta ({{\bf{z}}}_{i,2})\)

$$d({P}_{1},{P}_{2})=\mathop{\min }\limits_{\gamma }\sum _{uv}{\gamma }_{uv}| | {{\bf{z}}}_{u,1}-{{\bf{z}}}_{v,2}| {| }_{2}^{2},$$

(13)

where γ is the transport plan, a joint probability distribution subject to the marginal constraints ∑uγuv = P2(v) and ∑vγuv = P1(u), and \(| | \cdot | {| }_{2}\) is the Euclidean norm.
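Equation (13) can be evaluated, for example, with the POT library; the uniform sample weights and the exact solver below are illustrative assumptions, not necessarily the solver used in MARBLE.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

def latent_distance(Z1, Z2):
    """Equation (13): optimal transport distance between the empirical latent
    distributions of two vector fields, with uniform weights on the samples."""
    a = np.full(len(Z1), 1.0 / len(Z1))            # empirical distribution P_1
    b = np.full(len(Z2), 1.0 / len(Z2))            # empirical distribution P_2
    M = ot.dist(Z1, Z2, metric='sqeuclidean')      # cost matrix ||z_u,1 - z_v,2||_2^2
    return ot.emd2(a, b, M)                        # exact optimal transport cost
```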

Further details on the implementation of case studies

Below we detail the implementation of the case studies. See Supplementary Table 3 for the training hyperparameters. In each case, we repeated training five times and confirmed that the results were reproducible.

Van der Pol

We used the following equations to simulate the Van der Pol system:

$$\begin{array}{l}\dot{x}=y\\ \dot{y}=\mu (1-{x}^{2})y-x,\end{array}$$

(14)

parametrized by μ. For μ = 0, the system reduces to the harmonic oscillator; for μ < 0, the system is unstable; and for μ > 0, the system is stable and converges to a limit cycle. In addition, we map this two-dimensional system onto a paraboloid with the map

$$\begin{array}{rcl}(x,y)&\mapsto &(x,y,z),\quad z={\rm{parab}}(x,y)\\ (\dot{x},\dot{y})&\mapsto &(\dot{x},\dot{y},\dot{z}),\quad \dot{z}={\rm{parab}}(x+\dot{x},y+\dot{y})-{\rm{parab}}(x,y),\end{array}$$

where \({\rm{parab}}(x,y)=-{(\alpha x)}^{2}-{(\alpha y)}^{2}\).
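A self-contained sketch of this setup (the initial condition, integration horizon and step size are illustrative assumptions):

```python
import numpy as np
from scipy.integrate import solve_ivp

def vdp_on_paraboloid(mu, alpha, x0=(2.0, 0.0), T=20.0, dt=0.05):
    """Simulate the Van der Pol system (equation (14)) and lift states and
    first-difference vectors onto the paraboloid z = -(alpha*x)^2 - (alpha*y)^2."""
    rhs = lambda t, s: [s[1], mu * (1 - s[0] ** 2) * s[1] - s[0]]
    t = np.arange(0, T, dt)
    sol = solve_ivp(rhs, (0, T), x0, t_eval=t).y.T            # trajectory, shape (len(t), 2)
    parab = lambda x, y: -(alpha * x) ** 2 - (alpha * y) ** 2
    X = np.column_stack([sol[:, 0], sol[:, 1], parab(sol[:, 0], sol[:, 1])])
    F = np.diff(X, axis=0)                                    # vectors f_i = x(t + 1) - x(t)
    return X[:-1], F                                          # anchor points and vector field
```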

We sought to distinguish on-manifold dynamical variation due to μ while being agnostic to geometric variations due to α. As conditions, we increased μ from −1, which first caused a continuous deformation in the limit cycle from asymmetric (corresponding to slow–fast dynamics) to circular and then an abrupt change in crossing the Hopf bifurcation at zero (Fig. 1d).

We trained MARBLE in both embedding-aware and -agnostic modes by forming distinct manifolds from the flow field samples at different values of μ and manifold curvature. Both the curvature and the sampling of the vector field differed across manifolds.

Low-rank RNNs

We consider low-rank RNNs composed of N = 500 rate units in which the activation of the i-th unit is given by

$$\tau \frac{d{x}_{i}}{dt}=-{x}_{i}+\mathop{\sum }\limits_{j=1}^{N}{J}_{ij}\phi ({x}_{j})+{\tilde{u}}_{i}(t)+{\eta }_{i}(t),\quad {x}_{i}(0)=0,$$

(15)

where τ = 100 ms is a time constant, \(\phi ({x}_{i})=\tanh ({x}_{i})\) is the firing rate, Jij is the rank-R connectivity matrix, \({\tilde{u}}_{i}(t)\) is an input stimulus and ηi(t) is a white noise process with zero mean and s.d. of 3 × 10−2. The connectivity matrix can be expressed as

$${\bf{J}}=\frac{1}{N}\mathop{\sum }\limits_{r=1}^{R}{{\bf{m}}}_{r}{{\bf{n}}}_{r}^{T},$$

(16)

for vector pairs (mr, nr). For the delayed-match-to-sample task, the input is of the form

$${\tilde{u}}_{i}(t)={w}_{1i}^{{\mathsf{in}}}{u}_{1}(t)+{w}_{2i}^{{\mathsf{in}}}{u}_{2}(t),$$

(17)

where w1i, w2i are coefficients controlling the weight of inputs u1, u2 into node i. Finally, the network firing rates are read out to the output as

$$o(t)=\mathop{\sum }\limits_{i=0}^{N}{w}_{i}^{{\mathsf{out}}}\phi ({x}_{i}).$$

(18)
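For illustration, a forward Euler simulation of equations (15)–(18) might look as follows; the argument shapes, the input function u and the per-step noise treatment are simplifying assumptions rather than the training setup of ref. 47.

```python
import numpy as np

def simulate_lowrank_rnn(m_vecs, n_vecs, w_in, u, w_out,
                         steps=200, dt=1.0, tau=100.0, sigma=3e-2):
    """Euler simulation of equations (15)-(18) for a rank-R network.
    m_vecs, n_vecs: N x R connectivity vectors; w_in: N x 2 input weights;
    u: function t -> (u1, u2); w_out: N-vector of readout weights."""
    N = m_vecs.shape[0]
    J = (m_vecs @ n_vecs.T) / N                              # equation (16)
    x = np.zeros(N)                                          # x_i(0) = 0
    rates, outputs = [], []
    for step in range(steps):
        u1, u2 = u(step * dt)
        inp = w_in[:, 0] * u1 + w_in[:, 1] * u2              # equation (17)
        noise = sigma * np.random.randn(N)                   # simplified per-step noise
        dx = (-x + J @ np.tanh(x) + inp + noise) / tau       # equation (15)
        x = x + dt * dx
        rates.append(np.tanh(x))
        outputs.append(w_out @ np.tanh(x))                   # equation (18)
    return np.array(rates), np.array(outputs)
```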

To train the networks, we followed ref. 47. The experiments consisted of five epochs: a fixation period of 100–500 ms chosen uniformly at random, a 500-ms stimulus period, a delay period of 500–3,000 ms chosen uniformly at random, a second 500-ms stimulus period and a 1,000-ms decision period. During training, the networks were subjected to two inputs, whose magnitude (the gain) was positive during the stimulus periods and zero otherwise. We used the following loss function:

$${\mathcal{L}}=| o(T)-\hat{o}(T)| ,$$

(19)

where T is the length of the trial and \(\hat{o}(T)=1\) when both stimuli were present and −1 otherwise. Coefficient vectors were initially drawn from a zero mean, unit s.d. Gaussian and then optimized. For training, we used the ADAM optimizer63 with hyperparameters shown in Supplementary Tables 2 and 3.

See the main text for details on how MARBLE networks were trained.

Macaque center-out arm-reaching

We used single-neuron spike train data as published in ref. 30 recorded using linear multielectrode arrays (V-Probe, 24-channel linear probes) from rhesus macaque motor (M1) and dorsal premotor cortices. See ref. 30 for further details. In brief, each trial began with the hand at the center. After a variable delay, one target, 10 cm from the center position, was highlighted, indicating the GO cue. We analyzed the 700-ms period after the GO cue, consisting of a delay followed by the reach. A total of 44 consecutive experimental sessions with a variable number of trials were considered.

We extracted the spike trains using the neo package in Python and converted them into rates using a Gaussian kernel with a s.d. of 100 ms. We subsampled the rates at 20-ms intervals using the elephant package64 to match the sampling frequency of the decoded kinematics in ref. 30. Finally, we used PCA to reduce the dimension from 24 channels to five. In Supplementary Fig. 4, we analyze the sensitivity to preprocessing hyperparameters, showing that our results remain stable for a broad range of Gaussian kernel scales. We also note that decoding accuracy marginally increased with seven and ten principal components, at the cost of slower training.

We trained MARBLE in the embedding-aware mode for each session, treating movement conditions as individual manifolds, which we embedded into a shared latent space.

We benchmarked MARBLE against LFADS, CEBRA (Fig. 4), TDR (Extended Data Fig. 8), PCA, UMAP and t-SNE (Extended Data Fig. 9). For LFADS, we took the trained models directly from the authors. For CEBRA, we used the reach directions as labels and trained a supervised model until convergence. We obtained the best results with an initial learning rate of 0.01, the Euclidean norm as the metric, 10,000 iterations and a fixed temperature of 1. For TDR, we followed the procedure in ref. 22. We represented the condition-averaged firing rate of neuron i at time t as

$${r}_{i,c}(t)={\beta }_{i,0}(t)+{\beta }_{i,x}(t){c}_{x}+{\beta }_{i,y}(t){c}_{y}$$

(20)

where (cx, cy) denotes the regressors (the final direction of the reaches in physical space) and β are the corresponding time-dependent coefficients to be determined. For the seven reach directions, the regressors (cx, cy) take the following values: ‘DownLeft’ (−1, −1), ‘Left’ (−1, 0), ‘UpLeft’ (−1, 1), ‘Up’ (0, 1), ‘UpRight’ (1, 1), ‘Right’ (1, 0) and ‘DownRight’ (1, −1). To estimate the coefficients, we constructed a matrix M of shape conditions × regressors. All but the last column of M contained the condition-specific values of one of the regressors; the last column consisted of ones to estimate β0. Given the conditions × neurons matrix of neural firing rates R and the conditions × regressors matrix M, the regression model can be written as:

$$R(t)=M[{\beta }_{i,0},\,{\beta }_{i,x},\,{\beta }_{i,y}].$$

(21)

We estimated the regression coefficients by least squares,

$$[{\beta }_{i,0},\,{\beta }_{i,x},\,{\beta }_{i,y}]={\left({M}^{T}M\,\right)}^{-1}{M}^{T}R.$$

(22)

We projected neural data into the regression subspace by multiplying the pseudoinverse of the coefficient matrix with the neural data matrix R. For decoding, we used the estimated regression coefficients to project single-trial firing rates to the appropriate condition-dependent subspaces.
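A numpy sketch of the least-squares step in equations (20)–(22) at a single time point; the column ordering of M is an illustrative choice.

```python
import numpy as np

def tdr_coefficients(R, regressors):
    """Equations (20)-(22): least-squares estimate of the TDR coefficients.
    R: conditions x neurons matrix of condition-averaged firing rates at time t;
    regressors: conditions x 2 matrix of (c_x, c_y) values per condition."""
    M = np.column_stack([regressors, np.ones(len(regressors))])   # last column of ones for beta_0
    beta, *_ = np.linalg.lstsq(M, R, rcond=None)                  # solves (M^T M)^-1 M^T R
    return beta                                                   # rows: beta_x, beta_y, beta_0 per neuron
```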

To decode the hand kinematics, we used optimal linear estimation to decode the x and y reaching coordinates and velocities from the latent representation, as in ref. 30. To assess the accuracy of the decoded movements, we computed the tenfold cross-validated goodness of fit (R2) between the decoded and measured velocities for x and y before taking the mean across them. We also trained a support vector machine classifier (regularization of 1.0 with a radial basis function kernel) on the measured kinematics against the condition labels.

Linear maze navigation

We used single-neuron spiking data from ref. 50, to which we refer the reader for experimental details. In brief, neural activity was recorded from CA1 pyramidal single units using two 8- or 6-shank silicon probes in the hippocampus of four rats while they walked in alternating directions along a 1.6-m linear track. Each recording had between 48 and 120 neurons.

We extracted the spike trains using the neo package and converted them into rates using a Gaussian kernel with an s.d. of 10 ms. We fitted a PCA across all data for a given animal to reduce the variable dimension to five. In Supplementary Fig. 5, we analyze the effect of varying preprocessing hyperparameters, showing that our results remain stable for a broad range of Gaussian kernel scales and principal components.

We trained MARBLE in embedding-aware mode separately on each animal. The output dimensions were matched to that of the benchmark models.

For benchmarking, we took the CEBRA models unchanged from the publicly available notebooks. To decode the position, we used a 32-dimensional output and 10,000 iterations, as per their demo notebook. To decode the rat’s position from the neural trajectories, we fit a k-NN decoder with 36 neighbors and a cosine distance metric. To assess the decoding accuracy, we computed the mean absolute error between the predicted and true rat positions.

For consistency between animals, we used a three-dimensional (E = 3) output and 15,000 iterations, as per the corresponding notebook. We aligned the latent representations across animals post hoc using Procrustes analysis. We then fit a linear decoder to the aligned latent representations between pairs of animals: one animal as the source (independent variables) and the other as the target (dependent variables). The R2 of the fitted model describes the variance in the latent representation of one animal that can be explained by the other (a measure of consistency between their latent representations).
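A sketch of this consistency analysis using scipy's Procrustes alignment and a linear decoder; it assumes the two latent arrays have already been matched sample-by-sample, which is an assumption of this illustration.

```python
from scipy.spatial import procrustes
from sklearn.linear_model import LinearRegression

def consistency_r2(Z_source, Z_target):
    """Align two animals' latent representations with Procrustes analysis, then
    report how much variance of the target is explained by a linear fit from the
    source. Z_source and Z_target are samples x 3 arrays with matched rows."""
    Zs_aligned, Zt_aligned, _ = procrustes(Z_source, Z_target)   # standardized, aligned copies
    reg = LinearRegression().fit(Zs_aligned, Zt_aligned)
    return reg.score(Zs_aligned, Zt_aligned)                     # R^2 of the linear decoder
```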

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
