The GCM model

Author

Šimon Kucharský

Published

May 6, 2025

Code
import os

if "KERAS_BACKEND" not in os.environ:
    # set this to "torch", "tensorflow", or "jax"
    os.environ["KERAS_BACKEND"] = "jax"

import json

import matplotlib.pyplot as plt
import numpy as np

import bayesflow as bf

The basic model has three parameters: generalization (\(c\)), attention weight (\(w\)), and bias (\(b\)). In this example, we assume no bias between the categories, so \(b\) is fixed to \(\frac{1}{2}\). Each stimulus is presented on \(n\) trials, and \(y_i\) is the number of Category A responses to stimulus \(i\).

\[\begin{equation} \begin{aligned} c & \sim \text{Exponential}(1) \\ w & \sim \text{Uniform}(0, 1) \\ b & \leftarrow \frac{1}{2} \\ s_{ij} & \leftarrow \exp\left[- c \left(w d_{ij}^{(1)} + (1-w) d_{ij}^{(2)}\right)\right] \\ r_i & \leftarrow \frac{b \sum_j a_j s_{ij}}{b \sum_j a_j s_{ij} + (1-b) \sum_j (1-a_j) s_{ij}} \\ y_i & \sim \text{Binomial}(r_i, n) \end{aligned} \end{equation}\]
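
As a quick illustration of the two deterministic steps, the sketch below evaluates the similarity and response-probability equations for a small set of made-up stimuli. The distances, category labels, and parameter values are hypothetical, not the Kruschke data.

Code
# hypothetical example: 3 stimuli with made-up distances, labels, and parameters
c, w, b = 1.0, 0.6, 0.5
d1 = np.array([[0., 1., 2.],
               [1., 0., 1.],
               [2., 1., 0.]])             # distances on dimension 1
d2 = np.array([[0., 2., 1.],
               [2., 0., 1.],
               [1., 1., 0.]])             # distances on dimension 2
a = np.array([1, 1, 0])                   # 1 = Category A, 0 = Category B

s = np.exp(-c * (w * d1 + (1 - w) * d2))  # similarity matrix s_ij
num = b * s[:, a == 1].sum(axis=1)        # evidence for Category A
den = num + (1 - b) * s[:, a == 0].sum(axis=1)
r = num / den                             # probability of a Category A response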

Data

We use the category learning data from the “Condensation B” condition reported by Kruschke (1993).

Code
with open(os.path.join("data", "KruschkeData.json")) as f:
    data = json.load(f)

data.keys()
dict_keys(['a', 'd1', 'd2', 'n', 'nstim', 'nsubj', 'y'])

The element y contains the array of responses \(y_{ik}\) for each stimulus \(i \in \{1, \dots, 8\}\) and participant \(k \in \{1, \dots, 40\}\).

Code
np.array(data['y']).shape
(8, 40)

d1 and d2 contain the physical distances between the 8 stimuli along the two stimulus dimensions, such that

\[\begin{equation} d_{ij}^{(k)} = | p_{ik} - p_{jk} | \end{equation}\]
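
For example, if the stimulus positions on one dimension were known, the corresponding distance matrix could be reconstructed by broadcasting. The positions below are hypothetical, not the actual stimulus coordinates.

Code
# hypothetical positions of the 8 stimuli on one dimension
p1 = np.arange(8, dtype=float)

# pairwise absolute differences, d_ij = |p_i - p_j|
d1_example = np.abs(p1[:, None] - p1[None, :])
d1_example.shape
(8, 8)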

The indicator vector a encodes whether stimulus \(i\) belongs to Category A (\(a_i = 1\)) or to Category B.

Here we will assume that the design of the experiment (i.e., number of trials, number of stimuli, number of participants, nature of the stimuli) is fixed. That is, we will only amortize over the observation outcomes.

Simulator

Code
def prior():
    return dict(
        # exponential prior on the generalization parameter (rate 1)
        c = np.random.exponential(scale=1),
        # uniform prior on the attention weight
        w = np.random.uniform(low=0, high=1)
    )

def likelihood(
    c, w,
    n=data['n'],
    nstim=data['nstim'],
    nsubj=data['nsubj'],
    d1=np.array(data['d1']),
    d2=np.array(data['d2']),
    a=np.array(data['a'])
):
    # no response bias between the categories
    b = 0.5

    # attention-weighted similarity between all pairs of stimuli
    s = np.exp(-c * (w * d1 + (1 - w) * d2))

    # masks for stimuli belonging to Category A and Category B
    isA = a == 1
    isB = a != 1

    # probability of a Category A response to each stimulus
    r = np.zeros((nsubj, nstim))
    for i in range(nstim):
        numerator = b * np.sum(s[i, isA])
        denominator = numerator + (1 - b) * np.sum(s[i, isB])
        r[:, i] = numerator / denominator

    # number of Category A responses out of n trials, per participant and stimulus
    y = np.random.binomial(n=n, p=r, size=(nsubj, nstim))

    return dict(y=y)

simulator = bf.make_simulator([prior, likelihood])
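
Before building the approximator, it can be useful to draw a small batch from the simulator and inspect the output shapes. The batch size of 2 below is arbitrary, and the exact shapes of c and w may differ slightly between BayesFlow versions.

Code
# draw a small batch and check the shapes of the simulator outputs
sims = simulator.sample(2)
{k: v.shape for k, v in sims.items()}
# y should have shape (2, 40, 8): batch x participants x stimuli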

Approximator

Code
adapter = (
    bf.Adapter()
    .constrain("c", lower=0)
    .constrain("w", lower=0, upper=1)
    .concatenate(["c", "w"], into="inference_variables")
    .rename("y", "summary_variables")
)
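
To see what the networks will actually receive, the adapter can be applied to a simulated batch. This sketch assumes the Adapter instance is callable on a dictionary of simulations, as in recent BayesFlow versions.

Code
# apply the adapter to a simulated batch and inspect the resulting keys
adapted = adapter(simulator.sample(2))
adapted.keys()
# the constrained parameters end up in "inference_variables",
# the raw observations in "summary_variables"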
Code
workflow = bf.BasicWorkflow(
    simulator=simulator,
    adapter=adapter,
    inference_network=bf.networks.CouplingFlow(),
    summary_network=bf.networks.DeepSet(),
    initial_learning_rate=1e-3
)

Training

Code
history = workflow.fit_online(batch_size=256)
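
To get a rough sense of convergence, the training loss can be plotted. The sketch below assumes that fit_online returns a Keras-style History object with a "loss" entry.

Code
# plot the training loss trajectory (assuming a Keras-style History object)
plt.plot(history.history["loss"])
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()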

Validation

Code
test_data = simulator.sample(1000)
Code
figs = workflow.plot_default_diagnostics(test_data=test_data, num_samples=500)

Inference

Here we apply the networks to the data from Kruschke (1993) as reported by Lee & Wagenmakers (2013). The quantity of interest is the joint posterior distribution of parameters \(c\) and \(w\).

Code
# transpose y from (nstim, nsubj) to (nsubj, nstim) to match the simulator output,
# and add a leading batch dimension for a single observed data set
inference_data = dict(y=np.array(data['y']).transpose()[np.newaxis, ...])
Code
posterior_samples = workflow.sample(num_samples=2000, conditions=inference_data)
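
In addition to the joint scatter plot below, simple numerical summaries of the marginal posteriors can be computed directly from the samples; the 95% central interval used here is just one possible choice.

Code
# posterior means and 95% central credible intervals for c and w
for name in ["c", "w"]:
    draws = np.asarray(posterior_samples[name]).flatten()
    lower, upper = np.quantile(draws, [0.025, 0.975])
    print(f"{name}: mean = {draws.mean():.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")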
Code
plt.scatter(
    x=posterior_samples['c'],
    y=posterior_samples['w'],
    alpha=0.05
)
plt.xlim([0, 5])
plt.ylim([0, 1])
plt.xlabel("Generalization (c)")
f = plt.ylabel("Attention Weight (w)")

References

Kruschke, J. K. (1993). Human category learning: Implications for backpropagation models. Connection Science, 5(1), 3–36.
Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.
