Censored data

Author

Šimon Kucharský

Published

May 6, 2025

Code
import os
if "KERAS_BACKEND" not in os.environ:
    # set this to "torch", "tensorflow", or "jax"
    os.environ["KERAS_BACKEND"] = "jax"

import numpy as np
import bayesflow as bf
import matplotlib.pyplot as plt
INFO:bayesflow:Using backend 'jax'

A person attempts an exam that consists of 50 four-choice questions. To pass the exam, one must answer at least 30 correctly. If the person fails, they retake the exam. Once they pass, we obtain only the following information: how many attempts they needed, their score on the final, successful attempt, and the range of their scores on all earlier attempts.

The goal is to estimate the probability \(\theta\) of answering any of the test questions correctly.
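To get a feel for the model: given ability \(\theta\), the score on a single attempt is Binomial(50, \(\theta\)), and an attempt passes when the score reaches 30. A quick standalone check (standard library only; not part of the original code) computes the single-attempt pass probability and the implied expected number of attempts:

```python
from math import comb

def pass_probability(theta, n_questions=50, min_correct=30):
    """P(score >= min_correct) for score ~ Binomial(n_questions, theta)."""
    return sum(
        comb(n_questions, k) * theta**k * (1 - theta) ** (n_questions - k)
        for k in range(min_correct, n_questions + 1)
    )

# With theta = 0.5, passing any single attempt is unlikely (about 0.10)...
p = pass_probability(0.5)
# ...and the number of attempts is geometric, so its mean is 1 / p.
expected_attempts = 1 / p
```

Since the attempt count is so informative about \(\theta\), even the heavily censored summary of the earlier scores should still constrain the posterior well.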

Simulator

Code
def prior():
    return dict(
        theta=np.random.uniform(low=0.25, high=1)
    )

def summary(scores):
    attempts = len(scores)
    if attempts == 1:
        return dict(
            attempts=attempts,
            score=scores[0],
            min=-1,
            max=-1
        )
    else:
        return dict(
            attempts=attempts,
            score=scores[-1],
            min=np.min(scores[:-1]),
            max=np.max(scores[:-1])
        )

def likelihood(theta, n_questions=50, min_correct=30):
    scores = []
    score = 0
    while score < min_correct:
        score = np.random.binomial(n=n_questions, p=theta)
        scores.append(score)
    scores = np.array(scores)

    return summary(scores)

simulator = bf.make_simulator([prior, likelihood])
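As a sanity check on the censoring logic (a standalone sketch; the two functions from above are repeated so the snippet runs on its own), we can verify the invariants the summary must satisfy: the reported score always passes the threshold, earlier attempts always scored below it, and single-attempt trajectories use the -1 placeholder for the range:

```python
import numpy as np

def summary(scores):
    # Collapse a full score trajectory into the censored observables.
    attempts = len(scores)
    if attempts == 1:
        return dict(attempts=attempts, score=scores[0], min=-1, max=-1)
    return dict(
        attempts=attempts,
        score=scores[-1],
        min=np.min(scores[:-1]),
        max=np.max(scores[:-1]),
    )

def likelihood(theta, n_questions=50, min_correct=30):
    # Keep retaking the exam until the score reaches the threshold.
    scores = []
    score = 0
    while score < min_correct:
        score = np.random.binomial(n=n_questions, p=theta)
        scores.append(score)
    return summary(np.array(scores))

draws = [likelihood(theta=0.55) for _ in range(200)]
assert all(d["score"] >= 30 for d in draws)                       # final attempt passes
assert all(d["max"] < 30 for d in draws if d["attempts"] > 1)     # earlier attempts failed
assert all(d["min"] == -1 for d in draws if d["attempts"] == 1)   # placeholder when censored range is empty
```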

Approximator

Code
adapter = (
    bf.Adapter()
    .constrain("theta", lower=0.25, upper=1)
    .rename("theta", "inference_variables")
    .concatenate(["attempts", "score", "min", "max"], into="inference_conditions")
)
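The `constrain` step maps \(\theta\) from its bounded support \((0.25, 1)\) to an unconstrained space before it reaches the network, so the flow never has to learn hard boundaries. The exact transform BayesFlow applies is an implementation detail, but a typical choice is a scaled logit; a sketch of that idea (the function names here are illustrative, not BayesFlow's API):

```python
import numpy as np

LOWER, UPPER = 0.25, 1.0

def to_unconstrained(theta):
    # Scaled logit: (0.25, 1) -> (-inf, inf).
    z = (theta - LOWER) / (UPPER - LOWER)
    return np.log(z) - np.log1p(-z)

def to_constrained(x):
    # Inverse transform: sigmoid scaled back into (0.25, 1).
    return LOWER + (UPPER - LOWER) / (1.0 + np.exp(-x))

theta = np.array([0.3, 0.6, 0.95])
roundtrip = to_constrained(to_unconstrained(theta))  # recovers theta
```

Whatever the transform, the round trip must be the identity and the inverse must land strictly inside the bounds, which is what guarantees posterior samples of `theta` respect the prior support.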
Code
workflow = bf.BasicWorkflow(
    simulator=simulator,
    adapter=adapter,
    inference_network=bf.networks.CouplingFlow()
)

Training

Code
train_data = simulator.sample(1000)
val_data = simulator.sample(100)
Code
history = workflow.fit_offline(data=train_data, batch_size=100, validation_data=val_data)

Validation

Code
test_data = simulator.sample(1000)
Code
figs = workflow.plot_default_diagnostics(test_data=test_data, num_samples=500)

Inference

We will perform inference for Cha Sa-soon (Lee & Wagenmakers, 2013), who needed 950 attempts to pass the test, with a final score of 30 correct questions. Her scores on the previous 949 attempts ranged between 15 and 25.

Code
inference_data = dict(
    attempts = np.array([[950]]),
    score=np.array([[30]]),
    min=np.array([[15]]),
    max=np.array([[25]])
)
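Before sampling, a back-of-envelope plausibility check (plain arithmetic; not part of the original analysis): pooling all questions across the 950 attempts, the overall fraction answered correctly must lie between the value implied by every failed attempt scoring the observed minimum of 15 and the value implied by every failed attempt scoring the observed maximum of 25:

```python
total_questions = 950 * 50
low_correct = 949 * 15 + 30   # every failed attempt at the observed minimum
high_correct = 949 * 25 + 30  # every failed attempt at the observed maximum

# Rough bracket for theta, roughly (0.30, 0.50).
bounds = (low_correct / total_questions, high_correct / total_questions)
```

This ignores the censoring mechanism entirely, so it is only a rough plausibility band, not a posterior; the histogram below should nevertheless concentrate inside it.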
Code
samples = workflow.sample(num_samples=2000, conditions=inference_data)
Code
plt.hist(samples["theta"].flatten(), density=True, color="lightgray", edgecolor="black", bins=np.arange(0.25, 1.05, 0.025))
plt.xlabel(r"$\theta$")
plt.ylabel("Density")
plt.tight_layout()

References

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.
