Parameter Estimation (EM)

The knowledgespaces.estimation module estimates BLIM parameters from observed response data using the Expectation-Maximization algorithm.

What it estimates

Given a knowledge structure and a matrix of student responses, the EM algorithm estimates:

  • \(\beta_q\) (slip per item): \(P(\text{incorrect} \mid q \text{ mastered})\)

  • \(\eta_q\) (guess per item): \(P(\text{correct} \mid q \text{ not mastered})\)

  • \(\pi_K\) (state prior): \(P(\text{student is in state } K)\)
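These three ingredients combine into the BLIM likelihood of a response pattern \(R\) given a state \(K\): a mastered item is answered correctly unless the student slips, and an unmastered item is answered correctly only by guessing. A minimal plain-Python sketch of that likelihood (function and variable names here are illustrative, not part of the library's API):

```python
def pattern_likelihood(pattern, state, items, beta, eta):
    """P(response pattern R | knowledge state K) under the BLIM.

    pattern: dict item -> 0/1 observed response
    state:   set of mastered items (the knowledge state K)
    beta:    dict item -> slip probability
    eta:     dict item -> guess probability
    """
    p = 1.0
    for q in items:
        correct = pattern[q] == 1
        if q in state:
            # Mastered item: correct unless the student slips.
            p *= (1.0 - beta[q]) if correct else beta[q]
        else:
            # Unmastered item: correct only by guessing.
            p *= eta[q] if correct else (1.0 - eta[q])
    return p

items = ["add", "sub", "mul"]
beta = {q: 0.1 for q in items}
eta = {q: 0.2 for q in items}

# Student in state {add, sub} producing responses [1, 1, 0]:
# (1 - 0.1) * (1 - 0.1) * (1 - 0.2) = 0.648
p = pattern_likelihood({"add": 1, "sub": 1, "mul": 0},
                       {"add", "sub"}, items, beta, eta)
```

Summing this quantity over all \(2^{|Q|}\) response patterns for a fixed state gives 1, which is a useful sanity check when experimenting.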

High-level API

import knowledgespaces as ks

structure = ks.space_from_prerequisites(
    ["add", "sub", "mul"],
    [("add", "sub"), ("sub", "mul")],
)

result = ks.fit_blim(
    structure,
    items=["add", "sub", "mul"],
    responses=[[1,1,1], [1,1,0], [1,0,0], [0,0,0]],
    counts=[45, 30, 20, 5],  # optional: pattern frequencies
)

print(result["converged"])      # True
print(result["n_iterations"])   # number of EM iterations
print(result["beta"])           # slip per item (dict)
print(result["eta"])            # guess per item (dict)
print(result["log_likelihood"]) # final log-likelihood

Low-level API

For full control:

from knowledgespaces.estimation import estimate_blim, ResponseMatrix
import numpy as np

data = ResponseMatrix(
    items=["add", "sub", "mul"],
    patterns=np.array([[1,1,1], [1,1,0], [1,0,0], [0,0,0]]),
    counts=np.array([45, 30, 20, 5]),
)

result = estimate_blim(
    structure, data,  # structure from the high-level example above
    max_iter=500,
    tol=1e-6,
    beta_init=np.array([0.05, 0.1, 0.15]),  # per-item initialization
    eta_init=0.1,                           # or global scalar
)

print(result.beta_for("add"))
print(result.eta_for("mul"))
print(result.pi)  # state prior distribution

How it works

The EM algorithm alternates:

  1. E-step: For each response pattern, compute the posterior probability of each knowledge state.

  2. M-step: Re-estimate \(\beta_q\), \(\eta_q\), and \(\pi_K\) from the weighted sufficient statistics.

The log-likelihood is guaranteed not to decrease between iterations. Convergence is declared when the improvement in log-likelihood falls below tol (or when max_iter is reached).
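The loop above can be sketched end to end with NumPy. This is an illustrative, self-contained reimplementation under simplified assumptions (states and patterns encoded as 0/1 arrays, no structure-specific optimizations), not the module's actual code:

```python
import numpy as np

def em_blim(states, patterns, counts, beta0=0.1, eta0=0.1,
            max_iter=500, tol=1e-6):
    """Minimal EM for the BLIM.

    states:   (n_states, n_items) 0/1 array; row k encodes knowledge state K
    patterns: (n_patterns, n_items) 0/1 array of observed response patterns
    counts:   (n_patterns,) frequency of each pattern
    """
    n_states, n_items = states.shape
    beta = np.full(n_items, beta0)
    eta = np.full(n_items, eta0)
    pi = np.full(n_states, 1.0 / n_states)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    r = patterns[:, None, :]   # (patterns, 1, items)
    s = states[None, :, :]     # (1, states, items)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # P(R | K): correct w.p. 1-beta if mastered, eta otherwise.
        p_correct = np.where(s == 1, 1.0 - beta, eta)
        lik = np.where(r == 1, p_correct, 1.0 - p_correct).prod(axis=2)
        # E-step: posterior P(K | R) proportional to pi_K * P(R | K).
        joint = lik * pi
        p_r = joint.sum(axis=1)
        post = joint / p_r[:, None]
        ll = (counts * np.log(p_r)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: re-estimate from weighted sufficient statistics.
        w = counts[:, None] * post          # expected (R, K) counts
        pi = w.sum(axis=0) / n
        mastered = w[:, :, None] * s        # weight where q is in K
        unmastered = w[:, :, None] * (1 - s)
        # Slip: mastered but incorrect; guess: unmastered but correct.
        beta = (mastered * (1 - r)).sum(axis=(0, 1)) / mastered.sum(axis=(0, 1))
        eta = (unmastered * r).sum(axis=(0, 1)) / unmastered.sum(axis=(0, 1))
    return beta, eta, pi, ll

# States of the chain add -> sub -> mul: {}, {add}, {add,sub}, {add,sub,mul}
states = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
patterns = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
beta, eta, pi, ll = em_blim(states, patterns, [45, 30, 20, 5])
```

The sketch assumes every item is mastered in at least one state and unmastered in at least one, so the M-step denominators stay positive; a production implementation would guard against degenerate cases.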