2026-06-04 | reasoning | agents

Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin, Jocelyn Shen, Blake A. Richards, Alison Gopnik, Doina Precup

Key claim

Active exploration enhances adults' conjunctive causal reasoning.

This paper demonstrates that allowing adults to actively explore can significantly improve their ability to reason about conjunctive causal rules. While adults still perform better in disjunctive settings, the findings suggest that agency in exploration can mitigate some of the challenges faced in identifying conjunctive rules.

In plain English

The authors studied how adults understand cause-and-effect relationships when they can actively explore their environment. They found that when given the chance to experiment, adults improved their ability to identify complex causal relationships that require multiple factors to work together. This is a shift from previous studies where participants only observed situations passively. Builders should care because it highlights the importance of agency in learning and could inform the design of educational tools or AI systems that mimic human reasoning.

Novelty

7.0/10

The study adds active exploration to a causal reasoning task, extending prior findings on the conjunctive handicap.

Reliability

7.5/10

The paper provides a solid experimental setup comparing human and model performance, supporting its claims.

Deep reliability assessment

The methodology supports a controlled-task claim: contingent active intervention improves adults’ ability to infer conjunctive causal rules in a modified blicket/Nexiom detector setting, and conjunctive cases still demand more tests than disjunctive cases. Broader claims about adult causal competence in real-world scientific discovery, or LLMs as general causal reasoners, are more speculative because the task is small, symbolic, and has a constrained hypothesis space.
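To make "small, symbolic, and constrained hypothesis space" concrete, the following is a minimal sketch of a blicket-style detector, not the authors' implementation. It assumes three objects with the hypothetical labels A, B, and C, a disjunctive rule that fires when at least one blicket is on the machine, and a conjunctive rule that fires only when at least two are. The function names and the threshold of two are illustrative assumptions; the paper's actual Nexiom parameters may differ.

```python
from itertools import combinations

OBJECTS = ("A", "B", "C")  # hypothetical object labels, not from the paper
RULES = ("disjunctive", "conjunctive")


def detector(placed, blickets, rule):
    """Return True if the machine activates for the objects placed on it.

    Assumption: disjunctive needs >= 1 blicket, conjunctive needs >= 2.
    """
    n = len(set(placed) & set(blickets))
    return n >= 1 if rule == "disjunctive" else n >= 2


def all_hypotheses():
    """Enumerate every (rule, blicket set) pair over OBJECTS.

    Some pairs are observationally equivalent (for example, any hypothesis
    under which the machine never activates), but all are listed here.
    """
    subsets = [frozenset(c)
               for r in range(len(OBJECTS) + 1)
               for c in combinations(OBJECTS, r)]
    return [(rule, s) for rule in RULES for s in subsets]


def consistent(hypotheses, observations):
    """Keep the hypotheses that predict every observed (placed, outcome) pair."""
    return [(rule, b) for rule, b in hypotheses
            if all(detector(p, b, rule) == out for p, out in observations)]


# Ground truth for this illustration: conjunctive rule, blickets A and B.
singles = [(("A",), False), (("B",), False), (("C",), False)]
after_singles = consistent(all_hypotheses(), singles)

# Adding two pair tests isolates the true hypothesis.
pairs = [(("A", "B"), True), (("A", "C"), False)]
after_pairs = consistent(after_singles, pairs)

print(len(all_hypotheses()), len(after_singles), after_pairs)
```

Under these assumptions, single-object tests cannot separate the conjunctive hypotheses from one another, because no single object ever activates the machine. Only tests on combinations narrow them down, which is consistent with the reported finding that conjunctive cases demand more tests than disjunctive ones.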

Reproducibility

No open-source code, dataset, or project repository is mentioned in the provided abstract, introduction, results, discussion, conclusion, or visible footnotes. The task design and summary tables are described, but replication would require reimplementing the Nexiom detector environment and participant/model protocols.

Discussion questions

1. Does the result challenge the idea of an adult conjunctive-reasoning deficit, or does it mainly show that adults can overcome a disjunctive prior when the hypothesis space is tiny and fully testable?
2. For builders of AI agents, does this imply that giving models tool-use or experiment-selection ability is insufficient unless the agent can tightly couple its own hypotheses, interventions, and contingent feedback?
3. What experimental outcome would falsify the paper’s interpretation? For example, active learners might still outperform passive learners when their proposed interventions are decoupled from feedback, or passive learners might match active learners given equally informative evidence.

Key figure

Figure 1 contrasts active exploration, where participants freely add/remove objects and test combinations on the Nexiom machine, with passive yoked observation, where participants only observe another participant’s actions and outcomes before answering object-classification and rule-inference questions.

Benchmark results

Nexiom detector task, mean number of tests (lower is better): 9.6 vs. gpt-5. The average human used 0.6 fewer tests than gpt-5.

Nexiom detector task, mean number of tests (lower is better): 6.4 vs. gpt-5. The average human used 1.3 fewer tests than gpt-5.

Nexiom detector task, number of successful trials reported in Table 2: 34 vs. gpt-5. The average human had 12 more successful conjunctive trials than gpt-5.

Nexiom detector task, number of successful trials reported in Table 2: 48 vs. gpt-5. The average human had 24 more successful disjunctive trials than gpt-5.