Aging and the Narrowing of Scientific Innovation

Haochuan Cui, Yiling Lin, Lingfei Wu & James A. Evans
The progress of science comes from the sacrifice of a generation of scientists

Dataset of Citation Contexts

We use 235,598,500 citation contexts (sentences surrounding citations) extracted from 12,177,040 papers published between 1895 and 2020 in the Microsoft Academic Graph (MAG). Each citation context captures the local discourse in which a reference appears, enabling analysis of its functional role and evaluative tone. To ensure data quality, we constructed a human-validated subset of 20,000 labeled citation contexts identifying critical citation statements (e.g., explicit criticism, refutation, or acknowledgment of limitations). A critical citation was operationally defined as a citation context that challenges, questions, or explicitly expresses a different viewpoint toward the cited work, as opposed to neutral or supportive citations.

The annotation process began with a calibration phase. Two annotators, a postdoctoral researcher and a doctoral student, both trained in information science, independently annotated 100 randomly selected citation contexts. These annotations were reviewed by two senior scientists (faculty in information science and sociology), who resolved disagreements and identified ten representative sentences as canonical examples of critical citations. Before labeling the main dataset, the two annotators were evaluated on an independent set of 200 citation contexts and achieved inter-annotator consistency above 0.9. Annotator A then labeled the full set of 20,000 citation contexts, while Annotator B annotated a random subset after every 5,000 cases to monitor consistency. Annotation continued only while inter-annotator consistency remained above 0.9; otherwise, additional discussion and recalibration were conducted. Upon completion, the two senior scientists independently evaluated a random sample of 100 annotated citation contexts, yielding a validation accuracy of 0.89.
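The consistency check at each checkpoint can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say which consistency metric was used, so the sketch computes both raw percent agreement and Cohen's kappa, and assumes the 0.9 threshold applies to percent agreement. The function names and the binary labels (1 = critical, 0 = not critical) are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two annotators gave the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if both annotators labeled independently
    # according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def passes_checkpoint(
    a: Sequence[Hashable], b: Sequence[Hashable], threshold: float = 0.9
) -> bool:
    """Assumed rule: continue only if agreement is strictly above the threshold."""
    return percent_agreement(a, b) > threshold


if __name__ == "__main__":
    # Toy labels for a checkpoint subset (hypothetical data).
    annotator_a = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    annotator_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print("agreement:", percent_agreement(annotator_a, annotator_b))
    print("kappa:", round(cohen_kappa(annotator_a, annotator_b), 3))
    print("continue annotating:", passes_checkpoint(annotator_a, annotator_b))
```

Because critical citations are typically a minority class, percent agreement can be high even when the annotators rarely agree on the critical cases, which is why a chance-corrected measure such as kappa is shown alongside it.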

20k hand-coded critical citation contexts

Name Disambiguation Validation

Critical Citation Validation

Code and Data
