Haochuan Cui, Yiling Lin, Lingfei Wu & James A. Evans
The progress of science comes from the sacrifice of a generation of scientists
We use 235,598,500 citation contexts (sentences surrounding citations) extracted from 12,177,040 papers published between 1895 and 2020 in the Microsoft Academic Graph (MAG). Each citation context captures the local discourse in which a reference appears, enabling analysis of its functional role and evaluative tone. To ensure data quality, we constructed a human-validated subset of 20,000 labeled citation contexts identifying critical citation statements (e.g., explicit criticism, refutation, or acknowledgment of limitations). A critical citation was operationally defined as a citation context that challenges, questions, or explicitly expresses a different viewpoint toward the cited work, as opposed to neutral or supportive citations.
The annotation process began with a calibration phase. Two annotators, a postdoctoral researcher and a doctoral student, both trained in information science, independently annotated 100 randomly selected citation contexts. These annotations were reviewed by two senior scientists (faculty in information science and sociology), who resolved disagreements and identified ten representative sentences as canonical examples of critical citations. Following calibration, and before labeling the main dataset, the two annotators were evaluated on an independent set of 200 citation contexts, achieving inter-annotator consistency above 0.9. Annotator A then labeled the full set of 20,000 citation contexts, while Annotator B annotated a random subset after every 5,000 cases to monitor consistency. Annotation continued only when inter-annotator consistency remained above 0.9; otherwise, additional discussion and recalibration were conducted. Upon completion, two senior scientists independently evaluated a random sample of 100 annotated citation contexts, yielding a validation accuracy of 0.89.
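The consistency checkpoint described above can be illustrated with a minimal sketch. The text does not say which consistency statistic was used, so the code below is an assumption: it computes both raw percent agreement and Cohen's kappa for two annotators' binary labels (1 = critical, 0 = not critical) and applies the 0.9 threshold to raw agreement. The function names `percent_agreement`, `cohens_kappa`, and `passes_checkpoint` are hypothetical and do not come from the authors' pipeline.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement from each annotator's marginal label frequencies.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def passes_checkpoint(labels_a, labels_b, threshold=0.9):
    """Assumed rule: continue annotating only if agreement exceeds the threshold."""
    return percent_agreement(labels_a, labels_b) > threshold


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    annotator_a = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    annotator_b = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    print("agreement:", percent_agreement(annotator_a, annotator_b))
    print("kappa:", round(cohens_kappa(annotator_a, annotator_b), 3))
    print("continue:", passes_checkpoint(annotator_a, annotator_b))
```

Because critical citations are likely a minority class, raw agreement can be high even when annotators rarely agree on the critical cases. Reporting a chance-corrected statistic such as kappa alongside it is one way to guard against that, though whether the authors did so is not stated.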