Abstract

This paper explores the possibility of promoting knowledge export by means of citation function indexing using CiTO, the Citation Typing Ontology. Instances of knowledge export are exemplified by cross-disciplinary citations, which, it is suggested, may indicate a prolonged lifetime of use for documents. For CiTO to serve the purpose of promoting knowledge export, it should be more specific about citation functions, separating them from evaluation, and then be put to the test as a discovery tool.

Introduction: the relevance of citation functions

Citations can often be seen as observable results of a transfer of knowledge, as records of used information. The potential of citations as a measure of relevance was noted, at least implicitly, by Gilbert (1977:116). However, the use of citations varies greatly. We focus here in particular on cross-disciplinary citations and the different functions they fulfil. What purpose do they serve? We want to know how the cited information is used in the citing context, fully aware that there may be other reasons behind citations than strictly intra-scientific judgments of relevance, e.g. their use as a purely rhetorical device. The references ultimately appearing in an article may also be determined by factors outside the author's immediate control, such as peer review and journal policies.

Still, why is it that certain documents are found relevant for the most varied purposes over and over again, long after their publication, while others tend to fall into oblivion only a few years after their appearance? Which factors are involved in distinguishing the potentially long-lived cited document from the less successful, more short-lived ones? Van Raan and, more recently, Ke et al. studied so-called sleeping beauties in science, i.e. instances of a publication that goes unnoticed (‘sleeps’) for a long time and then, almost suddenly, attracts a lot of attention (‘is awakened by a prince’). Studying long 'sleeping beauties' (SBs) for the purpose of identifying cross-disciplinary citation functions promises to be rewarding, since top SBs achieve delayed exceptional importance in disciplines different from those where they were originally published. Levitt and Thelwall found a link between multi-disciplinarity and a high citedness rate. However, their study did not address the question of cross-disciplinary knowledge export. Multi-disciplinarity, and even more so interdisciplinarity or transdisciplinarity, have more to do with the integration or synthesis of scientific disciplines working on a common research project, as in the emerging so-called I2S, Integration and Implementation Sciences :322. Cross-disciplinarity, on the other hand, is more about researchers in one scientific discipline seeking to apply new methodologies, solutions or problems taken from another, sometimes very distant, discipline. Thus, results from studies of interdisciplinarity, transdisciplinarity or multi-disciplinarity cannot automatically be applied to cases of cross-disciplinary knowledge export. By knowledge export we understand here the transfer of knowledge from one discipline to another as documented by cross-disciplinary citations.

Apart from the phenomenon of sleeping beauties, citation analyses have shown substantial variations in citation patterns over time from one discipline to another. There are indications, for example, that documents within the social sciences continue to be cited for a longer period of time than is the case for the natural sciences. However, there are also examples of remarkably long-lived documents from the natural sciences. A classic paper by Albert Einstein from 1906 was still being cited during the 1960s in journal articles within fields as diverse as dairy sciences, pharmacology, physiology, ceramics, water pollution, acoustics, fluid mechanics, sedimentary petrology and molecular biology, and well into this century again within ceramics, mechanics and sedimentology.

Another example is that of Molina and Rowland, a paper from the field of atmospheric chemistry published in 1974, which was continually cited at least up until the mid-1990s, also within disciplines such as computer science, law, management, ophthalmology, optics, political science, pharmacology, sociology, and, even more recently, risk management and medicine. Noteworthy in cases like these, where papers continue to be used and cited over a long period of time, is precisely the subject dispersion of citing papers. In the case of Molina and Rowland, the fact that the paper was published in a prestigious multi-disciplinary scientific journal like Nature most likely promoted its exposure also to scientists from outside atmospheric chemistry. The attention it received was no doubt renewed in 1995 when the authors, together with Paul Crutzen, were awarded the Nobel Prize for their work in atmospheric chemistry, particularly concerning the formation and decomposition of ozone.

Still, most articles published in Nature never come near the very high citation score attained by this paper. Moreover, the paper received most of its citations years after its publication, not while it was still new and outsiders, with a fresh issue of Nature in hand, were more likely to be accidentally exposed to it, but still before the Nobel Prize award (although admittedly there was a new peak in its citation count in 1995, still lower, though, than in the top year 1976).

Understanding the multipurposeness of scientific papers and their potential for knowledge export calls for an explanation of the function that the cited source fulfils in the context of the citing documents. How does the cited information fit into this sometimes completely new disciplinary environment? In this paper we examine a few examples of cross-disciplinary citation functions, to see if they could also be expressed by the emerging standard citation typing ontology CiTO for the purpose of promoting knowledge export.

Content-based citation analysis and citation functions

Most citation analysis studies so far have been quantitative. Citation counts have been made, e.g., in order to identify the core literature of a scientific discipline, and co-citation clustering has been used for mapping the structure of scientific disciplines. Lipetz, a pioneer of qualitative citation analysis, investigated the relationship between cited reference and citing document, aiming to improve the selectivity of citation indexes, but the 29 categories he proposed were obviously not intended to constitute a final judgment on the matter.

Since then, qualitative or content-based citation analysis studies have produced a multitude of different schemes describing the various functions of citations, with considerable overlap between categories, although the exact labels used for classification differ among authors.

The earlier classification schemes for citation functions relied essentially on manual citation analysis of relatively small sets of articles (typically 10 to 100 items), while later attempts have been made to use semi-automated or computational methods for citation classification of larger samples of full-text articles. An overview of these attempts is found in Ding et al.

However, automated methods for citation classification, relying on explicit signals or cue words for identification of citation functions, may not capture more complex cross-disciplinary citation relationships of the syntagmatic kind described by Green and Bean, where the relevance of the cited source to the citing document stems rather from the provision of a missing piece of information serving e.g. as part of an evidence chain. An example of this kind of relationship is given in the next section, where we look more closely at some cross-disciplinary citations apparently representing instances of knowledge export. Thus, this paper still depends on a small number of manually extracted citations from a limited set of articles. The purpose is simply to understand why a scientific article was found useful also outside its original field of research. What follows are some selected examples of citations of papers from the field of atmospheric chemistry or stratospheric ozone monitoring, each introduced by a description of the identified citation function and followed by an analysis and discussion of a possible application of CiTO object properties.

Cross-disciplinary citation functions

Comparison: Citation refers to similar results from another field of research. It may appear as a metaphorical type of relation, in which one complex unit is perceived as being structurally equivalent (as a whole or in part) to another :660. The importance of analogical, structural comparison (of similar or dissimilar elements) for knowledge transfer has been extensively described by Day and Goldstone. So it seems only natural that it figures in cases of cross-disciplinary knowledge export and use of scientific data in another field of research. A possible instance of this type is found in Mastenbrook and Oltmans (1983):

Similar long-term trends are to be found in total column ozone measurements.... London and Kelley (1974) examining global total ozone found an increase in both the Northern and the Southern Hemisphere during the 1960s.

At the time of access, this article had no subject descriptors in common with the cited document in two different research databases (the Aerospace database, accessed March 1996, and the Pascal database, using exclusively English descriptors). Thus, this is not a case of topic matching. However, the citation link between Mastenbrook and Oltmans (1983) and London and Kelley (1974) appears to be rather strong, with the citation providing measurement data, functioning as an item of comparison and, together with other cited documents, lending supporting evidence to the conclusion that

the long-term trend in stratospheric water and its similarity to the long-term trend in stratospheric ozone suggest that these changes arise from long-term changes in the intensity of the circulation. :2164

But obviously, the article is not about stratospheric ozone variation, which is the topic of London and Kelley (1974). The main topic of Mastenbrook and Oltmans (1983) is described by the title: Stratospheric water vapor variability for Washington, DC/Boulder, CO: 1964-82. Citations of this non-topical, comparative type seem difficult to represent by means of CiTO. A possible candidate for a suitable CiTO object property in this case would perhaps be cito:extends, but it does not accurately capture the non-topical quality of this instance.

Evidence: Citation is used for support of propositions in citing entity. Instances of conclusive, logically binding proofs may be rare; rather, reference is often to the apparent agreement between measurement data and predictions of a theory or a model. This type of citation might seem more natural for specialists within a narrower field of research, as it may sometimes require expertise in the field to seize the arguments involved. However, there are also clear examples of cross-disciplinary citations for evidence. Consider the following extract from an article published in a botanical journal as an illustration:

Good estimates of the present stratospheric distribution of ozone and subsequent UV radiation are known (Koller, 1952; Dütsch, 1969; Cutchis, 1974). The total amount of ozone in the northern hemisphere is maximal in spring and minimal in fall. ... It is suggested that among flowering plants of the northern hemisphere, many of which have white or yellow flowers (Table 2), there has been convergent evolution in floral UV absorption. Yellow and white flowers are high in flavonoid pigments which strongly absorb UV light. The seasonality of UV radiation may be one major selective pressure. Yellow and white flowers comprise as much as 85% of an arctic flora (Kevan, 1972). :26f

Discerning some of the more important premisses involved in the inference leading to the hypothesis in the third sentence of the extract, there is first the observation of the seasonal variation of stratospheric ozone and the subsequent seasonal variation of ultraviolet radiation reaching the earth, leading to a spring maximum of stratospheric ozone and a subsequent spring minimum of UV radiation in the northern hemisphere (since stratospheric ozone absorbs UV radiation). Then there is the knowledge that yellow and white flowers are strong absorbers of UV radiation. Finally there is the evidence of the predominance of yellow and white flowers in the northern hemisphere. Together these premisses make probable the hypothesis that UV-absorption ability has acted as a selective evolutionary mechanism for flowers in the northern hemisphere. It is important to note here that the different premisses come from different subject areas. The first three cited sources in the extract belong to geophysics or climatology, whereas (Table 2) and (Kevan, 1972) are from botany. Despite the differences in subject, the premisses apparently fit together, as slots in a framework :660. One describes certain environmental conditions. Another describes an important property of the object being studied, influencing its adaptation to the conditions described by the first. The third premiss describes the frequency of occurrence of the object being studied, thereby corroborating the importance of the property described by the second premiss. Together they make up an evidential structure that accounts for the relevance of the cited entities to the purpose of the citing document. Thus, all the cited entities here could apparently be ascribed the CiTO object property cito:isCitedAsEvidenceBy. Alternatively, some of these citations, e.g. those of the strictly botanical sources, might also be described by the CiTO property cito:isCitedAsDataSourceBy.
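As a rough illustration of what such an assignment could look like in practice, the sketch below writes the three geophysical evidence citations as RDF statements in N-Triples syntax. The property name and the namespace http://purl.org/spar/cito/ are those of CiTO, and the identifier of the citing article is the DOI given in the reference list. The identifiers of the cited sources are hypothetical placeholders, since no resolvable identifiers for them are given here, and the helper names are invented for the example.

```python
# Minimal sketch: typed citations as N-Triples lines, standard library only.
CITO = "http://purl.org/spar/cito/"

# Citing article (Utech and Kawano, 1975), DOI as listed in the references.
CITING = "http://dx.doi.org/10.1007/BF02498877"

# Hypothetical placeholder identifiers for the three cited geophysical sources.
CITED = {
    "Koller 1952": "urn:example:koller-1952",
    "Duetsch 1969": "urn:example:duetsch-1969",
    "Cutchis 1974": "urn:example:cutchis-1974",
}


def typed_citation(cited_iri, cito_property, citing_iri):
    """Return one N-Triples line of the form <cited> <cito:property> <citing> ."""
    return f"<{cited_iri}> <{CITO}{cito_property}> <{citing_iri}> ."


def evidence_triples(cited, citing):
    """One cito:isCitedAsEvidenceBy statement per cited source."""
    return [typed_citation(iri, "isCitedAsEvidenceBy", citing)
            for iri in cited.values()]


if __name__ == "__main__":
    for line in evidence_triples(CITED, CITING):
        print(line)
```

Replacing "isCitedAsEvidenceBy" with "isCitedAsDataSourceBy" for selected sources would express the alternative reading mentioned above; the sketch makes no claim about which reading is the better one.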

Force: Citation refers to a likely structure, mechanism or cause behind observed phenomena. A typical example is a reference to a chemical reaction described by the cited entity. Again this type of citation function would seem to be essentially an internal affair among specialists within a field of research, but examples of outsiders making use of it also occur, as this excerpt from a medical journal illustrates:

Stratospheric ozone depletion, accompanied by increases in ambient, biologically destructive ultraviolet-B radiation,104 may exacerbate the effect of climate change on infectious diseases. Arising from a different anthropogenic process than climate change, ozone destruction is occurring primarily from reactions between ozone and halogen free radicals derived from chlorofluorocarbons, other halocarbons, and methyl bromide.105 ; ref. (105) is to

No specific object property was found in CiTO for citations referring to a likely cause, mechanism or explanatory force. A significant difference between the evidence and the force citation functions appeared in , where the 32 citations of for evidence had a median publishing year of 1975, only one year after the cited source, whereas the 26 citations of the force type appeared to be among the most long-lived, in the sample, with a median publishing year of 1984, ten years after the cited source. The sample in that study was too small to allow any definite conclusions, but the apparent difference in age distribution may not be surprising anyway. The reference to an explanatory force in the form of a chemical reaction or structure should be of such permanence that it can be expected to be found not only in articles in scientific journals, but even in textbooks.
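The age comparison above rests on a simple measure: the median publishing year of the citing documents, minus the publishing year of the cited source. A minimal sketch of that arithmetic follows. The function name is an assumption, and the two lists of years are invented placeholders chosen only so that their medians match the reported values of 1975 and 1984; they are not data from the study.

```python
from statistics import median


def median_citation_lag(cited_year, citing_years):
    """Median publishing year of the citing documents minus the cited year."""
    return median(citing_years) - cited_year


# Invented placeholder years, NOT the study's data: they are picked so that
# the medians equal the reported 1975 (evidence) and 1984 (force).
evidence_years = [1974, 1975, 1975, 1976, 1980]
force_years = [1978, 1982, 1984, 1988, 1993]

if __name__ == "__main__":
    print("evidence lag:", median_citation_lag(1974, evidence_years))
    print("force lag:", median_citation_lag(1974, force_years))
```

With a sample as small as the one described, a median is a more robust summary than a mean, since a single very late citation would not shift it much.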

Method: Citation refers to the method employed in the cited work. This does not necessarily mean that the same method is used or even advocated by the citing article, as observed in the following example:

Total ozone data were previously analyzed by a number of authors including Angell and Korshover (1973), London and Kelly [sic!] (1974) with particular interest in quantifying long-term trends. The statistical procedure commonly used in these studies is linear regression analysis (i.e. fitting a straight line) applied to adjusted total ozone values (e.g. deviations from monthly means ...). However, problems arise in the interpretation of results from these linear regression models since these models fail to take account of the positive autocorrelation that is present in the ozone data. Hence, we consider time series analysis that accounts for autocorrelation in a quantitative trend assessment of ozone data. (Tiao, 1983:460)

In CiTO, the object property relating to method, cito:usesMethodIn, presupposes that the cited method is actually used by the citing document. This is a problematic feature of CiTO: while some properties seem to be too general to distinguish between different specific citation functions, other properties, like this one, presuppose an active use or endorsement of the function expressed by the cited entity. There are of course a number of other object properties in CiTO expressing a negative evaluation of the cited entity, but these are again more general and hold no information about which function or part of the cited entity is negatively evaluated. The methodological citations in the aforementioned study were few in number, but their relatively long life might be more than just an accidental effect of the selection. If so, support could be gained from the results of , showing how a scientific paper that was formerly frequently cited for theoretical reasons, as describing the structure of collagen, suddenly ceased to be among the highly cited papers for a short time when the focus of research in the field shifted from structural studies to biosynthesis, only to reappear as one of the high-ranking cited sources a year later, but then cited rather for its methodology :127f.

Result: Citation involves an implication, viz. if information C0 contained in the cited document is true, and if furthermore conditions C1, C2, ... Cn hold good, then the consequences will be such and such. Hence, the citing article does not necessarily have to endorse a claim of truth for the cited information; the only claim is for the potential result, given the conditions described by the antecedent of the implication. Furthermore, the auxiliary conditions C1, C2, ... Cn do not have to be topically related to the cited information. The only requirement is that there must be no contradiction among them. In the aforementioned study, several instances of this type of citation appeared in articles from journals that were clearly peripheral to the field of research concerned with stratospheric ozone monitoring, coming from such disciplines as molecular biology, botany, or ophthalmology. Lacking the necessary specialist competence to assess the validity of the cited information, researchers from outside should naturally be more concerned with its implications for their own field of research. The following passage may serve as an example:

Recent studies by Cicerone (4) and Molina and Rowland (7) state that increased use of fluorocarbons in aerosols and refrigerants could severely deplete the protective layer of ozone in the stratosphere. This would increase the level of UV-B radiation reaching the earth's surface. ... The object of this study was to determine the effects of UV-B irradiation on local lesion development of Chenopodium quinoa Willd. 'Valdivia' plants inoculated with potato virus S (PVS). (Semeniuk and Goth, 1980; ref. (7) is to Molina and Rowland, 1974)

CiTO has an object property, cito:usesConclusionsFrom, that might fit this kind of citation function, but again it seems that the CiTO object property presupposes an active claim of truth for the cited information, whereas the result function described here is more neutral and conditional. In general it would be preferable to separate citation functions from evaluative judgement as clearly as possible, so that each citation function identified could be given one of three values: positive (+), negative (-) or neutral (0).
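The proposed separation can be pictured as a small data structure in which the citation function and the evaluative value are independent fields. The sketch below is one possible reading, not part of CiTO: the class and field names are assumptions, the five function labels are those used in this paper, and the evaluation assigned to each example is our own interpretation of the quoted passages. The DOIs are taken from the reference list.

```python
from dataclasses import dataclass
from enum import Enum


class Function(Enum):
    """The five citation functions discussed in this paper."""
    COMPARISON = "comparison"
    EVIDENCE = "evidence"
    FORCE = "force"
    METHOD = "method"
    RESULT = "result"


class Evaluation(Enum):
    """Evaluative value, kept separate from the citation function."""
    POSITIVE = "+"
    NEGATIVE = "-"
    NEUTRAL = "0"


@dataclass(frozen=True)
class CitationAnnotation:
    citing: str                      # identifier of the citing entity
    cited: str                       # identifier of the cited entity
    function: Function               # what the citation does
    evaluation: Evaluation = Evaluation.NEUTRAL  # how the cited item is judged

    def label(self):
        """Compact index label such as 'method(-)'."""
        return f"{self.function.value}({self.evaluation.value})"


# Semeniuk and Goth (1980) citing Molina and Rowland (1974): result, neutral.
result_example = CitationAnnotation(
    citing="doi:10.1016/0098-8472(80)90224-5",
    cited="doi:10.1038/249810a0",
    function=Function.RESULT,
)

# Tiao (1983) citing London and Kelley (1974): method, read here as negative.
method_example = CitationAnnotation(
    citing="doi:10.1080/00031305.1983.10483166",
    cited="doi:10.1126/science.184.4140.987",
    function=Function.METHOD,
    evaluation=Evaluation.NEGATIVE,
)

if __name__ == "__main__":
    print(result_example.label())
    print(method_example.label())
```

Keeping the two fields apart means that five functions and three values yield fifteen distinguishable labels without adding a separate property for each combination.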

Now, as we have seen, not all of the above examples of citation functions are directly translatable into CiTO object properties, but they nevertheless shed some light on the use of scientific information outside the discipline whence it originated. Other, possibly even more compelling, examples can probably be found where the age distance between cited and citing documents is larger, as we already saw in section 1 for Einstein (1906) and Molina and Rowland (1974).

Conclusions: indexing for knowledge export - can CiTO do the trick?

Could citation indexing with CiTO serve the purpose of knowledge export? From the examples above it appears that CiTO is not specific enough to capture the finer differences between citation functions. At the same time there seems to be some redundancy in the present version of CiTO, so having index terms that describe citation functions more accurately, while separating them from value judgments, does not necessarily imply that the number of object properties would have to grow substantially.

We have seen some instances of cross-disciplinary citations characterized by the kind of hierarchical or structural, syntagmatic relationships between citing and cited source described by Green and Bean. With the citing entity representing the user need, the topic of the user need and the topic of the cited passage are related as class and subclass, or... as class and class-member :659. This kind of type-token relationship can be expressed in citations by the provision of an instance of the class referred to. It may also appear in the form of the citation function referred to above as comparison with a structurally equivalent unit.

Structural (or syntagmatic) relationships are those where the topic of the cited passage corresponds to a component within a conceptual syntagmatic structure (...), while the topic of the user need corresponds to another component within the structure, or again, the structure at large :660. We saw an example of this relationship in the evidence function in the case of Utech and Kawano (1975) above.

The limited importance of topic matching relationships in citations was confirmed in a study by Harter et al. from the area of library and information science, in which the subject similarity among pairs of cited and citing documents was found to be very small. However, independence from topic matching may vary between disciplines. Guerrero-Bote et al. found a significant correlation between the knowledge export and import rates of different subject categories: This indicates that there are Subject Categories which are more independent, importing and exporting little knowledge, and others with greater flows of knowledge across subject boundaries. :440

Indexing citation functions is not so much about representing mental models or capturing the original intention of the citing author, but rather about describing the actual and potential use - past, present and future - of document contents. It is essential, then, to look at both sides of the citation relationship simultaneously: the citing entity and the cited source. A combination of citation functions and subject headings, extracted from both citing and cited entities, might offer even better prospects for knowledge export and provide researchers and readers with new context, adding new relevance to old documents and opening new opportunities for evidence mining. What is needed is a proper test of the capability of an indexing system of citation functions like CiTO, possibly revised and revamped, to serve as a discovery tool across scientific disciplines. Preparation for such a test could perhaps start by indexing a sample of outside 'princes' who have awakened some of those long 'sleeping beauties', and then have a panel of independent researchers, unaware of her history, find their way to la Belle au bois dormant.
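How a combined index of citation functions and subject headings might support discovery can be sketched with a few of the citation pairs discussed in this paper. Everything in the example is an assumption made for illustration: the record layout, the function names, and in particular the subject headings, which are invented and not taken from any database. The last record is a purely hypothetical within-field citation added as a contrast.

```python
# Each record pairs a citation function with subject headings from BOTH the
# citing and the cited entity. All headings are invented for illustration.
INDEX = [
    {"citing": "Mastenbrook and Oltmans 1983", "cited": "London and Kelley 1974",
     "function": "comparison",
     "citing_subjects": {"stratospheric water vapor"},
     "cited_subjects": {"stratospheric ozone"}},
    {"citing": "Tiao 1983", "cited": "London and Kelley 1974",
     "function": "method",
     "citing_subjects": {"time series analysis"},
     "cited_subjects": {"stratospheric ozone"}},
    {"citing": "Semeniuk and Goth 1980", "cited": "Molina and Rowland 1974",
     "function": "result",
     "citing_subjects": {"plant virology"},
     "cited_subjects": {"stratospheric ozone", "atmospheric chemistry"}},
    # Hypothetical within-field citation, included only as a contrast case.
    {"citing": "Hypothetical ozone paper", "cited": "Molina and Rowland 1974",
     "function": "evidence",
     "citing_subjects": {"stratospheric ozone"},
     "cited_subjects": {"stratospheric ozone", "atmospheric chemistry"}},
]


def is_knowledge_export(record):
    """True when citing and cited entities share no subject heading."""
    return record["citing_subjects"].isdisjoint(record["cited_subjects"])


def discover(index, function=None):
    """List cross-disciplinary citation pairs, optionally for one function."""
    return [f'{r["citing"]} -> {r["cited"]}'
            for r in index
            if is_knowledge_export(r)
            and (function is None or r["function"] == function)]


if __name__ == "__main__":
    for pair in discover(INDEX):
        print(pair)
```

Treating the absence of shared headings as the marker of knowledge export is a deliberate simplification of the non-topical relationships discussed earlier; a real test would need a controlled vocabulary and a less crude notion of disciplinary distance.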

The indexing scheme resulting from a conclusive test should be sufficiently easy to use, so that virtually anyone who reads, writes and cites would be able to contribute to the indexing effort. Online publishers of scientific journals, managers of digital repositories like JSTOR, and existing citation indexes like the Web of Science and CiteSeerX could make it happen by means of crowd-sourcing from their users. Ideally, tagging a scientific article online with citation functions from a controlled index language should be only slightly more complicated than liking a post on social media.

References

  1. Bammer, G. Change! Combining Analytic Approaches with Street Wisdom. Australian National University. ISBN: 9781925022650 (ebook).

  2. Ciancarini, P., Di Iorio, A., Nuzzolese, A., Peroni, S., Vitali, F. (2014). Evaluating citation functions in CiTO: cognitive issues. http://dx.doi.org/10.1007/978-3-319-07443-6_39

  3. Day, S.B., Goldstone, R.L. (2012). The import of knowledge export: connecting findings and theories of transfer of learning. Educational psychologist, 47:3, 153-176. http://dx.doi.org/10.1080/00461520.2012.696438

  4. Ding, Y., Zhang, G., Chambers, T., Song, M., Wang, X., Zhai, C. (2014). Content-based citation analysis: the next generation of citation analysis. Journal of the Association for Information Science and Technology, 65:9, 1820-1833. http://dx.doi.org/10.1002/asi.23256

  5. Garfield, E. (1979). Citation indexing: its theory and application in science, technology, and humanities. ISBN: 0-471-02559-3

  6. Gilbert, G.N. (1977). Referencing as persuasion. Social Studies of Science, 7: 113–122. http://dx.doi.org/10.1177/030631277700700112

  7. Green, R., Bean, C.A. (1995). Topical relevance relationships: I. why topic matching fails; II. an exploratory study and preliminary typology. Journal of the American Society for Information Science, 46:9, 646-662. http://dx.doi.org/10.1002/(SICI)1097-4571(199510)46:9<646::AID-ASI2>3.0.CO;2-1 ; http://dx.doi.org/10.1002/(SICI)1097-4571(199510)46:9<654::AID-ASI3>3.0.CO;2-3

  8. Guerrero-Bote, V., Zapico-Alonso, F., Espinosa-Calvo, M., Gómez-Crisóstomo, R., Moya-Anegón, F. (2007). Import-export of knowledge between scientific subject categories: the iceberg hypothesis. Scientometrics, 71:3, 423-441. http://dx.doi.org/10.1007/s11192-007-1682-3

  9. Harter, S. P., Nisonger, T. E., Weng, A. (1993). Semantic relationships between cited and citing articles in library and information science journals. Journal of the American Society for Information Science, 44:9, 543-552. http://dx.doi.org/10.1002/(SICI)1097-4571(199310)44:9<543::AID-ASI4>3.0.CO;2-F

  10. Ke, Q., Ferrara, E., Radicchi, F., Flammini, A. (2015). Defining and identifying Sleeping Beauties in science. PNAS, 112:24, 7426–7431. http://dx.doi.org/10.1073/pnas.1424329112

  11. Levitt, J.M., Thelwall, M. (2009). The most highly cited Library and Information Science articles: interdisciplinarity, first authors and citation patterns. Scientometrics, 78:1, 45-67. http://dx.doi.org/10.1007/s11192-007-1927-1

  12. Lipetz, B-A. (1965). Improvement of the selectivity of citation indexes to science literature through inclusion of citation relationship indicators. American documentation, 16:2, 81-90.

  13. Liu, M. (1993). Progress in documentation - the complexities of citation practice: a review of citation studies. Journal of documentation, 49:4, 370-408. http://dx.doi.org/10.1108/eb026920

  14. London, J., Kelley, J. (1974). Global trends in total atmospheric ozone. Science, 184:4140, 987-989. http://dx.doi.org/10.1126/science.184.4140.987

  15. Mastenbrook, H.J., Oltmans, S.J. (1983). Stratospheric water vapor variability for Washington, DC/Boulder, CO: 1964-82. Journal of the Atmospheric Sciences, 40:9, 2157–2165. http://dx.doi.org/10.1175/1520-0469(1983)040<2157:SWVVFW>2.0.CO;2

  16. Moed, H. F. (2005). Citation analysis in research evaluation. Dordrecht: Springer. ISBN: 9781402037146 (ebook). http://dx.doi.org/10.1007/1-4020-3714-7

  17. Molina, M.J., Rowland, F.S. (1974). Stratospheric sink for chlorofluoromethanes: chlorine atom catalysed destruction of ozone. Nature, 249:5460, 810-812. http://dx.doi.org/10.1038/249810a0

  18. Patz, J.A., Epstein, P.R., Burke, T.A., Balbus, J.M. (1996). Global climate change and emerging infectious diseases. JAMA, 275:3, 217-223. http://dx.doi.org/10.1001/jama.1996.03530270057032

  19. Philipson, J. (1996). The relevance of citations: a case study of stratospheric ozone monitoring. ISSN: 1401-5358 http://hdl.handle.net/2320/13707

  20. Semeniuk, P., Goth, R.W. (1980). Effects of ultraviolet irradiation on local lesion development of potato virus S on Chenopodium Quinoa 'Valdivia' leaves. Environmental and experimental botany, 20:1, 95-98. http://dx.doi.org/10.1016/0098-8472(80)90224-5

  21. Shotton, D., Peroni, S., Ciccarese, P., Clark, T. (2015). CiTO, the Citation Typing Ontology. http://purl.org/spar/cito/

  22. Small, H.G. (1977). A co-citation model of a scientific specialty : a longitudinal study of collagen research. Social studies of science, 7: 139-166. http://www.jstor.org/stable/284873

  23. Tiao, G.C. (1983). Use of statistical methods in the analysis of environmental data. American statistician, 37:4b, 459-470. http://dx.doi.org/10.1080/00031305.1983.10483166

  24. Teufel, S., Siddharthan, A., Tidhar, D. (2006). Automatic classification of citation function. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), 103–110. http://www.aclweb.org/anthology/W/W06/W06-1613.pdf

  25. Utech, F. H., Kawano, S. (1975). Spectral polymorphisms in angiosperm flowers determined by differential ultraviolet reflectance. The botanical magazine = Shokubutsu-gaku-zasshi, 88:1, 9-30. http://dx.doi.org/10.1007/BF02498877

  26. Van Raan, A.F.J. (2004). Sleeping beauties in science. Scientometrics, 59:3, 467-472. http://dx.doi.org/10.1023/B:SCIE.0000018543.82441.f1