Roger Clarke's 'Web of Science Revisited'

Roger Clarke's Web-Site

© Xamax Consultancy Pty Ltd, 1995-2024

The Web of Science Revisited:
Is it a Tenable Source for the Information Systems Discipline
or for eCommerce Researchers?

Review Version of 29 August 2012

Roger Clarke and Andreja Pucihar **

Available under an AEShareNet licence or a Creative Commons licence.

This document is at http://www.rogerclarke.com/SOS/WoSRev.html

Abstract

Only about 30% of the eCommerce literature is indexed by the Web of Science. Use of the service is therefore seriously detrimental to the practice of research, and to performance evaluation of researchers active in that research domain.

For the IS discipline as a whole, the coverage of the Web of Science is similarly low. Even within the disciplinary core, the coverage of about 40% estimated in 2008 has grown by only a few percentage points during the last 4 years. Only two-thirds of the top 30 journals are indexed at all, and in most of those cases many early Volumes are excluded. It would appear that about 9,000 IS papers are indexed in WoS, but that about 12,000 papers from the top 30 journals alone are excluded from it, and perhaps a further 10,000 papers from other IS journals and perhaps 10,000 papers from conferences.

It is very much against the interests not only of eCommerce researchers, but of the IS discipline as a whole, for a citation-indexing service to be treated as important when it includes only a very small percentage of the discipline's corpus, and only a small percentage even of the disciplinary core, and hence provides information that is misleading for research and performance evaluation alike.

1. Introduction
2. Method
3. WoS Coverage of eCommerce Publishing Venues
4. WoS Coverage of the IS Core
5. The Deficiencies of WoS
6. Google Scholar as an Alternative to WoS
7. Conclusions
References

1. Introduction

The electronic publishing era has created new possibilities for academics. A highly valuable impact has been greatly improved scope for researchers to discover prior work in their research areas. Another, more contentious application has been the evaluation of academic performance through the counting of individuals' publications and citations, weighted by some indicator of the quality of the venue in which the publications appear.

The offerings of commercial publishers such as Elsevier, Blackwell, Kluwer, Springer and Taylor & Francis are largely limited to the database of publications over which each of them exercises control. Two services, however, currently have scope that extends across multiple publishers' lines.

A commercial service that is currently branded Web of Science (WoS) offers subscription-based access to an index of academic citations. It originated in the 1960s as the Science Citation Index (SCI), and during the 1980s was merged with other services into the Institute for Scientific Information (ISI). It has been Web-accessible since 1997, and is currently owned by Thomson Reuters. The company sets barriers that have to be overcome before a publishing venue is accepted. The barriers evidence strong bias in favour of long-established, commercial publishers, and against recently-commenced venues, independent journals, and conferences generally.

The other service is Google Scholar (GS), which has been openly accessible since it was established in 2004. GS has a very wide field of view, comprising some sub-set of the Google search-engine's catchment, presumably defined on the basis of documents that contain lists of references to other publications.

In CAIS 22, 1, the WoS service was found to be an inappropriate basis for citation analysis in the Information Systems (IS) discipline, because "only 6 of the 30 core IS journals [were at that time] included in their entirety, with a further 9 included in part ... [with the result that it] encompasse[d] only about 40% of the [core] publication-set. In addition, it contain[ed] significant errors even within that limited collection, and the company continue[d] to resist submissions to upgrade its holdings" (Clarke 2008a).

During 2012, further analyses were performed by the authors as part of a project reviewing electronic interaction research over the last quarter-century. The project was stimulated by the 25th anniversary of the Bled eConference, and the results are reported in Clarke & Pucihar (2012). The present paper builds on the data arising from those analyses in order to update the 2008 evaluation of the Web of Science. The paper commences by outlining the method adopted, presents the results in relation to the eCommerce research domain, presents updated results in relation to the IS discipline's core, and draws conclusions from the data.

2. Method

The conduct of a 25-year retrospective analysis of a research domain within IS requires a broad scope definition, in order to encompass the considerable changes that have occurred during that time. The term 'electronic interaction' has been the unifying theme of the Bled eConference since 2004. That term's scope is suitably broad, extending from electronic data interchange (EDI), via the many elements of eBusiness including B2B, B2C and C2C eCommerce, eGovernment, eHealth, etc., and on to other economically oriented topics such as inter-organisational systems, mobile commerce, 'Web 2.0' and social media, together with applications in the social dimension, for communities, groups and individuals. The analyses conducted in support of the primary paper (Clarke & Pucihar 2012) focussed in particular on the major topic-areas of EDI and eCommerce, but with consideration also given to more recent topic-areas such as mobile commerce and social media.

For the present discussion, a narrower view is appropriate. The peak of interest in EDI long pre-dates ePublishing and the emergence of databases of publications that could be subjected to analysis. Some other topics, on the other hand, are still too recent to support detailed analysis. This paper accordingly concerns itself primarily with a topic-area that was contemporaneous with ePublishing and that is already mature - eCommerce.

The Web of Science was accessed in July-August 2012 by both researchers through the subscriptions of their respective host universities. The primary search tool used was the Search facility. This requires experimentation in order to discover an appropriate way to isolate targeted venues. Some use was also made of the Cited Reference Search. In addition, documents were consulted on the corporate web-site at thomsonreuters.com.

The analysis focussed on the 'engine-room' that is afforded by the specialised venues concerned with eCommerce. These represent the gathering-point for authors, editors, reviewers and readers who have an interest in specific technologies and their applications, impacts and implications, rather than a primary desire to score publication points. The following section identifies the specialist venues, and evaluates the extent to which the Web of Science services the needs of eCommerce researchers. A later section then applies a similar analysis to the venues at the core of the IS discipline.

3. WoS Coverage of eCommerce Publishing Venues

The Web of Science was examined in order to establish which Issues of relevant journals and which occurrences of relevant conference series were indexed. The relevance of a venue was defined by the presence in the venue's title of 'eCommerce' or an equivalent term. The WoS service's 'Journal Search' feature (TS 2012c) has limited functionality, and a series of searches was necessary, in each case sorting the results in order of date of publication, in order to identify the start- and end-points of coverage. Table 1 presents the results. Venues are listed alphabetically within category. The 'A Journal' classifications are based on the Excellence in Research for Australia (ERA) listings (Lamp 2010), and are conventional but not necessarily authoritative.

Table 1: WoS Coverage of Major eCommerce Venues

Table 2 summarises the results. Assumptions were made as follows: that there are only 4 journal articles per Issue for excluded journals cf. 6 for WoS-indexed journals, and that there are on average only 20 papers per set of conference proceedings. Even on that conservative basis, WoS would appear to index only about 30% of the eCommerce literature, comprising rather less than half of relevant journal articles and a much smaller percentage of conference papers.

Table 2: Summary of WoS Coverage of the eCommerce Literature
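
The estimation procedure described above can be sketched in Python. This is a minimal illustration only, not the authors' actual calculation: the function name estimate_coverage and the Issue and proceedings counts in the usage lines are hypothetical placeholders (the real counts are those underlying Table 1). Only the three per-Issue and per-proceedings assumptions (6, 4 and 20) come from the text.

```python
# Assumptions stated in the text.
ARTICLES_PER_INDEXED_ISSUE = 6    # journals indexed by WoS
ARTICLES_PER_EXCLUDED_ISSUE = 4   # journals excluded from WoS
PAPERS_PER_PROCEEDINGS = 20       # each set of conference proceedings


def estimate_coverage(indexed_issues, excluded_issues,
                      indexed_proceedings, excluded_proceedings):
    """Return the estimated fraction of papers that WoS indexes."""
    indexed = (indexed_issues * ARTICLES_PER_INDEXED_ISSUE
               + indexed_proceedings * PAPERS_PER_PROCEEDINGS)
    excluded = (excluded_issues * ARTICLES_PER_EXCLUDED_ISSUE
                + excluded_proceedings * PAPERS_PER_PROCEEDINGS)
    total = indexed + excluded
    if total == 0:
        raise ValueError("no Issues or proceedings supplied")
    return indexed / total


# Hypothetical counts, for illustration only (not the Table 1 data).
example_share = estimate_coverage(indexed_issues=100, excluded_issues=150,
                                  indexed_proceedings=0, excluded_proceedings=10)
print(f"Estimated coverage: {example_share:.0%}")
```

Assigning fewer articles per Issue to the excluded journals reduces the estimated size of the excluded literature, and so pushes the coverage estimate upwards. That is the sense in which the basis is conservative.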

The above analysis is of aggregated data. In order to understand the implications of the exclusion of a large proportion of eCommerce publications from the WoS index, it is necessary to consider the effect on users of the service.

If a researcher depends on WoS as a means of discovering relevant prior publications, a substantial number of important publications will be missed. From the performance-evaluation perspective, on the other hand, dependence on WoS will result in many individuals' impacts being under-estimated, in some cases greatly so.

In order to gain further insight into the extent of the problem, a sample of eCommerce papers was examined. This analysis showed that:

a great many papers in EDI and eCommerce that have in excess of 100 citations in Google Scholar (mostly in venues with standing and relevance) are not indexed by the WoS database
even for those papers that are in WoS, the citation-counts declared by WoS are on average only 25% of those discovered using GS, with a range from 17% to 39%

The WoS Cited Reference Search facility provides citation-counts for non-indexed papers that have been cited in indexed papers - although even more care is needed in using this mechanism than is required in extracting data from Google Scholar. For the most highly-cited paper from the Bled eConference Proceedings, WoS shows only 29% of the citation-count shown by GS. For the second- and third-highest, however, the figures were only 10% and 14%. For multiple Bled Conference papers with 25-70 citations in GS, WoS not only found zero in its main search, but also zero in its Cited Reference Search.

The problems identified above need to be seen within a broader context. During the last two decades, a strong bias has been evident in the eCommerce literature towards theoretically-oriented 'A*' and 'A' journals preferred by Deans of US Business Schools, and away from more relevant, specialist venues. The bias appears to affect both researchers' choices of venues to which to submit papers, and their choices of publications to cite in their papers.

This pattern afflicts eCommerce publications. Searches of the WoS index identified 52 eCommerce publications with more than 100 citations. Of these, only 4 (8%) were in eCommerce venues (all in IJEC). As a control, the test was repeated on Google Scholar. With its larger catchment area, it finds 187 publications with more than 100 citations. But even of these, only 33 (18%) are in eCommerce venues - AMEC (2), ECR (1), ECRA (1), EM (2), IJEC (16), JECR (2), JOCEC (1), ISeB (1), ACM Conference (6) and IEEE Conference (1).
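
The percentages quoted above can be re-derived from the counts given in the text. The following Python sketch does no more than that arithmetic; the variable names are illustrative.

```python
# Highly-cited (>100 citations) publications in eCommerce venues,
# per venue, as found via Google Scholar (counts taken from the text).
gs_ecommerce_venues = {
    "AMEC": 2, "ECR": 1, "ECRA": 1, "EM": 2, "IJEC": 16, "JECR": 2,
    "JOCEC": 1, "ISeB": 1, "ACM Conference": 6, "IEEE Conference": 1,
}

wos_total, wos_in_venues = 52, 4   # WoS figures from the text
gs_total = 187                     # GS figure from the text
gs_in_venues = sum(gs_ecommerce_venues.values())

wos_share = wos_in_venues / wos_total
gs_share = gs_in_venues / gs_total

print(f"WoS: {wos_in_venues} of {wos_total} = {wos_share:.0%}")
print(f"GS:  {gs_in_venues} of {gs_total} = {gs_share:.0%}")
```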

A further element of the project assessed the extent to which the refereed literature has contributed to progress in EDI and eCommerce. The conclusion reached was that the vast majority of the thousands of papers - including most of the highly-cited papers, and even those written in a manner accessible to professionals, managers and/or executives - were published too late to have any useful impact on early adopters or even on the early majority (Clarke & Pucihar 2012).

A very high proportion of publications on eCommerce topics appears to have been intended to satisfy institutional performance evaluation criteria, and has been addressed to theoreticians and not to practitioners. Moreover, those publications primarily cite other theoretical papers, and pay little attention to (or at least give little credence to) the specialist literature.

The interplay between the inadequacy of the Web of Science catchment and the bias in citation towards core journals works strongly against the interests of researchers specialising in eCommerce. It appears likely that the same pattern would apply to many other research domains that have high relevance to practitioners.

4. WoS Coverage of the IS Core

Given the serious inadequacies of WoS in relation to an important research domain within the scope of the IS discipline, the authors extended the study to the IS discipline as a whole. Table 1 above showed the extent to which the primary eCommerce venues are indexed in WoS. Table 3 below shows the results for the primary, generalist journals and conferences, which constitute the core of the theoretically-oriented zone within the IS discipline. Given the strong bias inherent in WoS policies towards academic journals, it is reasonable to expect that this zone would be much better covered than the venues with stronger orientation to research domains and real-world relevance.

To enable assessment of the changes over the last four years, the journals selected to represent the core were the same as those used in Clarke (2008a). Table 3 replicates the structure of Table 1 of the earlier paper, adding further columns that show the coverage in August 2012 and that enable estimation of the percentage of each journal's content that is indexed by WoS.

Table 3: WoS Coverage of the IS Disciplinary Core

Some progress is evident. In particular, several journals have been added - JOCEC from 1999, JGIM from 2005, JAIS from 2006 (all three apparently with at least some degree of retrospective effect), EM from 2009, IT&P from 2009 and JOEUC from 2010. In addition, MISQ appears to have had the three Volumes for 1981-83 added retrospectively. On the other hand, several journals have gone backwards - DSS seems to have lost 5 Volumes 1985-1990, IJEC 3 Volumes 1996-99, JSIS 3 Volumes 1992-94, and Data Base seems to have disappeared completely.

The situation in mid-2012 is that 10 of the 31 core IS journals are not indexed by WoS. Even within the 21 that are indexed, the dates of commencement vary, and a total of 220 Volumes are excluded. The number of Issues per Volume, and the number of papers per Issue, vary across journals, and vary within each journal over time. If the simplifying assumption is made that the number of papers per Volume is uniform, then even among the indexed journals only about 70% of the papers are indexed, and if the four major reference-discipline journals are excluded from the calculation the figure is only about 60%.

Looking at the complete set of 31 journals, however, the coverage is far worse. Only 55% of the papers in the disciplinary core are indexed by WoS. If the four reference-discipline journals are excluded, the figure drops to 43%.

In the case of the top dozen generalist journals, only about 50% of the papers are indexed. Among the specialist and major regional journals, it is only about 40%. In considering the gravity of the situation, it has to be remembered that these figures are for the core alone, i.e. the theoretically strongest 30 of about 100 journals. The remaining 70 journals are almost entirely excluded from the WoS index. Almost all conference proceedings are also excluded, including the 32 Volumes of the International Conference on Information Systems (ICIS).

Another way of looking at the problem is to consider the number of IS papers that are, and are not, included in the WoS index. Excluding the four reference-discipline journals, and making the simplifying assumption of 30 papers per Volume uniformly across all journals, about 9,000 IS papers are indexed in WoS, about 12,000 papers from the top 30 journals alone are excluded from it, and perhaps a further 10,000 papers from other IS journals and perhaps 10,000 papers from conferences are excluded as well. The additions to the index are running at about 600 p.a. New papers that are excluded from the index are of the order of four times that figure - 300 p.a. additions to the top 30 journals that are excluded from WoS, perhaps 1,000 further papers in lower-level, sub-discipline and domain-specific journals, and perhaps 1,000 conference papers.
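
The relationships among these estimates can be checked with a few lines of Python. The inputs are the rounded figures given in the text. The corpus-wide share is merely derived from those rough estimates and should be read as indicative, not as a measured result.

```python
# Stock of papers (rounded estimates from the text).
indexed = 9_000
excluded_top30 = 12_000
excluded_other_journals = 10_000
excluded_conferences = 10_000

# Coverage of the core (top 30 journals, reference disciplines excluded).
core_coverage = indexed / (indexed + excluded_top30)

# Coverage of the wider corpus, using the text's rough estimates.
corpus_total = (indexed + excluded_top30
                + excluded_other_journals + excluded_conferences)
corpus_coverage = indexed / corpus_total

# Annual flows (estimates from the text).
annual_indexed = 600
annual_excluded = 300 + 1_000 + 1_000
flow_ratio = annual_excluded / annual_indexed

print(f"Core coverage:   {core_coverage:.0%}")
print(f"Corpus coverage: {corpus_coverage:.0%}")
print(f"Excluded per indexed paper, p.a.: {flow_ratio:.1f}")
```

The core figure agrees with the 43% reported earlier for the journals excluding the four reference-discipline journals, and the flow ratio is consistent with 'of the order of four times'.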

Further experiments with the additional WoS facility called 'Cited Ref Search' re-affirm how unsatisfactory the service is. Well-published authors have large numbers of papers that are excluded from the WoS index, but are cited in WoS-indexed papers. Moreover, the citation-counts for some of these papers are high, and in some cases the individual has far more citations for these hidden papers than for their publications that are indexed in WoS. Many users are likely to be unaware of this secondary search-facility, and unaware of the vast numbers of citations hidden in it.

A very high proportion of papers, even in A-journals, are destined to be never-cited and largely unread. On the other hand, a significant number of the un-indexed papers will prove to be important to at least some researchers, and some will garner very substantial citation-counts. It is very much against the interests of the IS discipline for a citation-indexing service to be treated as important when it includes a very small percentage of the discipline's corpus, and only a small percentage even of the disciplinary core, and provides misleading information to researchers and to institutional decision-makers.

5. The Deficiencies of WoS

The conclusions drawn above may appear strong, given the standing that the Web of Science is accorded in some parts of academe. This section accordingly summarises the factors that give rise to the problem.

The process used to select venues for indexing in WoS is explicitly targeted at exclusivity, not inclusiveness (TS 2012a). The justification provided invokes Bradford's Law: "a relatively small number of journals publish the majority of significant scholarly results". This justification is specious, or, at best, is relevant only to 'normal science' (in the Kuhnian sense). All disciplines face considerable turbulence during a period of 'paradigm shift', and the well-established journals are inherently less likely to publish the critical papers that herald the change, and drive it through. The IS discipline, in its 45-year history to date, has yet to reach a point of stability, and argues intensely about whether a dominant paradigm exists, and, if so, how it should be expressed. The discipline embodies great diversity, and it is subject to continual revolution and redefinition, driven variously by technology, by intellectual developments, and by fashion (Clarke 2008b).

Moreover, the declared policy of Thomson Reuters in relation to the Web of Science is that "Only current and forthcoming issues are considered in the evaluation. Please do not send back issues; they will not be accepted" (TS 2012b, emphasis in original), i.e. recognition of worth is not retrospective. The result of the WoS approach is that many major journals of relevance to IS generally, and to electronic interaction research in particular, are missing, or have been taken up only from recent dates and without any retrospectivity. For some senior IS researchers, the proportion of their publications that are indexed by WoS is as low as 10-20%.

A further consideration is that "journals are ... deleted from Web of Science throughout the year" (TS 2012a). This represents historical revisionism, with publications and citations being effectively cleansed from the record. The collection is of no value at all for any form of historical research. Further, publications and citation-counts are not cumulative, because they change not only upwards, as new documents are published, but also downwards, as venues are deleted. The selection criteria reek of the mores of the Victorian era, with timeliness of appearance and similar 'virtues' valued far above usefulness to researchers.

Clarke (2008a) also noted a disturbing incidence of data capture errors, omission of papers that according to WoS' own policies should be within-scope, mis-treatment of some unrefereed contributions as though they were refereed, mis-handling of diacritics and hyphens in names, and difficulty coping with name-variants. The experiences gathered during the conduct of the current project suggest that those problems remain.

More fundamentally than the error-content, however, the WoS selection policy is attuned to stasis, which is completely at odds with the realities of the IS discipline and the research domains it fosters, and is harmful to rather than supportive of academics and practitioners working in the area.

6. Google Scholar as an Alternative to WoS

A defence of WoS might be mounted along the lines of 'it's all we've got to work with, and it's better than nothing at all'. For such a defence to be sustainable, however, it would need to be demonstrated that use of Google Scholar for research and performance evaluation is inappropriate.

Google Scholar is publicly available rather than behind a paywall. On the other hand, it remains an unstable 'beta', and it has no discernible business model, i.e. it is dependent on cross-subsidy from 'cash cows' elsewhere in the Google product portfolio, and on ongoing belief by the company that it can be monetised somehow, someday. It is therefore subject to withdrawal without notice.

The catchment area of GS is far greater than that for WoS, which results in citations being counted not only in high-grade publications but also in lower-grade works including in unrefereed venues. On the other hand, even those citations have some value; and WoS seriously under-counts, because it excludes a vast array of relevant venues, including back-issues of journals and conferences that it has commenced indexing only after many years of complaints and representations.

GS has an unreliable search-engine whose operations are optimised (and continually re-optimised) for purposes unrelated to academic activities. It uses a precedence algorithm that is determined by the provider, is changed frequently and without notice, is not transparent, and is not manipulable by the user. This forces researchers to have great patience, and precludes reliable automation. On the other hand, the error-rate in WoS searching is also remarkably high, even within its much smaller collection. Authors and conferences need to be searched for using many variants of their names, and even then inconsistent results are achieved.
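
The need to search under many name-variants can be eased by generating the variants systematically. The Python sketch below is one possible approach, not a feature of WoS or GS: the function names and the example name are hypothetical, and the variant rules (diacritics stripped, hyphens kept, spaced or dropped, given names in full or as initials) are assumptions about what a searcher might try.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining accents removed (e.g. 'č' -> 'c')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_variants(surname, given_names):
    """Return a sorted list of plausible search strings for an author."""
    if not surname or not given_names.split():
        raise ValueError("surname and given names are both required")
    surnames = {surname, strip_diacritics(surname)}
    for s in list(surnames):
        if "-" in s:
            surnames.add(s.replace("-", " "))  # hyphen indexed as a space
            surnames.add(s.replace("-", ""))   # hyphen dropped entirely
    initials = "".join(part[0] for part in given_names.split())
    givens = {given_names, initials, initials[0]}
    return sorted(f"{s} {g}" for s in surnames for g in givens)


# Hypothetical author name, for illustration only.
for variant in name_variants("Novak-Kovač", "Ana Marija"):
    print(variant)
```

Each variant would still have to be run as a separate search, and the results merged and de-duplicated by hand.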

Much of the data that GS extracts is of even lower quality than that depended upon by WoS. On the other hand, the GS 'My Citations' feature that was added in 2011 enables authors to declare ownership of publications and generate a publications profile that includes all items that GS has within its database, together with key metrics. WoS has given notice of its intention to develop author-profiles built around a 'Researcher-Id', but in August 2012 the facility was non-functional.

As demonstrated in Clarke (2008a), and confirmed in this paper, GS is currently a far more appropriate tool for the IS discipline than WoS.

7. Conclusions

The analysis reported in this paper demonstrates very clearly that the Web of Science is highly inappropriate as a tool for researchers working in the research domain of eCommerce, and hence as a tool for managers evaluating such researchers. It appears very likely that the Web of Science is just as inappropriate in a wide range of other research domains of relevance to the IS discipline.

In relation to the IS discipline as a whole, even the discipline's theoretical core is very poorly represented in the WoS index. The conclusion reached in Clarke (2008a) remains valid in 2012: "[WoS] generates erroneous and seriously misleading results for IS researchers". Moreover, because of the policies adopted by the operator of WoS in relation to its catchment, it appears to be highly likely that the serious deficiencies of WoS are a permanent feature.

Google Scholar has problems, but it is currently far more effective than WoS for eCommerce, for other research domains, and for the IS discipline as a whole.

A further observation arising from the analysis is that there is a cleft within the IS discipline. The rigour-driven core of the discipline involves academicians talking amongst themselves in ways that are not relevant to IS practitioners, nor intended to be understandable by them. In the specialist venues, on the other hand, researchers are much more focussed on problems that have relevance to the real world, and reach out to practitioners, managers and executives.

References

Clarke R. (2008a) 'An Exploratory Study of Information Systems Researcher Impact' Commun. AIS 22, 1 (January 2008), PrePrint at http://www.rogerclarke.com/SOS/Cit-CAIS.html

Clarke R. (2008b) 'A Retrospective on the Information Systems Discipline in Australia' Chapter 2 of Gable G.G. et al. (2008) 'The Information Systems Discipline in Australia' ANU e-Press, 2008, pp. 47-107, at http://epress.anu.edu.au/info_systems_aus/pdf_instructions.html, PrePrint at http://www.rogerclarke.com/SOS/AISHist.html

Clarke R. & Pucihar A. (2012) 'Electronic Interaction Research 1988-2012 through the Lens of the Bled eConference' Forthcoming, Electronic Markets 22, 4 (December 2012), PrePrint at http://www.rogerclarke.com/EC/EIRes-Bled25.html

Lamp J. (2010) 'ERA Outlet Rankings Access' Deakin University, at http://lamp.infosys.deakin.edu.au/era/?page=hmain

TS (2012a) 'The Thomson Reuters Journal Selection Process' Thomson Reuters, May 2012, at http://thomsonreuters.com/products_services/science/free/essays/journal_selection_process/ [accessed 23 August 2012]

TS (2012b) 'Journal Submission Process' Thomson Reuters, at http://ip-science.thomsonreuters.com/mjl/selection/ [accessed 23 August 2012]

TS (2012c) 'Journal Search' Thomson Reuters, at http://ip-science.thomsonreuters.com/cgi-bin/jrnlst/jloptions.cgi?PC=master [accessed 23 August 2012]

TS (2012d) 'Master Journal List' Thomson Reuters, at http://ip-science.thomsonreuters.com/cgi-bin/jrnlst/jloptions.cgi?PC=master [accessed 23 August 2012]

Author Affiliations

Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., and a Visiting Professor in the Research School of Computer Science at the Australian National University.

Andreja Pucihar is an Associate Professor in the Faculty of Organizational Sciences, on the Kranj campus of the University of Maribor, Slovenia, and a member of the Faculty's eCenter. She has been involved in the Bled eConference since 1995, and has been Conference Chair since 2009.

Created: 17 August 2012 - Last Amended: 29 August 2012 by Roger Clarke - Site Last Verified: 15 February 2009