Roger Clarke's Web-Site
© Xamax Consultancy Pty Ltd, 1995-2017
Roger Clarke **
Revised Version of 20 November 2007
Published in Commun. AIS 22, 1 (January 2008), at http://cais.aisnet.org/articles/default.asp?vol=22&art=1
© Xamax Consultancy Pty Ltd, 2007
Available under an AEShareNet licence or a Creative Commons licence.
This document is at http://www.rogerclarke.com/SOS/Cit-CAIS.html
The previous version is at http://www.rogerclarke.com/SOS/CitAnal0707.html
Citation-counts of refereed articles are a potentially valuable measure of the impact of a researcher's work, in the information systems discipline as in many others. Citation counts can be generated from a number of data collections, including Thomson's ISI database and Google Scholar.
This paper reports on an exploratory study of the apparent impact of IS researchers, as disclosed by citation-counts of their works in those two collections. Citation analysis using currently available databases is found to be fraught with many serious problems, particularly if the ISI collection is used.
Unless these problems are appreciated and addressed, IS researchers will be under-valued by those with authority over research funding and employment, to the serious detriment of the IS discipline.
The Thomson/ISI database is being increasingly used as a basis for citation analysis, and for generating measures of the academic impact of individual IS researchers.
The Thomson/ISI collection encompasses only about 40% of the publication-set that is relevant to the IS discipline. In addition, it contains significant errors even within that limited collection, and the company continues to resist submissions to upgrade its holdings.
Use of the Thomson/ISI database may be appropriate for some disciplines, but it currently generates erroneous and seriously misleading results for IS researchers. At present, there are only limited prospects of the quality of the data being improved.
Google Scholar has access to a much more substantial proportion of the relevant publication-set. Its holdings and services are not sufficiently transparent, but the product appears to have reached a level of maturity, and it appears to be a more appropriate basis for citation analysis.
Citation analysis can produce many impact measures, which have various advantages and disadvantages. A pair of measures that may represent a fair compromise is the so-called `h-index', supplemented by the `h-count'. A researcher scores an h-index of 15 if they have 15 articles that each have at least 15 citations. The h-count is the total citation-count for those 15 articles.
The Association for Information Systems (AIS) needs to take steps to avoid ISI-based citation analysis causing serious harm to the discipline's access to research funding and senior academic positions.
Information systems (IS) is a maturing discipline, with a considerable specialist literature, and relationships with reference disciplines that are now fairly stable and well-understood. In a mature discipline, various forms of 'score-keeping' are undertaken. One reason for this is as a means to distinguish among applicants for promotion, and contenders for senior appointments. A further application of score-keeping is as a factor in the allocation of resources to support research. In some countries, this second application is increasingly significant.
One approach to score-keeping is to count the number of works that a researcher publishes, and treat it as a measure of research quantum. The count of works may be moderated by the time-span over which they were published, the categories of publication-venues (such as books, conference papers and journal articles), and the quality of the publication-venues. A count moderated in this way represents a measure of research quality rather than quantity. Yet another approach is to count the number of citations of the publications, in order to generate an indicator of the researcher's impact.
This paper performs an analysis of citations of IS researchers, in order to examine the extent to which currently available collections provide satisfactory measures of researcher impact. It is motivated by the concern that, whether or not such analyses are performed by members of the IS discipline, others will perform them for us. For example, in the U.S.A., Deans of graduate schools of business are understood to use citation analyses, and in the U.K., New Zealand and Australia, government departments use bureaucratic processes partly based on citation analysis as evaluation tools. If the collections are inadequate, or the techniques are inappropriate, these uses will be detrimental to the interests of individual IS researchers, and to the IS discipline as a whole.
The paper commences by discussing citation analysis and its hazards. The research objectives and research method are described. The raw scores generated from two major sources are presented, and issues arising from the analysis are identified and examined.
This section briefly reviews the concept of citation analysis, recent developments in the area, and its use to date in IS.
"Citations are references to another textual element [relevant to] the citing article. ... In citation analysis, citations are counted from the citing texts. The unit of analysis for citation analysis is the scientific paper" (Leydesdorff 1998). Leydesdorff and others apply citation analysis to the study of cross-references within a literature, in order to document the intellectual structure of a discipline. This paper is concerned with its use for the somewhat different purpose of evaluating the quality and/or impact of works and their authors by means of the references made to them in refereed works.
Authors have cited prior works for centuries. Gradually, the extent to which a work was cited in subsequent literature emerged as an indicator of the work's influence, which in turn implied significance of the author. Whether the influence of work or author was of the nature of notability or notoriety was, and remains, generally ignored by citation analysis. Every citation counts equally, always provided that it is in a work recognised by whoever is doing the counting.
Citation analysis can be put to many purposes, including "1) paying homage to pioneers, 2) giving credit for related work, 3) substantiating one's knowledge claims, 4) providing background reading, 5) articulating a research methodology, 6) criticizing research, and 7) correcting one's earlier work" (Garfield 1977, as reported in Hansen et al. 2006).
The use of citation counts to formally measure the quality and/or impact of works, and of their authors, is a fairly recent phenomenon. Indeed, the maintenance of citation indices appears to date only to about 1960, with the establishment of the Science Citation Index (SCI), associated with Garfield (1964). SCI was later joined by the Social Sciences Citation Index (SSCI) and the Arts & Humanities Citation Index (A&HCI). The combination of the three services was referred to as the Institute for Scientific Information (ISI). ISI was subsequently acquired by Thomson Scientific. ISI became widely available only in 1988, on CD-ROM, and from 1997 on the Web (Meho 2007).
During the late 20th century, a broader movement developed around citation analysis, under the title `bibliometrics' or `scientometrics'. The undertaking needs to be seen within the broader context of the electronic library. This was conceived by Vannevar Bush (1945), and articulated by Ted Nelson in the 1960s as 'hypertext'. As outlined in Nelson's never-completed Project Xanadu, the electronic library would include the feature of 'transclusion', that is to say that quotations would be included by precise citing of the source, rather than by replicating some part of the content of the source. The strength and weakness of the World Wide Web was its very limited form of hyperlink. As a result, the Web falls far short of the Bush/Nelson vision of a tightly-linked, reliable and minimally-redundant electronic library.
During the last 15 years, SCI has been the market-dominant citation database, but has been subject to increasing competition. The advent of the open, public Internet, particularly since the Web exploded in 1993, has stimulated many developments. Individual journal publishing companies such as Elsevier, Blackwell, Kluwer, Springer and Taylor & Francis have developed automated cross-linkage services, at least within their own journal-sets.
Meanwhile, the open access movement is endeavouring to deliver something much closer to a cohesive electronic library, including full and transparent cross-referencing within the literature. A leading project in the area was the Open Citation (OpCit) project in 1999-2002. An outgrowth from the OpCit project, the Citebase prototype, was referred to as 'Google for the refereed literature'. It took little time for Google itself to discover the possibility of a lucrative new channel: it launched Google Scholar in late 2004.
It is to be expected that citation analysis will give rise to a degree of contention, because any measure embodies biasses in favour of some categories of researcher and against others. Dissatisfaction with it as a means of evaluating the quality and impact of works and of researchers has a long history (Hauffe 1994, MacRoberts & MacRoberts 1997, Adam 2002).
A simple citation-count, for example, favours longstanding researchers over early-career researchers, because it takes time firstly to achieve publications, secondly for other researchers to discover and apply them, and thirdly for those researchers' own publications to appear. Using citations per paper, on the other hand, favours researchers with few publications but one or two 'big hits' over prolific researchers whose total count is distributed over a larger denominator. Moreover, it may only be possible to achieve a meaningful measure by taking into account the quality of the publishing venues in which the citations appear, and in which the paper itself was published.
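The contrast between the two measures can be illustrated with a minimal sketch. The two researcher profiles and their per-article citation-counts below are hypothetical, invented purely for illustration, and are not drawn from the study's data.

```python
from statistics import mean

# Hypothetical per-article citation-counts (invented for illustration only).
prolific = [40, 35, 30, 25, 20, 15, 10, 10, 5, 5, 3, 2]  # many articles
big_hit = [150, 6]                                        # few articles, one 'big hit'


def total_citations(counts):
    """Simple citation-count: the sum over all of a researcher's articles."""
    return sum(counts)


def citations_per_paper(counts):
    """Mean citations per article; 0.0 for a researcher with no articles."""
    return mean(counts) if counts else 0.0


for name, counts in (("prolific", prolific), ("big_hit", big_hit)):
    print(name, total_citations(counts), round(citations_per_paper(counts), 2))
```

With these invented figures, the simple count ranks the prolific researcher higher, whereas citations per paper ranks the 'big hit' researcher higher, which is the bias described above.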
Various proposals have been put forward for particular measures that can be used for particular purposes. Hirsch's proposal for an 'h-index' (Hirsch 2005) has unleashed a flurry of activity. The measure is argued to balance quantity against impact. Hirsch is a physicist. Physics has a relatively very large population of academics. It also has the best-developed publication-and-citation mechanisms of any discipline, and unlike many other disciplines, it has not ceded control over its output to for-profit publishers. It appears that the values of h-index achieved, and the measure's effectiveness, are both highly dependent on the number of academics and publications in the discipline; and of course on the reliability of the citation-data.
A range of refinements to the h-index have been proposed, and are summarised by Harzing (2007). These endeavour to balance the measure for such factors as time in the discipline, the distribution of the citation-counts, the length of time since each work was published, and the number of co-authors.
Because the research reported on in this paper concludes that the h-index may provide an effective basis for converting raw citation data into meaningful information, a brief description and example are provided in Appendix 1.
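As a complement to Appendix 1, the following is a minimal sketch of how the h-index and the h-count might be computed from a list of per-article citation-counts. It follows the definition used in this paper (an h-index of h means h articles with at least h citations each, and the h-count is the total citation-count of those h articles). The function names and the sample data are assumptions made for illustration.

```python
def h_index(counts):
    """Largest h such that h articles each have at least h citations."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_count(counts):
    """Total citation-count of the h most-cited articles."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:h_index(counts)])


# Hypothetical citation-counts for one researcher's six articles.
sample = [10, 8, 5, 4, 3, 0]
print(h_index(sample), h_count(sample))
```

For the sample list, four articles have at least four citations each, so the h-index is 4 and the h-count is 10 + 8 + 5 + 4 = 27.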
Within the IS discipline, there is a long history of attention being paid to citations. The primary references appear to be Culnan (1978, 1986), Culnan & Swanson (1986), Culnan (1987), Cheon et al. (1992), Cooper et al. (1993), Eom et al. (1993), Holsapple et al. (1993), Eom (1996), Walstrom & Leonard (2000), Vessey et al. (2002), Schlogl (2003), Katerattanakul & Han (2003), Galliers & Meadows (2003), Hansen et al. (2006) and Whitley & Galliers (2007). That is a fairly short list of articles.
A brief assessment of the impact of these papers represents a valuable, preliminary case study in citation analysis. Two alternative searches were performed on the ISI database. The 'General Search' (which is described later) showed that the most cited among them (Holsapple et al. 1993) had only accumulated a count of 24 in April 2006. This had risen to 30 in June 2007, unchanged in November 2007. But by then it had been overtaken by Culnan (1987), with 55.
If instead the ISI 'Cited Ref' facility was used, and a deep and well-informed analysis conducted, then the most cited paper was established by combining several counts for a total of 61 for Culnan (1987) - up to 66 on re-visit in June 2007, and 71 in November 2007.
On Google Scholar, the largest citation-count in April 2006 appeared as 63, for each of Culnan (1986) and Culnan (1987). When the test was repeated on 30 June 2007, the Culnan counts were 81 and 78, with Holsapple et al. (1993) up to 64. By November, the Culnan counts had grown to 93 and 97, but the Holsapple count had mysteriously dropped to 54. As will be shown, these are significant counts, but not outstanding ones.
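The movements in these counts can be expressed as percentage changes. The sketch below uses only counts quoted in the preceding paragraphs; the labels, and the treatment of the first Cited Ref figure as the starting point, are assumptions of the sketch rather than part of the original analysis.

```python
def percent_change(earlier, later):
    """Relative change from `earlier` to `later`, in percent."""
    if earlier <= 0:
        raise ValueError("earlier count must be positive")
    return (later - earlier) / earlier * 100.0


# (earlier, later) pairs taken from the counts quoted in the text.
observations = {
    "ISI General Search, Holsapple et al. (1993), Apr 2006 to Jun 2007": (24, 30),
    "ISI Cited Ref, Culnan (1987), first visit to Nov 2007": (61, 71),
    "Google Scholar, Holsapple et al. (1993), Jun 2007 to Nov 2007": (64, 54),
}

for label, (earlier, later) in observations.items():
    change = percent_change(earlier, later)
    print(f"{label}: {earlier} -> {later} ({change:+.1f}%)")
```

A negative result, as for the Google Scholar count for Holsapple et al. (1993), flags a drop in the count, which should not occur if a collection only ever accumulates citations.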
The primary purposes of the research reported in the papers listed above have been to develop an understanding of the intellectual structure of the IS discipline, of patterns of development within the discipline, and of the dependence of IS on reference disciplines. In some cases, the impact of particular journals has been in focus (in particular Cooper et al. 1993, Holsapple et al. 1993 and Katerattanakul & Han 2003). In one instance (Walstrom & Leonard 2000), highly-cited articles were the primary concern. Another, Hansen et al. (2006) reported on a deep analysis of the ways in which citing articles used (and abused) the cited paper. Galliers & Meadows (2003) used it to assess globalism and parochialism in IS research papers, and Whitley & Galliers (2007) analysed citations as a means of determining the characteristics of the European IS research community.
The aim of the research reported on in this paper is to understand the effectiveness of citation analysis, using available data collections, in evaluating the impact of individual IS researchers. The literature search conducted as part of the present project did not identify any articles which had utilised citation analysis for this primary purpose.
A number of deficiencies in the use of citation analysis for this purpose are apparent from the outset. In the course of presenting the research, more will emerge, and a consolidated list is provided at the end of the paper. Despite these deficiencies, 'score-keeping' is increasingly being applied to the allocation of research resources. The work reported on here accordingly has significance as scholarship, but also has a political dimension.
Because little prior research has been conducted in this specific area, the essential purpose was to provide insights into the effectiveness of citation analysis applied to individual IS researchers. For this reason, the process, and the desiderata underlying it, have been described in considerable detail.
Because of the vagaries of databases that are organised primarily on names, considerable depth of knowledge of individuals active in IS research is needed in order to achieve a reasonable degree of accuracy. The project accordingly focussed on researchers known to the author.
In-depth analysis was first conducted in relation to an extensive list of academics active from the late 1970s to 2005 in the author's country of long-term residence. This was appropriate not only as a means of achieving reasonable data quality, but also because the scale was manageable. The method and results of this part of the research are reported in Clarke (2008). Using the expertise gained in that pilot study, similar analyses were then performed for some leading researchers in North America and Europe.
One important insight that was sought related to publishing venues. It is vital that the databases that are available to support citation analysis contain the publishing venues that are most relevant to an evaluation of IS researcher impact, and do not contain many non-relevant venues. Rather than simply tabulating citation-counts, the research accordingly commenced by establishing a list of relevant journals and conference proceedings.
The set of venues was developed by reference to the now well-established literature on IS journals and their rankings, for which a bibliography is provided at Saunders (2005). Consideration was given to the lists and rankings there, including the specific rankings used by several universities, and available on that site. Details of individual journals were checked in the most comprehensive of the several collections (Lamp 2005). In parallel with this research, the international body of IS researchers, the Association for Information Systems (AIS) has addressed concerns within the USA about inappropriate valuations of IS publications by the Deans of Graduate Schools of Business. In addition, ranking lists for IS journals relevant to IS in Australia were developed by Fisher et al. (2007) and subsequently the Australian Council of Professors and Heads of Information Systems (ACPHIS). The ACPHIS list contains 182 IS journals, allocating 9 as A+ (limited by Australian government Rules to 5% of the total), 29 A, 32 B and 112 C-grade journals. There is broad correspondence among these several lists, but there are also many specific differences.
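The arithmetic behind the ACPHIS tier allocation can be checked with a short sketch. The tier counts are those quoted above; rounding the 5% limit down to a whole number of journals is an assumption of the sketch, not a stated rule.

```python
# Tier sizes for the ACPHIS list, as quoted in the text.
acphis_tiers = {"A+": 9, "A": 29, "B": 32, "C": 112}

total_journals = sum(acphis_tiers.values())

# 5% cap on A+ journals, rounded down (the rounding rule is assumed).
a_plus_cap = total_journals * 5 // 100

# Proportion of the whole list that falls in each tier.
tier_shares = {tier: count / total_journals for tier, count in acphis_tiers.items()}

print(total_journals, a_plus_cap)
for tier, share in tier_shares.items():
    print(f"{tier}: {share:.1%}")
```

The four tiers sum to the 182 journals stated, and 5% of 182 is 9.1, which is consistent with the 9 journals allocated to A+.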
The set selected is listed in the first two columns of Exhibit 1. (The remainder of the columnar format will be explained shortly). The inclusions represent a fairly conventional view of the key refereed journals on the management side of the IS discipline. The list significantly under-represents those journals that are in reference disciplines generally, especially in computer science and at the intersection between IS and computer science. The reason for this approach is that otherwise a very large number of venues would need to be considered, and many included, in which the large majority of IS researchers neither read nor publish.
Less conventionally, the list separates out a few 'AA-rated' journals, and divides the remainder into general, specialist and regional journals. The purpose of this was to enable evaluation of the coverage of the data collections in relation to the top stratum of IS-relevant journals, as well as IS journals generally. There is, needless to say, ample scope for debate on all aspects of the selection and classification; but it was designed to aid the analysis, and did so.
|Journal Name||Journal Abbrev.||SSCI||SCI||Issues Included|
|AA Journals (3)|
|Information Systems Research||ISR||Y||Only from 1994, Vol. 4 ?|
|Journal of Management Information Systems||JMIS||Y||Only from 1999, Vol. 16|
|Management Information Systems Quarterly||MISQ||Y||Only from 1984, Vol. 8|
|AA Journals in the Major Reference Disciplines (4)|
|Communications of the ACM (Research Articles only)||CACM||Y||From 1958, Vol. 1|
|Management Science||MS||Y||From 1955, Vol. 1|
|Academy of Management Journal||AoMJ||Y||From 1958, Vol. 1|
|Organization Science||OS||Y||From 1990, Vol. 1?|
|A Journals - General (9)|
|Communications of the AIS (Peer Reviewed Articles only)||CAIS||None!|
|Database||Data Base||Y||Only from 1982 Vol. 14 ?|
|Information Systems Frontiers||ISF||Y||Only from 2001, Vol. 3|
|Information Systems Journal||ISJ||Y||Only from 1995, Vol. 5|
|Information & Management||I&M||Y||Only from 1983, Vol. 6|
|Journal of the AIS||JAIS||None|
|Journal of Information Systems||JIS||None|
|Journal of Information Technology||JIT||Y||Only 18 articles|
|Wirtschaftsinformatik||WI||Y||Only from 1990, Vol. 32|
|A Journals - Specialist (15)|
|Decision Support Systems||DSS||Y||Only from 1985, Vol. 1|
|International Journal of Electronic Commerce||IJEC||Y||From 1996, Vol. 1|
|Information & Organization||I&O||None|
|Information Systems Management||ISM||Y||Only from 1994, Vol. 11|
|Information Technology & People||IT&P||None|
|Journal of End User Computing||JEUC||None|
|Journal of Global Information Management||JGIM||None|
|Journal of Information Systems Education||JISE||None|
|Journal of Information Systems Management||JISM||None|
|Journal of Management Systems||JMS||None|
|Journal of Organizational and End User Computing||JOEUC||None|
|Journal of Organizational Computing and Electronic Commerce||JOCEC||None|
|Journal of Strategic Information Systems||JSIS||Y||From 1992, Vol. 1 ?|
|The Information Society||TIS||Y||Only from 1997, Vol. 13|
|A Journals - Regional (3)|
|Australian Journal of Information Systems||AJIS||None|
|European Journal of Information Systems||EJIS||Y||Only from 1995, Vol. 4|
|Scandinavian Journal of Information Systems||SJIS||None|
It can be argued that papers accepted for the major refereed conferences should be included within the scope of IS citation analysis. This applies in particular to the international and regional events that are accessible and indexed in the Association for Information Systems' AIS eLibrary. For pragmatic reasons, however, the analysis focussed primarily on journal papers.
As the next step, a survey was conducted of available data collections. It was clear that Thomson / ISI needed to be included, because it is well-known and would be very likely to be used by evaluators. Others considered included:
Elsevier's Scopus has only been operational since late 2004. The next three are computer science indexes adjacent to IS, and at the time the research was conducted the last of them was still experimental. The decision was taken to utilise Thomson/ISI, and to extract comparable data from Google Scholar. A more comprehensive project would be likely to add Scopus into the mix.
The third and fourth columns of Exhibit 1 show whether the journal is included in the Thomson/ISI SCI or SSCI Citation Indices. The final column shows the inferences drawn by the author regarding the extent of the Thomson/ISI coverage of the journal. Many problem areas were encountered, are reported on below, and are highlighted in the final column of Exhibit 1 in bold-face type. Only 15 of the 30 IS journals are represented, many only partially. The coverage is further considered later in greater depth.
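The figure of 15 of 30 can be reproduced by tallying Exhibit 1 category by category. In the sketch below, the category sizes are taken from the category headings in Exhibit 1, and the numbers of journals in ISI are read off the 'Y' markers in the exhibit as reproduced here. The four journals in the major reference disciplines are left out, because they are not counted among the 30 IS journals. The variable names are illustrative only.

```python
# (journals in category, journals marked 'Y' for SCI or SSCI) per Exhibit 1 category.
exhibit_1_tally = {
    "AA Journals": (3, 3),
    "A Journals - General": (9, 6),
    "A Journals - Specialist": (15, 5),
    "A Journals - Regional": (3, 1),
}

total_is_journals = sum(size for size, _ in exhibit_1_tally.values())
in_isi = sum(included for _, included in exhibit_1_tally.values())

print(f"{in_isi} of {total_is_journals} IS journals are in ISI "
      f"({in_isi / total_is_journals:.0%})")
for category, (size, included) in exhibit_1_tally.items():
    print(f"{category}: {included} of {size}")
```

The tally counts a journal as represented even where only some of its issues are indexed, so it overstates the proportion of papers that ISI actually covers.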
In assembling a list of individuals as a basis for research of this nature, there are challenges to be overcome. When determining the set of IS academics in a particular country, immigration, emigration and expatriates create definitional challenges. People enter and depart from the discipline. Topic-areas do as well. For example, the various specialisations within software engineering since about 1980 can be defined to be within the IS discipline, outside it, or both, depending on the phase of history being discussed.
There are overlaps with the Computer Science discipline, with various management disciplines, and with the IS profession. One indicator of the IS discipline's diversity is that, of the 15/30 IS journals that are within-ISI, 6 are in SCI (science) and 9 are in SSCI (social science).
For the preliminary Australian study, a comprehensive list of individuals active in IS research from 1978 to 2000 was established, as described in Clarke (2008).
For the international researchers, on the other hand, the selection process was purposive. It favoured uncommon surnames, and individuals whose work was reasonably familiar to the author of this paper. The purpose of adopting this approach was to reduce the likelihood of data pollution through the conflation of articles by multiple academics. The selection of a sample of 25 people relied on this author's longstanding involvement in the field internationally, and his knowledge of the literature and the individuals concerned. The use of a sample of this nature clearly precludes any claims of external validity. In an exploratory study of this nature, that was perceived by the author to be an appropriate approach to adopt, and compromise to accept.
Data was extracted from the SCI and SSCI citation indices over several days, for the preliminary Australian study in late January 2006, and for the main study in April 2006. Access was gained through the ANU Library Reverse Proxy, by means of Thomson's 'Web of Science' offering. Both sets of searches were restricted to 1978-2006, across all Citation Indices (SCI, SSCI and A&HCI). Multiple name-spellings and initials were checked, and where doubt arose were also cross-checked with the AIS eLibrary and the (A)ISWorld Faculty Directory.
Subsequently, Google Scholar was searched for each researcher. Supplementary research was then undertaken within the Thomson/ISI database. These elements were performed in early and late April 2006 respectively. It was apparent from re-testing that the contents of the ISI database and hence the citation-counts were accumulating at a modest rate. The Google Scholar data, on the other hand, grew rapidly, and, unlike ISI, ongoing changes were apparent in both the scope of the Google collection and the Google service.
Some re-sampling was undertaken in June 2007, in order to provide information about the stability of the data collections, and the rate of change of citation-counts. Further experiments were performed, in order to enhance understanding of the quality of the counts. The next section reports on the results of the Thomson/ISI study.
Thomson/ISI is accessible as three elements of the 'Web of Science' product. In January 2006, the site stated that SCI indexed 6,496 'journals' (although some are proceedings), and that SSCI indexed 1,857 'journals'. On 2 July 2007, the corresponding figures appeared to be 6,700 and 1,986. The company's policies in relation to inclusion (and hence exclusion) of venues are explained at http://scientific.thomson.com/mjl/selection/. An essay on the topic is at Thomson (2005).
The processes of extraction and analysis required some experimentation. A discussion is provided in Appendix 2. The following sub-section presents the results of the pilot citation analysis using the ISI `General Search' feature. Further sub-sections evaluate the quality of the data, and report on further analysis using the ISI `Cited Ref Search'.
Exhibit 2 shows the resulting data for some well-known leaders in the discipline in North America and Europe. 25 individuals were selected. A threshold of 100 total citations was applied (for reasons relating to the Australian study, and explained in Clarke 2008). This resulted in a list of 16 people whose data are reported below.
The relatively low counts of the leading European academics are interesting. Rather than undertaking a necessarily superficial analysis here, the question is left for other venues. But see Galliers & Meadows (2003) and EJIS (2007).
Number of Articles
Largest Per-Article Count
|Lynne Markus (as ML)|
|Izak Benbasat (as I)|
|Dan Robey (as D)|
|Sirkka Jarvenpaa (as SL)|
|Detmar Straub (as D and DW)|
|Rudy Hirschheim (as R)|
|Gordon Davis (as GB)|
|Peter Keen (as PGW)|
|** Sal March (as ST)|
|** Eph(raim) McLean (as E and ER)|
|Kalle Lyytinen (as K), but in the USA since 2001|
|Leslie Willcocks (as L)|
|Trevor Wood-Harper (as T, AT and TA)|
|Bob Galliers (as RD, R and B), but in the USA since 2002|
|Guy Fitzgerald (as G)|
|Enid Mumford (as E)|
An important focus of this exploratory study concerned the effectiveness of the available databases in reflecting the extent to which the individuals concerned were actually cited. This sub-section reports on several tests that were applied, which identified a substantial set of deficiencies.
The ISI collection's coverage does not extend to all journals that are perceived by each discipline to comprise the relevant publication-set. Not only are the decisions made solely by the company, but the criteria and process are not transparent. This inevitably results in misleading citation-counts.
In the case of the IS discipline, the effect appears to be that only 6 of the 30 core IS journals are included in their entirety, with a further 9 included in part. As a result, it appears that only about 40% of the papers that have been published in the IS journals in Exhibit 1 are included within an ISI citation analysis.
A test was conducted to gain further insight into the comprehensiveness of coverage. An exhaustive search was undertaken for all refereed publications of a single author. The results were as follows:
These venue-exclusion issues are discussed in greater detail in Appendix 3A.
The inverse problem exists as well, in that some categories of material are inappropriately included, resulting in the inflation of the item-counts and citation-counts of some authors. In the comprehensiveness test referred to immediately above, ISI was found to contain 7 non-refereed items for the selected author, in comparison with only 13 of 36 journal articles and a further 2 of 27 refereed conference papers. Further discussion is provided in Appendix 3B.
Multiple problems were encountered in relation to the way in which ISI handles authors' names. The initial(s) used by and for authors can seriously affect their discoverability. Publications in languages other than English are excluded. ISI does not support diacritics. And compound surnames separated by hyphens and spaces lead to uncertain outcomes. This is discussed in Appendix 3C.
The low counts for several well-known scholars were surprising. Experiments were conducted. The most revealing related to Delone & McLean's 'Information Systems Success: The Quest for the Dependent Variable'. This is discussed further below, and in Appendix 3D.
A re-check of a sample of searches was performed in June 2007. It appeared that the ISI collection was fairly stable during the intervening 14 months, with the only additional items detected being papers published after early 2006. There was a moderate growth in the counts during this time, e.g. 25% for Peter Weill and 39% for the author of this paper (although from a base only one-quarter of Peter Weill's count).
The ISI service enables at least two other approaches to be adopted to citation analysis. They are discussed in Appendix 2. One of them, the `Cited Ref Search', was applied. It provides citation-counts for papers that are not included in the `General Search'. These are citations within papers in the ISI collection to papers that are not in the ISI collection. This search was used to extract data for a sub-sample of researchers in each category. In order to investigate the impact on researchers whose total citation-counts fall behind the leading pack, several researchers were included in the sub-sample whose counts in the previous round fell below the 100 threshold.
Exhibit 3 provides the results of this part of the study. The first three columns of the table show the number of citations for each author of articles that are in the ISI database, together with the count of those articles, and the largest citation-count found. (This data should correspond with that for the same researcher in Exhibit 2, but in practice there are many small variations, mainly arising from the 3-month gap between the studies that gave rise to the two tables). The next three columns show the same data for articles that are not in the ISI database. The final two columns show the sum of the two Citation-Count columns, and the apparent Expansion Factor (computed by dividing the Total Citations by the Citation-Count for articles in the ISI database).
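The Expansion Factor calculation described above can be sketched as follows. The citation-counts used in the example call are hypothetical and are not values from Exhibit 3; the function and parameter names are likewise illustrative.

```python
def expansion_factor(citations_in_isi, citations_not_in_isi):
    """Total citations divided by the citation-count for articles in the ISI database."""
    if citations_in_isi <= 0:
        raise ValueError("citation-count for in-ISI articles must be positive")
    total_citations = citations_in_isi + citations_not_in_isi
    return total_citations / citations_in_isi


# Hypothetical researcher: 40 citations to articles in ISI, 60 to articles outside it.
print(expansion_factor(40, 60))
```

A factor of 1.0 would mean that the `Cited Ref Search' adds nothing to the `General Search' count. For the hypothetical researcher the factor is 2.5, meaning that the General Search alone would capture only 40% of the citations that ISI holds.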
---- In ISI Database ----
-- Not in ISI Database -
|David Avison (D)|
|Ron Stamper (R, RK)|
|Frank Land (F)|
|Peter Seddon (P, PB)|
|Graeme Shanks (G)|
|Paula Swatman (PMC)|
|Roger Clarke (R, RA)|
|Guy Gable (GG)|
The data in Exhibit 3 enables the following inferences to be drawn:
Dependence on the General Search alone provides only a restricted measure of the impact or reputation of an academic. Moreover, it may give a seriously misleading impression of the impact of researchers who publish in non-ISI venues such as journals targeted at the IS profession and management, and books. To the extent that citation analysis of ISI data is used for evaluation purposes, a method needs to be carefully designed that reflects the objectives of the analysis.
Butler & Visser (2006) argue, however, that an antidote is available. On the basis of a substantial empirical analysis within the political science discipline, they conclude that the ISI collection can be mined for references to many types of publications, including books, book chapters, journals not indexed by ISI, and some conference publications. Replication of the study in the IS context would be needed before firm conclusions could be drawn.
Another alternative, an upstart competitor to ISI, is considered in the following section.
Although Google Scholar was introduced in 2004, it is still an experimental service. From a bibliometric perspective, it is crude, because it is based on brute-force free-text analysis, without recourse to metadata, and without any systematic approach to testing venues for quality before including them. Google's data and processes are new, in a state of flux, unaudited, and even less transparent than ISI's. Considerable caution must therefore be applied in using Google Scholar as a means of assessing researcher impact. On the other hand, it has the advantages of substantial depth, ready accessibility, and popularity. It is inevitable that it will be used as a basis for citation analysis, and therefore important that it be compared against the more formal ISI database.
The approach adopted necessarily differed from that taken with ISI. Exhaustive counting of all papers was trialled, but the nature of the data and the limited granularity of Google's search-tools make it a very challenging exercise. Two techniques were developed. One involves the extraction of only the `top 10' of each author's citation-counts. The other applies the h-index. In that case, the effort involved for each researcher is proportional to the impact of their research. A discussion of the research approach is in Appendix 4.
The following sub-sections examine leading researchers, then the `middle ground' of researchers and `early-career researchers'. This is followed by an assessment of quality, and of citation-count patterns.
Google was assessed as a potential vehicle for score-keeping by extracting and summarising Google citations for the same sets of academics as were reported on in Exhibit 2. The sequence in which the researchers are listed is the same as in the earlier Exhibit. This part of the analysis was conducted in June 2007.
The data shown in Exhibit 4 comprises Hirsch's h-index, the h-count (i.e. the citation-count for the researcher's publications that were included in the h-index), and the largest per-item count found in that set.
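The three measures can be computed from a list of per-item citation-counts, as in the following minimal sketch. It uses the standard definition of Hirsch's h-index (the largest h such that h items each have at least h citations). The function name and the sample counts are hypothetical and are not data from this study.

```python
def h_measures(citation_counts):
    """Return (h_index, h_count, largest_per_item_count) for one researcher.

    h_index: largest h such that h items each have at least h citations
    h_count: sum of the citation-counts of the h items included in the h-index
    largest: largest per-item citation-count within that set of h items
    """
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts are in descending order, so no later item can qualify
    h_items = ranked[:h]
    return h, sum(h_items), max(h_items, default=0)


# Hypothetical per-item citation-counts for one researcher
print(h_measures([25, 8, 5, 3, 3, 1, 0]))  # (3, 38, 25)
```

Because only the h most-cited items need to be inspected, the number of items examined grows with the researcher's h-index rather than with the length of their publication list.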
Citation analysis based on Google Scholar data provides deeper and potentially much richer information. The Google h-count for most of the sample was 2.5 to 6 times that of the ISI total citation-count, with outliers at 7.5 to 10 times. (The ISI count for Eph McLean is an anomaly, and is discussed below). Similarly, the largest single-item count based on Google Scholar data was in almost all cases 2 to 6 times that for ISI. Within the sample, the apparent sequencing of researchers was somewhat different, but among the top 10 of the 16 only at the margin. As reported in Clarke (2008), the relationship was similar for the leading 7 Australian and 4 Australian-expatriate researchers.
On the basis of experimentation described in Appendix 5, it would appear that fully automated citation analysis would be very challenging, but that automated support tools are feasible, even for people whose names clash with others in the database.
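[Exhibit 4: Google Scholar measures for the researchers listed in Exhibit 2]
Columns: h-index | Citation Count of h Items | Largest Per-Item Count
Row labels: ** Sal March; Kalle Lyytinen, USA since 2001; Bob Galliers, USA since 2002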
For leading IS academics, ISI provided results that were incomplete and misleading, but not entirely untenable, at least in the sense that the relativities among members of the sample are roughly maintained across the ISI and Google measures. ISI was of very limited value, however, for researchers outside the narrow band of well-established leaders with multiple publications in AA journals.
The Google Scholar data is deeper and finer-grained than that extracted from ISI. A test was therefore undertaken to determine whether meaningful impact measures could be generated from Google citation-counts for the next level of researchers. This was done using a purposive sub-sample of Australians, and is reported on in Clarke (2008). The conclusions were that the data provides an effective means of identifying `middle-ground' researchers, but may still be too shallow to support comparisons among `middle ground' researchers.
It would be seriously problematical if citation-counts were used as the primary basis for resource-allocation, because citation analysis is inherently biassed towards established researchers, and represents a severe barrier to entry for the next generation. There has to be some 'rite of passage' whereby new leading researchers can emerge.
On the one hand, it would appear to be futile to depend on the historical record of citations to play a part in that rite of passage, because of the long lead-times involved, and the challenges of performing research without already having funding to support it. Nonetheless, it appeared to be necessary to perform some experimentation, to see whether some metric could be devised. For example, if the data were sufficiently finely-grained, it might be feasible to use econometric techniques to detect citation-count growth-patterns. Alternatively, counts of article downloads might be used (Harnad & Brody 2004).
A modest sample of 'early-career researchers' was prepared, who had come to the author's notice through multiple refereed publications and award-winning conference papers. All suffered the same problem within the Google Scholar collection that even Middle-Ground researchers suffered within ISI: the data was simply too shallow to enable any meaningful analysis to be performed.
A separate study provides complementary data. Hansen et al. (2006, Figure 1) shows the timeline of citations of Markus (1983) - a paper that in November 2007 had an ISI citation-count of 335 (up from 296 in January 2006) and a Google Scholar citation-count of 616. The distribution was roughly symmetrical over its first decade, peaking after 5 years, with total citations about 5 times the count in its peak year (although the measures are confounded by a 'mid-life kicker' published in its 6th year). A total of 9 citations in the first 2-1/2 years indicates no early signs of the article becoming a classic, and hence the author's subsequent eminence could not have been 'predicted' at that time by an analysis of citation-counts for this article.
Further research is needed; but citation-analysis appears to be an unpromising way of discovering 'rising stars'.
A number of experiments were undertaken in order to gain an insight into the accuracy and reliability of ISI results, by means of comparison with Google Scholar results. The experiments are described in Appendix 6.
The conclusions are as follows:
Together with evidence elsewhere in this paper, the inference is that it would be seriously inappropriate to use ISI's data collection as a basis for citation analysis to support the evaluation of the impact of IS researchers.
How many citations does a paper need in order to be considered to have had a moderate, high or outstanding impact? Several assessments were undertaken, and are reported on in Appendix 7, with implications discussed in the following section.
Another aspect of interest is the delay-factor before citations begin to accumulate. Some insight was gained from an informal sampling of recent MISQ articles, supplemented by searches for the last few years' titles of this author's own refereed works. A rule of thumb appears to be that there is a delay of 6 months before any citations are detected by Google, and of 18 months before any significant citation-count is apparent. The delay is rather longer on ISI General Search. This is to be expected, because of the inclusion of edited and lightly-refereed venues in Google, which have a shorter review-and-publication cycle than ISI-included journals, most of which are heavily refereed. Further understanding of citation accumulation patterns will depend on the development and repeated application of disciplined extractions from the citation services.
Reputation and impact are highly multi-dimensional constructs. Mechanistic reduction of a complex, multi-dimensional reality to a single score is morally dubious, intellectually unsatisfying, and economically and practically counter-productive. On the other hand, the frequency with which a researcher's publications are cited by other authors is a factor that an assessment of reputation would ignore at its peril.
This paper has presented an evaluation of the effectiveness of citation analysis of IS researchers' impact on their peers by means of two major data collections. This section draws implications in the following areas:
The research presented in this paper has demonstrated that there are enormous problems to be confronted in applying currently available databases to citation analysis as a measure of IS researcher impact. Two aspects are highlighted below, and expanded upon in Appendix 8.
The most significant concern relates to the coverage of the data collection that is used as the basis for citation analysis. ISI contains at best only 40% of the core body of IS publications, and arguably a lower proportion than that. The ISI collection also has significant quality problems. Google Scholar's content appears to be much more extensive, but its collection rules and business processes are even less transparent than ISI's.
A second major concern relates to the difficulties involved in generating an accurate count. Some of the difficulties are unavoidable (e.g. duplications of names). Others arise because of data quality problems, data-encoding problems (e.g. for diacritics and hyphens), and name-variants.
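The data-encoding problems can be partly mitigated by reducing author names to a common form before counts are merged. The following is a minimal sketch of that idea, not the procedure used in this study. The function name and the sample names are hypothetical, and the approach cannot separate different people who share a name.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Reduce an author name to a crude matching key.

    Strips diacritics, treats hyphens as spaces, lower-cases, and collapses
    whitespace. Assumption: records whose keys match are only candidates for
    the same author and still need manual checking.
    """
    # Decompose accented characters into base character + combining mark(s)
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, leaving the unaccented base characters
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat ASCII and common Unicode hyphens/dashes as word separators
    for dash in ("-", "\u2010", "\u2011", "\u2013"):
        stripped = stripped.replace(dash, " ")
    return " ".join(stripped.lower().split())


# Hypothetical variants of one author's name as they might appear in a database
print(normalise_name("Schlögl C") == normalise_name("Schlogl  C"))  # True
print(normalise_name("Smith-Jones A"))  # smith jones a
```

Matching on such a key merges encoding variants of one name, but it also merges distinct authors whose names differ only by a diacritic or hyphen. It therefore supports, rather than replaces, manual inspection.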
Some of the deficiencies of the ISI data collection appear likely to fall fairly evenly on all disciplines (e.g. the apparent incompleteness of journals that are claimed to be indexed, and the failure to differentiate refereed from unrefereed content in at least some journals).
Other impacts are likely to vary significantly between disciplines. Those that are well-established, focus on relatively stable phenomena, have recognition, have friends in high places, and have large numbers of active members, appear likely to have their journals well-represented in the ISI collection.
IS deals with rapidly mutating phenomena. The count of IS researchers is comparatively small - c. 2,000-4,000 actively publishing researchers worldwide. Few IS researchers have achieved levels of influence, either intellectual or political, beyond the IS discipline. IS journals have struggled to gain acceptance in the ISI lists.
Based on those criteria, IS is an outsider looking in. IS researchers have only a small proportion of their journals in the collection, representations on behalf of well-reputed IS journals have been declined by Thomson on multiple occasions, and there is little prospect of that changing. It appears only reasonable to conclude that citation analysis based on ISI data will indicate to people in authority, and will continue for the foreseeable future to indicate to them, that both individual IS researchers and the discipline as a whole are performing poorly.
Moreover, the deficiencies result in differential effects on individual researchers. ISI's data-holdings are highly conservative, and the barriers to entry work against the interests of people working in new sub-disciplines and research domains.
Citation analysis, in particular using the ISI data collection, is becoming institutionalised as a means of evaluating research impact. It is a norm in US graduate schools of business, where a large proportion of North American IS researchers are employed. ISI is a readily-available tool for the government bureaucracies in the U.K., New Zealand and Australia that evaluate researcher impact - and has been formally adopted by the relevant Australian government department for the 2008 `Research Quality Framework' round of evaluations.
These uses of citation analysis directly influence the distribution of research funding, and in some contexts the allocation of resources within universities and senior appointments. In short, the IS discipline stands to lose a great deal from the deficiencies of the ISI data collection.
It is inevitable that citation analysis will be used in ways that are harmful to the interests of IS researchers. So it would be prudent for the IS discipline to develop and publish norms that will mitigate the harm. External evaluators can be legitimately challenged, if necessary through legal process, if they blindly apply general rules to a discipline that has established and well-grounded evaluation processes.
Exhibit 5 suggests heuristics emerging from the analysis reported on in this paper.
The discipline as a whole, through its professional body the Association for Information Systems (AIS), could undertake steps that would be instrumental in the emergence of an effective framework for score-keeping.
Exhibit 6 identifies actions that the AIS can take, which arise from the analysis conducted in this paper. They are discussed at greater length in Appendix 10.
There may be a world in which the electronic library envisioned by Bush and Nelson has come into existence, and in which all citations can be reliably counted, traced, and evaluated.
Back in the real world, however, the electronic library is deficient in a great many ways. It is fragmented and very poorly cross-linked. And the interests of copyright-owners (including discipline associations but particularly the for-profit corporations that publish and exercise control over the majority of journals) are currently building in additional and substantial barriers rather than working towards integration. It remains to be seen whether the barriers will be broken down, perhaps by the communitarian open access movement, or by the new generation of corporations spear-headed by Google.
Simplistic application of raw citation-counts to evaluate the performance of individual researchers and of research groupings would disadvantage some disciplines, many research groupings, and many individual researchers.
The IS discipline is highly exposed to the risk of simplistic application of citation analysis. For the many reasons identified in this paper, citation-counts will suggest that most IS researchers fall short of the criteria demanded for research funding and senior academic posts. As a result, the IS discipline in at least some countries is confronted by the spectre of reduced access to research funding, because of the application of citation analysis to an inadequate data collection.
Citation analysis is currently a very blunt weapon, which should be applied only with great care, but which appears very likely to harm the interests of the less politically powerful disciplines such as IS. Concerted action is needed by the IS discipline, through its professional body.
Except where otherwise stated, URLs were last accessed in early April 2006.
Adam D. (2002) 'Citation analysis: The counting house' Nature 415 (2002) 726-729
AJIS (2006) `AJIS Featured Theme: The Information Systems Discipline in Australian Universities' Australasian Journal of Information Systems 14, 1 (November 2006) 123-140, at http://dl.acs.org.au/index.php/ajis/issue/view/1, accessed 19 November 2007
Barki H., Rivard S. & Talbot J. (1993) 'A Keyword Classification Scheme for IS Research Literature: An Update' MIS Qtly 17, 2 (June 1993)
Bush V. (1945) 'As We May Think' The Atlantic Monthly. July 1945, at http://www.theatlantic.com/doc/194507/bush
Butler L. & Visser M. (2006) 'Extending citation analysis to non-source items' Scientometrics 66, 2 (2006) 327-343
CAIS (2007) 'Special Volume on the IS Academic Discipline in Pacific Asia in 2006' Commun. AIS 21 (2007)
Cheon M.J., Lee C.C. & Grover V. (1992) 'Research in MIS - Points Of Work and Reference - A Replication and Extension of the Culnan and Swanson Study' Data Base 23, 2 (September 1992) 21-29
Clarke R. (2006) `Plagiarism by Academics: More Complex Than It Seems' J. Assoc. Infor. Syst. 7, 2 (February 2006)
Clarke R. (2008) `RQF v. IS: A Citation Analysis of Australian Information Systems Researchers, in the Context of the Emergent 'Research Quality Framework'' Working Paper, Xamax Consultancy Pty Ltd, November 2007, at http://www.rogerclarke.com/SOS/Cit-AJIS.html, accessed 19 November 2007
Clarke R. & Kingsley D. (2007) 'ePublishing's Impacts on Journals and Journal Articles' Xamax Consultancy Pty Ltd, April 2007, http://www.rogerclarke.com/EC/ePublAc.html
Cooper R.B., Blair D. & Pao M. (1993) 'Communicating MIS research: a citation study of journal influence' Infor. Processing & Mngt 29, 1 (Jan.-Feb. 1993) 113
Culnan M.J. (1978) 'Analysis of Information Usage Patterns of Academics and Practitioners in Computer Field - Citation Analysis of a National Conference Proceedings' Infor. Processing & Mngt 14, 6 (1978) 395-404
Culnan M.J. (1986) 'The Intellectual-Development of Management-Information-Systems, 1972-1982 - A Cocitation Analysis' Mngt Sci. 32, 2 (February 1986) 156-172
Culnan M.J. (1987) 'Mapping the Intellectual Structure of MIS, 1980-1985: A Co-Citation Analysis' MIS Qtly 11, 3 (September 1987) 341-353
Culnan M.J. & Swanson E.B. (1986) 'Research In Management-Information-Systems, 1980-1984 - Points Of Work And Reference' MIS Qtly 10, 3 (September 1986) 289-302
Davis F.D. (1989) 'Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology' MIS Quarterly 13, 3 (September 1989) 319-340
Davis F.D., Bagozzi R.P. & Warshaw P.R. (1989) 'User Acceptance of Computer Technology: A Comparison of Two Theoretical Models' Mngt. Sci. 35, 8 (August 1989) 982-1003
DeSanctis G. & Poole M.S. (1994) 'Capturing the Complexity in Advanced Technology Use: Adaptive Structuration Theory' Organization Science, 5, 2 (May 1994) 121-147
EJIS (2007) 'Special Section on the European Information Systems Academy' Euro. J. Infor. Syst. 16, 1 (February 2007)
Eom S.B. (1996) 'Mapping The Intellectual Structure Of Research In Decision Support Systems Through Author Cocitation Analysis (1971-1993)' Decision Support Systems 16, 4 (April 1996) 315-338
Eom S.B., Lee S.M. & Kim J.K. (1993) 'The Intellectual Structure Of Decision-Support Systems (1971-1989)' Decision Support Systems 10, 1 (July 1993) 19-35
Galliers R.D. & Meadows M. (2003) 'A Discipline Divided: Globalization and Parochialism in Information Systems Research' Commun. Association for Information Systems 11, 5 (January 2003) 108-117, at http://cais.aisnet.org/articles/default.asp?vol=11&art=5, accessed July 2007
Garfield E. (1964) 'Science Citation Index - A New Dimension in Indexing' Science 144, 3619 (8 May 1964) 649-654, at http://www.garfield.library.upenn.edu/essays/v7p525y1984.pdf
Garfield E. (1977) 'Can citation indexing be automated?' in 'Essays of an information scientist' ISI Press, Philadelphia PA, 1977, pp. 84-90, quoted in Hansen et al. (2006)
Hansen S., Lyytinen K. & Markus M.L. (2006) 'The Legacy of 'Power and Politics' in Disciplinary Discourse' Proc. 27th Int'l Conf. in Infor. Syst., Milwaukee, December 2006, at http://aisel.aisnet.org/password.asp?Vpath=ICIS/2006&PDFpath=EPI-IS-01.pdf, accessed July 2007
Harnad S. & Brody T. (2004) 'Comparing the impact of open access (OA) vs. non-OA articles in the same journals' D-Lib 10, 6 (June 2004), at http://www.dlib.org/dlib/june04/harnad/06harnad.html, accessed July 2007
Harzing A.-W. (2007) 'Reflections on the h-index' University of Melbourne, 25 June 2007, at http://www.harzing.com/pop_hindex.htm, accessed July 2007
Hauffe H. (1994) 'Is Citation Analysis a Tool for Evaluation of Scientific Contributions?' Proc. 13th Winterworkshop on Biochemical and Clinical Aspects of Pteridines, St.Christoph/Arlberg, 25 February 1994, at http://www.uibk.ac.at/ub/mitarbeiter_innen/publikationen/hauffe_is_citation_analsis_a_tool.html
Hirsch J.E. (2005) 'An index to quantify an individual's scientific research output' arXiv:physics/0508025v5, 29 September 2005, at http://arxiv.org/PS_cache/physics/pdf/0508/0508025v5.pdf, accessed July 2007
Holsapple C.W., Johnson L.E., Manakyan H. & Tanner J. (1993) 'A Citation Analysis Of Business Computing Research Journals' Information & Management 25, 5 (November 1993) 231-244
Katerattanakul P. & Han B. (2003) 'Are European IS journals under-rated? an answer based on citation analysis' Euro. J. Infor. Syst. 12, 1 (March 2003) 60-71
Keen P.G.W. (1980) 'MIS Research: Reference Disciplines and a Cumulative Tradition' Proc. 1st Int'l Conf. on Information Systems, Philadelphia, PA, December 1980, pp. 9-18
Lamp J. (2005) 'The Index of Information Systems Journals', Deakin University, version of 16 August 2005, at http://lamp.infosys.deakin.edu.au/journals/index.php, accessed 19 November 2007
Leydesdorff L. (1998) 'Theories of Citation?' Scientometrics 43, 1 (1998) 5-25, at http://users.fmg.uva.nl/lleydesdorff/citation/index.htm
MacLeod D. (2006) 'Research exercise to be scrapped' The Guardian, 22 March 2006, at http://education.guardian.co.uk/RAE/story/0,,1737082,00.html, accessed July 2007
MacRoberts M.H. & MacRoberts B.R. (1997) 'Citation content analysis of a botany journal' J. Amer. Soc. for Infor. Sci. 48 (1997) 274-275
Markus M.L. (1983) 'Power, Politics and MIS Implementation' Commun. ACM 26, 6 (June 1983) 430-444
Meho L.I. (2007) 'The Rise and Rise of Citation Analysis' Physics World (January 2007), at http://dlist.sir.arizona.edu/1703/01/PhysicsWorld.pdf, accessed July 2007
Meho L.I. & Yang K. (2007) 'A New Era in Citation and Bibliometric Analyses: Web of Science, Scopus, and Google Scholar', Forthcoming, Journal of the American Society for Information Science and Technology, at http://dlist.sir.arizona.edu/1733/, accessed July 2007
Perkel J.M. (2005) 'The Future of Citation Analysis' The Scientist 19, 20 (2005) 24
Saunders C. (2005) 'Bibliography of MIS Journals Citations', Association for Information Systems, undated but apparently of 2005, at http://www.isworld.org/csaunders/rankings.htm
Schlogl C. (2003) 'Mapping the intellectual structure of information management' Wirtschaftsinformatik 45, 1 (February 2003) 7-16
Vessey I., Ramesh V. & Glass R.L. (2002) 'Research in information systems: An empirical study of diversity in the discipline and its journals' J. Mngt Infor. Syst. 19, 2 (Fall 2002) 129-174
Walstrom K.A. & Leonard L.N.K. (2000) 'Citation classics from the information systems literature' Infor. & Mngt 38, 2 (December 2000) 59-72
Whitley E.A. & Galliers R.D. (2007) 'An alternative perspective on citation classics: Evidence from the first 10 years of the European Conference on Information Systems' Information & Management 44, 5 (July 2007) 441-455
The work reported on in this paper was conducted within the context of a major collaborative project on the IS discipline, led by Guy Gable and Bob Smyth at QUT. The report on IS in Australia was published in a Special Issue of the Australasian Journal of Information Systems 14, 1 (AJIS 2006). The report on IS in Pacific Asia was published as a Special Issue in CAIS (2007).
Dave Naumann provided assistance in relation to data from the AISWorld Faculty Directory. The paper has benefited from feedback from colleagues within and beyond the team. All researchers mentioned in the paper were invited to comment on the draft and many of their suggestions have been incorporated. Comments by Peter Seddon of the University of Melbourne and Linda Butler of the ANU were particularly valuable. The work of the CAIS Editor and reviewers resulted in substantial re-structuring and enhancement of the paper, for which I am very grateful.
Responsibility for all aspects of the work rests, of course, entirely with the author.
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.
Created: 20 November 2007 - Last Amended: 20 November 2007 by Roger Clarke - Site Last Verified: 15 February 2009