
An Exploratory Study of Information Systems Researcher Impact
Appendices

Roger Clarke **

Review Version of 20 November 2007

© Xamax Consultancy Pty Ltd, 2007

Available under an AEShareNet Free for Education licence or a Creative Commons 'Some Rights Reserved' licence.

This document contains the Appendices for the paper at http://www.rogerclarke.com/SOS/Cit-CAIS.html

This document is at http://www.rogerclarke.com/SOS/Cit-CAIS-Apps.html


Contents

Appendix 1: The h-index
Appendix 2: The Research Method for the Thomson/ISI Analysis
Appendix 3: Deficiencies in the ISI Data Collection
Appendix 4: The Research Method for the Google Analysis
Appendix 5: The Feasibility of Automating Google Citation Analysis
Appendix 6: Cross-Comparison Quality Testing
Appendix 7: Google Scholar Citation-Count Thresholds
Appendix 8: Deficiencies in Citation Collections
Appendix 9: Measures Arising from Citation Analysis
Appendix 10: Elements of an A.I.S. Strategy

Appendix 1: The h-index

This Appendix provides a brief overview of the h-index that was proposed in Hirsch (2005). It has attracted considerable attention in the bibliometric literature, and has potential as a means of evaluating the impact of IS researchers on other researchers. The formal definition provided by Hirsch is:

A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have no more than h citations each.

Less formally, a person with an h-index of 21 has published 21 papers that have at least 21 citations each.

Computation of the h-index involves the following steps: the author's papers are listed in descending order of citation-count; the list is counted down, comparing each paper's rank against its citation-count; and the h-index is the last rank at which the paper's citation-count is at least equal to its rank.

The following is an example of data extracted from Google Scholar for a particular IS researcher. The citation-counts were:

518, 219, 199, 56, 56, 51, 43, 22, 21, 19, 17, 17, 16, 15, 13, 12, 12, 11, ...

Counting down this sequence resulted in the following:

  1,   2,   3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, ...

Inspection of the two series shows that the 14th count is 15, and the 15th is 13. The 14th qualifies, but the 15th does not. Hence the author's h-index is 14. A potentially useful complementary measure is the total citation-count of the papers within the h-index, referred to below as the 'h-count'. In this case, it is 1,269.
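
By way of illustration, the following minimal sketch (in Python) computes the h-index and the h-count from a list of per-item citation-counts. The data is that shown in the example above; the function name is illustrative only.

    def h_index_and_h_count(citation_counts):
        """Return (h-index, h-count) for a list of per-item citation-counts.
        The h-count is the total citation-count of the items within the h-index."""
        counts = sorted(citation_counts, reverse=True)      # largest first
        h = 0
        for rank, count in enumerate(counts, start=1):      # rank 1, 2, 3, ...
            if count >= rank:
                h = rank
            else:
                break
        return h, sum(counts[:h])

    # The citation-counts from the example above:
    counts = [518, 219, 199, 56, 56, 51, 43, 22, 21, 19, 17, 17, 16, 15, 13, 12, 12, 11]
    print(h_index_and_h_count(counts))                       # (14, 1269)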

Harzing (2007) provides a software tool called Publish or Perish, which computes the h-index and various other indices based on citation-searches conducted using Google Scholar.


Appendix 2: The Research Method for the Thomson/ISI Analysis

The data collected for each author comprised the apparent count of articles and the apparent total count of citations.

The ISI site provides several search-techniques. The search-technique used in the first instance was the 'General Search'. It was selected partly because it is the most obvious, and hence the most likely to be used by someone evaluating the apparent contribution of a particular academic or group of academics. It is also the most restrictive definition available, and hence could be argued to be the most appropriate to use when evaluating applicants for the most senior or well-endowed research appointments.

The ISI General Search provides a list of articles by all authors sharing the name in question, provided that they were published in venues that are in the ISI database. For each such article, a citation-count is provided, which is defined as the number of other articles in the ISI database that cite it. (It should be noted that although the term 'citation' is consistently used by all concerned, the analysis appears to actually utilise the entries in the reference list provided with the article, rather than the citations that appear within the text of the article).

The search-terms used in this study comprised author-surname combined with author-initial(s), where necessary wild-carded. Where doubt arose, AIS resources and/or the researcher's home-page and publications list were consulted. No researchers were detected in the sample who had published under different surnames, but multiple instances were detected in which initials varied. The date-range was restricted to 1978 onwards. Each list that was generated by a search was inspected, in order to remove articles that appeared to the author to be by people other than the person being targeted.

Experiments conducted with Hirsch's h-index showed it to be impracticable, because of the many problems with the ISI data, particularly the misleadingly low counts that arise for all IS academics, and especially for those who are not leading researchers.

The extraction and analysis process was first applied to Australian IS researchers. For researchers with common names, the search-terms were qualified with 'Australia', and the results are reported in Clarke (2008). As regards the international researchers, the results for the 16 of the 25 who were above the threshold of a total citation-count of 100 are reported in the main body of this paper.

In addition to ISI's 'General Search', two other categories of search are available. 'Advanced Search' provides a form of Boolean operation on some (probably inferred) meta-data. This had to be resorted to on several occasions, e.g. while investigating the Bjørn-Andersen and McLean anomalies. If a common evaluation method were able to be defined, it might be feasible to use 'Advanced Search' to apply it. The facility's features seem not to be entirely consistent with the conventions of computer science and librarianship, however, so considerable care would be needed to construct a reliable scheme.

The other category of search is called 'Cited Ref Search'. A 'General Search' counts citations only of papers that are themselves within the collection. The 'Cited Ref Search', on the other hand, counts all citations within papers in the collection. This delivers a higher total-citations score for those authors who have published papers in venues which are outside ISI's collection scope, but that are cited in papers that are within ISI's collection scope.

In order to test the likely impact of applying this alternative approach to IS researchers, 'Cited Ref Search' was used to extract data for a sub-sample of researchers in each category. In order to investigate the impact on researchers whose citation-counts fall behind the leading pack, several researchers were included in the sub-sample whose counts in the previous round fell below the selected total-citation threshold of 100.

The design of the search-facility creates challenges because it makes only a small number of parameters available. For example, it does not enable restriction to <Address includes 'Australia'>. In addition, very little information is provided for each hit, the sequence provided is alphabetical by short journal title, and common names generate in excess of 1,000 hits. Further problems are that there is enormous variation in citation styles, and in the accuracy of the data in reference lists. This results in the appearance of there being far more articles than there actually are: many articles were found to have 2 or 3 entries, with instances of 5 and even 7 variants found during the analysis.


Appendix 3: Deficiencies in the ISI Data Collection

This Appendix provides information arising from the study which demonstrated significant problems with the ISI database, both generally and from the specific perspective of the IS discipline.


Appendix 3A: Venue Exclusions

The ISI database does not contain all of the venues that would be regarded by any particular discipline as comprising its publication-set. The primary causes are the exclusion of conference proceedings, books and reports, of publications in languages other than English, and of the many journals that have not been admitted to the collection.

Thomson has set substantial barriers to entry for journals into the collection. The criteria and process remain somewhat obscure.

An examination was conducted of the ISI database's coverage of the selected publishing venues listed in Exhibit 1. Nothing was found on the ISI site that declared which issues of journals were in the database, and it was necessary to conduct experiments in order to infer the extent of the coverage. The examination disclosed a wide range of omissions, which are summarised below.

In summary, only 15 of the 30 IS journals listed in Exhibit 1 are represented in ISI. Of those 15, only 6 are present in their entirety, 8 are represented only from a particular date onwards, and 1 is represented only patchily. As a result, it appears that only about 40% of the papers that have been published in this core set of 30 IS journals are included within an ISI citation analysis.

Anecdotally, and from this author's personal experience, a number of well-respected IS journals have been unable to achieve inclusion in ISI, in some cases despite strong cases and repeated requests.

Further, many branches of computer science, particularly those in rapid development and hence with a very short half-life for publications, have largely abandoned journals in favour of conference papers. This means that many leading computer scientists have very low scores on ISI, and so do members of the IS discipline who operate at the boundary with computer science - within the sample studied, notably Ross Jeffery and Sal March.

A further test was undertaken in order to provide a comprehensive assessment of the inclusion and exclusion of the refereed works of a single author.

A highly convenient sample of one was selected: the author of this paper. In the author's view, this is legitimate, for a number of reasons. This was exploratory research, confronted by many challenges, not least the problems of false inclusions and exclusions, especially in the case of researchers with common names. This author has a very common surname, and has a substantial publications record. Those publications are scattered across a wide range of topics and venues, and some of the papers have had some impact. Most crucially, however, the author is well-positioned to ensure accuracy in this particular analysis, because his publications are well-documented and he knows them all.

The outcome of the analysis was that, of the author's 63 refereed papers, only 15 were found in the ISI collection: 13 of the 36 journal-articles and 2 of the 27 refereed conference papers. In addition, 7 non-refereed items were found.

The results of this comprehensiveness test have implications for all IS academics. On the broadest view, an appropriate measure of impact would take into account the citation-counts for all 63 refereed papers (possibly weighted according to venue); but only 15 are included (24%). A more restrictive scope would encompass journal-articles only, in which case the coverage was still only 13/36 (36%). At the very least, the core IS journals should be included, in which case ISI's coverage still only scores 13/25 (52%).


Appendix 3B: Over-Inclusiveness

The ISI collection includes a range of items that are inconsistent with its purpose. Several categories of such items were apparent.

As noted in Appendix 3A, an exhaustive examination of the ISI entries for the author of this paper located only 13 of the 14 expected (and of the 36 relevant) journal-articles, but it also located 2 of his 27 refereed conference papers, and 7 non-refereed items.


Appendix 3C: Vagaries in Author Data

The initial(s) used by and for authors can seriously affect their discoverability. Several authors in the samples have published under two sets of initials - most awkwardly Ross Jeffery as D.R. as well as R.; and three researchers were detected who have publications under three sets of initials. Considerable effort was necessary in multiple cases among the c. 130 analyses performed, and the accuracy of the measures adopted is difficult to gauge.

Niels Bjørn-Andersen suffers three separate indignities. Firstly, ISI largely excludes publications in languages other than English, including Danish. Secondly, ISI does not support diacritics, so 'ø' is both stored and rendered as 'o'. The third problem discovered was that, although a few papers with a very small number of citations were found under 'Bjorn-Andersen', for the three of his papers that appear to have attracted the most citations, his name has been recorded as 'BjornAndersen' (i.e. without a hyphen or other separator). Those papers can be detected using several search-strategies, but not using the author's actual surname.

Given the problem discovered with hyphens, a further test was performed on Trevor Wood-Harper. This disclosed that the same problem occurred - and was further complicated by the existence of three different sets of initials. (Re-testing was not performed until June 2007, and the citation-count shown for this researcher in Exhibit 2 was deflated slightly in an attempt to achieve closer correlation with the figures that would have likely been visible in April 2006).


Appendix 3D: Vagaries in Article Data

Several experiments were conducted, the most instructive relating to an article that, in this author's view at that time, could have been expected to be among the most highly-cited in the discipline (Delone & McLean's 'Information Systems Success: The Quest for the Dependent Variable'). The paper did not appear in Eph McLean's list. It was published in ISR 3, 1 (March 1992), but ISR is indexed only from 5, 1 (March 1994).

The 'Cited Ref Search' provided by ISI also fails to detect it under McLean E.R., but it detects a single citation if the search is performed on 'INFORM SYST RES' and '1992'. It can also be located using <Author = Delone W.H.>, with 7 variants of the citation, all misleadingly shortened to 'INFORMATION SYSTEMS'. These disclose the (relatively very large) count of 448 citations.


Appendix 4: The Research Method for the Google Analysis

The aim of the research was twofold. It was important to assess the usefulness of citation-analysis using Google Scholar data as a means of measuring the impact of IS researchers. In addition, insight was sought into the quality of both the Google and the ISI data.

The analysis presents considerable challenges. Searches generate long lists of hits, each of which is either an item indexed by Google, or is inferred from a citation in an item indexed by Google. The term 'item' is used in this case because, unlike ISI, Google indexes not only articles but also some books, some reports, and some conference proceedings. As is the case with ISI, it appears that the 'citations' counted are actually the entries in the reference-list to each item, rather than the citations within the article's text.

Each item has a citation-count shown, inferred from the index; and the hits appear to be sorted in approximate sequence of apparent citation-count, most first. Very limited documentation was found; and the service, although producing interesting and even valuable results, appeared during the period from April 2006 to June 2007 to be anything but stable.

Various approaches had to be experimented with, in order to generate useful data. From a researcher's perspective, Google's search facilities are among the weakest offered by search-engines, and its implementation of metadata is very primitive. It is reasonable to infer that the indefinite article 'a' and the pronoun 'I' are stop-words in the indexing logic, and hence searches for names including the initials 'A' and 'I' required careful construction. The common words 'is' and 'it' are also stop-words, and hence it is difficult to use the relevant expressions 'IS' (for 'information systems') and 'IT' (for 'information technology') in order to restrict the hits to something more manageable. Search-terms of the form <"I Vessey" OR "Vessey I"> appeared to generate the most useful results.
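
As a minimal illustration of the search-term construction just described, the following Python sketch quotes both name-orders so that initials that coincide with stop-words remain inside a quoted phrase; the function name is illustrative only.

    def author_search_term(surname, initials):
        """Build a Google Scholar search-term of the form '"I Vessey" OR "Vessey I"'."""
        forward = f'"{initials} {surname}"'
        reverse = f'"{surname} {initials}"'
        return f'{forward} OR {reverse}'

    print(author_search_term("Vessey", "I"))   # "I Vessey" OR "Vessey I"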

Experiments with searching based on article-titles gave rise to other challenges, in particular the desirability of a richer starting-point for the analysis: a comprehensive list of article-titles for each researcher.

A preliminary trial was performed on data relating to this author's own publications. After constructing what appeared to be an efficient mechanism, it was still necessary to scan 2,000 entries in order to extract 124 items totalling 1,441 citations. Application of the h-index had the effect of limiting the search to only the first 18 of the 124 items. The 106 papers omitted had at most 17 citations each and an average of only 7, so the total citation-count was reduced by a little over half, from 1,441 to 684. On the other hand, the author in question is relatively prolific, in both the good and bad senses of the term, and hence most authors would lose far less than half of their citation-count; and the 'long tail' is in any case far less significant than the high-impact papers at the top of the list.

The method adopted was to conduct searches using the names of sub-sets of the same samples of researchers whose ISI-derived data appears in Clarke (2008) for Australian researchers and Exhibit 2 for the sample of international researchers. A small sample was used, however, because of the resource-intensity involved, and the experimental nature of the procedure.

For the Australian component of the study, a purposive sub-sample was selected, in order to avoid conflation among multiple authors and the omission of entries. Only the first 10 articles for each author were gathered (generally, but not reliably, those with the highest citation-counts). The technique was also applied to a small sample of international researchers.

The intensity of the 'multiple authors with the same name' problem is highly varied. For many of the researchers for whom data is presented, there was no evident conflation with others, e.g. their Top-10 appeared on the first page of 10 entries displayed by Google Scholar. For a few, it was necessary to skip some papers, and move to the second or even third page. To reach Eph McLean's 10th-ranked paper, it was necessary to check 60 titles, and to reach Ron Weber's 10th, 190 titles were inspected. The check of this author's own entries was more problematical, and is further discussed in Appendix 5.

Experimentation showed that, because substantially more data is available, use of the h-index is feasible. Moreover it is advantageous, because it greatly reduces the number of entries that need to be inspected, and it limits the influence of the long tail of lightly-cited items.

Accordingly, for the international researchers, a revised procedure was adopted. Articles and associated citation-counts were extracted from Google Scholar sufficient to enable computation of the h-index and h-count, as described in Appendix 1.


Appendix 5: The Feasibility of Automating Google Citation Analysis

An experiment was conducted in order to establish whether the recognition of matches and spurious matches could be achieved with confidence, and in an automatable (or semi-automatable) manner. The experiment was conducted on the author's own, fairly common name.

The search on Google Scholar resulted in 12,700 hits in April 2006 (but 34,800 when the experiment was repeated in June 2007 and 35,700 in November 2007). To reach the 10th-most-cited paper, it was necessary to inspect the first 558 entries.

The challenges involved in this kind of analysis are underlined by the fact that those first 558 entries included a moderate number of papers by other R. Clarkes, on topics, and in literatures, at least adjacent to those of the targeted R. Clarke. These could easily have been mistakenly assigned to the R. Clarke in question by a researcher who lacked a detailed knowledge of the targeted person's publications list. Similarly, false-negatives could easily have arisen. There are many researchers with common names, and hence accurate citation analysis based on name alone is difficult to achieve.

A further experiment was conducted in June 2007 to check the effectiveness of more restrictive search-terms. The term <information OR privacy author:Clarke author:R> was used in an endeavour to filter out most extraneous papers without losing too many genuine ones. The 10th most-cited paper was then found at number 33 of 7,380, rather than 558 of 34,800. The total citation-count for those 10 papers was 481 (cf. 417 when the search was first performed 14 months earlier). The lowest counts of the 10 were 16, 19 and 22; but later counts were larger (30 at no. 59, 37 at no. 74, and 25 at no. 112); so the sequencing is not reliably by citation-count, and there is no apparent way to influence the sequence of presentation of the results of a search.

A re-test in November 2007 found that the numbers had grown somewhat. The 10th most-cited paper was at 26 of 8,240, the citations for the top 10 at 525, and the lowest counts were 19, 20 and 21. A search of the remainder of the top 200 (of which 42 were for the author in question) located later items with significantly higher citations than 5 of the apparently top 10, plus 4 duplicates of surviving top-10 entries. The re-calculated citation-total for 'the real top 10' was 623, suggesting an under-statement through mis-sequencing of close to 20%. The patterns appeared to be stable, however.

The comprehensiveness of the coverage was tested by continuing the scan across the first 1,000 entries. (Google Scholar does not appear to enable display of any more than the first 1,000 hits). This identified a total of 117 items with 1,365 citations (about a dozen of which represented double-counting of publications, although apparently not of citations).

The extent to which the search-term missed papers was tested using its complement, i.e. < -information -privacy author:Clarke author:R>. A scan of the first 1,000 entries of 19,900 detected only 7 papers that had been missed by the earlier search (numbers 161, 531, 534, 690, 792, 795 and 861) with a total of 76 citations (respectively 27, 10, 10, 8, 7, 7 and 7).

The stability of the results of applying this technique was checked by means of a repeat of the procedure in November 2007. The 19,900 entries had grown to 21,100, and the first of the 7 papers found through the complement-search had moved to number 164. Because the patterns remained very similar, it is reasonable to surmise that the scope of the data collection and the processes had both stabilised by mid-2007, and that the changes evident since then have arisen primarily from the natural accretion over time of papers within existing venues.

A person-specific specification or protocol appears feasible, but it would be very challenging to fully automate it in order to support periodic re-calculation.
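
By way of illustration only, the following sketch indicates one way in which such a person-specific protocol might be semi-automated, assuming that a hit-list of (title, citation-count) pairs has already been extracted from Google Scholar and that an authoritative list of the researcher's own titles is available. The normalisation rule and the matching threshold are assumptions, not a tested procedure.

    import re

    def normalise(title):
        """Reduce a title to a set of lower-case words, so that differences in
        punctuation and capitalisation do not prevent a match."""
        return set(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())

    def matches(hit_title, own_title, min_overlap=0.8):
        """Treat a hit as the researcher's own item if most of the words of the
        known title appear in the hit's title (an illustrative rule only)."""
        hit, own = normalise(hit_title), normalise(own_title)
        return bool(own) and len(hit & own) / len(own) >= min_overlap

    def assign_hits(hits, own_titles):
        """Split Google Scholar hits into the researcher's own items and the rest.
        'hits' is a list of (title, citation_count) pairs."""
        own, others = [], []
        for title, count in hits:
            (own if any(matches(title, t) for t in own_titles) else others).append((title, count))
        return own, others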


Appendix 6: Cross-Comparison Quality Testing

A series of experiments was conducted, in order to gain insights into the quality of the ISI database, based on cross-comparisons with Google Scholar results.

A6.1 Multiple, Person-Specific Tests

In order to assess the implications for researchers more generally, a sub-set of 7 of the Australian researchers was selected, including 3 of the 7 leaders and 4 of those whose ISI counts fell below the threshold. Their top-10 Google citation-counts were extracted in April 2006, and comparison made with the ISI results. In each case, careful comparison was necessary, to ensure accurate matching of the articles uncovered by Google against those disclosed by ISI. The data is shown in Clarke (2008), at Exhibits A1 to A7.

Google finds many more items than ISI, and finds many more citations of those items than ISI does. In the sample, the ISI count includes only 39/70 items, and even for those 39 the total ISI citation-count is only 45% of the total Google citation-count.
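
By way of illustration, the following minimal sketch shows the kind of computation involved, assuming that the per-item citation-counts from the two services have already been matched (here keyed by title); the figures in the comments are those reported above, and the names are illustrative only.

    def compare_counts(google, isi):
        """Compare per-item citation-counts for one researcher. Both arguments map a
        matched item-title to its citation-count; items absent from ISI simply do not
        appear in 'isi'."""
        shared = [title for title in google if title in isi]
        item_coverage = len(shared) / len(google)                               # e.g. 39/70
        citation_ratio = (sum(isi[t] for t in shared) /
                          sum(google[t] for t in shared)) if shared else 0.0    # e.g. 0.45
        return item_coverage, citation_ratio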

To some extent, this is a natural result of the very different approaches that the two services adopt: indiscriminate inclusiveness on the one hand, and narrow exclusivity on the other. However, a deeper assessment produced further evidence of deficiencies.

The analysis throws serious doubt on the adequacy of ISI as a basis on which to assess IS academics' research impact.

A6.2 Anomaly Investigations

The citation-counts were further examined for several researchers whose ISI counts had been lower than this author had anticipated.

Ron Stamper (as R. and R.K.) generated only 32 citations from 13 articles on ISI. On Google Scholar the count in April 2006 was 511 citations of 36 articles (the largest single count being 60), plus 64 citations of 1 book, for a total of 575 citations. The scan found those 37 relevant entries among the first 100 hits of a total of 7,970 hits in all, and doubtless somewhat under-counted the true figure. A repeat of the experiment in June 2007, using the search-term <author:Stamper author:R*>, found 33 relevant entries among the first 100 hits of a total of only 830 entries, but for a total of 677 citations, or 18% more than 14 months earlier.

An expansion rate of a factor of 18 from ISI to Google is extreme, and suggests that this particular researcher's specialisations are very poorly represented in ISI's collection. In Google Scholar, his h-index was 14 and his h-count around 400; yet in ISI he fell well below the threshold of 100 total citations.

David Avison generated under 100 citations on ISI, including 56 for a CACM paper in 1999. On Google Scholar, that paper alone generates 219 citations, an Australian Computer Journal article (which is excluded from ISI) 199, three IT&P papers around 50 each, another CACM paper 43, and a book 518.

This is a researcher whose Google Scholar scores are an h-index of 14 and an h-count of 1,282, yet who fell well below the 100-citation cut-off used for Exhibit 2.

A6.3 An Article-Specific Test

A further experiment was conducted in order to test the impact of ISI's collection-closedness in comparison with Google's open-endedness. Delone & McLean's 'Information Systems Success: The Quest for the Dependent Variable' was sought by keying the search-term <E McLean W DeLone> into Google Scholar, and critically considering the results. The test was performed twice, in early April 2006 and late April 2006. The results differed in ways that suggested that, during this period, Google was actively working on the manner in which its software counts citations and presents hits. The later, apparently better organised counts are used here.

The analysis was complicated by several factors.

The raw results comprised 824 citations for the main entry (and a total of 832 hits). Based on a limited pseudo-random sample from the first 40, many appeared to be indeed attributable to the paper. This is a citation-count of a very high order. An indication of this is that the largest ISI citation-count for an IS paper that was located during this research was 296, for a paper in CACM by Lynne Markus. In Google, that paper scored 472. So the Delone & McLean paper scored 75% more Google-citations than the Google-citation score of the highest-ranked IS paper that had otherwise been located in the ISI database during the course of the research.

The experiment was repeated in June 2007, with significantly different results. One change was that the output was far better organised than 14 months earlier, with most duplications removed and apparently consolidated into a single entry. The second was that the citation-count was 1,166 in the principal entry (plus 22 more in a mere 4 other entries). This represented an increase of 44% on the citation-count 14 months earlier.

As indicated in Appendix 3D, this paper is not in the ISI collection (in the sense that it was published in a volume of ISR that precedes the date on which ISR was adopted into the ISI database). It can be detected in the ISI collection by indirect means, however, because it is cited by many papers that are in the ISI database. But it could not be detected using 'McLean' as a search-term. Hence it appears that, as a result of what is quite possibly a data-capture error, ISI denies one of the authors the benefit of being seen to have co-authored one of the most highly-cited papers in the entire discipline. An assessment of the counts achieved for highly-cited papers when using the Google Scholar collection instead is provided in Appendix 7.

It is not straightforward to confidently construct searches of the ISI collection to determine citation-counts for specific papers.


Appendix 7: Google Scholar Citation-Count Thresholds

A series of experiments was conducted, in order to provide an empirically-based indication of what levels of Google Scholar citation-count were associated with high-impact papers and with 'classic' papers.

The first experiment involved inspection of the third column of Exhibit 4, which showed the largest per-item count for the 16 international researchers in the sample. The top 10 of these were, largest first:

1166, 790, 635, 591, 539, 518, 484, 405, 350, 326

A second approach adopted was to conduct searches on a dozen terms of considerable popularity in recent years. The terms were not selected in any systematic manner. Consideration was given to using a pseudo-random selection of terms from Barki et al. 1993; but the set would have required substantial updating.

For each term, the counts of the top ten results were recorded, noting the total, the largest count, and the count for the 10th most highly-cited article. This provides an indication of the depth of the heavily-cited literature using that term.

On the basis of this unsystematic experiment, the 10 largest per-item counts among 12 popular terms were, largest first:

1038, 722, 370, 350, 342, 279, 238, 209, 205, 76

Re-tests in July 2007 showed substantial growth in Google citation-counts during the intervening 14 months: 67% for "strategic alignment" (to 1,186), 61% for "key issues in information systems management" (to 1,499), 53% for "citation analysis" AND "information systems" (to 581), 50% for "technology acceptance model" (to 3,127), and 46% for "B2B" (to 1,395). "B2C", on the other hand, had grown by only 12% (to 208). These changes might result from considerable expansion in the Google Scholar catchment, an explosion in IS publications, and/or the existence of bandwagon effects in IS research.
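
As a cross-check on the arithmetic, the implied counts at the earlier search can be recovered from the July 2007 counts and the stated growth percentages, on the assumption that each percentage is relative to the earlier count. A minimal sketch, using the figures quoted above:

    # (July 2007 count, growth since the earlier search); earlier count = new / (1 + growth)
    observations = {
        "strategic alignment": (1186, 0.67),
        "key issues in information systems management": (1499, 0.61),
        '"citation analysis" AND "information systems"': (581, 0.53),
        "technology acceptance model": (3127, 0.50),
        "B2B": (1395, 0.46),
        "B2C": (208, 0.12),
    }
    for term, (new, growth) in observations.items():
        print(f"{term}: about {round(new / (1 + growth))} citations at the earlier search")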

A third approach was to extract the count for Markus (1983), which was the object of study of Hansen et al. (2006). This showed a Google citation-count in June 2007 of 602. Hansen et al. also drew attention to DeSanctis & Poole (1994), which showed 718 Google citations.

A fourth approach involved the extraction of Google Scholar citation-counts for the highly-cited papers published between 1988 and 1994 that were identified in Walstrom & Leonard (2000, Table 7). This gave rise to the following top-10:

1468, 743, 712, 483, 434, 405, 296, 142, 111, 49

This identified a new contender for most highly-cited paper in IS - Davis et al. (1989). Of the three authors of that paper, two do not publish in the IS literature, but the lead-author, Fred (F.D.) Davis does. Examination of his Google Scholar citation-counts disclosed an article with a yet-higher citation-count - Davis (1989) - which scored 2,516 citations.

A tentative explanation for the scale of the citation-counts for these two papers is that many of the citations may be from researchers in disciplines that are cognate with IS (rather than from within IS), and which comprise larger numbers of researchers than does IS itself.

Davis' citation-count of h-items (column 2 of Exhibit 4) was 7,161, considerably higher than the highest otherwise encountered during this study. Davis' h-index was 22, however, which is rather lower than for many of the other leading researchers in the sample.

Following discovery of the highly-cited Fred Davis papers, the term 'user acceptance' was added to the list of 12 popular terms trialled above. In July 2007, this disclosed the two Davis articles, but the counts for the 3rd to 10th most-cited articles were not at the same high levels.

Based on the above experiments, in June 2007, the combined top-10 appeared as follows:

2516, 1468, 1166, 1038, 790, 743, 722, 718, 712, 635
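
The combined list can be reproduced directly from the per-approach figures reported above, as the following minimal sketch shows; the counts drawn from Hansen et al. (2006) and the Davis (1989) count are included as short additional lists.

    import heapq

    exhibit_4     = [1166, 790, 635, 591, 539, 518, 484, 405, 350, 326]
    popular_terms = [1038, 722, 370, 350, 342, 279, 238, 209, 205, 76]
    hansen_et_al  = [602, 718]       # Markus (1983), DeSanctis & Poole (1994)
    walstrom      = [1468, 743, 712, 483, 434, 405, 296, 142, 111, 49]
    davis_1989    = [2516]

    combined = heapq.nlargest(10, exhibit_4 + popular_terms + hansen_et_al + walstrom + davis_1989)
    print(combined)   # [2516, 1468, 1166, 1038, 790, 743, 722, 718, 712, 635]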

This is of course not authoritative, because the method did not assure a comprehensive search of all journals, authors or keywords. The scope of the searching was, however, sufficiently rich that it appears reasonable to draw some cautious inferences.

Very few papers appeared to have achieved a total citation-count above 600. Moreover, even in topic-areas that are mainstream within the discipline, it appears that a relatively small proportion of items has to date achieved 100 citations. (In the above sample of 13 topic-areas in early 2006, there were about 40 such items, with only a few more apparent by mid-2007).

Moreover, either many topics fail to attract any more interest, or subsequent researchers do not develop a 'cumulative tradition' in that they fail to cite predecessor papers. Hence citation-counts above perhaps 75 could be argued to indicate a high-impact article, and above perhaps 40 a significant-impact paper.

Appropriate thresholds on ISI General Search would appear to be somewhat less than half of those on Google, perhaps 50 and 20.

These thresholds are of course indicative, and could be contentious. They are specific to IS, and other levels would be likely to be appropriate in other disciplines, and perhaps in (M)IS in the USA. The applicability of such thresholds is time-limited, with a half-life perhaps as short as 6 months, because of the growth inherent in citation-counts at this early stage in the maturation of both the discipline and the databases on which the analysis depends.


Appendix 8: Deficiencies in Citation Collections

This Appendix provides a consolidated list of the problems that are evident in the use of existing data collections as a basis for citation analysis generally, with particular reference to their use as a means of assessing the impact of individual IS researchers.

The Collections

Based on the evidence gathered during this study, the following deficiencies are apparent in the ISI and Google Scholar data collections.

The Data

The Services

Research Community Factors


Appendix 9: Measures Arising from Citation Analysis

This Appendix suggests a number of measures that could be generated from Google Scholar data (or ISI data if and when it covers a sufficiently large percentage of the core IS publication set).

It is strongly preferable that several measures be used rather than one, in order to reflect different patterns of research and publication, and provide deeper insight into the researcher's impact within the literature.

Measures that could be considered include:

If circumstances force the application of a very short list of measures, then it is suggested that the following be used:


Appendix 10: Elements of an A.I.S. Strategy

This Appendix provides further detail about possible actions that could be taken by AIS in order to address the serious problems in the area of citation analysis that have been identified in this paper.

A Search and Scoring Method

An AIS project could devise a search and scoring method for each individual IS researcher that is comprehensive, and that addresses the risks of both false-inclusions and false-exclusions.

The method could be supported by a tool that accepts as input a list of an individual's published items, and generates from the Google Scholar collection (or ISI, if and when it covers a sufficient proportion of the IS publication set) an ordered list of citation-counts, an analysis of them, and links to the citing items.

Further, the tool could support the computation of weighted scores. This would require the establishment of an AIS-approved quality-classification scheme of publishing venues, and a process for progressively reviewing and adapting it.

A pre-requisite for this would be the development of a comprehensive database of publication titles and reference lists. Indeed, in June 2007, the AIS Council announced that "A new, integrated AIS e-library will be launched, using ProQuest. The new e-library will enable, amongst other things, searching across all AIS journals and conferences".

More radically, the AIS could embark on an initiative to recover control over the product of its members through open access mechanisms, or if necessary through the outright re-capture of journals from uncooperative for-profit journal publishers. The politics of open content and open access are examined in Clarke & Kingsley (2007).

Indicators of the Impact of Individual Publications

An AIS project could establish indicators of the citation-count thresholds at which refereed papers could be regarded as having had significant (or greater) impact. The project would also need to establish a process for progressively reviewing and adapting the thresholds.

On the basis of the analyses conducted in this research, the following indicative values could be considered as thresholds for citation-counts for IS papers in 2007:

 
                            Google      ISI
Classic                        500      150
High-Impact                     75       40
Significant-Impact              40       20

Indicators of the Impact of Individual Researchers

An AIS project could establish indicators of the thresholds at which individual researchers could be regarded as having had significant (or greater) impact. The project would also need to establish a process for progressively reviewing and adapting the thresholds.

As was discussed earlier in this paper, the selection of a measure is very challenging, because the choice will have differential effects on different categories of researcher. A realistic compromise would see the selection of at least two measures, preferably of a composite nature.

The two measures that the analysis leads to are the h-index and the associated h-count, i.e. the total citation-count of the items that make up the h-index.

Hirsch is reported in Mayo (2007) to have suggested that an ISI h-index of 20 indicates a 'successful' physicist, and 40 an 'outstanding' physicist.

On the basis of the analyses conducted in this research, the following indicative values could be considered as thresholds for Google Scholar scores for IS researchers in 2007:

 
                                    h-index     h-count
Outstanding / Leadership Group           25       1,500
Successful / 'Middle Ground'             15         500
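
By way of illustration, the following minimal sketch shows how the two indicative tables above could be applied. The threshold values are those proposed above; the function names, the use of the Google Scholar column alone for papers, the treatment of the researcher thresholds as joint conditions, and the residual label are assumptions only.

    def classify_paper(google_citations):
        """Classify a paper's impact using the indicative Google Scholar thresholds above."""
        if google_citations >= 500:
            return "Classic"
        if google_citations >= 75:
            return "High-Impact"
        if google_citations >= 40:
            return "Significant-Impact"
        return "Below threshold"

    def classify_researcher(h_index, h_count):
        """Classify a researcher using the indicative Google Scholar thresholds above."""
        if h_index >= 25 and h_count >= 1500:
            return "Outstanding / Leadership Group"
        if h_index >= 15 and h_count >= 500:
            return "Successful / 'Middle Ground'"
        return "Below threshold"

    print(classify_paper(1166))             # Classic
    print(classify_researcher(22, 7161))    # Successful / 'Middle Ground'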


Author Affiliations

Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in the Cyberspace Law & Policy Centre at the University of N.S.W., a Visiting Professor in the E-Commerce Programme at the University of Hong Kong, and a Visiting Professor in the Department of Computer Science at the Australian National University.


