Social Networks 31 (2009) 271–280
Journal homepage: www.elsevier.com/locate/socnet

Opening the black box of link formation: Social factors underlying the structure of the web

Sandra Gonzalez-Bailon
Oxford Internet Institute and Nuffield College, University of Oxford, 1 St. Giles, Oxford, UK

Keywords: Web, Links, Centrality, Visibility, Interorganisational networks, ERGMs

Abstract

Links play a twofold role on the web: they open the channels through which users access information, and they determine the centrality of sites and their visibility. This paper adds two factors to the analysis of links that aim to draw a parallel between the web and other offline interorganisational networks: the resources that the organisations publishing online are able to mobilise, and the status or public recognition of those organisations. Exponential random graph models (ERGMs) are used to analyse a sample of the web of about one thousand sites, showing that both the economic resources of the producers of the sites (a proxy to their wider pool of resources) and their presence in traditional news media (a proxy to their status) significantly increase their probability of receiving more links, and therefore, their centrality. This adds a sociologically relevant dimension to the analysis of the web that has been disregarded so far but that is crucial to understand the way it distributes visibility. © 2009 Elsevier B.V. All rights reserved.

1. Introduction

2. Visibility and the structure of the web

Links are the building blocks of the web. They open the channels through which users access information and they contribute to define the visibility of sites by making them more prominent for search engines. Links hold the key for the way information is accessed online: they not only open the roads to circulate the web but also signpost some flows of information more visibly than others.
The more links a site receives, the more visible the site becomes because it is easier to encounter. Identifying the mechanisms that underlie the formation of links is relevant for three reasons: first, because links hide the local mechanisms that generate the decentralised structure of the web; second, because links determine the centrality of sites, and with that, the distribution of visibility online; and third, because by making sites more central and attracting audiences, links also contribute to attract investment from the advertising market. Finding out what mechanisms generate the structure of the web is important to reproduce its efficiency in the transmission of information; but it is also important from a less engineering, more sociological perspective: the efficiency of the web hinges on an uneven distribution of visibility that grants a competitive advantage to certain web sites when it comes to reaching audiences. By unravelling the factors that lead to the formation of links, this paper aims to shed light on the forces that give more prominence to certain sources of information and contents. Attention is a scarce commodity in all sorts of media, including the web: users can only devote a limited amount of time to process information and this leads to a competition between sources to gain their interest (DiMaggio et al., 2001, p. 313). It is therefore hardly surprising that some web sites are more successful in that competition than others; what is new is the role that links play in determining who gets the pole position, and the bias that this influence might be introducing in how information is retrieved (Lawrence and Giles, 1999). On the web a small number of sites attract a disproportionate number of links.
These sites become the gravity centres of the web because, first, the more links they attracted in the past, the more links they are likely to attract in the future and, second, the more central these sites become, the more users will end up visiting them. Researchers have shown that a ‘rich get richer’ principle (Price, 1976) is enough to successfully reproduce the structure of the web: if, in a network in constant growth, new nodes send links to older nodes in proportion to the number of links they already receive, the structure that will emerge will have the same characteristic long-tail degree distribution exhibited by the web (Barabási and Albert, 1999; Barabási et al., 2000). This preferential attachment principle is not intended to capture an empirical mechanism (it is actually a black box when it comes to explaining what makes some websites send links to other sites), but it provides a stylised way of capturing the basic infrastructure of the web. It also suggests that time generates a path-dependency that is difficult to counteract and that gives an advantage to the most senior nodes.

E-mail address: sandra.gonzalezbailon@oii.ox.ac.uk. doi:10.1016/j.socnet.2009.07.003

On the web links not only attract further links but also larger audiences: they open more points of entry to a given site and they influence the method that search engines use to retrieve information (Lawrence and Giles, 1999; Pennock et al., 2002; Cho and Roy, 2004). Search algorithms are very sensitive to the centrality of sites when establishing the relevance of web contents because they assume that the number of links reaching a site is a proxy to its quality: links become a key factor in determining the ranking a site will obtain in query results (Brin and Page, 1998; Tomlin, 2003).
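The ‘rich get richer’ principle described above can be illustrated with a small simulation. The following is a minimal sketch rather than the model used in the cited studies: it assumes that every new node sends exactly one link, and that an older node is chosen with probability proportional to its in-degree plus one (the added one is an assumption that lets nodes without links be chosen at all). The function name and parameters are hypothetical.

```python
import random


def preferential_attachment(n_nodes, seed=0):
    """Grow a network in which each new node sends one link to an older node.

    Assumption: an older node is chosen with probability proportional to
    (in-degree + 1). Returns the list of in-degrees, indexed by node.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    rng = random.Random(seed)
    in_degree = [0]  # node 0 starts alone, with no links
    # Each node appears in `targets` once by default and once more per link
    # received, so a uniform draw is proportional to in-degree + 1.
    targets = [0]
    for new_node in range(1, n_nodes):
        chosen = rng.choice(targets)  # only older nodes are in the list
        in_degree[chosen] += 1
        targets.append(chosen)
        in_degree.append(0)
        targets.append(new_node)
    return in_degree


if __name__ == "__main__":
    degrees = preferential_attachment(2000, seed=1)
    # A few old nodes concentrate many links; most nodes receive few or none.
    print("five largest in-degrees:", sorted(degrees, reverse=True)[:5])
    print("nodes with no links:", degrees.count(0))
```

Sorting the returned in-degrees shows the long tail: the earliest nodes tend to accumulate most of the links, which is the path-dependency that favours senior nodes.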
Because users are more likely to look at the top 10 results (Henzinger, 2007), search engines contribute to boost significantly the popularity of sites by making them more visible and a likelier destination. To the extent that large audiences attract the interest of online advertisers, links become relevant not only to understand how information is accessed, and what sources of information are more visible, but also who benefits from the political economy of the web. Links are therefore the roads and the signposts of the web but also the currency that measures value online; it is this twofold function, and the social implications that derive from it, that provides the starting point of this paper.

So far, most approaches to the web have assumed that links are proxies to either the quality of sites (Brin and Page, 1998; Tomlin, 2003) or to some sort of affiliation between the producers of those sites (Huberman, 2001; Adamic and Adar, 2003). These studies are important because, ultimately, they contribute to optimise techniques to retrieve information using the structure of links, and what they represent, as the main recommendation criteria. Yet these studies have not taken into account the impact that factors exogenous to the web, like the resources or status of those producing the sites, might have on online linking patterns, which is particularly striking because these have long been identified as crucial factors in shaping other interorganisational networks (Podolny, 2001; Diani, 2003; Baldassarri and Diani, 2007). The main claim this paper makes is that, given the public function that the web serves as a form of media, more attention should be paid to the impact that these factors have on its structure.

The argument is developed as follows. First, the paper reviews the different approaches that have been used to analyse the structure of the web, paying special attention to how they account for the mechanisms driving the formation of links.
Then, it introduces the empirical data on which the analyses are based. A description is given of the procedure used to sample the web, and the producers of those sites are characterised using a number of measures like their age, field of activity, economic resources, and presence in traditional news media, which is used as a proxy to their status. Section four presents the models used to identify the influence that these characteristics have in the creation of links and therefore in the distribution of centrality. What the models show is that, controlling for the structural properties of the network, and for the age of sites, the richer organisations and those with higher status are still more likely to receive links from other sites. The last section discusses how these findings qualify previous approaches to the structure of the web.

3. Approaches to link formation

In principle, the web offers to users whatever information they want, as long as they know how to find it. In practice, users are more likely to access some web sites rather than others because they are more visible to the public. Gatekeepers to information like search engines play a crucial role in directing users’ attention to certain destinations. They use the very backbone of the web, the structure of links, to rank its contents on the assumption that links are to sites what academic citations are to papers: an objective measure of relevance (Brin and Page, 1998). Some researchers have gone one step further by adding a semantic layer to the interpretation of links and analysing them not just as proxies to quality but also to common interests and affiliations (Huberman, 2001; Adamic and Adar, 2003).
For both interpretations, links are essentially recommendations; they are not necessarily an endorsement (as with scientific citations, sources can be cited for criticism) but they are an obvious sign of acknowledgement: site A can only send a link to site B when it is aware of B’s existence, and each link that site B receives is a statement that it is, at least, worth a visit. According to these two interpretations, the web is either a network of documents where links create a voting system that is used to identify the best contents, or it is a social network driven by homophily forces where users create links to similar others and, in doing so, shape the overall structure. And yet the possibility that the web, like other social networks, might also reflect an asymmetrical distribution of resources or status, inherited from offline relations, is somehow overlooked by these two approaches. What this possibility suggests is that the distribution of links might actually be reflecting hierarchies that are not necessarily related to quality or shared interests. A research institute might link to an organisation because it depends on its resources to fund its projects; or an international NGO might not reciprocate the links it receives from smaller organisations, even when they work on similar issues, because it does not need them to attain visibility or legitimacy as much as the smaller organisations need the NGO. Research on economic networks has shown that ties between organisations are not only the channels through which information or resources flow, but also assets that organisations use strategically to enhance their legitimacy in the eyes of potential consumers: alliances with high-status third parties can improve the image and perception of an otherwise unknown organisation (Podolny, 2001).
A similar distinction could apply to web links: they are the channels through which users find their way to online contents but, to the extent that they also signal alliances between organisations, they become an important clue for public recognition; in that sense, links from central, legitimised sites are more valuable than links from peripheral, unknown organisations. Links, like alliances between companies dealing with consumer uncertainty, can contribute to improve the image of an organisation. But to serve that purpose, links need to connect with the right partners. Researchers of social movements, particularly those adhering to resource mobilisation theory, have long acknowledged the importance of this strategic component in the formation of networks. The larger the size of organisational resources, the argument goes, the more influential and central an organisation will be in the network: ties with such an organisation are more valuable because they give access to a larger pool of resources (Diani, 2003). More recent research on civic associations has proposed an analytical distinction between two types of connections: identity and instrumental ties (Baldassarri and Diani, 2007). While the former are based on a common ground of values and interests, and promote the clustering of like-minded associations, the latter forge alliances with organisations that grant access to the resources necessary to achieve certain goals, even when these organisations do not necessarily share the same agenda or principles. These two types of connections result from different motivations, a difference that might also be reproduced on the web: links might signal affiliation, but they can also respond to a need to obtain resources that certain organisations would not be able to get on their own, like more traffic to their websites.
Being connected to a large international agency is, in this respect, more important than being connected to a smaller, local group because it contributes much more to improve visibility and public perception. The formation of links on the web could therefore be also related to the status and size of the resources managed by the organisations that publish the sites, just as it happens with other interorganisational networks. However, as the following two sections show, this possibility has not yet been tested empirically.

3.1. Links as proxies to quality

Search engines have greatly improved the quality of their results by using the information contained in the linking structure of the web. This structure is interpreted as a citation graph and it is assumed that pages that are well cited from many places around the web are the most interesting pages to browse. Following this logic, search engines propagate weights through the structure of the web in a way that gives more relevance to the links sent by the more central sites: a link sent by the World Bank weighs more than a link sent by a small grassroots organisation when it comes to defining the relevance of the source being linked. This technology, originally devised by Google (Brin and Page, 1998) but soon adopted by most search engines (Tomlin, 2003), is based on a variation of the measure of eigenvector centrality: the centrality of a site, and its significance in defining the relevance of other sites, depends on how central the sites linking to it are themselves. However, the implementation of this technique rests on two further assumptions that are more subtle but still crucial if we are to understand the way in which the web has been conceptualised: first, that the web is essentially a network of documents, and second, that the position of those documents in the network has nothing to do with the attributes of the producers.
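The idea that a link weighs more when it comes from a central site can be sketched with a simple power iteration in the style of PageRank. This is an illustrative sketch, not the implementation used by any search engine: the damping factor of 0.85, the even redistribution of weight from sites without outgoing links, and all site names in the example are assumptions introduced here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Propagate weights through a link structure by power iteration.

    `links` maps each site to the list of sites it links to. Assumptions:
    a damping factor of 0.85, and sites with no outgoing links spread
    their weight evenly over all sites.
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every site keeps a small baseline share of the total weight.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source in nodes:
            targets = links.get(source, [])
            if targets:
                # A site passes its weight on in equal parts to its targets.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No outgoing links: spread the weight over every site.
                share = damping * rank[source] / n
                for node in nodes:
                    new_rank[node] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical example: "central_agency" receives three links, while
    # "grassroots" receives none; each of them sends out a single link.
    example = {
        "x": ["central_agency"],
        "y": ["central_agency"],
        "z": ["central_agency"],
        "central_agency": ["site_a"],
        "grassroots": ["site_b"],
    }
    scores = pagerank(example)
    print("site_a:", round(scores["site_a"], 4))
    print("site_b:", round(scores["site_b"], 4))
```

In the example both site_a and site_b receive a single link, yet site_a ends up with the higher score because its link comes from the more central sender.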
Sociological research has usually associated measures of network centrality to power because the more central agents become in a network, the better positioned they are to control flows of information (Cook et al., 1983; Bonacich, 1987). Access to larger amounts of resources or to high-status partners can accelerate the centrality of agents in a network, but this possibility is mostly absent from how the web is conceived by its main gatekeepers, which sets the web artificially apart from how other interorganisational networks are formed. Instead, search engines conceive the web as analogous to citation networks. In science, the number of times that papers are cited by other papers is usually considered the best indicator of the significance of scientific work (Garfield, 1955; Cole and Cole, 1967). This reward system sometimes penalises individuals by setting off a rich-get-richer feedback mechanism that reinforces the visibility of renowned scholars but overall it has been said to play a functional role because it increases the salience of discoveries from which everybody benefits (Merton, 1968). The web, which grew as a repository of scientific documents, shares the same type of structure as citation networks: a scale-free, long-tail degree distribution where a minority of nodes concentrates the majority of the links (Price, 1976; Redner, 1998; Albert et al., 1999; Broder et al., 2000). But with its development, and the increasing participation of actors willing to reach audiences at any cost, the web started to resemble more a social network and less a network of documents: agents sending and receiving links had an interest in attaining the most visible positions and, most crucially, they were equipped with different stocks of resources to fulfil that aim. 
Whilst scientists do not have the power to make their colleagues reference a paper, a funding institution might condition its grants to receiving an explicit acknowledgement from the recipient; a small grassroots movement might be compelled to associate with a larger organisation to reach wider audiences; or local media platforms might collaborate and reference each other in order to compete with the logistics of established news organisations. These dynamics can only be identified if we approach the web not as a citation network but as an interorganisational network where links are used strategically to improve the position of the agents involved.

A piece of evidence suggesting that links actually hide (and give expression to) different strategic behaviour is that linking patterns change across web domains. Corporate sites, for instance, are less likely to send links to other sites: most of them occupy a section of the web that is easy to reach but difficult to leave because there are not many links offering a way out (Broder et al., 2000, p. 310). In addition, when considered apart, other subsets of the web, like university and newspaper homepages, or the sites published by scientists, do not follow the characteristic power-law distribution: relative to their own community, the sites that accumulate the largest number of links are not as far away from the mode (Pennock et al., 2002, p. 5208). These domains differ from each other in how much they deviate from the power-law prediction, but they share a ‘winners don’t take all’ feature that qualifies previous models of the web: when pages are compared with similar types, the unequal distribution of centrality is less extreme, and sites in different domains do not show the same tendency to prioritise a few nodes over the rest.
What these findings suggest is that the scale-free nature of the web, and its distribution of visibility, hides generative mechanisms that cannot be reduced to endogenous forces explained only in terms of the quality of the contents published. But to conceptualise the web as a social network, the attributes of the agents behind its formation need to be incorporated in the analyses. Researchers who see links as reflecting alliances between organisations follow this line of inquiry, producing evidence in support of the claim that identity and homophily are, as in other social networks, a crucial factor to explain the structure of the web. Yet, as the following section shows, this line of research still leaves unexplored the instrumental role that links play both as signs of status and as channels for the mobilisation of resources like traffic.

3.2. Links as proxies to alliances

Researchers factoring the notion of identity into the analysis of the web have found that, online, agents also build bridges to similar others to promote a common message. Different studies have shown that links between sites respond to the strategies and alliances of the organisations publishing them (Rogers and Marres, 2000; Adamic and Adar, 2003; Rogers, 2004; Adamic and Glance, 2005; Ackland et al., 2006). Individuals and organisations select the links that come out of their sites in line with their own agenda and interests. For instance, sites in the .com, .gov and .org domains dealing with global climate change follow different linking styles because the organisations behind those sites have different perspectives of the relevant issues.
NGOs in the .org domain generate the densest networks, with most of their links going to other NGOs and governmental sites, and only a few to corporate sites; governmental sites, in turn, barely ever send links to other domains outside .gov, whilst corporate sites do just the opposite: most of their links are targeted outside their own .com domain (Rogers and Marres, 2000). What this research suggests is that links often respond to the same motivation that underlies identity ties between civic associations (Baldassarri and Diani, 2007), and that visibility on the web relies partly on relations that are forged offline, not just on the actual quality of the contents. Links reveal the public perception that organisations aim to build: this is why companies like Shell send links to Greenpeace but Greenpeace refuses to send links back to Shell (Rogers and Marres, 2000, p. 17).

The tendency of web sites to prioritise connections to like-minded organisations falls in line with the homophily principle ubiquitous in other social networks (McPherson et al., 2001). Identifying this tendency is important because it contributes to explain the clustering of the web and can lead to the design of more refined search algorithms (Adamic, 1999); but it is also important because it opens a point of connection with research on interorganisational networks and its conceptual framework. As the previous section argued, of particular relevance is the distinction between identity and instrumental ties, and amongst the latter, between ties used to mobilise resources and ties used to signal status. However, to test whether these two types of connections have an impact on the structure of the web, other variables need to be taken into account in addition to those intended to approximate identity.
Some researchers have analysed the impact of geographical distance and economic development on NGOs (Shumate and Dewitt, 2008) and university web networks (Thelwall, 2002); others have included characteristics of political candidates to analyse the linking patterns of political sites (Foot et al., 2002), and yet others have focused on the impact that language has on, for instance, the formation of university web networks (Vaughan, 2006). However, these studies do not make the conceptual distinction between identity and instrumental ties that this paper aims to test, in line with resource mobilisation theory and with what we know about interorganisational networks. This distinction is relevant because it qualifies the assumption that links are proxies to the quality of websites, and sheds light on factors that affect the way information is arranged and retrieved online.

In order to test the impact that instrumental ties have on the structure of the web, this paper takes into account two new variables not considered so far in the literature: the economic resources of the organisations publishing the sites, and their status or prominence in public perception. The following section gives details on how these variables were operationalised; what follows are the two main hypotheses driving the collection of the data:

Hypothesis 1: organisations managing larger pools of resources are more likely to attract a higher number of links.
Hypothesis 2: organisations holding better public recognition or higher status are more likely to receive a higher number of links.

Both hypotheses aim to identify the importance that instrumental links have in shaping the structure of the web, but they refer to different strategic aims: tapping into a larger pool of organisational resources in the first case, and enhancing public perception by associating with high-status organisations in the second case.
If the web reflects the dynamics of other interorganisational networks, these two factors (economic resources and status) will have a positive effect in the centrality of sites for reasons that do not derive from the quality of their contents but from the prominence of the organisations that publish them. Their sites might still contain the best information, but their centrality would be reinforced by the strategic interests of the organisations linking to them, regardless of their intrinsic value.

Given what we know about the factors that shape the structure of the web, testing these hypotheses requires introducing some controls. Additional variables that might exert an influence in the centrality of sites are, in the light of the literature explored above, homophily (which has been repeatedly found to be a crucial building block of the web) and age, both of the organisations and the sites: the assumption is that the longer a site has been online or the older an organisation is, the more opportunities both have to accumulate links and play more central roles in the network (Barabási et al., 2000; Diani, 2003). Considering these variables leads to the formulation of three additional hypotheses:

Hypothesis 3: organisations working on similar issues are more likely to send links to each other.
Hypothesis 4: older sites receive a higher number of links.
Hypothesis 5: older organisations attract a higher number of links to their websites.

These hypotheses will be tested controlling for the structure of the network so that the effects of the exogenous variables are not overestimated and we control for the influence of unmeasured attributes like geographical distance; and also to take into account endogenous mechanisms not captured by organisational attributes, like the tendency to reciprocate existing ties or engage in transitive clusters. The data gathered to test these hypotheses are presented in the next section.

4. Description of the data

4.1. Method for sampling the web

The sample of the web used in the analyses was collected following the procedure summarised in Fig. 1. First, one thousand sites were randomly selected from the complete list of sites registered in the .org domain. We focus on this domain because it is one of the oldest and most popular domains on the web, and because it is one of the most representative: all sorts of organisations publish here, from charities and NGOs, environmental groups and grassroots organisations, to UN and intergovernmental agencies, professional associations and religious groups, to name some. Out of the initial random selection (stage A in Fig. 1), only 13% of the sites were operative; a content analysis was performed on each of them, obtaining information about the name and type of organisation publishing the site, and keeping track of the links sent to other recommended sites within the domain. This excluded links to non-relevant sites like hosting servers but also to commercial sites in the .com domain. Following those links, additional sites were added to the sample, and again information was obtained about the producers of those sites and their links (stage B in Fig. 1). The decision to proceed with the sampling following the links sent from the operative sites as opposed to using another random selection of the whole domain was taken for efficiency reasons: if 87% of every thousand sites randomly selected are not operative or are fake domains, it would have taken much more time to collect information for a network of the same size as the one considered here. Links from operative sites, on the other hand, are more likely to identify other sites that are also operative, hence the decision to snowball from them. In this second stage, a selection was made to extract from the sample the sites published by international organisations, or organisations internationally oriented. This filter was applied for two reasons.
The first, methodological, was to avoid a bias in favour of US organisations, which do not use country code top level domains (as, for instance, .uk for the United Kingdom, .jp for Japan or .cn for China). These sites might attract an unrepresentative number of links, given that the US has one of the highest percentages of internet users.

Fig. 1. Data collection procedure.

Table 1. Descriptive measures of the producers of sites in the sample.

| Type of organisation | N (%) | Budget (annual, millions of dollars) | Paid staff | Media presence | Year of foundation | Year of foundation online |
| Charities-NGOs | 217 (22) | 22.8 (3.8–117.5) | 26 (8–255) | 2 (0–17) | 1986 (1968–1996) | 1998 (1996–1999) |
| Political | 149 (15) | 3.0 (1.0–8.7) | 17 (8–39) | 2 (0–8) | 1990 (1979–1996) | 1998 (1997–2000) |
| Health | 116 (12) | 17.8 (3.7–75.5) | 23 (9–52) | 2 (0–10) | 1992 (1978–1998) | 1998.5 (1997–2000) |
| Environmental | 113 (12) | 8.9 (1.9–62.4) | 22 (10–71) | 3 (0–9) | 1988 (1973–1995) | 1998 (1996–2000) |
| Religious | 85 (9) | 11.8 (0.9–84.7) | 27 (10–215) | 1 (0–7) | 1963.5 (1940–1981) | 1997 (1995–1999) |
| Research | 70 (7) | 7.2 (2.6–27) | 24 (10–39) | 4 (0–22) | 1985 (1964–1994) | 1997 (1996–1999) |
| UN | 60 (6) | 180.4 (78–679.6) | 541 (138–2847) | 9 (1–48) | 1971 (1951–1995) | 1998 (1995–2000) |
| Media | 49 (5) | 43.8 (NA) | 5 (5–6) | 2 (0–4) | 1994 (1984–1999) | 1999 (1997–1999) |
| Intergovernmental | 30 (3) | 85.4 (12.2–571) | 163 (68–574) | 14 (4–92) | 1967 (1959–1988) | 1996 (1995–1997) |
| Professional | 42 (4) | 14 (0.8–40.1) | 1 (1–87) | 1 (0–11) | 1969 (1926–1991) | 1998 (1996–2000) |
| Education | 15 (2) | 104.1 (54.4–153.9) | 67 (38–96) | 8 (3–60) | 1973 (1925–1995) | 1996 (1994–1998) |
| Sports | 15 (2) | 1.0 (0.2–1.5) | 5 (0–25) | 4 (0–15) | 1982 (1957–1992) | 2000 (1996–2001) |
| Security | 6 (1) | 26 (NA) | NA | 18 (2–34) | 1992.5 (1961–1996) | 2001 (1997–2002) |
| Total cases | 967 | 270 | 302 | 967 | 539 | 909 |

Note: Cells with attributes show the median and the 1st and 3rd quantiles (in brackets). NA means the statistics could not be calculated due to missing values.
The second reason, operational, was to be able to complement the sample with information gathered from the Yearbook of International Organisations, and test whether the attributes of the organisations behind these sites contribute to explain their centrality. Snowballing from the sites published by international organisations, additional sites were added to the network, resulting in the final sample (stage C in Fig. 1), formed by about one thousand sites and more than seven thousand links. Analyses not reported in this paper show that the seed sites that resulted from the first round of data collection do not have a significant advantage in attracting a higher number of links, which means that the snowballing procedure is not imposing artificial centrality scores.

Thirty-four of the sites in the final sample had no links with any other sites in the sample so they were removed from the analyses. The lack of links to or from these sites does not respond to any substantive reason: their isolation is rather an artefact of the data collection procedure and, in particular, of the size of the sample. Had the sample been larger, most of these isolated sites would have been connected to the other sites even if only through long paths: given what we know about the structure of the web, only a small percentage of sites are secluded in isolated components (Broder et al., 2000). According to the domain total registration figures, the fraction captured with this sample amounts to roughly 2% of all sites registered as .org; however, the real fraction of organisations sampled is probably higher given the high percentage of fake domains that are either not available or have automatic redirections to other domains like .com. The original random sample was selected in November 2004 and the snowballing procedure was applied between December 2004 and March 2005.

4.2. The attributes of the producers of sites

The name and the field of activity of the organisations publishing the sites were collected as part of the content analysis in the procedure summarised above. In addition, information about other attributes was also collected using the Yearbook of International Organisations (printed edition) and the annual reports published online by the organisations themselves. These attributes included annual budget, number of paid staff, and year of foundation. The first two variables are intended to measure the amount of economic resources managed by the organisations. Since the network data was collected during 2005, the information about these attributes corresponds to 2004, or to the last available year before that. Annual reports were used when the Yearbook did not contain enough information: 47% of the organisations in the sample were not listed in the Yearbook, and some more contained missing information for the budget and paid staff variables. When annual reports were used, budget information was collected using the total income or total assets reported by the organisation. Information about the year of first online publication was collected using the search engine Alexa.

The status of the organisations was measured using their visibility in traditional news media, on the assumption that high status organisations are more visible and receive more press coverage. This was operationalised as the number of times the sites of the organisations were cited by international newspapers (all full-text English language news and full-text and abstract news for other languages stored in the database LexisNexis) during the year prior to the collection of the sample. Table 1 provides some descriptive measures of these attributes. The first column in the table contains the categories in which sites were classified according to the field of activity of the organisations publishing them.
This classification was done manually during the collection of the sample using the same definition given by the organisations themselves on their websites. When the nature of the organisation was not clearly specified, and more than one category could apply (for instance, some NGOs work on environmental issues), a decision was made to choose the category that defined the nature of the organisation more accurately: Greenpeace, for instance, was classified as an environmental organisation, not as an NGO. This classification was done independently from the information contained in the Yearbook, which provides codes to identify different types of organisations. The reason is that many of the sites in our sample (47%, as mentioned above) were not included in the Yearbook, most of them being internet-based organisations. In order to check the robustness of this inductive classification, an inter-rater agreement test was run. The test showed that there is substantial agreement between the classification of sites displayed in Table 1 and two other classifications carried out by independent researchers, with a Cohen’s Kappa coefficient of 0.61 for the three classifications. The second column of Table 1 specifies the relative size of each of these categories. As the figures show, the domain sampled here is mostly populated by charities and NGOs like Caritas, Amnesty International or the Red Cross. Political, health and environmental organisations follow: sites like Corporate Watch, Family Health International or Friends of the Earth amount to close to 40% of all the sites in the sample. The least numerous categories belong to education organisations like the Institute of International Education, sports associations like the International Athletics Foundation, and security sites like the International Code Council Foundation. The remaining columns in the table show information about the attributes of these organisations.
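The inter-rater agreement check described above can be illustrated with a minimal sketch. The paper does not state how the three classifications were combined into a single coefficient, so the function below computes the standard two-rater Cohen's kappa, which could be applied to each pair of classifications. The function name and the category labels are hypothetical and the labels are not taken from the sample.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels for six sites (illustrative only, not the paper's data).
rater_1 = ["ngo", "ngo", "env", "health", "env", "ngo"]
rater_2 = ["ngo", "env", "env", "health", "env", "ngo"]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple share of matching labels whenever some categories are much more common than others.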
Budget and paid staff are both measures of the economic resources managed by these organisations, and they are intended to test the impact that organisational assets have on online centrality (hypothesis 1). The distance between the median and the first and third quartiles shows that the distribution of economic resources is significantly skewed to the right in all categories. The richest organisations are, as one would expect, UN and intergovernmental agencies; but charities and NGOs follow, with the American Council for Voluntary International Aid (InterAction) managing the largest amount of resources. According to the median, the least resourceful organisations in terms of budget are, aside from sports associations, those devoted to political issues, like Attac International or Minority Rights Group International, although in general these organisations have a comparatively better score when the level of economic resources is measured in terms of number of paid staff. Due to missing values (72% for budget and 69% for paid staff), the approximation to the economic resources of organisations is not very reliable for the less populated categories. The visibility of these sites in traditional media is, as the fourth column shows, also skewed to the right, meaning that a few sites get most of the attention from traditional news media and that there is an elite of high-status organisations. This variable aims to test to what extent the status of organisations contributes to increasing the centrality of their sites (hypothesis 2). Again, the most visible organisations are intergovernmental and UN agencies, although sites devoted to security issues, like the Institute for the Analysis of Global Security, attain an even better visibility.

Fig. 2. Relative inequality in the distribution of centrality.

In general, only 7% of the sites were cited more than a hundred times by traditional newspapers during the year prior to the collection of the data, with just two of them getting more than a thousand news citations. Finally, the last two columns examine the age of the organisations (44% of the cases are missing) and the age of the sites (6% missing). The youngest organisations are related to media issues, like the Independent Media Centre (Indymedia), and the oldest to religious groups, like the Alliance of Baptists. All the sites, however, went live around the same time. Intergovernmental agencies seem to have a lead in publishing on the web, followed by religious groups and research institutes like the Carnegie Endowment for International Peace.

4.3. The distribution of centrality

The network formed by these organisations is, as expected, highly centralised in a few nodes. Fig. 2 captures the degree of inequality in the centrality scores of the web sites according to two measures: indegree (Freeman, 1979) and eigenvector centrality (Bonacich, 1987). The first refers to the number of links that reach a given site; the second measures the centrality of a site in proportion to the centrality of the sites that link to it. As a baseline test, the observed distributions are compared with random networks of the same size and density assembled following a Bernoulli process. As expected, the Gini coefficients show that the inequality in the centrality of sites is significantly larger for the observed network, especially according to the eigenvector measure, a variety of which is used by search engines when determining the prominence of sites.
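The baseline comparison just described can be sketched as follows for the indegree measure. This is an illustration under stated assumptions rather than the procedure used for Fig. 2: the toy edge list, the function names, the number of simulated networks and the random seed are all hypothetical, and the Bernoulli graphs match the observed density only in expectation.

```python
import random


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula applied to the values sorted in ascending order.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def indegrees(n_nodes, edges):
    """Number of incoming links per node, from (source, target) pairs."""
    counts = [0] * n_nodes
    for _source, target in edges:
        counts[target] += 1
    return counts


def bernoulli_digraph(n_nodes, density, rng):
    """Directed random graph in which every ordered pair is linked independently."""
    return [(i, j) for i in range(n_nodes) for j in range(n_nodes)
            if i != j and rng.random() < density]


# Hypothetical toy network in which node 0 receives most of the links.
observed_edges = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (0, 1), (2, 1), (3, 2)]
n = 6
density = len(observed_edges) / (n * (n - 1))

rng = random.Random(42)  # fixed seed so the sketch is reproducible
baseline = [gini(indegrees(n, bernoulli_digraph(n, density, rng)))
            for _ in range(200)]

observed_gini = gini(indegrees(n, observed_edges))
mean_baseline_gini = sum(baseline) / len(baseline)
print(observed_gini, mean_baseline_gini)
```

Comparing the observed coefficient with the distribution of simulated coefficients, rather than with a single random graph, gives a sense of how much inequality arises from chance alone at a given size and density.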
This inequality is not surprising given what we know about the long-tail, scale-free properties of the web; but it provides the empirical starting point for the analyses presented in the following section and, in particular, for this question: What makes a few web sites proportionally so much better connected than the majority of sites? The two main hypotheses driving this paper claim that underlying this uneven distribution of centrality there might be mechanisms similar to those that explain the formation of other interorganisational networks. These networks rely heavily on instrumental ties that give access to more resourceful partners, who end up having the most central positions in the network (Diani, 2003). Online, resourceful organisations might also be more central because they count on the links sent by the organisations that depend on their financial support: if the non-profit organisation Family Care International is funded by the Gates Foundation, a link will acknowledge that part of its work depends on the funds granted by the Foundation. The same might happen with UN and intergovernmental agencies like the World Bank: this site is likely to be a hub not only (or not necessarily) because of the quality of its contents, but because the Bank is the pivot on which many organisations revolve. It is in this sense that instrumental ties defining alliances or partnerships offline might be having an effect on the structure of the web. In addition, organisations with better resources might also be able to hire the services that allow them to optimise their web sites and make them more visible for search engines. If sending a link to these sites can result in a reciprocated connection, organisations might be able to benefit from spilled-over traffic; this is another incentive to send links to prominent sites rather than to more peripheral organisations, again regardless of content.
Visibility in traditional media, in turn, might also exert an influence in determining the centrality of sites because it contributes to enhancing their public recognition and status (Podolny, 2001). Sites that link to highly visible organisations might be trying to increase their own visibility, if only by using links to influence their public perception and gain part of the audience that already trusts the high-status organisation. A link to Greenpeace as opposed to a less well known environmental group is more effective when an organisation wants to send the message that it is committed to the protection of the environment: all else equal (in this case, two organisations working on environmental issues) the organisation with a higher visibility might be a preferred partner because it helps to convey a message in a more efficient way. By taking into account the influence that offline status and economic resources have on linking patterns we can determine whether there are significant points of connection between visibility on the web and the dynamics that shape other interorganisational networks; this, in turn, can help us draw some important implications about how visibility is built online and about potential biases that this might be introducing in the way users retrieve information.

5. Disentangling the mechanisms of link formation

The models presented in Table 2 belong to the exponential group of random graph models (ERGMs, also known as p* models, see Snijders et al., 2006; Robins et al., 2007a,b). Given that the focus of this paper lies in explaining the variance in the centrality (or indegree) of sites, these models were conditioned on all outdegrees: they incorporate no structural effects predicting the number of links that sites send, a feature that is modelled perfectly. The models fitted without this condition were not successful because the distribution of outdegrees is (as expected) very skewed.
The assumption when conditioning on outdegrees is that the number of links that sites send is determined by factors that are internal to the organisations and for which the models control as fully as possible. This assumption is similar to that made by fixed effects regression in longitudinal analyses to control for omitted variables that differ between cases. The models were fitted using Siena v. 3.11 (Snijders et al., 2007). The parameter estimates identify what affects the probability that a site A will send a link to a site B. They are on a logit scale and should be interpreted as unstandardised effects in logistic regression. There are two types of estimates in these models: structural and attribute effects. As mentioned in the previous section, the structural effects aim to act as controls for the analysis of the exogenous attributes by modelling the configurations that best characterise the observed network. This ensures that the influence of organisations’ attributes is not overestimated, and that we control for the influence of unmeasured attributes, but also that we explicitly model relevant mechanisms, endogenous to the network, that are not reducible to the characteristics of the organisations. For instance, the structural effects in Model 1 tell us that there is a significant degree of reciprocity and a significant tendency to form hierarchical connections, as suggested by the negative cyclic triad coefficient. There is also significant clustering, as measured by the higher order transitivity parameter, which models not just the number of triangles present in the network but the extent to which these triangles exist in nested structures formed by more than three nodes. Both the direct and indirect links parameters act as controls for the estimation of transitivity: they differentiate the effects of links that are a prerequisite for transitivity from the effects of those links that do establish closure. The popularity parameter estimates the tendency of certain nodes to receive a high number of links, again placing decreasing probabilities on the higher indegrees.

Table 2
The impact of resources and status on the probability of links controlling for structure, homophily, and age (ERGMs).

Parameters                              Model 1          Model 2          Model 3          Model 4
                                        Est.    SE       Est.    SE       Est.    SE       Est.    SE
Structural effects
  Reciprocity                           1.337   .087     1.447   .087     1.470   .087     1.453   .087
  Cyclic triads                         −.131   .024     −.110   .023     −.108   .022     −.108   .022
  Popularity                             .178   .045      .057   .049      .094   .049      .089   .049
  Higher order transitivity             2.061   .038     1.965   .038     1.922   .039     1.940   .040
  Association indegree and outdegree    −.255   .007     −.248   .007     −.241   .008     −.245   .007
  Direct and indirect links (reach)     −.609   .056     −.525   .056     −.497   .056     −.504   .059
  Indirect links (reach)                 .251   .008      .243   .008      .236   .008      .240   .008
Attribute effects
  Same field of activity                 .639   .020      .671   .021      .686   .022      .681   .019
  Paid staff (of target)                 .057   .005      .062   .006      .057   .007      .053   .004
  Paid staff (missing)                  −.003   .027      .043   .030      .034   .033
  Media visibility (of target)           .134   .006      .119   .007      .100   .007      .102   .007
  Online yr of foundation (of target)                    −.056   .004     −.057   .004     −.057   .004
  Online yr of foundation (missing)                      −.301   .091     −.314   .087     −.316   .090
  Yr of foundation of org (of target)                     .003   .000      .003   .000      .003   .000
  Yr of foundation of org (missing)                      −.108   .006     −.108   .038     −.103   .037
  Large budget (of target)                                                 .121   .096
  Large budget (similarity)                                               −.038   .103
  Large staff (of target)                                                 −.088   .101
  Large staff (similarity)                                                −.018   .108
  Large media visibility (of target)                                      −.198   .103
  Large media visibility (similarity)                                     −.419   .114     −.200   .033

Note: Estimates significant at the 5% level are printed in bold.

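Because the estimates are on a logit scale, exponentiating them gives the multiplicative change in the odds of a link associated with a one-unit increase in the corresponding statistic, holding the rest of the network fixed. The sketch below applies this to a few Model 1 values read from Table 2. The helper names are hypothetical, and the threshold of 1.96 on the ratio of estimate to standard error is a common rule of thumb assumed here, not necessarily the criterion applied in the paper.

```python
import math


def odds_ratio(estimate):
    """Multiplicative change in the odds of a link per unit of the statistic."""
    return math.exp(estimate)


def wald_ratio(estimate, standard_error):
    """Estimate divided by its standard error."""
    return estimate / standard_error


# (estimate, standard error) pairs as read from Table 2, Model 1.
model_1 = {
    "Reciprocity": (1.337, 0.087),
    "Cyclic triads": (-0.131, 0.024),
    "Same field of activity": (0.639, 0.020),
    "Paid staff (missing)": (-0.003, 0.027),
    "Media visibility (of target)": (0.134, 0.006),
}

summary = {
    name: {
        "odds_ratio": odds_ratio(est),
        # Assumption: |estimate / SE| > 1.96 approximates the 5% level.
        "significant_5pct": abs(wald_ratio(est, se)) > 1.96,
    }
    for name, (est, se) in model_1.items()
}

for name, row in summary.items():
    print(f"{name}: odds ratio {row['odds_ratio']:.2f}, "
          f"significant at 5%: {row['significant_5pct']}")
```

Read this way, the homophily estimate of .639 implies that the odds of a link are roughly 1.9 times higher when both sites belong to the same field of activity, whereas the negative cyclic triad estimate reduces the odds of links that would close a cycle.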
These parameters not only allow us to differentiate effects that contribute to generating the same configurations, giving a more precise account of the mechanisms that explain the emergence of the network; they also help in preventing the degeneracy of the models (more details in Snijders et al., 2006; Robins et al., 2007b). Interestingly, the popularity parameter (which, again, models the skewed indegree distribution of this network) loses its statistical significance as more attributes are introduced in the models. This suggests that the uneven distribution of centrality depicted in Fig. 2 is best explained in terms of factors that are exogenous to the web. With regard to the exogenous factors, the first model tests the influence that economic resources (as measured by number of paid staff) and status (as measured by visibility in traditional newspapers) have on the probability of receiving links. The coefficients are positive for the two variables, meaning that the richer and the more visible an organisation is, the more likely it is that its website will receive a link from another organisation. These effects hold when controlling for the tendency of sites to link to similar sites: the positive coefficient of the parameter measuring homophily (‘same field of activity’) tells us that a link from site A to site B is more likely if both are classified under the same category. The fourth exogenous parameter in the model was introduced to control for missing cases in economic resources, that is, to control for the possibility that these missing values may contribute differently to the probability of links than the average values. This coefficient is not significant. Overall, the convergence of the model is acceptable: this is measured by t-ratios that summarise how much the values simulated by the model deviate from the observed values, so the closer they are to zero, the better the convergence is.
By convention, good convergence is assumed when the t-values for all the parameters estimated are smaller than 0.15. This is the case for all the parameters except three (those accounting for the direct and indirect connections and the association of indegree and outdegree) where the t-ratios are between 0.17 and 0.18. Model 2 adds the age of the organisations to the equation, measured as the year in which they were founded and the year in which they first started to publish online. These variables were introduced as additional controls for the analysis of exogenous resources, and to test for the first-movers effect: those who start to publish on the web earlier might have more chances to become a target of links; likewise, older organisations may be more likely to engage in a higher number of partnerships simply because they have been available for a longer period of time. However, as the model shows, the data do not support the latter hypothesis: controlling for missing values, the year of foundation of an organisation has a positive impact on the probability of links, which means that as year of foundation increases (the younger the organisation is) the more likely it is that it receives a connection. Year of foundation online, in turn, generates the expected effect: the estimate is negative, which means that the longer the organisation has been publishing on the web, the more likely it is that other organisations will send links to it. The impact of economic resources and media visibility (or offline status) remains largely unchanged when controlling for these effects. The largest t-ratio in this model is 0.10. Model 3 tests another possible effect of resources and status: the existence of positive and negative assortativeness. The first would take place if the most resourceful or visible organisations (those in the top range of the respective distributions) would prioritise connections with each other. 
The second would take place in the opposite case: when organisations in the lower ranks of the distributions prioritise connections with those in the top range. To test for these effects, six new variables were introduced in the model. These variables classify organisations as being part (or not) of the top range of the distribution in economic resources (measured with budget and paid staff) and status (measured with visibility in news media). What these variables model is the influence that sharing a position in the top set has on the probability of creating links and, vice versa, the influence that being in the top set has on receiving links from the lower ranks of the distribution. These are dichotomised variables that take the value of 1 when an organisation is in the top range of the distribution and 0 otherwise. None of these variables is statistically significant with the exception of the similarity effect in large media visibility: organisations that are highly visible in newspapers, and hold better status and public recognition, do not tend to send links to each other; actually, the negative coefficient suggests that they rather try to avoid each other: when two organisations share the same high status, a link between their sites is less likely. This supports the idea that links are used as strategic alliances to improve the status and visibility of organisations only when there is an asymmetrical starting point. Again, the effects identified in the previous two models remain largely unchanged and all t-values fall below 0.15. Model 4 confirms these trends, leaving out non-significant effects with the exception of popularity or indegree, without which the model did not converge well. The highest t-ratio for convergence in Model 4 is 0.12.
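The convergence check reported for each model can be sketched as follows, assuming that the t-ratio of a statistic is the difference between the mean of its simulated values and its observed value, divided by the standard deviation of the simulated values. The function names, parameter labels and simulated values are hypothetical; only the 0.15 threshold comes from the text.

```python
import statistics


def convergence_t_ratio(simulated, observed):
    """Deviation of simulated statistics from the observed one, in SD units."""
    mean_sim = statistics.fmean(simulated)
    sd_sim = statistics.stdev(simulated)
    if sd_sim == 0:
        raise ValueError("simulated values do not vary")
    return (mean_sim - observed) / sd_sim


def has_converged(t_ratios, threshold=0.15):
    """True when every parameter's absolute t-ratio falls below the threshold."""
    return all(abs(t) < threshold for t in t_ratios.values())


# Hypothetical simulated counts for two network statistics.
simulated_counts = {
    "reciprocity": [118, 121, 125, 119, 122, 120, 123, 117],
    "same_field": [410, 402, 395, 407, 399, 404, 398, 401],
}
observed_counts = {"reciprocity": 121, "same_field": 402}

t_ratios = {
    name: convergence_t_ratio(values, observed_counts[name])
    for name, values in simulated_counts.items()
}
print(t_ratios, has_converged(t_ratios))
```

A t-ratio close to zero indicates that the networks simulated from the fitted model reproduce, on average, the value of the statistic observed in the data.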
A goodness-of-fit test was performed with Model 4 restricting the value of the popularity-indegree parameter to zero, and therefore hypothesising that this is a dispensable effect once the other exogenous variables are controlled for. The Rao efficient score test was used (Snijders et al., 2007, p. 33), which assesses the difference between the expected indegree according to the model (where this parameter is assumed to be 0) and the indegree distribution observed in the network. The larger the difference is, the larger the misfit between the model and the network. According to the test, the corresponding p-value for the statistic measuring this difference is 0.07, so if we use a 5% significance level, the difference between the observed indegree and the modelled indegree is not statistically significant. This suggests that the exogenous variables considered in the model manage to reproduce successfully the centrality scores observed in the network, even in the absence of the indegree parameter. Going back to the hypotheses formulated above, Model 4 confirms that both the economic resources of organisations and their status are significant explanatory factors of their position in the network, much as happens in other interorganisational networks. This influence holds even when the age of both sites and organisations is taken into account, and possible assortative effects are controlled for. The influence of resources and status also holds in the presence of endogenous network mechanisms like reciprocity or transitivity, which confirms the instrumental role that links play on the web: many organisations might link to high-resource, high-status partners because they want to profit from the advantages associated with that partnership, for instance an increased traffic flow to their websites; this is a likelier possibility if links end up being reciprocated, as the reciprocity parameter suggests.
All in all, what these findings reveal is that links are not monolithic proxies to the quality of sites: they respond to social factors that are not necessarily related to the contents of the documents published online but rather to who is producing those contents. In the light of these results, the web acquires a dimension that is well known by the analysts of other social networks but that has been disregarded so far by those who study the structure of online connections: the relations of power embedded in the network. As the influence of exogenous resources indicates, some agents enter the web from a position of strength that does not derive from their online activities but from their access to economic resources and offline visibility. These models suggest that more attention should be paid to how these variables shape the way users access information on the web.

6. Discussion

Research on interorganisational networks has distinguished two types of ties: those based on identity, and those based on instrumental goals, used either to gain access to resources that would otherwise be out of reach, or to project a more recognisable image by associating with high-status organisations. This paper has applied this conceptual distinction to the analysis of the web using a sample of roughly a thousand sites, and incorporating resource and status variables into the analysis of the network structure. Building on previous research of the web, which had only explored the identity dimension of links, this paper has shown that organisations reproduce online many of their offline strategic alliances, responding to the same incentives to prioritise partnerships with the most resourceful and visible organisations. The findings reported in this paper suggest that, in line with resource mobilisation theory, the richer organisations are more central on the web because they attract more links to their sites.
This does not invalidate the possibility that centrality on the web might also contribute to increase the resources and offline visibility of organisations, particularly of those that were born with the internet. Many studies of the web have focused on that side of the relationship, highlighting stories of success that range from social movements, like the Zapatista struggle, to new business models and collaborative platforms, like for instance E-bay or Wikipedia (McCaughey and Ayers, 2003; Garrido and Halavais, 2003; Anderson, 2007; Tapscott and Williams, 2007). These cases have been used to illustrate how the internet in general, and the web in particular, are democratising access to the public domain by allowing some agents to grow in unprecedented ways, capture the attention of the international public and obtain funding and revenue along the way. Yet this paper focused on the less explored dimension of how the web is still reproducing old asymmetries and inertias. Further analysis is needed to determine how the two sides of this influence feed on each other. The data analysed here poses the question of how general these results are. What the findings presented suggest is that the networks formed in other domains, like .com, or using other web technologies, like blogs, are also shaped by the sort of exogenous factors identified here: well established corporations would, on average, have a competitive advantage in gaining links and users’ attention, and the blogs written by known academics or writers would, overall, be more likely to become central than those written by ordinary users. There is evidence suggesting that this is indeed the case (Hindman, 2009), but further research is needed to explore longitudinal trends and how much the influence of exogenous attributes changes over time. The web is a fast-changing medium, and the relevance of offline visibility, for instance, might diminish as users learn to trust the web more. 
The models presented here provide a baseline against which to assess the direction of that evolution. This assessment would help envision the future of the web, and detect potential biases affecting the way information is accessed. One of the morals of the findings presented here is that comparing the web with a network of documents might be misleading when interpreting what links represent. As explained, what determines the centrality of sites is not just the quality of the contents but the resources and status of the producers of those contents. To be sure, resources also matter in the configuration of citation networks: the papers produced in departments in the richest universities are more likely to become more cited and therefore more central. The best scientists, however, tend to self-select into better universities precisely because of the resources these make available; but the best scientists are still more likely to produce the best papers. Contents published on the web cannot be assessed using the same barometer as scientific papers: the influence of resources is more consequential on the web, especially given that it is a form of public media. Studies in sociology and communication have long considered the negative effects that ownership and concentration can have on the public role of media. If concentration on the web is significantly affected by economic resources, this is a trend that requires further attention and analysis. This paper has presented empirical data that uncover some of the forces that promote the formation of links between two sites. The mechanisms that underlie the formation of the web are particularly relevant because the structure of the web is used by most search engines as the main recommendation criterion to rank their results. This has surely improved the quality of searches but it might also be introducing biases in how information is accessed that are, at the least, worth identifying.
If sites obtain a competitive advantage in attaining visibility on the basis of their economic resources and presence in traditional media, then the web might not be distributing visibility as meritocratically as is often assumed. This paper has tried to uncover this dimension of the web by showing that online networks follow similar dynamics to other interorganisational networks. Focusing on the mechanisms that online and offline networks have in common will give us a better understanding of how the web evolves and gives access to information.

Acknowledgements

Thanks to Tom Snijders for advice and guidance and to Michael Biggs, Tak Wing Chan, Jon Fahlander and Mike Thelwall for their comments and suggestions on previous versions of this paper. I am also grateful to three anonymous reviewers for their recommendations and to Lucy Power and Nesrine Abdel-Sattar for their research assistance. This work has been supported by the Economic and Social Research Council (ESRC, grant number PTA-026-27-1334), and it has benefited from the R+D project SEJ2006-00959/SOCI financed by the Spanish Ministry of Education and Science.

References

Ackland, R., O’Neil, M., Bimber, B., Gibson, R., Ward, S., 2006. New methods for studying online environmental-activist networks. In: Paper Presented at the International Sunbelt Social Network Conference, Vancouver. Adamic, L., 1999. The small world web. In: Abiteboul, S., Vercoustre, A.-M. (Eds.), Lecture Notes in Computer Science. Springer, New York, pp. 443–454. Adamic, L., Adar, E., 2003. Friends and neighbors on the web. Social Networks 25, 211–230. Adamic, L., Glance, N.S., 2005. The political blogosphere and the 2004 U.S. election: divided they blog. In: 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics, WWW 2005, Japan. Albert, R., Jeong, H., Barabási, A.L., 1999. Diameter of the world-wide web. Nature 401, 130–131. Anderson, C., 2007. The Long Tail.
How Endless Choice is Creating Unlimited Demand. Random House, London. Baldassarri, D., Diani, M., 2007. The integrative power of civic networks. American Journal of Sociology 113, 735–780. Barabási, A.L., Albert, R., 1999. Emergence of scaling in random networks. Science 286, 509–512. Barabási, A.L., Albert, R., Jeong, H., 2000. Scale-free characteristics of random networks: the topology of the world wide web. Physica A 281, 69–77. Bonacich, P., 1987. Power and centrality: a family of measures. American Journal of Sociology 92, 1170–1182. Brin, S., Page, L., 1998. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 30, 107–117. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., 2000. Graph structure in the Web. Computer Networks 33, 309–320. Cho, J., Roy, S., 2004. Impact of search engines on page popularity. In: WWW2004, New York, NY, US. Cole, S., Cole, J.R., 1967. Scientific output and recognition: a study in the operation of the reward system in science. American Sociological Review 32, 377–390. Cook, K., Emerson, R.M., Gillmore, M.R., Yamagishi, T., 1983. The distribution of power in exchange networks: theory and experimental results. American Journal of Sociology 89, 275–305. Diani, M., 2003. ’Leaders’ or brokers? Positions and influence in social movement networks. In: Diani, M., McAdam, D. (Eds.), Social Movements and Networks. Relational Approaches to Collective Action. Oxford University Press, New York. DiMaggio, P., Hargittai, E., Russell Neuman, W., Robinson, J.P., 2001. Social implications of the internet. Annual Review of Sociology 27, 307–336. Foot, K.A., Schneider, S.M., Dougherty, M., Xenos, M., Larsen, E., 2002. Analyzing linking practices: Candidate sites in the 2002 US Electoral Web Sphere. Journal of Computer-Mediated Communication 8, 4. Freeman, L.C., 1979. Centrality in social networks: conceptual clarification.
Social Networks 2, 215–239. Garfield, E., 1955. Citation indexes for sciences. Science 122, 108–111. Garrido, M., Halavais, A., 2003. Mapping networks of support for the zapatista movement: applying social-networks analysis to study contemporary social movements. In: McCaughey, M., Ayers, M.D. (Eds.), Cyberactivism: Online Activism in Theory and Practice. Routledge, London. Henzinger, M., 2007. Search technologies for the internet. Science 317, 468–471. Hindman, M.S., 2009. The Myth of Digital Democracy. Princeton University Press, Princeton, NJ. Huberman, B.A., 2001. The Laws of the Web: Patterns in the Ecology of Information. MIT Press, Cambridge, MA. Lawrence, S., Lee Giles, C., 1999. Accessibility of information on the web. Nature 400, 107–109. McCaughey, M., Ayers, M.D. (Eds.), 2003. Cyberactivism: Online Activism in Theory and Practice. Routledge, London. McPherson, M., Smith-Lovin, L., Cook, J., 2001. Birds of a feather: homophily in social networks. Annual Review of Sociology 27, 415–444. Merton, Robert K., 1968. The Matthew effect in science. Science 159, 56–63. Pennock, D.M., Flake, G.W., Lawrence, S., Glover, E.J., Lee Giles, C., 2002. Winners don’t take all: characterizing the competition for links on the web. Proceedings of the National Academy of Sciences 99, 5207–5211. Podolny, J.M., 2001. Networks as the pipes and prisms of the market. American Journal of Sociology 107, 33–60. Price, D.S., 1976. A general theory of bibliometric and other advantage processes. Journal of the American Society for Information Science 27, 292–306. Redner, S., 1998. How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B 4, 131–134. Robins, G., Pattison, P., Kalish, Y., Lusher, D., 2007a. An introduction to exponential random graph (p*) models for social networks. Social Networks 29, 169–172. Robins, G., Snijders, T.A.B., Wang, P., Handcock, M.S., Pattison, P., 2007b. 
Recent developments in exponential random graph (p*) models for social networks. Social Networks 29, 192–215. Rogers, R., Marres, N., 2000. Landscaping climate change: a mapping technique for understanding science & technology debates on the world wide web. Public Understanding of Science 9, 141–163. Rogers, R., 2004. Information Politics on the Web. The MIT Press, Cambridge, MA. Shumate, M., Dewitt, L., 2008. The North/South divide in NGO hyperlink networks. Journal of Computer-Mediated Communication 13, 405–428. Snijders, T.A., Pattison, P., Robins, G., Handcock, M.S., 2006. New specifications for exponential random graph models. Sociological Methodology 36, 99–153. Snijders, T.A.B., Steglich, C.E.G., Schweinberger, M., Huisman, M., 2007. Manual of SIENA version 3. ICS, University of Groningen, Groningen. Tapscott, D., Williams, A., 2007. Wikinomics. How Mass Collaboration Changes Everything. Atlantic Books, London. Thelwall, M., 2002. Evidence for the existence of geographic trends in university web site interlinking. Journal of Documentation 58, 563–574. Tomlin, J.A., 2003. A new paradigm for ranking pages on the world wide web. In: WWW2003, Budapest, Hungary. Vaughan, L., 2006. Visualizing linguistic and cultural differences using web co-link data. Journal of the American Society for Information Science and Technology 59, 628–643.