Small World Graphs

Friends and Neighbors on the Web
Presentation for Web Information Retrieval
Bruno Lepri
Outline







Objectives
Data Used
Small World Graphs
Predicting Friendship
Results
Future Work and Applications
Conclusions
Objectives


To devise techniques for mining the Internet to predict relationships between individuals
To show that some pieces of information (e.g. terms on homepages) are better indicators of social connections than others
Information Side Effects


By-products of data intended for one use, which can be mined to understand tangential and larger-scale phenomena
Our case: extracting large social networks from individuals’ homepages
Data Used




Text on user homepages (co-occurrence of text → common interest)
Out-links: from a user's homepage to other pages
In-links: from other pages to a user's homepage
Mailing lists
Small World Phenomenon


Real-world social networks are described by the Small World Phenomenon
Stanley Milgram’s experiment (“The Small World Problem”, 1967): Six Degrees of Separation
Small World Phenomenon (cont’d)


Adamic: the World Wide Web is a Small World Graph (“The Small World Web”, 1999)
Our hypothesis (confirmed by the Stanford and MIT personal homepage networks): networks of personal homepages are Small World Graphs
Stanford Graph
MIT Graph
Small World Graph Properties
Watts & Strogatz (“Collective dynamics of ‘small-world’ networks”, 1998):
Clustering Coefficient C is much larger than that of a Random Graph with the same number of vertices and average number of edges per vertex
Characteristic Path Length L is almost as small as L for the corresponding Random Graph
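As an illustration of the Characteristic Path Length, here is a minimal sketch that computes L as the average shortest-path length over all reachable vertex pairs, using breadth-first search. The toy graph (a ring of 6 vertices with one added shortcut) and the function name are assumptions made for this example, not data or code from the study.

```python
from collections import deque

def characteristic_path_length(adj):
    """Average shortest-path length over all reachable ordered vertex pairs."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search from `source` gives shortest hop counts.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: an undirected ring of 6 vertices.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
L_ring = characteristic_path_length(ring)

# Adding a single shortcut between opposite vertices shortens the average path.
ring[0].add(3)
ring[3].add(0)
L_shortcut = characteristic_path_length(ring)
```

In this toy case one shortcut lowers L from 1.8 to about 1.67, which hints at why a few long-range links can keep path lengths close to those of a random graph.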
Clustering Coefficient (Watts & Strogatz, 1998)


If a vertex v has kv neighbors, then at most kv*(kv-1) directed edges can exist between them
If Cv denotes the fraction of these allowable edges that actually exist, then C is the average of Cv over all v
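The definition above can be sketched in a few lines. This is a minimal sketch under two assumptions that the slides do not state: a neighbor of v is any vertex linked to or from v, and a vertex with fewer than two neighbors contributes Cv = 0. The graph and names are hypothetical.

```python
def clustering_coefficient(out_links):
    """Average over all vertices v of Cv, the fraction of the kv*(kv-1)
    possible directed edges among v's neighbors that actually exist."""
    # Assumption: neighbors of v are vertices linked to or from v.
    neighbors = {v: set() for v in out_links}
    for v, targets in out_links.items():
        for w in targets:
            if w != v:  # ignore self-links
                neighbors[v].add(w)
                neighbors.setdefault(w, set()).add(v)

    coeffs = []
    for v, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # assumed convention: no neighbor pairs
            continue
        # Count directed edges a -> b where both endpoints are neighbors of v.
        existing = sum(
            1 for a in nbrs for b in out_links.get(a, ()) if b in nbrs and b != a
        )
        coeffs.append(existing / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# Hypothetical homepage link graph: user -> set of users they link to.
links = {"a": {"b", "c"}, "b": {"c"}, "c": {"b"}, "d": {"a"}}
C = clustering_coefficient(links)
```

Here vertex a has three neighbors (b, c, d), so six directed edges are possible among them; two exist (b to c and c to b), giving Ca = 1/3.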
Clustering Coefficient in Friendship Graphs

Cv: reflects the extent to which friends of v are also friends of each other
C: measures the cliquishness of a typical friendship circle
Predicting Friendship


To predict whether one person is a friend of another, we rank all users by their similarity to that person
Hypothesis: friends are more similar to each other than non-friends are
Similarity Measurement



Similarity is measured by analyzing text, links and mailing lists
To evaluate the likelihood that A is linked to B, we sum the number of items the two users have in common
Weighting scheme: items shared by only a few users are weighted more heavily than common items
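The weighted sum of shared items might look like the sketch below. The slides do not give the exact weighting function, so the weight 1 / log(frequency) used here is an assumption chosen because it favors rare items; the user profiles and names are hypothetical.

```python
import math
from collections import Counter

def similarity(items_a, items_b, item_freq):
    """Sum the weights of the items two users share; rarer items weigh more."""
    score = 0.0
    for item in items_a & items_b:
        freq = item_freq[item]
        if freq > 1:  # a shared item is held by at least two users
            score += 1.0 / math.log(freq)  # assumed weighting, not from the slides
    return score

# Hypothetical profiles: user -> set of items (terms, links, mailing lists).
profiles = {
    "alice": {"sailing", "email", "kayak club"},
    "bob": {"sailing", "email", "kayak club"},
    "carol": {"email", "chess"},
    "dave": {"email"},
}
# Number of users associated with each item.
item_freq = Counter(item for items in profiles.values() for item in items)

sim_alice_bob = similarity(profiles["alice"], profiles["bob"], item_freq)
sim_alice_carol = similarity(profiles["alice"], profiles["carol"], item_freq)
```

With these profiles, alice and bob share two rare items plus the common "email", so they score well above alice and carol, who share only "email".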
Friendship Prediction Algorithm’s Evaluation

To evaluate the algorithm’s performance:
– we compute how many friends have a non-zero similarity score
– we see what similarity rank the friends were assigned
Problem: friends can appear to have no items in common (little information about one of the two users, or homepages used to express different interests)
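The two evaluation steps can be sketched as follows. This is an illustrative sketch only: the data layout (`scores[u][v]` as a similarity score, `friends[u]` as the set of known friends) and all names and numbers are assumptions for the example.

```python
def evaluate(scores, friends):
    """Return (coverage, average rank) of true friends.

    coverage: fraction of friends with a non-zero similarity score.
    average rank: mean position of those friends in the similarity ranking.
    """
    nonzero, total, ranks = 0, 0, []
    for user, friend_set in friends.items():
        # Rank only candidates with a non-zero score, most similar first.
        ranked = sorted(
            (v for v, s in scores[user].items() if s > 0),
            key=lambda v: -scores[user][v],
        )
        for friend in friend_set:
            total += 1
            if friend in ranked:
                nonzero += 1
                ranks.append(ranked.index(friend) + 1)  # ranks start at 1
    coverage = nonzero / total if total else 0.0
    avg_rank = sum(ranks) / len(ranks) if ranks else None
    return coverage, avg_rank

# Hypothetical similarity scores and friendship list.
scores = {"alice": {"bob": 3.6, "carol": 0.7, "dave": 0.0}}
friends = {"alice": {"bob", "dave"}}
coverage, avg_rank = evaluate(scores, friends)
```

In this example dave is a friend with a zero score, which is the problem case described above: he lowers coverage and cannot be ranked at all.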
Coverage and Predictive Ability of Data Sources

The average rank was computed for matches above a threshold chosen so that all four data sources ranked an equal number of users
Do friends have more in common than friends of friends?
Individual Item’s Predictive Ability
Metric used: ratio of the number of linked user pairs associated with an item to the total number of possible pairs
Some interesting findings:
– Shared items unique to a community are at the top, and popular terms at the bottom, of the MIT and Stanford lists
– Different shared items are at the top of the Stanford and MIT lists (in the MIT list, 5 of the top 10 terms are fraternities’ names)
– The Stanford and MIT in-link lists are dominated by individual homepages
– The least predictive MIT and Stanford mailing lists are very general discussion lists, announcement lists and social activity lists
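A minimal sketch of the per-item metric follows. It assumes that "possible pairs" means all pairs of users associated with the item, and it treats links as undirected; both are interpretations rather than details given on the slide. The users, items and links are hypothetical.

```python
from itertools import combinations

def item_predictive_ratio(users_with_item, links):
    """Linked user pairs sharing the item, divided by all possible such pairs."""
    pairs = list(combinations(sorted(users_with_item), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in links)
    return linked / len(pairs)

# Hypothetical friendship links, stored as unordered pairs.
links = {frozenset(("alice", "bob")), frozenset(("bob", "carol"))}

# A community-specific item shared by few users versus a very common term.
ratio_fraternity = item_predictive_ratio({"alice", "bob", "carol"}, links)
ratio_email = item_predictive_ratio(
    {"alice", "bob", "carol", "dave", "erin"}, links
)
```

The community-specific item scores 2/3 while the common term scores 0.2, matching the pattern in the findings: items unique to a community rank high and popular terms rank low.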
Individual Item’s Predictive Ability (cont’d)
Future Work


New data sources: demographic information such as address, year in school, major, …
Open problem: individuals interact with many people regularly but do not link to all of them through web pages (possible solution: obtain social links directly from users)
Applications



To mine the correlations between groups of people (see the work of Pentland and Eagle)
To facilitate networking inside a community (see LinkedIn)
Marketing research: to identify groups interested in a product, and to rely on the social network to propagate information about products
Conclusions


Personal homepages provide a glimpse into the social structure of university communities
Important: personal homepages reveal not only who knows whom, but also the context of the relationship (e.g. shared hobbies, shared dorm)
Thank You For Your Attention!
Questions?