Introduction to the Stanford Network Analysis Project
Introduction
SNAP (Stanford Network Analysis Project) is a network analysis and mining library maintained by Stanford University. It also offers network datasets large enough to be considered massive networks. For instance, the network created using Twitter includes 17,069,982 users as nodes, with 476,553,560 tweets as links.
To be honest, not many of the datasets are directly useful for research or practical applications, but they make excellent practice material for big data or network analysis. The quality is high, so if you can find a dataset that suits your needs, it would be hard to find a better source.
Data Example
# Directed graph (each unordered pair of nodes is saved once): WikiTalk.txt
# Communication network of Wikipedia (till January 2008). Directed edge A->B means user A edited talk page of B.
# Nodes: 2394385 Edges: 5021410
# FromNodeId ToNodeId
0 1
2 1
2 21
2 46
2 63
2 88
2 93
2 94
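For example, the WikiTalk.txt mentioned above is distributed with a txt extension rather than as a csv, so it might require some preprocessing: the header lines start with # and each remaining line holds one edge as a pair of node IDs separated by whitespace.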
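As a rough illustration of that preprocessing, here is a minimal Python sketch that skips the # header lines, splits each remaining line on whitespace, and writes the edges out as CSV. The function name read_edge_list and the CSV column names are made up for this example, and the first few lines of the sample above stand in for the real file.

```python
import csv
import io


def read_edge_list(lines):
    """Yield (source, target) integer pairs from SNAP-style edge-list lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' header comments.
        if not line or line.startswith("#"):
            continue
        source, target = line.split()[:2]
        yield int(source), int(target)


# A few lines copied from the sample above, used in place of the real file.
sample = """\
# Directed graph (each unordered pair of nodes is saved once): WikiTalk.txt
# Nodes: 2394385 Edges: 5021410
# FromNodeId ToNodeId
0 1
2 1
2 21
"""

edges = list(read_edge_list(io.StringIO(sample)))

# Write the edges as CSV text with a header row (column names are hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["from_node_id", "to_node_id"])
writer.writerows(edges)
csv_text = buffer.getvalue()
print(csv_text)
```

With the downloaded file, the same function should work on an open file handle, for example by passing open("WikiTalk.txt") instead of the in-memory sample. Because it is a generator, the edges can be streamed to the CSV writer without holding all of them in memory, which matters for a file with about 5 million edges.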
Requirements
There are no specific requirements, and the data can be downloaded without limit.
Categories
- Social networks
- Networks with ground-truth communities
- Communication networks
- Citation networks
- Collaboration networks
- Web graphs
- Amazon networks
- Internet networks
- Road networks
- Autonomous systems
- Signed networks
- Location-based online social networks
- Wikipedia networks, articles, and metadata
- Temporal networks
- Twitter and Memetracker
- Online communities
- Online reviews and Amazon
- User actions
- Face-to-face communication networks
- Graph classification datasets
Links
- Dataset download: https://snap.stanford.edu/data/index.html