Introduction to the Stanford Network Analysis Project

Introduction

SNAP (Stanford Network Analysis Project) is a network analysis and mining library maintained by Stanford University. It also offers network datasets, some of them large enough to count as massive networks. For instance, the network built from Twitter has 17,069,982 users as nodes and 476,553,560 tweets as links.

To be honest, not much of the data is likely to be directly useful for research or practical applications, but it makes excellent practice material for big data or network analysis. The quality is high, so if you do find a dataset that suits your needs, it would be hard to find a better source.

Data Example

# Directed graph (each unordered pair of nodes is saved once): WikiTalk.txt 
# Communication network of Wikipedia (till January 2008). Directed edge A->B means user A edited talk page of B.
# Nodes: 2394385 Edges: 5021410
# FromNodeId	ToNodeId
0	1
2	1
2	21
2	46
2	63
2	88
2	93
2	94

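For example, the WikiTalk.txt shown above is distributed as a plain .txt file rather than a csv. The lines starting with # are header comments, and each remaining line holds one edge as a pair of node IDs separated by a tab, so it may require some preprocessing before use.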
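
As a rough illustration of that preprocessing, here is a minimal Python sketch. It assumes the layout shown in the data example: comment lines starting with # and one whitespace-separated pair of integer node IDs per line. The function name read_edge_list and the small inline sample are made up for this illustration and are not part of SNAP.

```python
import io
from collections import Counter


def read_edge_list(lines):
    """Parse edge-list lines into (from_id, to_id) integer pairs.

    Hypothetical helper: assumes '#' marks comment lines and that
    every other non-blank line starts with two integer node IDs.
    """
    edges = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' header comments.
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    return edges


# A few lines copied from the data example, standing in for the real file.
sample = io.StringIO(
    "# Nodes: 2394385 Edges: 5021410\n"
    "# FromNodeId\tToNodeId\n"
    "0\t1\n"
    "2\t1\n"
    "2\t21\n"
)

edges = read_edge_list(sample)

# Count outgoing edges per node as a quick sanity check.
out_degree = Counter(src for src, _ in edges)

print(edges)
print(out_degree[2])
```

For the actual data, the same function should work on an open file handle, for example read_edge_list(open("WikiTalk.txt")), assuming the file has been downloaded and decompressed into the working directory. Holding about five million edges as Python tuples takes a fair amount of memory, so a dedicated graph library may be a better fit for the larger datasets.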

Requirements

There are no specific requirements, and the data can be downloaded without limit.