
Data Set

In data science, arguably the most challenging and crucial step is acquiring and preprocessing data. Unfortunately, courses that teach what data exists in the world and how to obtain it are uncommon.

Mark  Meaning
😀  Easy Access
😡  Difficult Access
🔰  Recommended for Beginners
👨‍🎓  Recommended for Experts
👍  Highly Recommended

While it might seem sufficient to follow the thumbs-up 👍, navigating the world of data is not always straightforward. The exact data needed for a particular task often does not exist, and compromises must be made. Having multiple alternatives, even if they are not ideal, is beneficial, and any data is better than none.

Structured Data

  • Environmental Big Data Platform: Provides an environmental data market, offering both free and paid data. (2021.08.02)
  • Meteorological Data Open Portal 👨‍🎓🔰: Offers meteorological and disaster-related data. (2021.08.03)
  • 👍 Our World in Data 😀🔰: Provides hundreds of annual indicators on many aspects of society, broken down by country and year, free of charge. Notably offers extensive COVID-19 data and statistics. (2021.12.30)

Time Series

Local Governments of Korea

  • D-Data Hub 😀: Offers public data for the Daegu region, with over 4,000 datasets and 13,000 services. (2021.06.08)
  • Changwon Big Data Portal: Provides 172 datasets across 12 categories, along with a big data studio and commercial analysis for the Changwon area. (2021.07.30)

Unstructured Data

  • AI Hub 👨‍🎓: Provides AI training data in fields such as voice/natural language, vision, healthcare, autonomous driving, safety, agriculture, and environment, in various formats like images, videos, texts, audio, 3D, and sensor data. (2021.07.14)
  • kaggle 😀🔰: The most famous open data hub globally, hosting countless datasets and many competitions. (2021.07.15)
  • KDX Korea Data Exchange 😡👨‍🎓: Unlike general data hubs, it’s a company that sells data. Offers high-quality data suitable for Korean contexts, with both paid and free options available. (2021.08.06)

Networks

  • SEES:lab 👨‍🎓: Offers neatly organized data on networks such as airports and emails. (2021.12.31)
  • Stanford Network Analysis Project 👨‍🎓: A network analysis and mining library maintained at Stanford University that also provides large network datasets. (2022.01.04)
  • OpenFlights: Provides data on global airports and airline networks. Some preprocessing may be required, but comprehensive network data of this scale is rare. (2022.01.10)
  • Mark Newman’s Network Data 😡: Offers 23 network datasets tied to published research, compiled by the well-known Mark Newman. (2022.01.10)
  • World Pop: Offers data on the global aviation network, international migration statistics, urbanization, age, and gender structure. (2022.01.04)
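
As a rough illustration of the preprocessing that route-style network data tends to need, the sketch below turns raw source/destination records into a directed adjacency map. The sample rows and the two-column layout (`source`, `destination`) are hypothetical and do not reflect the actual schema of any dataset listed above; check the real file format before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample records; real datasets will have a different layout.
SAMPLE_ROUTES = """source,destination
ICN,NRT
ICN,LAX
NRT,ICN
NRT,ICN
LAX,SFO
,SFO
"""


def build_route_graph(text):
    """Return a directed adjacency map {source: set(destinations)}.

    Rows with a missing endpoint and self-loops are skipped, and
    duplicate routes collapse into a single edge.
    """
    graph = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        src = row["source"].strip()
        dst = row["destination"].strip()
        if not src or not dst or src == dst:
            continue
        graph[src].add(dst)
    return dict(graph)


graph = build_route_graph(SAMPLE_ROUTES)

# Out-degree: number of distinct destinations reachable from each airport.
out_degree = {airport: len(dests) for airport, dests in graph.items()}
```

Storing destinations in a set removes repeated routes automatically, which is usually what an unweighted network analysis wants; keep a counter instead if route frequency matters.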

Geographic Information

  • ITS National Transportation Information Center 😀👨‍🎓: Provides domestic traffic congestion, construction accidents, CCTV, traffic prediction, vehicle detectors, VMS, traffic safety assistants, variable speed signs, vulnerable section information, and nationwide standard node links. (2021.08.03)
  • 👍 GIS DEVELOPER 👨‍🎓: A blog run by GIS expert and developer Hyungjun Kim. For projects using Korean data, it’s said that nothing can be done without his help. (2023.01.10)
