Data Set

Data Science: The hardest and most important part of data science is undoubtedly obtaining data and preprocessing it. Unfortunately, few courses teach what data exists in the world and how to get it.

Mark | Detailed Classification
😀 | Good Accessibility
😡 | Poor Accessibility
🔰 | Recommended for Beginners
👨‍🎓 | Recommended for Experts
👍 | Highly Recommended
  • To prevent overuse of the 👍 mark and maintain its scarcity, it is assigned to only one out of every ten posts.

At first glance, it might seem like you can just follow the thumbs-up 👍 marks, but the world of data isn’t that straightforward. Data that perfectly fits what you want to do usually doesn’t exist, and often you have to make do with what’s available, even if it’s lacking. The more alternatives you have, even for similar data, the better; and any data, even if not ideal, is better than none.

Structured Data

  • Environmental Big Data Platform: Provides an environmental data marketplace. Although it’s called a marketplace, there are many free datasets available. (2021.08.02)
  • Open Meteorological Data Portal 👨‍🎓🔰: Offers meteorological data and data related to disasters. (2021.08.03)
  • 👍 Our World in Data 😀🔰: Provides hundreds of annual datasets on society at large, broken down by country and year, all free of charge. It is especially notable for its global data and statistics on COVID-19. (2021.12.30)
  • 🔒(24/12/18) Baseball Savant: Provides data related to MLB, the American baseball league. (2024.07.30)
  • 🔒(24/12/30) KOSIS National Data Portal: A statistical service provided by Statistics Korea; most officially available domestic data can be found on KOSIS. If you’re using Korean data, this is a must-know site. (2024.08.23)

Time Series

Korean Local Governments

  • D-Data Hub 😀: Provides public data of the Daegu region. Offers over 4,000 datasets and more than 13,000 services. (2021.06.08)
  • Changwon City Big Data Portal: Offers 172 datasets across 12 categories, as well as services like Big Data Studio and commercial area analysis for the Changwon region. (2021.07.30)

Unstructured Data

  • AI Hub 👨‍🎓: Provides AI training data. Covers various formats like images, videos, text, audio, 3D models, and sensor data in fields such as speech/natural language, vision, healthcare, autonomous driving, safety, agriculture/fisheries, land/environment, and education. (2021.07.14)
  • Kaggle 😀🔰: The most famous open data hub in the world, publishing countless kinds of data and hosting many small competitions. (2021.07.15)
  • KDX Korea Data Exchange 😡👨‍🎓: Unlike general data hubs, this is a company that sells data for a fee. As befits a paid service, it offers top-tier quantity and quality of data suited to Korean needs, though quite a few free datasets are also available. (2021.08.06)

Networks

  • SEES:lab 👨‍🎓: Provides neatly refined network data such as airports and emails. (2021.12.31)
  • Stanford Network Analysis Project 👨‍🎓: A network analysis and mining library maintained by Stanford University, which also offers datasets of genuinely large-scale networks. (2022.01.04)
  • OpenFlights: Provides data on airports and air routes worldwide. Although some preprocessing is required, network data of this scale are surprisingly rare. (2022.01.10)
  • Mark Newman’s Network Data 😡: Network datasets from the renowned Mark Newman. Twenty-three networks used in published research papers are available. (2022.01.10)
  • World Pop: Offers data on global air networks, international migration statistics, urbanization, age and gender structures, and more. (2022.01.04)
  • 🔒(24/12/14) Web of Life: Provides ecological network data such as parasitic, symbiotic, and predatory relationships. (2024.07.30)
  • 🔒(24/12/22) Network Data Repository: Offers thousands of various networks across more than 30 topics. (2024.08.01)
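As a rough illustration of the preprocessing mentioned for OpenFlights, here is a minimal sketch that turns route records into a directed airport network. It assumes the comma-separated column order airline, airline ID, source airport, source airport ID, destination airport, destination airport ID, codeshare, stops, equipment, with `\N` marking missing values. The sample rows and the `build_route_graph` helper are made up for illustration and are not taken from the actual file.

```python
import csv
from collections import defaultdict

# Assumption: missing values in the route file are written as \N.
MISSING = r"\N"

# Hypothetical sample rows in the assumed column order:
# airline, airline_id, src, src_id, dst, dst_id, codeshare, stops, equipment
SAMPLE_ROWS = [
    "KE,3163,ICN,3930,NRT,2279,,0,333",
    "KE,3163,NRT,2279,ICN,3930,,0,333",
    "OZ,596,ICN,3930,NRT,2279,,0,321",   # duplicate edge, different airline
    "KE,3163,ICN,3930,LAX,3484,,0,388",
    r"XX,\N,ICN,3930,\N,\N,,0,737",      # row with a missing destination
]


def build_route_graph(lines):
    """Build a directed graph {source airport: set of destination airports}."""
    graph = defaultdict(set)
    for row in csv.reader(lines):
        if len(row) < 6:
            continue  # skip malformed rows
        src, dst = row[2], row[4]
        if src == MISSING or dst == MISSING:
            continue  # skip rows without both endpoints
        graph[src].add(dst)  # a set collapses routes flown by several airlines
    return graph


graph = build_route_graph(SAMPLE_ROWS)
# Out-degree: number of distinct destinations reachable from each airport.
out_degree = {airport: len(dests) for airport, dests in graph.items()}
print(out_degree)
```

Collapsing parallel routes into a set gives a simple graph. If the number of airlines per route matters, a counter per edge would be the natural replacement.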

Geographic Information

  • ITS National Traffic Information Center 😀👨‍🎓: Provides domestic traffic flow, construction accidents, CCTV footage, traffic predictions, vehicle detectors, VMS, traffic safety assistants, variable speed signs, vulnerable section information, and national standard nodes and links. (2021.08.03)
  • 👍 GIS DEVELOPER 👨‍🎓: A blog run by GIS expert and developer Hyungjun Kim. It is no exaggeration to say that no work can be done on projects using Korean data without his help. (2023.01.10)
  • 🔒(24/12/26) Administrative Standard Code Management System: Although it does not offer geographic information as such, it provides the list of “legal district codes,” the most important data for linking records to geographic information. (2024.08.23)
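
To show why legal district codes are useful as a join key, here is a minimal sketch that splits one code into its hierarchy levels. It assumes the common 10-digit layout of 2 digits for province/metropolitan city (sido), 3 for city/county/district (sigungu), 3 for town/neighborhood (eupmyeondong), and 2 for village (ri). The `LegalDistrictCode` type, the `split_code` helper, and the sample code are hypothetical and are not part of the site itself.

```python
from typing import NamedTuple


class LegalDistrictCode(NamedTuple):
    """Hierarchy levels of a 10-digit legal district code (assumed layout)."""
    sido: str          # province / metropolitan city, 2 digits
    sigungu: str       # city / county / district, 3 digits
    eupmyeondong: str  # town / township / neighborhood, 3 digits
    ri: str            # village, 2 digits


def split_code(code: str) -> LegalDistrictCode:
    """Split a code string into its levels, rejecting malformed input."""
    if len(code) != 10 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a 10-digit code, got {code!r}")
    return LegalDistrictCode(code[:2], code[2:5], code[5:8], code[8:])


# Hypothetical sample code used only to demonstrate the slicing.
parts = split_code("1111010100")
print(parts)
# The first five digits identify the sigungu level, a common grouping key.
sigungu_key = parts.sido + parts.sigungu
```

Keeping the code as a string rather than an integer preserves leading zeros and makes prefix-based grouping straightforward.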
