Data Acquisition
In data science, acquiring and preprocessing data is arguably the hardest and most important part. Unfortunately, few classes teach what data exists in the world and how to obtain it.
| Mark | Classification |
|---|---|
| 😀 | Good Accessibility |
| 😡 | Poor Accessibility |
| 🔰 | Beginner Recommended |
| 👨🎓 | Expert Recommended |
| 👍 | Highly Recommended |
- Handing out 👍 indiscriminately would make it meaningless, so it is given to only one in every 10 posts to keep it rare.
Judging by the marks alone, it might seem that you should simply follow the 👍, but the world of data isn't that straightforward. Data that perfectly matches what you want to do usually doesn't exist, and when something is missing you often have no choice but to accept the shortfall and use what is available. The more alternative sources of similar data you know, the better, and even bad data is better than nothing.
Structured Data
- Environmental Big Data Platform: Provides an environmental data marketplace. Although it is called a marketplace, much of the data is free, so its value depends on how you use it. (2021.08.02)
- Korea Meteorological Administration Data Portal 👨🎓🔰: Provides meteorological and disaster-related data. (2021.08.03)
- 👍 Our World in Data 😀🔰: Provides hundreds of annual datasets on society at large, organized by country and year, at no cost. Notably, it provides global data and statistics on COVID-19. (2021.12.30)
- Baseball Savant: Provides data on Major League Baseball (MLB), the American professional baseball league. (2024.07.30)
- KOSIS National Data Portal: A statistical service provided by Statistics Korea. Most officially available Korean data can be found on KOSIS, so it is a must-know site if you work with Korean data. (2024.08.23)
- The Materials Project (MP): Provides data related to properties of various inorganic materials. (2026.02.02)
Time Series
- investing.com 😀🔰: A global financial information website that conveniently provides free chart data for markets such as KOSPI and KOSDAQ. (2021.07.30)
- CYBOS Plus 😡👨🎓: An Open API from Daishin Securities that provides almost all the information and functions a trading system needs, including daily or real-time time series data such as stock codes, closing prices, market capitalization, and institutional net purchases. (2021.07.15)
- 🔒 (26/04/14) HYCOM 😡: Provides global ocean data as multidimensional tensors, including sea surface temperature and height, current velocity, and temperature at various depths. (2026.03.19)
Korean Local Governments
- D-Data Hub 😀: Provides public data for the Daegu region, with over 4,000 datasets and over 13,000 services. (2021.06.08)
- Changwon City Big Data Portal: Provides 172 datasets in 12 categories for the Changwon region, along with services such as a big data studio and business analysis. (2021.07.30)
Unstructured Data
- AI Hub 👨🎓: Provides data for AI training. Covers various formats such as images, videos, text, audio, 3D, and sensor data in fields including speech/natural language, vision, healthcare, autonomous driving, safety, agriculture/fisheries, national land and environment, and education. (2021.07.14)
- Kaggle 😀🔰: The world's most famous open data hub, offering countless diverse datasets and hosting many competitions of various sizes. (2021.07.15)
- KDX Korea Data Exchange 😡👨🎓: Unlike typical data hubs, it is a company that sells data for a fee. Because the data is paid, it maintains the highest level of quality and quantity for data suited to Korea's circumstances, and many free datasets are also listed. (2021.08.06)
Network
- SEES:lab 👨🎓: Offers cleanly organized network data for airports, email, and more. (2021.12.31)
- Stanford Network Analysis Project 👨🎓: A network analysis and mining library maintained by Stanford University, which also provides a collection of truly massive network datasets. (2022.01.04)
- OpenFlights: Provides data on world airports and aviation networks. It requires some preprocessing, but network data of this scale is surprisingly uncommon. (2022.01.10)
- Mark Newman Network Data 😡: Hosts the network datasets of the legendary Mark Newman. 23 networks related to published research are available. (2022.01.10)
- World Pop: Provides data on global aviation networks, international migration statistics, urbanization, age and gender structure, and more. (2022.01.04)
- Web of Life: Provides ecosystem network data for parasitism, mutualism, predation relationships, and more. (2024.07.30)
- Network Data Repository: Provides thousands of diverse networks across over 30 topics. (2024.08.01)
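The OpenFlights entry above mentions that its data needs some preprocessing before it can be used as a network. A minimal sketch of that kind of step follows, turning route rows into a weighted, directed adjacency map. The column layout (source airport in the third column, destination airport in the fifth, `\N` for missing values) is an assumption that should be checked against the actual file, and the sample rows, including the airline and airport IDs, are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, for illustration only. Assumed column layout:
# airline, airline ID, source, source ID, destination, destination ID,
# codeshare, stops, equipment. Verify this against the real file.
SAMPLE_ROUTES = """\
KE,3163,ICN,3930,NRT,2279,,0,773
OZ,596,ICN,3930,NRT,2279,,0,333
KE,3163,ICN,3930,LAX,3484,,0,388
JL,2987,NRT,2279,ICN,3930,,0,738
XX,\\N,\\N,\\N,LAX,3484,,0,320
"""


def build_edge_weights(lines, src_col=2, dst_col=4, missing="\\N"):
    """Count how many route rows exist per directed (source, destination) pair."""
    weights = Counter()
    for row in csv.reader(lines):
        if len(row) <= max(src_col, dst_col):
            continue  # skip rows that are too short to hold both endpoints
        src, dst = row[src_col], row[dst_col]
        if not src or not dst or missing in (src, dst):
            continue  # skip rows with an unknown endpoint
        weights[(src, dst)] += 1
    return weights


def to_adjacency(weights):
    """Convert {(src, dst): weight} into a nested {src: {dst: weight}} map."""
    adjacency = {}
    for (src, dst), weight in weights.items():
        adjacency.setdefault(src, {})[dst] = weight
    return adjacency


edges = build_edge_weights(io.StringIO(SAMPLE_ROUTES))
graph = to_adjacency(edges)
print(graph)
```

With the sample rows above, the row with missing endpoints is dropped and the two ICN to NRT rows collapse into a single edge of weight 2. To use a downloaded file instead, pass an open file object in place of the `io.StringIO` wrapper.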
Geographic Information
- ITS National Traffic Information Center 😀👨🎓: Provides Korean traffic data, including traffic flow, construction and accidents, CCTV, traffic forecasting, vehicle sensors, VMS, traffic safety assistance, variable speed signs, vulnerable section information, and nationwide standard node links. (2021.08.03)
- 👍 GIS DEVELOPER 👨🎓: A blog run by Kim Hyung-jun, a GIS expert and developer. It is no exaggeration to say that hardly any project using Korean data can get done without this person's help. (2023.01.10)
- Administrative Standard Code Management System: Not geographic information in itself, but it provides the all-important list of 'legal district codes' that correspond to geographic information. (2024.08.23)
- ISO3 by Country and Latitude-Longitude Data: Provides the ISO codes and latitude-longitude coordinates you need for countries across the entire world. (2025.05.16)
All posts
- Introduction to D-Data Hub
- Introduction to AI Hub
- How to Download Data Using Kaggle API, Solving OSError: Could Not Find kaggle.json.
- Introduction to Kaggle
- Introduction to Investment Information Open API CYBOS Plus
- CYBOS Plus Installation Tutorial
- How to Load Stock Codes with CYBOS Plus CpUtil.CpStockCode
- How to Fetch Stock Prices for Securities Using CYBOS Plus CpSysDib.StockChart
- How to Import Institutional and Foreign Trade Volume with CYBOS Plus
- How to Fetch Short Selling Trends with CYBOS Plus
- Introduction to Changwon City Big Data Portal
- Introduction to investing.com
- Introduction to the Environmental Big Data Platform
- Introduction to the Korea Meteorological Administration Data Portal
- Introduction to the ITS National Traffic Information Center
- Introduction to the Korea Data Exchange (KDX)
- Introduction to Our World in Data
- Introduction to SEES:lab
- Introduction to World Pop
- Introduction to the Stanford Network Analysis Project
- Introduction to OpenFlights
- Introduction to Network Data by Mark Newman
- Introduction to GIS Developer
- Country-Specific ISO3 and Latitude Longitude Data
- Introduction to the Web of Life
- Introduction to Baseball Savant
- Introduction to Network Data Repository
- Introduction to the Administrative Standard Code Management System
- Introduction to KOSIS National Data Portal
- Introduction to The Materials Project
- Using the Materials Project API in Python
- Querying Data from the Materials Project API Client
