How to Delete Duplicate Rows in DataFrames in Julia
Overview
To remove duplicate rows, use unique(). More precisely, it does not delete every row that has a duplicate: it keeps the first occurrence of each and drops the later ones.
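As a quick illustration of that "keep the first, drop the rest" behavior, here is a minimal sketch on a toy table. The DataFrame `df` and its columns `a` and `b` are made up for this example and are not part of the data used below.

```julia
using DataFrames

# Hypothetical toy data: rows 1 and 3 are identical in every column.
df = DataFrame(a = [1, 2, 1, 3], b = ["x", "y", "x", "z"])

# With no column argument, unique() compares entire rows
# and keeps the first occurrence of each distinct row.
deduped = unique(df)

# unique() returns a new DataFrame; df itself is left unchanged.
println(nrow(df))       # 4
println(nrow(deduped))  # 3
```

Since `unique()` does not modify its argument, assign the result to a variable if you want to keep working with the deduplicated table.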
Code
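using DataFrames

# Example data: one row per member
WJSN = DataFrame(
    member = ["다영","다원","루다","소정","수빈","연정","주연","지연","진숙","현정"],
    birth = [99,97,97,95,96,99,98,95,99,94],
    height = [161,167,157,166,159,165,172,163,162,165],
    unit = ["쪼꼬미","메보즈","쪼꼬미","더블랙","쪼꼬미","메보즈","더블랙","더블랙","쪼꼬미","더블랙"]
)

# Sort in place by birth year so the earliest-born member comes first
sort!(WJSN, :birth)

# Keep only the first row for each distinct value of :unit
unique(WJSN, :unit)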
Let’s run the example code above and check its result.
julia> WJSN
10×4 DataFrame
 Row │ member  birth  height  unit
     │ String  Int64  Int64   String
─────┼───────────────────────────────
   1 │ 현정       94     165  더블랙
   2 │ 소정       95     166  더블랙
   3 │ 지연       95     163  더블랙
   4 │ 수빈       96     159  쪼꼬미
   5 │ 다원       97     167  메보즈
   6 │ 루다       97     157  쪼꼬미
   7 │ 주연       98     172  더블랙
   8 │ 다영       99     161  쪼꼬미
   9 │ 연정       99     165  메보즈
  10 │ 진숙       99     162  쪼꼬미
After sorting by :birth, the WJSN DataFrame looks like the above.
Removing rows with duplicate values in a single column with unique()
julia> unique(WJSN, :unit)
3×4 DataFrame
 Row │ member  birth  height  unit
     │ String  Int64  Int64   String
─────┼───────────────────────────────
   1 │ 현정       94     165  더블랙
   2 │ 수빈       96     159  쪼꼬미
   3 │ 다원       97     167  메보즈
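You can see that only one row remains for each distinct value in the :unit column. The row that survives is the first one encountered, which here is the member with the earliest birth year in each unit, because the DataFrame was sorted by :birth beforehand.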
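The same idea extends to several columns, to in-place removal, and to simply flagging duplicates. The following is a minimal sketch using a hypothetical toy table (the names `df`, `team`, `year`, and `score` are invented for illustration). It relies on `unique`, `unique!`, and `nonunique` from DataFrames.jl.

```julia
using DataFrames

# Hypothetical toy data, not the WJSN table above.
df = DataFrame(
    team  = ["A", "A", "B", "B", "A"],
    year  = [1, 1, 1, 2, 2],
    score = [10, 20, 30, 40, 50],
)

# Pass a vector of columns: rows count as duplicates
# only when BOTH :team and :year match an earlier row.
by_pair = unique(df, [:team, :year])

# nonunique() returns a Bool vector in which true marks
# a row that repeats an earlier row in the given column.
dup_mask = nonunique(df, :team)

# unique!() drops the duplicates from df itself
# instead of returning a new DataFrame.
unique!(df, :team)
```

Which variant fits depends on the goal: `nonunique` is handy for inspecting the duplicates before dropping them, while `unique!` avoids creating a second DataFrame when the original is no longer needed.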
Environment
- OS: Windows
- julia: v1.6.3
