How to Delete Duplicate Rows in DataFrames in Julia

Overview

To remove duplicate rows from a DataFrame, use unique(). Strictly speaking, it does not delete every row that has a duplicate: it keeps the first occurrence of each and drops the later repeats.
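As a minimal sketch of that behavior, here is unique() called with no column argument, so that entire rows are compared. The table `df` and its columns are hypothetical toy data, not the data used in the rest of this post.

```julia
using DataFrames

# Hypothetical toy data: rows 1 and 3 are identical in every column.
df = DataFrame(name = ["a", "b", "a", "c"], score = [1, 2, 1, 3])

# With no column argument, rows are compared across all columns.
# The first occurrence of each row is kept.
deduped = unique(df)

nrow(deduped)   # 3
nrow(df)        # 4, because unique returns a new DataFrame and leaves df alone
```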

Code

using DataFrames

WJSN = DataFrame(
    member = ["다영","다원","루다","소정","수빈","연정","주연","지연","진숙","현정"],
    birth = [99,97,97,95,96,99,98,95,99,94],
    height = [161,167,157,166,159,165,172,163,162,165],
    unit = ["쪼꼬미","메보즈","쪼꼬미","더블랙","쪼꼬미","메보즈","더블랙","더블랙","쪼꼬미","더블랙"]
)
sort!(WJSN, :birth)

unique(WJSN, :unit)

Let’s run the example code above and check its result.

julia> WJSN
10×4 DataFrame
 Row │ member  birth  height  unit   
     │ String  Int64  Int64   String 
─────┼───────────────────────────────
   1 │ 현정       94     165  더블랙
   2 │ 소정       95     166  더블랙
   3 │ 지연       95     163  더블랙
   4 │ 수빈       96     159  쪼꼬미
   5 │ 다원       97     167  메보즈
   6 │ 루다       97     157  쪼꼬미
   7 │ 주연       98     172  더블랙
   8 │ 다영       99     161  쪼꼬미
   9 │ 연정       99     165  메보즈
  10 │ 진숙       99     162  쪼꼬미

The WJSN DataFrame, sorted by :birth, is shown above.

Removing duplicated rows in a single column with `unique()`

julia> unique(WJSN, :unit)
3×4 DataFrame
 Row │ member  birth  height  unit   
     │ String  Int64  Int64   String 
─────┼───────────────────────────────
   1 │ 현정       94     165  더블랙
   2 │ 수빈       96     159  쪼꼬미
   3 │ 다원       97     167  메보즈

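You can see that only one row remains for each distinct value in the :unit column. The row that survives is the first one with that value in the table, which here is the member with the earliest birth year because the table was sorted by :birth.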
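The same idea extends to a few related calls. The sketch below uses a hypothetical toy table (the names `df`, `team`, `year`, and `id` are assumptions for illustration) to show deduplicating on a combination of columns, inspecting which rows count as duplicates with nonunique(), and dropping them in place with unique!().

```julia
using DataFrames

# Hypothetical toy data, not the WJSN table from this post.
df = DataFrame(
    team = ["x", "x", "y", "y", "x"],
    year = [1, 1, 1, 2, 2],
    id   = [1, 2, 3, 4, 5],
)

# A row is a duplicate only if both :team and :year match an earlier row.
by_pair = unique(df, [:team, :year])   # keeps the rows with id 1, 3, 4, 5

# nonunique returns a Bool vector: true marks a row that repeats an earlier one.
mask = nonunique(df, :team)            # [false, true, false, true, true]

# unique! removes the duplicates from df itself instead of returning a copy.
unique!(df, :team)                     # df now holds the rows with id 1 and 3
```

Use unique() when the original table is still needed afterwards, and unique!() when it is not, since the latter avoids allocating a second DataFrame.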

Environment

OS: Windows
julia: v1.6.3