【Question Title】:Python unique values per column in csv file row
【Posted】:2023-04-03 11:25:01
【Question】:

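I have been struggling with this for a long time. Is there a simple way, using NumPy or pandas or by fixing my code, to get the unique values within each column of a row, where the values are separated by "|"?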

That is, given this data:

"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL"
"2","john","doe","htw","2000","dev"

The output should be:

"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"

My broken code:

import csv
import pprint

your_list = csv.reader(open('out.csv'))
your_list = list(your_list)

#pprint.pprint(your_list)
string = "|"
cols_no=6
for line in your_list:
    i=0
    for col in line:
      if i==cols_no:
        print "\n" 
        i=0
      if string in col:
        values = col.split("|")
        myset = set(values)
        items = list()
        for item in myset:
          items.append(item)
        print items
      else:
        print col+",",
      i=i+1

It outputs:

id, fname, lname, education, gradyear, attributes, 1, john, smith, ['harvard', 'ft', 'mit']
['2003', '212', '207']
['qa', 'admin,co', 'NULL', 'master']
2, john, doe, htw, 2000, dev,

Thanks in advance!

【Discussion】:
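
One possible approach, as a minimal sketch rather than a definitive fix: the posted code uses `set()`, which does not preserve the original order of the values, and it prints Python lists instead of writing CSV rows. The sketch below assumes Python 3.7+ (where `dict` keeps insertion order) and uses only the standard `csv` module. The helper names `dedupe_cell` and `dedupe_csv` are hypothetical, introduced here for illustration, and the sample data is the data from the question.

```python
import csv
import io


def dedupe_cell(cell, sep="|"):
    """Drop repeated values in a delimited cell, keeping first-seen order."""
    # dict.fromkeys keeps insertion order (Python 3.7+), unlike set().
    return sep.join(dict.fromkeys(cell.split(sep)))


def dedupe_csv(src, dst, sep="|"):
    """Copy CSV rows from src to dst, de-duplicating the values in every cell."""
    reader = csv.reader(src)
    # QUOTE_ALL mirrors the fully quoted style of the input shown in the question.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow([dedupe_cell(cell, sep) for cell in row])


# Sample data copied from the question; in-memory buffers stand in for files.
sample = (
    '"id","fname","lname","education","gradyear","attributes"\n'
    '"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212",'
    '"qa|admin,co|master|NULL|NULL"\n'
    '"2","john","doe","htw","2000","dev"\n'
)
out = io.StringIO()
dedupe_csv(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

To run this against a real file instead of the in-memory sample, pass file objects opened with `newline=""` (for example `open('out.csv', newline='')` for the input), as the `csv` module documentation recommends. Cells without a "|" pass through unchanged, so no special case is needed for them. The same `dedupe_cell` helper could also be applied cell by cell in pandas, though the standard library alone seems sufficient here.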

Tags:
python
pandas
numpy