A summary of conversions between CSV files and dicts, lists, etc. [Python]

econi, published 2019-07-25 10:51 / 2344 reads


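This article grew out of a data-matching experiment I did earlier in Python. It collects the CSV read/write operations I used there, along with conversions between CSV files and common Python data structures such as dicts and lists.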
(Python Version 2.7)

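Converting between CSV files and lists

Converting a list to a CSV file

Converting a list of nested dicts to a CSV file

Converting a list to a CSV file

The most basic conversion: write the elements of a list to a CSV file, one element per row.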

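import csv  # needed by every snippet in this article

def list2csv(lst, out_file):
    # Python 2.7: open csv files in binary mode; "with" closes the file when done
    with open(out_file, "wb") as f:
        wr = csv.writer(f, quoting=csv.QUOTE_ALL)
        for word in lst:
            wr.writerow([word])  # one element per row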
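
The function above targets Python 2.7, where CSV files are opened in binary mode. As a rough sketch of the same idea on Python 3 (an assumption, since the article does not cover Python 3), the file is opened in text mode with newline="" instead. The name list_to_csv and the sample words are made up for illustration.

```python
import csv
import os
import tempfile


def list_to_csv(items, path):
    """Write each element of `items` as its own single-column row."""
    # Python 3: text mode plus newline="" lets the csv module control line endings.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        for item in items:
            writer.writerow([item])


# Hypothetical sample data, written to a temporary file and read back.
words = ["Bordeaux", "Sunderland", "Vitesse"]
path = os.path.join(tempfile.mkdtemp(), "words.csv")
list_to_csv(words, path)

with open(path, newline="") as f:
    read_back = [row[0] for row in csv.reader(f)]
```

Reading the file back with csv.reader is a cheap way to check that quoting and line endings survived the round trip.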

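Converting a list of nested dicts to a CSV file

This is the typical CSV case. The first row of a CSV file is usually a header that names each field, and every following row holds the values for those fields. When reading such a file, each row is usually stored as a dict (the keys come from the header row, the values from the row itself), for example: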

my_list = [{"players.vis_name": "Khazri", "players.role": "Midfielder", "players.country": "Tunisia",
            "players.last_name": "Khazri", "players.player_id": "989", "players.first_name": "Wahbi",
            "players.date_of_birth": "08/02/1991", "players.team": "Bordeaux"},
           {"players.vis_name": "Khazri", "players.role": "Midfielder", "players.country": "Tunisia",
            "players.last_name": "Khazri", "players.player_id": "989", "players.first_name": "Wahbi",
            "players.date_of_birth": "08/02/1991", "players.team": "Sunderland"},
           {"players.vis_name": "Lewis Baker", "players.role": "Midfielder", "players.country": "England",
            "players.last_name": "Baker", "players.player_id": "9574", "players.first_name": "Lewis",
            "players.date_of_birth": "25/04/1995", "players.team": "Vitesse"}
           ]

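All of these dicts end up stored in one list. What follows is the reverse process: writing such a list of dicts back out to a CSV file.

# write nested list of dict to csv
def nestedlist2csv(lst, out_file):
    with open(out_file, "wb") as f:
        w = csv.writer(f)
        fieldnames = lst[0].keys()  # header row, taken from the keys of the first dict
        w.writerow(fieldnames)
        for row in lst:
            # look values up by header name so every row follows the header's column order
            w.writerow([row[k] for k in fieldnames])

Note that fieldnames carries the keys, i.e. the attributes written as the first row.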
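
As an alternative sketch, the standard library's csv.DictWriter matches each value to its column by key, so the header and the rows cannot drift apart. This version assumes Python 3.7+ (where dicts keep insertion order) and writes to an in-memory buffer; the helper name dicts_to_csv_text and the trimmed-down player records are illustrative only.

```python
import csv
import io

# Hypothetical, shortened records modelled on the list shown earlier.
players = [
    {"players.player_id": "989", "players.last_name": "Khazri", "players.team": "Bordeaux"},
    {"players.player_id": "9574", "players.last_name": "Baker", "players.team": "Vitesse"},
]


def dicts_to_csv_text(rows):
    """Serialize a list of dicts to CSV text, header first."""
    fieldnames = list(rows[0].keys())  # column order comes from the first dict
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # each value is placed under its own key
    return buf.getvalue()


csv_text = dicts_to_csv_text(players)
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer.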

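Converting between CSV files and dicts

Converting a CSV file to a dict

First row holds the keys, remaining rows hold the values

Each row is a key,value record

Converting a CSV file to a two-level dict

Converting a dict to a CSV file

First row holds the keys, remaining rows hold the values

Each row is a key,value record

Converting a CSV file to a dict

First row holds the keys, remaining rows hold the values

This covers the common case where the first row is the header and the remaining rows are values.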

# convert csv file to dict
# @params:
# key/value: the column of original csv file to set as the key and value of dict
def csv2dict(in_file,key,value):
    new_dict = {}
    with open(in_file, "rb") as f:
        reader = csv.reader(f, delimiter=",")
        fieldnames = next(reader)
        reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=",")
        for row in reader:
            new_dict[row[key]] = row[value]
    return new_dict

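In new_dict[row[key]] = row[value], key and value are column names taken from the header row of the CSV file. Note that this assumes a simple CSV file in which the column chosen as key holds unique values. Otherwise, converting the CSV file straight into a dict overwrites earlier entries that share a key, and data is lost. If the key column of the original data contains duplicates, build a dict of lists instead, with each value being a list; see the code in the dict-of-lists section.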
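
To make the overwrite problem concrete, here is a small sketch that reads the same data two ways. It assumes Python 3 and in-memory CSV text; the helper names and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical data: player_id 989 appears twice with different teams.
SAMPLE = "player_id,team\n989,Bordeaux\n989,Sunderland\n9574,Vitesse\n"


def to_plain_dict(text, key, value):
    """Later rows silently overwrite earlier rows that share a key."""
    return {row[key]: row[value] for row in csv.DictReader(io.StringIO(text))}


def to_list_dict(text, key, value):
    """Collect every value seen for a key in a list, so nothing is lost."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        result.setdefault(row[key], []).append(row[value])
    return result


plain = to_plain_dict(SAMPLE, "player_id", "team")
grouped = to_list_dict(SAMPLE, "player_id", "team")
```

Here plain keeps only "Sunderland" for player 989, while grouped keeps both teams.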

Each row is a key,value record

This covers the special case where every row is a key-value pair.
By default the first column is taken as the dict's key and the second column as its value; change this as needed.

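# convert csv file to dict (key-value pairs each row)
def row_csv2dict(csv_file):
    dict_club = {}
    with open(csv_file, "rb") as f:  # binary mode, as the Python 2.7 csv module expects
        reader = csv.reader(f, delimiter=",")
        for row in reader:
            dict_club[row[0]] = row[1]  # column 0 is the key, column 1 the value
    return dict_club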
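
One thing to watch with key,value files is that csv.reader yields an empty list for a blank line, so indexing row[1] would raise IndexError. A minimal sketch that skips short rows, assuming Python 3 and in-memory text (the function name and sample data are made up):

```python
import csv
import io

# Hypothetical key,value data with a blank line in the middle.
PAIRS = "Khazri,Sunderland\n\nBaker,Vitesse\n"


def pairs_to_dict(text):
    """Build {first column: second column}, ignoring rows with fewer than two fields."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}


pairs = pairs_to_dict(PAIRS)
```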

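[Update]

Dict of lists

Build a dict whose values are lists. This is mainly for cases where the values of certain CSV columns should be gathered under the value of another column,
or where the data does not fit a plain dict because one key maps to more than one value.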

# build a dict of list like {key:[...element of lst_inner_value...]}
# key is certain column name of csv file
# the lst_inner_value is a list of specific column name of csv file
def build_list_dict(source_file, key, lst_inner_value):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            for element in lst_inner_value:
                new_dict.setdefault(row[key], []).append(row[element])
    return new_dict
# sample:
# test_club=build_list_dict("test_info.csv","season",["move from","move to"])
# print test_club
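
A worked example may help show the shape of the result. The sketch below mirrors the function above on Python 3 with in-memory text; the transfer records are invented, and only the column names follow the sample call.

```python
import csv
import io

# Hypothetical transfer records; the column names follow the sample call above.
SAMPLE = (
    "season,move from,move to\n"
    "2015,Bordeaux,Sunderland\n"
    "2015,Chelsea,Vitesse\n"
    "2016,Vitesse,Chelsea\n"
)


def build_list_dict_from_text(text, key, value_columns):
    """Group the values of several columns into one flat list per key."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column in value_columns:
            result.setdefault(row[key], []).append(row[column])
    return result


transfers = build_list_dict_from_text(SAMPLE, "season", ["move from", "move to"])
```

Note that values from all listed columns land in the same flat list, so the result no longer records which column each value came from.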

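Converting a CSV file to a two-level dict

This is usually for special purposes: it gives the CSV file more structure by using the values of one column (attribute) as keys and building a sub-dict from the remaining key-value pairs as the value. It is typically used to filter on the outer key first during matching, building a hierarchy that improves accuracy.
For example, suppose the CSV file holds the following records (shown as a table):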

id	name	age	country
1	danny	21	China
2	Lancelot	22	America
...	...	...	...

After conversion to a two-level dict (assuming the two levels are country and name), the result is the following dict:

dct={"China":{"danny":{"id":"1","age":"21"}},
     "America":{"Lancelot":{"id":"2","age":"22"}}}

The code is as follows:

# build specific nested dict from csv files(country->name)
def build_level2_dict(source_file):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row["country"], dict())
            item[row["name"]] = {k: row[k] for k in ("id","age")}
            new_dict[row["country"]] = item
    return new_dict
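
As a sketch of how such a dict is built and then used for filtered lookups, the following runs the same logic on the two sample rows from the table. It assumes Python 3 and in-memory text; build_country_name_dict is an illustrative name.

```python
import csv
import io

# The two sample rows from the table above, as CSV text.
TABLE = "id,name,age,country\n1,danny,21,China\n2,Lancelot,22,America\n"


def build_country_name_dict(text):
    """Nest rows as {country: {name: {"id": ..., "age": ...}}}."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        people = result.setdefault(row["country"], {})
        people[row["name"]] = {k: row[k] for k in ("id", "age")}
    return result


dct = build_country_name_dict(TABLE)
# Filter by country first, then look the name up inside that group only.
danny = dct.get("China", {}).get("danny")
missing = dct.get("America", {}).get("danny")
```

Chaining .get with a default of {} keeps the lookup from raising KeyError when either level is absent.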

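[Update]
After further improvement, the two-level dict can be built more flexibly: instead of editing the function body, you pass in the keys and values. There are two different variants; use whichever fits.

In the first, the keys and values at every level of the two-level dict are explicitly set to the values of chosen columns.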

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   inner_key:the inner level key of nested dict
#   inner_value:set the inner value for the inner key
def build_level2_dict2(source_file,outer_key,inner_key,inner_value):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            item[row[inner_key]] = row[inner_value]
            new_dict[row[outer_key]] = item
    return new_dict

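In the second, you specify the keys of the first and second levels, and the remaining key-value pairs of the CSV file are stored as the innermost value.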

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   inner_key:the inner level key of nested dict; the remaining key-value pairs are stored as the value of inner key
def build_level2_dict(source_file,outer_key,inner_key):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        reader = csv.reader(csv_file, delimiter=",")
        fieldnames = next(reader)
        inner_keyset=fieldnames
        inner_keyset.remove(outer_key)
        inner_keyset.remove(inner_key)
        csv_file.seek(0)
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            item[row[inner_key]] = {k: row[k] for k in inner_keyset}
            new_dict[row[outer_key]] = item
    return new_dict
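
The header can also be obtained from csv.DictReader itself through its fieldnames attribute, which avoids reading the first row separately and then seeking back to the start. A minimal sketch, assuming Python 3 and in-memory text (nest_by_two_keys is a hypothetical name):

```python
import csv
import io

TABLE = "id,name,age,country\n1,danny,21,China\n2,Lancelot,22,America\n"


def nest_by_two_keys(text, outer_key, inner_key):
    """Return {outer: {inner: {remaining column: value}}}."""
    reader = csv.DictReader(io.StringIO(text))
    # reader.fieldnames reads the header row on first access.
    rest = [name for name in reader.fieldnames if name not in (outer_key, inner_key)]
    result = {}
    for row in reader:
        result.setdefault(row[outer_key], {})[row[inner_key]] = {k: row[k] for k in rest}
    return result


by_country = nest_by_two_keys(TABLE, "country", "name")
by_age = nest_by_two_keys(TABLE, "age", "id")
```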

There is another way to build a two-level dict, using the pop() method. Personally I find it less intuitive than the one above, but here it is:

from collections import defaultdict

def build_dict(source_file):
    projects = defaultdict(dict)
    # if there is no header within the csv file you need to set the header 
    # and utilize fieldnames parameter in csv.DictReader method
    # headers = ["id", "name", "age", "country"]
    with open(source_file, "rb") as fp:
        reader = csv.DictReader(fp, dialect="excel", skipinitialspace=True)
        for rowdict in reader:
            if None in rowdict:
                del rowdict[None]  # drop surplus fields that have no header
            nationality = rowdict.pop("country")  # outer key
            name = rowdict.pop("name")            # inner key
            projects[nationality][name] = rowdict  # what is left becomes the value
    return dict(projects)

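[Update]
Two more ways to build a two-level dict, mainly for CSV files that do not map directly onto a plain dict because some keys correspond to several values. The values therefore need to be kept in lists internally, or each group of key-value pairs needs to be kept in a list.

Storing key-value pairs in a list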

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   lst_inner_value: a list of column name,for circumstance that the inner value of the same outer_key are not distinct
#   {outer_key:[{pairs of lst_inner_value}]}
def build_level2_dict3(source_file,outer_key,lst_inner_value):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            new_dict.setdefault(row[outer_key], []).append({k: row[k] for k in lst_inner_value})
    return new_dict

Storing values in lists

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   lst_inner_value: a list of column name,for circumstance that the inner value of the same outer_key are not distinct
#   {outer_key:{key of lst_inner_value:[...value of lst_inner_value...]}}
def build_level2_dict4(source_file,outer_key,lst_inner_value):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            # print row
            item = new_dict.get(row[outer_key], dict())
            # item.setdefault("move from",[]).append(row["move from"])
            # item.setdefault("move to", []).append(row["move to"])
            for element in lst_inner_value:
                item.setdefault(element, []).append(row[element])
            new_dict[row[outer_key]] = item
    return new_dict
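
The two shapes are easy to confuse, so here is a side-by-side sketch on the same input. It assumes Python 3 and in-memory text; the helper names and the transfer data are invented for illustration.

```python
import csv
import io

# Hypothetical transfer records with a repeated outer key.
SAMPLE = (
    "season,move from,move to\n"
    "2015,Bordeaux,Sunderland\n"
    "2015,Chelsea,Vitesse\n"
)


def group_as_records(text, outer_key, columns):
    """{outer: [{column: value, ...}, ...]}: one small dict per CSV row."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        result.setdefault(row[outer_key], []).append({c: row[c] for c in columns})
    return result


def group_as_columns(text, outer_key, columns):
    """{outer: {column: [value, ...]}}: one list per column."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        item = result.setdefault(row[outer_key], {})
        for c in columns:
            item.setdefault(c, []).append(row[c])
    return result


records = group_as_records(SAMPLE, "season", ["move from", "move to"])
columns = group_as_columns(SAMPLE, "season", ["move from", "move to"])
```

The record form keeps each row's values together, while the column form is handier when you want all values of one column for a given key.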

# build specific nested dict from csv files
# @params:
#   source_file
#   outer_key:the outer level key of nested dict
#   lst_inner_key:a list of column name
#   lst_inner_value: a list of column name,for circumstance that the inner value of the same lst_inner_key are not distinct
#   {outer_key:{lst_inner_key:[...lst_inner_value...]}}
def build_list_dict2(source_file,outer_key,lst_inner_key,lst_inner_value):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            # print row
            item = new_dict.get(row[outer_key], dict())
            item.setdefault(row[lst_inner_key], []).append(row[lst_inner_value])
            new_dict[row[outer_key]] = item
    return new_dict

# dct=build_list_dict2("test_info.csv","season","move from","move to")

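Building a three-level dict

Similarly, a three-level or even deeper dict can be built from a CSV file. The approach is the same as above, so I won't repeat the explanation and will only post the code.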

# build specific nested dict from csv files
# a dict like {outer_key:{inner_key1:{inner_key2:{rest_key:rest_value...}}}}
# the params are column names of the csv file, chosen as you like
def build_level3_dict(source_file,outer_key,inner_key1,inner_key2):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        reader = csv.reader(csv_file, delimiter=",")
        fieldnames = next(reader)
        inner_keyset=fieldnames
        inner_keyset.remove(outer_key)
        inner_keyset.remove(inner_key1)
        inner_keyset.remove(inner_key2)
        csv_file.seek(0)
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item[row[inner_key2]] = {k: row[k] for k in inner_keyset}
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict

# build specific nested dict from csv files
# a dict like {outer_key:{inner_key1:{inner_key2:inner_value}}}
# the params are column names of the csv file, chosen as you like
def build_level3_dict2(source_file,outer_key,inner_key1,inner_key2,inner_value):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item[row[inner_key2]] = row[inner_value]
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict

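Again there are two ways to build the dict, depending on the need: one keeps all remaining key-value pairs unchanged as the innermost value, the other keeps only the key-value pair you need.

There is one more special case: when the innermost value should not be a single element but a list holding several elements that share the same key, only the innermost key-value assignment needs to change.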

# build specific nested dict from csv files
# a dict like {outer_key:{inner_key1:{inner_key2:[inner_value]}}}
# for multiple inner_value with the same inner_key2,thus gather them in a list
# the params are column names of the csv file, chosen as you like
def build_level3_dict3(source_file,outer_key,inner_key1,inner_key2,inner_value):
    new_dict = {}
    with open(source_file, "rb")as csv_file:
        data = csv.DictReader(csv_file, delimiter=",")
        for row in data:
            item = new_dict.get(row[outer_key], dict())
            sub_item = item.get(row[inner_key1], dict())
            sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
            item[row[inner_key1]] = sub_item
            new_dict[row[outer_key]] = item
    return new_dict

The core is this line:
sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
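
Because setdefault returns the existing or newly created container, the same pattern can be chained to any depth. The following is a sketch of a generic version, not the article's own code; nest_rows and the sample rows are hypothetical, and the input is a plain list of dicts rather than a file.

```python
def nest_rows(rows, keys, value):
    """Nest rows under each column in `keys`; gather `value` in a list at the leaf."""
    result = {}
    for row in rows:
        node = result
        for key in keys[:-1]:
            # Descend one level, creating the sub-dict on first sight.
            node = node.setdefault(row[key], {})
        # Innermost level: a list of all values that share the same key path.
        node.setdefault(row[keys[-1]], []).append(row[value])
    return result


# Hypothetical rows, e.g. as produced by csv.DictReader.
rows = [
    {"country": "England", "team": "Vitesse", "role": "Midfielder", "name": "P1"},
    {"country": "England", "team": "Vitesse", "role": "Midfielder", "name": "P2"},
    {"country": "Tunisia", "team": "Bordeaux", "role": "Midfielder", "name": "P3"},
]
nested = nest_rows(rows, ["country", "team", "role"], "name")
```

With three entries in keys this matches the three-level case; two entries give the two-level dict of lists.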

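Converting a dict to a CSV file

Each row is a key,value record

First row holds the keys, remaining rows hold the values

Writing out a dict of lists

Each row is a key,value record

This is the reverse of the CSV-to-dict conversion described earlier. It is simple, so here is the code directly: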

def dict2csv(dict,file):
    with open(file,"wb") as f:
        w=csv.writer(f)
        # write each key/value pair on a separate row
        w.writerows(dict.items())

First row holds the keys, remaining rows hold the values

def dict2csv(dict,file):
    with open(file,"wb") as f:
        w=csv.writer(f)
        # write all keys on one row and all values on the next
        w.writerow(dict.keys())
        w.writerow(dict.values())
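
To compare the two layouts directly, here is a sketch that renders the same dict both ways into in-memory text. It assumes Python 3.7+ (insertion-ordered dicts); the helper names and the sample dict are made up.

```python
import csv
import io


def dict_to_rows_text(d):
    """One key,value pair per row."""
    buf = io.StringIO()
    csv.writer(buf).writerows(d.items())
    return buf.getvalue()


def dict_to_columns_text(d):
    """All keys on the first row, all values on the second."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(d.keys())
    writer.writerow(d.values())
    return buf.getvalue()


clubs = {"Khazri": "Sunderland", "Baker": "Vitesse"}  # hypothetical sample
as_rows = dict_to_rows_text(clubs)
as_columns = dict_to_columns_text(clubs)
```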

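Writing out a dict of lists

This one is actually not very common. The reverse is seen more often: importing a regular CSV file into a dict of lists (a single dict in which the first row of the CSV file supplies the keys and the remaining rows supply, column by column, the values for those keys, with each value being a list). Still, if you run into this case and need to save it as a CSV file, here is how: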

import csv
import pandas as pd
from collections import OrderedDict

dct=OrderedDict()
dct["a"]=[1,2,3,4]
dct["b"]=[5,6,7,8]
dct["c"]=[9,10,11,12]

header = dct.keys()
rows=pd.DataFrame(dct).to_dict("records")

with open("outTest.csv", "wb") as f:
    f.write(",".join(header))
    f.write("\n")
    for data in rows:
        f.write(",".join(str(data[h]) for h in header))
        f.write("\n")

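Three packages are used here. Besides csv, the usual package for reading and writing CSV files, OrderedDict keeps the columns in their original order in the output CSV file, and pandas handles the intermediate step of converting the dict of lists into a list of dicts. For example: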

[("a", [1, 2, 3, 4]), ("b", [5, 6, 7, 8]), ("c", [9, 10, 11, 12])]
to
[{"a": 1, "c": 9, "b": 5}, {"a": 2, "c": 10, "b": 6}, {"a": 3, "c": 11, "b": 7}, {"a": 4, "c": 12, "b": 8}]
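
If pandas is not available, the same transposition can be sketched with the built-in zip. This is an alternative to the approach above rather than the article's method, and it assumes Python 3 and that all lists have the same length (zip stops at the shortest one).

```python
import csv
import io
from collections import OrderedDict

dct = OrderedDict([("a", [1, 2, 3, 4]), ("b", [5, 6, 7, 8]), ("c", [9, 10, 11, 12])])

header = list(dct.keys())
# zip(*columns) turns the column lists into row tuples: (1, 5, 9), (2, 6, 10), ...
records = [dict(zip(header, values)) for values in zip(*dct.values())]

# The row tuples can also be written straight to CSV without building dicts.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(zip(*dct.values()))
csv_text = buf.getvalue()
```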

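Reading special CSV files

This is mainly for CSV files with unusual delimiters. Normally a CSV file uses one delimiter throughout, which causes little trouble (the operations above all assume , as the only delimiter). A file whose header row is delimited by , while the value rows are delimited by ; has to be read slightly differently. In general, each row can be converted to a dict and then processed. The code is as follows: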

def func(id_list,input_file,output_file):
    with open(input_file, "rb") as f:
        # if the delimiter for header is "," while ";" for rows
        reader = csv.reader(f, delimiter=",")
        fieldnames = next(reader)

        reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=";")
        wanted_ids = set(id_list)  # build the set once instead of once per row
        rows = [row for row in reader if row["players.player_id"] in wanted_ids]
        # operation on rows...

Change the delimiters as needed.
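
A self-contained sketch of the mixed-delimiter case, assuming Python 3 and in-memory text. The function name select_rows and the sample rows (including the "Doe" record) are hypothetical.

```python
import csv
import io

# Hypothetical file content: header uses ",", data rows use ";".
RAW = (
    "players.player_id,players.last_name,players.team\n"
    "989;Khazri;Sunderland\n"
    "9574;Baker;Vitesse\n"
    "123;Doe;Nowhere\n"
)


def select_rows(text, wanted_ids):
    """Parse the header with ',' and the rest with ';', keeping only wanted ids."""
    f = io.StringIO(text)
    fieldnames = next(csv.reader(f, delimiter=","))  # consumes only the header line
    reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=";")
    wanted = set(wanted_ids)
    return [row for row in reader if row["players.player_id"] in wanted]


selected = select_rows(RAW, ["989", "9574"])
last_names = [row["players.last_name"] for row in selected]
```

Passing fieldnames explicitly tells DictReader not to treat the next line as a header, so the first data row is not lost.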

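These are roughly the CSV problems I ran into during the experiment. Most of them can be solved by searching stackoverflow or asking there yourself; the people there are very helpful. Later I will summarize other data processing from the experiment, such as format conversion, deduplication and duplicate detection.

Finally, the source code is published on github in csv_toolkit. Feel free to play with it.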

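Changelog
1. 2016-12-22: improved the way two-level dicts are built, making it more flexible
2. 2016-12-24 14:55:30: added methods for building three-level dicts
3. 2017-01-09 11:26:59: the innermost level can now hold a list of elements from a specified column
4. 2017-01-16 10:29:44: added building a dict of lists; added building special two-level dicts (which keep several values for the same key)
5. 2017-02-09 10:54:41: added a new way to build a two-level dict of lists
6. 2017-02-10 11:18:01: improved the code for building a dict from a simple CSV file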


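Copyright belongs to the author. Please do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reposting, please cite this article's address: http://m.specialneedsforspecialkids.com/yun/38187.html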
