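This article summarizes some of the csv file read/write operations, and the conversions between csv files and common Python data structures (such as dicts and lists), that came up in an earlier data-matching experiment done with Python.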
(Python Version 2.7)
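Converting between csv files and lists
Converting a list to a csv file
Converting a list of nested dicts to a csv file
Converting a list to a csv file
The most basic conversion: write the elements of the list to a csv file, one per row.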
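    import csv

    def list2csv(items, file_path):
        # Python 2.7: open in binary mode so the csv module controls line endings
        with open(file_path, "wb") as f:
            wr = csv.writer(f, quoting=csv.QUOTE_ALL)
            # one list element per row
            for word in items:
                wr.writerow([word])

Converting a list of nested dicts to a csv file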
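This is the typical csv read/write case. A csv file commonly has a header row naming each field, and every following row holds the values for those fields. Such files are often read into dicts (the keys come from the header row, the values from the corresponding row), for example: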
    my_list = [
        {"players.vis_name": "Khazri", "players.role": "Midfielder",
         "players.country": "Tunisia", "players.last_name": "Khazri",
         "players.player_id": "989", "players.first_name": "Wahbi",
         "players.date_of_birth": "08/02/1991", "players.team": "Bordeaux"},
        {"players.vis_name": "Khazri", "players.role": "Midfielder",
         "players.country": "Tunisia", "players.last_name": "Khazri",
         "players.player_id": "989", "players.first_name": "Wahbi",
         "players.date_of_birth": "08/02/1991", "players.team": "Sunderland"},
        {"players.vis_name": "Lewis Baker", "players.role": "Midfielder",
         "players.country": "England", "players.last_name": "Baker",
         "players.player_id": "9574", "players.first_name": "Lewis",
         "players.date_of_birth": "25/04/1995", "players.team": "Vitesse"},
    ]
All of these dicts are then stored in a single list. What follows is the reverse process: writing such a list of dicts back out to a csv file.
    # write a list of dicts to csv
    def nestedlist2csv(rows, out_file):
        with open(out_file, "wb") as f:
            w = csv.writer(f)
            # use the keys of the first dict as the header row
            fieldnames = rows[0].keys()
            w.writerow(fieldnames)
            for row in rows:
                # look values up by field name so every row follows the header order
                w.writerow([row[k] for k in fieldnames])
Note that fieldnames carries the keys, i.e. the field names written as the first row.
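The header handling described above can also be delegated to csv.DictWriter, which takes the field names once and writes every dict in that column order. The following is only a minimal sketch, written for Python 3 as an assumption (the code in this article targets Python 2.7, where a file would be opened with "wb" instead). The function name nestedlist2csv_dictwriter and the two sample rows are hypothetical.

```python
import csv
import io


def nestedlist2csv_dictwriter(rows, out_stream):
    """Write a list of dicts to an open text stream as csv, header first."""
    if not rows:
        return
    # Take the column order from the first dict (assumption: every dict
    # in the list has the same keys).
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(out_stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample rows, written to an in-memory stream for the demo.
sample = [
    {"players.player_id": "989", "players.team": "Bordeaux"},
    {"players.player_id": "989", "players.team": "Sunderland"},
]
buf = io.StringIO(newline="")
nestedlist2csv_dictwriter(sample, buf)
csv_text = buf.getvalue()
```

With a real file, the stream would come from open(path, "w", newline="") in Python 3.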
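Converting between csv files and dicts
Converting a csv file to a dict
First row holds the keys, the remaining rows hold the values
Each row is a key,value record
Converting a csv file to a two-level dict
Converting a dict to a csv file
First row holds the keys, the remaining rows hold the values
Each row is a key,value record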
Converting a csv file to a dict
For the common case where the first row holds the field names and the remaining rows hold the values:
    # convert csv file to dict
    # @params:
    #   key/value: the columns of the original csv file to use as the key and value of the dict
    def csv2dict(in_file, key, value):
        new_dict = {}
        with open(in_file, "rb") as f:
            # read the header row first, then map every following row onto it
            reader = csv.reader(f, delimiter=",")
            fieldnames = next(reader)
            reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=",")
            for row in reader:
                new_dict[row[key]] = row[value]
        return new_dict
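In new_dict[row[key]] = row[value], "key" and "value" are field names from the first row of the csv file. Note that this assumes a fairly simple csv file in which the column chosen as the key is unique. Otherwise, converting straight from csv to dict overwrites entries with a repeated key and loses data. If the column chosen as the key does contain duplicates, build a dict of lists instead, with the value part set to a list; see the code in the part on building a dict of lists.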
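To make the overwrite problem concrete, here is a small sketch comparing the two behaviours on in-memory data. It is written for Python 3 as an assumption, and the function names csv2dict_last_wins and csv2dict_of_lists, as well as the three sample rows, are hypothetical.

```python
import csv
import io

# Hypothetical sample data: player_id 989 appears twice.
SAMPLE = (
    "players.player_id,players.team\n"
    "989,Bordeaux\n"
    "989,Sunderland\n"
    "9574,Vitesse\n"
)


def csv2dict_last_wins(stream, key, value):
    """Plain dict: a repeated key keeps only the last value seen."""
    return {row[key]: row[value] for row in csv.DictReader(stream)}


def csv2dict_of_lists(stream, key, value):
    """Dict of lists: every value for a repeated key is kept."""
    result = {}
    for row in csv.DictReader(stream):
        result.setdefault(row[key], []).append(row[value])
    return result


flat = csv2dict_last_wins(
    io.StringIO(SAMPLE), "players.player_id", "players.team")
grouped = csv2dict_of_lists(
    io.StringIO(SAMPLE), "players.player_id", "players.team")
```

In the plain dict the Bordeaux entry for 989 is silently replaced by Sunderland, while the dict of lists keeps both.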
For the special case where every row is a key-value pair
Here the first column is taken as the key of the dict and the second column as the value; adjust as needed.
    # convert csv file to dict (one key-value pair per row)
    def row_csv2dict(csv_file):
        dict_club = {}
        with open(csv_file) as f:
            reader = csv.reader(f, delimiter=",")
            for row in reader:
                # column 0 is the key, column 1 is the value
                dict_club[row[0]] = row[1]
        return dict_club
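[Update]
Building a dict whose values are lists. This is mainly for cases where the values of several csv columns should be collected under the value of one column,
or where the data does not fit a plain dict because the same key maps to more than one value.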
    # build a dict of lists like {key: [...elements of lst_inner_value...]}
    # key is a column name of the csv file
    # lst_inner_value is a list of column names of the csv file
    def build_list_dict(source_file, key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                for element in lst_inner_value:
                    new_dict.setdefault(row[key], []).append(row[element])
        return new_dict

    # sample:
    # test_club = build_list_dict("test_info.csv", "season", ["move from", "move to"])
    # print test_club

Converting a csv file to a two-level dict
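This is usually for special purposes: it structures the csv file further by taking the values of one column (field) as keys and turning the remaining key-value pairs into a sub-dict stored as the value. It is typically used in matching, where filtering on the outer key first builds a hierarchy and improves accuracy.
For example, suppose the csv file holds the following records (shown as a table):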
id | name | age | country |
---|---|---|---|
1 | danny | 21 | China |
2 | Lancelot | 22 | America |
... | ... | ... | ... |
After the two-level conversion (assuming the two levels are country and name), the resulting dict is:
    dct = {"China": {"danny": {"id": "1", "age": "21"}},
           "America": {"Lancelot": {"id": "2", "age": "22"}}}
The code is as follows:
    # build a specific nested dict from a csv file (country -> name)
    def build_level2_dict(source_file):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                # reuse the inner dict for this country if it already exists
                item = new_dict.get(row["country"], dict())
                item[row["name"]] = {k: row[k] for k in ("id", "age")}
                new_dict[row["country"]] = item
        return new_dict
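As a quick check of the shape described above, the following sketch runs the same grouping on the sample table held in memory. It is written for Python 3 as an assumption (the article's code is Python 2.7), reads from a text stream instead of a file path, and uses setdefault in place of the get-then-assign pattern; the name build_level2_dict_from_stream is hypothetical.

```python
import csv
import io

# The sample table from the text, as csv text (in memory for the demo).
SAMPLE = (
    "id,name,age,country\n"
    "1,danny,21,China\n"
    "2,Lancelot,22,America\n"
)


def build_level2_dict_from_stream(stream):
    """Group rows as {country: {name: {"id": ..., "age": ...}}}."""
    new_dict = {}
    for row in csv.DictReader(stream):
        # setdefault creates the inner dict the first time a country is seen.
        inner = new_dict.setdefault(row["country"], {})
        inner[row["name"]] = {k: row[k] for k in ("id", "age")}
    return new_dict


dct = build_level2_dict_from_stream(io.StringIO(SAMPLE))
```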
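[Update]
A further improvement makes building the two-level dict more flexible: instead of editing the function body, you pass in the keys and values to use. There are two different constructions; pick whichever fits.
Every level of the two-level dict (both keys and the innermost value) is explicitly set to the value of a chosen column: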
    # build a specific nested dict from a csv file
    # @params:
    #   source_file
    #   outer_key: the outer level key of the nested dict
    #   inner_key: the inner level key of the nested dict
    #   inner_value: the column stored as the value for the inner key
    def build_level2_dict2(source_file, outer_key, inner_key, inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                item[row[inner_key]] = row[inner_value]
                new_dict[row[outer_key]] = item
        return new_dict
Specify the keys of the first and second levels, and store the remaining key-value pairs of the csv file as the innermost value:
    # build a specific nested dict from a csv file
    # @params:
    #   source_file
    #   outer_key: the outer level key of the nested dict
    #   inner_key: the inner level key of the nested dict; the remaining
    #              key-value pairs are stored as the value of the inner key
    def build_level2_dict(source_file, outer_key, inner_key):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            # read the header to find the columns that are not used as keys
            reader = csv.reader(csv_file, delimiter=",")
            fieldnames = next(reader)
            inner_keyset = fieldnames
            inner_keyset.remove(outer_key)
            inner_keyset.remove(inner_key)
            # rewind so DictReader sees the header row again
            csv_file.seek(0)
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                item[row[inner_key]] = {k: row[k] for k in inner_keyset}
                new_dict[row[outer_key]] = item
        return new_dict
There is another way to build a two-level dict, using the pop() method. Personally I find it less intuitive than the one above, but here it is:
    import csv
    from collections import defaultdict

    def build_dict(source_file):
        projects = defaultdict(dict)
        # if the csv file has no header row, set the header yourself
        # and pass it via the fieldnames parameter of csv.DictReader
        # headers = ["id", "name", "age", "country"]
        with open(source_file, "rb") as fp:
            reader = csv.DictReader(fp, dialect="excel", skipinitialspace=True)
            for rowdict in reader:
                # drop surplus values from rows longer than the header
                if None in rowdict:
                    del rowdict[None]
                country = rowdict.pop("country")
                name = rowdict.pop("name")
                projects[country][name] = rowdict
        return dict(projects)
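[Update]
Further ways of building a two-level dict, mainly for csv files that do not fit a plain dict structure because some keys map to several values. The values then have to be kept in a list inside the dict, or each set of key-value pairs has to be kept as an entry in a list.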
    # build a specific nested dict from a csv file
    # @params:
    #   source_file
    #   outer_key: the outer level key of the nested dict
    #   lst_inner_value: a list of column names, for the case where the inner
    #                    values of the same outer_key are not distinct
    # {outer_key: [{pairs of lst_inner_value}]}
    def build_level2_dict3(source_file, outer_key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                new_dict.setdefault(row[outer_key], []).append(
                    {k: row[k] for k in lst_inner_value})
        return new_dict
    # build a specific nested dict from a csv file
    # @params:
    #   source_file
    #   outer_key: the outer level key of the nested dict
    #   lst_inner_value: a list of column names, for the case where the inner
    #                    values of the same outer_key are not distinct
    # {outer_key: {key of lst_inner_value: [...values of lst_inner_value...]}}
    def build_level2_dict4(source_file, outer_key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                # print row
                item = new_dict.get(row[outer_key], dict())
                # item.setdefault("move from", []).append(row["move from"])
                # item.setdefault("move to", []).append(row["move to"])
                for element in lst_inner_value:
                    item.setdefault(element, []).append(row[element])
                new_dict[row[outer_key]] = item
        return new_dict
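The two shapes produced by build_level2_dict3 and build_level2_dict4 above are easy to mix up, so here is a small side-by-side sketch on in-memory data. It is written for Python 3 as an assumption; the helper names rows_per_key and columns_per_key and the club names are hypothetical, while the column names follow the sample call shown earlier in the article ("season", "move from", "move to").

```python
import csv
import io

# Hypothetical transfer records with a repeated outer key.
SAMPLE = (
    "season,move from,move to\n"
    "2015,Club A,Club B\n"
    "2015,Club C,Club D\n"
)


def rows_per_key(stream, outer_key, columns):
    """Shape {outer_key: [{column: value, ...}, ...]}: one dict per row."""
    result = {}
    for row in csv.DictReader(stream):
        result.setdefault(row[outer_key], []).append(
            {c: row[c] for c in columns})
    return result


def columns_per_key(stream, outer_key, columns):
    """Shape {outer_key: {column: [value, ...]}}: one list per column."""
    result = {}
    for row in csv.DictReader(stream):
        inner = result.setdefault(row[outer_key], {})
        for c in columns:
            inner.setdefault(c, []).append(row[c])
    return result


cols = ["move from", "move to"]
by_row = rows_per_key(io.StringIO(SAMPLE), "season", cols)
by_column = columns_per_key(io.StringIO(SAMPLE), "season", cols)
```

The first shape keeps each csv row together, which suits record-by-record matching; the second groups values by column, which suits membership checks against a whole column.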
    # build a specific nested dict from a csv file
    # @params:
    #   source_file
    #   outer_key: the outer level key of the nested dict
    #   lst_inner_key: a column name
    #   lst_inner_value: a column name, for the case where the inner values
    #                    of the same lst_inner_key are not distinct
    # {outer_key: {lst_inner_key: [...lst_inner_value...]}}
    def build_list_dict2(source_file, outer_key, lst_inner_key, lst_inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                # print row
                item = new_dict.get(row[outer_key], dict())
                item.setdefault(row[lst_inner_key], []).append(row[lst_inner_value])
                new_dict[row[outer_key]] = item
        return new_dict

    # dct = build_list_dict2("test_info.csv", "season", "move from", "move to")

Building a three-level dict
Similarly, a three-level or even deeper dict can be built from a csv file. The approach is the same as above, so I will just post the code.
    # build a specific nested dict from a csv file
    # a dict like {outer_key: {inner_key1: {inner_key2: {rest_key: rest_value...}}}}
    # the params are csv column names of your choice
    def build_level3_dict(source_file, outer_key, inner_key1, inner_key2):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            reader = csv.reader(csv_file, delimiter=",")
            fieldnames = next(reader)
            inner_keyset = fieldnames
            inner_keyset.remove(outer_key)
            inner_keyset.remove(inner_key1)
            inner_keyset.remove(inner_key2)
            csv_file.seek(0)
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                sub_item = item.get(row[inner_key1], dict())
                sub_item[row[inner_key2]] = {k: row[k] for k in inner_keyset}
                item[row[inner_key1]] = sub_item
                new_dict[row[outer_key]] = item
        return new_dict

    # build a specific nested dict from a csv file
    # a dict like {outer_key: {inner_key1: {inner_key2: inner_value}}}
    # the params are csv column names of your choice
    def build_level3_dict2(source_file, outer_key, inner_key1, inner_key2, inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                sub_item = item.get(row[inner_key1], dict())
                sub_item[row[inner_key2]] = row[inner_value]
                item[row[inner_key1]] = sub_item
                new_dict[row[outer_key]] = item
        return new_dict
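Again there are two constructions for different needs: one stores the remaining key-value pairs unchanged as the innermost value, the other keeps only the key-value pair you ask for.
There is one more special case: when the innermost value should not be a single element but a list holding several elements for the same key, only the innermost key-value assignment needs to change.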
    # build a specific nested dict from a csv file
    # a dict like {outer_key: {inner_key1: {inner_key2: [inner_value]}}}
    # for multiple inner_value entries with the same inner_key2, gathered in a list
    # the params are csv column names of your choice
    def build_level3_dict3(source_file, outer_key, inner_key1, inner_key2, inner_value):
        new_dict = {}
        with open(source_file, "rb") as csv_file:
            data = csv.DictReader(csv_file, delimiter=",")
            for row in data:
                item = new_dict.get(row[outer_key], dict())
                sub_item = item.get(row[inner_key1], dict())
                sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
                item[row[inner_key1]] = sub_item
                new_dict[row[outer_key]] = item
        return new_dict
The core part is this line:
    sub_item.setdefault(row[inner_key2], []).append(row[inner_value])
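To show what that setdefault call does in isolation, here is a sketch that builds the same three-level structure from a few in-memory records, once with setdefault at every level and once with nested collections.defaultdict as an alternative. The defaultdict variant is not from the original code, and the records are hypothetical stand-ins for csv rows.

```python
from collections import defaultdict

# Hypothetical (outer, middle, inner, value) records standing in for csv rows.
records = [
    ("2015", "Club A", "out", "player1"),
    ("2015", "Club A", "out", "player2"),
    ("2016", "Club B", "in", "player3"),
]

# Variant 1: plain dicts with setdefault at every level.
nested = {}
for outer, middle, inner, value in records:
    level2 = nested.setdefault(outer, {})
    level3 = level2.setdefault(middle, {})
    # Create the list on first sight of the key, then append to it.
    level3.setdefault(inner, []).append(value)

# Variant 2: defaultdict factories create missing levels on first access.
auto = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
for outer, middle, inner, value in records:
    auto[outer][middle][inner].append(value)
```

Both variants hold the same data. One thing to keep in mind with the defaultdict version is that merely looking up a missing key creates it, so plain dicts with setdefault can be the safer choice when the result is later used for matching.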
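Converting a dict to a csv file
Each row is a key,value record
First row holds the keys, the remaining rows hold the values
Writing out a dict of lists
This is the reverse of the csv-to-dict conversion above. It is simple, so here is the code directly.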
    def dict2csv(dict, file):
        with open(file, "wb") as f:
            w = csv.writer(f)
            # write each key/value pair on a separate row
            w.writerows(dict.items())
    def dict2csv(dict, file):
        with open(file, "wb") as f:
            w = csv.writer(f)
            # write all keys on one row and all values on the next
            w.writerow(dict.keys())
            w.writerow(dict.values())
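The two dict2csv variants above produce quite different files, so here is a sketch that renders both layouts to strings for comparison. It is written for Python 3 as an assumption and writes to in-memory streams; the function names dict_to_pair_rows and dict_to_header_and_values are hypothetical.

```python
import csv
import io


def dict_to_pair_rows(mapping):
    """Layout 1: one "key,value" record per row."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(mapping.items())
    return buf.getvalue()


def dict_to_header_and_values(mapping):
    """Layout 2: all keys on the first row, all values on the second."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    keys = list(mapping)  # fix the column order once
    writer.writerow(keys)
    writer.writerow([mapping[k] for k in keys])
    return buf.getvalue()


clubs = {"Khazri": "Sunderland", "Baker": "Vitesse"}  # small sample dict
pairs_text = dict_to_pair_rows(clubs)
header_text = dict_to_header_and_values(clubs)
```

The second function looks values up by key instead of calling keys() and values() separately, so the two rows are guaranteed to line up.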
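This one is actually not very common; the reverse is more common, i.e. importing a regular csv file into a dict of lists (a single dict in which the first row of the csv file supplies the keys and the remaining rows supply, column by column, the values under each key, collected into lists). If you do need to save such a structure as a csv file, it can be done as follows: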
    import csv
    import pandas as pd
    from collections import OrderedDict

    dct = OrderedDict()
    dct["a"] = [1, 2, 3, 4]
    dct["b"] = [5, 6, 7, 8]
    dct["c"] = [9, 10, 11, 12]

    header = dct.keys()
    # turn the dict of lists into a list of dicts, one per output row
    rows = pd.DataFrame(dct).to_dict("records")

    with open("outTest.csv", "wb") as f:
        f.write(",".join(header))
        f.write("\n")
        for data in rows:
            f.write(",".join(str(data[h]) for h in header))
            f.write("\n")
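Three packages are used here. Besides the csv package for regular csv reading, OrderedDict keeps the original column order in the output csv file, and pandas handles the intermediate step of converting the dict of lists into a list of dicts. For example: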
    [("a", [1, 2, 3, 4]), ("b", [5, 6, 7, 8]), ("c", [9, 10, 11, 12])]
    to
    [{"a": 1, "c": 9, "b": 5}, {"a": 2, "c": 10, "b": 6}, {"a": 3, "c": 11, "b": 7}, {"a": 4, "c": 12, "b": 8}]

Reading special csv files
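This is mainly for csv files with unusual delimiters. A csv file that uses a single delimiter throughout is not much of a problem (the operations above all assume the delimiter is , everywhere). A file like the one below, where the header row is delimited by , but all the value rows are delimited by ;, has to be read slightly differently. In general you can convert it row by row into dicts and then work on those. The code is as follows: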
    def func(id_list, input_file, output_file):
        wanted_ids = set(id_list)  # build the lookup set once
        with open(input_file, "rb") as f:
            # the header row is delimited by "," while the data rows use ";"
            reader = csv.reader(f, delimiter=",")
            fieldnames = next(reader)
            reader = csv.DictReader(f, fieldnames=fieldnames, delimiter=";")
            rows = [row for row in reader if row["players.player_id"] in wanted_ids]
            # operation on rows...
Change the delimiters as needed.
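For a version that can be tried without a file on disk, the same two-delimiter reading can be sketched on an in-memory stream. This is written for Python 3 as an assumption; the function name filter_rows_by_id and the sample content are hypothetical, and only the column name players.player_id is taken from the code above.

```python
import csv
import io

# Hypothetical file content: "," in the header row, ";" in the data rows.
SAMPLE = (
    "players.player_id,players.last_name,players.team\n"
    "989;Khazri;Sunderland\n"
    "9574;Baker;Vitesse\n"
)


def filter_rows_by_id(stream, id_list):
    """Return the rows (as dicts) whose players.player_id is in id_list."""
    wanted = set(id_list)  # build the lookup set once
    # Parse the header line with its own delimiter...
    fieldnames = next(csv.reader(stream, delimiter=","))
    # ...then read the remaining lines with the data delimiter.
    reader = csv.DictReader(stream, fieldnames=fieldnames, delimiter=";")
    return [row for row in reader if row["players.player_id"] in wanted]


rows = filter_rows_by_id(io.StringIO(SAMPLE), ["9574"])
```

Both readers share the same stream, so the second one simply continues from the line after the header.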
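These are roughly all the csv problems I ran into during the experiment. Most of them can be solved by searching stackoverflow or asking a question there; the people there are very helpful. In a follow-up I will summarize some of the other data processing from the experiment, such as format conversion, deduplication, and duplicate detection.
Finally, the source code is published on github in csv_toolkit; feel free to play around with it.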
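Changelog
1. 2016-12-22: Improved the method for building two-level dicts to make it more flexible
2. 2016-12-24 14:55:30: Added methods for building three-level dicts
3. 2017-01-09 11:26:59: The innermost level can now hold a list of elements from a specified column
4. 2017-01-16 10:29:44: Added building a dict of lists; added building special two-level dicts (which need to keep several values for the same key)
5. 2017-02-09 10:54:41: Added a new way of building a two-level dict of lists
6. 2017-02-10 11:18:01: Improved the simple csv-to-dict code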