Learn the Basics of pandas in 10 Minutes

shaonbean / 3003 reads

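Background

pandas is central to data analysis in Python, and the best way to learn it is to read the official documentation. What follows are my study notes on the official guide, 10 Minutes to pandas. (The title says 10 minutes, but it feels more like half an hour at least.)

pandas has two main data structures, which can be loosely understood as:

Series: a one-dimensional array

DataFrame: a two-dimensional array (a matrix)

With that rough picture in mind, let's get to know pandas properly.

First, import the packages we need: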

import numpy as np
import pandas as pd
Object Creation

Series

A Series can be created by passing in a list, with np.nan standing for a missing value:

s = pd.Series([1,3,4,np.nan,7,9])
s
Out[5]: 
0    1.0
1    3.0
2    4.0
3    NaN
4    7.0
5    9.0
dtype: float64

pandas creates a default index (the 0-5 above). We can also specify the index at creation time:

pd.Series([1,3,4,np.nan,7,9], index=[1,1,2,2,"a",4])
Out[9]: 
1    1.0
1    3.0
2    4.0
2    NaN
a    7.0
4    9.0
dtype: float64

Note that index labels may repeat, and that they may be strings.
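Because labels may repeat, a label lookup can return more than one row. The sketch below reuses the Series from the example above; the variable names `repeated` and `unique` are chosen only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 4, np.nan, 7, 9], index=[1, 1, 2, 2, "a", 4])

repeated = s.loc[1]    # label 1 occurs twice, so this is a two-row Series
unique = s.loc["a"]    # label "a" occurs once, so this is a scalar

print(repeated)
print(unique)
print(s.index.is_unique)  # a quick check for repeated labels
```

Checking `index.is_unique` before relying on scalar lookups is one way to avoid surprises with repeated labels.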

DataFrame

There are several ways to create a DataFrame:

By passing in a NumPy array, together with a datetime index and column labels:

dates = pd.date_range("20190101", periods=6)
dates
Out[11]: 
DatetimeIndex(["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
               "2019-01-05", "2019-01-06"],
              dtype="datetime64[ns]", freq="D")
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list("ABCD"))
df
Out[18]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
2019-01-06  0.221597 -0.753038 -1.741256  0.287280

By passing in a dict:

df2 = pd.DataFrame({"A":1.,
                    "B":pd.Timestamp("20190101"),
                    "C":pd.Series(1, index=list(range(4)), dtype="float32"),
                    "D":np.array([3]*4, dtype="int32"),
                    "E":pd.Categorical(["test", "tain", "test", "train"]),
                    "F":"foo"})
df2
Out[27]: 
     A          B    C  D      E    F
0  1.0 2019-01-01  1.0  3   test  foo
1  1.0 2019-01-01  1.0  3   tain  foo
2  1.0 2019-01-01  1.0  3   test  foo
3  1.0 2019-01-01  1.0  3  train  foo

Here each column was given a different type, which can be inspected as follows:

df2.dtypes
Out[28]: 
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

As you can see, a DataFrame, like a Series, automatically generates an integer index when none is specified. This matters a great deal in later operations.

Viewing Data

View the first or last few rows:

df.head()
Out[30]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
df.tail(3)
Out[31]: 
                   A         B         C         D
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
2019-01-06  0.221597 -0.753038 -1.741256  0.287280

Pass a row count to control how many rows are shown; the default is 5.

View the index and the column labels:

df.index
Out[32]: 
DatetimeIndex(["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
               "2019-01-05", "2019-01-06"],
              dtype="datetime64[ns]", freq="D")
df.columns
Out[33]: Index(["A", "B", "C", "D"], dtype="object")

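DataFrame.to_numpy() converts the data to a NumPy array. Note that a NumPy array holds a single dtype, whereas a DataFrame can hold a different dtype in each column, so during conversion pandas has to find a NumPy dtype that can represent every dtype in the DataFrame, such as object. This can be computationally expensive.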

df.to_numpy()
Out[35]: 
array([[ 0.67162219,  0.78572584,  0.39243527,  0.87469243],
       [-2.42070338, -1.11620768, -0.34607048,  0.78594081],
       [ 1.36442543, -0.94764138,  2.38688005,  0.58537186],
       [-0.48597971, -1.28145415,  0.35406263, -1.41885798],
       [-1.12271697, -2.78904135, -0.79181242, -0.17434484],
       [ 0.22159737, -0.75303807, -1.74125564,  0.28728004]])
df2.to_numpy()
Out[36]: 
array([[1.0, Timestamp("2019-01-01 00:00:00"), 1.0, 3, "test", "foo"],
       [1.0, Timestamp("2019-01-01 00:00:00"), 1.0, 3, "tain", "foo"],
       [1.0, Timestamp("2019-01-01 00:00:00"), 1.0, 3, "test", "foo"],
       [1.0, Timestamp("2019-01-01 00:00:00"), 1.0, 3, "train", "foo"]],
      dtype=object)

Above, df contains only floats, so the conversion is fast. df2 involves several types, so every element ends up as object.
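The dtype that to_numpy() settles on can be checked directly. This is a minimal sketch on two small frames made up for illustration (`homogeneous` and `mixed` are not from the original notes):

```python
import numpy as np
import pandas as pd

# Every column is float64, so the array keeps that dtype.
homogeneous = pd.DataFrame(np.zeros((3, 2)), columns=["A", "B"])

# A float column next to a string column forces a common dtype: object.
mixed = pd.DataFrame({"A": [1.0, 2.0], "F": ["foo", "bar"]})

print(homogeneous.to_numpy().dtype)
print(mixed.to_numpy().dtype)
```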

View a quick statistical summary of the data:

df.describe()
Out[37]: 
              A         B         C         D
count  6.000000  6.000000  6.000000  6.000000
mean  -0.295293 -1.016943  0.042373  0.156680
std    1.356107  1.144047  1.396030  0.860725
min   -2.420703 -2.789041 -1.741256 -1.418858
25%   -0.963533 -1.240143 -0.680377 -0.058939
50%   -0.132191 -1.031925  0.003996  0.436326
75%    0.559116 -0.801689  0.382842  0.735799
max    1.364425  0.785726  2.386880  0.874692

Transpose:

df.T
Out[38]: 
   2019-01-01  2019-01-02  2019-01-03  2019-01-04  2019-01-05  2019-01-06
A    0.671622   -2.420703    1.364425   -0.485980   -1.122717    0.221597
B    0.785726   -1.116208   -0.947641   -1.281454   -2.789041   -0.753038
C    0.392435   -0.346070    2.386880    0.354063   -0.791812   -1.741256
D    0.874692    0.785941    0.585372   -1.418858   -0.174345    0.287280

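Sort along an axis. The axis parameter selects the axis: the default axis=0 sorts by the row labels, while axis=1 sorts by the column labels. The ascending parameter defaults to True (ascending order); ascending=False sorts in descending order: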

df.sort_index(axis=1)
Out[44]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
2019-01-06  0.221597 -0.753038 -1.741256  0.287280
df.sort_index(axis=1, ascending=False)
Out[45]: 
                   D         C         B         A
2019-01-01  0.874692  0.392435  0.785726  0.671622
2019-01-02  0.785941 -0.346070 -1.116208 -2.420703
2019-01-03  0.585372  2.386880 -0.947641  1.364425
2019-01-04 -1.418858  0.354063 -1.281454 -0.485980
2019-01-05 -0.174345 -0.791812 -2.789041 -1.122717
2019-01-06  0.287280 -1.741256 -0.753038  0.221597

df.sort_index(axis=1) sorts the column labels in ascending order, so nothing appears to change. With ascending=False, the column order becomes D, C, B, A.

Sort by values:

df.sort_values(by="B")
Out[46]: 
                   A         B         C         D
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-06  0.221597 -0.753038 -1.741256  0.287280
2019-01-01  0.671622  0.785726  0.392435  0.874692
df.sort_values(by="B", ascending=False)
Out[47]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-06  0.221597 -0.753038 -1.741256  0.287280
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345

Selection

Getting a column:

df["A"]
Out[49]: 
2019-01-01    0.671622
2019-01-02   -2.420703
2019-01-03    1.364425
2019-01-04   -0.485980
2019-01-05   -1.122717
2019-01-06    0.221597
Freq: D, Name: A, dtype: float64
type(df.A)
Out[52]: pandas.core.series.Series

df.A works as well. Note that column names are case-sensitive, and that the result is a Series.

Selecting multiple rows:

df[0:3]
Out[53]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-03  1.364425 -0.947641  2.386880  0.585372
df["20190102":"20190104"]
Out[54]: 
                   A         B         C         D
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858

Passing a slice to [] slices the rows. Because the index was set to dates earlier, a date range can be used directly as the slice.

Selection by label

Selecting a single row:

df.loc[dates[0]]
Out[57]: 
A    0.671622
B    0.785726
C    0.392435
D    0.874692
Name: 2019-01-01 00:00:00, dtype: float64

Selecting data at specific rows and columns:

df.loc[:, ["A", "C"]]
Out[58]: 
                   A         C
2019-01-01  0.671622  0.392435
2019-01-02 -2.420703 -0.346070
2019-01-03  1.364425  2.386880
2019-01-04 -0.485980  0.354063
2019-01-05 -1.122717 -0.791812
2019-01-06  0.221597 -1.741256

df.loc["20190102":"20190105", ["A", "C"]]
Out[62]: 
                   A         C
2019-01-02 -2.420703 -0.346070
2019-01-03  1.364425  2.386880
2019-01-04 -0.485980  0.354063
2019-01-05 -1.122717 -0.791812

The first argument is the range of row labels and the second is the column labels; : selects everything.
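One detail worth checking is that a label slice in loc[] includes both endpoints, unlike an ordinary Python slice. The sketch below uses a made-up integer frame (not the random data from these notes) and compares the label slice with the equivalent position slice through iloc[]:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20190101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label slice: both "20190102" and "20190104" are included.
by_label = df.loc["20190102":"20190104", ["A", "C"]]

# Position slice: the stop position 4 is excluded, as in plain Python.
by_position = df.iloc[1:4, [0, 2]]

print(by_label.equals(by_position))
```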

Selecting a single value:

df.loc["20190102", "A"]
Out[69]: -2.420703380445092
df.at[dates[1], "A"]
Out[70]: -2.420703380445092

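A single value can be fetched with either loc[] or at[]. Note, however, that because the row index is of datetime type, loc[] accepts the string "20190102" in its place, whereas at[] must be given a datetime value; otherwise it raises an error: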

df.at["20190102", "A"]

  File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 449, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas/_libs/index.pyx", line 455, in pandas._libs.index.DatetimeEngine._date_check_type
KeyError: "20190102"
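If only a date string is at hand, one way around the error above is to convert it to a pd.Timestamp before calling at[]. This is a small sketch on a made-up frame; the names `key`, `value_at` and `value_loc` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20190101", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

key = pd.Timestamp("20190102")       # explicit conversion from the string
value_at = df.at[key, "A"]           # at[] receives a real datetime label
value_loc = df.loc["20190102", "A"]  # loc[] parses the string itself

print(value_at, value_loc)
```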

Selection by position

Selecting a single row:

df.iloc[3]
Out[71]: 
A   -0.485980
B   -1.281454
C    0.354063
D   -1.418858
Name: 2019-01-04 00:00:00, dtype: float64

The arguments to iloc[] must be integer positions.

Selecting data at specific rows and columns:

df.iloc[3:5, 0:2]
Out[72]: 
                   A         B
2019-01-04 -0.485980 -1.281454
2019-01-05 -1.122717 -2.789041
df.iloc[:,:]
Out[73]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
2019-01-06  0.221597 -0.753038 -1.741256  0.287280

df.iloc[[1, 2, 4], [0, 2]]
Out[74]: 
                   A         C
2019-01-02 -2.420703 -0.346070
2019-01-03  1.364425  2.386880
2019-01-05 -1.122717 -0.791812

As with loc[], : selects everything.

Selecting a single value:

df.iloc[1, 1]
Out[75]: -1.1162076820700824
df.iat[1, 1]
Out[76]: -1.1162076820700824

Both iloc[] and iat[] can be used to get a single value.

Boolean indexing

Selecting rows by the values of one column:

df[df.A > 0]
Out[77]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-03  1.364425 -0.947641  2.386880  0.585372
2019-01-06  0.221597 -0.753038 -1.741256  0.287280

Selecting the values that meet a condition:

df[df > 0]
Out[78]: 
                   A         B         C         D
2019-01-01  0.671622  0.785726  0.392435  0.874692
2019-01-02       NaN       NaN       NaN  0.785941
2019-01-03  1.364425       NaN  2.386880  0.585372
2019-01-04       NaN       NaN  0.354063       NaN
2019-01-05       NaN       NaN       NaN       NaN
2019-01-06  0.221597       NaN       NaN  0.287280

Values that do not meet the condition are set to NaN.
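Conditions on several columns can be combined with & (and) and | (or); each condition needs its own parentheses. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.5, -1.0, 2.0, -0.3],
                   "B": [1.0, -2.0, -0.5, 0.7]})

both = df[(df["A"] > 0) & (df["B"] < 0)]    # rows where both conditions hold
either = df[(df["A"] > 0) | (df["B"] < 0)]  # rows where at least one holds

print(both)
print(either)
```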

Filtering with the isin() method:

df2 = df.copy()
df2["E"] = ["one", "one", "two", "three", "four", "three"]
df2
Out[88]: 
                   A         B         C         D      E
2019-01-01  0.671622  0.785726  0.392435  0.874692    one
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941    one
2019-01-03  1.364425 -0.947641  2.386880  0.585372    two
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  three
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345   four
2019-01-06  0.221597 -0.753038 -1.741256  0.287280  three
df2["E"].isin(["two", "four"])
Out[89]: 
2019-01-01    False
2019-01-02    False
2019-01-03     True
2019-01-04    False
2019-01-05     True
2019-01-06    False
Freq: D, Name: E, dtype: bool
df2[df2["E"].isin(["two", "four"])]
Out[90]: 
                   A         B         C         D     E
2019-01-03  1.364425 -0.947641  2.386880  0.585372   two
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  four

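Note that isin() requires an exact match. The values in df carry many more decimal places than the six displayed, so column E was added to make the demonstration easier. Using a raw value directly works as follows; the result shows that position [1, 1] matches.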

df.isin([-1.1162076820700824])
Out[95]: 
                A      B      C      D
2019-01-01  False  False  False  False
2019-01-02  False   True  False  False
2019-01-03  False  False  False  False
2019-01-04  False  False  False  False
2019-01-05  False  False  False  False
2019-01-06  False  False  False  False

Setting Values

Setting a new column; the values are aligned by index:

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20190102", periods=6))
s1
Out[98]: 
2019-01-02    1
2019-01-03    2
2019-01-04    3
2019-01-05    4
2019-01-06    5
2019-01-07    6
Freq: D, dtype: int64
df["F"]=s1
df
Out[101]: 
                   A         B         C         D    F
2019-01-01  0.671622  0.785726  0.392435  0.874692  NaN
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941  1.0
2019-01-03  1.364425 -0.947641  2.386880  0.585372  2.0
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  3.0
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  4.0
2019-01-06  0.221597 -0.753038 -1.741256  0.287280  5.0

Positions with no matching value are filled with NaN automatically.

Setting a value by label:

df.at[dates[0], "A"] = 0
df
Out[103]: 
                   A         B         C         D    F
2019-01-01  0.000000  0.785726  0.392435  0.874692  NaN
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941  1.0
2019-01-03  1.364425 -0.947641  2.386880  0.585372  2.0
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  3.0
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  4.0
2019-01-06  0.221597 -0.753038 -1.741256  0.287280  5.0

Setting a value by position:

df.iat[0, 1] = 0
df
Out[105]: 
                   A         B         C         D    F
2019-01-01  0.000000  0.000000  0.392435  0.874692  NaN
2019-01-02 -2.420703 -1.116208 -0.346070  0.785941  1.0
2019-01-03  1.364425 -0.947641  2.386880  0.585372  2.0
2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  3.0
2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  4.0
2019-01-06  0.221597 -0.753038 -1.741256  0.287280  5.0

Setting values with a NumPy array:

df.loc[:, "D"] = np.array([5] * len(df))
df
Out[109]: 
                   A         B         C  D    F
2019-01-01  0.000000  0.000000  0.392435  5  NaN
2019-01-02 -2.420703 -1.116208 -0.346070  5  1.0
2019-01-03  1.364425 -0.947641  2.386880  5  2.0
2019-01-04 -0.485980 -1.281454  0.354063  5  3.0
2019-01-05 -1.122717 -2.789041 -0.791812  5  4.0
2019-01-06  0.221597 -0.753038 -1.741256  5  5.0

Setting values by condition:

df2 = df.copy()
df2[df2 > 0] = -df2
df2
Out[112]: 
                   A         B         C  D    F
2019-01-01  0.000000  0.000000 -0.392435 -5  NaN
2019-01-02 -2.420703 -1.116208 -0.346070 -5 -1.0
2019-01-03 -1.364425 -0.947641 -2.386880 -5 -2.0
2019-01-04 -0.485980 -1.281454 -0.354063 -5 -3.0
2019-01-05 -1.122717 -2.789041 -0.791812 -5 -4.0
2019-01-06 -0.221597 -0.753038 -1.741256 -5 -5.0

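Missing Data

pandas uses np.nan by default to represent missing data. Missing values are ignored in statistical computations.

The reindex() method can add, change, or delete the labels on a given axis (rows or columns). It returns a copy of the data: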

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1.loc[dates[0]:dates[1], "E"] = 1
df1
Out[115]: 
                   A         B         C  D    F    E
2019-01-01  0.000000  0.000000  0.392435  5  NaN  1.0
2019-01-02 -2.420703 -1.116208 -0.346070  5  1.0  1.0
2019-01-03  1.364425 -0.947641  2.386880  5  2.0  NaN
2019-01-04 -0.485980 -1.281454  0.354063  5  3.0  NaN

Dropping every row that contains a missing value:

df1.dropna(how="any")
Out[116]: 
                   A         B        C  D    F    E
2019-01-02 -2.420703 -1.116208 -0.34607  5  1.0  1.0

Filling in missing values:

df1.fillna(value=5)
Out[117]: 
                   A         B         C  D    F    E
2019-01-01  0.000000  0.000000  0.392435  5  5.0  1.0
2019-01-02 -2.420703 -1.116208 -0.346070  5  1.0  1.0
2019-01-03  1.364425 -0.947641  2.386880  5  2.0  5.0
2019-01-04 -0.485980 -1.281454  0.354063  5  3.0  5.0

Getting a boolean mask of where values are missing:

pd.isna(df1)
Out[118]: 
                A      B      C      D      F      E
2019-01-01  False  False  False  False   True  False
2019-01-02  False  False  False  False  False  False
2019-01-03  False  False  False  False  False   True
2019-01-04  False  False  False  False  False   True

Operations

Statistics

Note: by default, all statistics exclude missing values.

Mean

By default the mean is computed per column:

df.mean()
Out[119]: 
A   -0.407230
B   -1.147897
C    0.042373
D    5.000000
F    3.000000
dtype: float64

To compute the mean per row instead, specify the axis argument:

df.mean(1)
Out[120]: 
2019-01-01    1.348109
2019-01-02    0.423404
2019-01-03    1.960733
2019-01-04    1.317326
2019-01-05    0.859286
2019-01-06    1.545461
Freq: D, dtype: float64
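The exclusion of missing values is controlled by the skipna parameter. A minimal sketch on a three-element Series made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

default_mean = s.mean()             # NaN is skipped: (1 + 3) / 2
strict_mean = s.mean(skipna=False)  # NaN propagates into the result

print(default_mean, strict_mean)
```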

Shifting values

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)
s
Out[122]: 
2019-01-01    1.0
2019-01-02    3.0
2019-01-03    5.0
2019-01-04    NaN
2019-01-05    6.0
2019-01-06    8.0
Freq: D, dtype: float64
s = s.shift(2)
s
Out[125]: 
2019-01-01    NaN
2019-01-02    NaN
2019-01-03    1.0
2019-01-04    3.0
2019-01-05    5.0
2019-01-06    NaN
Freq: D, dtype: float64

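Here the values of s are shifted down by two positions, and the vacated positions are filled with NaN.

When operating on objects of different dimensionality, pandas broadcasts automatically: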

df.sub(s, axis="index")
Out[128]: 
                   A         B         C    D    F
2019-01-01       NaN       NaN       NaN  NaN  NaN
2019-01-02       NaN       NaN       NaN  NaN  NaN
2019-01-03  0.364425 -1.947641  1.386880  4.0  1.0
2019-01-04 -3.485980 -4.281454 -2.645937  2.0  0.0
2019-01-05 -6.122717 -7.789041 -5.791812  0.0 -1.0
2019-01-06       NaN       NaN       NaN  NaN  NaN
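The same behaviour is easier to follow on round numbers. In this sketch (made-up data), the Series is matched to the rows by index label, subtracted from every column, and its NaN turns the whole matching row into NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10.0, 20.0, 30.0], "B": [1.0, 2.0, 3.0]},
                  index=["x", "y", "z"])
s = pd.Series([1.0, np.nan, 3.0], index=["x", "y", "z"])

# axis="index" aligns s with the row labels and broadcasts across columns.
result = df.sub(s, axis="index")
print(result)
```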

Apply

The apply() method applies a function to the data, by default one column at a time:

Cumulative sum

df.apply(np.cumsum)
Out[130]: 
                   A         B         C   D     F
2019-01-01  0.000000  0.000000  0.392435   5   NaN
2019-01-02 -2.420703 -1.116208  0.046365  10   1.0
2019-01-03 -1.056278 -2.063849  2.433245  15   3.0
2019-01-04 -1.542258 -3.345303  2.787307  20   6.0
2019-01-05 -2.664975 -6.134345  1.995495  25  10.0
2019-01-06 -2.443377 -6.887383  0.254239  30  15.0

Here apply() calls np.cumsum; df.cumsum() can also be used directly:

df.cumsum()
Out[133]: 
                   A         B         C     D     F
2019-01-01  0.000000  0.000000  0.392435   5.0   NaN
2019-01-02 -2.420703 -1.116208  0.046365  10.0   1.0
2019-01-03 -1.056278 -2.063849  2.433245  15.0   3.0
2019-01-04 -1.542258 -3.345303  2.787307  20.0   6.0
2019-01-05 -2.664975 -6.134345  1.995495  25.0  10.0
2019-01-06 -2.443377 -6.887383  0.254239  30.0  15.0

Custom functions

Combining apply() with a custom function allows for more kinds of processing:

df.apply(lambda x: x.max() - x.min())
Out[134]: 
A    3.785129
B    2.789041
C    4.128136
D    0.000000
F    4.000000
dtype: float64
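The same function can be applied per row by passing axis=1. A minimal sketch on made-up data, with `col_range` and `row_range` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 5.0], "B": [4.0, 2.0], "C": [2.0, 9.0]})

# Default axis=0: the lambda receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the lambda receives each row as a Series.
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(col_range)
print(row_range)
```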

Histogramming

Count how many times each value occurs in a Series:

s = pd.Series(np.random.randint(0, 7, size=10))
s
Out[136]: 
0    2
1    0
2    4
3    0
4    3
5    3
6    6
7    4
8    6
9    5
dtype: int64
s.value_counts()
Out[137]: 
6    2
4    2
3    2
0    2
5    1
2    1
dtype: int64
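value_counts() can also report relative frequencies through its normalize parameter. The sketch below reuses the ten values printed above as a fixed list so that the result is reproducible:

```python
import pandas as pd

s = pd.Series([2, 0, 4, 0, 3, 3, 6, 4, 6, 5])

counts = s.value_counts()                # absolute frequencies
shares = s.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```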

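String Methods

A Series that holds strings exposes string methods through its str attribute; each method is applied to every element.

For example, converting to uppercase: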

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
s.str.upper()
Out[139]: 
0       A
1       B
2       C
3    AABA
4    BACA
5     NaN
6    CABA
7     DOG
8     CAT
dtype: object

Splitting into columns:

s = pd.Series(["A,b", "c,d"])
s
Out[142]: 
0    A,b
1    c,d
dtype: object
s.str.split(",", expand=True)
Out[143]: 
   0  1
0  A  b
1  c  d
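A few more str accessor methods, sketched on a made-up Series that includes a missing value. Passing na=False to contains() is one way to treat missing entries as non-matches:

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "Aaba", np.nan, "dog"])

lengths = s.str.len()                  # NaN stays NaN
has_a = s.str.contains("a", na=False)  # case-sensitive; NaN counts as False
lowered = s.str.lower()

print(lengths)
print(has_a)
print(lowered)
```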

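Other methods (the names below are the public methods of Python's built-in str type):

[m for m in dir(str) if not m.startswith("_")]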
Out[140]: 
["capitalize",
 "casefold",
 "center",
 "count",
 "encode",
 "endswith",
 "expandtabs",
 "find",
 "format",
 "format_map",
 "index",
 "isalnum",
 "isalpha",
 "isascii",
 "isdecimal",
 "isdigit",
 "isidentifier",
 "islower",
 "isnumeric",
 "isprintable",
 "isspace",
 "istitle",
 "isupper",
 "join",
 "ljust",
 "lower",
 "lstrip",
 "maketrans",
 "partition",
 "replace",
 "rfind",
 "rindex",
 "rjust",
 "rpartition",
 "rsplit",
 "rstrip",
 "split",
 "splitlines",
 "startswith",
 "strip",
 "swapcase",
 "title",
 "translate",
 "upper",
 "zfill"]

Merge

pandas provides many methods for quickly combining Series, DataFrame, and Panel objects (Panel has since been removed from pandas).

Concat

df = pd.DataFrame(np.random.randn(10, 4))
df
Out[145]: 
          0         1         2         3
0 -0.227408 -0.185674 -0.187919  0.185685
1  1.132517 -0.539992  1.156631 -0.022468
2  0.214134 -1.283055 -0.862972  0.518942
3  0.785903  1.033915 -0.471496 -1.403762
4 -0.676717 -0.529971 -1.161988 -1.265071
5  0.670126  1.320960 -0.128098  0.718631
6  0.589902  0.349386  0.221955  1.749188
7 -0.328885  0.607929 -0.973610 -0.928472
8  1.724243 -0.661503 -0.374254  0.409250
9  1.346625  0.618285  0.528776 -0.628470
# break it into pieces
pieces = [df[:3], df[3:7], df[7:]]
pieces
Out[147]: 
[          0         1         2         3
 0 -0.227408 -0.185674 -0.187919  0.185685
 1  1.132517 -0.539992  1.156631 -0.022468
 2  0.214134 -1.283055 -0.862972  0.518942,
           0         1         2         3
 3  0.785903  1.033915 -0.471496 -1.403762
 4 -0.676717 -0.529971 -1.161988 -1.265071
 5  0.670126  1.320960 -0.128098  0.718631
 6  0.589902  0.349386  0.221955  1.749188,
           0         1         2         3
 7 -0.328885  0.607929 -0.973610 -0.928472
 8  1.724243 -0.661503 -0.374254  0.409250
 9  1.346625  0.618285  0.528776 -0.628470]
pd.concat(pieces)
Out[148]: 
          0         1         2         3
0 -0.227408 -0.185674 -0.187919  0.185685
1  1.132517 -0.539992  1.156631 -0.022468
2  0.214134 -1.283055 -0.862972  0.518942
3  0.785903  1.033915 -0.471496 -1.403762
4 -0.676717 -0.529971 -1.161988 -1.265071
5  0.670126  1.320960 -0.128098  0.718631
6  0.589902  0.349386  0.221955  1.749188
7 -0.328885  0.607929 -0.973610 -0.928472
8  1.724243 -0.661503 -0.374254  0.409250
9  1.346625  0.618285  0.528776 -0.628470

Merge

This is a SQL-style join:

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
left
Out[151]: 
   key  lval
0  foo     1
1  foo     2
right
Out[152]: 
   key  rval
0  foo     4
1  foo     5
pd.merge(left, right, on="key")
Out[153]: 
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

Another example:

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
left
Out[156]: 
   key  lval
0  foo     1
1  bar     2
right
Out[157]: 
   key  rval
0  foo     4
1  bar     5
pd.merge(left, right, on="key")
Out[158]: 
   key  lval  rval
0  foo     1     4
1  bar     2     5
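Both examples above use the default inner join, where every key appears on both sides. The sketch below uses made-up keys that only partly overlap to show the how and indicator parameters of pd.merge:

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 5]})

# Default how="inner": only keys present on both sides survive.
inner = pd.merge(left, right, on="key")

# how="outer" keeps every key; indicator=True adds a _merge column
# recording which side each row came from.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

print(inner)
print(outer)
```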

Append

Appending rows to a DataFrame:

df = pd.DataFrame(np.random.randn(8, 4), columns=["A", "B", "C", "D"])
df
Out[160]: 
          A         B         C         D
0 -0.496709  0.573449  0.076059  0.685285
1  0.479253  0.587376 -1.240070 -0.907910
2 -0.052609 -0.287786 -1.949402  1.163323
3 -0.659489  0.525583  0.820922 -1.368544
4  1.270453 -1.813249  0.059915  0.586703
5  1.859657  0.564274 -0.198763 -1.794173
6 -0.649153 -3.129258  0.063418 -0.727936
7  0.862402 -0.800031 -1.954784 -0.028607
s = df.iloc[3]
s
Out[162]: 
A   -0.659489
B    0.525583
C    0.820922
D   -1.368544
Name: 3, dtype: float64
df.append(s, ignore_index=True)
Out[163]: 
          A         B         C         D
0 -0.496709  0.573449  0.076059  0.685285
1  0.479253  0.587376 -1.240070 -0.907910
2 -0.052609 -0.287786 -1.949402  1.163323
3 -0.659489  0.525583  0.820922 -1.368544
4  1.270453 -1.813249  0.059915  0.586703
5  1.859657  0.564274 -0.198763 -1.794173
6 -0.649153 -3.129258  0.063418 -0.727936
7  0.862402 -0.800031 -1.954784 -0.028607
8 -0.659489  0.525583  0.820922 -1.368544

Note that we passed ignore_index=True here. Without it, the appended row keeps its original index of 3, which may cause problems in later processing. How to handle this depends on the situation.

df.append(s)
Out[164]: 
          A         B         C         D
0 -0.496709  0.573449  0.076059  0.685285
1  0.479253  0.587376 -1.240070 -0.907910
2 -0.052609 -0.287786 -1.949402  1.163323
3 -0.659489  0.525583  0.820922 -1.368544
4  1.270453 -1.813249  0.059915  0.586703
5  1.859657  0.564274 -0.198763 -1.794173
6 -0.649153 -3.129258  0.063418 -0.727936
7  0.862402 -0.800031 -1.954784 -0.028607
3 -0.659489  0.525583  0.820922 -1.368544
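DataFrame.append() was removed in pandas 2.0, so the calls above only run on older versions. A roughly equivalent sketch with pd.concat, on a small made-up frame, looks like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12.0).reshape(3, 4), columns=list("ABCD"))
s = df.iloc[1]

# s.to_frame().T turns the row Series back into a one-row DataFrame,
# and ignore_index=True renumbers the result 0..n-1.
appended = pd.concat([df, s.to_frame().T], ignore_index=True)
print(appended)
```

Leaving out ignore_index=True keeps the original label of the appended row, just as with the old append().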

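Grouping

Grouped statistics generally involve three steps:

Split: divide the data into groups by some criterion

Apply: run a computation on each group

Combine: merge the per-group results into a single data structure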

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar",
                    "foo", "bar", "foo", "foo"],
                    "B": ["one", "one", "two", "three",
                    "two", "two", "one", "three"],
                    "C": np.random.randn(8),
                    "D": np.random.randn(8)})
df
Out[166]: 
     A      B         C         D
0  foo    one -1.252153  0.172863
1  bar    one  0.238547 -0.648980
2  foo    two  0.756975  0.195766
3  bar  three -0.933405 -0.320043
4  foo    two -0.310650 -1.388255
5  bar    two  1.568550 -1.911817
6  foo    one -0.340290 -2.141259
7  foo  three -0.670909 -2.673075

Group by column A and compute the sum:

df.groupby("A").sum()
Out[167]: 
            C         D
A                      
bar  0.873692 -2.880840
foo -1.817027 -5.833961

In the pandas version used here, column B is dropped from the result because sum cannot be applied to it.

Group by columns A and B and compute the sum:

df.groupby(["A", "B"]).sum()
Out[168]: 
                  C         D
A   B                        
bar one    0.238547 -0.648980
    three -0.933405 -0.320043
    two    1.568550 -1.911817
foo one   -1.592443 -1.968396
    three -0.670909 -2.673075
    two    0.446325 -1.192490

The result now has a multi-level index.
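How non-numeric columns are treated by sum() has varied between pandas versions, so one way to keep the result predictable is to select the numeric columns explicitly before aggregating. A minimal sketch on made-up data, also showing agg() with several functions:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                   "B": ["one", "one", "two", "two"],
                   "C": [1.0, 2.0, 3.0, 4.0],
                   "D": [10.0, 20.0, 30.0, 40.0]})

# Select the columns to aggregate, then sum within each group.
sums = df.groupby("A")[["C", "D"]].sum()

# Several aggregations of one column at once.
stats = df.groupby("A")["C"].agg(["sum", "mean"])

print(sums)
print(stats)
```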

Reshaping

Stack

Python's zip function packs the corresponding elements of its arguments into tuples:

tuples = list(zip(["bar", "bar", "baz", "baz",
                   "foo", "foo", "qux", "qux"],
                  ["one", "two", "one", "two",
                   "one", "two", "one", "two"]))
tuples
Out[172]: 
[("bar", "one"),
 ("bar", "two"),
 ("baz", "one"),
 ("baz", "two"),
 ("foo", "one"),
 ("foo", "two"),
 ("qux", "one"),
 ("qux", "two")]
## Build a two-level index
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
index
Out[174]: 
MultiIndex(levels=[["bar", "baz", "foo", "qux"], ["one", "two"]],
           codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=["first", "second"])
## Create the DataFrame
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=["A", "B"])
df
Out[176]: 
                     A         B
first second                    
bar   one    -0.501215 -0.947993
      two    -0.828914  0.232167
baz   one     1.245419  1.006092
      two     1.016656 -0.441073
foo   one     0.479037 -0.500034
      two    -1.113097  0.591696
qux   one    -0.014760 -0.320735
      two    -0.648743  1.499899
## Take the first four rows
df2 = df[:4]
df2
Out[179]: 
                     A         B
first second                    
bar   one    -0.501215 -0.947993
      two    -0.828914  0.232167
baz   one     1.245419  1.006092
      two     1.016656 -0.441073

The stack() method stacks the column labels into the index, turning the two-dimensional data into a one-dimensional Series:

stacked = df2.stack()
stacked
Out[181]: 
first  second   
bar    one     A   -0.501215
               B   -0.947993
       two     A   -0.828914
               B    0.232167
baz    one     A    1.245419
               B    1.006092
       two     A    1.016656
               B   -0.441073
dtype: float64

The inverse operation is the unstack() method:

stacked.unstack()
Out[182]: 
                     A         B
first second                    
bar   one    -0.501215 -0.947993
      two    -0.828914  0.232167
baz   one     1.245419  1.006092
      two     1.016656 -0.441073
stacked.unstack(1)
Out[183]: 
second        one       two
first                      
bar   A -0.501215 -0.828914
      B -0.947993  0.232167
baz   A  1.245419  1.016656
      B  1.006092 -0.441073
stacked.unstack(0)
Out[184]: 
first          bar       baz
second                      
one    A -0.501215  1.245419
       B -0.947993  1.006092
two    A -0.828914  1.016656
       B  0.232167 -0.441073

unstack() works on the last level by default; pass an argument to choose a different level.
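The round trip can be checked on small fixed data. In this sketch (made-up values), the level to unstack is given by name rather than by number, which tends to be easier to read:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]],
                                   names=["first", "second"])
df = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index, columns=["A", "B"])

stacked = df.stack()                  # columns A/B become the innermost index level
restored = stacked.unstack()          # the last level moves back to the columns
by_first = stacked.unstack("first")   # move the level named "first" instead

print(stacked)
print(by_first)
```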

Pivot Tables

df = pd.DataFrame({"A": ["one", "one", "two", "three"] * 3,
                   "B": ["A", "B", "C"] * 4,
                   "C": ["foo", "foo", "foo", "bar", "bar", "bar"] * 2,
                   "D": np.random.randn(12),
                   "E": np.random.randn(12)})
df
Out[190]: 
        A  B    C         D         E
0     one  A  foo -0.933264 -2.387490
1     one  B  foo -0.288101  0.023214
2     two  C  foo  0.594490  0.418505
3   three  A  bar  0.450683  1.939623
4     one  B  bar  0.243897 -0.965783
5     one  C  bar -0.705494 -0.078283
6     two  A  foo  1.560352  0.419907
7   three  B  foo  0.199453  0.998711
8     one  C  foo  1.426861 -1.108297
9     one  A  bar -0.570951 -0.022560
10    two  B  bar -0.350937 -1.767804
11  three  C  bar  0.983465  0.065792

The pivot_table() method makes it easy to turn row values into columns:

pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
Out[191]: 
C             bar       foo
A     B                    
one   A -0.570951 -0.933264
      B  0.243897 -0.288101
      C -0.705494  1.426861
three A  0.450683       NaN
      B       NaN  0.199453
      C  0.983465       NaN
two   A       NaN  1.560352
      B -0.350937       NaN
      C       NaN  0.594490

Combinations with no data are filled with NaN during the conversion.
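When several rows fall into the same cell, pivot_table() aggregates them (with the mean by default), and fill_value can replace the NaN for empty cells. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": ["one", "one", "two", "two", "one"],
                   "C": ["foo", "bar", "foo", "foo", "foo"],
                   "D": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Sum the D values per (A, C) cell and show 0 where no rows exist.
table = pd.pivot_table(df, values="D", index="A", columns="C",
                       aggfunc="sum", fill_value=0)
print(table)
```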

Time Series

pandas is very capable when it comes to time series and makes all kinds of conversions easy.

Changing the sampling interval

rng = pd.date_range("1/1/2019", periods=100, freq="S")
rng[:5]
Out[214]: 
DatetimeIndex(["2019-01-01 00:00:00", "2019-01-01 00:00:01",
               "2019-01-01 00:00:02", "2019-01-01 00:00:03",
               "2019-01-01 00:00:04"],
              dtype="datetime64[ns]", freq="S")
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.head(5)
Out[216]: 
2019-01-01 00:00:00    245
2019-01-01 00:00:01    347
2019-01-01 00:00:02    113
2019-01-01 00:00:03    196
2019-01-01 00:00:04    131
Freq: S, dtype: int64
## Resample into 10-second bins
ts1 = ts.resample("10S")
ts1
Out[209]: DatetimeIndexResampler [freq=<10 * Seconds>, axis=0, closed=left, label=left, convention=start, base=0]
## Aggregate each bin by taking the mean
ts1.mean()
Out[218]: 
2019-01-01 00:00:00    174.0
2019-01-01 00:00:10    278.5
2019-01-01 00:00:20    281.8
2019-01-01 00:00:30    337.2
2019-01-01 00:00:40    221.0
2019-01-01 00:00:50    277.1
2019-01-01 00:01:00    171.0
2019-01-01 00:01:10    321.0
2019-01-01 00:01:20    318.6
2019-01-01 00:01:30    302.6
Freq: 10S, dtype: float64
## Aggregate each bin by taking the sum
ts1.sum()
Out[219]: 
2019-01-01 00:00:00    1740
2019-01-01 00:00:10    2785
2019-01-01 00:00:20    2818
2019-01-01 00:00:30    3372
2019-01-01 00:00:40    2210
2019-01-01 00:00:50    2771
2019-01-01 00:01:00    1710
2019-01-01 00:01:10    3210
2019-01-01 00:01:20    3186
2019-01-01 00:01:30    3026
Freq: 10S, dtype: int64

Here resample() does the resampling first, and a method such as sum() or mean() then specifies how each bin is aggregated.
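Several aggregations can be computed in one pass with agg(). The sketch below uses a fixed 30-second series so the bins are easy to verify by hand. It spells the frequency as lowercase "s", on the assumption that this spelling is accepted by both older and newer pandas releases (recent ones deprecate the uppercase "S" used above).

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2019-01-01", periods=30, freq="s")
ts = pd.Series(np.arange(30), index=rng)

# Three 10-second bins holding 0..9, 10..19 and 20..29.
summary = ts.resample("10s").agg(["mean", "sum", "max"])
print(summary)
```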

Attaching a time zone:

rng = pd.date_range("1/1/2019 00:00", periods=5, freq="D")
rng
Out[221]: 
DatetimeIndex(["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
               "2019-01-05"],
              dtype="datetime64[ns]", freq="D")
ts = pd.Series(np.random.randn(len(rng)), rng)
ts
Out[223]: 
2019-01-01   -2.327686
2019-01-02    1.527872
2019-01-03    0.063982
2019-01-04   -0.213572
2019-01-05   -0.014856
Freq: D, dtype: float64
ts_utc = ts.tz_localize("UTC")
ts_utc
Out[225]: 
2019-01-01 00:00:00+00:00   -2.327686
2019-01-02 00:00:00+00:00    1.527872
2019-01-03 00:00:00+00:00    0.063982
2019-01-04 00:00:00+00:00   -0.213572
2019-01-05 00:00:00+00:00   -0.014856
Freq: D, dtype: float64

Converting to another time zone:

ts_utc.tz_convert("US/Eastern")
Out[226]: 
2018-12-31 19:00:00-05:00   -2.327686
2019-01-01 19:00:00-05:00    1.527872
2019-01-02 19:00:00-05:00    0.063982
2019-01-03 19:00:00-05:00   -0.213572
2019-01-04 19:00:00-05:00   -0.014856
Freq: D, dtype: float64

Converting between time representations

rng = pd.date_range("1/1/2019", periods=5, freq="M")
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts
Out[230]: 
2019-01-31    0.197134
2019-02-28    0.569082
2019-03-31   -0.322141
2019-04-30    0.005778
2019-05-31   -0.082306
Freq: M, dtype: float64
ps = ts.to_period()
ps
Out[232]: 
2019-01    0.197134
2019-02    0.569082
2019-03   -0.322141
2019-04    0.005778
2019-05   -0.082306
Freq: M, dtype: float64
ps.to_timestamp()
Out[233]: 
2019-01-01    0.197134
2019-02-01    0.569082
2019-03-01   -0.322141
2019-04-01    0.005778
2019-05-01   -0.082306
Freq: MS, dtype: float64
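The round trip above can be reproduced on explicit dates. In this sketch the month-end timestamps are written out by hand and the target period frequency is passed to to_period() explicitly; both choices are made here for illustration.

```python
import pandas as pd

stamps = pd.to_datetime(["2019-01-31", "2019-02-28", "2019-03-31"])
ts = pd.Series([1.0, 2.0, 3.0], index=stamps)

ps = ts.to_period("M")    # timestamps -> monthly periods
back = ps.to_timestamp()  # periods -> timestamps at the start of each month

print(ps)
print(back)
```

Note that the round trip does not return the original month-end dates: to_timestamp() defaults to the start of each period.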

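Converting between periods and timestamps makes some convenient arithmetic available. For example, we convert between the following two frequencies:

1. Quarterly, where the last month of each year is November.

2. Hourly, where each entry is 9 a.m. on the first day of the month after the corresponding quarter in frequency 1 ends.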

prng = pd.period_range("2018Q1", "2019Q4", freq="Q-NOV")
prng
Out[243]: 
PeriodIndex(["2018Q1", "2018Q2", "2018Q3", "2018Q4", "2019Q1", "2019Q2",
             "2019Q3", "2019Q4"],
            dtype="period[Q-NOV]", freq="Q-NOV")
ts = pd.Series(np.random.randn(len(prng)), prng)
ts
Out[245]: 
2018Q1   -0.112692
2018Q2   -0.507304
2018Q3   -0.324846
2018Q4    0.549671
2019Q1   -0.897732
2019Q2    1.130070
2019Q3   -0.399814
2019Q4    0.830488
Freq: Q-NOV, dtype: float64
ts.index = (prng.asfreq("M", "e") + 1).asfreq("H", "s") + 9
ts
Out[247]: 
2018-03-01 09:00   -0.112692
2018-06-01 09:00   -0.507304
2018-09-01 09:00   -0.324846
2018-12-01 09:00    0.549671
2019-03-01 09:00   -0.897732
2019-06-01 09:00    1.130070
2019-09-01 09:00   -0.399814
2019-12-01 09:00    0.830488
Freq: H, dtype: float64

Note: this example is a little odd. One way to understand it is to first convert prng directly to hourly frequency:

prng.asfreq("H", "end") 
Out[253]: 
PeriodIndex(["2018-02-28 23:00", "2018-05-31 23:00", "2018-08-31 23:00",
             "2018-11-30 23:00", "2019-02-28 23:00", "2019-05-31 23:00",
             "2019-08-31 23:00", "2019-11-30 23:00"],
            dtype="period[H]", freq="H")

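We want 9 a.m. in the following month, so we first convert to monthly frequency and add 1 to each month (giving the next month), then convert to hourly frequency and add 9 (giving 9 a.m.).

In this example the s argument is short for start and the e argument is short for end; Q-NOV means quarterly frequency with November as the last month of the year.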
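
The chained expression can be unpacked one step at a time. This is only a sketch: the intermediate names last_month, next_month and nine_am are introduced here for illustration, and the lowercase "h" alias is an assumption made because recent pandas releases deprecate the uppercase "H" used above.

```python
import pandas as pd

prng = pd.period_range("2018Q1", "2018Q4", freq="Q-NOV")

# Step 1: the last month of each quarter ("e" = end).
last_month = prng.asfreq("M", "e")

# Step 2: adding 1 to a monthly period moves to the following month.
next_month = last_month + 1

# Step 3: the first hour of that month ("s" = start), then 9 hours later.
nine_am = next_month.asfreq("h", "s") + 9

print(last_month[0])  # 2018-02
print(next_month[0])  # 2018-03
print(nine_am[0])     # 2018-03-01 09:00
```

With Q-NOV, 2018Q1 covers December 2017 through February 2018, which is why the first result falls on 1 March 2018.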

For more freq aliases, see: http://pandas.pydata.org/pand...

For an introduction to the asfreq() method, see: http://pandas.pydata.org/pand...

Categoricals

For an introduction to the categorical type, see: http://pandas.pydata.org/pand...

Type conversion: astype("category")

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
"raw_grade": ["a", "b", "b", "a", "a", "e"]})
df
Out[255]: 
   id raw_grade
0   1         a
1   2         b
2   3         b
3   4         a
4   5         a
5   6         e
df["grade"] = df["raw_grade"].astype("category")
df["grade"]
Out[257]: 
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]

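Renaming categories through the cat accessor (rename_categories is used here because assigning to cat.categories directly no longer works in recent pandas releases):

df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])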
df["grade"]
Out[269]: 
0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (3, object): [very good, good, very bad]

Setting a new list of categories:

df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium","good", "very good"])
df["grade"]
Out[271]: 
0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
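
Renaming and resetting categories behave differently, which a short sketch can make concrete (the series s and its labels are made up for illustration): rename_categories relabels the existing categories, whereas set_categories redefines the allowed set, and any value left out of the new set becomes NaN.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")

# rename_categories relabels existing categories; the data follows along.
renamed = s.cat.rename_categories({"a": "good", "b": "bad"})

# set_categories redefines the allowed set; "b" is not in it, so it becomes NaN.
reset = s.cat.set_categories(["a", "c"])

print(renamed.tolist())       # ['good', 'bad', 'good']
print(reset.isna().tolist())  # [False, True, False]
```

In the article's example no value is lost, because all three renamed labels appear in the new five-item list.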

Sorting

df.sort_values(by="grade")
Out[272]: 
   id raw_grade      grade
5   6         e   very bad
1   2         b       good
2   3         b       good
0   1         a  very good
3   4         a  very good
4   5         a  very good
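
Note that the rows come out in category order rather than alphabetical order. A minimal sketch of the contrast, using a made-up three-element series:

```python
import pandas as pd

grades = pd.Series(["very good", "good", "very bad"])

# Plain strings sort alphabetically.
lexical = grades.sort_values().tolist()

# A categorical sorts by the position of each label in its category list.
order = ["very bad", "bad", "medium", "good", "very good"]
cat = grades.astype("category").cat.set_categories(order)
logical = cat.sort_values().tolist()

print(lexical)  # ['good', 'very bad', 'very good']
print(logical)  # ['very bad', 'good', 'very good']
```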

Grouping

df.groupby("grade").size()
Out[273]: 
grade
very bad     1
bad          0
medium       0
good         2
very good    3
dtype: int64
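
The zero counts for bad and medium appear because grouping by a categorical column can include categories with no rows. A sketch with a made-up frame, passing the observed parameter explicitly since its default has been changing across pandas versions:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3]})
df["grade"] = pd.Categorical(["good", "bad", "good"],
                             categories=["bad", "medium", "good"])

# observed=False keeps every category, including empty ones.
all_groups = df.groupby("grade", observed=False).size()

# observed=True keeps only categories that actually occur in the data.
seen_only = df.groupby("grade", observed=True).size()

print(all_groups.to_dict())  # {'bad': 1, 'medium': 0, 'good': 2}
print(seen_only.to_dict())   # {'bad': 1, 'good': 2}
```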

Plotting

Series

import matplotlib.pyplot as plt
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range("1/1/2019", periods=1000))
ts = ts.cumsum()
ts.plot()
plt.show()

Plotting a DataFrame

Calling plot on a DataFrame draws every column at once, each identified by its label in the legend:

df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=["A", "B", "C", "D"])
df = df.cumsum()
plt.figure()
df.plot()
plt.legend(loc="best")

Getting Data In/Out

CSV

Writing:

df.to_csv("foo.csv")

Reading:

pd.read_csv("foo.csv")
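
One detail worth knowing: to_csv writes the index as an unnamed first column, so a plain read_csv returns it as ordinary data. The sketch below uses an in-memory buffer and a made-up two-row frame instead of foo.csv; index_col=0 and parse_dates=True are one way to restore a date index.

```python
import io
import pandas as pd

df = pd.DataFrame({"A": [1.5, 2.5]},
                  index=pd.date_range("2019-01-01", periods=2))

buf = io.StringIO()
df.to_csv(buf)

# Without index_col, the saved index comes back as a regular column.
buf.seek(0)
naive = pd.read_csv(buf)

# index_col=0 restores it as the index; parse_dates=True parses it as dates.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0, parse_dates=True)

print(list(naive.columns))         # ['Unnamed: 0', 'A']
print(type(restored.index).__name__)  # DatetimeIndex
```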

HDF5

Writing:

df.to_hdf("foo.h5", key="df")

Reading:

pd.read_hdf("foo.h5", "df")

Excel

Writing:

df.to_excel("foo.xlsx", sheet_name="Sheet1")

Reading:

pd.read_excel("foo.xlsx", "Sheet1", index_col=None, na_values=["NA"])

Gotchas

If you run into an exception such as:

>>> if pd.Series([False, True, False]):
...     print("I was true")
Traceback
    ...
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

see the following links:

http://pandas.pydata.org/pand...

http://pandas.pydata.org/pand...
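
As a brief sketch of the usual way around this error, state explicitly which reduction is meant; the variable names below are illustrative only.

```python
import pandas as pd

s = pd.Series([False, True, False])

# Using the Series directly in a boolean context raises ValueError.
try:
    if s:
        pass
except ValueError as exc:
    message = str(exc)

any_true = bool(s.any())   # True: at least one element is True
all_true = bool(s.all())   # False: not every element is True
is_empty = s.empty         # False: the Series has elements

print(any_true, all_true, is_empty)  # True False False
```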

