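Summary: import the data, preprocess it, compute the mean distortion for each candidate K (using scipy to compute the distances), use the elbow method to pick the best K, and build the K-means model.
Import data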
# `customer` is assumed to be a pandas DataFrame that has already been loaded.
# General attributes
cus_general = customer[["wm_poi_id", "city_type", "pre_book", "aor_type",
                        "is_selfpick_poi", "is_selfpick_trade_poi"]]
# Order metrics
cus_ord = customer[["wm_poi_id", "month_original_price", "month_order_cnt",
                    "service_fee_30day", "abnor_rate_30day"]]
# Comment metrics (note: `cus` is reassigned by the next statement)
cus = customer[["wm_poi_id", "comment_1star", "comment_5star", "pic_comment_cnt"]]
# Waybill (delivery) metrics
cus = customer[["wm_poi_id", "waybill_received_ratio", "waybill_delivered_ratio",
                "waybill_ontime_ratio",
                "waybill_normal_arrived_delivery_total_interval_avg",
                "waybill_normal_poi_push_interval_avg",
                "waybill_normal_receive_interval_avg",
                "waybill_normal_fetch_interval_avg",
                "waybill_normal_delivery_interval_avg",
                "waybill_delivery_ontime_ratio", "loss_amt"]]
# Combined feature set (wm_poi_id plus 20 features)
cus_all = customer[["wm_poi_id", "c5", "ol_time", "primary_first_tag_id", "city_level",
                    "month_original_price", "month_order_cnt", "service_fee_30day", "abnor_cnt_30day",
                    "comment_1star", "comment_5star", "pic_comment_cnt",
                    "area_30day", "waybill_grab_5mins_ratio", "waybill_delivered_ratio",
                    "waybill_normal_arrived_delivery_total_interval_avg",
                    "waybill_normal_receive_interval_avg",
                    "call.call_cnt", "call.call_cnt_ord", "call.call_cnt_poi", "call.call_cnt_oth"]]
Preprocessing
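As a small illustration of what the scaling step does, the sketch below standardizes a toy table with `preprocessing.scale`. The values are made up for illustration; only the two column names are borrowed from the data above. Each column ends up with mean 0 and unit variance, so a feature measured in large units (such as a price) does not dominate the Euclidean distances that K-means relies on.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data: two features on very different scales.
toy = pd.DataFrame({
    "month_original_price": [12000.0, 35000.0, 8000.0, 50000.0],
    "comment_5star": [10.0, 40.0, 5.0, 25.0],
})

# scale() subtracts each column's mean and divides by its
# population standard deviation (ddof=0).
scaled = pd.DataFrame(preprocessing.scale(toy), columns=toy.columns)

col_means = scaled.mean().to_numpy()
col_stds = scaled.std(ddof=0).to_numpy()
print(col_means)
print(col_stds)
```

Note that `scale` returns a plain NumPy array, which is why the column names have to be set again after wrapping the result in a DataFrame.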
import pandas as pd
from sklearn import preprocessing

# Standardize one feature group at a time. Each assignment to `cus` replaces
# the previous one, so enable only the group you want to cluster.
# The modeling step further down uses cus_all.
# cus = pd.DataFrame(preprocessing.scale(cus_general.iloc[:, 1:6]))
# cus = pd.DataFrame(preprocessing.scale(cus_ord.iloc[:, 1:5]))
cus = pd.DataFrame(preprocessing.scale(cus_all.iloc[:, 1:21]))

# The column names must match the group chosen above.
# cus.columns = ["city_type", "pre_book", "aor_type", "is_selfpick_poi", "is_selfpick_trade_poi"]
# cus.columns = ["month_original_price", "month_order_cnt", "service_fee_30day", "abnor_rate_30day"]
# cus.columns = ["comment_1star", "comment_5star", "pic_comment_cnt"]
# cus.columns = ["waybill_push_ratio", "waybill_delivered_ratio", "waybill_ontime_ratio", "waybill_normal_arrived_delivery_total_interval_avg", "waybill_normal_poi_push_interval_avg", "waybill_normal_receive_interval_avg", "waybill_normal_fetch_interval_avg", "waybill_normal_delivery_interval_avg", "waybill_delivery_ontime_ratio", "loss_amt"]
cus.columns = ["c5", "ol_time", "primary_first_tag_id", "city_level",
               "month_original_price", "month_order_cnt", "service_fee_30day", "abnor_cnt_30day",
               "comment_1star", "comment_5star", "pic_comment_cnt",
               "area_30day", "waybill_grab_5mins_ratio", "waybill_delivered_ratio",
               "waybill_normal_arrived_delivery_total_interval_avg",
               "waybill_normal_receive_interval_avg",
               "call.call_cnt", "call.call_cnt_ord", "call.call_cnt_poi", "call.call_cnt_oth"]
Compute the mean distortion for K from 1 to 14, using scipy to compute the distances
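The mean distortion used here is the average Euclidean distance from each sample to its nearest cluster centre. A minimal, self-contained sketch on synthetic data may make the idea easier to follow; the helper `mean_distortion`, the blob centres, and the range of K are assumptions chosen for illustration and are not part of the article's data.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def mean_distortion(X, centers):
    """Average Euclidean distance from each sample to its nearest centre."""
    distances = cdist(X, centers, "euclidean")  # shape: (n_samples, n_centers)
    return distances.min(axis=1).mean()


# Hypothetical data: 300 points around 3 well-separated centres.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 0], [10, 10]],
    cluster_std=1.0,
    random_state=0,
)

distortions = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    distortions[k] = mean_distortion(X, model.cluster_centers_)

for k, d in distortions.items():
    print(k, round(d, 3))
```

On data like this the distortion drops sharply up to the true number of groups and only slightly afterwards; that bend is the "elbow". On real data the bend is often less pronounced, so the choice of K remains a judgment call.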
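import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

K = range(1, 15)
meandistortions = []
for k in K:
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(cus)
    # Distance from each sample to its nearest centre, averaged over all samples
    nearest = np.min(cdist(cus, kmeans.cluster_centers_, "euclidean"), axis=1)
    meandistortions.append(nearest.sum() / cus.shape[0])
plt.plot(K, meandistortions, "bx-")
plt.xlabel("k")
plt.ylabel("Mean distortion")
plt.title("Choosing the best K with the elbow method")
K-means modeling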
from sklearn.cluster import KMeans

clf = KMeans(n_clusters=12)
clf.fit(cus)
# Number of samples in each cluster
pd.Series(clf.labels_).value_counts()
# Cluster centres in standardized feature space, one bar chart per feature
centres = pd.DataFrame(clf.cluster_centers_)
centres.columns = cus_all.iloc[:, 1:21].columns
centres.plot(kind="bar", subplots=True, figsize=(6, 15))
clf.inertia_  # within-cluster sum of squared distances

# Attach the labels as a new column. labels_ follows the row order of `cus`,
# so the assignment is positional and does not refit the model.
# For the other feature groups, refit on that group's scaled data first:
# cus_general = cus_general.assign(general=clf.labels_)
# cus_ord = cus_ord.assign(order=clf.labels_)
cus_all = cus_all.assign(cluster=clf.labels_)
# Per-cluster means in the original units
centres = cus_all.groupby(["cluster"]).mean()
cus_all.to_csv("cluster.csv")
# Inspect a single cluster, here cluster 2
result = cus_all[cus_all["cluster"] == 2]
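To illustrate how cluster labels line up with the rows they came from, here is a minimal sketch on a made-up table. The data, the index values, and the column names `x` and `y` are hypothetical. `labels_` is returned in the same row order as the data passed to `fit`, so assigning it as a column keeps each label on its own row even when the DataFrame index is not 0..n-1.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.cluster import KMeans

# Hypothetical table with a non-default index and two obvious groups.
df = pd.DataFrame(
    {"x": [1.0, 1.2, 0.8, 9.0, 9.3, 8.7], "y": [2.0, 1.9, 2.2, 8.0, 8.1, 7.8]},
    index=[101, 205, 309, 412, 518, 623],
)

scaled = preprocessing.scale(df)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)

# Positional assignment: one label per row, original index preserved.
labeled = df.assign(cluster=model.labels_)

# Cluster sizes and per-cluster means in the original units.
sizes = labeled["cluster"].value_counts()
profile = labeled.groupby("cluster").mean()
print(sizes)
print(profile)
```

Concatenating a freshly built label DataFrame along the column axis would instead align on index values, which can introduce missing values when the original index is not a simple 0..n-1 range.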
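The copyright of this article belongs to the author. Do not repost it without permission. If this article violates any rules, you can contact the administrator to have it removed.
When reposting, please cite this article's address: http://m.specialneedsforspecialkids.com/yun/44576.html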