Pandas: How to Group By Index and Perform a Calculation

By Dr. Benjamin Anderson, July 22, 2023

You can use the following methods to group by one or more index columns in pandas and perform a calculation:

Method 1: Group By One Index Column

df.groupby('index1')['numeric_column'].max()

Method 2: Group By Multiple Index Columns

df.groupby(['index1', 'index2'])['numeric_column'].sum()

Method 3: Group By Index Column and Regular Column

df.groupby(['index1', 'numeric_column1'])['numeric_column2'].nunique()

The following examples show how to use each method with the following pandas DataFrame, which has a MultiIndex:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})

#set 'team' and 'position' columns to be index columns
df.set_index(['team', 'position'], inplace=True)

#view DataFrame
df

               points  rebounds
team position                  
A    G              7         8
     G              7         8
     G              7         8
     F             19        10
     F             16        11
B    G              9        12
     G             10        13
     F             10        13
     F              8        15
     F              8        11
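Grouping by an index column by name relies on the index levels being named. As a minimal sketch using the same data (the variable `level_names` is introduced here for illustration only), you can confirm the level names before grouping:

```python
import pandas as pd

# Same data as the example DataFrame, with 'team' and 'position' as the index
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Names of the MultiIndex levels; these are the names groupby() can refer to
level_names = list(df.index.names)
print(level_names)

# Number of levels in the index
print(df.index.nlevels)
```

Note that this sketch uses the return value of `set_index()` instead of `inplace=True`; both approaches produce the same DataFrame here.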

Method 1: Group By One Index Column

The following code shows how to find the max value of the 'points' column, grouped by the 'position' index column:

#find max value of 'points' grouped by 'position' index column
df.groupby('position')['points'].max()

position
F    19
G    10
Name: points, dtype: int64
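The same grouping can also be written with the `level` argument of `groupby()`, which states explicitly that the key is an index level and not a column. The following is a minimal sketch on the same data; the variable names `by_name` and `by_level` are illustrative only:

```python
import pandas as pd

# Same data as the example DataFrame, with 'team' and 'position' as the index
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Group by the index level, referring to it by name
by_name = df.groupby('position')['points'].max()

# Group by the same index level using the explicit 'level' argument
by_level = df.groupby(level='position')['points'].max()

print(by_level)
```

Both calls should return the same max values per position for this DataFrame.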

Method 2: Group By Multiple Index Columns

The following code shows how to find the sum of the 'points' column, grouped by the 'team' and 'position' index columns:

#find sum of 'points' grouped by 'team' and 'position' index columns
df.groupby(['team', 'position'])['points'].sum()

team  position
A     F           35
      G           21
B     F           26
      G           19
Name: points, dtype: int64
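The result of grouping by two index columns is a Series that itself has a MultiIndex. As a short sketch (the names `totals`, `a_forwards`, and `flat` are illustrative only), you can look up a single group with a tuple, or call `reset_index()` to turn the result into a flat DataFrame:

```python
import pandas as pd

# Same data as the example DataFrame, with 'team' and 'position' as the index
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Sum of points for each (team, position) pair
totals = df.groupby(['team', 'position'])['points'].sum()

# Look up one group by its (team, position) key
a_forwards = totals.loc[('A', 'F')]
print(a_forwards)

# Move the index levels back into regular columns
flat = totals.reset_index()
print(flat)
```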

Method 3: Group By Index Column and Regular Column

The following code shows how to find the number of unique values in the 'rebounds' column, grouped by the 'team' index column and the regular 'points' column:

#count unique values of 'rebounds' grouped by 'team' index column and 'points' column
df.groupby(['team', 'points'])['rebounds'].nunique()

team  points
A     7         1
      16        1
      19        1
B     8         2
      9         1
      10        1
Name: rebounds, dtype: int64
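When mixing an index level with a regular column, `pd.Grouper(level=...)` is one way to mark which key is the index level. The sketch below assumes the same data and compares it with the plain name-based call; the variable names `by_name` and `explicit` are illustrative only:

```python
import pandas as pd

# Same data as the example DataFrame, with 'team' and 'position' as the index
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Name-based grouping: pandas resolves 'team' as an index level
# and 'points' as a column
by_name = df.groupby(['team', 'points'])['rebounds'].nunique()

# Explicit grouping: pd.Grouper marks 'team' as an index level
explicit = df.groupby([pd.Grouper(level='team'), 'points'])['rebounds'].nunique()

print(explicit)
```

The explicit form is mainly a readability choice in this example, since both calls should give the same counts.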

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Count Unique Values in Pandas
How to Flatten a MultiIndex in Pandas
How to Change One or More Index Values in Pandas
How to Reset an Index in Pandas
