TIL - 05.15

TIL 2024. 5. 15. 15:10

Pandas: homework edition

iris.sort_values('Petal Length',ascending=False)

iris.groupby('Species')[['Species']].count()
iris.groupby('Species', as_index=False)['Species'].count()

as_index=False returns the result as a DataFrame

=> What exactly does it do?
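A small sketch of the answer, as far as I understand it: with the default as_index=True the group keys become the index of the result, while as_index=False keeps them as an ordinary column. The toy DataFrame and its values below are made up for illustration and are not the real iris data.

```python
import pandas as pd

# Tiny stand-in for the iris data (values are made up for illustration)
toy = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica"],
    "Petal Length": [1.4, 1.3, 5.5],
})

# Default (as_index=True): the group keys become the index of the result
by_index = toy.groupby("Species")[["Petal Length"]].count()

# as_index=False: the group keys stay as an ordinary column
by_column = toy.groupby("Species", as_index=False)[["Petal Length"]].count()

print(by_index)
print(by_column)
```

In this example by_column holds the same data as by_index.reset_index(), which is handy when the result will be merged or plotted afterwards.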

concat: combining DataFrames

dfSLmn = df[['Sepal Length']].mean()
dfPLmn = df[['Petal Length']].mean()

df3 = pd.concat([dfSLmn, dfPLmn])  # with the default axis=0 the result is a Series with dtype float64
wru = pd.DataFrame(df3, columns=['Mean']).T # transpose()
wru
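To see what the axis argument changes, here is a minimal sketch with two one-element Series standing in for the two means. The numbers are placeholders, not values computed from iris, and to_frame is just one alternative way to name the column.

```python
import pandas as pd

# Placeholder means (not computed from the real data)
s1 = pd.Series({"Sepal Length": 5.8})
s2 = pd.Series({"Petal Length": 3.7})

# axis=0 (default): the Series are stacked into one longer Series
stacked = pd.concat([s1, s2])

# axis=1: each Series becomes its own column of a DataFrame
side = pd.concat([s1, s2], axis=1)

# Name the single column 'Mean', then transpose to get one row
row = stacked.to_frame("Mean").T

print(stacked)
print(side)
print(row)
```

With axis=1 the labels do not line up, so the missing cells are filled with NaN.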

When specifying an index, column, or other selector

Single value: ' ' // multiple values: [ , ]
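The difference shows up in the type that comes back. A quick sketch on a made-up two-column DataFrame:

```python
import pandas as pd

# Made-up values, only the column names mirror the iris data
toy = pd.DataFrame({"Sepal Length": [5.1, 4.9], "Petal Length": [1.4, 1.4]})

one = toy["Sepal Length"]                      # single label -> Series
one_df = toy[["Sepal Length"]]                 # list with one label -> DataFrame
many = toy[["Sepal Length", "Petal Length"]]   # list of labels -> DataFrame

print(type(one).__name__, type(one_df).__name__, type(many).__name__)
```

This is why df[['Sepal Length']].mean() returns a Series indexed by the column name, while df['Sepal Length'].mean() returns a single number.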

Conditional assignment with np.where

dfSS['Sepal Size'] = np.where(df['Sepal Length'] >=5.0 ,'Large','Small')

.apply

dfSS['Sepal Size'] = df['Sepal Length'].apply(lambda x: 'Large' if x >= 5.0 else 'Small')
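Both versions should produce the same labels. A minimal check on made-up values (the column names via_where and via_apply are only for this sketch):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Sepal Length": [4.9, 5.0, 6.3]})

# Vectorized: the condition is evaluated on the whole column at once
toy["via_where"] = np.where(toy["Sepal Length"] >= 5.0, "Large", "Small")

# Element-wise: the lambda is called once per value
toy["via_apply"] = toy["Sepal Length"].apply(lambda x: "Large" if x >= 5.0 else "Small")

same = bool((toy["via_where"] == toy["via_apply"]).all())
print(toy)
print(same)
```

np.where works on the whole column in one step, so it is usually the faster choice on large data; .apply is more flexible when the rule does not fit a single condition.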

dffilt = df[(df['Sepal Length'] >=5.0) & (df['Sepal Width'] <=3.5)]
df01 = dffilt.copy()
df01['Petal Sum'] = df01['Petal Length'] + df01['Petal Width']
df01

Using .query for conditions

df_SS = df.copy()
# Inside query(), columns are referenced by name; names containing spaces need backticks
SS = "`Sepal Length` >= 5.0 and `Sepal Width` <= 3.5"
df_SS = df_SS.query(SS)

SS = "`Sepal Length` >= 5.0"
# size = 5.0
# SS = f"`Sepal Length` >= {size}"

# def my_max(x, y):
#     return max(x, y)
# SS = "`Sepal Length` >= @my_max(1, 22)"

# The index can also be referenced and filtered on
# SS = "index >= 2"
# df_q = df.query(SS)
# display_side_by_side(df, df_q): shows the two DataFrames next to each other without combining them

# String search
# print(pd.__version__)  # prints the pandas version
# SS = "name.str.contains('Sepal', case=True)"  # case=True makes the match case-sensitive
# contains / startswith / endswith

The biggest advantages of .query are readability and convenience.

The downside is that it tends to be slower than the same filter written with .loc[ ].
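For comparison, here is the same filter written both ways on a made-up DataFrame. The variable min_length is introduced only for this sketch; @ pulls a local variable into the query string, and backticks quote column names that contain spaces.

```python
import pandas as pd

# Made-up values, only the column names mirror the iris data
toy = pd.DataFrame({
    "Sepal Length": [4.9, 5.0, 6.3, 5.8],
    "Sepal Width": [3.0, 3.6, 3.3, 2.7],
})
min_length = 5.0  # hypothetical threshold for this example

# .query: the condition is a single readable string
via_query = toy.query("`Sepal Length` >= @min_length and `Sepal Width` <= 3.5")

# .loc: the same condition as a boolean mask
mask = (toy["Sepal Length"] >= min_length) & (toy["Sepal Width"] <= 3.5)
via_loc = toy.loc[mask]

print(via_query.equals(via_loc))
```

Both select the same rows; which one is faster depends on the size of the data, so it is worth timing on the real DataFrame before choosing on speed alone.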

In a matplotlib histogram, bins is the number of bars: how many intervals should the data be divided into?
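A minimal sketch of what bins controls, using made-up numbers instead of the iris column. The Agg backend is chosen only so the example runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no window is needed
import matplotlib.pyplot as plt

# Made-up measurements for illustration
values = [4.3, 4.9, 5.0, 5.1, 5.8, 6.3, 6.4, 7.9]

fig, ax = plt.subplots()
# bins=4 splits the range from min to max into 4 equal-width intervals
counts, edges, _ = ax.hist(values, bins=4)

print(len(counts))  # number of bars
print(len(edges))   # number of interval boundaries (bars + 1)
```

Every value falls into exactly one interval, so the bar heights add up to the number of values.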

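import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")

# One list of sepal lengths per species, in order of first appearance
species = iris['species'].unique()
sepal_lengths_list = [iris[iris['species'] == s]['sepal_length'].tolist() for s in species]

# Note: matplotlib 3.9+ prefers tick_labels= over labels=
plt.boxplot(sepal_lengths_list, labels=species)
plt.show()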
