python pandas

Tram Ho

Panda is a library for manipulating with data in Python. Quick facts:

| Fact | Description |
| ——– | ——– |
| Homepage | https://pandas.pydata.org |
|API doc | https://pandas.pydata.org/docs/reference/index.html |
|Initial year | Aug 05, 2009 (13 years ago). https://github.com/pandas-dev/pandas/commit/ec1a0a2a2 |
| Source code | https://github.com/pandas-dev/pandas |
| Stack Overflow tag | https://stackoverflow.com/questions/tagged/pandas |
| Latest stable version | 1.4.2 (02 April, 2022)|

Development environment

![image.png](https://images.viblo.asia/d55ee195-ae53-4879-8d59-4445ab5ea71a.png)

Install pandas

![install_panda](https://user-images.githubusercontent.com/1328316/164500646-3330a0bb-7eea-433a-aa60-c03d90b75493.png)

Version of Python


(pythonProject1) C:Usersdonhu>python --version
Python 3.10.0

Install

![image.png](https://images.viblo.asia/f65d8acc-6bd5-4231-9e1c-6e91fc01ec6a.png)

![image.png](https://images.viblo.asia/71d14c2d-3c9e-453c-a60d-51c4bb6cfa2e.png)

Properties and Method with panda object

python
import pandas as pd

df = pd.DataFrame(
{
"Name": [
"Braund, Mr. Owen Harris",
"Allen, Mr. William Henry",
"Bonnell, Miss. Elizabeth",
],
"Age": [22, 35, 58],
"Sex": ["male", "male", "female"],
}
)

print("n01-----------------")
print(df)

print()
print("n02-----------------")
print(df["Age"])

print("n03-----------------")
ages = pd.Series([22, 35, 58], name="Age")
print(ages)

print("n04-----------------")
print(df["Age"].max())

print("n05-----------------")
print(ages.max())

print("n06-----------------")
print(df.describe())

# https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv
print("n07-----------------")
titanic = pd.read_csv("vy/titanic.csv")
print(titanic)

print("n08-----------------")
print(titanic.head(2))

print("n09-----------------")
print(titanic.dtypes)

print("n10-----------------")
# pip install openpyxl
# conda install openpyxl
print(titanic.to_excel("minh_thu.xlsx", sheet_name="lovers", index=False))

print("n11-----------------")
my_titanic = pd.read_excel("minh_thu.xlsx", sheet_name="lovers")
print(my_titanic.head(3))

print("n12-----------------")
print(my_titanic.info())

# https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv
url = (
"https://raw.github.com/pandas-dev"
"/pandas/main/pandas/tests/io/data/csv/tips.csv"
)
tips = pd.read_csv(url)
print("n12b-----------------")
print(tips)

print("n14-----------------")
sorted_df = tips.sort_values(by='total_bill')
print(sorted_df)

print("n15-----------------")
sorted_df = tips.sort_values(by='total_bill', ascending=False)
print(sorted_df)

result


C:ProgramDataAnaconda3envspythonProject1python.exe C:/Users/donhu/PycharmProjects/pythonProject1/vy_panda_01.py

01-----------------
Name Age Sex
0 Braund, Mr. Owen Harris 22 male
1 Allen, Mr. William Henry 35 male
2 Bonnell, Miss. Elizabeth 58 female

02-----------------
0 22
1 35
2 58
Name: Age, dtype: int64

03-----------------
0 22
1 35
2 58
Name: Age, dtype: int64

04-----------------
58

05-----------------
58

06-----------------
Age
count 3.000000
mean 38.333333
std 18.230012
min 22.000000
25% 28.500000
50% 35.000000
75% 46.500000
max 58.000000

07-----------------
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
.. ... ... ... ... ... ... ...
886 887 0 2 ... 13.0000 NaN S
887 888 1 1 ... 30.0000 B42 S
888 889 0 3 ... 23.4500 NaN S
889 890 1 1 ... 30.0000 C148 C
890 891 0 3 ... 7.7500 NaN Q

[891 rows x 12 columns]

08-----------------
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C

[2 rows x 12 columns]

09-----------------
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object

10-----------------
None

11-----------------
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S

[3 rows x 12 columns]

12-----------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None

12b-----------------
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
.. ... ... ... ... ... ... ...
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2

[244 rows x 7 columns]

14-----------------
total_bill tip sex smoker day time size
67 3.07 1.00 Female Yes Sat Dinner 1
92 5.75 1.00 Female Yes Fri Dinner 2
111 7.25 1.00 Female No Sat Dinner 1
172 7.25 5.15 Male Yes Sun Dinner 2
149 7.51 2.00 Male No Thur Lunch 2
.. ... ... ... ... ... ... ...
182 45.35 3.50 Male Yes Sun Dinner 3
156 48.17 5.00 Male No Sun Dinner 6
59 48.27 6.73 Male No Sat Dinner 4
212 48.33 9.00 Male No Sat Dinner 4
170 50.81 10.00 Male Yes Sat Dinner 3

[244 rows x 7 columns]

15-----------------
total_bill tip sex smoker day time size
170 50.81 10.00 Male Yes Sat Dinner 3
212 48.33 9.00 Male No Sat Dinner 4
59 48.27 6.73 Male No Sat Dinner 4
156 48.17 5.00 Male No Sun Dinner 6
182 45.35 3.50 Male Yes Sun Dinner 3
.. ... ... ... ... ... ... ...
149 7.51 2.00 Male No Thur Lunch 2
111 7.25 1.00 Female No Sat Dinner 1
172 7.25 5.15 Male Yes Sun Dinner 2
92 5.75 1.00 Female Yes Fri Dinner 2
67 3.07 1.00 Female Yes Sat Dinner 1

[244 rows x 7 columns]

Process finished with exit code 0

![image.png](https://images.viblo.asia/47607cba-5549-43de-a340-f8243546c4cc.png)

**Pandas Excel API**
Need install pandas and openpyxl inside Miniconda before practice. This is **read excel** function.

python
import pandas as pd

found_url = ("https://m.hvtc.edu.vn/Portals/0/01_2018/01.DS%20TN_9.2021%20.xlsx")
hehe = pd.read_excel(found_url)
hehe

Result

![image](https://user-images.githubusercontent.com/1328316/164501346-f0299556-9298-4181-8a5f-e53f26b73560.png)

Without header

python
hihi = pd.read_excel(found_url, index_col=None, header=None)
hihi

rb means r + b = read + binary. See https://docs.python.org/3/library/functions.html#open

python
hoho = pd.read_excel(open('C:\Users\donhu\Desktop\01.DS TN_9.2021 .xlsx', 'rb'), sheet_name='LC22')
hoho

Share the news now

Source : Viblo