The pandas functions that Data Scientists often use with the 80/20 principle [Part 2]

Tram Ho

2 years ago

You can read the previous part of the article here.

11. Delete column:

If you want to drop one or more columns from the DataFrame, use the drop() method as shown below:

import pandas as pd

df = pd.DataFrame([[1,"A"],
                   [2,"B"]],
                   columns = ["col1", "col2"])

# drop() returns a new DataFrame; df itself is unchanged
df.drop(columns = ["col2"])

"""
   col1
0     1
1     2
"""

Read more here.
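
One detail that is easy to miss is that drop() hands back a new DataFrame rather than modifying the original. The sketch below illustrates this while dropping two columns at once; the extra column col3 and the variable name trimmed are made up for the illustration and are not part of the original example.

```python
import pandas as pd

# col3 is a hypothetical extra column, added only for this illustration
df = pd.DataFrame([[1, "A", 0.5],
                   [2, "B", 1.5]],
                  columns=["col1", "col2", "col3"])

# Pass a list to drop several columns; the result is a new DataFrame
trimmed = df.drop(columns=["col2", "col3"])

print(list(trimmed.columns))  # ['col1']
print(list(df.columns))       # ['col1', 'col2', 'col3'] (original untouched)
```

To keep the change, assign the result to a variable (or back to df), as done with trimmed here.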

12. GroupBy:

If you want to perform an aggregate operation after grouping, use the groupby() method as shown below:

import pandas as pd

df = pd.DataFrame([[1,"A"],
                   [2,"B"],
                   [3,"A"],
                   [4,"C"]],
                   columns = ["col1", "col2"])

df.groupby("col2").col1.sum()
"""
col2
A    4
B    2
C    4
"""

Read more here.
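
If more than one statistic per group is needed, one possible approach is agg() with a list of aggregation names. This is a minimal sketch on the same data as above; the variable name summary is my own choice.

```python
import pandas as pd

df = pd.DataFrame([[1, "A"],
                   [2, "B"],
                   [3, "A"],
                   [4, "C"]],
                  columns=["col1", "col2"])

# One row per group in col2, one column per aggregation of col1
summary = df.groupby("col2")["col1"].agg(["sum", "mean", "count"])

print(summary)
print(summary.loc["A", "sum"])    # 4  (1 + 3)
print(summary.loc["A", "count"])  # 2  (two rows belong to group A)
```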

13. Unique values in a column:

If you want to print or count the unique values in a DataFrame column, use the unique() or nunique() method, respectively, as shown below:

import pandas as pd

df = pd.DataFrame([[1,"A"],
                   [2,"B"],
                   [3,"A"],
                   [4,"C"]],
                   columns = ["col1", "col2"])

# Print Unique values
df.col2.unique()
"""
['A','B','C']
"""

# Number of unique values
df.col2.nunique()

"""
3
"""

Read more here.
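
The two methods treat missing values differently, which can make their results look inconsistent. The sketch below uses a small made-up Series s (not from the original example) to show that unique() reports NaN as a value while nunique() skips it unless dropna=False is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing one missing value
s = pd.Series(["A", "B", np.nan, "A"])

print(len(s.unique()))          # 3: 'A', 'B' and NaN
print(s.nunique())              # 2: NaN is excluded by default
print(s.nunique(dropna=False))  # 3: NaN is counted as a value
```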

14. Fill in NaN (empty) values:

If you want to replace the NaN values in a column with some other value, use the fillna() method as shown below:

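import pandas as pd
import numpy as np

df = pd.DataFrame([[1, "A"],
                   [2, np.nan],
                   [3, np.nan]],
                   columns = ["col1", "col2"])

# Assign the result back: fillna(inplace = True) on df.col2 is chained
# assignment and may not update df in recent pandas versions
df["col2"] = df["col2"].fillna("B")

"""
   col1 col2
0     1    A
1     2    B
2     3    B
"""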

Read more here.
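
When several columns need different replacement values, fillna() also accepts a dictionary that maps column names to fill values. The following is only a sketch: the missing number in col1 and the choice of the column mean as its fill value are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

# col1 gets a missing value here purely for demonstration
df = pd.DataFrame([[1.0, "A"],
                   [np.nan, np.nan],
                   [3.0, np.nan]],
                  columns=["col1", "col2"])

# Fill col1 with its mean (2.0) and col2 with the constant "B"
filled = df.fillna({"col1": df["col1"].mean(), "col2": "B"})

print(filled["col1"].tolist())  # [1.0, 2.0, 3.0]
print(filled["col2"].tolist())  # ['A', 'B', 'B']
```

The result is a new DataFrame, so df still contains its NaN values afterwards.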

15. Apply a function to a column:

If you want to apply a function to a column, use the apply() method as shown below:

import pandas as pd

def f(number):
    return number + 2

df = pd.DataFrame([[1, "A"],
                   [2, "B"]],
                   columns = ["col1", "col2"])

# f is called once for each value in col1
df["col3"] = df.col1.apply(f)

"""
   col1 col2 col3
0     1    A    3
1     2    B    4
"""

Read more here.
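
For simple arithmetic like adding 2, apply() is not strictly required, because the same result can be written as a vectorized expression on the whole column. The sketch below puts both forms side by side; the column name col4 is made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame([[1, "A"],
                   [2, "B"]],
                  columns=["col1", "col2"])

# Vectorized form: the addition is applied to the whole column at once
df["col3"] = df["col1"] + 2

# Equivalent apply() form: the lambda is called once per value
df["col4"] = df["col1"].apply(lambda number: number + 2)

print(df["col3"].tolist())  # [3, 4]
print(df["col4"].tolist())  # [3, 4]
```

apply() remains the more general tool when the logic cannot be expressed as column arithmetic.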

16. Remove duplicates:

If you want to remove duplicate rows, use the drop_duplicates() method as shown below:

import pandas as pd

df = pd.DataFrame([[1,"A"],
                   [2,"B"],
                   [1,"A"]],
                   columns = ["col1", "col2"])

# Keeps the first occurrence of each duplicated row
df.drop_duplicates()
"""
   col1 col2
0     1    A
1     2    B
"""

Read more here.
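
By default, a row counts as a duplicate only if every column matches. The subset and keep parameters change that, as in this sketch; the data is altered slightly from the example above so that the rows differ in col2, and the variable names are my own.

```python
import pandas as pd

# Rows 0 and 2 share col1 == 1 but differ in col2
df = pd.DataFrame([[1, "A"],
                   [2, "B"],
                   [1, "C"]],
                  columns=["col1", "col2"])

# Compare on col1 only; the first occurrence is kept by default
first_kept = df.drop_duplicates(subset=["col1"])

# Same comparison, but keep the last occurrence instead
last_kept = df.drop_duplicates(subset=["col1"], keep="last")

print(first_kept["col2"].tolist())  # ['A', 'B']
print(last_kept["col2"].tolist())   # ['B', 'C']
```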

17. Counting values:

If you want to find the frequency of each value in a column, use the value_counts() method as shown below:

import pandas as pd

df = pd.DataFrame([[1,"A"],
                   [2,"B"],
                   [2,"A"],
                   [3,"C"]],
                   columns = ["col1", "col2"])

df.col2.value_counts()

"""
A 2
B 1
C 1
"""

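If relative frequencies are more useful than raw counts, value_counts() can return proportions through its normalize parameter. A minimal sketch on the same data, with shares as an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame([[1, "A"],
                   [2, "B"],
                   [2, "A"],
                   [3, "C"]],
                  columns=["col1", "col2"])

# normalize=True divides each count by the total number of values
shares = df["col2"].value_counts(normalize=True)

print(shares["A"])  # 0.5   (2 of 4 rows)
print(shares["B"])  # 0.25  (1 of 4 rows)
```
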
18. Size of DataFrame:

If you want to find the dimensions of the DataFrame (number of rows and columns), use the shape attribute as shown below:

import pandas as pd

df = pd.DataFrame([[1,"A"],
                   [2,"B"],
                   [2,"A"],
                   [3,"C"]],
                   columns = ["col1", "col2"])

df.shape

"""
(4, 2)
"""

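Since shape is a plain tuple of (rows, columns), it can be unpacked directly. The sketch below shows that, along with len() as a shortcut for the row count; the names n_rows and n_cols are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, "A"],
                   [2, "B"],
                   [2, "A"],
                   [3, "C"]],
                  columns=["col1", "col2"])

# Unpack the (rows, columns) tuple into two variables
n_rows, n_cols = df.shape

print(n_rows, n_cols)  # 4 2
print(len(df))         # 4 (len() gives the number of rows)
```
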
To wrap up, in this post, I’ve covered some of the most commonly used functions/methods in Pandas to get you started with this library.

Furthermore, the best place to build basic, practical knowledge of the different pandas methods is the official pandas documentation, linked in the references below. It explains each argument a function accepts in detail and pairs the explanation with a practical example, which is, in my opinion, a great way to gain pandas expertise.

Thanks for reading. I hope this article was helpful.

References

https://towardsdatascience.com/20-of-pandas-functions-that-data-scientists-use-80-of-the-time-a4ff1b694707

https://pandas.pydata.org/docs/index.html
