The Pandas functions data scientists use most often, according to the 80/20 principle [Part 1]

Tram Ho

Mastering an entire Python library like Pandas can be a challenge for anyone. But if we take a step back, do we really need to pay attention to every little detail of a library, especially when so much of the world is governed by the 80/20 principle (the Pareto principle)?

This post is my attempt to apply the Pareto principle to the Pandas library and introduce the 20% of Pandas functions that you are likely to use 80% of the time when working with DataFrames. I have used the methods below many times in my daily work, and I consider them necessary and sufficient for anyone getting started with Pandas.

1. Read a CSV file:

To read a CSV file in Pandas, use the pd.read_csv() function.
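As a minimal sketch of reading a CSV, the example below parses an in-memory string instead of a file on disk so that it runs on its own. The column names and values are made up for illustration; in practice you would likely pass a file path (for example a hypothetical "data.csv") in place of the StringIO object.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file.
csv_text = "name,age\nAlice,30\nBob,25\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```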

For details, see the pandas documentation for pd.read_csv().

2. Save a DataFrame as a CSV file:

To save a DataFrame to a CSV file, use its to_csv() method.
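The sketch below shows one way this might look. It writes to a temporary directory so that it leaves nothing behind; the file name "people.csv" and the sample data are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical output path inside the temporary directory.
    path = os.path.join(tmp_dir, "people.csv")

    # index=False leaves the row index out of the file.
    df.to_csv(path, index=False)

    # Read the file back to confirm what was written.
    restored = pd.read_csv(path)

print(restored)
```

Without index=False, the row index is written as an extra unnamed first column, which is often not what you want when the index is just the default 0, 1, 2, ...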

For details, see the pandas documentation for DataFrame.to_csv().

3. Create a DataFrame from a list of lists:

To create a DataFrame from a list of lists, pass it to the pd.DataFrame() constructor. Each inner list becomes one row.
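A small sketch of this, with made-up rows and column names:

```python
import pandas as pd

# Each inner list is one row of the resulting DataFrame.
rows = [
    ["Alice", 30],
    ["Bob", 25],
]

# The columns argument names the columns; without it they are numbered 0, 1, ...
df = pd.DataFrame(rows, columns=["name", "age"])

print(df)
```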

For details, see the pandas documentation for pd.DataFrame().

4. Create a DataFrame from a dictionary:

To create a DataFrame from a dictionary, pass it to the pd.DataFrame() constructor. The keys become column names and the values become the column data.
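For illustration, a minimal sketch with made-up data might look like this:

```python
import pandas as pd

# Keys are column names; each value is the list of entries for that column.
data = {
    "name": ["Alice", "Bob"],
    "age": [30, 25],
}

df = pd.DataFrame(data)

print(df)
```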

For details, see the pandas documentation for pd.DataFrame().

5. Merge DataFrames:

Merging DataFrames works much like a JOIN in SQL: it combines two DataFrames by matching rows on one or more key columns. To merge two DataFrames, use the pd.merge() function.
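The sketch below merges two small, made-up DataFrames on a shared "id" column. It uses how="inner", which keeps only the ids present in both inputs; "left", "right", and "outer" are the other common choices.

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Cara"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [88, 92, 75]})

# Keep only rows whose id appears in both DataFrames (like an SQL INNER JOIN).
merged = pd.merge(left, right, on="id", how="inner")

print(merged)
```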

For details, see the pandas documentation for pd.merge().

6. Sort a DataFrame:

To sort a DataFrame by the values in a specific column, use the sort_values() method.
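As a sketch with made-up data, sorting by an "age" column in both directions could look like this:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Cara"], "age": [30, 25, 35]})

# Ascending order is the default.
by_age = df.sort_values("age")

# Pass ascending=False to sort from largest to smallest.
by_age_desc = df.sort_values("age", ascending=False)

print(by_age)
print(by_age_desc)
```

sort_values() returns a new DataFrame and leaves the original unchanged, so assign the result to a variable if you want to keep it.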

For details, see the pandas documentation for DataFrame.sort_values().

7. Concatenate DataFrames:

To concatenate DataFrames, use the pd.concat() function. The axis argument controls the direction:

  • axis = 1 places the DataFrames side by side, aligning them by row index.
  • axis = 0 stacks the DataFrames on top of each other, aligning them by column name.
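The two axis settings can be sketched as follows. The DataFrames are made up for illustration, and ignore_index=True is an optional extra that renumbers the rows of the stacked result.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})
df3 = pd.DataFrame({"c": [9, 10]})

# axis=0: stack df2 below df1, matching columns by name.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1: put df3 to the right of df1, matching rows by index.
side_by_side = pd.concat([df1, df3], axis=1)

print(stacked)
print(side_by_side)
```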

For details, see the pandas documentation for pd.concat().

8. Rename columns:

To rename one or more columns in a DataFrame, use the rename() method with a mapping from old names to new names.
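A minimal sketch, assuming made-up column names "col_a" and "col_b":

```python
import pandas as pd

df = pd.DataFrame({"col_a": [1, 2], "col_b": [3, 4]})

# Map each old column name to its new name; unlisted columns keep their names.
renamed = df.rename(columns={"col_a": "a", "col_b": "b"})

print(renamed)
```

Like most DataFrame methods, rename() returns a new DataFrame rather than modifying df in place.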

For details, see the pandas documentation for DataFrame.rename().

9. Add a new column:

To add a new column to a DataFrame, assign to it with the usual bracket syntax.
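For example, a new column can be computed from existing ones. The "price", "quantity", and "total" names below are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "quantity": [3, 2]})

# Assigning to a column name that does not exist yet creates the column.
df["total"] = df["price"] * df["quantity"]

print(df)
```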

10. Filter by condition:

To filter the rows of a DataFrame by a condition, index the DataFrame with a boolean expression built from its columns.
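The sketch below filters a made-up DataFrame first by a single condition and then by two conditions combined. Note that combined conditions use & (and) or | (or), with each condition wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Cara"], "age": [30, 25, 35]})

# Keep only the rows where age is greater than 28.
older = df[df["age"] > 28]

# Combine conditions with & and wrap each one in parentheses.
older_not_cara = df[(df["age"] > 28) & (df["name"] != "Cara")]

print(older)
print(older_not_cara)
```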

Part 2 will be released soon. Thanks for reading. I hope this article was helpful.

References

https://towardsdatascience.com/20-of-pandas-functions-that-data-scientists-use-80-of-the-time-a4ff1b694707
https://pandas.pydata.org/docs/index.html

Source: Viblo