Some basic pandas functions for Time Series problems

Tram Ho

Foreword

While learning to apply the Pandas library to Time series problems, I found that a handful of basic functions come up again and again, and I would like to share what I took away from a course on Udemy. These are only some basic functions that I personally find common, so contributions and additions from the community are very welcome. The functions are summarized case by case, with each case framed as a question.

Content

Some basic Pandas functions

Which command imports the Pandas library?
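The usual form is a single import statement. The alias pd is only a widely used convention, not a requirement:

```python
# Import pandas under the conventional alias "pd".
import pandas as pd

# Printing the version is a quick way to confirm the import worked.
print(pd.__version__)
```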

Which pandas function reads a CSV file?

For example:
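A minimal sketch with read_csv follows. The CSV content and the column names date and sales are made up for illustration, and an in-memory buffer stands in for a real file path so the snippet runs on its own:

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path instead,
# for example pd.read_csv("data.csv").
csv_text = "date,sales\n2020-01-01,10\n2020-01-02,12\n"

# parse_dates converts the "date" column to datetimes and index_col makes it
# the index, which is a common setup for time series data.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
print(df)
```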

Similarly, pandas can read files in other formats such as Excel, HTML… To learn more, see the Pandas IO user guide listed in the references.
Note: The pwd command can be used to check the current working directory.

How to display rows of a pandas DataFrame?

By default, head() returns the first 5 rows of the DataFrame; pass a positive integer to head() to change the number of rows returned.
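As a small illustration, assuming a toy DataFrame with a single hypothetical column named value:

```python
import pandas as pd

# Toy DataFrame with 10 rows; the column name "value" is illustrative.
df = pd.DataFrame({"value": range(10)})

first_five = df.head()    # no argument: the first 5 rows
first_three = df.head(3)  # explicit argument: the first 3 rows

print(first_five)
print(first_three)
```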

How to display summary information about a pandas DataFrame?

  1. info() prints general information about the DataFrame: the index, the column names, the non-null counts, and the data types.

  2. describe() returns descriptive statistics for each numeric column (each field): count, mean, standard deviation, minimum, percentiles, and maximum.
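Both summaries can be tried on a toy DataFrame. This is only a sketch, and the columns price and city are invented for the example:

```python
import pandas as pd

# Toy data: one numeric column and one text column (names are illustrative).
df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0], "city": ["A", "B", "A", "C"]})

# info() prints its report directly and returns None.
df.info()

# describe() returns a new DataFrame; by default it covers numeric columns only.
summary = df.describe()
print(summary)
```

Note that describe() reports the standard deviation but not the variance; df.var(numeric_only=True) gives the variance if it is needed.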

How to get the column names (field names) of a DataFrame?
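The columns attribute holds the field names. A short sketch, with made-up column names:

```python
import pandas as pd

# Illustrative DataFrame; the column names are assumptions for the example.
df = pd.DataFrame({"date": ["2020-01-01"], "city": ["Hanoi"], "sales": [10]})

# df.columns is an Index object; wrap it in list() for a plain Python list.
column_names = list(df.columns)
print(column_names)
```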

How to get the list of distinct (non-duplicated) values in a column?

For example:
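One way to do this, shown here as a sketch, is the unique() method of a column. The city column and its values are hypothetical:

```python
import pandas as pd

# Illustrative data with one repeated city.
df = pd.DataFrame({"city": ["Hanoi", "Hue", "Hanoi", "Da Nang"]})

# unique() returns the distinct values in order of first appearance.
unique_cities = df["city"].unique()
print(list(unique_cities))
```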

How to count the number of distinct values in a particular column?

For example:
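A sketch using nunique(), again on a hypothetical city column:

```python
import pandas as pd

# Illustrative data: 4 rows, 3 distinct cities.
df = pd.DataFrame({"city": ["Hanoi", "Hue", "Hanoi", "Da Nang"]})

# nunique() counts the distinct values (missing values are ignored by default).
n_cities = df["city"].nunique()
print(n_cities)
```

Taking len() of the result of unique() gives the same number when the column has no missing values.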

How to count the occurrences of each value in a column of a DataFrame?
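The value_counts() method fits this case. A minimal sketch, assuming an illustrative city column:

```python
import pandas as pd

# Illustrative data: Hanoi appears 3 times, Hue twice, Da Nang once.
df = pd.DataFrame({"city": ["Hanoi", "Hue", "Hanoi", "Da Nang", "Hanoi", "Hue"]})

# value_counts() returns a Series indexed by value, sorted from most to
# least frequent.
counts = df["city"].value_counts()
print(counts)
```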

How to sort a DataFrame and keep the top x rows?
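One possible approach is sort_values() followed by head(). The sketch below assumes hypothetical city and sales columns and x = 2:

```python
import pandas as pd

# Illustrative data; the column names and values are made up.
df = pd.DataFrame(
    {"city": ["Hanoi", "Hue", "Da Nang", "Can Tho"], "sales": [5, 20, 15, 8]}
)

# Sort by sales from largest to smallest, then keep the first 2 rows.
top2 = df.sort_values("sales", ascending=False).head(2)
print(top2)

# nlargest() is a shortcut that selects the same rows here.
top2_alt = df.nlargest(2, "sales")
```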

How to get the top x in a DataFrame when the rows must first be grouped by some field?

The solution is to combine groupby and sort_values.
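The combination could look like the sketch below. The grouping key city, the aggregated column sales, and the choice of sum() as the aggregation are all assumptions for illustration:

```python
import pandas as pd

# Illustrative data with several rows per city.
df = pd.DataFrame(
    {
        "city": ["Hanoi", "Hue", "Hanoi", "Da Nang", "Hue", "Hanoi"],
        "sales": [10, 5, 20, 7, 1, 3],
    }
)

# Step 1: group by city and total the sales within each group.
totals = df.groupby("city")["sales"].sum()

# Step 2: sort the totals from largest to smallest and keep the top 2.
top2 = totals.sort_values(ascending=False).head(2)
print(top2)
```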

How to compare a DataFrame column against a condition and count the rows that meet it?

Method 1: Apply a comparison operator directly to the column. The example uses >=, but any other comparison operator works as well: <, >, <=, ==, !=
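A sketch of this method, assuming a hypothetical sales column and a threshold of 10:

```python
import pandas as pd

# Illustrative data; three of the five values are >= 10.
df = pd.DataFrame({"sales": [5, 20, 15, 8, 10]})

# The comparison yields a boolean Series; True counts as 1 when summed.
mask = df["sales"] >= 10
n_rows = int(mask.sum())
print(n_rows)

# The same mask can also select the matching rows themselves.
matching = df[mask]
```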

Method 2, which I find very cool, applies a lambda function to each value:
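Judging by the lambda example that follows, this method presumably relies on apply(). A sketch under that assumption, with a hypothetical sales column and a threshold of 10:

```python
import pandas as pd

# Illustrative data; three of the five values are >= 10.
df = pd.DataFrame({"sales": [5, 20, 15, 8, 10]})

# apply() calls the lambda on every value and returns a boolean Series;
# summing it counts the True entries.
n_rows = int(sum(df["sales"].apply(lambda value: value >= 10)))
print(n_rows)
```

The lambda form is slower than a direct comparison on large columns, but it accepts arbitrary Python logic, such as substring checks.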

Another example counts the rows whose 'County' field does not contain the text 'County':
sum(data['County'].apply(lambda string: 'County' not in string))

Thank you

Thank you everyone for your support.

References

  1. https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
  2. https://www.udemy.com/course/python-for-time-series-data-analysis/
  3. https://youtu.be/B67x_p-slYc

Source : Viblo