Mastering an entire Python library like Pandas can be a challenge for anyone. However, if we take a step back and think about it, do we really need to pay attention to every little detail of a particular library, especially when so much of what we do follows the 80/20 rule (the Pareto principle)?
Therefore, this post is my attempt to apply the Pareto principle to the Pandas library and introduce the 20% of Pandas functions that you are likely to use 80% of the time when working with DataFrames. The methods below are ones I have used many times in my daily work, and I consider them necessary and sufficient for anyone getting started with Pandas.
1. Read a CSV file:
If you want to read a CSV file in Pandas, use the pd.read_csv() function as illustrated below:

    import pandas as pd

    file = "file.csv"

    # Reading CSV
    df = pd.read_csv(file)

    # Changing the delimiter
    symbol = "|"
    df = pd.read_csv(file, sep=symbol)

Read more in the pandas documentation for pd.read_csv().
2. Save a DataFrame as a CSV file:
If you want to save a DataFrame to a CSV file, use the to_csv() method as shown below:

    import pandas as pd

    # A small example DataFrame so there is something to save
    df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
    file = "file.csv"

    # Saving CSV
    df.to_csv(file)

    # Changing the delimiter while saving
    symbol = "|"
    df.to_csv(file, sep=symbol)

Read more in the pandas documentation for DataFrame.to_csv().
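Reading and writing are two halves of the same round trip, so it can help to see them together. The sketch below is only an illustration: it writes to an in-memory io.StringIO buffer instead of a file on disk (my choice, so that the example runs anywhere), and the names buffer, csv_text and restored are made up for this example.

```python
import io

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
# index=False leaves the row labels (0, 1, ...) out of the CSV.
df.to_csv(buffer, sep="|", index=False)
csv_text = buffer.getvalue()

# Read the same text back, using the same delimiter it was written with.
restored = pd.read_csv(io.StringIO(csv_text), sep="|")

print(csv_text)
print(restored.equals(df))
```

Without index=False, to_csv() also writes the index as an extra first column, which then shows up as an unnamed column when the file is read back.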
3. Create a DataFrame from a list of lists:
If you want to create a DataFrame from a list of lists, use the pd.DataFrame() constructor as shown below. Each inner list becomes one row:

    import pandas as pd

    data = [[1, 2, 3],
            [4, 5, 6]]

    df = pd.DataFrame(data, columns=['A', 'B', 'C'])

    """
       A  B  C
    0  1  2  3
    1  4  5  6
    """

Read more in the pandas documentation for pd.DataFrame().
4. Create a DataFrame from a dictionary:
If you want to create a DataFrame from a dictionary, use the pd.DataFrame() constructor as shown below. Each key becomes a column name and each list becomes that column's values:

    import pandas as pd

    data = {'A': [1, 2],
            'B': [3, 4]}

    df = pd.DataFrame(data)

    """
       A  B
    0  1  3
    1  2  4
    """

Read more in the pandas documentation for pd.DataFrame().
5. Merge DataFrames:
Merge operations on DataFrames work like JOIN operations in SQL. We use them to combine two DataFrames on one or more key columns. If you want to merge two DataFrames, use the pd.merge() function as shown below:

    import pandas as pd

    df1 = pd.DataFrame([[1, "A"],
                        [2, "B"]], columns=["col1", "col2"])

    df2 = pd.DataFrame([["A", 3],
                        ["B", 4]], columns=["col2", "col3"])

    pd.merge(df1, df2, on="col2", how="inner")

    """
       col1 col2  col3
    0     1    A     3
    1     2    B     4
    """

Read more in the pandas documentation for pd.merge().
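The how argument decides what happens to rows whose key has no match, just like the join type in SQL. As a small sketch (the DataFrames left and right and the extra "C" row are invented for this illustration), compare an inner join with a left join:

```python
import pandas as pd

left = pd.DataFrame([[1, "A"],
                     [2, "B"],
                     [3, "C"]], columns=["col1", "col2"])

right = pd.DataFrame([["A", 3],
                      ["B", 4]], columns=["col2", "col3"])

# Inner join: keep only keys that appear in both DataFrames.
inner = pd.merge(left, right, on="col2", how="inner")

# Left join: keep every row of `left`; unmatched keys get NaN in col3.
left_join = pd.merge(left, right, on="col2", how="left")

print(inner)
print(left_join)
```

The row with key "C" is dropped by the inner join and kept by the left join, where its col3 value is missing (NaN).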
6. Sort a DataFrame:
If you want to sort a DataFrame based on the values in a specific column, use the sort_values() method as shown below:

    import pandas as pd

    df = pd.DataFrame([[2, "A"],
                       [3, "B"],
                       [1, "C"]], columns=["col1", "col2"])

    df.sort_values(by="col1")

    """
       col1 col2
    2     1    C
    0     2    A
    1     3    B
    """

Read more in the pandas documentation for DataFrame.sort_values().
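sort_values() sorts in ascending order by default. The following sketch, using a made-up four-row DataFrame, shows two common variations: sorting in descending order, and sorting by several columns with a different direction for each.

```python
import pandas as pd

df = pd.DataFrame([[2, "A"],
                   [3, "B"],
                   [1, "C"],
                   [2, "B"]], columns=["col1", "col2"])

# Largest values of col1 first.
descending = df.sort_values(by="col1", ascending=False)

# Sort by col1 ascending, breaking ties by col2 descending,
# then renumber the rows from 0.
multi = (
    df.sort_values(by=["col1", "col2"], ascending=[True, False])
      .reset_index(drop=True)
)

print(descending)
print(multi)
```

Note that sorting keeps the original row labels, which is why the earlier output starts with index 2. reset_index(drop=True) is one way to renumber the rows afterwards.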
7. Concatenate DataFrames:
If you want to concatenate DataFrames, use the pd.concat() function as shown below:

    import pandas as pd

    df1 = pd.DataFrame([[1, "A"],
                        [2, "B"]], columns=["col1", "col2"])

    df2 = pd.DataFrame([["A", 3],
                        ["B", 4]], columns=["col3", "col4"])

    pd.concat((df1, df2), axis=1)

    """
       col1 col2 col3  col4
    0     1    A    A     3
    1     2    B    B     4
    """

- axis=1 places the DataFrames side by side, adding columns.
- axis=0 stacks the DataFrames on top of each other, adding rows. Columns are aligned by name, and a column that is missing from one DataFrame is filled with NaN.
Read more in the pandas documentation for pd.concat().
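The example above only shows axis=1, so here is a minimal sketch of the row-wise case. The DataFrames df2 and df3 and the column name "extra" are invented for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, "A"],
                    [2, "B"]], columns=["col1", "col2"])
df2 = pd.DataFrame([[3, "C"]], columns=["col1", "col2"])
df3 = pd.DataFrame([[4, True]], columns=["col1", "extra"])

# Same column headers: rows are simply stacked.
# ignore_index=True renumbers the rows 0, 1, 2 instead of 0, 1, 0.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Different column headers: the result has the union of the columns,
# and values that do not exist in one input become NaN.
mismatched = pd.concat([df1, df3], axis=0, ignore_index=True)

print(stacked)
print(mismatched)
```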
8. Rename columns:
If you want to rename one or more columns in the DataFrame, use the rename() method as shown below:

    import pandas as pd

    df = pd.DataFrame([[1, "A"],
                       [2, "B"]], columns=["col1", "col2"])

    df.rename(columns={"col1": "col3",
                       "col2": "col4"})

    """
       col3 col4
    0     1    A
    1     2    B
    """

Read more in the pandas documentation for DataFrame.rename().
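One detail that often trips up beginners: by default rename() returns a new DataFrame and leaves the original untouched. A short sketch (the variable names original and renamed are just for illustration):

```python
import pandas as pd

original = pd.DataFrame([[1, "A"],
                         [2, "B"]], columns=["col1", "col2"])

# rename() returns a renamed copy; `original` keeps its old headers.
renamed = original.rename(columns={"col1": "col3"})

print(list(original.columns))
print(list(renamed.columns))
```

To keep the new names, assign the result back to a variable, for example df = df.rename(columns=...). Columns that are not listed in the mapping, such as col2 here, keep their names.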
9. Add a new column:
If you want to add a new column to the DataFrame, you can use the usual assignment operation as shown below:

    import pandas as pd

    df = pd.DataFrame([[1, "A"],
                       [2, "B"]], columns=["col1", "col2"])

    # The new column is computed element-wise from col1
    df["col3"] = df["col1"] + 2

    """
       col1 col2  col3
    0     1    A     3
    1     2    B     4
    """

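Assignment with df["col3"] = ... modifies the DataFrame in place. If you would rather get a new DataFrame back and leave the original as it is, the assign() method is an alternative. This is a minimal sketch, and the name with_col3 is made up for the example:

```python
import pandas as pd

df = pd.DataFrame([[1, "A"],
                   [2, "B"]], columns=["col1", "col2"])

# assign() returns a copy with the extra column; df itself is unchanged.
with_col3 = df.assign(col3=df["col1"] + 2)

print(with_col3)
print(list(df.columns))
```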
10. Filter by condition:
If you want to filter the rows of a DataFrame based on a condition, pass the condition inside square brackets as shown below:

    import pandas as pd

    df = pd.DataFrame([[1, "A"],
                       [2, "B"],
                       [2, "A"],
                       [3, "C"]], columns=["col1", "col2"])

    # Keep only the rows where col1 is greater than 1
    df[df.col1 > 1]

    """
       col1 col2
    1     2    B
    2     2    A
    3     3    C
    """

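In practice you often need more than one condition. As a sketch on the same data (the variable names both, either and in_set are my own), conditions can be combined with & (and) and | (or), and isin() checks membership in a list of values:

```python
import pandas as pd

df = pd.DataFrame([[1, "A"],
                   [2, "B"],
                   [2, "A"],
                   [3, "C"]], columns=["col1", "col2"])

# Both conditions must hold. Each condition needs its own parentheses
# because & binds more tightly than > and ==.
both = df[(df["col1"] > 1) & (df["col2"] == "A")]

# At least one condition must hold.
either = df[(df["col1"] > 2) | (df["col2"] == "A")]

# Keep rows whose col2 value is one of the listed values.
in_set = df[df["col2"].isin(["A", "C"])]

print(both)
print(either)
print(in_set)
```

Python's and / or keywords do not work here because the conditions are whole columns of True/False values, not single booleans.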
Part 2 will be released soon. Thanks for reading. I hope this article was helpful.
References
https://towardsdatascience.com/20-of-pandas-functions-that-data-scientists-use-80-of-the-time-a4ff1b694707
https://pandas.pydata.org/docs/index.html