Combining DataFrames with Pandas: Exploring merge(), join(), concat(), and append() Methods

In this article, we will learn how to combine DataFrames with Pandas in Python. We’ll look at four different methods so that you can choose between them based on your needs. There are several use cases where we might want to combine multiple DataFrames:

  • Data integration: When working with data from different sources, we may need to combine them into a single DataFrame to perform analyses or build models that require data from multiple datasets.
  • Data cleaning and preprocessing: Combining DataFrames allows us to handle missing values, duplicate records, and other data quality issues in a structured manner.
  • Data enrichment: We might have additional information or features stored in a separate DataFrame that need to be added to an existing DataFrame to enrich the data for analysis.
  • Database-like joins: DataFrames can be merged similarly to SQL joins, allowing us to combine data based on common columns or indices.
  • Time series alignment: When dealing with time series data from different sources, we may want to align the data based on timestamps or time intervals.
  • Hierarchical data: Merging allows us to combine DataFrames with hierarchical or nested data structures.

Combining Two Pandas DataFrames in Python

Here we create two DataFrames, df1 and df2, and print them. All the examples that follow reuse these two DataFrames.

Example:

import pandas as pd
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry']
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi']
})
print(df1)
print(df2)

Output:

Two Pandas DataFrames in Python

We can combine the two DataFrames in Python in the following ways:

  • Combining DataFrames using merge( )
  • Combining DataFrames using join( )
  • Combining DataFrames using concat( )
  • Combining DataFrames using append( )

Let us look at these methods one by one with several examples for better understanding.

1. Combining DataFrames using merge()

merge() combines DataFrames on common columns or on indexes, much like a SQL join. It is the most flexible of the four methods but also the most complex: one-to-one, many-to-one, and many-to-many joins are all possible.

Example 1:

Here we perform an inner join, which is the default for merge(). Because no key is given, pandas joins on every column the two DataFrames share, here ID and Col_4.

import pandas as pd
pd.merge(df1, df2)

Output:

Combining DataFrame using merge() Example 1
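To make the implicit key selection visible, the following sketch lists the shared columns and compares the default call with one that names them explicitly. The variable names shared, implicit, and explicit are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

# Columns present in both DataFrames; merge() uses them as keys when `on` is omitted.
shared = [col for col in df1.columns if col in df2.columns]
print(shared)

implicit = pd.merge(df1, df2)                          # keys inferred
explicit = pd.merge(df1, df2, on=shared, how='inner')  # keys spelled out
print(implicit)
print(implicit.shape == explicit.shape)
```

Only the rows where both ID and Col_4 agree survive, which is why the result is much smaller than either input.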

Example 2:

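Here we specify the ID column as the merge key. Col_4 exists in both DataFrames but is not a key, so pandas keeps both copies and renames them Col_4_x and Col_4_y.

import pandas as pd
pd.merge(df1, df2, on='ID')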

Output:

Combining DataFrame using merge() Example 2
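Since ID 1 appears twice in df2, merging on ID is a one-to-many join. The sketch below inspects the resulting columns and row count, and uses the validate parameter of merge() to check the relationship. The names merged and is_one_to_one are illustrative.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

merged = pd.merge(df1, df2, on='ID')
print(merged.columns.tolist())  # the non-key overlap Col_4 gets _x / _y suffixes
print(len(merged))              # the df1 row with ID 1 matches two rows in df2

# validate raises MergeError when the keys do not have the stated relationship.
pd.merge(df1, df2, on='ID', validate='one_to_many')  # passes: IDs are unique in df1
try:
    pd.merge(df1, df2, on='ID', validate='one_to_one')
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
print(is_one_to_one)
```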

Example 3:

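Here we merge on a list of columns, ID and Col_4, so a row matches only when both values agree. Because these are the only columns the two DataFrames share, the result is the same as in Example 1.

import pandas as pd
pd.merge(df1, df2, on=['ID', 'Col_4'])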

Output:

Combining DataFrame using merge() Example 3

Example 4:

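Here we merge on columns that have different names in each DataFrame, Col_2 on the left and Col_A on the right. We also supply suffixes to tell apart the columns that exist in both DataFrames, ID and Col_4.

import pandas as pd
pd.merge(df1, df2, suffixes=['_l', '_r'], left_on='Col_2', right_on='Col_A')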

Output:

Combining DataFrame using merge() Example 4
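As a rough check of what left_on and right_on do, this sketch confirms that every row in the result has equal Col_2 and Col_A values and that the overlapping columns carry the chosen suffixes. The name merged_lr is illustrative.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

merged_lr = pd.merge(df1, df2, suffixes=['_l', '_r'],
                     left_on='Col_2', right_on='Col_A')

# Both key columns are kept, and they are equal in every matched row.
print(merged_lr[['Col_2', 'Col_A']])
print((merged_lr['Col_2'] == merged_lr['Col_A']).all())

# ID and Col_4 exist on both sides, so each appears once per side with a suffix.
print(merged_lr.columns.tolist())
```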

Example 5:

Here we merge on the indexes of the two DataFrames rather than on a column, so rows with the same index label are matched.

import pandas as pd
pd.merge(df1, df2, suffixes=['_l', '_r'], left_index=True, right_index=True)

Output:

Combining DataFrame using merge() Example 5

Example 6:

Here we keep all rows from both DataFrames, so no data is lost. This is called an outer join. Where a row has no match in the other DataFrame, the missing columns are filled with NaN.

import pandas as pd
pd.merge(df1, df2, on='Col_4', how='outer', suffixes=['_l', '_r'])

Output:

Combining DataFrame using merge() Example 6
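To see where each row of an outer join came from, merge() accepts indicator=True, which adds a _merge column. The sketch below applies it to the same call; the name outer is illustrative.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

outer = pd.merge(df1, df2, on='Col_4', how='outer',
                 suffixes=['_l', '_r'], indicator=True)

# _merge is 'both', 'left_only', or 'right_only' for each row.
print(outer[['Col_4', '_merge']])
print(outer['_merge'].value_counts())
```

Rows marked left_only or right_only are the ones an inner join would have dropped.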

Example 7:

Here we keep all rows from the left DataFrame. Rows from the right DataFrame that have no match in the key column of the left DataFrame are discarded. This is called a left join.

import pandas as pd
pd.merge(df1, df2, on='Col_4', how='left', suffixes=['_l', '_r'])

Output:

Combining DataFrame using merge() Example 7

Example 8:

Here we keep all rows from the right DataFrame. Rows from the left DataFrame that have no match in the key column of the right DataFrame are discarded. This is called a right join.

import pandas as pd
pd.merge(df1, df2, on='Col_4', how='right', suffixes=['_l', '_r'])

Output:

Combining DataFrame using merge() Example 8
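The four join types differ mainly in which unmatched rows they keep. As a quick comparison, this sketch runs the same merge on Col_4 with each value of how and records the number of rows returned. The dictionary row_counts is illustrative.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(df1, df2, on='Col_4', how=how, suffixes=['_l', '_r'])
    row_counts[how] = len(result)

# inner keeps only matches; left/right keep one side whole; outer keeps everything.
print(row_counts)
```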

2. Combining DataFrames using join()

We can also use the join() method to combine DataFrames. By default it joins on the index rather than on columns. Let’s see how.

Example 1:

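Here we join the left DataFrame, df1, with df2 and specify suffixes for the overlapping column names. Note that on='ID' names a column of df1 only: join() matches its values against the index of df2 (0 to 3), not against the ID column of df2. The default join type is left, so every row of df1 is kept.

import pandas as pd
df1.join(df2, on='ID', lsuffix='_l', rsuffix='_r')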

Output:

Combining DataFrame using join() Example 1
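Because join() compares the on column with the other DataFrame's index, matching on the ID values of both DataFrames requires making ID the index of df2 first. The sketch below contrasts the two calls; by_index and by_id are illustrative names, and the set_index() variant is a suggested alternative rather than part of the original example.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

# df1['ID'] is looked up in df2's index (0..3), so ID 1 picks up df2 row 1.
by_index = df1.join(df2, on='ID', lsuffix='_l', rsuffix='_r')
print(by_index[['ID_l', 'ID_r', 'Col_A']])

# Making ID the index of df2 matches ID values against ID values instead.
by_id = df1.join(df2.set_index('ID'), on='ID', lsuffix='_l', rsuffix='_r')
print(by_id[['ID', 'Col_A']])
```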

Example 2:

Here we specify the join type as inner, just as we can in merge(). Only the rows of df1 whose ID value appears in the index of df2 are kept.

import pandas as pd
df1.join(df2, on='ID', how='inner', lsuffix='_l', rsuffix='_r')

Output:

Combining DataFrame using join() Example 2
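One way to read this join() call is as a merge() of a left column against the right index. The sketch below writes both forms side by side and compares their shapes and columns. The names joined and merged are illustrative, and the comparison is a sanity check rather than a statement about how pandas implements join().

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

joined = df1.join(df2, on='ID', how='inner', lsuffix='_l', rsuffix='_r')

# The same request written with merge(): left column 'ID' against the right index.
merged = pd.merge(df1, df2, left_on='ID', right_index=True,
                  how='inner', suffixes=('_l', '_r'))

print(joined)
print(joined.shape == merged.shape)
print(set(joined.columns) == set(merged.columns))
```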

3. Combining DataFrames using concat()

With concat() we can stack DataFrames on top of one another (axis=0) or place them side by side (axis=1).

Example 1:

Here we have taken the default axis which is 0.

import pandas as pd
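pd.concat([df1, df2])

We called pd.concat() and passed a list of the DataFrames we wanted to combine, df1 and df2.

After running the code we can see that the two DataFrames are stacked on top of one another and that the index is not reset, so the labels 0 to 3 appear twice. Columns that exist in only one DataFrame are filled with NaN for the rows of the other.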

Output:

Combining DataFrame using concat() Example 1

Example 2:

Here we reset the index so that it has no repeated values by passing ignore_index=True.

import pandas as pd
pd.concat([df1, df2], ignore_index=True)

After running the code we can see that the index is reset and runs from zero to eight with no repeated values.

Output:

Combining DataFrame using concat() Example 2
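Repeated index labels matter because label-based lookups then return more than one row. This sketch compares the index with and without ignore_index=True; the names stacked and renumbered are illustrative.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

stacked = pd.concat([df1, df2])                        # original labels kept
renumbered = pd.concat([df1, df2], ignore_index=True)  # fresh 0..n-1 labels

print(stacked.index.tolist())
print(renumbered.index.tolist())

# With duplicate labels, .loc[0] selects one row from each input.
print(stacked.loc[0])
```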

Example 3:

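Here we concatenate df1 and df2 side by side by using axis=1.

import pandas as pd
pd.concat([df1, df2], axis=1)

After running this code we can see that the DataFrames are placed side by side and aligned on their index. Unlike merge(), concat() does not match on key columns, so ID and Col_4 each appear twice, and the last row has NaN in the columns from df2, which has only four rows.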

Output:

Combining DataFrame using concat() Example 3
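The duplicated column names produced by axis=1 are easy to miss in printed output. The following sketch checks the shape, counts the duplicated labels, and shows the row that df2 could not fill; the name wide is illustrative.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

wide = pd.concat([df1, df2], axis=1)

print(wide.shape)                                      # rows from the longer frame, all columns
print(wide.columns[wide.columns.duplicated()].tolist())  # labels that occur more than once
print(wide['Col_A'].isna().sum())                      # rows with no counterpart in df2
```

If unique column names are needed, renaming the columns of one DataFrame before concatenating is one option.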

Example 4:

The default join type of concat() is outer, but here we specify an inner join with join='inner'. With axis=1, only the index labels present in both DataFrames (0 to 3) are kept.

import pandas as pd
pd.concat([df1, df2], axis=1, join='inner')

Output:

Combining DataFrame using concat() Example 4

Example 5:

Here we specify axis=0 with join='inner'. The rows are stacked, and only the columns present in both DataFrames, ID and Col_4, are kept.

import pandas as pd
pd.concat([df1, df2], axis=0, join='inner')

Output:

Combining DataFrame using concat() Example 5
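In concat(), join='inner' applies to the axis that is not being concatenated: columns when stacking rows, index labels when placing frames side by side. The sketch below shows both cases; rows_inner and cols_inner are illustrative names.

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

# Stacking rows: only columns found in both frames survive.
rows_inner = pd.concat([df1, df2], axis=0, join='inner')
print(rows_inner.columns.tolist(), rows_inner.shape)

# Side by side: only index labels found in both frames survive.
cols_inner = pd.concat([df1, df2], axis=1, join='inner')
print(cols_inner.index.tolist(), cols_inner.shape)
```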

4. Combining DataFrames using append()

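append() is the last method of combining DataFrames. It stacks the rows of one DataFrame below another, like concat() along axis 0. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the example below runs only on older versions.

Example:

Here we append the rows of df2 to our left DataFrame, df1.

import pandas as pd
# Works only on pandas < 2.0; on newer versions use pd.concat([df1, df2])
df1.append(df2)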

Output:

Combining DataFrame using append() Example
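On pandas 2.0 and later, where DataFrame.append() no longer exists, the same stacked result can be produced with pd.concat(). A minimal sketch, with combined as an illustrative name:

```python
import pandas as pd

# Same sample data as defined at the start of the article.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5, 9],
    'Col_1': [1, 2, 3, 4, 5],
    'Col_2': [6, 7, 8, 9, 10],
    'Col_3': [11, 12, 13, 14, 15],
    'Col_4': ['apple', 'orange', 'banana', 'strawberry', 'raspberry'],
})
df2 = pd.DataFrame({
    'ID': [1, 1, 3, 5],
    'Col_A': [8, 9, 10, 11],
    'Col_B': [12, 13, 15, 17],
    'Col_4': ['apple', 'orange', 'banana', 'kiwi'],
})

# Rows of df2 placed below the rows of df1, original index labels kept.
combined = pd.concat([df1, df2])
print(combined)

# Reports whether the installed pandas still provides DataFrame.append().
print(hasattr(pd.DataFrame, 'append'))
```

Passing ignore_index=True to pd.concat() gives a fresh index from zero, as shown in the concat() section.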

Summary

Combining Pandas DataFrames is a powerful technique for integrating, cleaning, and enriching data from various sources. It allows us to prepare our data for analysis and modelling, making it an essential skill for data scientists and analysts. We have discussed four methods for combining DataFrames with examples. After reading this article, we hope you can easily combine Pandas DataFrames in Python.

Reference

https://stackoverflow.com/questions/12850345/how-do-i-combine-two-dataframes

Priyanshu Singh