Lesson
Bubble charts
Welcome to the lesson on bubble charts in the course 'Data Science in Action: Interactive Visualization with Plotly and Pandas'. In this lesson, we will create interactive bubble charts using Plotly, a popular Python library for interactive visualization. A bubble chart is a scatter plot whose markers vary in size, and often in color, so a single chart can display three or more dimensions of data: one on the x-axis, one on the y-axis, one as bubble size, and optionally one as bubble color. This makes bubble charts well suited to exploring multi-dimensional datasets.
Creating a Basic Bubble Chart
In this code example, we will create a basic bubble chart using Plotly Express and its built-in iris dataset. The chart will display the relationship between four variables: sepal width (x-axis), sepal length (y-axis), petal length (bubble size), and species (bubble color).
Step 1: Importing necessary libraries and preparing the data
import pandas as pd
import plotly.express as px
# Load the built-in iris dataset from Plotly Express
df = px.data.iris()
# Display the first 5 rows of the data
df.head()
Step 2: Creating the bubble chart
# Create a bubble chart using plotly express
fig = px.scatter(df,
                 x="sepal_width",
                 y="sepal_length",
                 size="petal_length",
                 color="species",
                 title="Iris Bubble Chart",
                 labels={"sepal_width": "Sepal Width",
                         "sepal_length": "Sepal Length",
                         "petal_length": "Petal Length",
                         "species": "Species"},
                 color_discrete_sequence=px.colors.qualitative.Pastel)
# Display the bubble chart
fig.show()
Customizing Bubble Size and Color
In this code example, we will customize the size and color of the bubbles in a bubble chart using Plotly and Pandas.
Code Block 1: Preparing the Data
First, we will import the required libraries and load the built-in iris dataset from Plotly Express. We will then print the first 5 rows of the dataset.
import plotly.express as px
import pandas as pd
# Load built-in iris dataset
df = px.data.iris()
# Display first 5 rows of the dataset
print(df.head())
Code Block 2: Customizing Bubble Size and Color
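Next, we will create a bubble chart using the iris dataset. We will customize the size and color of the bubbles using the size and color parameters. The size parameter will be set to the petal_width column, and the color parameter will be set to the species_id column. Because species_id is numeric, Plotly Express colors the bubbles along a continuous color scale, and hover_name shows the species name when you hover over a bubble. Finally, we will display the chart.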
# Create a bubble chart with custom size and color
fig = px.scatter(df, x='sepal_width', y='sepal_length', size='petal_width', color='species_id', hover_name='species', title='Iris Dataset Bubble Chart')
# Display the chart
fig.show()
Adding Text and Hover Information
In this code example, we will create a bubble chart and add text and hover information using Plotly and Pandas. We will use the built-in Iris dataset from Plotly for this example.
Code Block 1: Importing libraries and loading the dataset
First, let's import the necessary libraries and load the dataset into a Pandas dataframe.
import plotly.express as px
import pandas as pd
# Load the Iris dataset (px.data.iris() already returns a Pandas dataframe)
data = px.data.iris()
# Wrapping it in pd.DataFrame is therefore optional, but harmless
df = pd.DataFrame(data)
# Display the first 5 rows of the dataframe
print(df.head())
Code Block 2: Creating the bubble chart with text and hover information
Now, let's create the bubble chart using Plotly Express and add text and hover information to the chart.
# Create the bubble chart
fig = px.scatter(df,
                 x='sepal_width',
                 y='sepal_length',
                 size='petal_length',
                 color='species_id',
                 hover_name='species',
                 text='species',
                 title='Iris Dataset: Sepal Length vs. Sepal Width')
# Update the layout and hover mode
fig.update_layout(hovermode='closest')
# Show the chart
fig.show()
Animating Bubble Charts
In this code example, we will create an animated bubble chart using Plotly and Pandas.
First, let's import the necessary libraries and load a built-in dataset from Plotly.
import plotly.express as px
import pandas as pd
# Load the built-in Gapminder dataset from Plotly
df = px.data.gapminder()
# Display the first few rows of the dataset
df.head()
Now, let's create an animated bubble chart using the dataset.
# Create an animated bubble chart
fig = px.scatter(df,
                 x="gdpPercap",
                 y="lifeExp",
                 size="pop",
                 color="continent",
                 hover_name="country",
                 log_x=True,
                 size_max=60,
                 range_y=[20, 90],
                 animation_frame="year",
                 title="Gapminder Global Indicators: GDP per Capita vs Life Expectancy")
# Show the chart
fig.show()
Using Color Scales
In this code example, we will create a bubble chart whose bubbles are colored by a data column. We will use the gapminder dataset built into Plotly Express.
First, let's import the required libraries and load the dataset.
import plotly.express as px
# Load the gapminder dataset
df = px.data.gapminder()
# Display the first few rows of the dataset
df.head()
Now that we have our dataset, let's create a bubble chart using the scatter function from Plotly Express. We will use the gdpPercap, lifeExp, pop, continent, country, and year columns to create the chart. Because continent is a categorical column, Plotly Express assigns one discrete color per continent rather than a continuous color scale.
# Create a bubble chart colored by continent
fig = px.scatter(df,
                 x="gdpPercap",
                 y="lifeExp",
                 size="pop",
                 color="continent",
                 hover_name="country",
                 log_x=True,
                 size_max=60,
                 animation_frame="year",
                 range_y=[20, 90],
                 labels={"gdpPercap": "GDP per Capita", "lifeExp": "Life Expectancy",
                         "pop": "Population", "continent": "Continent"},
                 title="GDP per Capita vs. Life Expectancy (1952-2007)")
# Show the chart
fig.show()
In the code above, we set the x-axis to gdpPercap, the y-axis to lifeExp, and the size of the bubbles to pop. We also used a logarithmic scale for the x-axis and set the maximum bubble size to 60. We added an animation frame for each year in the dataset, and we customized the labels and title for better readability.
Exercises
1. Bubble Charts
Instruction
In this exercise, you will create a bubble chart to visualize the relationship between GDP per capita, life expectancy, and population for different countries in 2007 using the Gapminder dataset. Then, you will extend the bubble chart by adding a linear regression line to the plot.
Follow these steps:
- Import the necessary libraries and load the built-in Gapminder dataset.
- Filter the dataset to only include data from 2007.
- Create a scatter plot with custom marker sizes.
- Set the axis labels and plot title.
- Calculate the linear regression line.
- Add the regression line to the plot.
- Display the plot using fig.show().
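The regression step in the list above can be tried in isolation first. This is a minimal sketch on made-up points (not the Gapminder data) showing how np.polyfit returns a slope and intercept and how to turn them into line coordinates; the variable names are illustrative.

```python
import numpy as np

# Made-up points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# A degree-1 fit returns coefficients highest power first: slope, then intercept
slope, intercept = np.polyfit(x, y, 1)

# Evaluate the fitted line on an evenly spaced grid for plotting
x_line = np.linspace(x.min(), x.max(), 50)
y_line = slope * x_line + intercept

print(slope, intercept)
```

The x_line and y_line arrays are the kind of input a line trace expects when you add the regression line to your plot.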
My Solution
# Your solution goes here
Hint
Start by importing the necessary libraries and loading the Gapminder dataset. Then, filter the dataset to only include data from 2007. Create a scatter plot with custom marker sizes and set the axis labels and plot title. Finally, calculate the linear regression line, add it to the plot, and display the plot using fig.show().
Solution
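import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Load Gapminder and encode each continent as an integer for coloring
data = px.data.gapminder()
data['continent_id'] = pd.factorize(data['continent'])[0]
# Keep only the 2007 rows
data_2007 = data[data['year'] == 2007]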
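# Bubble chart with marker size driven by population (in millions)
fig = go.Figure(go.Scatter(
    x=data_2007['gdpPercap'],
    y=data_2007['lifeExp'],
    mode='markers',
    text=data_2007['country'],
    marker=dict(
        size=data_2007['pop'] / 1e6,
        # Scale marker areas so the largest bubble is about 60 px in diameter
        sizemode='area',
        sizeref=2.0 * (data_2007['pop'] / 1e6).max() / 60 ** 2,
        showscale=True,
        colorscale='Viridis',
        color=data_2007['continent_id'],
        line=dict(width=1)
    )
))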
# Set the plot title and axis labels
fig.update_layout(
    title="GDP per Capita vs Life Expectancy (2007)",
    xaxis=dict(title="GDP per Capita"),
    yaxis=dict(title="Life Expectancy")
)
# Fit a straight line (least squares) to life expectancy vs GDP per capita
m, b = np.polyfit(data_2007['gdpPercap'], data_2007['lifeExp'], 1)
x = np.linspace(data_2007['gdpPercap'].min(), data_2007['gdpPercap'].max(), 100)
y = m * x + b
# Add the fitted line as a second trace
fig.add_trace(go.Scatter(
    x=x,
    y=y,
    mode='lines',
    name='Linear Regression',
    line=dict(color='red')
))
fig.show()