Scatter plots

Learn Scatter plots in SQLPad's Data Science in Action: Interactive Visualization with Plotly and Pandas course with practical examples and guided lessons.

Introduction

In this lesson, we will explore one of the most common and fundamental chart types: the scatter plot. A scatter plot visualizes the relationship between two variables by drawing each observation as a point on a 2D plane. With Plotly and Pandas, we can create interactive, responsive scatter plots with little code.

In this lesson, you will learn how to create basic scatter plots, customize their appearance, and add various interactive features to enhance your data visualization experience.

Creating a Basic Scatter Plot

In this code example, we will create a basic scatter plot using the built-in dataset from the Plotly library. We will first load the dataset and then create a scatter plot using Plotly's graph_objects module.

Code Block 1: Loading the Dataset

First, let's load the built-in dataset from Plotly, which contains data about the Iris flower species. We will use the px.data.iris() function to load the dataset and then display the first 5 rows using the head() function.

import plotly.express as px

# Load the built-in Iris dataset
df = px.data.iris()

# Display the first 5 rows of the dataset
print(df.head())

Code Block 2: Creating the Scatter Plot

Next, let's create a scatter plot using Plotly's graph_objects module. We will plot the sepal width on the x-axis and the sepal length on the y-axis. We will also color the data points by the numeric species_id column, which gives each Iris species its own position on the color scale, and attach the species name as hover text.

import plotly.graph_objects as go

# Create a scatter plot using Plotly's graph_objects module
fig = go.Figure(go.Scatter(x=df['sepal_width'],
                           y=df['sepal_length'],
                           mode='markers',
                           marker=dict(color=df['species_id'],
                                       colorscale='Viridis',
                                       showscale=True),
                           text=df['species']))

# Set the title and axis labels
fig.update_layout(title='Iris Dataset: Sepal Width vs Sepal Length',
                  xaxis_title='Sepal Width (cm)',
                  yaxis_title='Sepal Length (cm)')

# Display the scatter plot
fig.show()

Customizing Scatter Plot Markers

In this code example, we will learn how to customize scatter plot markers with Plotly Express, again using the built-in Iris dataset.

First, let's import necessary libraries and load the dataset.

import plotly.express as px

# Load built-in dataset
df = px.data.iris()

# Display the first few rows of the dataset
# (print() is needed when running as a script; a bare df.head() only renders in a notebook)
print(df.head())

Now, let's create a scatter plot and customize the markers.

fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species',
                 size='petal_length', hover_data=['petal_width'],
                 title='Customizing Scatter Plot Markers',
                 labels={'sepal_width': 'Sepal Width (cm)',
                         'sepal_length': 'Sepal Length (cm)',
                         'species': 'Species'})

# Customize the marker symbol, opacity, and outline
fig.update_traces(marker=dict(symbol='star',
                              opacity=0.7,
                              line=dict(width=1, color='black')))

fig.show()


Adding Text Labels to Scatter Plot Points

In this code example, you will learn how to add text labels to scatter plot points using Plotly and Pandas.

First, let's create a DataFrame using the built-in dataset iris from Plotly and display its head:

import plotly.express as px

# Load built-in iris dataset
df = px.data.iris()

# Display the head of the dataframe
print(df.head())

Now, let's create a scatter plot and add text labels to the points:

# Create a scatter plot
fig = px.scatter(df, x='sepal_width', y='sepal_length',
                 color='species_id', text='species')

# Position each text label above its point
fig.update_traces(textposition='top center')

# Show the plot
fig.show()

Adjusting Scatter Plot Axis Ranges

In this code example, we will learn how to adjust the axis ranges of a scatter plot using the Plotly library in Python. We will use the built-in iris dataset from the Plotly library.

First, let's import the required libraries and load the dataset.

import plotly.express as px

# Load the Iris dataset
df = px.data.iris()

# Display the first few rows of the dataset
print(df.head())

Now, let's create a scatter plot and adjust the axis ranges.

# Create a scatter plot with custom axis ranges
fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species',
                 title='Scatter Plot with Custom Axis Ranges')

# Set the x-axis range from 2 to 4.5
fig.update_xaxes(range=[2, 4.5])

# Set the y-axis range from 4 to 8
fig.update_yaxes(range=[4, 8])

# Display the scatter plot
fig.show()

In this example, we created a scatter plot of the iris dataset and adjusted the x-axis range from 2 to 4.5 and the y-axis range from 4 to 8. You can customize the ranges according to your needs.

Adding Hover Information to Scatter Plots

In this code example, we will show you how to add hover information to scatter plots using the Plotly library in Python. We will use the built-in dataset "iris" from the plotly.express library.

Code Block 1: Constructing the Pandas DataFrame

import plotly.express as px

# Load the built-in iris dataset
df = px.data.iris()

# Display the first five rows of the dataset
print(df.head())

Code Block 2: Constructing the Scatter Plot with Hover Information

import plotly.graph_objects as go

# Create a scatter plot with custom hover information
fig = go.Figure()

# Define the hover template
hovertemplate = (
    "<b>Species:</b> %{text}<br><b>sepal_width:</b> %{x}<br><b>sepal_length:</b> %{y}<extra></extra>"
)

# Add scatter plot traces for each species
for species, species_data in df.groupby("species"):
    fig.add_trace(
        go.Scatter(
            x=species_data["sepal_width"],
            y=species_data["sepal_length"],
            mode="markers",
            text=[species] * len(species_data),
            name=species,
            hovertemplate=hovertemplate,
        )
    )

# Set plot title and axis labels
fig.update_layout(
    title="Scatter Plot of Sepal Width vs Sepal Length with Hover Information",
    xaxis_title="Sepal Width",
    yaxis_title="Sepal Length",
)

# Show the plot
fig.show()

Multiple Scatter Plots on One Graph

In this code example, we will draw multiple scatter plots on one graph using two of Plotly's built-in datasets, iris and tips, both loaded through Plotly Express.

First, let's import the libraries and load the datasets:

import plotly.graph_objects as go
import plotly.express as px

iris = px.data.iris()
tips = px.data.tips()

print(iris.head())
print(tips.head())

Now that we have the datasets loaded, let's create multiple scatter plots on one graph:

fig = go.Figure()

# Scatter plot for iris dataset
fig.add_trace(go.Scatter(x=iris['sepal_width'], y=iris['sepal_length'],
                         mode='markers', name='Iris'))

# Scatter plot for tips dataset
fig.add_trace(go.Scatter(x=tips['total_bill'], y=tips['tip'],
                         mode='markers', name='Tips'))

# Customize the layout
fig.update_layout(title='Multiple Scatter Plots on One Graph',
                  xaxis_title='X-axis label',
                  yaxis_title='Y-axis label')

fig.show()

In this example, we created two scatter traces from the iris and tips datasets and displayed them on the same graph. The add_trace() method adds each trace to the figure, and the update_layout() method sets the graph's title and axis labels. Because both traces share one pair of axes, the axis labels here are generic placeholders.

Creating a 3D Scatter Plot

In this code example, we will create a 3D scatter plot using Plotly and a built-in dataset from Plotly.

First, let's import the required libraries and load the dataset. We will use the iris dataset for this example.

import plotly.express as px

# Load the iris dataset
df = px.data.iris()

# Display the first few rows of the dataset
print(df.head())

Now, let's create a 3D scatter plot using the scatter_3d function from Plotly Express. We will plot the sepal_width on the x-axis, sepal_length on the y-axis, and petal_length on the z-axis. We will also color the points based on the species column.

# Create a 3D scatter plot
fig = px.scatter_3d(df, x='sepal_width', y='sepal_length', z='petal_length', color='species')

# Show the plot
fig.show()

Exercises

1. Scatter Plot of Iris Dataset with Custom Markers and Size

Instruction

Create a scatter plot of the Iris dataset with custom markers and size. Use the sepal_width column for the x-axis, the sepal_length column for the y-axis, and the species column for coloring the points. Additionally, set the marker symbol based on the species, size the markers based on the petal_length column, and display the species name when hovering over a point.

My Solution

# Your solution goes here

Hint

  1. Import the plotly.express module as px.
  2. Load the Iris dataset using px.data.iris().
  3. Create a scatter plot using px.scatter() with the following parameters:
     - data_frame (the first argument): The dataset to be used for the plot
     - x: The name of the column to be plotted on the x-axis ('sepal_width')
     - y: The name of the column to be plotted on the y-axis ('sepal_length')
     - color: The name of the column to be used for coloring the points ('species')
     - symbol: The name of the column to be used for changing the symbol of the points ('species')
     - size: The name of the column to be used for sizing the points ('petal_length')
     - hover_name: The name of the column to be displayed when hovering over a point ('species')
     - title: The title of the plot ('Scatter Plot of Iris Dataset with Custom Markers and Size')
  4. Show the plot using the show() method of the Figure object.

Solution

import plotly.express as px

# Load the Iris dataset
data = px.data.iris()

# Create a scatter plot of sepal length vs. sepal width with custom markers and size
fig = px.scatter(data, x='sepal_width', y='sepal_length',
                 color='species', symbol='species',
                 size='petal_length', hover_name='species',
                 title='Scatter Plot of Iris Dataset with Custom Markers and Size')

# Show the plot
fig.show()