Box plots
Learn Box plots in SQLPad's Data Science in Action: Interactive Visualization with Plotly and Pandas course with practical examples and guided lessons.
Introduction
In this lesson, we will learn about box plots, which summarize the distribution of a numeric variable through its median, quartiles, and outliers. We will use Plotly, a popular Python library for interactive charts, to create box plots from Pandas DataFrames. Box plots are especially useful for comparing distributions across categories or groups, because the boxes for each group sit side by side on a shared axis. By the end of this lesson, you will be able to create and customize box plots using Plotly and Pandas.
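To make concrete what a single box encodes, here is a minimal Pandas sketch that computes the statistics behind one box. It assumes the common Tukey convention (which, to our understanding, matches Plotly's default): the box spans Q1 to Q3, whiskers reach the most extreme values within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. `box_summary` is a hypothetical helper, not part of Plotly or Pandas, and the sample values are made up for illustration.

```python
import pandas as pd


def box_summary(values, whisker_factor=1.5):
    """Hypothetical helper: the statistics a Tukey-style box plot draws."""
    s = pd.Series(values, dtype="float64")
    q1 = s.quantile(0.25)      # pandas interpolates linearly by default
    median = s.quantile(0.5)
    q3 = s.quantile(0.75)
    iqr = q3 - q1              # interquartile range = height of the box
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = s[(s >= lower_fence) & (s <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside.min(),   # most extreme value still inside the fences
        "whisker_high": inside.max(),
        "outliers": s[(s < lower_fence) | (s > upper_fence)].tolist(),
    }


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the value 30 lies above the upper fence (14.5), so it would appear as an individual outlier point while the upper whisker stops at 9.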
Creating a Simple Box Plot
In this code example, we will create a simple box plot using the built-in dataset iris from Plotly Express. We will use Plotly for plotting the chart and Pandas for handling the dataset.
First, we need to import Plotly Express and Pandas libraries. Then, load the built-in dataset iris and store it in a pandas dataframe named df. Finally, print the first few rows of the dataframe using df.head().
import pandas as pd
import plotly.express as px
# Load the iris dataset
df = px.data.iris()
# Display the first few rows of the dataset
print(df.head())
Now that we have the dataset, we can create a box plot using Plotly Express. We will plot the box plot for the variable sepal_width and group the data by the species column. Finally, we will show the plot using fig.show().
fig = px.box(df, x='species', y='sepal_width', title='Box Plot of Sepal Width by Species')
fig.show()
Customizing Box Plot Colors and Styles
In this code example, we will customize box plot colors and styles with Plotly, again using the built-in iris dataset.
First, let's load the data and display the first few rows.
import plotly.express as px
# Load the Iris dataset
df = px.data.iris()
# Display the first few rows of the dataset
df.head()
Now, let's create a box plot with customized colors and styles.
# Create a box plot with custom colors and styles
fig = px.box(df, x="species", y="sepal_width", points="all",
             color="species",
             title="Customizing Box Plot Colors and Styles",
             labels={"species": "Species", "sepal_width": "Sepal Width"},
             category_orders={"species": ["setosa", "versicolor", "virginica"]},
             template="plotly_dark")
# Update marker properties
fig.update_traces(marker=dict(size=6, line=dict(width=1, color='DarkSlateGrey')))
# Show the plot
fig.show()
Horizontal Box Plots
In this code example, we will create a horizontal box plot with Plotly Express, using its built-in 'iris' dataset.
First, let's import the required libraries and load the dataset.
import plotly.express as px
import pandas as pd
# Load the built-in iris dataset from plotly
df = px.data.iris()
# Display the first 5 rows of the dataset
print(df.head())
Now that we have our dataset, let's create a horizontal box plot.
# Create a horizontal box plot
fig = px.box(df, y="species", x="sepal_width", orientation="h", color="species", title="Horizontal Box Plot - Sepal Width")
# Show the plot
fig.show()
Grouped Box Plots
In this code example, we will create a grouped box plot using the built-in 'tips' dataset that ships with Plotly Express. Pandas holds the data and Plotly draws the plot.
First, let's load the dataset. px.data.tips() already returns a Pandas DataFrame, so no conversion is needed:
import plotly.express as px
# Load the tips dataset (returned as a pandas DataFrame)
df = px.data.tips()
# Preview the dataframe
print(df.head())
Next, let's create the grouped box plot using the Plotly library:
import plotly.express as px
# Create a grouped box plot
fig = px.box(df, x='day', y='total_bill', color='sex')
# Show the plot
fig.show()
Adding Jitter Points to Box Plots
In this code example, we will be adding jitter points to box plots using Plotly and Pandas libraries. We will use the built-in dataset "iris" from the Plotly library.
First, let's import the required libraries and load the dataset:
import plotly.express as px
import pandas as pd
# Load the iris dataset
df = px.data.iris()
print(df.head())
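Now that we have our dataset, let's create a box plot with jitter points. Setting points="all" shows every individual observation as a point, spread sideways by a small offset (jitter) so that equal values do not hide each other: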
# Create a box plot with jitter points
fig = px.box(df, x="species", y="sepal_width", points="all")
# Show the plot
fig.show()
Together, these two code blocks produce a box plot of sepal width by species with every observation shown as a jittered point.
Exercises
1. Box Plots with Plotly
Instruction
Create a box plot of the iris dataset from Plotly Express, with species on the x-axis and sepal width on the y-axis. Customize its appearance by setting the marker symbol to a star, the marker size to 6, and the marker outline width to 2. Also, update the plot title and axis titles.
Hint
- Load the iris dataset using px.data.iris().
- Create a box plot using px.box() with the x-axis set to 'species' and the y-axis set to 'sepal_width'.
- Customize the appearance of the box plot using the update_traces() method.
- Update the plot title and axis titles using the update_layout() method.
- Show the plot using fig.show().
Solution
import plotly.express as px
# Load the iris dataset
data = px.data.iris()
# Create a box plot
fig = px.box(data, x='species', y='sepal_width', color='species', boxmode='group', template='plotly_dark')
# Customize the plot
fig.update_traces(marker=dict(symbol='star', size=6, line=dict(width=2, color='orange')), line=dict(color='orange'))
# Update the layout
fig.update_layout(title='Box Plot of Sepal Width by Species', xaxis_title='Species', yaxis_title='Sepal Width (cm)')
# Show the plot
fig.show()