Lesson
Understanding the structure of Pandas DataFrames and Series
Learn about the structure of Pandas DataFrames and Series in SQLPad's Data Science in Action course, with practical examples and guided lessons.
In this lesson, we will learn about the basic structure of Pandas DataFrames and Series, which are essential components in data manipulation and visualization using the Plotly library.
What are Pandas DataFrames and Series?
Pandas is a powerful open-source Python library for data manipulation and analysis. Its two main data structures are the DataFrame and the Series.
- A DataFrame is a two-dimensional table where data is organized in rows and columns. It is similar to a spreadsheet or a SQL table. You can think of a DataFrame as a collection of Series, where each column represents a separate Series.
- A Series is a one-dimensional array of indexed data. It can be thought of as a single column in a DataFrame or a simple list of data in which every value has a label.
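To make the relationship between the two concrete, here is a minimal sketch. The variable names and sample values are illustrative assumptions, not part of the lesson's dataset. It builds two Series, combines them into a DataFrame, and then pulls one column back out as a Series.

```python
import pandas as pd

# Two one-dimensional Series sharing the same default integer index (0, 1, 2)
names = pd.Series(['Alice', 'Bob', 'Charlie'], name='Name')
ages = pd.Series([24, 30, 35], name='Age')

# A DataFrame can be assembled from Series; each one becomes a column
people = pd.DataFrame({'Name': names, 'Age': ages})

# Selecting a single column returns a Series again
age_column = people['Age']

print(type(people).__name__)      # DataFrame
print(type(age_column).__name__)  # Series
print(people.shape)               # (3, 2)
```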
Creating a Pandas DataFrame
Let's create a simple Pandas DataFrame using a Python dictionary:
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
}
df = pd.DataFrame(data)
print(df)
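Once a DataFrame exists, a few standard attributes describe its structure. The sketch below rebuilds the same DataFrame and inspects its shape, column labels, row labels, and per-column data types; the helper variable names are only for illustration.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
}
df = pd.DataFrame(data)

n_rows, n_cols = df.shape          # (rows, columns)
column_names = list(df.columns)    # column labels, in dictionary order
row_labels = list(df.index)        # default integer index: 0, 1, 2, 3

print(n_rows, n_cols)   # 4 3
print(column_names)     # ['Name', 'Age', 'City']
print(row_labels)       # [0, 1, 2, 3]
print(df.dtypes)        # one data type per column
```

Because no index was passed to pd.DataFrame, the rows receive the default integer labels starting at 0.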
Accessing Data in DataFrames
You can access the data in a DataFrame using column names and row indices:
- To access a specific column, use the column name inside square brackets []:
ages = df['Age']
print(ages)
- To access a specific row by its position, use the iloc[] indexer with the row's integer position (iloc is an indexer, not a function, so it takes square brackets):
first_row = df.iloc[0]
print(first_row)
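The two access patterns above extend naturally to a few related cases. The following sketch, which goes slightly beyond what the lesson shows, selects several columns at once and contrasts position-based iloc[] with label-based loc[]. Using the Name column as the index is an assumption made only to give the rows meaningful labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
})

# A list of column names returns a smaller DataFrame, not a Series
subset = df[['Name', 'Age']]

# iloc selects by integer position; loc selects by index label
by_name = df.set_index('Name')        # row labels are now the names
first_by_position = by_name.iloc[0]   # first row, whatever its label
bob_by_label = by_name.loc['Bob']     # row whose label is 'Bob'

# A single cell: row label and column name together
bob_city = by_name.loc['Bob', 'City']

print(subset.shape)            # (4, 2)
print(first_by_position.name)  # Alice
print(bob_by_label['Age'])     # 30
print(bob_city)                # San Francisco
```

A single row comes back as a Series whose index is made up of the column names, which is why bob_by_label['Age'] works.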
Basic Plotting with Plotly and Pandas DataFrames
Now let's create a simple bar plot using Plotly and the DataFrame we created earlier:
import plotly.express as px
fig = px.bar(df, x='Name', y='Age', text='Age')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()
This will create a bar plot with names on the x-axis and ages on the y-axis.
Using Built-in Datasets from Plotly
Plotly Express ships with several built-in datasets, returned as Pandas DataFrames, that can be used for practice and demonstration. (Pandas itself does not bundle sample datasets.) Let's use a built-in dataset from Plotly, called iris, and create a scatter plot:
import plotly.express as px
iris = px.data.iris()
fig = px.scatter(iris, x='sepal_width', y='sepal_length', color='species',
                 size='petal_length', hover_data=['petal_width'])
fig.show()
This will create a scatter plot with sepal width on the x-axis, sepal length on the y-axis, and different colors for different species. The size of the points represents the petal length, and when you hover over a point, you will see the petal width.
Summary
In this lesson, we covered the basics of Pandas DataFrames and Series: how to create a DataFrame, how to access its columns and rows, and how to build simple plots from it with Plotly. Understanding this structure is essential for data manipulation and visualization with the Plotly library.
Exercises
1. Creating a Bar Plot with Pandas DataFrame and Plotly
Instruction
In this exercise, you will create a bar plot using the given Pandas DataFrame and the Plotly library. Follow these steps:
- Import the necessary libraries: pandas and plotly.express.
- Create a Pandas DataFrame using the given data dictionary.
- Use the plotly.express.bar() function to create a bar plot with names on the x-axis and ages on the y-axis.
- Update the traces of the plot to display the age values outside the bars.
- Update the layout of the plot to adjust the text size and mode.
- Finally, display the plot using the fig.show() method.
My Solution
# Your solution goes here
Hint
- Import the libraries using import pandas as pd and import plotly.express as px.
- Create the DataFrame using df = pd.DataFrame(data).
- Create the bar plot using fig = px.bar(df, x='Name', y='Age', text='Age').
- Update the traces using fig.update_traces(texttemplate='%{text:.2s}', textposition='outside').
- Update the layout using fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide').
- Display the plot using fig.show().
Solution
import pandas as pd
import plotly.express as px
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
}
df = pd.DataFrame(data)
fig = px.bar(df, x='Name', y='Age', text='Age')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()