Lesson

Understanding the structure of Pandas DataFrames and Series


In this lesson, we will learn the basic structure of Pandas DataFrames and Series, the two objects you will use most often when manipulating data and visualizing it with the Plotly library.

What are Pandas DataFrames and Series?

Pandas is an open-source Python library for data manipulation and analysis. It provides two main classes: DataFrame and Series.

  • A DataFrame is a two-dimensional table where data is organized in rows and columns. It is similar to a spreadsheet or a SQL table. You can think of a DataFrame as a collection of Series, where each column represents a separate Series.

  • A Series is a one-dimensional array of indexed data. It can be thought of as a single column in a DataFrame, or as a list in which every element carries an index label.
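
The relationship between the two classes can be sketched as follows. This is a minimal illustration; the variable names, index labels, and sample values are assumptions made for the example and are not part of the lesson's dataset.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
ages = pd.Series([24, 30, 35], index=['alice', 'bob', 'charlie'], name='Age')
cities = pd.Series(['New York', 'San Francisco', 'Los Angeles'],
                   index=['alice', 'bob', 'charlie'], name='City')

# A DataFrame assembled from two Series that share the same index.
people = pd.DataFrame({'Age': ages, 'City': cities})

# Selecting a single column gives back a Series.
age_column = people['Age']
print(type(age_column).__name__)  # Series
print(age_column['bob'])          # 30
```

Because both Series share one index, pandas aligns them row by row when building the DataFrame.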

Creating a Pandas DataFrame

Let's create a simple Pandas DataFrame using a Python dictionary:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
}

df = pd.DataFrame(data)
print(df)
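
Once the DataFrame exists, a few attributes describe its structure. The sketch below rebuilds the same DataFrame so that it runs on its own, then inspects it; the choice of which attributes to show is ours, not the lesson's.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
}
df = pd.DataFrame(data)

print(df.shape)          # (number of rows, number of columns) -> (4, 3)
print(list(df.columns))  # column labels, in dictionary order
print(list(df.index))    # default row labels: 0, 1, 2, 3
print(df.dtypes)         # one data type per column
```

Since no index was supplied, pandas numbers the rows from 0, which is why positional and label-based access coincide for this DataFrame.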

Accessing Data in DataFrames

You can access the data in a DataFrame using column names and row indices:

  • To access a specific column, use the column name inside square brackets []:
ages = df['Age']
print(ages)
  • To access a specific row, use the iloc indexer with the row's integer position (iloc is indexed with square brackets, not called like a function):
first_row = df.iloc[0]
print(first_row)
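
Beyond the two access patterns above, a few closely related ones are worth knowing. The sketch below is an illustration under our own assumptions: using Name as the index and filtering on an age of 25 are choices made for the example, not steps from the lesson.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
})

# A list of column names selects several columns and returns a DataFrame.
subset = df[['Name', 'City']]

# With Name as the index, loc selects by label and iloc by position.
by_name = df.set_index('Name')
bob_by_label = by_name.loc['Bob']
bob_by_position = by_name.iloc[1]

# A boolean condition keeps only the rows where it is True.
older = df[df['Age'] > 25]

print(subset.shape)           # (4, 2)
print(bob_by_label['Age'])    # 30
print(list(older['Name']))    # ['Bob', 'Charlie']
```

Here loc['Bob'] and iloc[1] return the same row, reached once by its label and once by its position.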

Basic Plotting with Plotly and Pandas DataFrames

Now let's create a simple bar plot using Plotly and the DataFrame we created earlier:

import plotly.express as px

fig = px.bar(df, x='Name', y='Age', text='Age')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

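This will create a bar plot with names on the x-axis and ages on the y-axis. The text='Age' argument labels each bar with its value, textposition='outside' places those labels outside the bars, and the uniformtext settings keep all labels at one font size, hiding any label that would not fit at the minimum size of 8.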

Using Built-in Datasets from Plotly

Plotly Express ships sample datasets in its px.data module, which are handy for practice and demonstration. Pandas itself does not bundle datasets, but each px.data loader returns a Pandas DataFrame by default. Let's load the iris dataset and create a scatter plot:

import plotly.express as px

iris = px.data.iris()

fig = px.scatter(iris, x='sepal_width', y='sepal_length', color='species',
                 size='petal_length', hover_data=['petal_width'])
fig.show()

This will create a scatter plot with sepal width on the x-axis, sepal length on the y-axis, and different colors for different species. The size of the points represents the petal length, and when you hover over a point, you will see the petal width.

Summary

In this lesson, we learned what Pandas DataFrames and Series are, how to create a DataFrame and access its columns and rows, and how to build simple plots from a DataFrame with Plotly. This structure underlies the data manipulation and visualization work in the rest of the course.

Exercises

1. Creating a Bar Plot with Pandas DataFrame and Plotly

Instruction

In this exercise, you will create a bar plot using the given Pandas DataFrame and the Plotly library. Follow these steps:

  1. Import the necessary libraries: pandas and plotly.express.
  2. Create a Pandas DataFrame using the given data dictionary.
  3. Use the plotly.express.bar() function to create a bar plot with names on the x-axis and ages on the y-axis.
  4. Update the traces of the plot to display the age values outside the bars.
  5. Update the layout of the plot to set a uniform minimum text size and to hide labels that do not fit.
  6. Finally, display the plot using the fig.show() method.

My Solution

# Your solution goes here

Hint

  1. Import the libraries using import pandas as pd and import plotly.express as px.
  2. Create the DataFrame using df = pd.DataFrame(data).
  3. Create the bar plot using fig = px.bar(df, x='Name', y='Age', text='Age').
  4. Update the traces using fig.update_traces(texttemplate='%{text:.2s}', textposition='outside').
  5. Update the layout using fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide').
  6. Display the plot using fig.show().

Solution

import pandas as pd
import plotly.express as px

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 35, 19],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Seattle']
}

df = pd.DataFrame(data)

fig = px.bar(df, x='Name', y='Age', text='Age')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()