Introduction to Pandas

Learn Pandas fundamentals in SQLPad's Python Pandas Mastery course with practical examples and guided lessons.

Welcome to the first lesson of the Python Pandas Mastery: An Interactive and Practical Guide to Data Analysis course! In this lesson, we'll introduce you to Pandas, a powerful library for data manipulation and analysis.

Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for Python. It is built on top of the NumPy library and provides two key data structures: Series and DataFrame. Pandas is designed to work with a wide variety of data sources and formats, making it an essential tool for any data analyst or data scientist working with Python.

In this lesson, we'll give you a brief overview of Pandas, explain its key features and benefits, and provide you with some examples to demonstrate its capabilities. We'll also discuss how the Pandas library fits in with the other lessons in this course. Get ready to dive into the world of Python Pandas!

For this course, we've prepared an online Python code editor, so you can run almost all of your Python code in the browser without installing anything. Simply click the play button to execute the code.

Key Features of Pandas

Code Example 1: Creating a DataFrame manually

# Importing pandas library
import pandas as pd

# Creating a simple data frame manually
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

# Displaying the data frame
print(df)

Code Example 2: Loading a built-in dataset

# Importing Plotly Express, which bundles a few sample datasets
import plotly.express as px

# Loading the 'iris' dataset
iris = px.data.iris()

# Displaying the first 5 rows of the dataset
print(iris.head())

Series and DataFrame: The Core Data Structures

In this section, we will learn about the two main data structures in Pandas: Series and DataFrame. We will create both from scratch and look at how to read data back out of them.

Creating a Pandas Series

A Pandas Series is a one-dimensional labeled array that can hold any data type.

import pandas as pd

# Create a simple Pandas Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)
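
The word "labeled" is easier to see with explicit labels. The sketch below assumes a set of made-up labels ('a' through 'e', not part of the lesson's data) and shows that a value can then be looked up by its label rather than by its position.

```python
import pandas as pd

# Hypothetical labels 'a'..'e', chosen only for illustration
labeled = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(labeled['c'])    # look up a value by its label -> 3
print(labeled.index)   # the labels attached to the values
print(labeled.dtype)   # the single data type shared by all values
```

When no index is passed, as in the example above, Pandas falls back to the default integer labels 0, 1, 2, and so on.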

Creating a Pandas DataFrame

A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different data types.

# Create a simple Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)
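
To see what "columns of potentially different data types" means in practice, a short sketch can inspect the shape and the per-column types of the same kind of DataFrame. The variable name `people` is only an illustrative choice.

```python
import pandas as pd

# Same sample data as above, stored under a hypothetical name
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

print(people.shape)          # (number of rows, number of columns)
print(list(people.columns))  # the column labels
print(people.dtypes)         # one data type per column
```

The 'Age' column is stored as integers, while 'Name' and 'City' hold text; the exact type name printed for text columns may differ between Pandas versions.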

Accessing DataFrame Columns

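You can access a column of a DataFrame with bracket notation (passing the column name as a string) or with dot notation (treating the column name as an attribute). Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.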

# Access the 'Name' column
name_column = df['Name']
print(name_column)

# Access the 'Age' column using dot notation
age_column = df.Age
print(age_column)
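
The limits of dot notation can be sketched with two awkward column names. Both names below are hypothetical and chosen only to trigger the problem: one contains a space, and the other matches the name of the built-in `DataFrame.count` method.

```python
import pandas as pd

# Hypothetical column names chosen to show where dot notation breaks down
scores = pd.DataFrame({
    'first name': ['Alice', 'Bob'],  # contains a space
    'count': [3, 7],                 # same name as the DataFrame.count method
})

print(scores['first name'])    # bracket notation handles any column name
print(scores['count'])         # bracket notation returns the column

print(callable(scores.count))  # dot notation finds the method, not the column
```

For this reason, bracket notation is usually the safer default, with dot notation kept for quick interactive exploration.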

Selecting Rows from a DataFrame

You can select rows from a DataFrame using slicing or by index label.

# Select the first row using slicing
first_row = df[:1]
print(first_row)

# Select the row with index label 2
row_2 = df.loc[2]
print(row_2)
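
The two approaches return different kinds of objects, which the following sketch makes visible. It rebuilds a small DataFrame under the hypothetical name `df_rows` so that it can run on its own.

```python
import pandas as pd

# Small stand-alone DataFrame (hypothetical name) mirroring the data above
df_rows = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

first_row = df_rows[:1]   # slicing keeps the result two-dimensional
row_2 = df_rows.loc[2]    # a single label returns one row as a Series

print(type(first_row).__name__)  # DataFrame
print(type(row_2).__name__)      # Series
print(row_2['Name'])             # Charlie
```

A slice always gives back a DataFrame, even for a single row, whereas `loc` with one label gives back a Series whose index is the column names.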

Loading a Built-in Dataset

Pandas itself does not ship with sample datasets, but other libraries do. Here, we load the 'iris' dataset from scikit-learn, wrap its feature values in a DataFrame, and display the first 5 rows.

# Load the iris dataset
from sklearn import datasets
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Print the first 5 rows of the iris dataset
print(iris_df.head())

Now, you should have a basic understanding of Pandas Series and DataFrame, and how to create and manipulate them. Practice these concepts in the online playground provided with the course.

Basic Data Manipulation

In this code example, we will explore basic data manipulation techniques using Pandas. We'll create a simple DataFrame, perform some operations, and then update the DataFrame.

# Importing Pandas
import pandas as pd

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)

# Displaying the DataFrame
print("Original DataFrame:")
print(df)

# Adding a new column 'Salary'
df['Salary'] = [70000, 80000, 90000, 100000]

# Updating Age column by adding 2 years
df['Age'] = df['Age'] + 2

# Renaming a column
df.rename(columns={'City': 'Location'}, inplace=True)

# Removing a column
df.drop('Location', axis=1, inplace=True)

# Displaying the updated DataFrame
print("\nUpdated DataFrame:")
print(df)

This code will output the following:

Original DataFrame:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    David   40        Chicago

Updated DataFrame:
      Name  Age  Salary
0    Alice   27   70000
1      Bob   32   80000
2  Charlie   37   90000
3    David   42  100000

In this example, we created a DataFrame, added a new column, updated the values in an existing column, renamed a column, and removed a column. These are some of the basic data manipulation techniques available in Pandas.
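
The same steps can also be written without `inplace=True`, so that the original DataFrame is left untouched. The sketch below is one possible alternative, not the course's prescribed style; the names `original` and `updated` are illustrative.

```python
import pandas as pd

original = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago'],
})

# Each method returns a new DataFrame, so the calls can be chained
updated = (
    original
    .assign(Salary=[70000, 80000, 90000, 100000])  # add a column
    .assign(Age=lambda d: d['Age'] + 2)            # update a column
    .rename(columns={'City': 'Location'})          # rename a column
    .drop(columns='Location')                      # remove a column
)

print(updated)
print(original)  # still has the original Age values and the City column
```

Chaining keeps each intermediate result out of the way and makes it easy to compare the data before and after the changes.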