Introduction to Pandas
Learn Pandas fundamentals in SQLPad's Python Pandas Mastery course with practical examples and guided lessons.
Welcome to the first lesson of the Python Pandas Mastery: An Interactive and Practical Guide to Data Analysis course! In this lesson, we'll introduce you to Pandas, a powerful library for data manipulation and analysis.
Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for Python. It is built on top of the NumPy library and provides two key data structures: Series and DataFrame. Pandas is designed to work with a wide variety of data sources and formats, making it an essential tool for any data analyst or data scientist working with Python.
In this lesson, we'll give you a brief overview of Pandas, explain its key features and benefits, and provide you with some examples to demonstrate its capabilities. We'll also discuss how the Pandas library fits in with the other lessons in this course. Get ready to dive into the world of Python Pandas!
For this course, we've prepared an online Python code editor, so you can run almost all of your Python code in the browser without installing anything. Simply click the play button to execute the code.
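As a small illustration of reading data from one common format, here is a minimal sketch that loads CSV text into a DataFrame. The CSV content and the variable names `csv_text` and `people` are made up for this example; in practice you would usually pass a file path to `pd.read_csv` instead of an in-memory string.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory so no file is needed
csv_text = """name,age,city
Alice,25,New York
Bob,30,San Francisco
"""

# read_csv accepts a file path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))

print(people)
print(people.shape)  # (number of rows, number of columns)
```

The column names come from the first line of the CSV text, and the numeric `age` column is parsed as integers without any extra configuration.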
Key Features of Pandas
Code Example 1: Creating a DataFrame manually
# Importing the pandas library
import pandas as pd
# Creating a simple data frame manually
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Displaying the data frame
print(df)
Code Example 2: Loading a built-in dataset
# Importing Plotly Express, which ships with sample datasets
import plotly.express as px
# Loading the 'iris' dataset
iris = px.data.iris()
# Displaying the first 5 rows of the dataset
print(iris.head())
Series and DataFrame: The Core Data Structures
In this section, we will learn about the two main data structures in Pandas: Series and DataFrame. We will create both from scratch and work with their properties.
Creating a Pandas Series
A Pandas Series is a one-dimensional labeled array that can hold any data type.
import pandas as pd
# Create a simple Pandas Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)
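The "labeled" part of the definition above is easier to see with custom labels. The following is a minimal sketch; the weekday labels, the temperature values, and the name `temperatures` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration
temperatures = pd.Series([21.5, 19.0, 24.3], index=["Mon", "Tue", "Wed"])

print(temperatures["Tue"])   # look up a value by its label
print(temperatures.iloc[0])  # look up a value by its position
print(temperatures.dtype)    # the data type shared by the values
print(temperatures.mean())   # built-in aggregation over the values
```

When no `index` is given, as in the example above with `pd.Series(data)`, Pandas assigns the default integer labels 0, 1, 2, and so on.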
Creating a Pandas DataFrame
A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different data types.
# Create a simple Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)
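To see what "columns of potentially different data types" means in practice, you can inspect a DataFrame's shape, column names, and per-column types. This is a small sketch that rebuilds the same data under the illustrative name `people`.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

print(people.shape)             # (number of rows, number of columns)
print(people.columns.tolist())  # column labels as a plain list
print(people.dtypes)            # one data type per column
```

The `Age` column is stored with an integer type, while the text columns use a string or generic object type, depending on the Pandas version.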
Accessing DataFrame Columns
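You can access a DataFrame column with bracket notation and the column name, or with dot (attribute) notation. Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so bracket notation is the safer default.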
# Access the 'Name' column
name_column = df['Name']
print(name_column)
# Access the 'Age' column using dot notation
age_column = df.Age
print(age_column)
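The sketch below shows where dot notation falls short. The column names `Home City` and `count` are hypothetical, picked because one contains a space and the other matches the name of a DataFrame method.

```python
import pandas as pd

# Hypothetical columns: one name contains a space, one matches a DataFrame method
df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Home City": ["New York", "Chicago"],
    "count": [3, 5],
})

print(df["Home City"])        # bracket notation handles any column name
print(df["count"])            # returns the column
print(callable(df.count))     # True: df.count is the DataFrame method, not the column
print(df[["Name", "count"]])  # a list of names selects several columns as a DataFrame
```

Passing a single name in brackets returns a Series, while passing a list of names returns a DataFrame.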
Selecting Rows from a DataFrame
You can select rows from a DataFrame by slicing, which returns a DataFrame, or by index label with .loc, which returns a single row as a Series.
# Select the first row using slicing
first_row = df[:1]
print(first_row)
# Select the row with index label 2
row_2 = df.loc[2]
print(row_2)
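With the default integer index, labels and positions happen to coincide, which can hide the difference between selecting by label and selecting by position. Here is a minimal sketch using hypothetical string labels, with `.loc` for labels and `.iloc` for positions.

```python
import pandas as pd

# Hypothetical string labels make the label/position difference visible
df = pd.DataFrame(
    {"Age": [25, 30, 35]},
    index=["alice", "bob", "charlie"],
)

print(df.loc["bob"])          # select by label
print(df.iloc[1])             # select by position (the same row here)
print(df.loc["alice":"bob"])  # label slices include the end label
print(df.iloc[0:1])           # position slices exclude the end position
```

Note the asymmetry in slicing: the `.loc` slice returns two rows, while the `.iloc` slice returns one.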
Loading a Built-in Dataset
Pandas itself does not ship with sample datasets, but other libraries such as scikit-learn do. Here, we will load the 'iris' dataset from scikit-learn, wrap it in a DataFrame, and display its first 5 rows.
# Load the iris dataset
from sklearn import datasets
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Print the first 5 rows of the iris dataset
print(iris_df.head())
Now, you should have a basic understanding of Pandas Series and DataFrame, and how to create and manipulate them. Practice these concepts in the online playground provided with the course.
Basic Data Manipulation
In this code example, we will explore basic data manipulation techniques using Pandas. We'll create a simple DataFrame, perform some operations, and then update the DataFrame.
# Importing Pandas
import pandas as pd
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print("Original DataFrame:")
print(df)
# Adding a new column 'Salary'
df['Salary'] = [70000, 80000, 90000, 100000]
# Updating Age column by adding 2 years
df['Age'] = df['Age'] + 2
# Renaming a column
df.rename(columns={'City': 'Location'}, inplace=True)
# Removing a column
df.drop('Location', axis=1, inplace=True)
# Displaying the updated DataFrame
print("\nUpdated DataFrame:")
print(df)
This code will output the following:
Original DataFrame:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    David   40        Chicago

Updated DataFrame:
      Name  Age  Salary
0    Alice   27   70000
1      Bob   32   80000
2  Charlie   37   90000
3    David   42  100000
In this example, we created a DataFrame, added a new column, updated the values in an existing column, renamed a column, and removed a column. These are some of the basic data manipulation techniques available in Pandas.
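The example above modifies `df` in place. As an alternative sketch, the same steps can be written as a chain of methods that each return a new DataFrame and leave the original untouched. The name `updated` and the shortened two-row data are illustrative choices, not part of the lesson's code.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "City": ["New York", "San Francisco"],
})

# Each method returns a new DataFrame, so the calls can be chained
updated = (
    df.assign(Salary=[70000, 80000], Age=df["Age"] + 2)
      .rename(columns={"City": "Location"})
      .drop(columns="Location")
)

print(updated)
print(df.columns.tolist())  # the original still has Name, Age, City
```

Which style to use is largely a matter of preference; the chained form makes it easier to keep the original data around for comparison.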