Data Science Focus: NumPy and Pandas (Overview)

This chapter introduces two of the most widely used third-party libraries in Python's data science ecosystem. NumPy provides efficient numerical arrays, and Pandas offers labeled data structures for data manipulation and analysis. This is a high-level overview to prepare you for specialized data roles.

The strength of Python in data science comes from its rich third-party libraries. NumPy (Numerical Python) and Pandas are foundational for handling large datasets efficiently.

1. NumPy (Numerical Python)

NumPy provides the central data structure in scientific computing: the ndarray (N-dimensional array). It is a fast, memory-efficient container for numerical data.

A. The `ndarray`

Unlike Python lists, which can mix types freely, a NumPy array holds elements of a single, uniform type (e.g., all integers or all floats). This uniformity lets NumPy store the data in one contiguous block of memory and operate on the entire array much faster than Python can iterate over a list.

Installation: pip install numpy
Import Convention: import numpy as np  # Standard alias
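
To make the single-type rule concrete, the short sketch below inspects the dtype attribute of a few arrays. The variable names are illustrative only, and the default integer type depends on the platform, so treat the first comment as an example rather than a guarantee.

```python
import numpy as np

# All integers: NumPy picks an integer dtype (commonly int64, platform-dependent)
ints = np.array([1, 2, 3])
print(ints.dtype)

# Mixing ints and floats: every element is upcast to one common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A dtype can also be requested explicitly
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype)     # float32
print(small.itemsize)  # 4 (bytes per element)
```

Because every element has the same size, NumPy can compute where any element lives in memory without inspecting the others.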

B. Vectorization (Optimized Operations)

NumPy allows you to perform operations on entire arrays without writing explicit loops, a concept called vectorization. This delegates the heavy lifting to highly optimized C code under the hood.

import numpy as np

# Creating an array
arr = np.array([1, 2, 3, 4])

# Vectorized operation: add 10 to every element without an explicit loop
result = arr + 10

print(result)
# Output: [11 12 13 14]

# Operations between two arrays (element-wise)
arr2 = np.array([5, 5, 5, 5])
multiplied = arr * arr2

print(multiplied) 
# Output: [ 5 10 15 20]
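
For comparison, here is a minimal sketch that computes the same result twice, once with an explicit Python loop over a list and once with a vectorized NumPy expression. The names values, looped, and vectorized are chosen for illustration.

```python
import numpy as np

values = [1, 2, 3, 4]

# Explicit loop: Python handles one element at a time
looped = [v * 2 + 1 for v in values]

# Vectorized: the whole expression is applied to the array at once
arr = np.array(values)
vectorized = arr * 2 + 1

print(looped)      # [3, 5, 7, 9]
print(vectorized)  # [3 5 7 9]

# Aggregations are vectorized as well
print(arr.sum())   # 10
print(arr.mean())  # 2.5
```

Both versions give the same numbers; the difference is that the vectorized form runs its loop in compiled code, which tends to matter more as arrays grow.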

C. Multidimensional Arrays

NumPy easily handles multi-dimensional arrays, which are crucial for linear algebra and machine learning.

# Create a 2x3 array (2 rows, 3 columns)
matrix = np.array([
    [1, 2, 3], 
    [4, 5, 6]
])

print(matrix.shape) # Output: (2, 3)
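
Building on the same 2x3 array, the sketch below shows a few operations that come up often with two-dimensional data: indexing by row and column, summing along an axis, and a matrix product. It is a brief illustration, not a full tour of NumPy's linear algebra features.

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# Indexing: [row, column]
print(matrix[0, 1])  # 2
print(matrix[:, 0])  # [1 4]  (first column)

# Aggregating along an axis
print(matrix.sum(axis=0))  # [5 7 9]   (column sums)
print(matrix.sum(axis=1))  # [ 6 15]   (row sums)

# Transpose and matrix multiplication
print(matrix.T.shape)  # (3, 2)
product = matrix @ matrix.T
print(product)
# [[14 32]
#  [32 77]]
```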

2. Pandas (Data Analysis Library)

Pandas builds on NumPy and provides highly intuitive, labeled data structures designed for manipulating, cleaning, and analyzing tabular data (like spreadsheets or SQL tables).

A. Core Data Structures

Pandas introduces two primary structures:

Series: A one-dimensional array with explicit labels (an index). Think of it as a single column in a spreadsheet.
DataFrame (Crucial): A two-dimensional table with both row and column labels (indices). This is the most common object for data analysis.

Installation: pip install pandas
Import Convention: import pandas as pd  # Standard alias
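
As a small illustration of the Series structure described above, the sketch below builds one with string labels as its index. The data is made up for the example.

```python
import pandas as pd

# A Series: one-dimensional values plus an explicit index of labels
ages = pd.Series([25, 30, 22], index=['Alice', 'Bob', 'Charlie'], name='Age')

# Look up a value by its label rather than its position
print(ages['Bob'])    # 30

# Aggregations work on the values; idxmax returns the matching label
print(ages.max())     # 30
print(ages.idxmax())  # Bob
```

A DataFrame can be thought of as several such Series sharing one index, one Series per column.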

B. Creating and Inspecting a DataFrame

DataFrames are often created by loading data from external sources (CSV files, Excel workbooks, SQL queries), but they can also be created from dictionaries or NumPy arrays.

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['NY', 'LA', 'SF']
}
df = pd.DataFrame(data)

# Accessing a column (Series)
ages = df['Age']

# Viewing the first few rows (essential for data inspection)
print(df.head())
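
The dictionary route is shown above; the following sketch covers the NumPy route and a few common inspection attributes. The column names 'a', 'b', and 'c' are placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

# Creating a DataFrame from a 2-D NumPy array, supplying column labels
values = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(values, columns=['a', 'b', 'c'])

# Basic inspection
print(df_from_array.shape)          # (2, 3)
print(list(df_from_array.columns))  # ['a', 'b', 'c']
print(df_from_array.dtypes)         # data type of each column
print(df_from_array.describe())     # summary statistics per numeric column
```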

C. Data Manipulation (Cleaning and Filtering)

Pandas excels at letting you filter, aggregate, and reshape data using simple, expressive syntax.

# Filtering the DataFrame
# Selects all rows where the 'Age' column value is greater than 25
older_than_25 = df[df['Age'] > 25]

print("\nOlder than 25:")
print(older_than_25)
# Output:
#     Name  Age City
# 1    Bob   30   LA
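
Filtering is only one of the operations mentioned above. The sketch below adds a combined filter, a simple aggregation, and a sort. It rebuilds the example DataFrame with one extra, hypothetical row ('Dana') so that grouping by city has something to combine.

```python
import pandas as pd

# Same example data plus one hypothetical row ('Dana') for illustration
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 22, 35],
    'City': ['NY', 'LA', 'SF', 'NY'],
})

# Combining conditions: wrap each in parentheses and join with & (and) or | (or)
ny_over_22 = df[(df['Age'] > 22) & (df['City'] == 'NY')]
print(ny_over_22['Name'].tolist())  # ['Alice', 'Dana']

# Aggregating: average age per city
mean_age_by_city = df.groupby('City')['Age'].mean()
print(mean_age_by_city.to_dict())   # {'LA': 30.0, 'NY': 30.0, 'SF': 22.0}

# Sorting: oldest first
oldest_first = df.sort_values('Age', ascending=False)
print(oldest_first['Name'].tolist())  # ['Dana', 'Bob', 'Alice', 'Charlie']
```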
