Chapter 02 - Data Handling with Pandas I

CBSE Class 12 Informatics Practices

2.1 Introduction to Python Libraries

2.1.1 Context Set by NCERT

NCERT opens this chapter by connecting it to what students already know about Python programming.

Up to now, Python programs written by students mainly:

  • Took input from the keyboard
  • Processed it using variables, loops, and conditions
  • Displayed output on the screen

Such programs are suitable for small amounts of data.

However, in real-world situations:

  • Data is very large
  • Data is often stored in files or databases
  • Data needs to be analysed, not just displayed

This creates the need for specialised tools.


2.1.2 What is a Python Library?

NCERT defines a library as:

A collection of pre-written code that can be reused to perform common tasks.

In simpler terms:

  • A library contains ready-made functions and classes
  • These save time and effort
  • The programmer does not need to write everything from scratch

Much of Python's power comes from the large collection of libraries, both built-in and external, that programs can import.


2.1.3 Why Python Libraries are Required for Data Handling

NCERT highlights a key limitation of basic Python:

  • Lists, tuples, and dictionaries are not efficient for:

    • Large datasets
    • Tabular data
    • Statistical analysis
    • Data visualisation

For example:

  • Calculating the average of thousands of values
  • Filtering records based on conditions
  • Working with rows and columns like a table

Doing this using plain Python would be:

  • Slow
  • Complex
  • Error-prone

Therefore, Python libraries are used to:

  • Handle large volumes of data
  • Provide efficient data structures
  • Perform data analysis easily
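
As a rough sketch of the difference, the snippet below performs two of the tasks listed above (an average and a filter) first in plain Python and then with Pandas. The marks list is made-up sample data, not taken from NCERT.

```python
import pandas as pd

# Hypothetical sample data: marks of six students
marks = [72, 85, 90, 64, 78, 95]

# Plain Python: write the arithmetic and the loop yourself
average_plain = sum(marks) / len(marks)
above_80_plain = [m for m in marks if m > 80]

# Pandas: the same tasks use built-in, vectorised operations
s = pd.Series(marks)
average_pandas = s.mean()
above_80_pandas = s[s > 80]

print(average_plain, average_pandas)
print(above_80_plain)
print(above_80_pandas.tolist())
```

For six values the two approaches are equally easy. The advantage of the library shows up when the data has thousands of rows and several columns.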

2.1.4 Important Python Libraries for Data Handling (NCERT Focus)

NCERT introduces three major libraries that are used extensively in data analysis:

  1. NumPy
  2. Pandas
  3. Matplotlib

This chapter focuses primarily on Pandas, while the other two are used for support.


2.1.5 Brief Introduction to NumPy

NCERT mentions NumPy as a foundational library.

Key points explained by NCERT:

  • NumPy stands for Numerical Python
  • It provides support for:

    • Large multi-dimensional arrays
    • Mathematical and statistical operations
  • Operations on NumPy arrays are usually much faster than the same work done with loops over ordinary Python lists

NumPy arrays form the base on which Pandas is built.

At this stage, NCERT does not go into NumPy details; it is introduced only for context.


2.1.6 Introduction to Pandas (Core of This Chapter)

NCERT now introduces Pandas, the main library of this chapter.

Key points from NCERT:

  • Pandas is an open-source Python library
  • It is used for:

    • Data analysis
    • Data manipulation
    • Data cleaning
  • It is especially suited for tabular data

Tabular data means:

  • Data arranged in rows and columns
  • Similar to:

    • Spreadsheet tables
    • Database tables

2.1.7 What Kind of Data Can Pandas Handle?

NCERT explains that Pandas can work with data from:

  • CSV files
  • Text files
  • Excel sheets
  • Databases

It allows:

  • Reading data from external sources
  • Processing and analysing data
  • Writing processed data back to files

This makes Pandas extremely useful in real-life applications.


2.1.8 Data Structures Provided by Pandas

NCERT introduces two core data structures of Pandas:

  1. Series
  2. DataFrame

These are introduced by name only in this section.

  • A Series is a one-dimensional data structure
  • A DataFrame is a two-dimensional data structure

Detailed discussion of these structures begins in the next sections.


2.1.9 Why Pandas is Important (NCERT Intent)

By the end of this section, NCERT wants students to understand that:

  • Pandas makes data handling:

    • Easier
    • Faster
    • More efficient
  • It bridges the gap between:

    • Raw data
    • Meaningful information
  • It is widely used in:

    • Data science
    • Business analytics
    • Research

This section sets the conceptual foundation for all subsequent sections.


2.1.10 Key Learning Outcome of Section 2.1

After studying this section, a student should be able to:

  • Explain what a Python library is
  • State why Python libraries are required
  • Name important Python libraries for data handling
  • Understand the role of Pandas in data analysis
  • Identify the two main Pandas data structures

Section 2.1 Completed

We have now fully and sequentially covered:

  • Introduction to Python libraries
  • Need for libraries in data handling
  • Role of NumPy and Pandas
  • Purpose of Pandas in this chapter

2.2 Series

2.2.1 What is a Pandas Series? (NCERT Definition Explained)

NCERT defines a Series as:

A one-dimensional labelled array capable of holding data of any type.

Let us break this down carefully.

A Pandas Series:

  • Is one-dimensional → data is stored in a single column
  • Stores a sequence of values
  • Each value is associated with a label, called an index

So, unlike a Python list:

  • Every element in a Series has an explicit index
  • The index helps in identifying and accessing data

2.2.2 Why Do We Need Series When Python Lists Exist?

NCERT implicitly motivates this by pointing out limitations of lists.

Python lists:

  • Do not have labelled indices
  • Do not support vectorised (element-wise) arithmetic, so a loop is needed
  • Are not ideal for data analysis

Pandas Series:

  • Support automatic indexing
  • Support fast mathematical operations
  • Work seamlessly with missing values
  • Integrate well with tabular data

Thus, a Series is designed specifically for data analysis, not just storage.
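
The contrast can be seen in a small sketch. The prices list is hypothetical sample data.

```python
import pandas as pd

prices = [100, 200, 300]

# With a list, * means repetition, not arithmetic
repeated = prices * 2

# Element-wise work on a list needs an explicit loop or comprehension
doubled_list = [p * 2 for p in prices]

# A Series applies the operation to every element in one step
doubled_series = pd.Series(prices) * 2

print(repeated)
print(doubled_list)
print(doubled_series.tolist())
```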


2.2.3 Importing Pandas (Mandatory Step)

Before creating a Series, NCERT shows that we must import the Pandas library.

Standard convention:

import pandas as pd

Explanation:

  • pd is an alias (short name)
  • This is used throughout Pandas programs
  • Pandas functions and classes, such as pd.Series, are accessed through pd

2.2.4 Creating a Series from a List

This is the first and simplest method shown by NCERT.

Example 1: Creating a Series from a list

import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

Output

0    10
1    20
2    30
3    40
dtype: int64

Explanation

  • Pandas automatically assigns indices starting from 0
  • Values are stored vertically
  • dtype shows the data type of elements

Important NCERT observation:

If index is not provided, Pandas assigns default integer indices.


2.2.5 Accessing Values of a Series

Accessing using index number

print(s[0])
print(s[2])

Output:

10
30

Explanation:

  • With the default index, s[0] looks just like list indexing
  • But index values are labels, not just positions
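
The difference between a label and a position becomes visible when the integer labels do not start at 0. The sketch below uses a made-up Series and the standard Pandas accessors loc (by label) and iloc (by position), which the syllabus treats in more detail later.

```python
import pandas as pd

# Hypothetical Series whose integer labels do not start at 0
s_roll = pd.Series([10, 20, 30], index=[101, 102, 103])

by_label = s_roll.loc[102]      # look up the label 102
by_position = s_roll.iloc[0]    # take the first element, whatever its label

print(by_label)
print(by_position)
```

With this index, s_roll[0] would raise a KeyError, because 0 is not one of the labels.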

2.2.6 Creating a Series with Custom Index

NCERT now shows how to assign meaningful labels.

Example 2: Series with custom index

import pandas as pd

data = [85, 90, 78]
index = ['Maths', 'Physics', 'Chemistry']

marks = pd.Series(data, index=index)
print(marks)

Output

Maths        85
Physics      90
Chemistry    78
dtype: int64

Explanation

  • Each value is now linked to a subject name
  • This makes data more readable
  • Index values can be:

    • Strings
    • Numbers
    • Any immutable type

2.2.7 Accessing Series Elements Using Custom Index

print(marks['Maths'])
print(marks['Chemistry'])

Output:

85
78

This is a major advantage of Series over lists.


2.2.8 Creating a Series from a Dictionary

NCERT introduces this as a very important method.

Example 3: Series from dictionary

import pandas as pd

data = {'Apple': 100, 'Banana': 40, 'Orange': 60}
fruits = pd.Series(data)
print(fruits)

Output

Apple     100
Banana     40
Orange     60
dtype: int64

Explanation

  • Dictionary keys become index
  • Dictionary values become data
  • Order follows insertion order of dictionary

This method is widely used in real-world data processing.
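
A dictionary can also be combined with an explicit index. In the sketch below, only the requested labels are kept, and a label that is not a key in the dictionary ('Mango', chosen here for illustration) gets NaN.

```python
import pandas as pd

data = {'Apple': 100, 'Banana': 40, 'Orange': 60}

# Ask for two labels; 'Mango' is not a key in the dictionary
selected = pd.Series(data, index=['Apple', 'Mango'])
print(selected)
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64.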


2.2.9 Series with Scalar Value

NCERT shows that a Series can also be created using a single value.

Example 4: Scalar value with index

import pandas as pd

s = pd.Series(5, index=['a', 'b', 'c', 'd'])
print(s)

Output

a    5
b    5
c    5
d    5
dtype: int64

Explanation

  • Same value is repeated for all index labels
  • Useful for initializing data

2.2.10 Attributes of a Series

NCERT introduces some basic attributes.

Example

print(marks.index)
print(marks.values)
print(marks.dtype)

Output:

Index(['Maths', 'Physics', 'Chemistry'], dtype='object')
[85 90 78]
int64

Explanation

  • index → labels
  • values → actual data (as NumPy array)
  • dtype → data type

2.2.11 Mathematical Operations on Series

One of the strongest features of Series is vectorised operations.

Example: Adding a number

print(marks + 5)

Output:

Maths        90
Physics      95
Chemistry    83
dtype: int64

Example: Multiplying values

print(marks * 2)

Explanation

  • Operation is applied to each element
  • No loops required
  • Faster and cleaner than looping over a Python list
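
The same idea extends to comparisons. As a small sketch using the marks Series from Example 2 (recreated here so the snippet runs on its own), multiplication and a comparison are both applied element by element:

```python
import pandas as pd

marks = pd.Series([85, 90, 78], index=['Maths', 'Physics', 'Chemistry'])

doubled = marks * 2      # every value multiplied by 2
passed = marks >= 80     # element-wise comparison gives True/False values

print(doubled)
print(passed)
```

The pass mark of 80 is only an assumption for the example.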

2.2.12 Operations Between Two Series

NCERT now shows operations between Series objects.

Example

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

print(s1 + s2)

Output:

a    11
b    22
c    33
dtype: int64

Important NCERT Concept: Index Alignment

If indices do not match, Pandas aligns data by index.

Example

s1 = pd.Series([10, 20], index=['a', 'b'])
s2 = pd.Series([1, 2], index=['b', 'c'])

print(s1 + s2)

Output:

a     NaN
b    21.0
c     NaN
dtype: float64

Explanation:

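  • Only index 'b' matches, so only that pair of values is added
  • Labels present in just one Series give NaN, which is why the dtype changes to float64
  • Python lists and NumPy arrays do not align values by label in this way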
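
When NaN is not wanted for unmatched labels, the add() method of Series accepts a fill_value. The sketch below treats a missing entry as 0, which is an assumption that suits totals but not every dataset.

```python
import pandas as pd

s1 = pd.Series([10, 20], index=['a', 'b'])
s2 = pd.Series([1, 2], index=['b', 'c'])

# Treat a label missing from one Series as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)
```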

2.2.13 Handling Missing Values (NaN)

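NCERT briefly introduces NaN (Not a Number), the marker Pandas uses for a missing value. The isnull() method returns True wherever a value is missing. Here it is applied to the sum s1 + s2 from the previous example:

result = s1 + s2
print(result.isnull())

Output:

a     True
b    False
c     True
dtype: bool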

This concept becomes very important in later chapters.
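
As a preview of what can be done with missing values, the sketch below counts them, replaces them, and removes them. The methods fillna() and dropna() are standard Pandas methods; replacing NaN with 0 is only an illustrative choice.

```python
import pandas as pd

s1 = pd.Series([10, 20], index=['a', 'b'])
s2 = pd.Series([1, 2], index=['b', 'c'])
result = s1 + s2                        # 'a' and 'c' become NaN

missing_count = result.isnull().sum()   # number of NaN entries
filled = result.fillna(0)               # replace NaN with 0
cleaned = result.dropna()               # keep only non-missing entries

print(missing_count)
print(filled)
print(cleaned)
```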


2.2.14 Summary of Section 2.2

By the end of this section, NCERT expects students to be able to:

  • Define a Pandas Series
  • Create Series using:

    • List
    • Dictionary
    • Scalar value
  • Assign and use custom indices

  • Access elements using labels

  • Perform mathematical operations

  • Understand index alignment and NaN


Section 2.2 Completed

We have now fully and sequentially covered:

  • Concept of Series
  • Creation methods
  • Indexing
  • Attributes
  • Operations
  • NCERT-style examples with code

2.3 DataFrame

2.3.1 What is a Pandas DataFrame? (NCERT Definition Explained)

NCERT defines a DataFrame as:

A two-dimensional, size-mutable, heterogeneous tabular data structure with labelled axes (rows and columns).

Let us carefully unpack this definition.

A Pandas DataFrame:

  • Is two-dimensional

    • Data is arranged in rows and columns
  • Is tabular

    • Similar to a spreadsheet or database table
  • Can store different data types in different columns

  • Has:

    • Row labels (index)
    • Column labels

In short:

A DataFrame is like a table in a database or an Excel sheet in Python.


2.3.2 Relationship Between Series and DataFrame

NCERT emphasises an important conceptual link:

  • A Series is one-dimensional
  • A DataFrame is a collection of Series aligned by a common index

Each column in a DataFrame is actually a Series.
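
This can be checked directly. The sketch below builds a small DataFrame (creation is explained in 2.3.4) and confirms that one column is a Series sharing the DataFrame's row index.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Amit', 'Neha', 'Rahul'],
    'Marks': [85, 90, 78]
})

column = df_demo['Marks']

print(isinstance(column, pd.Series))        # is the column a Series?
print(column.index.equals(df_demo.index))   # same row labels as the DataFrame?
```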


2.3.3 Importing Pandas (Reminder)

Every DataFrame operation requires Pandas to be imported.

import pandas as pd

This line is assumed in all examples that follow.


2.3.4 Creating a DataFrame from a Dictionary of Lists

This is the first and most common method shown by NCERT.

Example 1: Dictionary of lists

import pandas as pd

data = {
    'Name': ['Amit', 'Neha', 'Rahul'],
    'Age': [17, 18, 17],
    'Marks': [85, 90, 78]
}

df = pd.DataFrame(data)
print(df)

Output

    Name  Age  Marks
0   Amit   17     85
1   Neha   18     90
2  Rahul   17     78

Explanation

  • Dictionary keys become column names
  • Dictionary values (lists) become column data
  • Pandas automatically assigns row indices starting from 0

2.3.5 Creating a DataFrame with Custom Index

NCERT now shows how to assign meaningful row labels.

Example 2: Custom index

import pandas as pd

data = {
    'Name': ['Amit', 'Neha', 'Rahul'],
    'Age': [17, 18, 17],
    'Marks': [85, 90, 78]
}

df = pd.DataFrame(data, index=['S1', 'S2', 'S3'])
print(df)

Output

     Name  Age  Marks
S1   Amit   17     85
S2   Neha   18     90
S3  Rahul   17     78

Explanation

  • Row indices are now S1, S2, S3
  • This improves readability
  • Index labels can be:

    • Strings
    • Numbers
    • Any immutable type

2.3.6 Creating a DataFrame from a Dictionary of Series

NCERT explicitly highlights this method to reinforce the Series–DataFrame relationship.

Example 3: Dictionary of Series

import pandas as pd

s1 = pd.Series([85, 90, 78], index=['S1', 'S2', 'S3'])
s2 = pd.Series([17, 18, 17], index=['S1', 'S2', 'S3'])

data = {
    'Marks': s1,
    'Age': s2
}

df = pd.DataFrame(data)
print(df)

Output

    Marks  Age
S1     85   17
S2     90   18
S3     78   17

Explanation

  • Each column is created from a Series
  • Pandas aligns data using index labels
  • This reinforces that:

A DataFrame is a collection of aligned Series


2.3.7 Creating a DataFrame from a List of Dictionaries

NCERT introduces this as another practical method.

Example 4: List of dictionaries

import pandas as pd

data = [
    {'Name': 'Amit', 'Age': 17, 'Marks': 85},
    {'Name': 'Neha', 'Age': 18, 'Marks': 90},
    {'Name': 'Rahul', 'Age': 17, 'Marks': 78}
]

df = pd.DataFrame(data)
print(df)

Output

    Name  Age  Marks
0   Amit   17     85
1   Neha   18     90
2  Rahul   17     78

Explanation

  • Each dictionary represents one row
  • Dictionary keys become column names
  • Missing keys (if any) result in NaN
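
The last point can be seen with a short sketch in which the second record (hypothetical, deliberately incomplete) has no 'Marks' key:

```python
import pandas as pd

# The second record has no 'Marks' key
records = [
    {'Name': 'Amit', 'Marks': 85},
    {'Name': 'Neha'},
]

df_missing = pd.DataFrame(records)
print(df_missing)
```

The Marks column is stored as float64, because NaN cannot be held in an integer column by default.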

2.3.8 Accessing Columns of a DataFrame

NCERT shows two methods.

Method 1: Using column name as key

print(df['Marks'])

Output:

0    85
1    90
2    78
Name: Marks, dtype: int64

This returns a Series.


Method 2: Using dot operator

print(df.Marks)

Important note (NCERT intent):

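  • Dot notation works only if:

    • Column name has no spaces
    • Column name is not a Python keyword
    • Column name does not clash with an existing DataFrame attribute or method (such as size or index)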
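
A minimal sketch of where dot notation breaks down. The column names below are hypothetical and chosen only to trigger the problem: one contains a space, the other has the same name as the DataFrame attribute size.

```python
import pandas as pd

df_cols = pd.DataFrame({
    'Total Marks': [85, 90],   # contains a space, so df_cols.Total Marks is invalid syntax
    'size': [1, 2]             # same name as the DataFrame attribute size
})

total_marks = df_cols['Total Marks'].tolist()   # bracket notation always works
size_column = df_cols['size'].tolist()          # the column
size_attribute = df_cols.size                   # the attribute: total number of elements

print(total_marks)
print(size_column)
print(size_attribute)
```

Bracket notation is therefore the safer habit, especially when column names come from a file.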

2.3.9 Accessing Rows of a DataFrame (Basic Idea)

NCERT briefly introduces row access conceptually.

Accessing row by index label using loc

print(df.loc[0])

Output:

Name     Amit
Age        17
Marks      85
Name: 0, dtype: object

This returns a Series representing a row.

(Detailed indexing is handled later in the syllabus.)
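
With a custom index, loc needs the row label, while the related accessor iloc takes the row position. A short sketch, reusing the student data with labels S1 to S3:

```python
import pandas as pd

df_labelled = pd.DataFrame(
    {'Name': ['Amit', 'Neha', 'Rahul'], 'Marks': [85, 90, 78]},
    index=['S1', 'S2', 'S3']
)

row_by_label = df_labelled.loc['S2']     # row whose label is 'S2'
row_by_position = df_labelled.iloc[1]    # second row, counted from 0

print(row_by_label)
print(row_by_position.equals(row_by_label))
```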


2.3.10 Accessing Individual Elements

print(df.loc[1, 'Name'])
print(df.loc[2, 'Marks'])

Output:

Neha
78

2.3.11 DataFrame Attributes (NCERT Focus)

NCERT introduces commonly used attributes.

Example

print(df.shape)
print(df.size)
print(df.columns)
print(df.index)
print(df.dtypes)

Output (example)

(3, 3)
9
Index(['Name', 'Age', 'Marks'], dtype='object')
RangeIndex(start=0, stop=3, step=1)
Name      object
Age        int64
Marks      int64
dtype: object

Explanation

  • shape → (rows, columns)
  • size → total elements
  • columns → column labels
  • index → row labels
  • dtypes → data type of each column

2.3.12 Adding a New Column to a DataFrame

NCERT shows column creation using assignment.

Example

df['Grade'] = ['B', 'A', 'B']
print(df)

Output:

    Name  Age  Marks Grade
0   Amit   17     85     B
1   Neha   18     90     A
2  Rahul   17     78     B

2.3.13 Deleting a Column from a DataFrame

df.drop('Age', axis=1, inplace=True)
print(df)

Explanation:

  • axis=1 → column
  • inplace=True → modifies original DataFrame
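
Without inplace=True, drop() leaves the original untouched and returns a new DataFrame. The sketch below uses the columns= keyword, which is equivalent to passing axis=1; the variable names are illustrative.

```python
import pandas as pd

df_full = pd.DataFrame({
    'Name': ['Amit', 'Neha'],
    'Age': [17, 18],
    'Marks': [85, 90]
})

# drop() returns a new DataFrame; df_full itself is not modified
df_small = df_full.drop(columns=['Age'])

print(list(df_small.columns))
print(list(df_full.columns))
```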

2.3.14 Mathematical Operations on DataFrame Columns

Operations are vectorised, just like Series.

df['Marks'] = df['Marks'] + 5
print(df)

Each value in the column is updated automatically.
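
Vectorised operations also work between two columns, and the result can be stored as a new column. A sketch with hypothetical test scores:

```python
import pandas as pd

# Hypothetical marks in two tests
df_tests = pd.DataFrame({
    'Name': ['Amit', 'Neha', 'Rahul'],
    'Test1': [40, 45, 35],
    'Test2': [45, 45, 43]
})

# Values with the same row label are added together
df_tests['Total'] = df_tests['Test1'] + df_tests['Test2']
print(df_tests)
```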


2.3.15 Key Learning Outcomes of Section 2.3

NCERT expects students to be able to:

  • Define a DataFrame
  • Create DataFrames using:

    • Dictionary of lists
    • Dictionary of Series
    • List of dictionaries
  • Access columns and rows

  • Use DataFrame attributes

  • Add and delete columns

  • Perform operations on columns


Section 2.3 Completed

We have now fully and sequentially covered:

  • Concept of DataFrame
  • Creation methods
  • Accessing data
  • Attributes
  • Modification operations

2.4 Importing and Exporting Data between CSV Files and DataFrames

2.4.1 Why Importing and Exporting Data is Required (NCERT Context)

Up to this point, we have created:

  • Series from lists and dictionaries
  • DataFrames from Python data structures

However, NCERT now highlights an important reality:

In real-life applications, data is rarely entered manually in Python programs.

Instead, data is usually:

  • Stored in files
  • Shared between systems
  • Generated by other applications

One of the most common file formats used for storing tabular data is the CSV file.


2.4.2 What is a CSV File?

CSV stands for Comma Separated Values.

NCERT explains that:

  • A CSV file stores data in text format
  • Each line represents a row
  • Values in a row are separated by commas

Example: student.csv

RollNo,Name,Marks
1,Amit,85
2,Neha,90
3,Rahul,78

Here:

  • First line is the header row
  • Remaining lines contain data records

CSV files are popular because:

  • They are simple
  • They are platform-independent
  • They can be opened in spreadsheet software and text editors

2.4.3 Reading a CSV File into a DataFrame

NCERT introduces the function:

pd.read_csv()

This function:

  • Reads data from a CSV file
  • Converts it into a Pandas DataFrame

Example 1: Reading a CSV file

Assume the file student.csv exists in the same directory as the Python program.

import pandas as pd

df = pd.read_csv("student.csv")
print(df)

Output

   RollNo   Name  Marks
0       1   Amit     85
1       2   Neha     90
2       3  Rahul     78

Explanation

  • Pandas automatically:

    • Uses the comma as the default delimiter
    • Reads the first row as column names
    • Assigns default row indices starting from 0
  • The result is a DataFrame
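
To try read_csv() without creating a file first, the CSV text can be kept in a string and wrapped in io.StringIO, which behaves like an open file. This is only a convenience for experimenting; the file-based call shown above is what NCERT uses.

```python
import io
import pandas as pd

# The CSV text is held in a string so the example runs without a file on disk
csv_text = """RollNo,Name,Marks
1,Amit,85
2,Neha,90
3,Rahul,78
"""

df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)
print(list(df_csv.columns))
```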


2.4.4 Reading a CSV File with Custom Index

NCERT explains that sometimes:

  • One column represents a unique identifier
  • Such a column should be used as row index

Example: Using RollNo as index

import pandas as pd

df = pd.read_csv("student.csv", index_col="RollNo")
print(df)

Output

        Name  Marks
RollNo               
1        Amit     85
2        Neha     90
3       Rahul     78

Explanation

  • index_col="RollNo" tells Pandas:

    • Use the RollNo column as index
  • This makes row identification clearer and meaningful


2.4.5 Reading Only Selected Columns from a CSV File

NCERT mentions that sometimes:

  • We do not need all columns
  • Reading unnecessary columns wastes memory

Example: Reading only Name and Marks

import pandas as pd

df = pd.read_csv("student.csv", usecols=["Name", "Marks"])
print(df)

Output

    Name  Marks
0   Amit     85
1   Neha     90
2  Rahul     78

Explanation

  • usecols limits the columns loaded
  • This improves performance for large datasets

2.4.6 Handling Missing Data While Reading CSV (NCERT Idea)

NCERT briefly introduces the idea that:

  • CSV files may have missing values

Example CSV:

RollNo,Name,Marks
1,Amit,85
2,Neha,
3,Rahul,78

When read:

df = pd.read_csv("student.csv")
print(df)

Output:

   RollNo   Name  Marks
0       1   Amit   85.0
1       2   Neha    NaN
2       3  Rahul   78.0

Explanation:

  • Missing numeric values are represented as NaN
  • Pandas automatically adjusts data type (Marks becomes float)
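
Once the data is loaded, the missing entries can be counted and, if appropriate, replaced. The sketch below reads the same CSV text from a string and fills the missing mark with 0; whether 0 is a sensible replacement depends on the data, so treat it as an assumption.

```python
import io
import pandas as pd

csv_text = """RollNo,Name,Marks
1,Amit,85
2,Neha,
3,Rahul,78
"""

df_gap = pd.read_csv(io.StringIO(csv_text))

missing_per_column = df_gap.isnull().sum()   # NaN count for each column
df_filled = df_gap.fillna({'Marks': 0})      # assumption: treat a missing mark as 0

print(missing_per_column)
print(df_filled)
```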

2.4.7 Writing a DataFrame to a CSV File

NCERT now explains the reverse process:

Exporting data from a DataFrame into a CSV file.

The function used is:

DataFrame.to_csv()

Example 2: Writing DataFrame to CSV

import pandas as pd

data = {
    'Name': ['Amit', 'Neha', 'Rahul'],
    'Marks': [85, 90, 78]
}

df = pd.DataFrame(data)
df.to_csv("output.csv")

This creates a file output.csv with contents:

,Name,Marks
0,Amit,85
1,Neha,90
2,Rahul,78

Explanation

  • By default:

    • Row index is also written to the file
    • Column headers are included

2.4.8 Writing CSV Without Index

Often, we do not want the index column.

Example

df.to_csv("output.csv", index=False)

Resulting file:

Name,Marks
Amit,85
Neha,90
Rahul,78

This is the most commonly used format.
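
One practical reason to leave out the index is what happens when the file is read back. In the sketch below, to_csv() is called without a file name, in which case it returns the CSV text as a string, and the text is read back with read_csv().

```python
import io
import pandas as pd

df_out = pd.DataFrame({'Name': ['Amit', 'Neha'], 'Marks': [85, 90]})

# to_csv() with no file name returns the CSV text as a string
with_index = df_out.to_csv()
without_index = df_out.to_csv(index=False)

# Reading the text back shows the effect of the saved index
cols_with_index = list(pd.read_csv(io.StringIO(with_index)).columns)
cols_without_index = list(pd.read_csv(io.StringIO(without_index)).columns)

print(cols_with_index)
print(cols_without_index)
```

The saved index comes back as an extra column named 'Unnamed: 0', because its header cell in the file is empty.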


2.4.9 Writing CSV Without Header

Sometimes the header row is not required.

Example

df.to_csv("output.csv", index=False, header=False)

Resulting file:

Amit,85
Neha,90
Rahul,78

2.4.10 Reading a CSV File Without Header

NCERT explains that some CSV files:

  • Do not contain column names

Example CSV

1,Amit,85
2,Neha,90
3,Rahul,78

Reading such a file

df = pd.read_csv("student.csv", header=None)
print(df)

Output:

   0      1   2
0  1   Amit  85
1  2   Neha  90
2  3  Rahul  78

Assigning Column Names Manually

df.columns = ['RollNo', 'Name', 'Marks']
print(df)
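
The two steps can also be combined: read_csv() has a names parameter that supplies the column labels while the file is being read. A sketch using in-memory CSV text:

```python
import io
import pandas as pd

csv_text = """1,Amit,85
2,Neha,90
3,Rahul,78
"""

# header=None says the data has no header row; names= supplies the labels
df_named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['RollNo', 'Name', 'Marks']
)
print(df_named)
```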

2.4.11 Changing Delimiter (NCERT Concept)

Although the comma is the standard separator, NCERT notes that:

  • Sometimes files use other separators (like ;)

Example

df = pd.read_csv("student.csv", sep=';')
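
The sketch below shows why the sep argument matters. The same semicolon-separated text (made-up data held in a string) is read once with sep=';' and once with the default comma.

```python
import io
import pandas as pd

semicolon_text = """RollNo;Name;Marks
1;Amit;85
2;Neha;90
"""

df_sep = pd.read_csv(io.StringIO(semicolon_text), sep=';')
df_wrong = pd.read_csv(io.StringIO(semicolon_text))   # default sep=','

print(df_sep.shape)     # three separate columns
print(df_wrong.shape)   # each line read as a single column
```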

2.4.12 Key Learning Outcomes of Section 2.4

By the end of this section, NCERT expects students to be able to:

  • Explain what a CSV file is
  • Import CSV data into a DataFrame
  • Use read_csv() with different parameters
  • Export DataFrame data to CSV
  • Control index and header while exporting
  • Handle missing values in CSV files

Section 2.4 Completed

We have now fully and sequentially covered:

  • Concept of CSV files
  • Reading CSV into DataFrame
  • Index handling
  • Column selection
  • Writing DataFrame to CSV

2.5 Pandas Series vs NumPy ndarray


2.5.1 Why NCERT Compares Pandas Series and NumPy ndarray

NCERT includes this section to help students understand why Pandas exists even though NumPy is already available.

By this point in the chapter, students have learnt that:

  • Pandas Series is used for one-dimensional data
  • NumPy also provides one-dimensional arrays (ndarray)

This naturally raises a question:

If NumPy arrays already exist, why do we need Pandas Series?

Section 2.5 answers this question by highlighting the differences.


2.5.2 What is a NumPy ndarray? (Brief Recap)

NCERT assumes that students have basic awareness of NumPy.

A NumPy ndarray:

  • Is a homogeneous array (all elements are of the same data type)
  • Uses integer-based indexing
  • Is designed for numerical computation
  • Does not support labelled indices

Example:

import numpy as np

arr = np.array([10, 20, 30, 40])
print(arr)

Output:

[10 20 30 40]

Accessing elements:

print(arr[0])
print(arr[2])

Output:

10
30

Here, access is based only on position, not labels.


2.5.3 What is a Pandas Series? (Contextual Reminder)

A Pandas Series:

  • Is one-dimensional
  • Can store heterogeneous data (the dtype then becomes object)
  • Uses label-based indexing
  • Is built on top of NumPy

Example:

import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s)

Output:

a    10
b    20
c    30
d    40
dtype: int64

Accessing elements:

print(s['a'])
print(s['c'])

Output:

10
30

2.5.4 Key Difference 1: Indexing

This is the most important difference highlighted by NCERT.

NumPy ndarray

  • Uses implicit integer index
  • Index values are always 0, 1, 2, ...

    arr = np.array([100, 200, 300])
    print(arr[1])
    

Output:

200

Pandas Series

  • Uses explicit labels
  • Index values can be:

    • Numbers
    • Strings
    • Any immutable type

      s = pd.Series([100, 200, 300], index=['x', 'y', 'z'])
      print(s['y'])
      

Output:

200

NCERT stresses that labelled indexing makes Series more suitable for data analysis.


2.5.5 Key Difference 2: Handling of Missing Data

NCERT explicitly points out this distinction.

NumPy ndarray

  • Has no built-in way to detect or skip missing entries during ordinary operations
  • Missing values must be managed manually
  • Integer arrays cannot store NaN; only floating-point arrays can

Example:

arr = np.array([10, 20, None])
print(arr)

Result:

  • Data type becomes object
  • Numerical operations become slow, and arithmetic involving None raises an error
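
NumPy can hold a missing number as np.nan in a floating-point array, but the programmer has to account for it explicitly. A sketch comparing None and np.nan:

```python
import numpy as np

with_none = np.array([10, 20, None])     # dtype becomes object
with_nan = np.array([10, 20, np.nan])    # dtype stays numeric (float64)

print(with_none.dtype)
print(with_nan.dtype)

# Ordinary mean() propagates NaN; nanmean() skips it
plain_mean = with_nan.mean()
nan_aware_mean = np.nanmean(with_nan)

print(plain_mean)
print(nan_aware_mean)
```

This is the manual management referred to above: a special function such as np.nanmean() must be chosen, whereas the mean() method of a Pandas Series skips NaN by default.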

Pandas Series

  • Has built-in support for missing values (NaN)
  • Automatically adjusts data type when required

Example:

s = pd.Series([10, 20, None])
print(s)

Output:

0    10.0
1    20.0
2     NaN
dtype: float64

NCERT highlights that:

Handling missing data is a common requirement in real-world datasets.


2.5.6 Key Difference 3: Data Alignment

This is a very important conceptual advantage of Pandas Series.

NumPy ndarray

  • Performs operations position-wise
  • Does not consider labels

    arr1 = np.array([10, 20, 30])
    arr2 = np.array([1, 2, 3])
    
    print(arr1 + arr2)
    

Output:

[11 22 33]

Pandas Series

  • Performs operations based on index labels
  • Aligns data automatically

    s1 = pd.Series([10, 20], index=['a', 'b'])
    s2 = pd.Series([1, 2], index=['b', 'c'])
    
    print(s1 + s2)
    

Output:

a     NaN
b    21.0
c     NaN
dtype: float64

Explanation:

  • Only matching index 'b' is added
  • Non-matching indices produce NaN

NCERT emphasises that automatic alignment is a powerful feature for data analysis.


2.5.7 Key Difference 4: Data Type Flexibility

NumPy ndarray

  • All elements must be of the same data type
  • Mixed data types are discouraged

    arr = np.array([10, "A", 20])
    print(arr)
    

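Result:

  • NumPy converts every element to a string (a Unicode dtype such as <U21), so 10 and 20 are no longer numbers
  • Mixed values are kept unchanged only if dtype=object is requested explicitly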

Pandas Series

  • Can hold mixed data types
  • Each element keeps its own Python type, and the Series dtype is object

    s = pd.Series([10, "A", 20.5])
    print(s)
    

Output:

0      10
1       A
2    20.5
dtype: object

This makes Series more flexible for real datasets.


2.5.8 Key Difference 5: Metadata (Labels)

NCERT notes that:

  • Pandas Series carries metadata in the form of index labels
  • NumPy ndarray carries only raw values

This makes Series:

  • Self-describing
  • Easier to interpret
  • More suitable for tabular and labelled data
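
A small sketch of the difference. The Series name 'Sales' and the labels are hypothetical; to_numpy() is the standard method for extracting the underlying values.

```python
import numpy as np
import pandas as pd

s_meta = pd.Series([100, 200, 300], index=['x', 'y', 'z'], name='Sales')

raw = s_meta.to_numpy()        # the values only, as an ndarray
labels = s_meta.index.tolist() # the labels travel with the Series

print(raw)
print(labels)
print(s_meta.name)             # optional name, another piece of metadata
```

Once the values are converted to an ndarray, the labels and the name are gone, and the reader of the data must remember what each position means.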

2.5.9 Summary Table (NCERT-Style Comparison)

Feature          Pandas Series      NumPy ndarray
---------------  -----------------  ---------------------------
Dimension        One-dimensional    One or more (n-dimensional)
Indexing         Label-based        Integer position-based
Missing values   Supported (NaN)    Limited support
Data alignment   Automatic          Not available
Data type        Can be mixed       Homogeneous
Purpose          Data analysis      Numerical computation

This table captures the essence of Section 2.5.


2.5.10 NCERT Learning Outcome of Section 2.5

By the end of this section, NCERT expects students to be able to:

  • Distinguish between Pandas Series and NumPy ndarray
  • Explain why Pandas Series is preferred for data analysis
  • Identify situations where NumPy arrays are more suitable
  • Understand the advantages of labelled data

Section 2.5 Completed

Chapter 02 Completed Fully

We have now fully and sequentially completed:

  • 2.1 Introduction to Python Libraries
  • 2.2 Series
  • 2.3 DataFrame
  • 2.4 CSV Import/Export
  • 2.5 Series vs NumPy ndarray
