06 Introduction to NumPy
6.1 Introduction
In earlier chapters, we have used Python variables and basic data types such as integers, floating-point numbers, and strings to store and process data. While these are sufficient for small programs, they become inefficient and inconvenient when working with large amounts of numerical data, such as marks of thousands of students, temperature readings, sales figures, or scientific measurements.
To efficiently handle such large numerical datasets, Python provides powerful external libraries. One of the most important and widely used libraries for numerical computation is NumPy.
NumPy stands for Numerical Python. It is a Python library specifically designed to work with arrays and numerical data. The NCERT textbook introduces NumPy to help students understand how large datasets can be stored, processed, and analysed efficiently.
Why NumPy Is Needed
Using normal Python lists for numerical computations has certain limitations:
- Python lists can store different data types, which increases memory usage
- Operations on large lists are slow
- Mathematical operations require explicit loops
- Code becomes lengthy and difficult to manage
NumPy solves these problems by providing:
- Fast execution
- Efficient memory usage
- Built-in mathematical operations
- Powerful array-handling capabilities
Features of NumPy (NCERT-Aligned)
The NCERT textbook highlights the following key features of NumPy:
- NumPy provides a powerful array object
- NumPy arrays store homogeneous data (same data type)
- NumPy supports vectorised operations
- NumPy arrays use less memory than Python lists
- NumPy allows easy mathematical and statistical operations
These features make NumPy ideal for data analysis, scientific computing, and numerical processing.
Installing and Importing NumPy
Before using NumPy, it must be installed on the system. In most Python distributions (such as Anaconda), NumPy is already installed.
To use NumPy in a Python program, it must be imported.
The standard way to import NumPy is:
import numpy as np
Here:
- numpy is the name of the library
- np is an alias (short name)
- Using np makes programs shorter and easier to read
NCERT Observation: Using aliases is a common programming practice and does not change the functionality of the library.
First NumPy Program
Let us begin with a very simple program to check whether NumPy is working correctly.
import numpy as np
print(np.__version__)
Explanation:
- __version__ is an attribute that stores the installed NumPy version
- Printing it confirms that NumPy is available
Comparing Python Lists and NumPy Arrays (Conceptual)
To understand the importance of NumPy, consider a simple example of adding two lists.
Using Python lists:
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
print(result)
Using NumPy (preview only; details later):
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)
The NumPy version is:
- Shorter
- Clearer
- Faster
- Easier to read
This demonstrates why NumPy is preferred for numerical operations.
Areas Where NumPy Is Used
NumPy is widely used in:
- Data analysis
- Machine learning
- Artificial intelligence
- Scientific simulations
- Engineering calculations
- Statistical analysis
In Informatics Practices, NumPy is introduced as a foundation tool for data handling, preparing students for advanced topics.
Important Points to Remember (NCERT-Oriented)
- NumPy is a library, not a programming language
- NumPy focuses on numerical data
- NumPy arrays are faster than Python lists
- NumPy operations reduce the need for loops
6.2 Array
Before understanding NumPy arrays, it is important to understand the basic concept of an array itself. The idea of arrays does not belong only to NumPy; it is a fundamental data concept used in many programming languages to handle multiple values efficiently.
An array is a collection of similar data elements stored together under a single name, where each element can be accessed using its position (index). All elements in an array are of the same data type, which makes arrays efficient in terms of memory and processing.
The NCERT textbook introduces arrays to help students understand how large collections of numerical data can be stored and processed systematically.
Why Arrays Are Needed
Consider a situation where you want to store marks of 100 students.
Without arrays:
- You would need 100 separate variables
- Code becomes lengthy and unmanageable
- Processing data becomes difficult
With arrays:
- All values are stored under one variable
- Data can be accessed using index numbers
- Operations on data become simpler and faster
This makes arrays extremely useful when dealing with bulk data.
Characteristics of an Array
The NCERT textbook highlights the following important characteristics of arrays:
- All elements in an array are of the same data type
- Each element is stored at a contiguous memory location
- Each element can be accessed using an index
- Indexing usually starts from 0
These characteristics make arrays suitable for numerical and scientific computations.
Real-Life Analogy of an Array
An array can be compared to:
- A row of lockers
- A list of roll numbers
- Marks stored in a register column
Each locker or entry has:
- A fixed position
- A stored value
- A unique index
Array Indexing Concept
In arrays, each element is identified by its index position.
For example, in an array of five elements:
| Index | Value |
|---|---|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |
Here:
- The first element is at index 0
- The last element is at index 4
NCERT Exam Point: Array indexing starts from 0, not 1.
One-Dimensional and Multi-Dimensional Arrays
Arrays can be classified based on their structure:
One-Dimensional Array
- Contains elements in a single row
- Accessed using one index
Example conceptually:
[10, 20, 30, 40]
Two-Dimensional Array
- Contains rows and columns
- Accessed using two indices (row and column)
Example conceptually:
[ [1, 2, 3],
[4, 5, 6] ]
The NCERT textbook focuses mainly on one-dimensional arrays initially and then extends the concept to higher dimensions using NumPy.
Difference Between Python List and Array (Conceptual)
Although Python lists and arrays may appear similar, they are conceptually different.
| Python List | Array |
|---|---|
| Can store different data types | Stores same data type |
| Not memory efficient | Memory efficient |
| Slower numerical operations | Faster numerical operations |
| Flexible but slower | Structured and faster |
This difference becomes especially important when working with large datasets.
Limitations of Arrays Without NumPy
In basic Python, there is no built-in array data structure optimised for numerical computation. While lists can be used, they have limitations:
- Slower performance
- Higher memory usage
- No direct mathematical operations
These limitations lead to the introduction of NumPy arrays, which combine the concept of arrays with Python's simplicity.
Key Observations (NCERT-Oriented)
- Arrays store multiple values under one name
- All elements in an array are of the same type
- Arrays support indexed access
- Arrays are essential for handling large numerical datasets
Understanding this basic concept is crucial before learning NumPy arrays, which build upon these ideas.
6.3 NumPy Array
After understanding the basic idea of an array, we now move to the most important concept of this chapter: the NumPy array. The NumPy array is the core data structure provided by the NumPy library and is the reason why NumPy is so powerful and widely used.
A NumPy array is a special object provided by NumPy that stores homogeneous data (data of the same type) in a compact and efficient form. It is designed specifically for numerical computation and large datasets.
In NCERT, the NumPy array is introduced as an improved and efficient alternative to Python lists when working with numbers.
What Makes NumPy Arrays Different
Although NumPy arrays and Python lists may look similar when printed, internally they are very different.
A NumPy array:
- Stores elements of the same data type
- Uses less memory
- Performs operations faster
- Supports vectorised operations (operations on entire arrays at once)
These features make NumPy arrays ideal for mathematical, statistical, and scientific applications.
Creating a NumPy Array
To create a NumPy array, we use the array() function provided by NumPy.
Syntax
numpy.array(object)
In practice, since NumPy is imported as np, we write:
np.array(object)
Here, object is usually a list or a tuple containing elements.
Example 1: Creating a One-Dimensional NumPy Array
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr)
Output:
[10 20 30 40 50]
Explanation:
- A Python list is passed to np.array()
- NumPy converts it into an array
- Elements are stored as a single block of memory
Checking the Type of NumPy Array
print(type(arr))
Output:
<class 'numpy.ndarray'>
This confirms that arr is a NumPy array (ndarray stands for N-dimensional array).
Homogeneous Nature of NumPy Arrays
One of the most important characteristics of NumPy arrays is that all elements must be of the same data type.
Example 2: Array with Same Data Type
arr = np.array([1, 2, 3, 4])
print(arr)
print(arr.dtype)
Output:
[1 2 3 4]
int32 (or int64 depending on system)
Example 3: Mixed Data Types in Input
arr = np.array([1, 2.5, 3])
print(arr)
print(arr.dtype)
Output:
[1. 2.5 3. ]
float64
Explanation:
- NumPy automatically converts all elements to a common data type
- Integers are converted to floats to maintain homogeneity
NCERT Observation: NumPy performs automatic type promotion to maintain a uniform data type in arrays.
NumPy Array vs Python List (Practical Comparison)
Example 4: Addition Using Python Lists
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
print(result)
Example 5: Addition Using NumPy Arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)
Output:
[5 7 9]
Explanation:
- NumPy performs element-wise addition automatically
- No loop is required
- Code is shorter, cleaner, and faster
This feature is called vectorisation.
Vectorised Operations
Vectorisation means performing operations on entire arrays at once instead of using loops.
Example 6: Scalar Operations on NumPy Array
arr = np.array([10, 20, 30])
print(arr + 5)
print(arr * 2)
Output:
[15 25 35]
[20 40 60]
Each element of the array is automatically updated.
NCERT Exam Point: Vectorised operations eliminate the need for explicit loops.
Creating Arrays Using dtype
NumPy allows the programmer to specify the data type explicitly using the dtype parameter.
Example 7: Specifying Data Type
arr = np.array([1, 2, 3, 4], dtype=float)
print(arr)
print(arr.dtype)
Output:
[1. 2. 3. 4.]
float64
This is useful when precision or memory control is required.
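The memory side of this choice can be sketched as follows. This is a minimal illustration, not part of the NCERT text: the variable names are hypothetical, and the attributes itemsize (bytes per element) and nbytes (total bytes of data) are standard ndarray attributes.

```python
import numpy as np

values = [1, 2, 3, 4]

# Same values, two different integer types (names are illustrative only).
small = np.array(values, dtype=np.int8)    # 1 byte per element
large = np.array(values, dtype=np.int64)   # 8 bytes per element

print(small.itemsize, small.nbytes)   # 1 4
print(large.itemsize, large.nbytes)   # 8 32
```

A smaller type saves memory but can hold a smaller range of values, so the type should be chosen to fit the data.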
Creating Multi-Dimensional NumPy Arrays
NumPy arrays can also be multi-dimensional.
Example 8: Two-Dimensional NumPy Array
arr2d = np.array([[1, 2, 3],
[4, 5, 6]])
print(arr2d)
Output:
[[1 2 3]
[4 5 6]]
Here:
- Each inner list represents a row
- The array has 2 rows and 3 columns
Multi-dimensional arrays are used extensively in data science and matrix operations.
Important Attributes of NumPy Array
NumPy arrays have useful attributes:
| Attribute | Meaning |
|---|---|
| ndim | Number of dimensions |
| shape | Rows and columns |
| size | Total number of elements |
| dtype | Data type |
Example 9: Using Array Attributes
arr = np.array([[1, 2, 3],
[4, 5, 6]])
print(arr.ndim)
print(arr.shape)
print(arr.size)
print(arr.dtype)
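For reference, the values these attributes are expected to give for a 2 x 3 integer array are sketched below. The dtype line is left open because the default integer type depends on the system.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.ndim)    # 2       -> two dimensions (rows and columns)
print(arr.shape)   # (2, 3)  -> 2 rows, 3 columns
print(arr.size)    # 6       -> 2 * 3 elements in total
print(arr.dtype)   # int32 or int64, depending on the system
```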
Key Points to Remember
- NumPy arrays are faster than Python lists
- All elements are of the same data type
- Vectorised operations are supported
- Arrays can be one-dimensional or multi-dimensional
- NumPy arrays are the foundation for all further operations
6.4 Indexing and Slicing
After creating NumPy arrays, the next important task is to access and manipulate individual elements or groups of elements stored inside the array. NumPy provides powerful mechanisms called indexing and slicing to perform these operations efficiently.
The NCERT textbook introduces indexing and slicing to help students understand how data inside arrays can be retrieved, modified, and analysed without using loops.
6.4.1 Indexing in NumPy Arrays
Indexing refers to accessing a single element of an array using its position (index).
Just like Python lists, NumPy arrays use zero-based indexing, which means the first element is at index 0.
Indexing in One-Dimensional NumPy Arrays
Example 1: Accessing Elements Using Index
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])
print(arr[2])
print(arr[4])
Output:
10
30
50
Explanation:
- arr[0] accesses the first element
- arr[2] accesses the third element
- arr[4] accesses the fifth element
Negative Indexing
NumPy also supports negative indexing, which allows access to elements starting from the end of the array.
| Index | Meaning |
|---|---|
| -1 | Last element |
| -2 | Second last element |
Example 2: Negative Indexing
print(arr[-1])
print(arr[-3])
Output:
50
30
Explanation:
- -1 refers to the last element
- -3 refers to the third element from the end
NCERT Exam Point: Negative indexing is useful when the size of the array is unknown.
6.4.2 Indexing in Multi-Dimensional Arrays
In multi-dimensional arrays, elements are accessed using multiple indices, separated by commas.
Example 3: Indexing in Two-Dimensional Array
arr2d = np.array([[1, 2, 3],
[4, 5, 6]])
print(arr2d[0, 0])
print(arr2d[0, 2])
print(arr2d[1, 1])
Output:
1
3
5
Explanation:
- First index represents the row
- Second index represents the column
- arr2d[1, 1] accesses the element in the second row and second column
Accessing Entire Rows or Columns
NumPy allows access to entire rows or columns using slicing-like notation.
print(arr2d[0]) # First row
print(arr2d[:, 1]) # Second column
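A self-contained version of the two statements above, with the output they should produce for the 2 x 3 array from Example 3 (the names first_row and second_col are illustrative only):

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

first_row = arr2d[0]        # a single index selects a whole row
second_col = arr2d[:, 1]    # ":" means "all rows", 1 selects the second column

print(first_row)    # [1 2 3]
print(second_col)   # [2 5]
```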
6.4.3 Slicing in NumPy Arrays
Slicing refers to extracting a sub-array (a portion of an array). Slicing allows accessing multiple elements at once.
Syntax of Slicing
array[start : stop : step]
- start: starting index (inclusive)
- stop: ending index (exclusive)
- step: gap between elements (optional)
Example 4: Simple Slicing
arr = np.array([10, 20, 30, 40, 50, 60])
print(arr[1:4])
Output:
[20 30 40]
Explanation:
- Starts from index 1
- Stops before index 4
- Extracts elements at index 1, 2, and 3
Slicing with Step Value
print(arr[0:6:2])
Output:
[10 30 50]
Explanation:
- Step value 2 skips every alternate element
Slicing from Beginning or Till End
print(arr[:3])
print(arr[3:])
Output:
[10 20 30]
[40 50 60]
Negative Slicing
print(arr[-4:-1])
Output:
[30 40 50]
6.4.4 Slicing in Two-Dimensional Arrays
Slicing becomes even more powerful when used with multi-dimensional arrays.
Example 5: Slicing Rows and Columns
arr2d = np.array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
print(arr2d[0:2, 1:3])
Output:
[[20 30]
[50 60]]
Explanation:
- 0:2 selects the first two rows
- 1:3 selects the second and third columns
Extracting a Single Column
print(arr2d[:, 0])
Output:
[10 40 70]
Extracting a Single Row
print(arr2d[2, :])
Output:
[70 80 90]
6.4.5 Modifying Array Elements Using Indexing
Indexing can also be used to modify array values.
Example 6: Modifying Elements
arr = np.array([10, 20, 30, 40])
arr[1] = 99
print(arr)
Output:
[10 99 30 40]
Modifying Multiple Elements Using Slicing
arr[1:3] = [55, 66]
print(arr)
Output:
[10 55 66 40]
Important Observations (NCERT-Oriented)
- NumPy uses zero-based indexing
- Negative indexing accesses elements from the end
- Slicing returns a sub-array
- Multi-dimensional slicing uses row and column indices
- Indexing can be used to modify values
Common Errors to Avoid
- Index out of range
- Incorrect slicing bounds
- Confusing rows with columns in 2D arrays
- Forgetting comma in multi-dimensional indexing
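Two of the errors listed above can be reproduced safely with try/except. The sketch below is only an illustration with hypothetical variable names; the exact wording of the error messages may differ between NumPy versions.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])   # valid indices: 0 to 4, or -5 to -1
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])          # 2 rows, 3 columns

errors = []

# Index out of range: there is no element at index 5.
try:
    arr[5]
except IndexError as err:
    errors.append(str(err))

# Rows and columns confused: row index 2 does not exist in a 2-row array.
try:
    arr2d[2, 1]
except IndexError as err:
    errors.append(str(err))

for message in errors:
    print("IndexError:", message)

# The correct order is [row, column].
print(arr2d[1, 2])   # 6
```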
6.5 Operations on Arrays
One of the greatest strengths of NumPy is its ability to perform operations directly on arrays without using explicit loops. These operations are fast, concise, and easy to read. The NCERT textbook highlights array operations as a key reason why NumPy is preferred over Python lists for numerical computation.
Operations on NumPy arrays are usually element-wise, meaning that the operation is applied to each corresponding element of the array.
6.5.1 Arithmetic Operations on NumPy Arrays
NumPy allows all common arithmetic operations to be performed directly on arrays.
Example 1: Element-wise Addition
import numpy as np
a = np.array([10, 20, 30])
b = np.array([1, 2, 3])
print(a + b)
Output:
[11 22 33]
Explanation:
- Each element of a is added to the corresponding element of b
- No loop is required
- Operation is vectorised
Example 2: Element-wise Subtraction
print(a - b)
Output:
[ 9 18 27]
Example 3: Element-wise Multiplication
print(a * b)
Output:
[10 40 90]
Example 4: Element-wise Division
print(a / b)
Output:
[10. 10. 10.]
NCERT Exam Point: Arithmetic operations on NumPy arrays are element-wise by default.
6.5.2 Scalar Operations on NumPy Arrays
NumPy allows arithmetic operations between an array and a single number (scalar).
Example 5: Scalar Addition and Multiplication
arr = np.array([5, 10, 15])
print(arr + 2)
print(arr * 3)
Output:
[ 7 12 17]
[15 30 45]
Explanation:
- The scalar value is applied to every element
- This eliminates the need for loops
6.5.3 Comparison Operations on NumPy Arrays
Comparison operations return Boolean arrays, where each element indicates whether the condition is satisfied.
Example 6: Comparison Operations
arr = np.array([10, 25, 40, 55])
print(arr > 30)
Output:
[False False True True]
Using Comparison Results
print(arr[arr > 30])
Output:
[40 55]
Explanation:
- Boolean indexing is used
- Only elements satisfying the condition are selected
6.5.4 Mathematical Functions (Universal Functions)
NumPy provides many built-in mathematical functions that operate element-wise on arrays. These are called universal functions (ufuncs).
Example 7: Using Mathematical Functions
arr = np.array([1, 4, 9, 16])
print(np.sqrt(arr))
Output:
[1. 2. 3. 4.]
Example 8: Power and Absolute Value
arr = np.array([-1, -2, 3, -4])
print(np.abs(arr))
print(np.square(arr))
Output:
[1 2 3 4]
[ 1 4 9 16]
Example 9: Trigonometric Functions
angles = np.array([0, np.pi/2, np.pi])
print(np.sin(angles))
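Because pi cannot be stored exactly as a floating-point number, the sine of np.pi comes out as a very small non-zero value rather than exactly 0. A small sketch, using np.round only to make the result easier to read (the names sines and rounded are illustrative):

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])   # angles in radians

sines = np.sin(angles)
print(sines)      # the last value is tiny (around 1e-16), not exactly 0

rounded = np.round(sines, 6)   # round to 6 decimal places for display
print(rounded)    # [0. 1. 0.]
```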
NCERT Observation: Universal functions operate on arrays without explicit loops and are highly efficient.
6.5.5 Aggregate Operations on Arrays
Aggregate operations calculate a single result from the entire array.
Example 10: Sum and Product
arr = np.array([1, 2, 3, 4])
print(np.sum(arr))
print(np.prod(arr))
Output:
10
24
Example 11: Minimum and Maximum
print(np.min(arr))
print(np.max(arr))
Example 12: Cumulative Sum
print(np.cumsum(arr))
Output:
[ 1 3 6 10]
6.5.6 Broadcasting Concept (Introductory)
Broadcasting allows NumPy to perform operations on arrays of different shapes under certain conditions.
Example:
arr = np.array([1, 2, 3])
print(arr + 10)
This works because the scalar is broadcast to all elements.
NCERT introduces broadcasting only conceptually at this stage.
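For readers who want to see one step beyond the scalar case, the sketch below adds a one-dimensional array to each row of a two-dimensional array. It is an optional illustration with hypothetical names; it works because the last dimension of both arrays has the same length (3).

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])        # shape (2, 3)
bonus = np.array([10, 20, 30])       # shape (3,)

# bonus is broadcast across both rows of table.
result = table + bonus
print(result)
# [[11 22 33]
#  [14 25 36]]
```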
Important Points to Remember
- Operations on NumPy arrays are vectorised
- Arithmetic and comparison operations are element-wise
- Scalar operations apply to every element
- Mathematical functions work efficiently on arrays
- Aggregate functions reduce arrays to single values
6.6 Concatenating Arrays
In practical data processing tasks, data is often available in multiple parts. For example, marks of students may be stored in different arrays for different sections, or experimental readings may be collected in separate batches. To analyse such data together, it becomes necessary to combine multiple arrays into a single array.
The process of joining two or more arrays together is known as concatenation. NumPy provides several built-in functions to concatenate arrays efficiently.
The NCERT textbook introduces array concatenation to help students understand how multiple datasets can be merged for further processing.
Understanding Array Concatenation
Array concatenation means:
- Joining arrays end to end
- Combining arrays along a specified axis
- Creating a new array that contains elements of all input arrays
Before performing concatenation, it is important to understand the concept of axis.
The Concept of Axis
In NumPy:
- Axis 0 represents rows (vertical direction)
- Axis 1 represents columns (horizontal direction)
Understanding axis is essential for correct concatenation.
For a 2D array:
axis = 0 → join row-wise
axis = 1 → join column-wise
6.6.1 Concatenating One-Dimensional Arrays
Using np.concatenate()
The concatenate() function joins two or more arrays into one.
Syntax
np.concatenate((array1, array2), axis=0)
Example 1: Concatenating 1D Arrays
import numpy as np
a = np.array([10, 20, 30])
b = np.array([40, 50, 60])
result = np.concatenate((a, b))
print(result)
Output:
[10 20 30 40 50 60]
Explanation:
- Both arrays are one-dimensional
- Elements are joined end to end
- axis defaults to 0, which is the only axis of a 1D array
NCERT Exam Point: All arrays must have the same shape, except along the axis of concatenation.
6.6.2 Concatenating Two-Dimensional Arrays
When working with two-dimensional arrays, concatenation can be done:
- Row-wise (axis = 0)
- Column-wise (axis = 1)
Example 2: Row-wise Concatenation (axis = 0)
a = np.array([[1, 2, 3],
[4, 5, 6]])
b = np.array([[7, 8, 9],
[10, 11, 12]])
result = np.concatenate((a, b), axis=0)
print(result)
Output:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Explanation:
- Rows of b are added below the rows of a
- Number of columns must be the same
Example 3: Column-wise Concatenation (axis = 1)
result = np.concatenate((a, b), axis=1)
print(result)
Output:
[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]
Explanation:
- Columns of b are added to the right of a
- Number of rows must be the same
6.6.3 Using hstack() and vstack()
NumPy provides simpler functions for stacking arrays:
- hstack(): horizontal stacking (column-wise)
- vstack(): vertical stacking (row-wise)
These functions are easier to use than concatenate().
Example 4: Using vstack()
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.vstack((a, b))
print(result)
Output:
[[1 2 3]
[4 5 6]]
Example 5: Using hstack()
result = np.hstack((a, b))
print(result)
Output:
[1 2 3 4 5 6]
NCERT Observation: vstack() and hstack() internally use concatenate().
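For two-dimensional inputs, this relationship can be checked directly. The sketch below uses hypothetical names and np.array_equal(), which returns True when two arrays have the same shape and elements.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# For 2D arrays, vstack matches axis=0 and hstack matches axis=1.
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))

print(same_v, same_h)   # True True
```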
6.6.4 Common Errors in Concatenation
Students should be careful about:
- Mismatched dimensions
- Incorrect axis value
- Attempting to concatenate arrays of incompatible shapes
Example of error:
# Error due to shape mismatch
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6, 7]])
np.concatenate((a, b), axis=0)
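To see the error without stopping the program, the call can be wrapped in try/except. This is a sketch with hypothetical names (failed, c, joined); the exact error text may vary between NumPy versions.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6, 7]])        # shape (1, 3): 3 columns, but a has 2

failed = False
try:
    np.concatenate((a, b), axis=0)
except ValueError as err:
    failed = True
    print("ValueError:", err)

# A row with 2 columns matches a and can be joined row-wise.
c = np.array([[5, 6]])           # shape (1, 2)
joined = np.concatenate((a, c), axis=0)
print(joined.shape)              # (3, 2)
```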
Difference Between concatenate(), hstack(), and vstack()
| Function | Purpose |
|---|---|
| concatenate() | General-purpose joining |
| vstack() | Row-wise stacking |
| hstack() | Column-wise stacking |
Key Points to Remember
- Concatenation joins multiple arrays
- Axis determines direction of joining
- Shapes must be compatible
- hstack() and vstack() simplify concatenation
6.7 Reshaping Arrays
When working with numerical data, the structure of data is just as important as the data itself. Often, the same set of values needs to be represented in different forms, for example as a single row, multiple rows, or a table. NumPy provides a powerful feature called reshaping, which allows the structure of an array to be changed without altering its data.
The NCERT textbook introduces reshaping to help students understand how array dimensions can be reorganised for different types of processing and analysis.
Understanding Shape and Dimensions
Every NumPy array has a shape, which describes:
- The number of rows
- The number of columns
For example:
arr = np.array([1, 2, 3, 4, 5, 6])
print(arr.shape)
Output:
(6,)
This indicates that the array is one-dimensional with 6 elements.
6.7.1 The reshape() Function
The reshape() function is used to change the shape of an array.
Syntax
array.reshape(new_shape)
The total number of elements must remain the same after reshaping.
Example 1: Reshaping a 1D Array into 2D
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
new_arr = arr.reshape(2, 3)
print(new_arr)
print(new_arr.shape)
Output:
[[1 2 3]
[4 5 6]]
(2, 3)
Explanation:
- Original array has 6 elements
- New shape has 2 rows and 3 columns
- Total elements remain unchanged
NCERT Exam Point: Reshaping does not change data, only its structure.
6.7.2 Reshaping into Different Dimensions
The same array can be reshaped into different valid shapes.
Example 2: Reshaping into 3 Rows and 2 Columns
new_arr = arr.reshape(3, 2)
print(new_arr)
Output:
[[1 2]
[3 4]
[5 6]]
Invalid Reshaping
# This will cause an error
arr.reshape(4, 2)
This fails because 4 × 2 = 8 ≠ 6.
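The rule can be checked with a short sketch: an invalid shape raises a ValueError, while every shape whose dimensions multiply to 6 is accepted. The names reshape_failed and valid_shapes are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

reshape_failed = False
try:
    arr.reshape(4, 2)            # needs 8 elements, only 6 available
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)

# Shapes whose rows * columns equal 6 all work.
valid_shapes = [(1, 6), (2, 3), (3, 2), (6, 1)]
for shape in valid_shapes:
    print(shape, arr.reshape(shape).size)   # size is always 6
```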
6.7.3 Using -1 in Reshaping
NumPy allows one dimension to be specified as -1. NumPy automatically calculates the appropriate size for that dimension.
Example 3: Automatic Dimension Calculation
new_arr = arr.reshape(2, -1)
print(new_arr)
Output:
[[1 2 3]
[4 5 6]]
Explanation:
- NumPy calculates the second dimension automatically
- Useful when total size is known but shape is flexible
NCERT Observation: Only one dimension can be set to -1.
6.7.4 Reshaping Multi-Dimensional Arrays
Reshaping also works with multi-dimensional arrays.
Example 4: Reshaping a 2D Array
arr2d = np.array([[1, 2, 3],
[4, 5, 6]])
new_arr = arr2d.reshape(3, 2)
print(new_arr)
Output:
[[1 2]
[3 4]
[5 6]]
6.7.5 Flattening an Array
Flattening converts a multi-dimensional array into a one-dimensional array.
Example 5: Using flatten()
arr2d = np.array([[10, 20],
[30, 40]])
flat_arr = arr2d.flatten()
print(flat_arr)
Output:
[10 20 30 40]
Flattening is useful when data needs to be processed sequentially.
6.7.6 Reshape vs Resize (Conceptual)
NCERT introduces reshaping but briefly distinguishes it from resizing:
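| reshape() | resize() |
|---|---|
| Returns a reshaped array; the original keeps its shape | ndarray.resize() modifies the original array in place |
| Total number of elements stays the same | Total number of elements may change: extra values are dropped, and new positions are filled (with zeros by ndarray.resize(), with repeated data by np.resize()) |
| Safer to use | Needs caution |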
At this level, reshape() is preferred.
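The difference can be seen in a short sketch. It goes slightly beyond the NCERT level and uses hypothetical names; refcheck=False is passed to ndarray.resize() only so that the in-place resize also runs in interactive environments that hold extra references to the array.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# reshape(): same 6 values in a new layout; arr itself is unchanged.
reshaped = arr.reshape(2, 3)
print(arr.shape)         # (6,)

# np.resize(): returns a new array and repeats data to fill 8 positions.
repeated = np.resize(arr, (2, 4))
print(repeated)
# [[1 2 3 4]
#  [5 6 1 2]]

# ndarray.resize(): changes the array in place; here it keeps only 4 values.
work = np.array([1, 2, 3, 4, 5, 6])
work.resize((2, 2), refcheck=False)
print(work)
# [[1 2]
#  [3 4]]
```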
Common Errors in Reshaping
Students should avoid:
- Choosing incompatible shapes
- Forgetting total element count
- Using more than one -1
Key Points to Remember
- Reshaping changes structure, not data
- Total number of elements must remain same
- -1 allows automatic dimension calculation
- Reshaping works for multi-dimensional arrays
- Flattening converts arrays to 1D
6.8 Splitting Arrays
In the previous section, we learned how to combine multiple arrays using concatenation. In many real-world situations, the opposite operation is also required: dividing a large array into smaller parts. This operation is known as splitting.
The NCERT textbook introduces array splitting to help students understand how datasets can be broken into meaningful sub-arrays for separate processing or analysis.
What Is Array Splitting
Array splitting means:
- Dividing an array into multiple smaller arrays
- Splitting can be done equally or at specified positions
- The original array remains unchanged
Splitting is especially useful when:
- Data needs to be processed in parts
- Rows and columns must be separated
- Training and testing datasets are required (conceptual)
6.8.1 Splitting One-Dimensional Arrays
Using np.split()
The split() function divides an array into equal-sized sub-arrays.
Syntax
np.split(array, number_of_splits)
Example 1: Splitting a 1D Array into Equal Parts
import numpy as np
arr = np.array([10, 20, 30, 40, 50, 60])
result = np.split(arr, 3)
print(result)
Output:
[array([10, 20]), array([30, 40]), array([50, 60])]
Explanation:
- Original array has 6 elements
- It is split into 3 equal parts
- Each sub-array has 2 elements
NCERT Exam Point: The number of splits must divide the array exactly, otherwise an error occurs.
Invalid Split Example
np.split(arr, 4)
This produces an error because 6 elements cannot be divided into 4 equal parts.
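A sketch of this failure, caught with try/except so the program keeps running, followed by a quick way to list the split counts that do work for 6 elements. The names split_failed and valid_counts are illustrative only.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

split_failed = False
try:
    np.split(arr, 4)             # 6 is not divisible by 4
except ValueError as err:
    split_failed = True
    print("ValueError:", err)

# Only counts that divide the length exactly are valid for equal splitting.
valid_counts = [n for n in range(1, len(arr) + 1) if len(arr) % n == 0]
print(valid_counts)              # [1, 2, 3, 6]
```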
6.8.2 Splitting at Specific Positions
NumPy also allows splitting at specific index positions.
Example 2: Splitting Using Index Positions
result = np.split(arr, [2, 4])
print(result)
Output:
[array([10, 20]), array([30, 40]), array([50, 60])]
Explanation:
- First split at index 2
- Second split at index 4
- Resulting arrays are formed accordingly
6.8.3 Splitting Two-Dimensional Arrays
Splitting becomes more powerful when applied to multi-dimensional arrays.
Splitting Row-wise Using vsplit()
The vsplit() function splits an array vertically (row-wise).
Example 3: Row-wise Splitting
arr2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]])
result = np.vsplit(arr2d, 2)
print(result)
Output:
[array([[1, 2, 3],
[4, 5, 6]]),
array([[7, 8, 9],
[10, 11, 12]])]
Explanation:
- Array is split into 2 parts vertically
- Each part has 2 rows
Splitting Column-wise Using hsplit()
The hsplit() function splits an array horizontally (column-wise).
Example 4: Column-wise Splitting
result = np.hsplit(arr2d, 3)
print(result)
Output:
[array([[ 1],
[ 4],
[ 7],
[10]]),
array([[ 2],
[ 5],
[ 8],
[11]]),
array([[ 3],
[ 6],
[ 9],
[12]])]
Explanation:
- Each column becomes a separate array
- Useful for separating features in datasets
6.8.4 Unequal Splitting Using Indices
Both hsplit() and vsplit() also support index-based splitting.
Example 5: Unequal Column-wise Split
result = np.hsplit(arr2d, [1, 2])
print(result)
Explanation:
- First split before column index 1 (after the first column)
- Second split before column index 2 (after the second column)
- Remaining columns form the last sub-array
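The result of an index-based split is easier to follow when the shapes are printed. In the sketch below (hypothetical names parts and uneven), the first call repeats Example 5, and the second uses a single index to produce two parts of unequal width.

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]])

# Split before column index 1 and before column index 2.
parts = np.hsplit(arr2d, [1, 2])
for part in parts:
    print(part.shape, part.tolist())
# (4, 1) [[1], [4], [7], [10]]
# (4, 1) [[2], [5], [8], [11]]
# (4, 1) [[3], [6], [9], [12]]

# A single index gives two unequal parts: 1 column and 2 columns.
uneven = np.hsplit(arr2d, [1])
print([p.shape for p in uneven])   # [(4, 1), (4, 2)]
```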
Difference Between split(), hsplit(), and vsplit()
| Function | Splitting Direction |
|---|---|
| split() | General-purpose |
| vsplit() | Row-wise (vertical) |
| hsplit() | Column-wise (horizontal) |
Common Errors in Splitting Arrays
Students should be careful about:
- Using incorrect number of splits
- Shape mismatch during splitting
- Confusing row-wise and column-wise splitting
- Assuming original array is modified (it is not)
Important Observations (NCERT-Oriented)
- Splitting returns a list of arrays
- Original array remains unchanged
- Equal splitting requires exact division
- Index-based splitting offers flexibility
6.9 Statistical Operations on Arrays
In data handling and analysis, it is often not enough to simply store and display data. To draw meaningful conclusions, we need to summarise and analyse data numerically. Statistical operations help in understanding the central tendency, spread, and distribution of data.
The NCERT textbook introduces statistical operations on NumPy arrays to show how large datasets can be analysed efficiently using built-in functions.
NumPy provides a wide range of statistical functions that operate directly on arrays without the need for explicit loops.
Why Statistical Operations Are Important
Statistical operations help in:
- Finding average performance
- Identifying minimum and maximum values
- Measuring variation in data
- Comparing datasets
For example, when analysing marks of students, we may want to know:
- Average marks
- Highest and lowest marks
- How much marks vary from the average
6.9.1 Sum of Array Elements
The sum of all elements in an array can be calculated using the sum() function.
Example 1: Calculating Sum
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(np.sum(arr))
Output:
150
Explanation:
- All elements of the array are added together
- Result is a single value
Sum Along Axis (2D Array)
arr2d = np.array([[10, 20, 30],
[40, 50, 60]])
print(np.sum(arr2d, axis=0))
print(np.sum(arr2d, axis=1))
Output:
[50 70 90]
[ 60 150]
Explanation:
- axis=0 → column-wise sum
- axis=1 → row-wise sum
6.9.2 Mean (Average)
The mean represents the average value of the data.
Formula
Mean = Sum of values / Number of values
NumPy provides the mean() function to calculate this easily.
Example 2: Calculating Mean
arr = np.array([60, 70, 80, 90])
print(np.mean(arr))
Output:
75.0
Mean of 2D Array
print(np.mean(arr2d, axis=0))
print(np.mean(arr2d, axis=1))
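With the 2 x 3 array arr2d defined in the sum example, these two calls should give the values sketched below (col_means and row_means are illustrative names):

```python
import numpy as np

arr2d = np.array([[10, 20, 30],
                  [40, 50, 60]])

col_means = np.mean(arr2d, axis=0)   # one mean per column
row_means = np.mean(arr2d, axis=1)   # one mean per row

print(col_means)   # [25. 35. 45.]
print(row_means)   # [20. 50.]
```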
NCERT Exam Point: Statistical functions can be applied row-wise or column-wise using the axis parameter.
6.9.3 Minimum and Maximum Values
The minimum and maximum values in an array can be found using min() and max().
Example 3: Minimum and Maximum
arr = np.array([45, 72, 18, 90, 66])
print(np.min(arr))
print(np.max(arr))
Output:
18
90
Minimum and Maximum Along Axis
print(np.min(arr2d, axis=0))
print(np.max(arr2d, axis=1))
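For reference, a self-contained version of the axis-wise calls is sketched below, using the same 2D array as in the sum example. The names col_min and row_max are illustrative.

```python
import numpy as np

arr2d = np.array([[10, 20, 30],
                  [40, 50, 60]])

col_min = np.min(arr2d, axis=0)  # smallest value in each column
row_max = np.max(arr2d, axis=1)  # largest value in each row

print(col_min)  # [10 20 30]
print(row_max)  # [30 60]
```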
6.9.4 Median
The median is the middle value when the data is arranged in ascending order.
- If the number of elements is odd → middle value
- If even → average of two middle values
NumPy provides the median() function.
Example 4: Calculating Median
arr = np.array([30, 10, 50, 20, 40])
print(np.median(arr))
Output:
30.0
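The even case can be sketched in the same way. The array below is an assumed example with four values; after sorting (10, 20, 30, 40) the two middle values are 20 and 30.

```python
import numpy as np

arr_even = np.array([40, 10, 30, 20])

# median() sorts internally, so the input order does not matter
med = np.median(arr_even)

print(med)  # 25.0
```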
6.9.5 Standard Deviation
Standard deviation measures how much the values in a dataset deviate from the mean.
A small standard deviation indicates that values are close to the mean, while a large value indicates wide variation.
NumPy provides the std() function.
Example 5: Calculating Standard Deviation
arr = np.array([10, 12, 14, 16, 18])
print(np.std(arr))
NCERT Observation
Standard deviation is useful for analysing the spread of data.
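To see what std() computes, the steps can be written out by hand. This is a sketch that assumes the default behaviour of np.std(), which divides by the number of values (population standard deviation). The names deviations and manual_std are illustrative.

```python
import numpy as np

arr = np.array([10, 12, 14, 16, 18])

mean = np.mean(arr)                  # 14.0
deviations = arr - mean              # distance of each value from the mean
manual_std = np.sqrt(np.mean(deviations ** 2))

print(round(manual_std, 4))                  # 2.8284
print(np.isclose(manual_std, np.std(arr)))   # True
```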
6.9.6 Variance
Variance is another measure of dispersion and is the square of the standard deviation.
Example 6: Calculating Variance
print(np.var(arr))
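The relationship between variance and standard deviation can be verified with a short sketch on the same data. np.isclose() is used for the comparison because floating-point results may differ in the last decimal places.

```python
import numpy as np

arr = np.array([10, 12, 14, 16, 18])

variance = np.var(arr)
std_dev = np.std(arr)

print(variance)                            # 8.0
print(np.isclose(variance, std_dev ** 2))  # True
```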
6.9.7 Combined Statistical Example
marks = np.array([65, 70, 75, 80, 85])
print("Sum:", np.sum(marks))
print("Mean:", np.mean(marks))
print("Minimum:", np.min(marks))
print("Maximum:", np.max(marks))
print("Median:", np.median(marks))
print("Standard Deviation:", np.std(marks))
This type of program is very common in practical exams.
Important Points to Remember
- Statistical functions return numeric results
- The axis parameter controls row-wise or column-wise calculation
- Mean, median, and standard deviation describe data behaviour
- NumPy simplifies statistical analysis significantly
6.10 Loading Arrays from Files
In practical data-handling situations, data is rarely entered manually inside a program. Instead, data is usually stored in files such as text files or CSV files. To analyse such data using NumPy, it is necessary to load the data from files into NumPy arrays.
The NCERT textbook introduces file-based loading of arrays to help students understand how external data sources can be connected to NumPy programs.
NumPy provides simple and efficient functions to read data stored in files and convert it into NumPy arrays for further processing.
Why Load Data from Files
Loading data from files is useful because:
- Large datasets cannot be typed manually
- Data may already exist in files
- Same data can be reused multiple times
- Data can be shared between programs
Common examples include:
- Student marks stored in a file
- Temperature readings collected daily
- Sales data stored month-wise
6.10.1 Loading Data Using loadtxt()
The most commonly used NumPy function for loading data from text files is loadtxt().
Syntax
np.loadtxt(filename, delimiter)
- filename → name of the file
- delimiter → character that separates values (such as space or comma)
Example 1: Loading Data from a Text File
Assume a text file marks.txt contains the following data:
45 67 78 89
56 68 90 72
60 75 85 95
Python program:
import numpy as np
data = np.loadtxt("marks.txt")
print(data)
Output:
[[45. 67. 78. 89.]
[56. 68. 90. 72.]
[60. 75. 85. 95.]]
Explanation:
- Each row in the file becomes a row in the array
- All values are converted to floating-point numbers
- The result is a 2D NumPy array
NCERT Exam Point
By default, loadtxt() reads data as float type.
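If whole numbers are wanted instead of floats, loadtxt() also accepts a dtype parameter. The sketch below goes slightly beyond the example above: it first writes the sample file into a temporary folder (an assumption made only so that the code runs on any system) and then loads it as integers.

```python
import os
import tempfile
import numpy as np

# Create the sample file in a temporary folder so the sketch runs anywhere
folder = tempfile.mkdtemp()
path = os.path.join(folder, "marks.txt")
with open(path, "w") as f:
    f.write("45 67 78 89\n56 68 90 72\n60 75 85 95\n")

data = np.loadtxt(path, dtype=int)  # request integers instead of floats
print(data)
# [[45 67 78 89]
#  [56 68 90 72]
#  [60 75 85 95]]
```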
6.10.2 Using Delimiter in loadtxt()
When data values are separated by commas (CSV format), the delimiter must be specified.
Example 2: Loading CSV Data
Assume file scores.csv contains:
65,70,75
80,85,90
60,68,72
Python program:
data = np.loadtxt("scores.csv", delimiter=",")
print(data)
Explanation:
- Comma is specified as delimiter
- Data is loaded correctly into array form
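A runnable version of this example is sketched below. The file is written into a temporary folder first, which is an assumption made only so that the code is self-contained.

```python
import os
import tempfile
import numpy as np

# Create scores.csv with the contents shown above
folder = tempfile.mkdtemp()
path = os.path.join(folder, "scores.csv")
with open(path, "w") as f:
    f.write("65,70,75\n80,85,90\n60,68,72\n")

data = np.loadtxt(path, delimiter=",")
print(data)
# [[65. 70. 75.]
#  [80. 85. 90.]
#  [60. 68. 72.]]
```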
6.10.3 Skipping Header Rows
Sometimes files contain header rows (titles or labels). These rows must be skipped.
Example 3: Skipping Rows
Assume file data.txt:
Marks of Students
45 67 78
56 68 90
Python program:
data = np.loadtxt("data.txt", skiprows=1)
print(data)
Explanation:
- skiprows=1 skips the header line
- Only numeric data is loaded
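The effect of skiprows can be checked with a small self-contained sketch. The temporary folder is an assumption used only to make the code runnable anywhere.

```python
import os
import tempfile
import numpy as np

# Create data.txt with one header line followed by numeric rows
folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.txt")
with open(path, "w") as f:
    f.write("Marks of Students\n45 67 78\n56 68 90\n")

data = np.loadtxt(path, skiprows=1)  # ignore the first line
print(data)
# [[45. 67. 78.]
#  [56. 68. 90.]]
print(data.shape)  # (2, 3)
```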
6.10.4 Loading Selected Columns
NumPy allows loading specific columns from a file using the usecols parameter.
Example 4: Loading Selected Columns
data = np.loadtxt("marks.txt", usecols=(0, 2))
print(data)
Explanation:
- Only first and third columns are loaded
- Useful when only part of data is needed
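The result of the example can be seen with the sketch below, which recreates marks.txt from Example 1 in a temporary folder (an assumption for self-containment). Column numbers start from 0, so usecols=(0, 2) selects the first and third columns.

```python
import os
import tempfile
import numpy as np

folder = tempfile.mkdtemp()
path = os.path.join(folder, "marks.txt")
with open(path, "w") as f:
    f.write("45 67 78 89\n56 68 90 72\n60 75 85 95\n")

data = np.loadtxt(path, usecols=(0, 2))  # first and third columns only
print(data)
# [[45. 78.]
#  [56. 90.]
#  [60. 85.]]
```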
6.10.5 Handling Missing Values (Conceptual)
If a file contains missing or invalid data, loadtxt() raises an error (typically a ValueError) instead of returning an array.
At this level, NCERT expects students to:
- Ensure clean numeric data
- Avoid missing values in files
Advanced handling of missing values is discussed in higher classes.
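The behaviour on invalid data can be observed with a short sketch. The file name bad_marks.txt and the entry "abc" are hypothetical, chosen only to trigger the error. The try/except block catches the ValueError so that the program does not stop abruptly.

```python
import os
import tempfile
import numpy as np

# Create a file in which one entry is not a number
folder = tempfile.mkdtemp()
path = os.path.join(folder, "bad_marks.txt")
with open(path, "w") as f:
    f.write("45 67 78\n56 abc 90\n")

loaded = True
try:
    data = np.loadtxt(path)
except ValueError as err:
    loaded = False
    print("Could not load file:", err)

print(loaded)  # False
```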
Important Points to Remember
- NumPy can read data directly from files
- loadtxt() converts file data into NumPy arrays
- Default data type is float
- Delimiters must be specified for CSV files
- Header rows can be skipped
- Selected columns can be loaded
6.11 Saving NumPy Arrays in Files on Disk
In the previous section, we learned how to load data from files into NumPy arrays. In many practical situations, the reverse operation is equally important. After processing or analysing data, we often need to store the results permanently so that they can be reused later. This process is known as saving arrays to files.
The NCERT textbook introduces saving NumPy arrays to files to help students understand how processed data can be stored, shared, and reused across different programs.
NumPy provides simple functions to save arrays in:
- Text files
- Binary files
At this level, NCERT mainly focuses on saving arrays in text format.
Why Save NumPy Arrays to Files
Saving arrays to files is useful because:
- Results of computations can be preserved
- Data can be reused without recalculation
- Large datasets can be stored efficiently
- Data can be shared between programs or users
For example:
- Saving processed marks of students
- Storing statistical results
- Writing sensor readings to disk
6.11.1 Saving Arrays Using savetxt()
The most commonly used NumPy function to save arrays in text files is savetxt().
Syntax
np.savetxt(filename, array, delimiter)
- filename → name of the output file
- array → NumPy array to be saved
- delimiter → character separating values (optional)
Example 1: Saving a One-Dimensional Array
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
np.savetxt("numbers.txt", arr)
Explanation:
- The array is saved in the file numbers.txt
- Each element is written on a new line by default
Contents of numbers.txt:
1.000000000000000000e+01
2.000000000000000000e+01
3.000000000000000000e+01
4.000000000000000000e+01
5.000000000000000000e+01
NCERT Observation
By default, NumPy saves values in scientific notation.
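The default format can be confirmed by saving the array and reading the file back as plain text. This is a sketch; the temporary folder is an assumption used only so that no existing file is overwritten.

```python
import os
import tempfile
import numpy as np

folder = tempfile.mkdtemp()
path = os.path.join(folder, "numbers.txt")

arr = np.array([10, 20, 30, 40, 50])
np.savetxt(path, arr)  # no fmt given, so the default format is used

# Read the file back as ordinary text lines
with open(path) as f:
    lines = f.read().split()

print(lines[0])    # 1.000000000000000000e+01
print(len(lines))  # 5
```

Here 1.000000000000000000e+01 means 1.0 × 10¹, that is, 10.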
6.11.2 Controlling the Output Format
To make the output more readable, the fmt parameter can be used.
Example 2: Formatting Output
np.savetxt("numbers.txt", arr, fmt="%d")
Contents of numbers.txt:
10
20
30
40
50
Explanation:
- %d formats values as integers
- Output becomes cleaner and readable
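Other format specifiers work in the same way. As an assumed example, the sketch below saves decimal values with exactly two digits after the decimal point using "%.2f". The file name and data are illustrative.

```python
import os
import tempfile
import numpy as np

folder = tempfile.mkdtemp()
path = os.path.join(folder, "averages.txt")

averages = np.array([72.5, 88.25, 91.0])
np.savetxt(path, averages, fmt="%.2f")  # two decimal places

with open(path) as f:
    lines = f.read().split()

print(lines)  # ['72.50', '88.25', '91.00']
```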
6.11.3 Saving Two-Dimensional Arrays
Two-dimensional arrays are saved row-wise, with values separated by spaces or a specified delimiter.
Example 3: Saving a 2D Array
arr2d = np.array([[65, 70, 75],
[80, 85, 90],
[60, 68, 72]])
np.savetxt("marks.txt", arr2d, fmt="%d")
Contents of marks.txt:
65 70 75
80 85 90
60 68 72
6.11.4 Saving Arrays in CSV Format
When data needs to be saved in CSV (Comma-Separated Values) format, the delimiter parameter is used.
Example 4: Saving as CSV File
np.savetxt("marks.csv", arr2d, fmt="%d", delimiter=",")
Contents of marks.csv:
65,70,75
80,85,90
60,68,72
This format is widely used and can be opened in spreadsheet software.
6.11.5 Adding Headers and Comments
NumPy allows adding a header to the file for clarity.
Example 5: Saving with Header
np.savetxt("marks.csv", arr2d, fmt="%d", delimiter=",",
header="Maths,Science,English")
Explanation:
- Header is added at the top of the file
- Useful for describing columns
- Header lines start with # by default
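The sketch below shows what the header line looks like in the saved file, and that loadtxt() ignores it when the file is read again (lines starting with # are treated as comments by default). The temporary folder is an assumption for self-containment.

```python
import os
import tempfile
import numpy as np

folder = tempfile.mkdtemp()
path = os.path.join(folder, "marks.csv")

arr2d = np.array([[65, 70, 75],
                  [80, 85, 90],
                  [60, 68, 72]])
np.savetxt(path, arr2d, fmt="%d", delimiter=",",
           header="Maths,Science,English")

# Look at the first line of the saved file
with open(path) as f:
    first_line = f.readline().rstrip("\n")
print(first_line)  # # Maths,Science,English

# The header line is skipped automatically while loading
data = np.loadtxt(path, delimiter=",")
print(data.shape)  # (3, 3)
```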
6.11.6 Reloading Saved Files (Conceptual Link)
Files saved using savetxt() can be loaded again using loadtxt().
Example:
data = np.loadtxt("marks.csv", delimiter=",")
print(data)
This demonstrates a complete data cycle:
- Load data
- Process data
- Save results
- Reload when required
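The four steps can be sketched in one short program. The file names, the sample marks, and the choice of computing total marks per student are assumptions made for illustration.

```python
import os
import tempfile
import numpy as np

folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "marks.txt")
out_path = os.path.join(folder, "totals.txt")

# Prepare an input file (normally this file would already exist)
with open(in_path, "w") as f:
    f.write("45 67 78\n56 68 90\n60 75 85\n")

marks = np.loadtxt(in_path)             # 1. load data
totals = np.sum(marks, axis=1)          # 2. process: total per student
np.savetxt(out_path, totals, fmt="%d")  # 3. save results
reloaded = np.loadtxt(out_path)         # 4. reload when required

print(reloaded)                          # [190. 214. 220.]
print(np.array_equal(totals, reloaded))  # True
```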
Common Errors While Saving Arrays
Students should avoid:
- Forgetting to specify delimiter for CSV files
- Using incorrect format specifiers
- Overwriting important files unintentionally
- Saving non-numeric data using savetxt()
Difference Between loadtxt() and savetxt()
| loadtxt() | savetxt() |
|---|---|
| Reads data from file | Writes data to file |
| Creates NumPy array | Saves NumPy array |
| Input operation | Output operation |
Key Points to Remember
- NumPy arrays can be saved to disk
- savetxt() writes arrays to text files
- Formatting improves readability
- CSV format is widely used
- Saved files can be reloaded using NumPy