Chapter 07 - Understanding Data
7.1 Introduction to Data
What is Data?
Data refers to raw facts and figures collected from observations, measurements, or surveys.
📌 Data by itself has no meaning unless it is processed.
Examples
- Student marks
- Temperature readings
- Sales figures
- Population numbers
Data → Information
When data is processed, analyzed, and interpreted, it becomes information.
📌 Example:
- Data: 45, 60, 72
- Information: “Average marks of students is 59”
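The step from data to information in this example can be sketched in a few lines of Python. This is only an illustration; the variable names `marks`, `average`, and `information` are chosen here and are not part of the chapter.

```python
# Raw data: the three marks from the example above
marks = [45, 60, 72]

# Processing: compute the average of the marks
average = sum(marks) / len(marks)

# Information: a statement that carries meaning
information = f"Average marks of students is {average:.0f}"
print(information)  # Average marks of students is 59
```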
Importance of Data
- Helps in decision making
- Used in business, science, healthcare, education
- Basis of data analytics and AI
7.2 Data Collection
What is Data Collection?
Data collection is the process of gathering data from various sources.
Types of Data
1️⃣ Primary Data
Data collected first-hand for a specific purpose.
📌 Examples:
- Surveys
- Interviews
- Questionnaires
- Experiments
✔ More accurate ❌ Time-consuming and expensive
2️⃣ Secondary Data
Data collected by others and reused.
📌 Examples:
- Census reports
- Government databases
- Websites
- Books and journals
✔ Easy to obtain ❌ May not be fully reliable
Methods of Data Collection
| Method | Description |
|---|---|
| Survey | Collects opinions |
| Observation | Watching events |
| Interview | Direct interaction |
| Online forms | Digital data |
7.3 Data Storage
What is Data Storage?
Data storage refers to saving data in a structured format so that it can be accessed later.
Types of Storage
1️⃣ Primary Storage (Main Memory)
- RAM
- Temporary storage
- Fast access
📌 Data lost when power is off
2️⃣ Secondary Storage
- Hard Disk
- SSD
- Pen Drive
- Cloud storage
📌 Long-term (non-volatile) storage: data is retained when power is off
Data Storage Formats
- Text files (.txt, .csv)
- Databases (tables)
- Binary files
- Cloud databases
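As a small illustration of the text-file format, the sketch below saves a few rows to a `.csv` file and reads them back using Python's standard `csv` module. The student names, the marks, and the file name `marks.csv` are made up for this example.

```python
import csv
import os
import tempfile

# Hypothetical sample data: a header row followed by two records
rows = [["name", "marks"], ["Asha", "45"], ["Ravi", "60"]]

# Save the rows to a .csv file inside a temporary folder
path = os.path.join(tempfile.mkdtemp(), "marks.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the file back later; CSV stores every value as text
with open(path, newline="") as f:
    loaded = list(csv.reader(f))

print(loaded)
```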
Importance of Data Storage
- Data reuse
- Data security
- Data backup
- Efficient retrieval
7.4 Data Processing
What is Data Processing?
Data processing is the conversion of raw data into meaningful information.
Steps in Data Processing Cycle
1️⃣ Data Collection
2️⃣ Data Input
3️⃣ Data Processing
4️⃣ Data Output
5️⃣ Data Storage
📌 This cycle is continuous.
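One pass through the cycle can be sketched as a chain of small Python functions, one per step. The function names, the sample marks, and the use of a list as "storage" are assumptions made for this illustration only.

```python
def collect():
    # Step 1: data collection (hypothetical marks gathered as text)
    return ["45", "60", "72"]

def to_input(raw):
    # Step 2: data input (convert text into numbers the program can use)
    return [int(x) for x in raw]

def process(values):
    # Step 3: data processing (here, compute the average)
    return sum(values) / len(values)

def output(result):
    # Step 4: data output (present the result in readable form)
    return f"Average marks: {result:.1f}"

storage = []  # Step 5: data storage (a list stands in for a file or database)

report = output(process(to_input(collect())))
storage.append(report)
print(storage)  # ['Average marks: 59.0']
```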
Types of Data Processing
| Type | Description |
|---|---|
| Manual | Done by humans |
| Mechanical | Uses machines |
| Electronic | Uses computers |
Data Cleaning
Before processing, data must be cleaned:
- Check for errors
- Handle missing values
- Remove duplicate data
📌 This improves accuracy.
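The three cleaning checks can be sketched in Python as follows. The records (roll number, marks) are made-up sample data, and this sketch assumes marks are non-negative whole numbers and that a record with a missing or invalid value is simply skipped; other ways of handling missing values are possible.

```python
# Hypothetical raw records: (roll number, marks as entered)
records = [(1, "45"), (2, "60"), (2, "60"), (3, ""), (4, "abc"), (5, "72")]

def clean(rows):
    cleaned = []
    seen = set()
    for roll_no, marks in rows:
        marks = marks.strip()
        if marks == "":            # missing value: skipped in this sketch
            continue
        if not marks.isdigit():    # error: not a whole number
            continue
        record = (roll_no, int(marks))
        if record in seen:         # duplicate record: keep only the first copy
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

print(clean(records))  # [(1, 45), (2, 60), (5, 72)]
```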
7.5 Statistical Techniques for Data Processing
Statistical techniques help in analyzing and interpreting data.
🔸 1️⃣ Mean (Average)
Formula: \[ \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} \]
📌 Example:
Data: 10, 20, 30
Mean = (10+20+30)/3 = 20
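The same calculation can be checked in Python. The sketch below applies the formula directly and compares the result with `statistics.mean` from the standard library; the variable names are chosen for this example.

```python
import statistics

data = [10, 20, 30]

# Mean = sum of all values / number of values
mean = sum(data) / len(data)
print(mean)  # 20.0

# The standard library gives the same answer
print(statistics.mean(data) == mean)  # True
```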
🔸 2️⃣ Median
The middle value when data is arranged in order.
📌 Steps:
- Arrange the data in ascending order
- If the number of values is odd → middle value
- If the number of values is even → average of the two middle values
Example:
Data: 5, 10, 15 → Median = 10
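The steps for finding the median can be written as a short Python function. This is a minimal sketch; the function name `median` and the second data set (with an even number of values) are added here for illustration.

```python
def median(values):
    if not values:
        raise ValueError("median needs at least one value")
    ordered = sorted(values)      # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 10, 15]))      # 10
print(median([5, 10, 15, 20]))  # 12.5
```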
🔸 3️⃣ Mode
The value that occurs most frequently.
Example:
Data: 2, 4, 4, 6 → Mode = 4
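In Python, the mode can be found by counting how often each value occurs. The sketch below uses `collections.Counter` from the standard library; note that if several values tie for the highest count, this approach reports only one of them.

```python
from collections import Counter

data = [2, 4, 4, 6]

# Count the occurrences of each value, then take the most common one
counts = Counter(data)
mode, times = counts.most_common(1)[0]

print(mode, times)  # 4 2
```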
🔸 4️⃣ Range
Difference between highest and lowest values.
Formula: \[ \text{Range} = \text{Maximum} - \text{Minimum} \]
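As a quick worked example, the sketch below applies the formula to the data set 10, 20, 30 used earlier for the mean. The variable is named `data_range` here to avoid clashing with Python's built-in `range`.

```python
data = [10, 20, 30]

# Range = maximum - minimum
data_range = max(data) - min(data)
print(data_range)  # 20
```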
🔸 5️⃣ Frequency Distribution
Shows how often values occur.
📌 Used to:
- Identify trends
- Detect patterns
- Summarize data
Example (Table)
| Value | Frequency |
|---|---|
| 10 | 2 |
| 20 | 3 |
| 30 | 1 |
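A frequency table like the one above can be built by counting values. The sketch below assumes a raw data list that matches the table (10 twice, 20 three times, 30 once); the list itself is made up for this illustration.

```python
from collections import Counter

# Hypothetical raw data consistent with the table above
data = [10, 20, 20, 10, 30, 20]

# Count how often each value occurs
frequency = Counter(data)

print("| Value | Frequency |")
for value in sorted(frequency):
    print(f"| {value} | {frequency[value]} |")
```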
Importance of Statistical Techniques
- Helps summarize large data
- Makes comparisons easy
- Used in data science, AI, business analytics
📝 NCERT EXAM SUMMARY (VERY IMPORTANT)
- Data = raw facts
- Information = processed data
- Primary vs Secondary data
- Data processing cycle
- Mean, Median, Mode formulas
- Range = Max – Min
- Data must be cleaned before analysis
Short Answer Questions & Answers
Q1. What is data?
Answer: Data refers to raw facts and figures collected from observations, measurements, or surveys. By itself, data has no meaning until it is processed.
Q2. How is data different from information?
Answer: Data is raw and unprocessed, whereas information is processed data that is meaningful and useful for decision-making.
Q3. What is primary data? Give one example.
Answer: Primary data is data collected first-hand for a specific purpose. Example: Data collected through surveys or interviews.
Q4. What is secondary data? Mention one source.
Answer: Secondary data is data collected by others and reused. Example: Census reports, government databases, books.
Q5. What is data storage? Why is it needed?
Answer: Data storage is the process of saving data for future use. It is needed for data reuse, backup, security, and efficient retrieval.
Q6. What is the data processing cycle?
Answer: The data processing cycle consists of: Data collection → Input → Processing → Output → Storage.
Q7. What is data cleaning?
Answer: Data cleaning is the process of correcting errors, removing duplicates, and handling missing values in data to improve accuracy before analysis.
Q8. Define mean with formula.
Answer: Mean is the average of data values. \[ \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} \]
Q9. What is median? How is it calculated?
Answer: Median is the middle value of an arranged data set.
- If the number of values is odd → middle value
- If the number of values is even → average of the two middle values
Q10. What is mode and range?
Answer:
- Mode is the value that occurs most frequently in a data set.
- Range is the difference between the highest and lowest values in the data.