Database Normalization
What Is Normalization?
Database normalization is the process of organizing data to minimize redundancy and improve data integrity. It involves breaking down tables into smaller, related tables.
Normal Forms
First Normal Form (1NF)
- Remove repeating groups
- Each column contains atomic (indivisible) values
- No duplicate columns
Second Normal Form (2NF)
- Satisfy 1NF
- Remove partial dependencies
- Non-key attributes must depend on entire primary key
Third Normal Form (3NF)
- Satisfy 2NF
- Remove transitive dependencies
- Non-key attributes depend only on primary key
Boyce-Codd Normal Form (BCNF)
- Stricter than 3NF
- Every determinant is a candidate key
Example: Normalization Process
Unnormalized (Bad)
Students
StudentID | Name | Courses
1 | Alice | Math, Physics, Chemistry
2 | Bob | English, History
1NF (Better)
Students
StudentID | Name
1 | Alice
2 | Bob
StudentCourses
StudentID | Course
1 | Math
1 | Physics
1 | Chemistry
2 | English
2 | History
Benefits
- Reduces data redundancy
- Improves data integrity
- Easier to maintain and update
- Better query performance (fewer joins needed)
- Reduces storage space
Trade-offs
- More tables = more joins
- More complex queries
- Slightly slower reads
- Need careful schema design
Denormalization
Sometimes intentionally denormalize for performance:
- Pre-computed aggregates
- Cached calculations
- Duplicate data for read performance
Always measure impact before denormalizing.
Best Practices
- Normalize to at least 3NF by default
- Denormalize only when measured need exists
- Document your normalization decisions
- Use surrogate keys (auto-increment IDs)
- Define proper foreign keys with constraints
- Regularly review schema for optimization
