Array Data Structures
Arrays are fundamental data structures that store elements in contiguous memory locations, enabling efficient access by index. Understanding arrays deeply is essential for solving algorithmic problems and building efficient systems.
What Are Arrays?
An array is a collection of elements stored at consecutive memory addresses. Each element can be accessed directly using its index, making arrays one of the most efficient data structures for random access.
Key Properties
- Fixed Size: Classic arrays have a predetermined size
- Contiguous Memory: Elements stored sequentially in memory
- Index-Based Access: O(1) time to access any element
- Homogeneous: Typically store elements of the same type
Array Variations
One-Dimensional Arrays
The simplest form: a linear sequence of elements.
# Python list (dynamic array)
arr = [1, 2, 3, 4, 5]
# Access element
print(arr[2]) # Output: 3
# Update element
arr[2] = 10
# Iterate
for num in arr:
print(num)
Multidimensional Arrays
Arrays with multiple dimensions, commonly used for matrices, grids, and tables.
Two-Dimensional Arrays (Matrices):
# 2D array (matrix)
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# Access element at row i, column j
print(matrix[1][2]) # Output: 6
# Iterate through 2D array
for i in range(len(matrix)):
for j in range(len(matrix[i])):
print(matrix[i][j], end=' ')
print()
Higher Dimensions:
# 3D array (cube)
cube = [
[[1, 2], [3, 4]],
[[5, 6], [7, 8]]
]
# Access element at [layer][row][col]
print(cube[0][1][0]) # Output: 3
Dynamic Arrays
Arrays that can grow or shrink in size automatically.
# Python lists are dynamic arrays
dynamic = []
# Append (amortized O(1))
dynamic.append(1)
dynamic.append(2)
# Insert at position (O(n))
dynamic.insert(0, 0)
# Remove element (O(n))
dynamic.remove(1)
# Pop from end (O(1))
dynamic.pop()
Common Operations
Access
Search
# Linear search: O(n)
def linear_search(arr, target):
for i, num in enumerate(arr):
if num == target:
return i
return -1
# Binary search (sorted array): O(log n)
def binary_search(arr, target):
left, right = 0, len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
Insert
# Insert at end (dynamic array): O(1) amortized
arr.append(value)
# Insert at position: O(n)
arr.insert(index, value)
Delete
# Delete from end: O(1)
arr.pop()
# Delete from position: O(n)
arr.pop(index)
del arr[index]
# Delete by value: O(n)
arr.remove(value)
Update
Common Patterns
Prefix Sum
Efficiently compute range sums.
def build_prefix_sum(arr):
"""Build prefix sum array."""
prefix = [0] * (len(arr) + 1)
for i in range(len(arr)):
prefix[i + 1] = prefix[i] + arr[i]
return prefix
def range_sum(prefix, left, right):
"""Get sum of elements from left to right (inclusive)."""
return prefix[right + 1] - prefix[left]
# Example
arr = [1, 2, 3, 4, 5]
prefix = build_prefix_sum(arr)
print(range_sum(prefix, 1, 3)) # Sum of arr[1:4] = 2+3+4 = 9
Two Pointers
Process array from both ends or with fast/slow pointers.
def reverse_array(arr):
"""Reverse array in-place using two pointers."""
left, right = 0, len(arr) - 1
while left < right:
arr[left], arr[right] = arr[right], arr[left]
left += 1
right -= 1
Sliding Window
Maintain a window of elements.
def max_sum_subarray(arr, k):
"""Find maximum sum of k consecutive elements."""
if len(arr) < k:
return None
# Initial window
window_sum = sum(arr[:k])
max_sum = window_sum
# Slide window
for i in range(k, len(arr)):
window_sum += arr[i] - arr[i - k]
max_sum = max(max_sum, window_sum)
return max_sum
Matrix Operations
Matrix Traversal
def traverse_matrix(matrix):
"""Common traversal patterns."""
rows, cols = len(matrix), len(matrix[0])
# Row-wise
for i in range(rows):
for j in range(cols):
print(matrix[i][j])
# Column-wise
for j in range(cols):
for i in range(rows):
print(matrix[i][j])
# Diagonal
for i in range(min(rows, cols)):
print(matrix[i][i])
Matrix Rotation
def rotate_90_clockwise(matrix):
"""Rotate n x n matrix 90 degrees clockwise in-place."""
n = len(matrix)
# Transpose
for i in range(n):
for j in range(i + 1, n):
matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
# Reverse each row
for i in range(n):
matrix[i].reverse()
Spiral Traversal
def spiral_order(matrix):
"""Traverse matrix in spiral order."""
if not matrix:
return []
result = []
top, bottom = 0, len(matrix) - 1
left, right = 0, len(matrix[0]) - 1
while top <= bottom and left <= right:
# Right
for j in range(left, right + 1):
result.append(matrix[top][j])
top += 1
# Down
for i in range(top, bottom + 1):
result.append(matrix[i][right])
right -= 1
# Left
if top <= bottom:
for j in range(right, left - 1, -1):
result.append(matrix[bottom][j])
bottom -= 1
# Up
if left <= right:
for i in range(bottom, top - 1, -1):
result.append(matrix[i][left])
left += 1
return result
Performance Characteristics
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Access by index | O(1) | O(1) |
| Search (unsorted) | O(n) | O(1) |
| Search (sorted) | O(log n) | O(1) |
| Insert at end (dynamic) | O(1) amortized | O(1) |
| Insert at position | O(n) | O(1) |
| Delete from end | O(1) | O(1) |
| Delete from position | O(n) | O(1) |
Common Applications
Data Storage
- Sequential data storage
- Fixed-size collections
- Buffer implementation
Algorithm Building Blocks
- Sorting algorithms
- Searching algorithms
- Graph representations (adjacency matrix)
Image Processing
- Pixel data storage (2D/3D arrays)
- Convolution operations
- Image transformations
Mathematical Operations
- Matrix operations
- Vector operations
- Numerical computations
Practice Tips
- Master index manipulation and boundary handling
- Understand in-place vs extra space trade-offs
- Practice common patterns: two pointers, sliding window, prefix sums
- Visualize array state changes during operations
- Consider edge cases: empty arrays, single elements, duplicates
- Think about space complexity for multidimensional problems
Common Mistakes
- Off-by-one errors in loops and indices
- Array index out of bounds
- Modifying array while iterating
- Not handling empty array cases
- Inefficient nested loop operations
- Memory issues with large multidimensional arrays
- Forgetting to handle matrix dimensions properly
Advanced Range Query Techniques
When processing arrays, you often need to answer questions about ranges: "What's the sum from index i to j?" or "What's the minimum value in this range?" Naive approaches are O(n) per query. Advanced techniques reduce this dramatically.
Technique 1: Prefix Sum Arrays
Problem: Answer multiple range sum queries efficiently.
Naive Approach: O(n) per query - iterate and sum each time Optimized: O(1) per query after O(n) preprocessing
class RangeSumQuery:
"""
Preprocess array for O(1) range sum queries.
Build Time: O(n)
Query Time: O(1)
Space: O(n)
"""
def __init__(self, nums):
"""Build prefix sum array."""
n = len(nums)
self.prefix = [0] * (n + 1)
# prefix[i] = sum of nums[0..i-1]
for i in range(n):
self.prefix[i + 1] = self.prefix[i] + nums[i]
def range_sum(self, left, right):
"""
Sum of elements from left to right (inclusive).
Math: sum[left..right] = prefix[right+1] - prefix[left]
"""
return self.prefix[right + 1] - self.prefix[left]
# Example: Stock price changes
price_changes = [2, -1, 3, -2, 1]
rsq = RangeSumQuery(price_changes)
print(rsq.range_sum(1, 3)) # -1 + 3 + -2 = 0
print(rsq.range_sum(0, 4)) # Total change = 3
- Applications
- Subarray sum problems
- Image processing (summed-area tables)
- Database query optimization
- Cumulative statistics
Technique 2: Sparse Table (Range Minimum/Maximum Query)
Problem: Answer range min/max queries where array doesn't change.
Approach: Precompute answers for all ranges of size 2^k
import math
class SparseTable:
"""
Range minimum query using sparse table.
Build Time: O(n log n)
Query Time: O(1)
Space: O(n log n)
Perfect for static arrays with many queries.
"""
def __init__(self, arr):
n = len(arr)
if n == 0:
self.table = []
self.log_table = []
return
# Max power of 2 we need
max_log = int(math.log2(n)) + 1
# table[i][j] = min in range [i, i + 2^j - 1]
self.table = [[0] * max_log for _ in range(n)]
# Base case: ranges of length 1
for i in range(n):
self.table[i][0] = arr[i]
# Fill table for ranges of length 2^j
for j in range(1, max_log):
i = 0
while i + (1 << j) <= n:
# Range [i, i + 2^j - 1] =
# min of [i, i + 2^(j-1) - 1] and [i + 2^(j-1), i + 2^j - 1]
left = self.table[i][j - 1]
right = self.table[i + (1 << (j - 1))][j - 1]
self.table[i][j] = min(left, right)
i += 1
# Precompute log values for O(1) query
self.log_table = [0] * (n + 1)
for i in range(2, n + 1):
self.log_table[i] = self.log_table[i // 2] + 1
def range_min(self, left, right):
"""
Minimum in range [left, right].
Key insight: Any range can be covered by two
(possibly overlapping) power-of-2 ranges.
"""
if not self.table:
return None
# Find largest power of 2 that fits in range
length = right - left + 1
j = self.log_table[length]
# Cover range with two (possibly overlapping) blocks
left_min = self.table[left][j]
right_min = self.table[right - (1 << j) + 1][j]
return min(left_min, right_min)
# Example: Stock prices - find minimum in any date range
prices = [100, 80, 120, 70, 110, 90]
st = SparseTable(prices)
print(f"Min price days 1-4: ${st.range_min(1, 4)}") # 70
print(f"Min price days 0-2: ${st.range_min(0, 2)}") # 80
print(f"Min price all days: ${st.range_min(0, 5)}") # 70
- When to Use Sparse Table
- Array is static (no updates between queries)
- Need O(1) query time
- Many queries on same array
- Idempotent operations (min, max, GCD) - not sum!
Technique 3: Difference Array (Range Updates)
Problem: Apply updates to ranges efficiently.
class RangeUpdateArray:
"""
Efficiently apply multiple range updates.
Build: O(n)
Update: O(1) per update
Finalize: O(n) to get final array
Perfect when you have many updates before needing final values.
"""
def __init__(self, size):
# Difference array: diff[i] = arr[i] - arr[i-1]
self.diff = [0] * (size + 1)
self.size = size
def range_add(self, left, right, val):
"""Add val to all elements in range [left, right]."""
self.diff[left] += val
self.diff[right + 1] -= val
def get_array(self):
"""Reconstruct final array from difference array."""
result = [0] * self.size
current = 0
for i in range(self.size):
current += self.diff[i]
result[i] = current
return result
# Example: Event scheduling with overlapping events
schedule = RangeUpdateArray(10) # 10 time slots
# Event 1: slots 0-3
schedule.range_add(0, 3, 1)
# Event 2: slots 2-5
schedule.range_add(2, 5, 1)
# Event 3: slots 4-7
schedule.range_add(4, 7, 1)
load = schedule.get_array()
print("Concurrent events per slot:", load)
# [1, 1, 2, 2, 3, 3, 2, 2, 0, 0]
print(f"Max concurrent events: {max(load)}") # 3
- Applications
- Calendar applications (booking conflicts)
- Network bandwidth allocation
- Task scheduling
- Image filters (box blur)
Five Data Structures for Image Processing
Image processing relies heavily on efficient array operations. Understanding these structures is crucial for computer vision and graphics.
1. 2D Arrays (Pixel Matrices)
Core Structure: Images are 2D arrays where each element represents a pixel.
# Grayscale image: 2D array of intensity values (0-255)
grayscale = [
[100, 120, 130],
[110, 125, 135],
[115, 130, 140]
]
# RGB image: 3D array (height, width, channels)
rgb_image = [
[[255, 0, 0], [0, 255, 0]], # Row 1: Red, Green
[[0, 0, 255], [255, 255, 0]] # Row 2: Blue, Yellow
]
def get_pixel(image, row, col):
"""Access pixel with bounds checking."""
if 0 <= row < len(image) and 0 <= col < len(image[0]):
return image[row][col]
return None # Out of bounds
2. Summed-Area Tables (Integral Images)
Purpose: Calculate sum of any rectangular region in O(1).
def build_summed_area_table(image):
"""
Build integral image for fast region queries.
Time: O(rows * cols)
Query: O(1)
"""
rows, cols = len(image), len(image[0])
sat = [[0] * (cols + 1) for _ in range(rows + 1)]
for i in range(rows):
for j in range(cols):
sat[i + 1][j + 1] = (
image[i][j] +
sat[i][j + 1] + # Left
sat[i + 1][j] - # Top
sat[i][j] # Overlap correction
)
return sat
def region_sum(sat, r1, c1, r2, c2):
"""
Sum of rectangle from (r1,c1) to (r2,c2).
Uses inclusion-exclusion principle.
"""
return (
sat[r2 + 1][c2 + 1] -
sat[r1][c2 + 1] -
sat[r2 + 1][c1] +
sat[r1][c1]
)
# Example: Fast box blur
image = [
[10, 20, 30],
[40, 50, 60],
[70, 80, 90]
]
sat = build_summed_area_table(image)
# Average of 2x2 region (0,0) to (1,1)
total = region_sum(sat, 0, 0, 1, 1)
avg = total // 4
print(f"Average: {avg}") # (10+20+40+50)/4 = 30
- Applications
- Fast box blur filters
- Haar feature calculation (face detection)
- Template matching
- Adaptive thresholding
3. Convolution Kernels (Small 2D Arrays)
Purpose: Apply filters by sliding kernel over image.
def apply_kernel(image, kernel):
"""
Apply convolution kernel to image.
Time: O(rows * cols * k_rows * k_cols)
"""
rows, cols = len(image), len(image[0])
k_rows, k_cols = len(kernel), len(kernel[0])
# Output size (valid convolution, no padding)
out_rows = rows - k_rows + 1
out_cols = cols - k_cols + 1
result = [[0] * out_cols for _ in range(out_rows)]
for i in range(out_rows):
for j in range(out_cols):
# Apply kernel
total = 0
for ki in range(k_rows):
for kj in range(k_cols):
total += image[i + ki][j + kj] * kernel[ki][kj]
result[i][j] = total
return result
# Common kernels
edge_detect = [
[-1, -1, -1],
[-1, 8, -1],
[-1, -1, -1]
]
sharpen = [
[ 0, -1, 0],
[-1, 5, -1],
[ 0, -1, 0]
]
gaussian_blur = [
[1, 2, 1],
[2, 4, 2],
[1, 2, 1]
] # Divide by 16 after applying
4. Histograms (1D Arrays for Intensity Distribution)
Purpose: Analyze pixel intensity distribution.
def compute_histogram(image):
"""
Build histogram of pixel intensities.
Time: O(rows * cols)
Space: O(256) for 8-bit images
"""
histogram = [0] * 256
for row in image:
for pixel in row:
histogram[pixel] += 1
return histogram
def histogram_equalization(image):
"""
Enhance contrast using histogram equalization.
Redistributes intensities to use full range.
"""
hist = compute_histogram(image)
# Compute cumulative distribution
cdf = [0] * 256
cdf[0] = hist[0]
for i in range(1, 256):
cdf[i] = cdf[i - 1] + hist[i]
# Normalize CDF
total_pixels = sum(hist)
cdf_normalized = [int((cdf[i] / total_pixels) * 255) for i in range(256)]
# Apply mapping to image
rows, cols = len(image), len(image[0])
result = [[cdf_normalized[image[i][j]] for j in range(cols)] for i in range(rows)]
return result
- Applications
- Contrast enhancement
- Thresholding (segmentation)
- Color correction
- Image comparison
5. Quadtrees (Hierarchical 2D Arrays)
Purpose: Efficiently represent sparse images or regions.
class QuadtreeNode:
"""Node in quadtree representation of image."""
def __init__(self, x, y, size, value=None):
self.x = x # Top-left corner
self.y = y
self.size = size # Width/height (square)
self.value = value # None if internal node
self.children = [] # [NW, NE, SW, SE]
def is_leaf(self):
return len(self.children) == 0
def build_quadtree(image, x, y, size, threshold=10):
"""
Build quadtree from image region.
Recursively subdivide if region variance > threshold.
"""
# Extract region
region = []
for i in range(y, min(y + size, len(image))):
for j in range(x, min(x + size, len(image[0]))):
region.append(image[i][j])
if not region:
return None
# Check if region is homogeneous
avg = sum(region) / len(region)
variance = sum((p - avg) ** 2 for p in region) / len(region)
if variance <= threshold or size == 1:
# Leaf node: represent entire region with average
return QuadtreeNode(x, y, size, int(avg))
# Internal node: subdivide into 4 quadrants
node = QuadtreeNode(x, y, size)
half = size // 2
# Recursively build children
node.children = [
build_quadtree(image, x, y, half, threshold), # NW
build_quadtree(image, x + half, y, half, threshold), # NE
build_quadtree(image, x, y + half, half, threshold), # SW
build_quadtree(image, x + half, y + half, half, threshold) # SE
]
return node
def count_nodes(node):
"""Count total nodes in quadtree."""
if node is None:
return 0
if node.is_leaf():
return 1
return 1 + sum(count_nodes(child) for child in node.children)
# Example: Compress image with large uniform regions
# Full resolution: 256x256 = 65,536 pixels
# Quadtree: ~1,000 nodes for typical image (98.5% compression!)
- Applications
- Image compression
- Collision detection in games
- Spatial indexing
- Level-of-detail rendering
Further Learning
The techniques and structures above provide production-ready implementations for real-world scenarios. For additional perspectives:
- Comprehensive Array Fundamentals: External resources may offer alternative explanations of 1D/2D/3D arrays and dynamic array implementation details
- Advanced Range Query Optimizations: Some resources cover additional data structures like Segment Trees and Fenwick Trees for updateable range queries
- Image Processing Deep Dives: Computer vision textbooks provide mathematical foundations for filters and transformations
The key is understanding why each structure is chosen for specific scenarios - that's more valuable than memorizing implementations.