(i) Please explain how SIFT improves the implementation efficiency of blob detection.
Lecture 7
It detects blobs at different scales using the Difference of Gaussians (DoG), a computationally cheap approximation of the scale-normalized Laplacian of Gaussian, obtained by subtracting adjacent levels of a Gaussian pyramid.
It efficiently identifies scale-invariant blobs and computes their descriptors, allowing robust matching across images with varying scales and rotations.
(ii) How many difference-of-Gaussian scale-space layers can be acquired from the image below? In how many scales per octave can blob features be detected?
5 DoG layers can be obtained from 6 Gaussian-smoothed images.
3 scales per octave are used for blob feature detection: each candidate extremum must be compared with the DoG layers above and below it, so only the 3 middle layers of the 5 can host detections.
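The octave structure above (6 Gaussian images, 5 DoG layers, 3 detection scales) can be sketched in NumPy. This is an illustrative sketch, not the lecture's code; the helper names and the base sigma of 1.6 are assumptions.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: filter rows, then columns."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode='reflect')
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode='valid'), 1, padded)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='valid'), 0, tmp)

def dog_octave(img, sigma0=1.6, s=3):
    """One SIFT octave: s+3 Gaussian images give s+2 DoG layers;
    extrema are searched only in the s middle DoG layers."""
    k = 2 ** (1.0 / s)
    gaussians = [blur(img, sigma0 * k**i) for i in range(s + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs

# e.g. a 16x16 impulse image: 6 Gaussian images -> 5 DoG layers
img = np.zeros((16, 16)); img[8, 8] = 1.0
gaussians, dogs = dog_octave(img, sigma0=1.6, s=3)
```

With s = 3 detection scales per octave, the counts match the answer above: 6 Gaussian-smoothed images and 5 DoG layers.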
(c) The bookstore hopes to develop a camera-based book sales information retrieval system. When books arrive at the bookstore for storage, the bookstore administrator will upload the image of the book cover along with relevant information to the system. Potential purchasing users can place the book cover close to the camera, and the system will search for relevant information such as book prices, discounts, and inventory based on the cover image. The distance between the book and the camera is not completely fixed. Please detail a possible approach to implementing such an image-based lookup system based on the SIFT feature detector and descriptor.
(When a book is stocked)
Apply the SIFT algorithm to detect key points and extract descriptors from the uploaded cover.
Store these features in a database along with the associated book information.
(When a user looks up a book)
Match the features from the captured image with the stored features using a matching algorithm such as k-NN with a ratio test.
Use RANSAC to eliminate outliers and find the best match.
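The matching step can be sketched with a brute-force 2-NN search and Lowe's ratio test. In practice OpenCV's SIFT descriptors and a FLANN matcher (plus RANSAC via `cv2.findHomography`) would be used; this NumPy sketch with toy 2-D "descriptors" only illustrates the ratio test, and the 0.75 threshold is an assumed typical value.

```python
import numpy as np

def ratio_test_match(query_desc, db_desc, ratio=0.75):
    """Brute-force 2-NN matching with Lowe's ratio test: keep a match
    only if the best neighbour is clearly closer than the second best."""
    matches = []
    for i, d in enumerate(query_desc):
        dists = np.linalg.norm(db_desc - d, axis=1)
        first, second = np.argsort(dists)[:2]
        if dists[first] < ratio * dists[second]:
            matches.append((i, int(first)))
    return matches

# toy 2-D "descriptors": the query is close to database entry 0 only
db = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
query = np.array([[0.1, 0.0]])
matches = ratio_test_match(query, db)
```

Ambiguous descriptors (two near-equal nearest neighbours) are discarded, which is what makes the subsequent RANSAC verification tractable.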
(d) The following picture contains four regions (A, B, C, D) with different values of λ₁ and λ₂, where λ₁ and λ₂ are the eigenvalues of the second moment matrix M. Classify and choose the correspondences of A, B, C, D to linear edge, flat, and corner, respectively, based on the values of λ₁ and λ₂.
A: Flat region
Both λ₁ and λ₂ are small
B: Edge
Small λ₁, large λ₂
C: Corner
Both λ₁ and λ₂ are large
D: Edge
Large λ₁, small λ₂
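The classification rule above can be made concrete: compute the second moment matrix from the patch derivatives and inspect its eigenvalues. The function name and the threshold `tau` are illustrative choices, not lecture values.

```python
import numpy as np

def classify_patch(Ix, Iy, tau=1.0):
    """Classify a patch from the eigenvalues of its second moment matrix
    M = [[sum(Ix^2), sum(Ix*Iy)], [sum(Ix*Iy), sum(Iy^2)]]."""
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    l1, l2 = np.linalg.eigvalsh(M)   # ascending order: l1 <= l2
    if l2 < tau:
        return "flat"                # both eigenvalues small
    if l1 < tau:
        return "edge"                # one large, one small
    return "corner"                  # both large
```

A flat patch has near-zero derivatives, an edge has derivatives along one direction only, and a corner has strong derivatives in two directions.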
(e)
(i) For an image with a resolution of 2 megapixels, what is the maximum false positive rate in order to avoid having a false positive in every image?
Lecture 11
With on the order of 2×10⁶ candidate windows per image, the per-window false positive rate must be below 1/(2×10⁶) = 5×10⁻⁷ to expect fewer than one false positive per image.
(ii) Briefly explain how the Viola & Jones algorithm achieves rapid face detection with a small false positive rate.
Lecture 11
Integral images for fast Haar-like feature evaluation
Boosting for feature selection
Attentional cascade for fast rejection of non-face windows
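The first ingredient, the integral image, is what makes Haar-like feature evaluation constant-time: any rectangle sum needs only four array lookups, regardless of the rectangle's size. A minimal sketch (the one-pixel padding convention is an implementation choice):

```python
import numpy as np

def integral_image(img):
    """Padded integral image: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] with just 4 lookups -- the reason
    Haar-like features can be evaluated in constant time."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
```

A two-rectangle Haar feature is then just the difference of two `rect_sum` calls.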
Q2
(a)
(i) What effect is achieved by filtering images with each of the following kernels?
Lecture 2
(a) Shift left by 1-pixel
(b) Double the intensity of the center pixel
(c) Blur 1-pixel vertical edges
(ii) Compute the 3×3 median filtering result over the image below. Is median filtering linear?
Sorting the 9 numbers in each window:
The median (5th of the sorted values):
Therefore, median filtering is NOT linear: it violates additivity, since in general median(f + g) ≠ median(f) + median(g).
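A single window-sized counterexample is enough to disprove linearity, since a linear filter would have to satisfy median(a + b) = median(a) + median(b) for all inputs:

```python
import numpy as np

# Linearity would require median(a + b) == median(a) + median(b)
# for every pair of inputs; one counterexample disproves it.
a = np.array([0, 0, 1])
b = np.array([1, 0, 0])

median_of_sum = np.median(a + b)              # median([1, 0, 1]) = 1
sum_of_medians = np.median(a) + np.median(b)  # 0 + 0 = 0
```

The two results differ, so the median filter cannot be written as a convolution.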
(b)
(i) Please design an algorithm to classify the different textures in the following images, and describe the algorithm steps in detail.
Lecture 10
Bag of words Generation
Extract local descriptors (SIFT) from images
Apply K-means clustering to create visual words
Store cluster centroids as vocabulary
Feature Extraction
Divide each image into patches
Compute descriptors for each patch
Assign each descriptor to nearest visual word
Generate histogram of word frequencies
Classification
Use histogram as feature vector
Train SVM classifier to distinguish texture classes
This method can effectively differentiate between:
Regular dot pattern (left image)
Grid pattern with lines (middle image)
Hexagonal mesh pattern (right image)
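The bag-of-words pipeline above can be sketched with a toy NumPy implementation. In practice one would use real SIFT descriptors and library k-means/SVM implementations; here the descriptors are synthetic 2-D points and the k-means is deliberately minimal.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: builds the visual vocabulary (cluster centroids)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, vocab):
    """Assign each patch descriptor to its nearest visual word and
    return the normalized word-frequency histogram."""
    words = np.argmin(((descriptors[:, None] - vocab[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

# toy 2-D descriptors drawn from two well-separated "textures"
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
vocab = kmeans(X, 2)
hist = bow_histogram(np.zeros((4, 2)), vocab)
```

The resulting histogram is the fixed-length feature vector fed to the SVM classifier in the final step.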
(ii) Why do we say that step (i) is a manually designed feature extraction method, while deep convolutional neural networks are automatic feature extraction methods?
Manual Feature Extraction (like the algorithm described):
Requires human expertise to explicitly define what features are important
Requires explicit programming of mathematical formulas
Automatic Feature Extraction (CNNs):
Learns features directly from raw data through training
Features emerge from the network’s learning process without explicit programming
Q3
(a)
(i) List the steps involved in RANSAC for line fitting.
Lecture 4
Steps
Repeat N times:
Draw s points uniformly at random
Fit a line to these s points
Find inliers to this line among the remaining points (i.e., points whose distance from the line is less than a threshold t)
If there are d or more inliers, accept the line and refit using all inliers
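The steps above can be sketched as a small NumPy function. The line is represented as ax + by + c = 0 with unit normal (a, b); the iteration count, distance threshold, and inlier count are illustrative parameters, not lecture values.

```python
import numpy as np

def ransac_line(points, n_iters=100, thresh=0.1, min_inliers=10, seed=0):
    """RANSAC line fitting: sample 2 points, count inliers, refit."""
    rng = np.random.default_rng(seed)
    best_line, best_count = None, 0
    for _ in range(n_iters):
        i, j = rng.choice(len(points), 2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        n = np.array([-d[1], d[0]])          # normal to the sampled segment
        norm = np.linalg.norm(n)
        if norm == 0:
            continue
        n = n / norm
        c = -n @ p
        dist = np.abs(points @ n + c)        # point-to-line distances
        count = np.sum(dist < thresh)
        if count >= min_inliers and count > best_count:
            inl = points[dist < thresh]      # refit with all inliers (SVD)
            mean = inl.mean(axis=0)
            _, _, vt = np.linalg.svd(inl - mean)
            n = vt[-1]                       # direction of least variance
            best_line = (n[0], n[1], -n @ mean)
            best_count = count
    return best_line, best_count

# 20 points on y = 2 plus 3 gross outliers
pts = np.column_stack([np.arange(20.0), np.full(20, 2.0)])
outliers = np.array([[5.0, 10.0], [7.0, -8.0], [12.0, 15.0]])
line, n_inliers = ransac_line(np.vstack([pts, outliers]),
                              n_iters=100, thresh=0.1, min_inliers=10)
```

The outliers never gather enough support, so the accepted model is the line through the 20 inliers.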
(ii) Given the outlier ratio e, please choose the number of samples N so that, with probability p, at least one random sample is free from outliers.
Lecture 4
A sample of s points is outlier-free with probability (1 − e)^s, so all N samples are contaminated with probability (1 − (1 − e)^s)^N. Setting this failure probability to 1 − p gives N = log(1 − p) / log(1 − (1 − e)^s), rounded up.
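The standard sample-count formula N = log(1 − p) / log(1 − (1 − e)^s) can be evaluated directly; here s is the sample size (2 for a line), e the outlier ratio, and p the required success probability. The function name is an illustrative choice.

```python
import math

def ransac_num_samples(e, s, p=0.99):
    """Smallest N such that, with probability p, at least one of
    N random samples of size s contains no outliers."""
    w = (1.0 - e) ** s   # P(a single sample is all inliers)
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w))

# e.g. line fitting (s = 2) with half the points being outliers
n = ransac_num_samples(e=0.5, s=2, p=0.99)
```

For e = 0.5, s = 2, p = 0.99 this gives N = 17 samples.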
(b) In the object classification task, what is the difference between discriminative methods and generative methods from the statistical viewpoint?
Lecture 10
Discriminative methods model the posterior P(class | data) directly.
Generative methods model the likelihood P(data | class) and the prior P(class), obtaining the posterior via Bayes' rule.
(c) The figure below shows an image of a checkboard on the left and its corresponding Hough space on the right. Demonstrate your understanding of the Hough transform by explaining how the geometric structures in the input image are manifested in the resultant Hough space.
The polar representation of a line is ρ = x cos θ + y sin θ.
Therefore, the vertical lines in the checkerboard are represented by intersections at θ = 0, and the difference in ρ between adjacent intersections equals the width of a grid cell in the checkerboard.
Similarly, the horizontal lines are transformed to intersections at θ = 90°, and the difference in ρ between adjacent intersections equals the height of a grid cell.
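This behaviour can be checked with a minimal Hough accumulator in NumPy (not the lecture's code; the ρ range and 1-degree resolution are assumed):

```python
import numpy as np

def hough_accumulate(points, rho_max=200, n_theta=180):
    """Each edge point (x, y) votes along its sinusoid
    rho = x*cos(theta) + y*sin(theta); the sinusoids of collinear
    points intersect, so a line appears as a peak in the accumulator."""
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * rho_max + 1, n_theta), dtype=int)
    for x, y in points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        acc[np.round(rhos).astype(int) + rho_max, np.arange(n_theta)] += 1
    return acc

# 30 points on the vertical line x = 10 -> peak at theta = 0, rho = 10
acc = hough_accumulate([(10, y) for y in range(30)])
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
```

Adjacent vertical grid lines of the checkerboard would produce a row of such peaks at θ = 0, spaced in ρ by the cell width.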
2023-CS410FZ -January
Q1
(a) Provide details of the convolution operation including a mathematical expression for the operation in the 2D discrete domain of a grayscale image. Provide two examples of convolution-based approaches to noise removal in images. Compare and contrast both approaches.
Lecture 2
The 2D discrete convolution of an image f with a kernel h is (f ∗ h)(x, y) = Σᵢ Σⱼ f(i, j) · h(x − i, y − j).
Mean (box) filter: convolution with a uniform kernel; each output pixel is the unweighted average of its neighbourhood. Simple and fast, it smooths noise but blurs edges heavily.
Gaussian filter: convolution with a Gaussian kernel, which weights nearby pixels more than distant ones. It also smooths noise, but preserves edges and fine structure better than the mean filter, and the kernel is separable, so it can be computed efficiently.
(The median filter, often cited for noise removal, is an order-statistic filter and is not convolution-based, so it does not qualify for this question.)
(b) The image neighborhood below shows a single white pixel surrounded by a set of black pixels. Demonstrate your understanding of the convolution operation by computing the output of mean filtering the image with a mean filter kernel.
Lecture 2
Note: in your answer you should only show the result for the central 3 pixels of the middle row (i.e. as highlighted in bold)
For each output pixel, take the sum of the products of the kernel weights and the image pixels in its 3×3 neighbourhood.
Every 3×3 window that contains the single white pixel averages to 255/9 ≈ 28.3, so each of the three highlighted pixels of the middle row is replaced by that value.
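The computation can be verified numerically, assuming a single white pixel (value 255) at the centre of an otherwise black 7×7 patch (the exact exam figure is not reproduced here). For a symmetric mean kernel, correlation and convolution coincide.

```python
import numpy as np

# 7x7 black image with one white pixel (255) at the centre
img = np.zeros((7, 7))
img[3, 3] = 255.0

kernel = np.full((3, 3), 1.0 / 9.0)   # 3x3 mean filter

out = np.zeros_like(img)
for y in range(1, 6):                 # skip the 1-pixel border
    for x in range(1, 6):
        out[y, x] = np.sum(img[y-1:y+2, x-1:x+2] * kernel)

# central 3 pixels of the middle row: every window contains the
# white pixel, so each becomes 255/9 ~ 28.3
middle = out[3, 2:5]
```

Pixels whose 3×3 window does not contain the white pixel remain 0.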
(c) List and explain the four steps involved in the Canny edge detector. In terms of edge detection, what trade-off does the detector try to optimize?
Lecture 3
Steps
Gaussian filter smoothing to reduce noise in the image
Gradient magnitude and orientation computation to detect intensity changes indicating potential edges.
Non-maximum suppression to thin the edges to a single pixel width
Linking and hysteresis thresholding to finalize the edges by connecting weak edges to strong ones.
Trade-off
The detector strives to detect true edges while minimizing false edges (noise), and at the same time, ensuring that the edges are well-localized and not overly blurred.
This balance is controlled through the thresholding and smoothing parameters.
Q2
(a) With the aid of diagrams or pseudo-code, explain how the Hough transform can be used to detect lines in images.
(b) With reference to part (a), comment on how the gradient orientation can be used to improve the efficiency of Hough transform. In your answer you should detail the changes required to the pseudo-code to take advantage of the gradient orientation.
Rather than iterating over all possible angles (0 to 180 degrees), restrict the range of θ to a small window around the gradient direction at each edge pixel. This reduces the number of votes each point casts in the Hough transform.
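The required change to the voting loop can be sketched as follows; the function name, the ±10° window, and the accumulator dimensions are illustrative assumptions.

```python
import numpy as np

def vote_with_orientation(x, y, grad_deg, acc, rho_max=200, window=10):
    """Instead of voting over all 180 theta bins, vote only within
    +/- `window` degrees of the pixel's gradient orientation."""
    theta_bins = np.arange(grad_deg - window, grad_deg + window + 1) % 180
    thetas = np.deg2rad(theta_bins)
    rhos = x * np.cos(thetas) + y * np.sin(thetas)
    acc[np.round(rhos).astype(int) + rho_max, theta_bins] += 1
    return acc

acc = np.zeros((401, 180), dtype=int)
vote_with_orientation(10, 0, grad_deg=0, acc=acc)
```

Each edge pixel now casts 21 votes instead of 180, roughly an order-of-magnitude saving.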
(c) Given the following image of a skewed checkerboard, select the corresponding Hough space from the 4 Hough spaces shown in the table that follows. Provide an explanation justifying your choice.
I would select option as the corresponding Hough space.
The checkerboard image exhibits a perspective-skewed pattern where the nominally parallel lines converge towards a vanishing point. Each straight line still produces a peak in Hough space, but the peaks are no longer aligned at a single θ with evenly spaced ρ.
As a result, we would expect a more spread-out and asymmetric distribution of peaks. Because the skewed lines all pass near the common vanishing point, their peaks lie along the sinusoid of that vanishing point in Hough space.
Q3
(a) Detail the main steps involved in the detection and extraction of SIFT features in an image.
Scale-space extrema detection using DoG at multiple scales.
Keypoint localization for discarding weak keypoints.
Orientation assignment for ensuring rotation invariance.
Keypoint descriptor creation by sampling the gradients of pixels around the keypoint in a local neighborhood.
Matching keypoints from different images.
(b) What is the difference between SIFT and Harris features?
SIFT
Uses a difference of Gaussians (DoG) to detect keypoints at multiple scales, then assigns a descriptor to each keypoint based on its local image patch.
Designed for detecting keypoints that are invariant to scale and rotation (and robust to moderate affine change), making it suitable for matching features across images with different scales and orientations.
Harris
Uses the second moment matrix to compute a corner response.
Focuses on detecting corners by looking for regions where the gradient changes significantly in multiple directions. It is not invariant to scale, and it provides no descriptor by itself.
(c) Given different types of image patches, like Linear Edge, Flat, Corner:
The following pictures (a), (b), (c) are different distributions of Ix and Iy, where Ix and Iy represent the intensity derivatives along the x and y dimensions. Classify and choose the correspondences of (a), (b) and (c) to linear edge, flat, and corner, respectively.
(a) Flat
Ix and Iy both small; derivatives concentrated near zero in all directions
(b) Linear Edge
Strong gradients along one direction (e.g. the x-direction), weak along the other
(c) Corner
Strong gradients along both the x- and y-directions