3D Fixings Pose Estimation via Local 3D Feature Matching
Published on February 09, 2022

In industrial and manufacturing contexts, fixings and locators play a critical role in assembly, positioning, and quality control. Accurately estimating their 3D pose — both position and orientation — is essential for automation, inspection, and digital twin workflows.
I developed a 3D pose estimation solution capable of estimating the pose of more than 150 different fixing types, each defined by a small but unique 3D geometry. Rather than relying on classification or 2D appearance, the solution is based on pure 3D geometric matching.
This article describes the full pipeline, from mesh preprocessing to robust 3D pose estimation.
1. Problem Overview: Why 3D Matching
Each fixing type:
- has a distinct 3D shape,
- may appear in arbitrary orientation,
- must be localized precisely in 3D space.
Because geometry is the most discriminative signal, the problem is naturally addressed using 3D local feature matching, rather than image-based or template-based methods.
The objective is to estimate:
- 3D location (x, y, z),
- 3D orientation (rotation),
of a fixing relative to a reference part.
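To make the target output concrete, a pose can be stored as a single 4x4 homogeneous matrix combining the rotation and the translation. The sketch below assumes NumPy; the helper names `make_pose` and `apply_pose` are illustrative and not part of the original system.

```python
import numpy as np

def make_pose(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 matrix."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def apply_pose(pose, points):
    """Map (N, 3) points from the fixing frame into the part frame."""
    points = np.asarray(points, dtype=float)
    return points @ pose[:3, :3].T + pose[:3, 3]

# Example: a fixing rotated 90 degrees about z and shifted 10 units along x.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
pose = make_pose(rot_z_90, [10.0, 0.0, 0.0])
moved = apply_pose(pose, [[1.0, 0.0, 0.0]])
```

With this pose, the fixing point (1, 0, 0) lands at (10, 1, 0) in the part frame.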
2. Input Representation: STL Meshes
The input to the system consists of:
- an STL mesh of the fixing,
- an STL mesh of the target part.
Meshes are well suited for CAD and manufacturing pipelines, but operating directly on mesh vertices is often suboptimal because vertex density is irregular: large flat faces carry only a few vertices, while curved or detailed regions carry many.
To enable robust local feature computation, the first step is to convert meshes into dense point clouds.
3. Mesh Sampling to Dense Point Clouds

Each STL mesh is converted into a point cloud through surface sampling.
Key characteristics:
- Sampling density is controlled via a leaf size parameter.
- Smaller leaf size → denser point cloud.
- Larger leaf size → faster computation but less detail.
This step produces a uniform, dense point cloud that better captures surface geometry than sparse mesh vertices alone.
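One common way to implement this step is area-weighted random sampling of the triangles followed by a voxel-grid thinning pass. The sketch below assumes NumPy, assumes the mesh is already loaded as a (T, 3, 3) array of triangle vertices, and interprets the leaf size as the voxel edge length; the function names are illustrative, not the original implementation.

```python
import numpy as np

def sample_mesh_surface(triangles, n_samples, rng=None):
    """Draw points uniformly on a triangle mesh.

    triangles: (T, 3, 3) array, three xyz vertices per face.
    Returns the sampled points and the index of the face each came from.
    """
    rng = np.random.default_rng(rng)
    a, b, c = triangles[:, 0], triangles[:, 1], triangles[:, 2]
    # Larger faces must receive proportionally more samples.
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    face_idx = rng.choice(len(triangles), size=n_samples, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick.
    r1 = np.sqrt(rng.random(n_samples))[:, None]
    r2 = rng.random(n_samples)[:, None]
    pts = ((1 - r1) * a[face_idx]
           + r1 * (1 - r2) * b[face_idx]
           + r1 * r2 * c[face_idx])
    return pts, face_idx

def voxel_downsample(points, leaf_size):
    """Keep one point per cubic voxel of side leaf_size; returns kept indices."""
    voxel_keys = np.floor(points / leaf_size).astype(np.int64)
    _, keep = np.unique(voxel_keys, axis=0, return_index=True)
    return np.sort(keep)

# Example: a unit square made of two triangles.
square = np.array([[[0, 0, 0], [1, 0, 0], [1, 1, 0]],
                   [[0, 0, 0], [1, 1, 0], [0, 1, 0]]], dtype=float)
points, face_idx = sample_mesh_surface(square, 5000, rng=0)
kept = voxel_downsample(points, leaf_size=0.1)
dense_cloud = points[kept]
```

Returning `face_idx` alongside the points keeps track of which face each sample came from, which is useful when attaching per-face information to the samples later.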
4. Normal Estimation from Mesh Geometry

Local surface orientation is a critical signal for 3D matching.
Instead of estimating normals from noisy point neighborhoods, I leverage:
- the original mesh face normals,
- propagated to the sampled dense points.
Each dense point is assigned a normal based on the underlying mesh surface it originates from.
This results in:
- stable normals,
- consistent orientation,
- improved descriptor quality.
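A minimal sketch of this propagation, assuming NumPy, a (T, 3, 3) triangle array with consistent vertex winding and no degenerate (zero-area) faces, and a per-sample face index recorded during sampling. The function names are hypothetical.

```python
import numpy as np

def face_normals(triangles):
    """Unit normal per face, following the vertex winding order."""
    a, b, c = triangles[:, 0], triangles[:, 1], triangles[:, 2]
    n = np.cross(b - a, c - a)
    # Assumes no zero-area faces, otherwise this division is undefined.
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def normals_for_samples(triangles, face_idx):
    """Give each sampled point the normal of the face it was drawn from."""
    return face_normals(triangles)[face_idx]

# Example: one counter-clockwise triangle in the xy-plane, three samples on it.
tri = np.array([[[0, 0, 0], [1, 0, 0], [0, 1, 0]]], dtype=float)
sample_normals = normals_for_samples(tri, np.array([0, 0, 0]))
```

Because the normal comes from the face itself, it does not depend on neighborhood size or sampling noise, and its sign follows the mesh winding order.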
5. Keypoint Selection Strategy

From the dense point cloud, a subset of points is selected as keypoints.
For simplicity and robustness, I chose:
- mesh vertices as keypoints.
This choice is motivated by the fact that:
- mesh vertices capture geometric discontinuities,
- edges, corners, and characteristic shapes are preserved,
- keypoints naturally align with CAD design intent.
This avoids heuristic keypoint detectors and ensures consistent keypoint placement across fixings.
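In practice, STL files store each triangle with its own copy of the three vertices, so shared vertices appear several times. A sketch of extracting unique vertices as keypoints is shown below; it assumes NumPy and a (T, 3, 3) triangle array, and the rounding tolerance `decimals` is an illustrative parameter, not a value from the original system.

```python
import numpy as np

def mesh_vertices_as_keypoints(triangles, decimals=6):
    """Collapse the per-face vertex copies of an STL mesh into unique vertices."""
    verts = np.asarray(triangles, dtype=float).reshape(-1, 3)
    # Rounding merges copies that differ only by float noise;
    # adding 0.0 turns any -0.0 into 0.0 so both compare as one value.
    rounded = np.round(verts, decimals) + 0.0
    return np.unique(rounded, axis=0)

# Example: two triangles sharing an edge have 6 stored vertices but 4 unique ones.
square = np.array([[[0, 0, 0], [1, 0, 0], [1, 1, 0]],
                   [[0, 0, 0], [1, 1, 0], [0, 1, 0]]], dtype=float)
keypoints = mesh_vertices_as_keypoints(square)
```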
6. Local 3D Descriptor: SHOT
To describe the local geometry around each keypoint, I use the Signature of Histograms of Orientations (SHOT) descriptor.
Why SHOT?
SHOT descriptors are:
- local,
- rotation-invariant (expressed in a local reference frame attached to each keypoint),
- robust to noise,
- well suited for surface-based geometry.
Rather than relying on raw 3D coordinates, SHOT encodes:
- normal orientation relationships between a keypoint and its neighborhood.
Normals are more representative of local surface structure than absolute point positions.
How SHOT Works (Conceptually)

For each keypoint:
- A local neighborhood is defined (sphere).
- The deviation angle between the keypoint normal and each neighbor’s normal is computed.
- These deviations are accumulated into a histogram.
In this implementation:
- the descriptor uses 32 histogram bins,
- encoding the distribution of surface orientations within the keypoint’s spherical neighborhood.
This produces a compact yet expressive description of local geometry.
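The conceptual steps above can be sketched as a single histogram of normal deviations. This is a deliberately simplified illustration, not the full SHOT descriptor: it omits the local reference frame and the spatial partitioning of the sphere, and it bins the cosine of the deviation angle rather than the angle itself. It assumes NumPy and unit-length normals; the function name is hypothetical.

```python
import numpy as np

def normal_deviation_histogram(keypoint, keypoint_normal, points, normals,
                               radius, n_bins=32):
    """Histogram of cos(angle) between the keypoint normal and neighbor normals."""
    # Spherical neighborhood around the keypoint.
    dist = np.linalg.norm(points - keypoint, axis=1)
    in_sphere = dist <= radius
    # For unit normals, the dot product is the cosine of the deviation angle.
    cos_dev = np.clip(normals[in_sphere] @ keypoint_normal, -1.0, 1.0)
    hist, _ = np.histogram(cos_dev, bins=n_bins, range=(-1.0, 1.0))
    hist = hist.astype(float)
    # L2-normalize so descriptors are comparable across point densities.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Example: on a flat patch every neighbor normal equals the keypoint normal,
# so all the mass falls in the last bin (cosine close to 1).
rng = np.random.default_rng(0)
flat_pts = rng.random((200, 3))
flat_pts[:, 2] = 0.0
flat_nrm = np.tile([0.0, 0.0, 1.0], (200, 1))
desc = normal_deviation_histogram(flat_pts[0], flat_nrm[0],
                                  flat_pts, flat_nrm, radius=0.3)
```

A curved or creased neighborhood spreads the mass over several bins, which is what makes the histogram discriminative.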
7. Local Feature Matching
Once descriptors are computed for:
- the fixing,
- and the target part,
the next step is descriptor matching.
To efficiently compute correspondences, I use:
- a KD-Tree built over the descriptor vectors,
- constructed and queried with FLANN (Fast Library for Approximate Nearest Neighbors).
This step identifies potential keypoint correspondences between fixing and part based on descriptor similarity.
At this stage:
- many matches are incorrect,
- outliers are expected.
Robust filtering is therefore essential.
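As an illustration of the matching step, the sketch below finds, for each part descriptor, its nearest fixing descriptor and keeps the pair only if the squared distance is below a threshold. To stay self-contained it uses a brute-force NumPy search instead of a FLANN KD-Tree, which gives the same nearest neighbors on small inputs but does not scale as well. The function name and the `max_sq_dist` default are assumptions for the example.

```python
import numpy as np

def match_descriptors(fixing_desc, part_desc, max_sq_dist=0.25):
    """Nearest-neighbor matching from part descriptors to fixing descriptors.

    Returns a list of (fixing_index, part_index) pairs.
    """
    # Pairwise squared Euclidean distances, shape (n_part, n_fixing).
    diff = part_desc[:, None, :] - fixing_desc[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    nearest = sq_dist.argmin(axis=1)
    best = sq_dist[np.arange(len(part_desc)), nearest]
    # Discard matches whose descriptors are too far apart.
    keep = best <= max_sq_dist
    return [(int(f), int(p))
            for p, f in zip(np.flatnonzero(keep), nearest[keep])]

# Example: the first part descriptor resembles fixing descriptor 1,
# the second resembles nothing and is dropped.
fixing_desc = np.eye(3)
part_desc = np.array([[0.05, 0.95, 0.0],
                      [5.0, 5.0, 5.0]])
matches = match_descriptors(fixing_desc, part_desc)
```

The distance threshold removes only the most obvious mismatches; the remaining outliers are handled by the grouping stage.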
8. Correspondence Grouping and Outlier Rejection

To identify valid fixing instances, I use a Hough Transform–based grouping strategy.
Hough-Based Voting
Each local correspondence:
- casts a vote in a 3D Hough space,
- corresponding to a potential reference point and orientation.
Correct correspondences:
- vote consistently,
- form clusters in Hough space.
Incorrect matches:
- vote randomly,
- do not form coherent clusters.
This process:
- groups compatible correspondences,
- rejects outliers,
- isolates valid fixing instances.
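The voting idea can be sketched as follows. Each correspondence predicts where the fixing's reference point would sit in the part, and predictions are accumulated in a 3D grid. This sketch assumes NumPy and assumes that a repeatable local reference frame (a 3x3 rotation) is available for every keypoint, which the description above does not detail. The function name, `bin_size`, and `min_votes` are hypothetical, and votes that straddle a bin boundary may be split in this simple version.

```python
import numpy as np
from collections import defaultdict

def hough_group(model_kp, model_lrf, scene_kp, scene_lrf, matches,
                model_ref, bin_size, min_votes=3):
    """Group correspondences whose reference-point votes fall in the same bin.

    model_lrf / scene_lrf: (N, 3, 3) rotations whose columns are the local
    axes of each keypoint expressed in global coordinates.
    matches: list of (model_index, scene_index) pairs.
    """
    bins = defaultdict(list)
    for m, s in matches:
        # Offset to the reference point, expressed in the model keypoint's frame.
        offset_local = model_lrf[m].T @ (model_ref - model_kp[m])
        # Re-express that offset in the scene through the scene keypoint's frame.
        vote = scene_kp[s] + scene_lrf[s] @ offset_local
        key = tuple(np.floor(vote / bin_size).astype(int))
        bins[key].append((m, s))
    # Only bins with enough agreeing votes are kept as instance hypotheses.
    return [group for group in bins.values() if len(group) >= min_votes]

# Example: five correct matches agree on one bin, two wrong ones do not.
model_kp = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0],
                     [0, 0, 4], [4, 4, 4]], dtype=float)
model_ref = model_kp.mean(axis=0)
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([10.5, 0.5, 0.5]) - R_true @ model_ref
scene_kp = model_kp @ R_true.T + t_true
model_lrf = np.tile(np.eye(3), (5, 1, 1))
scene_lrf = np.tile(R_true, (5, 1, 1))
matches = [(i, i) for i in range(5)] + [(0, 1), (2, 3)]
groups = hough_group(model_kp, model_lrf, scene_kp, scene_lrf,
                     matches, model_ref, bin_size=1.0)
```

Correct correspondences all predict the same reference point regardless of which keypoint they come from, which is why they pile up in one bin while mismatches scatter.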
9. 3D Pose Estimation
Once a consistent set of correspondences is identified, the 3D pose of the fixing can be computed.
Using the validated correspondences:
- a rigid transformation is estimated,
- yielding translation and rotation.
The output is:
- the 3D position of the fixing,
- the 3D orientation relative to the part.
This completes the pose estimation pipeline.
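One standard way to estimate a rigid transformation from matched 3D points is the SVD-based least-squares solution (often called the Kabsch algorithm). The sketch below assumes NumPy and at least three non-collinear correspondences; it illustrates the technique and is not necessarily the solver used in the original system. The function name is hypothetical.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares R, t such that dst is approximately src @ R.T + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Example: recover a known 90-degree rotation about z plus a translation.
fixing_pts = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0],
                       [0, 0, 4], [4, 4, 4]], dtype=float)
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([12.0, -1.0, 3.0])
part_pts = fixing_pts @ R_true.T + t_true
R_est, t_est = estimate_rigid_transform(fixing_pts, part_pts)
```

Here `R_est` is the orientation of the fixing relative to the part and `t_est` its position, matching the two outputs listed above.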
10. Why This Approach Scales
This solution scales effectively because:
- it relies on geometry, not appearance,
- new fixing types only require their STL mesh,
- no retraining or data annotation is needed,
- it generalizes across orientations and placements.
By combining:
- dense surface sampling,
- normal-based descriptors,
- robust correspondence grouping,
the system achieves reliable pose estimation across a large and diverse set of fixing geometries.
Closing Thoughts
This work demonstrates that classical 3D geometric methods, when carefully engineered, remain extremely powerful for industrial perception problems.
By grounding the solution in:
- CAD geometry,
- local surface descriptors,
- and robust spatial voting,
I built a scalable, explainable, and production-ready 3D pose estimation system for fixings and locators.
In many industrial contexts, geometry is the signal — and exploiting it directly leads to robust solutions.