Visual Odometry

namespace: odometry

The visual odometry here refers specifically to RGB-D visual odometry. Its purpose is to estimate the camera motion between two frames, which is the first step of SLAM. OnePiece implements two widely used kinds of visual odometry: sparse matching based on feature points, and dense matching based on optical flow. The usage is very simple:

std::shared_ptr<SparseTrackingResult> SparseTracking(const cv::Mat &source_color, const cv::Mat &target_color, 
const cv::Mat &source_depth, const cv::Mat &target_depth);


std::shared_ptr<SparseTrackingResult> SparseTracking(geometry::RGBDFrame &source_frame, 
geometry::RGBDFrame &target_frame);


std::shared_ptr<SparseTrackingResult> SparseTrackingMILD(const cv::Mat &source_color, const cv::Mat &target_color, 
const cv::Mat &source_depth, const cv::Mat &target_depth);

std::shared_ptr<SparseTrackingResult> SparseTrackingMILD(geometry::RGBDFrame &source_frame, 
geometry::RGBDFrame &target_frame);

std::shared_ptr<DenseTrackingResult> DenseTracking(const cv::Mat &source_color, const cv::Mat &target_color, 
const cv::Mat &source_depth, const cv::Mat &target_depth, 
const geometry::TransformationMatrix &initial_T, int term_type = 0);

std::shared_ptr<DenseTrackingResult> DenseTracking(geometry::RGBDFrame &source_frame, 
geometry::RGBDFrame &target_frame, 
const geometry::TransformationMatrix &initial_T, int term_type = 0);

You can track with a pair of RGB and depth images, or with a pair of geometry::RGBDFrame. The returned result is a shared_ptr pointing to an odometry::SparseTrackingResult or odometry::DenseTrackingResult object. They are defined as follows:

class SparseTrackingResult
{
    public:
    //relative pose
    geometry::TransformationMatrix T;
    //indices of inliers
    geometry::FMatchSet correspondence_set_index;
    //inliers
    geometry::PointCorrespondenceSet correspondence_set;
    //rmse
    double rmse = 1e6;
    //if the tracking is successful
    bool tracking_success;
};
class DenseTrackingResult
{
    public:
    //relative pose
    geometry::TransformationMatrix T;
    //indices of inliers
    geometry::PixelCorrespondenceSet pixel_correspondence_set;
    //inliers
    geometry::PointCorrespondenceSet correspondence_set;
    //rmse
    double rmse = 1e6;
    //if the tracking is successful
    bool tracking_success;
};

The main difference is the type of the point IDs in the final matches. For sparse tracking, the feature point ID is one-dimensional: a correctly matched pair such as \((501, 308)\) refers to the 501st feature point of the source frame and the 308th feature point of the target frame. For dense tracking, the point ID is two-dimensional: a result such as \([(501, 308), (489, 311)]\) refers to a pair of 3D points of the source frame and the target frame at pixels \((501, 308)\) and \((489, 311)\). Of course, the results of dense tracking could be converted into the same form as those of sparse tracking, but OnePiece keeps this format because it is closer to the essence of the algorithm: sparse tracking works on extracted feature points, while dense tracking works pixel by pixel.
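For example, a minimal call looks like the sketch below. The image paths, the include path of the odometry header, and the assumption that geometry::TransformationMatrix is an Eigen matrix type with Identity() are placeholders; adapt them to the actual project layout.

#include <iostream>
#include <opencv2/opencv.hpp>
#include "Odometry/Odometry.h" // assumed include path, adjust to the project layout

int main()
{
    // Placeholder image paths for a pair of RGB-D frames
    cv::Mat source_color = cv::imread("source_rgb.png");
    cv::Mat target_color = cv::imread("target_rgb.png");
    cv::Mat source_depth = cv::imread("source_depth.png", cv::IMREAD_UNCHANGED);
    cv::Mat target_depth = cv::imread("target_depth.png", cv::IMREAD_UNCHANGED);

    // Sparse tracking: no initial pose needed
    auto sparse = odometry::SparseTracking(source_color, target_color,
        source_depth, target_depth);
    if(sparse->tracking_success)
        std::cout << "sparse rmse: " << sparse->rmse << std::endl;

    // Dense tracking: needs an initial pose, identity if the two frames are close
    auto dense = odometry::DenseTracking(source_color, target_color,
        source_depth, target_depth, geometry::TransformationMatrix::Identity());
    if(dense->tracking_success)
        std::cout << "dense relative pose:\n" << dense->T << std::endl;
    return 0;
}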

SparseTracking

The basic steps of SparseTracking are as follows (see the sketch after this list):
- Extract feature points
- Feature point matching
- Outlier filtering
- Compute the transformation matrix
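As a rough illustration of these steps (not the OnePiece implementation itself), the first three stages can be sketched with plain OpenCV; OnePiece performs the equivalent work, plus the 3D transformation estimation, inside SparseTracking:

#include <opencv2/opencv.hpp>
#include <vector>

// Rough sketch of the sparse tracking steps with plain OpenCV.
void SparseStepsSketch(const cv::Mat &source_gray, const cv::Mat &target_gray)
{
    // 1. Extract ORB feature points and binary descriptors
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> source_kps, target_kps;
    cv::Mat source_desc, target_desc;
    orb->detectAndCompute(source_gray, cv::noArray(), source_kps, source_desc);
    orb->detectAndCompute(target_gray, cv::noArray(), target_kps, target_desc);

    // 2. Match the binary descriptors (brute force, Hamming distance)
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);
    std::vector<cv::DMatch> matches;
    matcher.match(source_desc, target_desc, matches);

    // 3. Filter outliers, here with a RANSAC-estimated homography
    if(matches.size() < 4) return; // findHomography needs at least 4 pairs
    std::vector<cv::Point2f> source_pts, target_pts;
    for(const auto &m : matches)
    {
        source_pts.push_back(source_kps[m.queryIdx].pt);
        target_pts.push_back(target_kps[m.trainIdx].pt);
    }
    std::vector<unsigned char> inlier_mask;
    cv::findHomography(source_pts, target_pts, cv::RANSAC, 3.0, inlier_mask);
    // 4. OnePiece then lifts the inlier matches to 3D with the depth map and
    //    estimates the transformation matrix from the 3D-3D correspondences.
}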

OnePiece extracts ORB features, which are fast binary features. For feature matching, OnePiece provides two methods: BFMatcher (brute force) based on OpenCV, and the sparse match introduced in MILD (an open-source library). Refer to Beyond SIFT Using Binary Features in Loop Closure Detection for details; the paper is included at relative_paper/sparse_match.pdf. These methods are wrapped in the class SparseMatcher; the main member functions are listed as follows:

//Match
void Match(const geometry::Descriptor &source_descriptors, const geometry::Descriptor &target_descriptors, 
    geometry::DMatchSet & matches);
//Refine the matching results after matching
//Need 3D points
void RefineMatches(const geometry::Descriptor &source_descriptors, const geometry::Point3List &source_local_points,
    const geometry::KeyPointSet &source_keypoints,  const geometry::KeyPointSet &target_keypoints, 
    const geometry::TransformationMatrix & H, geometry::DMatchSet & matches);
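A hypothetical usage sketch follows. It assumes the descriptors, keypoints, and local 3D points of both frames have already been extracted, that SparseMatcher has a default constructor, and that geometry::TransformationMatrix is an Eigen matrix type; check the actual class for the real construction and extraction code.

// Placeholder inputs: in a real pipeline these come from feature extraction
geometry::Descriptor source_descriptors, target_descriptors;
geometry::KeyPointSet source_keypoints, target_keypoints;
geometry::Point3List source_local_points;
geometry::TransformationMatrix H = geometry::TransformationMatrix::Identity();

odometry::SparseMatcher sparse_matcher; // assuming a default constructor
geometry::DMatchSet matches;
sparse_matcher.Match(source_descriptors, target_descriptors, matches);
// Once a rough relative pose H is known, refine the matches with the 3D points
sparse_matcher.RefineMatches(source_descriptors, source_local_points,
    source_keypoints, target_keypoints, H, matches);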

For outlier filtering, odometry::outlier_filter provides several common methods:

//Delete matches with a distance greater than n times the minimum distance (Naive)
void MinDistance(const geometry::KeyPointSet &source_keypoints,
    const geometry::KeyPointSet &target_keypoints,geometry::DMatchSet & matches);
//Only keep the matches with a larger ratio between the sub-optimal distance and the optimal distance (the optimal result of a correct match should be much better than the sub-optimal result)
void KnnMatch(const geometry::Descriptor &source, 
    geometry::Descriptor & target, const cv::BFMatcher &matcher, 
    std::vector< cv::DMatch > &matches, float minRatio = 1.5);
//Use RANSAC to find the homography matrix, keep the inliers
void Ransac(const geometry::KeyPointSet &source_keypoints, 
    const geometry::KeyPointSet &target_keypoints,geometry::DMatchSet & matches);
//For the currently checked matching pair, randomly select several other matches to see if they are consistent
//Wrong matching is difficult to find other consistent matching pairs
void RanSaPC(const geometry::Point3List &source_local_points, 
    const geometry::Point3List &target_local_points, std::default_random_engine &engine, std::vector< cv::DMatch > &init_matches);
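These filters modify the match set in place. A plausible combination, assuming they are free functions in the odometry::outlier_filter namespace (as the naming above suggests) and that the keypoints and local 3D points of both frames are already available (all variable names here are placeholders), might be:

#include <random>
#include <vector>

// Placeholder inputs produced by the feature extraction and matching stages
geometry::KeyPointSet source_keypoints, target_keypoints;
geometry::Point3List source_local_points, target_local_points;
geometry::DMatchSet matches;
std::default_random_engine engine;

// Cheap distance-based pruning first, then geometric consistency checks
odometry::outlier_filter::MinDistance(source_keypoints, target_keypoints, matches);
odometry::outlier_filter::Ransac(source_keypoints, target_keypoints, matches);
odometry::outlier_filter::RanSaPC(source_local_points, target_local_points, engine, matches);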

OnePiece names the tracking function that uses MILD SparseTrackingMILD. Generally speaking, it performs better than SparseTracking and finds more correct inliers.

DenseTracking

For details of DenseTracking, refer to Real-Time Visual Odometry from Dense RGB-D Images; you can also find the paper at related_paper/steinbruecker_sturm_cremers_iccv11.pdf.

The paper above only tracks the intensity of the color image, that is, the optical flow method. However, it is easy to extend it to also track the depth image. In OnePiece, you can choose the color term, the depth term, or a hybrid of both by changing the last parameter term_type:

if(term_type == 0)//hybrid
    std::cout<<BLUE<<"[DEBUG]::Using hybrid term"<<RESET<<std::endl;
else if(term_type == 1)//color
    std::cout<<BLUE<<"[DEBUG]::Using photo term"<<RESET<<std::endl;
else if(term_type == 2)//depth
    std::cout<<BLUE<<"[DEBUG]::Using geometry term"<<RESET<<std::endl;

Steps of DenseTracking (the objective being minimized is sketched after this list):
- For each pixel of the source frame, calculate the corresponding point in the target frame according to the current pose
- Find all inliers among the corresponding points
- Compute the Jacobian matrix for the inliers and update the pose
- Repeat the above process
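As a rough sketch (following the referenced paper rather than the exact code), the photometric term minimizes an objective of the form \(E(\xi) = \sum_{\mathbf{x}} \left( I_{target}\big(\pi(T(\xi)\,\pi^{-1}(\mathbf{x}, D_{source}(\mathbf{x})))\big) - I_{source}(\mathbf{x}) \right)^2\), where \(\pi\) projects a 3D point to a pixel, \(\pi^{-1}\) back-projects a pixel using its depth \(D_{source}(\mathbf{x})\), and \(T(\xi)\) is the current pose estimate. The depth term has the same structure with the intensities \(I\) replaced by depth values, and the hybrid term combines both.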

It can be seen that DenseTracking requires an initial pose; if there is no initial estimate, the relative pose of the two frames must be assumed to be very small. The process is a bit like point-to-point ICP; the difference lies in the objective function: point-to-point ICP has to search for the closest point instead of using reprojection. Since that closest-point search is time-consuming, dense tracking is much faster than ICP.

In the implementation, an image pyramid is used to improve efficiency: the pose is first estimated on a coarse, downsampled level and then refined level by level on finer images.
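Conceptually, the coarse-to-fine scheme looks like the sketch below; TrackOneLevel stands for a hypothetical single-level solver (not an actual OnePiece function), and the per-level images correspond to the pyramid members of RGBDFrame shown later:

// Hypothetical coarse-to-fine loop: start on the smallest (coarsest) level
// and propagate the refined pose down to the full-resolution level.
geometry::TransformationMatrix T = initial_T;
for(int level = pyramid_levels - 1; level >= 0; --level)
{
    T = TrackOneLevel(source_color_pyramid[level], source_depth_pyramid[level],
                      target_color_pyramid[level], target_depth_pyramid[level], T);
}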

Comparison

SparseTracking works with far fewer points, is faster, and does not require an initial pose. However, it only considers visual information and fails in places with little texture or a poor color image. DenseTracking is slower and requires an initial pose, but it is more robust and can be used even when only the depth or only the color image is reliable. It is a good backup solution when SparseTracking fails.
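A simple fallback strategy suggested by this comparison is to try SparseTracking first and fall back to DenseTracking with an identity initial pose when it fails. This is only a sketch: the input images are placeholders and geometry::TransformationMatrix is assumed to be an Eigen matrix type.

geometry::TransformationMatrix result_T = geometry::TransformationMatrix::Identity();
auto sparse = odometry::SparseTracking(source_color, target_color, source_depth, target_depth);
if(sparse->tracking_success)
{
    result_T = sparse->T;
}
else
{
    // Fall back to dense tracking, assuming consecutive frames are close to each other
    auto dense = odometry::DenseTracking(source_color, target_color,
        source_depth, target_depth, geometry::TransformationMatrix::Identity());
    if(dense->tracking_success)
        result_T = dense->T;
}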

RGBDFrame

As shown above, each type of tracking has two overloads: one acting directly on the RGB and depth images, and one acting on geometry::RGBDFrame:

std::shared_ptr<SparseTrackingResult> SparseTracking(const cv::Mat &source_color, const cv::Mat &target_color, 
const cv::Mat &source_depth, const cv::Mat &target_depth);
std::shared_ptr<SparseTrackingResult> SparseTracking(geometry::RGBDFrame &source_frame, 
geometry::RGBDFrame &target_frame);

The difference is that geometry::RGBDFrame saves intermediate information, which avoids repeated computation when building a SLAM system:

//part of RGBDFrame
/*for sparse odometry*/
geometry::Descriptor descriptor;
geometry::KeyPointSet keypoints;
PointCloud feature_pcd;//for optimization


/*for dense odometry*/
cv::Mat gray;//gray image of rgb
cv::Mat depth32f;//refined depth
std::vector<cv::Mat> color_pyramid;
std::vector<cv::Mat> depth_pyramid;
std::vector<cv::Mat> color_dx_pyramid;
std::vector<cv::Mat> color_dy_pyramid;
std::vector<cv::Mat> depth_dx_pyramid;
std::vector<cv::Mat> depth_dy_pyramid;
std::vector<geometry::ImageXYZ> image_xyz;

std::vector<geometry::ImageXYZ> image_xyz can be considered a 3D image with a corresponding 3D point for each pixel. We can use the 2D pixel coordinate to look up the point, which is why the corresponding point IDs in DenseTrackingResult are two-dimensional.
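For example, in a frame-to-frame SLAM front end, the RGBDFrame overload lets each frame's features and pyramids be computed only once and reused for the next pair. The RGBDFrame(color, depth) construction below is an assumption, and the image vectors are placeholders; check the actual constructor.

// Sketch of a frame-to-frame front end reusing cached RGBDFrame data.
// color_images / depth_images are placeholder std::vector<cv::Mat> sequences.
std::vector<geometry::RGBDFrame> frames;
std::vector<geometry::TransformationMatrix> relative_poses;
for(size_t i = 0; i != color_images.size(); ++i)
{
    frames.emplace_back(color_images[i], depth_images[i]); // assumed constructor
    if(i == 0) continue;
    // From the second pair onwards, frames[i-1] already carries its descriptors,
    // keypoints and pyramids from the previous call, so they are not recomputed.
    auto result = odometry::SparseTracking(frames[i-1], frames[i]);
    if(result->tracking_success)
        relative_poses.push_back(result->T);
}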