Comparing Seven Methodologies for Rigid Alignment of Point Clouds with Focus on Frame-to-Frame Registration in Depth Sequences

Abstract: Pairwise rigid registration aims to find the rigid transformation that best registers two surfaces represented by point clouds. This work presents a comparison between seven algorithms with different strategies to tackle rigid registration tasks. We focus on the frame-to-frame problem, in which the point clouds are extracted from a video sequence with depth information, generating partially overlapping 3D data. We use both point clouds and RGB-D video streams in the experimental results. The former are considered under different viewpoints, with the addition of a case study simulating missing data. Since the ground truth rotation is provided, we discuss four different metrics to measure the rotation error in this case. Among the seven considered techniques, Sparse ICP and Sparse ICP-CTSF outperform the other five in the point cloud registration experiments without incomplete data. However, the evaluation with missing data indicates that these methods are sensitive to this problem and favors ICP-CTSF in such situations. In the tests with video sequences, the depth information is segmented in the first step to get the target region. Next, the registration algorithms are applied and the average root mean squared error, rotation error and translation error are computed. Besides, we analyze the robustness of the algorithms against spatial and temporal sampling rates. We conclude from the experiments using depth video sequences that ICP-CTSF is the best technique for frame-to-frame registration.


I. INTRODUCTION
Surface registration is a common computer vision problem, with applications in computer graphics, robotics, quality inspection, photogrammetry, augmented reality, and pose estimation, among others [1]. Rigid registration is a sub-problem, dealing only with sets that differ by a rigid motion. In this problem, given two point clouds, named source set $P = \{p_i \mid p_i = (p_{ix}, p_{iy}, p_{iz})\}$ and target set $Q = \{q_j \mid q_j = (q_{jx}, q_{jy}, q_{jz})\}$, we need to find a motion transformation $\psi$, composed of a rotation $R$ and a translation $t$, that, applied to $P$, best aligns both clouds ($\psi(P) \approx Q$) according to a distance metric.
The classical and most cited algorithm in the literature for rigid registration is the Iterative Closest Point (ICP) [2]. This algorithm takes as input the point clouds P and Q, and consists of the iteration of two major steps: matching between the point clouds and transformation estimation. The matching searches the closest point in P for every point in Q. This set of correspondences is used to estimate a rigid transformation. These two steps are iterated until a termination criterion is satisfied.
Although simple in concept, ICP assumes that there is a correct correspondence between the points of both clouds. This assumption easily fails on real applications because, in general, the acquired data is noisy and we need to scan the object from multiple directions, due to self-occlusion as well as limited sensor range, producing only partially overlapped point clouds. Another issue of ICP and some variants is that they expect that the point clouds are already coarsely aligned.
These issues have been addressed, to varying degrees, by more recent methods from the computer graphics, computational geometry and computer vision communities, as we can see in related surveys [3]-[5]. The large variety of such techniques makes it hard to choose a technique for a specific application.
In this paper, our goal is to compare the convergence characteristics of surface registration methods in the frame-to-frame problem, where the frames are obtained from a video stream of range images. The video sequence is processed to extract the point clouds that represent sample sets of the target object surface. The objective is to register point clouds in consecutive frames. In this application, we observe the following problems: partially overlapping point clouds, noise, outliers, scale variation, and missing data.
In order to limit the scope of the problem, and avoid a combinatorial explosion in the number of possibilities to test, we focus on rigid transformation techniques that fulfill at least one of the following requirements: (a) Incorporate local geometric features to enhance the quality of the matching step; (b) Estimate the transformation using a distance different from the Euclidean one; (c) Perform registration without correspondence.
Requirement (a) is motivated by the fact that ICP, and many other registration techniques, use only the criterion of minimizing point-to-point Euclidean distances between the sets P and Q to compute the matching between the point clouds. This approach might not be effective in cases of partial overlap, because only a subset of each point cloud has a correct correspondent. We also suppose that the video is acquired by simply waving the capture device at the scene, following smooth and slow motion paths. Therefore, we can discard scale changes when registering two consecutive frames, since they should be very small, which justifies considering only rigid transformations. Moreover, the characteristics of the solution to the rigid registration problem depend on the notion of distance used in the ambient space. Usually, registration techniques apply the Euclidean distance, which is derived from the $L_2$ norm. However, when using the $L_2$ norm, we get an optimization problem in the least-squares sense, imposing the fundamental assumption that the error residuals follow a normal distribution, where inliers are typical events whereas outliers rarely happen. Another paradigm, which motivates requirement (b), is to use a norm that maximizes the number of zero distances between correspondences. Requirement (c), in turn, arises because we would like to test a method that attempts to align the two given point sets without establishing explicit point correspondences. A trick in this case is to model each of the two point sets by a probability distribution, in order to get a procedure less sensitive to missing correspondences and outliers. Finally, we must consider the classical ICP as a baseline, in order to obtain a relative measure of how effective each chosen methodology is against the difficulties of the frame-to-frame registration problem.
To evaluate each algorithm in the target application, we first consider point clouds acquired through a Cyberware 3030 MS scanner [11], available in the Stanford 3D scanning repository [12]. The Bunny model was chosen for the tests, with the corresponding point clouds captured from four viewpoints. In this case, the ground truth rotation is available and, as a consequence, we could evaluate four different metrics to measure the rotation error (Section IV). Results show better performance for Sparse ICP and Sparse ICP-CTSF in these experiments under the inner product of unit quaternions metric. Besides the original data, a case study is generated to simulate missing data. Visual results are shown in order to link the error measurements with the results of the methods on the chosen examples. When simulating missing data with the Bunny model, however, we notice a decrease in the registration precision of Sparse ICP and Sparse ICP-CTSF. In this case, ICP-CTSF obtains outstanding results. Moreover, we report the CPU time spent on the executions, in order to highlight the computational complexity of each technique.
Next, we evaluate the alignment techniques for frame-to-frame registration using three video sequences with depth information. We perform the segmentation of each frame of the sequence through a simple depth threshold operation. The result is a point cloud that we must register with the previous one. We use the average root mean squared error and the average rotation and translation errors as measures to analyze the results. The tests show that ICP-CTSF is more reliable for this application.
This work is an extended version of the material published in [13]. In the current version we have improved the introduction and added a related works section. Besides, Section III (Registration Algorithms) is augmented with a description of each target technique to make the material self-contained. Also in Section III, we offer details about the tensor elements behind ICP-CTSF and SWC-ICP, with a complete derivation of the latter based on fundamental results in point cloud registration in $\mathbb{R}^3$. In the Experimental Results (Section V) we include one more case in the point cloud experiments and incorporate more details about the CPU time and the influence of the trimming parameter. We replace the scenario generated using noise and outliers in [13] with a new example involving missing points. With this, we complete the results presented in [13], having tested the registration techniques against noise, outliers and missing data, which are common problems in frame-to-frame registration. Moreover, we have added new experiments to evaluate the techniques using two benchmark videos from the database available on the web site [14], which are accompanied by the ground truth for the rigid registration. Unlike the work [13], which was not conclusive on this point, the frame-to-frame registration results presented in Sections V-B and V-C show that ICP-CTSF is the best method to register point clouds extracted from depth sequences.
The remainder of this paper is organized as follows. Section II describes related works dealing with comparisons and qualitative analysis of rigid surface registration methods. Then, in Section III, we summarize the considered methods. Next, Section IV describes four different metrics to measure the rotation error. Section V shows the experimental results obtained by applying the registration methods to point clouds and to depth video sequences. Section VI presents the conclusions and future research.

II. RELATED WORKS
The survey of Sabata and Aggarwal [15] was one of the first works to list methods to compute 3D rigid motions between two sets, whether they are points, lines or surfaces. Points are the most common representation, drawing attention from most papers in the rigid registration literature. They also classify the solutions found by the methods as iterative or closed form. However, the listed methods are not compared. Eggert et al. [16] quantitatively compare four closed-form solutions for estimating rigid transformations using controlled synthetic experiments: singular value decomposition [17], unit quaternion [18], dual quaternion [19], and orthonormal matrices [20]. No significant differences were observed in the accuracy and robustness of the algorithms for non-degenerate 3D point sets with various levels of noise. In terms of stability, for non-degenerate cases, the unit quaternion and singular value decomposition methods were superior to the other methods, with the latter marginally more stable than the former. Some variants of ICP were surveyed by Rusinkiewicz and Levoy [21], who classified them according to six stages where optimizations could be made: selection of points, matching, weighting of correspondences, rejection of pairs, error metric and minimization of the error metric. They compare the variants regarding the RMS error, number of iterations and time until correct convergence, in order to propose a high-speed ICP that uses the best strategy in each stage to address real-time registration.
Dalley and Flynn [22] presented a quantitative analysis of two methods to reject pairs of matched points, on partially overlapping range images. In these cases, there is an expected number of points without homologous correspondence, justifying the need of such methods.
Salvi et al. [3] proposed a classification of methods into fine registration and coarse registration. In fine registration, the methods try to find the most accurate solution possible, refining an already computed initial guess. Coarse registration is a class of algorithms that aim to find an initial estimate of the correct alignment between point sets. These methods tend to be more robust to noise since they make no assumptions about the relative position of the point sets. However, in general, their solutions must be improved by a fine registration technique, which takes the coarse transformation as an initial estimate of the motion (a guess) and iterates until convergence to a more accurate solution. In this way, new methodologies are generated through the combination of coarse and fine registration techniques, called coarse-to-fine schemes [7]. After reviewing some methods of each class, Salvi et al. [3] compare them by measuring root mean squared error (RMS), rotation error, translation error and computational time.
Moreover, considering the specific point of rotation error, Huynh [23] presents a detailed analysis of six known functions for measuring the distance between 3D rotations, considering metric and group concepts (SO(3), the group of orthogonal matrices with determinant +1). The conclusions favor quaternions for representing 3D rotations. Besides, according to Besl and McKay [2], for two and three dimensions, the quaternion-based method is preferred, since reflections are not desired.
In this paper, we show how some recent approaches to rigid registration perform in frame-to-frame application cases. To the best of our knowledge, this is the first work to address this kind of comparison. Besides the chosen techniques, we must take into account other recent works that could also be used in the target application. An algorithm based on a probabilistic model for joint registration of multiple point clouds (JR-MPC) is described in [24]. The technique shares with the GMM (Gaussian Mixture Model) [9] the idea of using Gaussian mixtures to represent point sets. However, unlike GMM, JR-MPC assumes that all the point sets are generated from the same Gaussian mixture model, which also includes a uniform distribution parameterized by the volume of the convex hull encompassing the clouds. In our application we have a video stream $V$ with $|V|$ frames, each one defining a point cloud in $\mathbb{R}^3$. The application of JR-MPC to jointly register these point sets is impractical. Besides, the assumption that such point clouds could be jointly registered may be false in this application due to scene changes along the frames.
Still in the scenario of probabilistic mixture models, the technique presented in [25] proposes a joint distribution associated with the observations that allows the incorporation of color information associated with each 3D point. Despite its theoretical generality, in practice this strategy cannot be directly employed for high-dimensional 3D shape features due to complexity problems. Thus, in [26] the authors propose an adaptation in the spirit of the bag-of-words paradigm in order to build a computationally efficient mixture model for the common joint distribution that originates the 3D points as well as the corresponding features. All these probabilistic mixture models suffer from both computational and memory cost issues for large point sets (tens of thousands or millions of points) due to the increase in the number of mixture components. The deterministic model [27] also associates RGB information and depth measurements, through a four-dimensional approach that allows the design of an ICP version in RGB-D space without the computational complexity of mixture approaches.
Besides, in the case of cross-source point clouds, the performance of feature-based methods like [26] deteriorates due to the difficulty of reliably extracting similar features from point clouds acquired through different sensors. Such an application motivates the CSGM technique [28], which applies a graph framework to organize and encode data information, allowing the registration to be converted into a graph matching problem. In [28], CSGM is also compared with ICP and JR-MPC for 3D data from the same kinds of sensor, outperforming the latter and achieving a lower error rate than JR-MPC in some tests.
In our work we avoid usual problems with RGB information (sensitivity against illumination conditions and shadows) by keeping only 3D data and shape features. We focus on point clouds acquired through a single sensor and apply shape features only to improve the match between point sets. Consequently, we consider only the methods already selected, which are reviewed in the next section.

III. REGISTRATION ALGORITHMS
We compare in this work seven different algorithms for frame-to-frame rigid registration: the classical ICP [2] and four variants (ICP-CTSF [6], SWC-ICP [7], Sparse ICP [8], and Sparse ICP-CTSF [6]), the Super 4PCS [10], and the GMM framework [9]. In this section, we aim to establish the necessary notation and the mathematical formulation behind these techniques.
In what follows, bold uppercase symbols represent tensor objects, such as $\mathbf{T}$, $\mathbf{S}$; normal uppercase symbols represent matrices, data sets and subspaces ($P$, $U$, $D$, $\Sigma$, etc.); bold lowercase symbols denote vectors (represented by column arrays), such as $\mathbf{x}$, $\mathbf{y}$. Normal lowercase symbols are used to represent functions as well as scalar numbers ($f$, $\psi$, $\lambda$, $\alpha$, etc.). Also, given a matrix $A \in \mathbb{R}^{m \times m}$ and a set $S$, $\mathrm{tr}(A) = A_{11} + A_{22} + \ldots + A_{mm}$ is the trace of $A$, and $|S|$ means the number of elements of $S$. Besides, $I_m$ represents the $m \times m$ identity matrix.
Our focus is rigid registration in the frame-to-frame problem. So, let the source and target point clouds in $\mathbb{R}^m$ be represented, respectively, by $P = \{p_1, p_2, \ldots, p_{n_P}\} \subset \mathbb{R}^m$ and $Q = \{q_1, q_2, \ldots, q_{n_Q}\} \subset \mathbb{R}^m$. A rigid transformation $\psi : \mathbb{R}^m \to \mathbb{R}^m$ is given by:

$$\psi(x) = Rx + t, \qquad (1)$$

with $R \in SO(m)$ and $t \in \mathbb{R}^m$ being the rotation matrix and translation vector, respectively. The registration problem aims at finding a rigid transformation $\psi : \mathbb{R}^m \to \mathbb{R}^m$ that brings set $P$ as close as possible to set $Q$ in terms of a designated set distance, computed using a suitable metric $d : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}_+$, usually the Euclidean one, denoted by $d(p, q) = \|p - q\|_2$. To solve this task, the first step is to compute the matching relation $C(P, Q) \subset P \times Q$ that denotes the set of all correspondence pairs to be used as input in the procedure to compute the transformation $\psi$. Formally, we consider:

$$C(P, Q) = \{(x, y) \in P \times Q \mid y \in Q,\ x = \arg\min_{p \in P} d(p, y)\}, \qquad (2)$$

where $P \times Q$ denotes the Cartesian product between sets $P$ and $Q$. We can check that $|C(P, Q)| = |Q|$. However, in the remaining text we say that $|C(P, Q)| = c$ to simplify the expressions. Moreover, in the focused application only partial matches are expected in general. Therefore, a trimmed approach that discards a percentage of the worst matches is desirable [29]. So, we sort the pairs of the set $C(P, Q)$ such that $d(x_{i_1}, y_{i_1}) \le d(x_{i_2}, y_{i_2}) \le \cdots \le d(x_{i_c}, y_{i_c})$ and consider a trimming parameter $0 \le \tau \le 1$ and the new correspondence relation:

$$C_1(P, Q, \tau) = \{(x_{i_k}, y_{i_k}) \in C(P, Q) \mid 1 \le k \le n\}, \qquad (3)$$

where $n \le c$ is the number of pairs retained after discarding the fraction $\tau$ of worst matches, so that $|C_1(P, Q, \tau)| = n$. We must notice that $C_1(P, Q, \tau) = C(P, Q)$ if $\tau = 0$. The relationship defined by expression (3) is based on the distance function and nearest neighbor computation. We could also consider shape descriptors computed over each point cloud. Generally speaking, given a point cloud $S$, the shape descriptors can be formulated as a function $f : S \to \mathcal{P}(\mathbb{R})$, where $\mathcal{P}(\mathbb{R})$ is the set of all subsets of $\mathbb{R}$, named the power set of $\mathbb{R}$.
In this case, besides the distance criterion, we can also include shape information in the correspondence computation by applying a boolean correspondence function $f_c : P \times Q \to \{0, 1\}$, defined as in [5]. Also, before building $C(P, Q)$ in expression (2), we could down-sample the two point sets, based on the selection of key points through the shape function, or through a naive interlaced sampling over some spatial data structure [30].
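The matching step described above (nearest neighbor in $P$ for every point of $Q$, followed by trimming) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `trimmed_matches`, the brute-force distance computation, and the rounding rule that turns $\tau$ into a number of retained pairs are hypothetical choices, not taken from the compared implementations.

```python
import numpy as np

def trimmed_matches(P, Q, tau=0.0):
    """Nearest neighbor in P for every point of Q, keeping the best pairs.

    Returns (idx_p, idx_q, dist): indices into P and Q of the retained pairs
    and their Euclidean distances, sorted from best to worst match.
    """
    # Pairwise squared Euclidean distances, shape (|Q|, |P|) (brute force).
    d2 = ((Q[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    idx_p = d2.argmin(axis=1)                      # closest source point per target
    dist = np.sqrt(d2[np.arange(len(Q)), idx_p])
    order = np.argsort(dist, kind="stable")        # best matches first
    # Assumed rounding rule: discard floor(tau * c) of the worst pairs.
    n = len(Q) - int(np.floor(tau * len(Q)))
    keep = order[:n]
    return idx_p[keep], keep, dist[keep]

# Three well-matched target points and one far-away point (an outlier).
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
Q = np.array([[0.1, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.9, 0.0], [5.0, 5.0, 5.0]])
idx_p, idx_q, dist = trimmed_matches(P, Q, tau=0.25)  # drops the worst 25%
```

With $\tau = 0$ every target point keeps its match, which mirrors the remark that the trimmed relation coincides with $C(P, Q)$ in that case. For large clouds a spatial index (for example a k-d tree) would replace the brute-force distance matrix.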

A. Iterative Closest Point
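The classical ICP [2], described in Algorithm 1, receives the source $P$ and target $Q$ point clouds, and each iteration of its main loop is composed of two major steps: matching between the point clouds and transformation estimation. The former is performed by computing the set $C_1(P_{s+1}, Q, \tau)$ through equation (2). At the end of the matching process we get a base of the set $P$, denoted by $X = \{x_1, x_2, \ldots, x_n\} \subset P$, and a base of the set $Q$, denoted by $Y = \{y_1, y_2, \ldots, y_n\} \subset Q$, such that $C_1(P, Q, \tau)$ stands for the set of $n$ correspondence pairs $(x_i, y_i) \in X \times Y$. This matching relation will be used to estimate a rigid transformation that aligns the point clouds $P$ and $Q$. Specifically, ICP seeks a rotation matrix $R$ and a translation $t$ that minimize the mean squared distance:

$$e(R, t) = \frac{1}{n} \sum_{i=1}^{n} \| y_i - R x_i - t \|_2^2, \qquad (5)$$

which is used as a measure of the distance between the target set $Q$ and the transformed source point cloud $\psi(P) = \{\psi(p_1), \psi(p_2), \ldots, \psi(p_n)\}$, with $\psi$ defined by equation (1). Now, we focus on the specific three-dimensional case ($m = 3$) and state the fundamental theorem that steers most of the solutions for the registration problem in $\mathbb{R}^3$.

Theorem 1: Let $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^3$ and $Y = \{y_1, y_2, \ldots, y_n\} \subset \mathbb{R}^3$, the centers of mass $\mu_x$, $\mu_y$ for the respective point sets $X$ and $Y$, the cross-covariance $\Sigma_{xy}$, and the matrices $A$ and $M$, given by:

$$\mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad (6)$$

$$\mu_y = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad (7)$$

$$\Sigma_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)^T, \qquad (8)$$

$$A = \Sigma_{xy} - \Sigma_{xy}^T, \qquad (9)$$

$$M = \begin{bmatrix} \mathrm{tr}(\Sigma_{xy}) & \Delta^T \\ \Delta & \Sigma_{xy} + \Sigma_{xy}^T - \mathrm{tr}(\Sigma_{xy}) I_3 \end{bmatrix}, \quad \Delta = [A_{23}, A_{31}, A_{12}]^T. \qquad (10)$$

Hence, the optimum rotation $R$ and translation vector $t$ that minimize the error in expression (5) are determined uniquely as follows [18]. The matrix $R$ is computed through the unit eigenvector $v = [v_0, v_1, v_2, v_3]^T$ of $M$ corresponding to its maximum eigenvalue:

$$R = \begin{bmatrix} v_0^2 + v_1^2 - v_2^2 - v_3^2 & 2(v_1 v_2 - v_0 v_3) & 2(v_1 v_3 + v_0 v_2) \\ 2(v_1 v_2 + v_0 v_3) & v_0^2 - v_1^2 + v_2^2 - v_3^2 & 2(v_2 v_3 - v_0 v_1) \\ 2(v_1 v_3 - v_0 v_2) & 2(v_2 v_3 + v_0 v_1) & v_0^2 - v_1^2 - v_2^2 + v_3^2 \end{bmatrix}, \qquad (11)$$

and $t$ is calculated through $R$ and the centroids in expressions (6)-(7) as:

$$t = \mu_y - R \mu_x. \qquad (12)$$

Based on the above theorem, in the second stage, the ICP estimates the rigid transformation by computing the rotation matrix and translation vector using equations (11) and (12).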
The matching and transformation estimation are repeated until the maximum allowed number of iterations is reached or the error falls below a pre-defined threshold. The ICP technique is summarized in Algorithm 1.
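A minimal NumPy sketch of the transformation-estimation step of Theorem 1 is given next. It assumes the standard unit-quaternion formulation of [2], [18] (cross-covariance, a 4 x 4 symmetric matrix, and its principal eigenvector); the function name `estimate_rigid_quaternion` is hypothetical, and the code is an illustration rather than the implementation used in the experiments.

```python
import numpy as np

def estimate_rigid_quaternion(X, Y):
    """Least-squares R, t such that Y ~ R X + t, for paired rows of X and Y."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)        # centers of mass
    S = (X - mu_x).T @ (Y - mu_y) / len(X)             # cross-covariance (3 x 3)
    A = S - S.T                                        # anti-symmetric part
    delta = np.array([A[1, 2], A[2, 0], A[0, 1]])
    M = np.empty((4, 4))                               # symmetric 4 x 4 matrix
    M[0, 0] = np.trace(S)
    M[0, 1:] = delta
    M[1:, 0] = delta
    M[1:, 1:] = S + S.T - np.trace(S) * np.eye(3)
    # eigh returns eigenvalues in ascending order, so the last column is the
    # unit eigenvector associated with the maximum eigenvalue.
    _, V = np.linalg.eigh(M)
    v0, v1, v2, v3 = V[:, -1]
    R = np.array([
        [v0*v0 + v1*v1 - v2*v2 - v3*v3, 2*(v1*v2 - v0*v3), 2*(v1*v3 + v0*v2)],
        [2*(v1*v2 + v0*v3), v0*v0 - v1*v1 + v2*v2 - v3*v3, 2*(v2*v3 - v0*v1)],
        [2*(v1*v3 - v0*v2), 2*(v2*v3 + v0*v1), v0*v0 - v1*v1 - v2*v2 + v3*v3],
    ])
    t = mu_y - R @ mu_x
    return R, t

# Synthetic check: rotate and translate a random cloud, then recover the motion.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
Y = X @ R_true.T + t_true
R_est, t_est = estimate_rigid_quaternion(X, Y)
```

Because the rotation matrix is quadratic in the eigenvector entries, the sign ambiguity of the eigenvector returned by the solver does not affect $R$.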

Algorithm 1: Iterative Closest Point
Apply the transformation to all points of the source.
Compute the matching relation $C_1(P_{s+1}, Q, \tau)$ through expression (2).
Compute the principal eigenvector $v$ of the matrix $M$ defined in (10).
Calculate the rotation matrix $R_{s+1}$ and translation vector $t_{s+1}$ using expressions (11)-(12).
Compute the error between the two point sets.

B. ICP-CTSF
The ICP-CTSF [6] implements a matching strategy using a feature invariant to rigid transformations, based on the shape of second-order orientation tensors associated with each point. A voting algorithm is used, divided into an isotropic and an anisotropic voting field. So, given a cloud point $p \in P$, let $L_k(p) \subset P$ be the set of the $k\%$ nearest neighbors of $p$ and $s \in L_k(p)$. We can define $v_{ps} = (s - p)$ and $\hat{v}_{ps} = v_{ps} / \|v_{ps}\|_2$, as well as the function in expression (13), where $s_f$ is the farthest neighbor of $p$, which has influence 0.01. Given these elements, we can compute the second-order tensor field $\mathbf{T}(p)$ in expression (14), which is the isotropic voting field computed through a weighted sum of tensors $\hat{v}_{ps} \hat{v}_{ps}^T$, built from the function (13) and from the vote vectors $\hat{v}_{ps}$, $s \in L_k(p)$.
Consider the orthonormal basis generated by the eigenvectors $(e_1(p), e_2(p), e_3(p))$ of $\mathbf{T}(p)$, with the corresponding eigenvalues supposed to satisfy $\lambda_3(p) < \lambda_2(p) \le \lambda_1(p)$. In this case, the local geometry at the point $p$ can be represented by Figure 1, where we picture together the following elements: the coordinate system $\hat{x}, \hat{y}, \hat{z}$ oriented through the eigenvectors $(e_1(p), e_2(p), e_3(p))$, and the plane $\pi$ that contains the point $p$, its neighbor $s$ and the axis $\hat{z}$. Moreover, Figure 1 shows the unique ellipse $E \subset \pi$ that is tangent to the $\hat{x}, \hat{y}$ plane at $p$, contains $s$, and is centered at a point on $\hat{z}$. The vector $\xi_s$, which is unitary, parallel to the plane $\pi$, and tangent to $E$ at $s$, gives a way to build a different structuring element that enhances coplanar structures, in the sense that the angle $\beta \approx 0$ if $s$ is close to the $\hat{x}, \hat{y}$ plane. Specifically, if $d_e(p, s)$ is the length of the minor arc from $p$ to $s$ along the ellipse $E$ in Figure 1, we define a new weighting function in expression (15), where $\sigma^2(p)$ is calculated by expression (13), $\varphi_s$ is the angle between $v_{ps} = (s - p)$ and the $\hat{x}, \hat{y}$ plane, and $\varphi_{max}$ constrains the influence of points misaligned with the $\hat{x}, \hat{y}$ plane, with 45° an ideal choice, as a middle ground between smoother results and robustness to outliers [31], [32].
With the above elements in mind, we define the tensor field $\mathbf{S}(p)$ in expression (16), which is composed of the weighted sum of the tensors built from the votes received at the point, with weights computed by expression (15) for all the points that have $p$ as a neighbor. The tensor field in expression (16) can be seen as a shape function $\mathbf{S} : P \to \mathbb{R}^{3 \times 3}$ whose descriptors at a point $p \in P$ are the eigenvalues $\lambda_i^{S}(p)$, $i = 1, 2, 3$. Therefore, given two points $p, q$ such that $p \in P$ and $q \in Q$, we compare the corresponding (local) geometries using the comparative tensor shape factor (CTSF), defined in expression (17), where $\mathbf{S}_1 : P \to \mathbb{R}^{3 \times 3}$ and $\mathbf{S}_2 : Q \to \mathbb{R}^{3 \times 3}$ are tensors computed following expression (16), and $\lambda_i^{S_1}(p)$ and $\lambda_i^{S_2}(q)$ are the $i$th eigenvalues calculated at the points $p \in P$ and $q \in Q$, respectively.
The CTSF is used side by side with the Euclidean distance to produce a correspondence criterion (expression (18)) that takes into account not only the nearest point (as in expression (2)) but also the shape information, where $CTSF(p, q)$ is given by equation (17), $w_m = w_0 b^m$, with $b < 1$, and $0 < w_m < w_0$. The parameter $w_0$ is the initial weight given to the CTSF and $b$ controls the update size of the weighting factor. To avoid numerical instabilities we set $w_m = 0$ when $w_m \approx 0$. This weighting strategy is responsible for the coarse-to-fine behavior of the criterion when it is inserted in the matching step of the ICP algorithm, given by expression (2). Specifically, the ICP-CTSF procedure calculates the correspondence relation in expression (19) and uses it to define the set $C_3(P, Q, \tau, m)$ in expression (20), which is the correspondence set applied by the ICP-CTSF technique, summarized in Algorithm 2.
Algorithm 2: ICP-CTSF Procedure
Apply the transformation to all points of the source.
Compute the matching relation $C_3(P_{s+1}, Q, \tau, m)$ through expression (20).
Compute the principal eigenvector $v$ of the matrix $M$ defined in (10).
Calculate the rotation matrix $R_{s+1}$ and translation vector $t_{s+1}$ using expressions (11)-(12).
Compute the error between the two point sets.
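The weighting schedule $w_m = w_0 b^m$ can be illustrated with a few lines of Python. The function name `ctsf_weight`, the cutoff `eps` used to decide when $w_m \approx 0$, and the values $w_0 = 10$ and $b = 0.5$ are hypothetical; the text does not state which values were used.

```python
def ctsf_weight(w0, b, m, eps=1e-6):
    """Weight of the CTSF term at iteration m: w_m = w0 * b**m, with b < 1.

    The weight is clamped to zero once it falls below eps (an assumed cutoff),
    so late iterations rely on the Euclidean distance only.
    """
    w = w0 * b ** m
    return 0.0 if w < eps else w

# Early iterations give the shape term a large weight (coarse matching);
# the weight then decays geometrically (fine matching).
weights = [ctsf_weight(w0=10.0, b=0.5, m=m) for m in range(5)]
```

The geometric decay is what produces the coarse-to-fine behavior: shape similarity dominates the first iterations, and the criterion gradually approaches plain nearest-neighbor matching.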

C. SWC-ICP Technique
In this technique, besides the correspondence relation (2), we also use the correspondence set $C_{CTSF}(P, Q)$ in expression (21), which contains the pairs of points $(s_i, y_i) \in P \times Q$ whose local shapes are the most similar, according to the CTSF criterion calculated by expression (17). In order to combine both correspondence sets, we first develop expression (8) to get expression (22). Then, if we take expression (5) and perform the substitution in expression (23), with $\omega_n \in \mathbb{R}$, we can write the mean squared error (5) as expression (24). Also, by substituting the variable change (23) in expression (6) we get expression (25) and, consequently, expression (26), where $\mu_y$ is computed by equation (7). Notice that the matrix (26) combines the matching relations (2) and (21), being fundamental for the SWC-ICP described in [7]. According to Theorem 1, the optimum rotation matrix $R$ and translation vector $t$ that minimize the error in expression (24) are uniquely determined by equations (11)-(12), with the rotation obtained from the unit eigenvector of $M$ corresponding to its maximum eigenvalue. Moreover, the SWC-ICP methodology achieves a coarse-to-fine behavior through the use of the weighting strategy of the ICP-CTSF. The SWC-ICP technique is summarized in Algorithm 3.

D. Sparse ICP
The Sparse ICP [8] is formulated as recovering a rigid transformation that maximizes the number of null residuals $z_i = R x_i + t - y_i$, where $R$ is the rotation matrix and $t$ is a translation vector. The Sparse ICP uses an $L_p$ norm, $p \in [0, 1]$, to implement this idea. So, given the correspondence set $C_1(P, Q)$ in expression (2) and the residual vector $z = [\|z_1\|_2^p, \ldots, \|z_n\|_2^p]^T$, the objective is to find a large set of inliers, $\|z_i\|_2^p \approx 0$, and a small set of outliers, $\|z_i\|_2^p \gg 0$. This can be written as the constrained problem in expression (27), in which an auxiliary variable represents a generic point in the residual space. This constrained problem can be solved using an augmented Lagrangian method, which uses the Lagrangian in expression (28), with Lagrange multipliers $\Lambda = \{\lambda_i \in \mathbb{R}^m, i = 1, \ldots, n\}$, a positive penalty weight, and the restriction that $R$ is a rotation matrix. Equation (28) is optimized using the alternating direction method of multipliers (ADMM). Algorithm 4 summarizes the Sparse ICP procedure.
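The effect of choosing $p < 1$ can be illustrated numerically. The sketch below only evaluates the residual cost $\sum_i \|z_i\|_2^p$ for a toy set of residuals; it does not implement the augmented Lagrangian or ADMM solver. The function name `lp_cost`, the residual values and the choice $p = 0.4$ are assumptions made for illustration.

```python
import numpy as np

def lp_cost(residuals, p):
    """Sum over i of ||z_i||_2 ** p, where each row of residuals is one z_i."""
    norms = np.linalg.norm(residuals, axis=1)
    return float(np.sum(norms ** p))

# Nine small residuals (inliers) and one large residual (an outlier).
Z = np.vstack([np.full((9, 3), 0.01), [[1.0, 1.0, 1.0]]])
outlier_norm = np.linalg.norm(Z[-1])

# Fraction of the total cost contributed by the single outlier.
share_l2 = outlier_norm ** 2.0 / lp_cost(Z, 2.0)      # least-squares cost
share_sparse = outlier_norm ** 0.4 / lp_cost(Z, 0.4)  # sparsity-inducing cost
```

In this toy case the outlier accounts for almost the entire least-squares cost, while its share under $p = 0.4$ is below one half, which is the intuition behind preferring many near-zero residuals over a uniformly small squared error.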
Algorithm 3: SWC-ICP Technique
Compute the matching relations $C_{CTSF}(P_j, Q)$ through expression (21).
repeat
  Apply the transformation to all points of the source.
  Compute the matching relation $C_1(P_{s+1}, Q, \tau)$ through expression (2).
  Build the covariance matrix from (26) using the shape correspondences (21) and the nearest neighbors (2).
  Compute the matrix $M$ in expression (10) using (26).
  Compute the principal eigenvector $v$ of the matrix $M$.
  Calculate the rotation matrix $R_{s+1}$ and translation vector $t_{s+1}$ using expressions (11)-(12).
  Compute the error between the two point sets.

E. Super 4PCS
The Super 4PCS [10] is an improved version of the 4PCS [33] algorithm for global registration, or coarse registration according to Salvi et al. [3]. Both methods follow the same idea as RANSAC [34], [35], but instead of finding triplets of points, they search for all coplanar 4-point sets that are approximately congruent. The key property behind 4PCS is the fact that, given a set of coplanar points $B = \{p_1, p_2, p_3, p_4\} \subset P$, not all collinear, it is always possible to define two lines such that they cross at an intermediate point $e$, as in Figure 2.
Algorithm 4: Sparse ICP Method
Apply the transformation to all points of the source.
Compute the matching relation $C_1(P_{s+1}, Q, \tau)$ through expression (2).
Solve the problem in (27).
Compute the error between the two point sets.

The intersection point $e$ can be computed considering the lines $s_1(t_1) = p_1 + t_1(p_2 - p_1)$ and $s_2(t_2) = p_3 + t_2(p_4 - p_3)$ and the solution of the linear system defined by the equation $s_1(t_1) = s_2(t_2)$. If $t_1 = \bar{t}_1$ and $t_2 = \bar{t}_2$ are the obtained solutions, then $e = p_1 + \bar{t}_1(p_2 - p_1)$ and $e = p_3 + \bar{t}_2(p_4 - p_3)$ and, consequently, we can compute the two corresponding ratios:

$$\bar{t}_1 \equiv r_1 = \frac{\|e - p_1\|}{\|p_2 - p_1\|}, \qquad \bar{t}_2 \equiv r_2 = \frac{\|e - p_3\|}{\|p_4 - p_3\|}, \qquad (29)$$

which are affine invariant because, given an affine transformation (a linear map followed by a translation $t$), we can apply it to the points of $B$ and perform the same algebra behind the demonstration of expressions (29) to obtain the same ratios (expression (30)), which proves that the ratios $r_1$ and $r_2$ are invariant under affine transformations. Then, given the target $Q$, the main step of the 4PCS algorithm is to extract the set $U$ of all 4-point sets from $Q$ that are approximately congruent to $B$, up to an approximation level $\delta$. This search is performed by noticing that, for each pair of points $q_1, q_2 \in Q$, two intermediate points are computed using the affine invariants (29): $e_1 = q_1 + r_1(q_2 - q_1)$ and $e_2 = q_1 + r_2(q_2 - q_1)$. Whenever we have $e_1 \approx e_2$ for any two pairs of points, then probably $\{q_1, q_2\} \subset Q$ belongs to a 4-point set that is an affine transformed copy of $B$. The set $U$ defines a set $T$ of rigid transformations $\psi_i(x) = R_i x + t_i$ that best align $B$ with some 4-point set in $U$. The solution of the registration problem is a rigid transformation $\psi \in T$ that brings set $P$ as close as possible to set $Q$, in the sense defined by Algorithm 5, which is found in [33].
for all coplanar 4-point sets $U_i \in U$ do: $\psi_i \leftarrow$ best rigid transformation that aligns $B$ to $U_i$ in the least-squares sense (minimize (5)).
Although the results of the 4PCS are satisfactory, it has quadratic time complexity, which limits its applicability. The Super 4PCS [10] addresses two of the main bottlenecks of 4PCS: finding all points within a given distance threshold in a point set, and removing the redundant 4-point sets that arise due to the affine invariants. These two improvements reduce the time complexity to linear in the number of data points.
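The computation of the intersection point $e$ and of the ratios $r_1$, $r_2$ described above can be sketched as follows. The function name `intersection_ratios`, the least-squares solution of the $3 \times 2$ linear system, and the example base and affine map are assumptions made for illustration; they are not taken from the 4PCS or Super 4PCS implementations.

```python
import numpy as np

def intersection_ratios(p1, p2, p3, p4):
    """Return (e, r1, r2) for the lines p1-p2 and p3-p4 of a coplanar base."""
    # Solve p1 + t1 (p2 - p1) = p3 + t2 (p4 - p3), i.e.
    # t1 (p2 - p1) - t2 (p4 - p3) = p3 - p1, in the least-squares sense.
    A = np.column_stack([p2 - p1, -(p4 - p3)])
    t, *_ = np.linalg.lstsq(A, p3 - p1, rcond=None)
    e = p1 + t[0] * (p2 - p1)
    r1 = np.linalg.norm(e - p1) / np.linalg.norm(p2 - p1)
    r2 = np.linalg.norm(e - p3) / np.linalg.norm(p4 - p3)
    return e, r1, r2

# A coplanar base whose two lines cross at (1, 0, 0).
p1, p2 = np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])
p3, p4 = np.array([1.0, -1.0, 0.0]), np.array([1.0, 3.0, 0.0])
e, r1, r2 = intersection_ratios(p1, p2, p3, p4)

# The same ratios are obtained after an arbitrary invertible affine map.
L = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 3.0]])
shift = np.array([1.0, -2.0, 0.5])
_, a1, a2 = intersection_ratios(*(L @ p + shift for p in (p1, p2, p3, p4)))
```

For exactly coplanar, intersecting lines the least-squares solution coincides with the exact one; with noisy data it returns the closest approximate crossing, which is one possible way to handle near-coplanar bases.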

F. GMM Framework
All the previous techniques align two point sets based on some procedure for establishing an explicit point set correspondence. The Gaussian mixture model (GMM) framework discards the matching step and thus may achieve more robustness against missing correspondences and outliers. In this registration framework, each input point set is represented using a Gaussian mixture model where the number of Gaussian components is the number of points [9]. Besides, the mean vectors of the components are given by the positions of the points and all components share the same spherical covariance matrix. Formally, given a point set $X = \{x_1, x_2, \ldots, x_{n_X}\} \subset \mathbb{R}^m$, the mixture of Gaussians used in the GMM model is computed by:

$$G(x, X, \Omega, w) = \sum_{i=1}^{n_X} w_i \, \mathcal{N}(x; x_i, \Omega), \tag{31}$$

where $\mathcal{N}(x; x_i, \Omega) = \exp\left(-\tfrac{1}{2}(x - x_i)^T \Omega^{-1} (x - x_i)\right) / \sqrt{(2\pi)^m \det \Omega}$ is the Gaussian density with mean $x_i$, $w = (w_1, w_2, \ldots, w_{n_X})^T$ is the vector of weights and $\Omega$ is the covariance matrix of the model. Without prior knowledge, all mixture components are weighted equally ($w_1 = w_2 = \ldots = w_{n_X}$) and $\Omega = \mathrm{diag}(\sigma, \sigma, \ldots, \sigma)$, with $\sigma > 0$ being the scale (or variance) of the model.
In this context, given source ($P$) and target ($Q$) point clouds, the problem of point set registration is reformulated through the minimization of a statistical discrepancy measure between the corresponding mixtures, given by expression (31). In the GMM proposed in [9], the authors apply the $L_2$ distance for measuring the similarity between two Gaussian mixtures $G(x, Q, \Omega, w)$ and $G(x, \psi(P), R\Omega R^T, w)$, representing the target $Q$ and the transformed source $\psi(P) = \{Rp_1 + t, Rp_2 + t, \ldots, Rp_{n_P} + t\}$, respectively, where $\psi$ is the rigid transformation in expression (1), defined by the rotation $R$ and translation $t$. So:

$$d\left(G(x, Q, \Omega, w), G(x, \psi(P), R\Omega R^T, w)\right) = \int_{\mathbb{R}^m} \left[ G(x, Q, \Omega, w) - G(x, \psi(P), R\Omega R^T, w) \right]^2 dx. \tag{33}$$

Equation (33) becomes a function $f : \mathbb{R}^{2m} \to \mathbb{R}^+$ of the transformation parameters that can be grouped in a vector $\theta = (\alpha_1, \alpha_2, \ldots, \alpha_m; t_1, t_2, \ldots, t_m)$, where $\alpha_i$ and $t_i$ give the parametrization of the rotation matrix $R$ and the translation vector $t$, respectively. Hence, the registration becomes an optimization problem, where the objective function is $f(\theta) = d\left(G(x, Q, \Omega, w), G(x, \psi(P), R\Omega R^T, w)\right)$. In practice, this cost function can be expressed by a discrete Gauss transform [36], [37] and the minimization of $f$ can be achieved through traditional gradient-based methods. However, there are no guarantees of convexity for $f$ in the $\theta$ domain. To overcome this problem, it is recommended in [9] to start with a relatively large scale $\sigma$ and a default initial setting of the parameters in $\theta$, and then to perform the numerical optimization to estimate the rigid transformation $\psi$. Then, we compute the correspondence set $C(P, \psi(P))$, defined by expression (2). If $|C(P, \psi(P))|$ is less than a threshold value, the process is repeated by randomly choosing another initialization for $\theta$. Besides, a multiscale approach can be applied by decreasing the value of $\sigma$ in a coarse-to-fine strategy. The optimization process stops when a sufficient number of correspondences is obtained. Algorithm 6 describes the GMM procedure [9].
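For spherical covariances, the L 2 distance between two mixtures has a closed form, because the integral of the product of two Gaussians with the same covariance σI is itself a Gaussian function of the distance between their means. The Python sketch below illustrates this under explicit assumptions: σ is treated as the variance (as stated above), the equal weights are normalized to 1/n, RΩR T = Ω because Ω is spherical, and the function names are hypothetical. It is not the discrete Gauss transform implementation of [9], [45].

```python
import math

def gauss_overlap(u, v, var):
    """Integral of N(x; u, var*I) * N(x; v, var*I) over R^m."""
    m = len(u)
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (4.0 * var)) / (4.0 * math.pi * var) ** (m / 2.0)

def mean_overlap(X, Y, var):
    """Integral of the product of two equal-weight mixtures (weights 1/n)."""
    total = sum(gauss_overlap(x, y, var) for x in X for y in Y)
    return total / (len(X) * len(Y))

def gmm_l2_distance(P, Q, var):
    """L2 distance between the mixtures built on point sets P and Q."""
    return (mean_overlap(P, P, var)
            - 2.0 * mean_overlap(P, Q, var)
            + mean_overlap(Q, Q, var))

if __name__ == "__main__":
    Q = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    for shift in (0.0, 0.1, 0.5):
        P = [(x + shift, y, z) for x, y, z in Q]
        print(shift, gmm_l2_distance(P, Q, var=0.5))
```

In a registration loop, P would be replaced by ψ(P) for the current parameter θ, and the value returned by `gmm_l2_distance` would play the role of f(θ).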
In this algorithm we follow the original GMM description [9] and summarize the optimization approach as an 'annealing step'.
begin
    Estimate an initial scale σ from the input point sets.
    Specify an initial parameter θ, e.g., from the identity transform.
    repeat
        Set up the objective function f, using expression (33).
        Optimize the objective function f with θ as the initial parameter.
        Update the parameter θ ← arg min θ f.
        Decrease the scale σ according to an annealing step.
    until some stopping criterion is satisfied;
    return the transformation parameter θ.
end

IV. ROTATION ERROR METRICS
We use four different metrics to measure the rotation error when a ground truth rotation is provided. The metrics are based on the norm of difference between quaternions [38], inner product of unit quaternions [39], Euclidean distance between the Euler angles [40] and deviation from the identity matrix [41]. The paper from Huynh [23] provides more details and comparisons between them. In what follows, we restrict the discussions to rotations in 3D, that are represented by unit quaternions or matrices in SO(3).

A. Norm of the Difference between Quaternions
The first is obtained by using the norm of the difference between unit quaternions $q_1$ and $q_2$ representing the provided ground truth and the obtained rotation, respectively:

$$\phi_1(q_1, q_2) = \min\left\{ \|q_1 - q_2\|_2, \|q_1 + q_2\|_2 \right\}, \quad q_1, q_2 \in S^3, \tag{34}$$

with $\|\cdot\|_2$ as the Euclidean norm and $S^3 = \{q \in \mathbb{R}^4 \mid \|q\|_2^2 = 1\}$. We can show that $0 \leq \phi_1 \leq \sqrt{2}$ [23]. As pointed out by Huynh [23], $\phi_1$ is a pseudo-metric in $S^3$, since $\phi_1(q, -q) = 0$ although $q \neq -q$, but in SO(3), the group of 3D rotations, $\phi_1$ is a metric.
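A minimal Python sketch of this metric follows, assuming the form given by Huynh [23], that is, the minimum between the norms of q 1 − q 2 and q 1 + q 2. Quaternions are stored as tuples (w, x, y, z), which is a convention chosen here for illustration only.

```python
import math

def phi1(q1, q2):
    """Norm of the difference of unit quaternions, insensitive to the sign of q."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))
    summ = math.sqrt(sum((a + b) ** 2 for a, b in zip(q1, q2)))
    return min(diff, summ)

if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    half_turn_x = (0.0, 1.0, 0.0, 0.0)   # rotation of 180 degrees about x
    print(phi1(identity, half_turn_x))    # largest possible value, sqrt(2)
    print(phi1(identity, (-1.0, 0.0, 0.0, 0.0)))  # q and -q: same rotation
```

Taking the minimum over both signs is what makes q and −q, which represent the same rotation, have zero distance.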

B. Inner Product of Unit Quaternions
Similarly to the metric $\phi_1$ of Section IV-A, the second metric also uses quaternions as follows:

$$\phi_2(q_1, q_2) = \arccos\left(|q_1 \cdot q_2|\right), \tag{35}$$

with $q_1$ and $q_2$ also representing the provided ground truth and the obtained rotation, respectively, and $\cdot$ denoting the inner product. Huynh [23] rewrites this function to be more computationally efficient as:

$$\phi_3(q_1, q_2) = 1 - |q_1 \cdot q_2|. \tag{36}$$

This function is also a pseudo-metric in $S^3$, but it is a metric in SO(3). Since we consider unit quaternions in expressions (35)-(36), it is straightforward that $\phi_2 \in [0, \pi/2]$ and $\phi_3 \in [0, 1]$.
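Both functions can be sketched in a few lines of Python, assuming the forms φ 2 = arccos(|q 1 · q 2|) and φ 3 = 1 − |q 1 · q 2| described by Huynh [23]. The helper `quat_about_z` and the (w, x, y, z) storage order are illustrative choices, not part of the evaluated implementations.

```python
import math

def quat_dot(q1, q2):
    return sum(a * b for a, b in zip(q1, q2))

def phi2(q1, q2):
    """arccos of the absolute inner product; clamped to avoid rounding issues."""
    return math.acos(min(1.0, abs(quat_dot(q1, q2))))

def phi3(q1, q2):
    """Cheaper variant of phi2 that avoids the inverse cosine."""
    return 1.0 - abs(quat_dot(q1, q2))

def quat_about_z(angle):
    """Unit quaternion (w, x, y, z) of a rotation by `angle` about the z axis."""
    return (math.cos(angle / 2.0), 0.0, 0.0, math.sin(angle / 2.0))

if __name__ == "__main__":
    ground_truth = quat_about_z(math.radians(45.0))
    estimate = quat_about_z(math.radians(44.0))
    print(phi2(ground_truth, estimate), phi3(ground_truth, estimate))
```

For two rotations about the same axis, φ 2 equals half of the angle between them, so an error of one degree yields φ 2 of about 0.0087 radians, while φ 3 is much smaller because it grows quadratically near zero.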

D. Deviation from the Identity Matrix
In this case, the ground truth and the computed rotations are represented by matrices $R_1, R_2 \in SO(3)$, respectively, and the metric function is calculated by [41]:

$$\phi_5(R_1, R_2) = \|I - R_1 R_2^T\|_F, \tag{39}$$

where $\|\cdot\|_F$ denotes the Frobenius norm of the matrix. We can prove that $\phi_5$ is in fact a metric on SO(3) and that expression (39) gives values in the range $[0, 2\sqrt{2}]$ [23].
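The deviation from the identity can be sketched as follows, assuming the form φ 5 = ||I − R 1 R 2 T|| F described in [23]. Matrices are plain nested lists and `rot_z` is a hypothetical helper used only to build test rotations.

```python
import math

def rot_z(angle):
    """Rotation matrix of `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def phi5(R1, R2):
    """Frobenius norm of I - R1 * R2^T for 3x3 rotation matrices."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            # (R1 R2^T)[i][j] = sum_k R1[i][k] * R2[j][k]
            prod = sum(R1[i][k] * R2[j][k] for k in range(3))
            identity = 1.0 if i == j else 0.0
            total += (identity - prod) ** 2
    return math.sqrt(total)

if __name__ == "__main__":
    ground_truth = rot_z(math.radians(45.0))
    estimate = rot_z(math.radians(44.0))
    print(phi5(ground_truth, estimate))
```

The maximum 2√2 is reached when the two rotations differ by a half turn, for instance the identity and a rotation of 180° about any axis.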

V. EXPERIMENTAL RESULTS
We evaluate the performance of the methods described in Section III using two different setups. In the first one we compare the methods using point clouds captured in a controlled scenario. Our model is the Bunny, from the Stanford 3D Scanning Repository [12]. We use four clouds given by the views from 0°, 45°, 90° and 180°, and align the consecutive pairs. All point clouds lie in a unit bounding box. Figure 3 shows the three cases used, where the initial pose (source) is pictured in black and the target one in red. Each original cloud has more than 40000 points, which makes its processing too computationally expensive. Therefore, we uniformly sample these point clouds, keeping one point out of every 10 and discarding the others, in order to reduce the computational time of each method.
Fig. 3: The three alignment cases tested in the first experiment.
The web documentation [12] offers the transformation to align the pairs of clouds shown in Figure 3. However, we have noticed that the precision of the translation vector is not suitable for a specific evaluation of the translation computed by the registration methods. Also, the models do not have a ground truth correspondence list. Hence, we first take the given rotation and compare it with the ones generated by the considered methods. Then, we take the 0° Bunny model configuration and build a case study for registration under missing data. Furthermore, for each method, we measure the computational time to calculate the alignment and the rotation error obtained using the metrics described in Section IV.
The second set of experiments is performed using video sequences with RGB-D information. The first case study is a frame sequence captured using a PrimeSense Carmine camera [43]. This video belongs to the Large Dataset of Object Scans [44], and the sequence used is #03118, containing 1489 depth frames. The choice was based on how easy it was to segment the background. Figure 4 illustrates the sequence. The next tests are implemented using two videos, named 'freiburg2 xyz' and 'freiburg2 rpy', from the database available in [14] that, unlike the sequence #03118, provides information for debugging translations and rotations, which motivates their choice. Kinect is the acquisition hardware and the frames have a resolution of 640 × 480 pixels, yielding a depth image with 307200 points. All the experiments were carried out using an Intel Core i7-4790 CPU with 16GB RAM.
Fig. 4: (a)-(d) Sequence #03118, showing the RGB frame and its respective depth data (source [44]). (b)-(e) RGB and depth fields for frame sequence 'freiburg2 xyz' (source [14]). (c)-(f) RGB and depth information for video 'freiburg2 rpy' (source [14]).

A. Point Cloud Registration
In this section we have the following aims: (a) Analyze the different rotation error metrics (Section IV) to decide the best one(s) for the frame-to-frame registration problem; (b) Use the best error metric to compare the performance of the registration techniques described in Section III. In these tests, we consider the following degrees-of-freedom: (1) Registration technique; (2) Percentage k of neighbors; (3) Error metric; (4) Trimming parameter τ . Moreover, we wish to compare techniques using RMS criterion in a controlled setup with missing data.
The other parameters, besides k and τ, are set as follows. The update size of the weighting factor used in the ICP-CTSF and SWC-ICP is b = 0.1, with w_0 = 10^5. It is an intermediate value that neither takes too many updates nor finishes the update too soon, without proper exploration of the search space. The Super 4PCS was set with δ = 0.005 and a termination threshold of 0.8, without filtering by angle, normals, distance or color. Also, no further sampling of the point cloud is performed. The Sparse ICP and the Sparse ICP with CTSF were set with the same parameters used in [13]. The GMM setup follows the default values of the GMM implementation [45].
The SWC-ICP, ICP-CTSF and the Sparse ICP CTSF use tensors to match points through the computation of the C_CTSF relation given by expression (21). In these cases, we can evaluate the CTSF criterion using the isotropic voting field T or the anisotropic voting tensor S. According to [7], better results have been obtained by applying the former in the SWC-ICP. However, the ICP-CTSF and Sparse ICP CTSF use the S field to compute the C_CTSF correspondence set [6].
To perform the task (a) we choose a pair of consecutive viewpoints, compute the error for each registration method using all the available metrics and visually compare the best alignment obtained according to each error metric. The best error metric is considered as the one which assigns the minimum error to the best visual alignment.
The visual inspection of the point clouds in Figure 3 indicates that the case pictured in Figure 3a is suitable as a case study for the task (a) because, unlike Figures 3b and 3c, it is the easiest one, with a large overlapping region and no discontinuities.
So, considering the degrees-of-freedom listed above, we set the trimming parameter τ = 0 (no trimming) and compute the error metric for each registration technique using k = 1%, 5%, 10%, 25%, 50%, 75%, 100%. In order to allow a fair comparison between the metrics we report the relative error, obtained by dividing the absolute error by the maximum value in the range of the considered rotation metric (Section IV). Notice that φ 3 ∈ [0, 1]; consequently, the absolute and relative errors are the same in this case. Table I shows the minimum relative error according to each metric for 0°-45°. The Sparse ICP gives the smallest rotation error for all the metrics except φ 4, which achieves its minimum value for the Sparse ICP CTSF with k = 25%. In particular, the smallest error in Table I is obtained by the Sparse ICP, with an almost null value of 9.0 × 10^−9. Figure 5 shows the absolute error obtained for the tests in the case 0°-45°, excluding k = 1% and k = 100% because they do not offer the best results and sometimes generate errors so large that they cause scale problems in the visualization of the smaller bars. The Sparse ICP and Sparse ICP CTSF algorithms presented the smallest errors, which agrees with the results reported in Table I.
The visualization of Figures 5a-5d indicates that the ICP, ICP-CTSF and SWC-ICP achieve the second place in terms of rotation errors. Table II reports the minimum and maximum errors for these methods, according to each metric of Section IV. We can notice that, for the change 0°-45°, the variation of the parameter k did not influence the measured rotation error.
In order to check the results reported in Table I and Figures 5a-5d, we show in Figures 6a-6b the overlapping of the source cloud (0° view) and the target set (45° view) after the application of the best transformations obtained. The visual inspection of Figure 6 agrees with the fact that Sparse ICP and Sparse ICP CTSF with k = 25% offer suitable alignments. However, the visualization is not precise enough to decide the best one. Nevertheless, the Sparse ICP errors were the smallest ones for three of the four considered metrics. Also, according to metric φ 3, the error of the Sparse ICP is almost null. These observations indicate that Sparse ICP performs better than the other techniques and favor the choice of φ 3 to measure the rotation error.
Fig. 6: Visualization of the best cases reported in Table I.
Figure 7 shows the errors obtained in the next experiments, for the case 45°-90°. Unlike the case 0°-45°, we observe that the Sparse ICP CTSF achieves the smallest rotation errors for all the metrics, which are significantly smaller than the Sparse ICP rotation errors. Table III reports the minimum and maximum relative errors achieved by the Sparse ICP CTSF with respect to the considered metrics. As in the above case, the smallest relative error happens for the metric φ 3, as well as the smallest error interval [Min, Max], but now with k = 5% and k = 50%. Figure 8(a) allows us to visually check the alignment obtained by the Sparse ICP CTSF using k = 5%. Also, Figure 8(b) allows us to compare that result with the Sparse ICP registration in order to confirm that, unlike the case 0°-45°, the alignment of the former is really better than the alignment generated by the latter in this case.
The third registration test is the hardest one, since there is a 90° variation between the two point sets. It also implies a smaller overlap, which is a complicating factor in rigid registration. All methods failed to obtain a correct registration in this case. The minimum and maximum absolute errors of the best two methods are reported in Table IV for the case 90°-180°. The minimum rotation error is achieved by ICP-CTSF, with k = 75%, in the metric φ 3. However, Figure 9 shows that the obtained alignment is not correct.
The influence of the trimming parameter can be discussed through Figure 11, when considering the registration for 45°-90°. We calculate the rotation error using function φ 3, shown in Figure 11a. To complement the information, Figure 11b pictures the error variation. We observe that the SWC-ICP with k = 75% undergoes the largest registration improvement (0.112109) for trimming τ = 10%, but it also suffers the largest error increase if τ = 20%. On the other hand, the Sparse ICP CTSF with k = 5%, which achieves the smallest error without trimming, remains almost unchanged, since it gets a difference of −1.411663 · 10^−6 with both τ = 10% and τ = 20%. However, the SWC-ICP, which gets the second place in the 45°-90° alignment, increases its efficiency for trimming 10% and k = 25%, 50%, 75% but decreases it for all the other cases when incorporating trimming. Therefore, it is not possible to identify a tendency in the influence of the trimming procedure on the registration error.
Missing data is simulated by taking the 0° view of the Bunny and moving it using the rotation (45°) and translation available in the web site [12]. In this way, we have the ground truth for the correspondence set, which allows us to compare the RMS ($\sqrt{e^2(R, t)}$) of the techniques without ambiguities. Then, we remove a set of points inside a ball centered at a specific point in the cloud, with radius 0.03, and update the correspondence set.
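The missing data simulation can be illustrated by the Python sketch below. It is a sketch under assumptions: the function name `remove_ball` is hypothetical, the points are removed from the source cloud (the text above does not state which cloud is cropped), and the ground truth correspondence is taken as index i of the source matching index i of the rigidly moved copy.

```python
import math

def remove_ball(points, center, radius):
    """Drop the points inside the ball and return the survivors with their indices."""
    kept = [i for i, p in enumerate(points) if math.dist(p, center) > radius]
    return [points[i] for i in kept], kept

def update_correspondences(kept):
    """Pairs (new source index, target index) after the removal."""
    return list(enumerate(kept))

if __name__ == "__main__":
    source = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.05, 0.0, 0.0),
              (0.02, 0.02, 0.0), (0.5, 0.5, 0.5)]
    cropped, kept = remove_ball(source, center=(0.0, 0.0, 0.0), radius=0.03)
    print(cropped)
    print(update_correspondences(kept))
```

Keeping the original indices of the surviving points is what allows the RMS to be computed against the true matches after the cloud has been cropped.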
Figure 12.(b) shows the two clouds before alignment. Figure 12.(a) allows us to analyse the performance of the registration techniques, regarding the RMS, against missing data. In this figure we indicate in yellow the best technique and in magenta the worst one. It is noticeable that the GMM gets the largest RMS error while ICP-CTSF with k = 10% presents outstanding performance. With the exception of GMM, the methods perform close to each other. Table V gives a better idea of the numeric differences between the RMS errors. The Sparse ICP-CTSF with k = 5% and the Sparse ICP, which were the best methods in the previous experiments, achieve the 14th and 16th places in the RMS rank for incomplete Bunny data. We should be careful because the difference between the Sparse ICP and the best technique in Table V is approximately 0.0008, which is not very significant, considering that the clouds are normalized in the unit cube. Figures 12.(c)-(e) agree with this observation, since it is hard to notice differences between the alignments. However, the relative decrease in performance of Sparse ICP-CTSF and Sparse ICP may indicate some sensitivity to incomplete data, which could impact their efficiency for frame-to-frame registration.
Table VI summarizes the main results reported in this section. All the rotation errors presented in Table VI are computed using φ 3 (Section IV-B), since the discussion related to Table I and Figures 5 and 6 points out this metric as the most appropriate.

B. Frame-To-Frame Registration
According to Section V-A, the best methods in the performed experiments are Sparse ICP and Sparse ICP CTSF, without considering incomplete data, and ICP-CTSF otherwise, with the parameters reported in Table VI. In this section, we check the obtained conclusions, but now in the frame-to-frame registration, which composes the second sequence of experiments. These tests are executed using the 640 × 480 pixels of the frames extracted from the #03118 video, as reported at the beginning of this section. The image resolution yields a depth array with 307200 elements, which increases the computational cost of the registration algorithms. Therefore, we uniformly sampled each frame of the video, with sampling rate r = 8, to reduce the total number of points, generating a new video sequence V. This way, each frame C m, 1 ≤ m ≤ |V|, becomes a matrix C m ∈ R^(M1×M2×4), where M 1 = integer(640/r), M 2 = integer(480/r), C m (i, j, 1), · · ·, C m (i, j, 3) hold the R, G and B channels, respectively, and C m (i, j, 4) corresponds to the depth information, captured with 8-bit resolution.
The sequence #03118 was chosen because of how easy it is to segment the background. In this video, the sign is the only meaningful object in the scene, with respect to the depth information (see Figure 4). The grass in the background is too deep to be captured and yields null depth values. Hence, we take the set S m = {(i, j, C m (i, j, 4)) ; C m (i, j, 4) > 0, 1 ≤ i ≤ M 1 and 1 ≤ j ≤ M 2 } and interpret it as a point cloud in R 3. Besides, a temporal sampling was made, selecting one frame out of every ς consecutive frames. This approach increases the difficulty of the registration by simulating a larger camera movement, which implies a smaller overlapping region. So, we set P = S ς·i, Q = S ς·(i+1) as the source/target pair in each iteration of the frame-to-frame registration, which generates the pair (R ς·i, t ς·i) that best aligns the source cloud S ς·i with the target one S ς·(i+1).
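The spatial sampling, the segmentation by positive depth and the temporal sampling can be sketched in Python as follows. The function names are hypothetical, a frame is represented only by its depth channel as a nested list, frames are assumed to be numbered from 1, and slicing with step r keeps ceil(n/r) samples, which coincides with integer(n/r) when r divides the image size (as for 640 × 480 with r = 8 or r = 16).

```python
def subsample_depth(depth, r):
    """Keep one pixel out of every r along both image directions."""
    return [row[::r] for row in depth[::r]]

def depth_to_cloud(depth):
    """Build S_m = {(i, j, depth) : depth > 0} with 1-based pixel indices."""
    return [(i, j, d)
            for i, row in enumerate(depth, start=1)
            for j, d in enumerate(row, start=1)
            if d > 0]

def frame_pairs(num_frames, step):
    """Source/target frame indices (step*i, step*(i+1)) for the temporal sampling."""
    idx = list(range(step, num_frames + 1, step))
    return list(zip(idx[:-1], idx[1:]))

if __name__ == "__main__":
    depth = [[0, 5, 0, 7],
             [1, 2, 3, 4],
             [9, 0, 8, 0],
             [4, 4, 4, 4]]
    print(depth_to_cloud(subsample_depth(depth, r=2)))
    print(frame_pairs(num_frames=10, step=3))
```

Each pair returned by `frame_pairs` would then be handed to one of the registration methods, with the first cloud as source P and the second as target Q.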
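The root mean squared error (RMS) after the registration of the clouds $S_{\varsigma \cdot i}$ and $S_{\varsigma \cdot (i+1)}$ is given by:

$$RMS\left(S_{\varsigma \cdot i}, S_{\varsigma \cdot (i+1)}\right) = \sqrt{e^2\left(R_{\varsigma \cdot i}, t_{\varsigma \cdot i}\right)}, \tag{40}$$

with $e^2$ being the error computed by expression (5). Equation (40) allows us to compute the average root mean squared error, denoted by $MRMS$, through the expression:

$$MRMS = \frac{1}{N} \sum_{i=1}^{N} RMS\left(S_{\varsigma \cdot i}, S_{\varsigma \cdot (i+1)}\right), \tag{41}$$

where $N$ is the number of registered pairs, which can be used to measure the quality of the whole sequence registration.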
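A minimal Python sketch of these two quantities is given below. Expression (5) is not reproduced in this section, so the sketch assumes that e 2 is the mean squared distance between each transformed source point and its corresponding target point; the function names and the list-of-pairs representation of the correspondence set are also illustrative assumptions.

```python
import math

def apply_rigid(R, t, p):
    """Compute R p + t for a 3x3 matrix R and 3D vectors t and p."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]

def rms(R, t, pairs):
    """Root of the mean squared residual over corresponding pairs (p, q)."""
    squared = [sum((a - b) ** 2 for a, b in zip(apply_rigid(R, t, p), q))
               for p, q in pairs]
    return math.sqrt(sum(squared) / len(squared))

def mrms(rms_values):
    """Average RMS over all registered frame pairs."""
    return sum(rms_values) / len(rms_values)

if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pairs = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
             ((1.0, 1.0, 1.0), (2.0, 1.0, 3.0))]
    per_pair = [rms(identity, [1.0, 0.0, 0.0], pairs), 0.0]
    print(per_pair, mrms(per_pair))
```

Because the MRMS is a plain average, a few badly registered pairs, such as those at the end of a sequence, can dominate the score of an otherwise accurate method.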
Since the choice of the parameter k of the ICP-CTSF, the SWC-ICP and the modified Sparse ICP with CTSF impacts the results, we show how they change with k = 75%, k = 50%, k = 25%, k = 10% and k = 5% of the total number of points. All methods were set with the same parameters of the previous experiment. From Figure 13, which shows the registration error along the video sequence for ICP, we notice that the higher errors occur near the end of the video sequence, where there is a rough movement, unlike in the rest of the video. The same happens for all the considered methods. In the corresponding frames, there is another complication because of the low number of points that the depth sensor was able to sample, increasing the chance of a bad alignment. Figure 14 illustrates this case.
Fig. 14: Note the small area in the depth frames that the sensor was able to capture, yielding fewer points than those of Figures 4a, 4b and 4c, in comparison.
Missing data is a frequent problem when using raw depth data, due to uncertainty caused by reflections in the acquisition process. Figure 15 shows an example of a pair in which one of the point clouds (Figure 15f) misses some points.
Figure 16 shows the MRMS obtained for each method when varying the temporal sampling parameter ς and fixing the spatial sampling r = 8. We notice that the MRMS errors of the Sparse ICP and the Sparse ICP CTSF are the highest, contrasting with the results of the previous experiments. Table VII reports the lowest MRMS values for the results in Figure 16a, with ICP-CTSF having the best score in this experiment for k = 75%. Table VII indicates that varying the parameter k does not have much effect on ICP-CTSF. In fact, Figure 16 shows that the same happens for the other methods, except for a small trend in the SWC-ICP, where smaller values of k yield higher MRMS errors.
Figure 17 shows the MRMS of the methods when an image sampling rate r = 16 is used. In this case we fixed the temporal sampling as ς = 1, i.e., every frame i is registered with its consecutive frame i+1. A change of scale is perceived when comparing with Figure 16a, as some methods almost doubled their error. However, this result is expected since, with a higher image sampling rate, the pixels (and corresponding points) are farther from each other. Points without an exact correspondent will then increase the error value.
Since the MRMS values presented some inconsistencies with the previous experiment regarding the Sparse ICP and Sparse ICP CTSF, we decided to discard the segment at the end of the video where all methods perform badly. So, we take the first 820 frames and recompute the MRMS for the registration algorithms. In order to allow a more complete analysis and comparison with the MRMS for the whole video, we report in Table VIII the MRMS for both experiments. When comparing the values in the second and third columns, we notice that all methods improve the MRMS if only the first 820 frames are used. However, the Sparse ICP and Sparse ICP-CTSF errors dropped to half when we took the first 820 frames only. This fact points towards the sensitivity of these methods to the incomplete (and missing) data problems shown in Figures 14 and 15. In the tests of Section V-A, we reached a similar conclusion when analysing the results of Table VI. So, if we take a subsequence of frames free of this problem, we expect outstanding performance from Sparse ICP and Sparse ICP CTSF. We can use a visual inspection to check this conclusion. At the beginning of the video #03118, we notice fewer occurrences of incomplete data, as shown in Figures 18a-d. So, we simulate an attempt to reconstruct the objects using the frames 1 to 4. Specifically, the resulting registrations of the pairs (1,2), (1,3) and (1,4) were overlapped. Figure 19 shows the obtained result. The Sparse ICP CTSF with k ∈ {5%, 10%} produces a cleaner image than other methods, such as the blurred image produced by the ICP-CTSF with k = 50%. In Figure 20 we highlight the fact that the result obtained by Sparse ICP CTSF with k = 5% is much better than that of the ICP-CTSF with k = 50%, as the points are completely overlapped in the former (Figures 20c and 20d), unlike in the latter.
Hence, the visual inspection considering the pairs (1,2), (1,3), (1,4) and the MRMS in Table VIII indicate that Sparse ICP and Sparse ICP CTSF suffer the influence of incomplete/missing data. If true, considering the requirements of the frame-to-frame registration, they could not be recommended for such applications. On the other hand, it seems that the ICP-CTSF is more reliable for this application, considering that its performance is less sensitive to the mentioned problems, as observed in Tables VI and VIII. Next, we undertake new experiments to evaluate the considered techniques for frame-to-frame registration with available ground truth transformations, in order to check these conclusions.

C. Frame-To-Frame Alignment Tests with Ground-Truth
The frame-to-frame registration experiments of Section V-B were not conclusive with respect to the best technique for this application. Hence, in this section, we perform frame-to-frame registration tasks using public data sets with ground truth available [14], [46] to complete the analysis. The geometry behind the data acquisition process is pictured in Figure 21, which represents the world reference system, denoted by $W$, and two other coordinate systems, named $W_1$ and $W_2$, attached to the camera and defining its location and orientation with respect to the system $W$. Besides, we have a point $p \in \mathbb{R}^3$, which has coordinates $p^{W_1}$ and $p^{W_2}$ with respect to the systems $W_1$ and $W_2$, respectively. Also, let rotations $R_1$, $R_2$ and translations $t_1$, $t_2$ be such that:

$$p = R_1 p^{W_1} + t_1, \tag{42}$$

$$p = R_2 p^{W_2} + t_2, \tag{43}$$

as well as the rotation $R_{1,2}$ and translation $t_{1,2}$, computed by a registration algorithm, which allow us to write:

$$p^{W_2} = R_{1,2} \, p^{W_1} + t_{1,2}. \tag{44}$$

Fig. 21: Global ($W$) and camera reference systems for two consecutive frames.
Consequently, given a point cloud $P = \{p_1, p_2, \ldots, p_{n_P}\} \subset \mathbb{R}^3$, the registration error can be computed through expression (45). Also, considering expressions (42)-(43), simple algebra shows that:

$$p^{W_2} = R_2^{-1}(t_1 - t_2) + R_2^{-1} R_1 \, p^{W_1}. \tag{46}$$

We interpret the first term of the right-hand side of expression (46) as the ground truth translation and the matrix $R_2^{-1} R_1$ as the ground truth rotation, which can be used to quantify the precision of the rigid transformation given by expression (44). The ground truth rigid transformation can be computed if we know the rotations and the translations that appear in equations (42)-(43). The database available in [14] provides this information for the video 'freiburg2 xyz', which contains very clean data for debugging translations, and for the video 'freiburg2 rpy', which contains suitable data for debugging rotations [14]. The former has 3615 RGB-D frames while the latter encompasses 3221 RGB-D images, both with a resolution of 640 × 480 pixels.
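The ground truth motion between two frames can be obtained from the two camera poses, as in the following Python sketch. It assumes that each pose maps camera coordinates to world coordinates (world point = R k · camera point + t k), which is the convention under which the ground truth rotation is R 2 −1 R 1 and the ground truth translation is R 2 −1 (t 1 − t 2). The function names are hypothetical and the inverse of a rotation is computed as its transpose.

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def ground_truth_motion(R1, t1, R2, t2):
    """Rotation and translation taking frame-1 coordinates to frame-2 coordinates."""
    R2_inv = transpose(R2)  # inverse of a rotation matrix
    R_gt = matmul(R2_inv, R1)
    t_gt = matvec(R2_inv, [a - b for a, b in zip(t1, t2)])
    return R_gt, t_gt

if __name__ == "__main__":
    R1, t1 = rot_z(math.radians(30.0)), [1.0, 2.0, 3.0]
    R2, t2 = rot_z(math.radians(75.0)), [0.0, 1.0, 0.0]
    R_gt, t_gt = ground_truth_motion(R1, t1, R2, t2)
    print(R_gt)
    print(t_gt)
```

The pair returned by `ground_truth_motion` can then be compared with the output of a registration method, for instance by converting both rotations to quaternions and evaluating φ 3.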
We work analogously to the beginning of Section V-B, by setting $r = 16$ and $\varsigma = 3$, to generate the set $S_m = \{(i, j, C_m(i, j, 4)) ; C_m(i, j, 4) > 0, 1 \leq i \leq M_1 \text{ and } 1 \leq j \leq M_2\}$ and perform the necessary transformation [47] to convert the 2D depth data to 3D point clouds in the reference system $W_m$, which defines the Kinect position and orientation when frame $C_m$ was acquired. The result is a point cloud $S^{W_m} = \{p_{ij}^{W_m}\}$, where $p_{ij}^{W_m}$ is the coordinate vector of the point $p_{ij} = (i, j, C_m(i, j, 4))$ with respect to the coordinate system $W_m$. So, we set $P = S^{W_{\varsigma \cdot i}}$, $Q = S^{W_{\varsigma \cdot (i+1)}}$ as the source/target pair in each iteration of the frame-to-frame registration, which generates the pair $\left(R_{\varsigma \cdot i, \varsigma \cdot (i+1)}, t_{\varsigma \cdot i, \varsigma \cdot (i+1)}\right)$ that best aligns the source cloud $P$ with the target one $Q$.
All methods were set with the same parameters of the previous experiments. We start with the sequence 'freiburg2 rpy' and compute expressions (47)-(48) with ς = 3, whose results are shown in Figure 22. Although, according to the database information [14], in this case we have small translation effects, we decided to show the translation error in Figure 22.(b) in order to complete the analysis. The best techniques are highlighted with yellow bars and the worst with magenta bars. From Figure 22.(a) we see that Sparse ICP-CTSF with k = 50% achieves the lowest rotation error, while Figure 22.(b) indicates that GMM obtains the lowest translation error.
The scale of the mean rotation/translation errors in Figure 22 does not allow us to rank the best techniques. To solve this problem, we report in Tables IX-X the best seven methods according to the rotation and translation errors, with the corresponding standard deviations calculated by expressions (49) and (50), respectively.
Since the metric φ 3 ∈ [0, 1], it is straightforward that the second column of Table IX gives both the absolute (MRot ± SRot) and the relative ((MRot ± SRot)/φ 3 max) mean errors for rotation. Also, considering that the clouds are normalized in the unit cube, we can take MTr ± STr as both the absolute and relative translation error measure.
From Table IX, it is noticeable that Sparse ICP-CTSF and Sparse ICP are the best techniques for rotation, with a small advantage of the former with k = 50% over the latter. We highlight that the mean error and standard deviation regarding rotation, reported in Table IX, are of order 10^−5 and 10^−4, respectively, which shows that the methods perform well in this item.

Table IX: Mean rotation errors (expression (47)) with standard deviations given by expression (49).
Method                      MRot ± SRot
Sparse ICP-CTSF k = 50 %    8.31308 × 10
Regarding translation, the best techniques reported in Table X are GMM and Super 4PCS. They perform equivalently in the translation estimation, since both achieve the same values for MTr and for the standard deviation STr, in the precision used in Table X. The Sparse ICP-CTSF and Sparse ICP do not appear in the list of the seven best methods.
The next tests show the performance of the registration methods when using the video 'freiburg2 xyz'. Although the data set documentation [14] states that this video is intended for debugging translations, we report both the rotation (Figure 23.(a)) and translation (Figure 23.(b)) errors to complete the analysis.
As in the last tests, Sparse ICP-CTSF with k = 50% is the best method for rotation, as emphasized by the yellow bar in Figure 23.(a). In Table XI we also report the best seven methods according to the rotation mean errors for the tests with video 'freiburg2 xyz', with the corresponding standard deviations. Considering the error mean MRot and standard deviation SRot, we see that the performance of Sparse ICP is close to that of Sparse ICP-CTSF with k = 50%, while both perform very well if we take into account that φ 3 ∈ [0, 1].
Regarding the errors for translation for video 'freiburg2 xyz' shown in Figure 23.(b), we notice that GMM outperforms all the other methods, likewise in the previous video. Also, the first column of Table XII shows that the GMM and Super 4PCS work equivalently in the precision used in this table.

Method                  MTr ± STr
GMM                     0.00610 ± 0.00301
Super 4PCS              0.00610 ± 0.00301
ICP                     0.05807 ± 0.06520
SWC-ICP k = 75 %        0.06107 ± 0.06515
SWC-ICP k = 50 %        0.06140 ± 0.06729
SWC-ICP k = 25 %        0.06182 ± 0.06727
ICP-CTSF k = 75 %       0.06378 ± 0.07572

Sparse ICP-CTSF with k = 50% is the best technique for rotation, while GMM obtained outstanding results for translation estimation. In order to combine these results into a final conclusion, we apply the transformation (46) to the set P and take the correspondence relation (2) as the ground truth matching in order to compute the MRMS error using equations (40)-(41) and (45). Tables XIII-XIV report the obtained results. It is noticeable that ICP-CTSF with k = 75% achieves the best MRMS in both tables. If we return to Figures 22 and 23, we observe that ICP-CTSF was among the best methods, as we can confirm in Tables XII, XI, and X.
If we assemble the results presented in Figures 22-23 and in the tables of this section, we conclude that the best technique for rotation estimation is Sparse ICP-CTSF with k = 50%, while GMM outperforms the other techniques for translation computation. However, considering rotation and translation together in the MRMS, the ICP-CTSF with k = 75% obtains the best results. We recall that at the end of Section V-B we pointed out that ICP-CTSF is more reliable for frame-to-frame registration applications, considering that its performance seems to be less sensitive to missing/incomplete data, as also reported in Table VI. Besides, we must take into account that ICP-CTSF with k = 75% is among the seven best methods reported in Tables XII, XI, and X. So, all this together favors the ICP-CTSF as the best technique for frame-to-frame registration.

VI. CONCLUSION AND FUTURE WORKS
In this paper we consider the frame-to-frame registration problem, in which the point clouds are extracted from a video sequence with depth information. We compare seven techniques, named by the acronyms ICP, ICP-CTSF, SWC-ICP, GMM, Sparse ICP, S4PCS, and Sparse ICP CTSF (Section III). We use both point clouds and RGB-D video streams in the experimental results. In the former, the ground truth rotation is provided, which allows us to analyse four different metrics, described in Section IV, to measure the rotation error in this case. The results show better performance for Sparse ICP and Sparse ICP CTSF using the inner product of unit quaternions metric. However, when simulating missing data, the experiments show outstanding results for ICP-CTSF. Considering that missing/incomplete data is a common problem in frame-to-frame registration, some influence of this fact was expected in the second class of experiments, where video sequences with depth information were segmented and the registration algorithms applied. In fact, the results show that ICP-CTSF is more reliable for frame-to-frame registration.
As future work, we observe that the CTSF can be used as a dissimilarity factor between any second order tensors and applied in tasks other than rigid registration. Therefore, a new avenue is to apply this criterion in non-rigid alignment problems and compare its performance with its counterparts [9], [48], [49] in a more general registration scenario.