This paper presents a novel two-step camera calibration method in a GPS/INS/Stereo Camera multi-sensor kinematic positioning and navigation system. A camera auto-calibration is first performed to obtain for lens distortion parameters, up-to-scale baseline length and the relative orientation between the stereo cameras. Then, the system calibration is introduced to recover the camera lever-arms, and the bore-sight angles with respect to the IMU, and the absolute scale of the camera using the GPS/INS solution. The auto-calibration algorithm employs the three-view scale-restraint equations (SRE). In comparison with the collinearity equations (COL), it is free from landmark parameters and ground control points (GCPs). Therefore, the proposed method is computationally more efficient. The results and the comparison between the SRE and COL methods are presented using the simulated and road test data. The results show that the proposed SRE method requires less computation resources and is able to achieve the same or better accuracy level than the traditional COL.

Introduction

The high demand for low-cost multi-sensor kinematic positioning and navigation systems as the core of direct-georeferencing technique in mobile mapping is continuously driving more research and development activities. The effective and sufficient utilization of images is among the most recent scientific research and high-tech industry development subjects. In this particular field, York University’s Earth Observation Laboratory (EOL) is engaging in the study of the image-aided inertial integrated navigation as the natural continuation of its past research in the multi-sensor integrated kinematic positioning and navigation (Qian et al. 2012; Wang et al. 2015).

An image-aided inertial navigation system (IA-INS) implies that the errors of an inertial navigator are estimated via the Kalman filter using measurements derived from images. The image-based navigation algorithms, such as visual odometry (VO) (Konolige et al. 2011; Scaramuzza and Fraundorfer 2011; Gopaul et al. 2014, 2015) or visual Simultaneous Localization and Mapping (SLAM) (Durrant-Whyte and Bailey 2006; Williams and Reid 2010; Lategahn et al. 2011; Alcantarilla et al. 2012), usually assume that a camera system is calibrated prior to its use and the calibration parameters do not change over time. The internal camera parameters (focal length, principal point and lens distortion) and the external camera parameters (baseline and relative orientation between cameras, lever-arms and bore-sight angles with respect to the inertial measurement unit (IMU)) are required to relate the image coordinates with the object coordinates in the scene. The process of estimating these parameters is referred to as the camera calibration.

The traditional camera calibration consists of capturing images containing an array of the reference targets in a laboratory, whose coordinates are accurately known (Wolf and Dewitt 2000). However, these parameters can be invalidated during in-field operations, e.g., during camera assembly/disassembly, replacement, bumps (Teller et al. 2010) or significant temperature variations. Recently many developments have focused on the in-field camera auto-calibration (or self-calibration) for image-inertial systems. An auto-calibration refers to the determination of the camera parameters from a sequence of the overlapping images without necessarily setting up ground control points (GCPs) or special calibration targets. Typically, auto-calibration process is performed in a bundle adjustment (BA) (Triggs et al. 1999) or in the SLAM framework (Civera et al. 2009; Kelly and Sukhatme 2009; Kelly et al. 2011; Keivan and Sibley 2014). It involves the simultaneous estimation of the positions and orientations of the camera, the positions of the stationary landmarks, and the calibration parameters of the camera. The corresponding mathematical equation system, which models the parameters through the available measurements, is usually solved by using non-linear least-squares, the Levenberg-Marquardt algorithm or a Kalman filter. These methods however are computationally expensive due to the very large number of landmark position parameters.

Accordingly, this paper proposes a novel camera calibration method that can precisely calibrate the internal and external camera parameters with a GPS/INS/Stereo camera system exclusive of the landmark position parameters. The method applies the three-view scale-restraint equation (Bethel 2003; Ghosh 2005), with which the measurements are processed exclusively in the image space without landmark parameters. Therefore, it does not allocate large memory and computation resources. The remainder of the paper is organized as follows. Related work section overviews the related work. Then, the novel algorithm is proposed in Two-step camera calibration method section, which is followed by test results using the simulated and real data as Test results and analysis section. Conclusions section ends the paper with discussions, and conclusions.

Related work

Bender et al. (2013) presented an in-flight graph based the BA approach for system calibration between a rigidly mounted camera and an inertial navigation system. Image point features and GPS aided-INS position and orientation solution were used as measurements. Their method simultaneously computed the internal camera parameters as well as the 6-dof transformation (i.e. lever arms and relative orientation) between the two systems. However their method also required at least one GCP in-order to recover the z-component of the lever-arm. Kelly and Sukhatme (2009) proposed a camera-IMU self-calibration method within the SLAM framework implemented by an unscented Kalman filter. The lever-arms and mounting angles, the IMU gyroscope and accelerometer biases, the local gravity vector and landmarks could all be recovered from camera and IMU measurements alone. However, they assumed that the internal camera parameters were known beforehand. (Mirzaei and Roumeliotis 2008) presented a similar tightly-coupled approach using an iterative extended Kalman filter, but, in need of known landmark position.

The methods in (Bender et al. 2013; Kelly and Sukhatme 2009; Kelly et al. 2011) implemented structure-from-motion (SfM) and contains stationary landmark parameters. SfM algorithms, which compute 3D coordinates from 2D image correspondences, have some disadvantages. The 3D Cartesian coordinates of distant objects are biased (Sibley et al. 2005) and are not well represented by Gaussian distributions (Civera et al. 2008). Similar problems arise when the baseline length between the stereo cameras and the distance between the consecutive frames are small in monocular vision (Scaramuzza and Fraundorfer 2011). Furthermore, the inclusion of landmark position in the parameter vector has two main drawbacks. First, the BA and SLAM implementation requires a good initial guess which can be difficult especially in monocular vision and when landmarks that were far away. Second, the number of landmark parameters can be very large which can result in difficult and computationally expensive estimation. Efforts to reduce the computational load were introduced in (Dang et al. 2009) where a 3D landmark position was decomposed in to 1D feature depth parameter by algebraically eliminating the x and y components using equations from the stereo pair. However, it still required the estimation of the landmark depth, a parameter not particularly useful in the calibration procedure.

Auto-calibration algorithms require a minimal constraint to define the network datum, which can be done by applying the minimum constraint, free-network adjustment, or through an explicit minimal control point (Remondino and Fraser 2006). In the free-network adjustment situation, the absolute scale of the camera system cannot be known without additional information. (Kelly et al. 2011) focused on determining the absolute scale of both the scene and the baseline in a stereo rig using GPS measurements. Their approach was similar to photogrammetric BA and the structure from motion algorithms. They could recover the baseline and relative orientation between the two cameras and lever-arms between the GPS antenna and reference camera. Similar to their previous method in (Kelly and Sukhatme 2009), they assumed that the internal camera parameters were known beforehand. Three or more overlapping image frames are required in order to estimate the camera motion on a common scale. Structure-free motion algorithms typically relied on three-view constraints (Yu et al. 2006; Indelman et al., 2013) for the same reason. The advantage is obvious. They could be exclusive of landmark position parameters and result in a more efficient algorithm.

The proposed method consists of two steps; firstly, the three-view scale-restraint equation is used to perform the free-network auto-calibration in a stereo camera system. This enables all images to operate on a common scale. And then the GPS/INS solution is applied to recover the absolute scale, as well as the bore-sight angles and the lever-arms with respect to the IMU.

Methods

In essence, a non-linear least-squares algorithm is employed to estimate the internal camera parameters, the stereo baseline and the relative orientation, the lever-arms and bore-sight angles using image measurements together with the integrated GPS aided-inertial solution. The method proceeds in two steps: (a) the stereo auto-calibration using only image measurements and then (b) the system calibration using the GPS-aided inertial integrated navigation solution (position and orientation) as external measurements. It functions under the assumptions

The object points in the scene are stationary;

The raw measurements from the sensors are synchronized;

The GPS/INS blended solution has been processed;

The GPS/INS position is referenced at the center of the IMU.

Reference frames

The following four coordinate systems are used throughout this paper (Fig. 1).

The navigation frame (n-frame) is a frame that moves with the vehicle with its origin located at a predefine point on the vehicle. Its z-axis is normal to the reference ellipsoid and points downwards while its x and y axes point towards the geodetic North and East, respectively forming a right-handed Cartesian coordinate system.

The (n’-frame) has the same origin as the n-frame. Its orientation is arbitrary but fixed with respect to the n-frame.

The body frame (b-frame) shares the same origin with the n-frame. Its x-axis points along the vehicle’s longitudinal axis and the z-axis points down while its y-axis forms a right-handed coordinate system.

The camera frame (c-frame) is a frame in which the image measurements are taken. Its origin is at the perspective center of the reference camera. Its x-axis and y-axis are parallel to the columns and rows of the charge-coupled device (CCD) sensor while its z-axis points away from the CCD sensor to form a right handed coordinate system. The camera system is assumed to be rigidly mounted on the vehicle. Hereafter, the left camera is set as the reference camera in the stereo system.

Measurement equations

Collinearity equations

The algorithmic development starts with the well-known extended collinearity equation (COL) which relates the object point position vector \( {\mathbf{X}}_i^n\left[\mathrm{m}\right] \), its corresponding image point (x_{
i
}, y_{
i
})[px], the camera’s perspective center X^{n}[m] and the direction cosine matrix \( \left(\mathrm{D}\mathrm{C}\mathrm{M}\right){C}_c^n \) as follows:

Where (x_{0}, y_{0}), f and \( \left({v}_{x_i},{v}_{y_i}\right) \) are the principal point, the focal length and the measurement noises, respectively. In the presence of lens distortion, the image coordinates of a point deviate from its true ones by (Δx_{
d,i
}, Δy_{
d,i
}), which can be modelled by (Brown 1971):

where r is the radial distance from the principal point to the image point \( \left({r}^2={\overline{x}}^2+{\overline{y}}^2={\left({x}_i-{x}_o\right)}^2+{\left({y}_i-{y}_o\right)}^2\right),\varDelta f \) is the focal length error, (Δx_{0}, Δy_{0}) is the principal point error, k_{
i
} and p_{
i
} are the coefficients of radial distortion and decentering distortion, respectively, and A_{
i
} are the affine deformation parameters. Most of the radial lens distortion is generally accounted by second term k_{2}r^{4} (Barazzetti et al. 2011). The terms with k_{3} and even with k_{4} term are typically included in higher-accuracy applications and wide-angle lenses. Here, the decentering distortion and affine deformation models will not be applied because they are generally very small. Furthermore, their errors will be absorbed by other terms, for example, the principal point (Fraser 2013).

Scale restraint equation

The scale restraint equation (SRE) is a robust three-view constraint that forces the images to operate on one common scale. Let’s consider image point vectors \( {\mathbf{x}}_1^n,{\mathbf{x}}_2^n \) and \( {\mathbf{x}}_3^n \) (Fig. 2):

and \( {\mathbf{b}}_{12}^n \), \( {\mathbf{b}}_{23}^n \) are the baselines [m] between images 1 and 2, and between images 2 and 3, respectively. If image 1 is relatively oriented to image 2, and image 2 is relatively oriented to image 3, there is no guarantee that image 1 and image 3 are also relatively oriented to each other (Bethel 2003). As a result, three vectors \( {\mathbf{x}}_1^n,{\mathbf{x}}_2^n \), and \( {\mathbf{x}}_3^n \) will fail to intersect at a common point due the scale variations of the three views. The ‘mismatch’ vector \( {\mathbf{d}}_{12}^n \) is perpendicular to both of \( {\mathbf{x}}_1^n \) and \( {\mathbf{x}}_2^n \), and is computed as \( {\mathbf{d}}_{12}^n={\mathbf{x}}_1^n\times {\mathbf{x}}_2^n \). Similarly the ‘mismatch’ vector \( {\mathbf{d}}_{23}^n={\mathbf{x}}_2^n\times {\mathbf{x}}_3^n \). Midway along the vectors \( {\mathbf{d}}_{12}^n \) and \( {\mathbf{d}}_{23}^n \) is the point where the two adjacent vectors are the closest. With the vectors in Fig. 2, one can give

where \( {k}_1,{k}_2,{k}_{12},{k}_2^{\prime },{k}_3^{\prime } \) and \( {k}_{23}^{\prime } \) are unknown scalars with unique values as follows (Ghosh 2005)

Their analysis in detail can be found in Appendix A. In order for all the vectors to intersect at the same point, \( {k}_2+{k}_2^{\prime } \) must equal to zero (Ghosh 2005), i.e.

where in \( {\mathbf{d}}_{12}^n={\mathbf{x}}_1^n\times {\mathbf{x}}_2^n \) and \( {\mathbf{d}}_{23}^n={\mathbf{x}}_2^n\times {\mathbf{x}}_3^n \). Equation (6) is the scale restraint equation that forces the independent scale factors for the common ray between the stereo pair 1–2 and stereo pair 2–3 to be equal (Bethel 2003). This equation is mainly used in successive relative orientation of image pairs and scale transfer.

Camera auto-calibration

Table 1 lists the relevant calibration parameters. At a given epoch k, the algorithm uses point features matched from four views which are stereo pairs from two consecutive epochs, i.e., \( {\mathbf{x}}_{L,k}^c,{\mathbf{x}}_{R,k}^c,{\mathbf{x}}_{L,k-1}^c \) and \( {\mathbf{x}}_{R,k-1}^c \) as depicted in Fig. 3.

Point features can be extracted using the Harris corner detector (Harris and Stephens 1988) and matching can be performed using the Sum of Absolute Differences (SAD) in an 11 × 11 window. To improve the matching results between stereo pairs, the search is performed along the epipolar lines (Bin Rais et al. 2003). Furthermore, to improve the matching between the consecutive frames, the locations of the features in the current frame can be predicted from the previous frame using the inertial sensors (Veth et al. 2006).

The matched points are constrained by two SREs. The first SRE relates \( {\mathbf{x}}_{R,k}^c,{\mathbf{x}}_{L,k}^c \) and \( {\mathbf{x}}_{L,k-1}^c \) while the second one relates \( {\mathbf{x}}_{L,k-1}^c,{\mathbf{x}}_{R,k-1}^c \) and \( {\mathbf{x}}_{R,k}^c \) (Fig. 4).

where \( {\mathbf{x}}_{L,k}^{n^{\prime }}={C}_{c,k}^{n^{\prime }}{\mathbf{x}}_{L,k}^c,{\mathbf{x}}_{R,k}^{n^{\prime }}={C}_{c,k}^{n^{\prime }}{C}_{cR}^c{\mathbf{x}}_{R,k}^{cR} \) and \( \varDelta {\mathbf{X}}_{R,k,k-1}^{n^{\prime }}=\varDelta {\mathbf{X}}_{L,k,k-1}^{n^{\prime }}+\left({C}_{c,k}^{n^{\prime }}-{C}_{c,k-1}^{n^{\prime }}\right){\mathbf{b}}_{LR}^c \). In auto-calibration, the orientation of the camera with respect to the n-frame is not required and can be put aside. At this point the global frame is set to the n’-frame. In order to obtain a free-network adjustment, one component of the baseline vector \( {\mathbf{b}}_{LR}^c \) must be free (i.e. 2 dof) and one of the \( {\boldsymbol{\theta}}_c^{n^{\prime }} \) orientation parameters must be fixed (ideally \( {\boldsymbol{\theta}}_{c,k=1}^{n^{\prime }}=\mathbf{0} \)). This fixes both the orientation and scale of the system. Note that \( \varDelta {\mathbf{X}}_{L,k}^{n\hbox{'}} \) and \( {C}_{c,k}^{n\hbox{'}} \) contain transport rate error in this formulation. Under the assumption that the calibration area is within a few hundred meters, this error effect is negligibly small.

Camera system calibration

The camera system calibration determines the lever-arms \( \mathbf{l}{\mathbf{a}}_L^b \), the absolute scale s_{
c
} of the camera and the bore-sight angle vector \( {\boldsymbol{\theta}}_c^b \). The relationship between the GPS/INS and the left camera can be expressed as the following 7-parameter transformation:

where in \( {C}_{b, GPSINS,k}^n \) is the GPS/INS DCM as the function of attitude angles (roll, pitch and heading) and \( {C}_{n^{\prime}}^n \) is determined by the orientation of the camera system with respect to the n-frame at the first epoch

where \( {C}_c^b \) is the DCM from the camera to body represented by the bore-sight angle vector \( {\boldsymbol{\theta}}_c^b \). Differencing (8) between epochs k − 1 and k gives

Equations (11) and (12) equate the GPS/INS information \( \left(\varDelta {\mathbf{X}}_{GPSINS,k}^n,{C}_{b, GPSINS,k}^n\right) \) and the auto-calibration estimates \( \varDelta {\mathbf{X}}_{L,k}^{n\hbox{'}} \) and \( {C}_{c,k}^{n\hbox{'}} \). All of seven parameters can be solved by using the least-squares.

Computation complexity of COL and SRE

This section compares the number of parameters and the number of floating point operations (flops) between COL and SRE auto-calibration algorithms.

Table 2 shows the number of parameters with COL and SRE auto-calibration system with n_{
x
} stereo image frames and n_{
lm
} visible landmarks. One component of the stereo baseline is left free (i.e., only two stereo baseline parameters). The number of the camera position and orientation parameters is 6 (n_{
x
}−1) in total because the first camera position and orientation is fixed (practically they are set to zero). The advantage of employing SRE is to have the number of the estimated parameters far less than the one with COL.

The flop count is the total number of textbook multiplication and addition operations required to obtain a least squares (LS) solution. The factors taken into account in the analysis are the number of the matched stereo points (i.e. number of measurements), the number of the image frames, the number of the landmarks in view and the percentage overlap between consecutive frames. The percentage overlap encompasses camera rate, the velocity and the angular rate of the camera. Furthermore, COL employs the LS algorithm in the explicit form (i.e., z = h(x)) to estimate the parameters while the SRE uses implicit LS (i.e., h(x, z) = 0)); where x is the parameter vector, z is the measurement vector and h(.) is the functional model. The flop counts between the two will be different under a given number of measurements and parameters.

In order to simplify the analysis, the number of image frames is kept constant (here set to 92, the same number used in the test results). Furthermore, the number of measurements, landmarks and the overlapping percentage is assumed to have the following predictive relationship

where m′ is the average number of the matched stereo pairs per landmark, m is the total number of the matched stereo pairs and p is the average overlapping percentage. For instance, if p = 75% then m′ = 4. This means on average a landmark is viewed on four images. By keeping the average percentage overlap constant, the number of landmarks in the system can be predicted with a given number of stereo points. The number of measurements and parameters in the LS are now known and therefore the flop count can be predicted. Figure 5 shows the number of flops vs. number of stereo points required in COL and SRE with overlapping of 70, 80 and 90%.

As expected, the plot shows that COL uses more flops than SRE. As percentage overlap increases, the number of flops in COL decreases because the number of the matched stereo pairs per landmark becomes larger. Therefore, given the same number of measurements, the number of the landmark parameters becomes smaller. As percentage overlap increases, the number of flops in SRE increases because more matrix inversion operations are needed in the implicit LS algorithm. The accuracy analysis is presented in Auto-calibration results section.

Results and discussion

In this section test results from the simulated, laboratory and real data are presented. Simulations were performed to validate the proposed SRE auto-calibration algorithm and to show how its performance (both computation and accuracy) in comparison with the one from the COL auto-calibration method. Finally results from land vehicle data are presented.

Results from the simulated data

The simulations were conducted to compare the performance of COL and SRE auto-calibration algorithms based on a typical land vehicle trajectory (i.e. large horizontal motion and heading variation). Figure 6 shows the vehicle’s trajectory and the landmarks. The vehicle’s height and attitude profiles are given in Fig. 7.

The camera resolution and the field of view (FOV) were set to 640 × 480 pixels and 50° (equivalent to 686.2 pixels), respectively. The baseline between the two cameras is 0.65m long. Figure 8 describes the simulation parameters. The number of the epochs is 92. To simulate the urban scenario, the landmarks between the ranges 15 and 25 were selected to be in view of the camera. Furthermore, the measurement noise was set to zero mean Gaussian noise with the standard deviation of 0.5px. The initial value for each camera calibration parameter was set to zero except the baseline component \( {b}_{LR.y}^c \) as the free parameter and equal to 0.65m.

Auto-calibration accuracy analysis

The accuracy analysis on COL and SRE algorithms is presented in this section. Auto-calibration results from one COL and two SREs are presented (i.e. SRE1 and SRE2). The estimates from COL and SRE1 were obtained using the same number of the measurements, i.e., 17,074 stereo points (m) whilst SRE2 used 4.5 times more measurements. The average percentage overlap for all three cases was 74%. Figure 9 shows the number of stereo points per frame and the number of accumulated stereo points.

Figure 10 shows the estimated standard deviation of the focal length error and principal point obtained from the three estimates w.r.t. epoch. The results showed that with a given number of measurements, SRE1 performed worse than COL. However SRE2 required 4.5 times more to be equivalent or better than COL. Figure 11 shows the estimated standard deviation w.r.t. the number of flops. The results showed that both SRE1 and SRE2 required less computation resources to achieve the same level of the accuracy as COL.

Auto-calibration results

This section presents the final results from COL, SRE1 and SRE2. Tables 3, 4 and 5 show the true values, estimates and their standard deviations for the left, right and relative camera orientation calibration parameters, respectively. The results showed that the accuracy of the focal length error from COL and SRE2 were similar and better than SRE1. SRE2 estimated the best principal point error, but SRE1 gave the worst. The radial distortion coefficients from COL and SRE1 were similar. However, SRE2 delivered the best coefficients.

The relative orientation parameters from SRE2 were the best whilst the ones from COL and SRE1 were similar to each other. Table 6 shows the total number of the used stereo points, the number of parameters and the flop count for each of the auto-calibration algorithm. COL and SRE1 employed the same number of measurements, as SRE2 used 4.5 times more measurements. SRE1 and SRE2 estimated the same number parameters, but COL estimated 7854 more parameters. Even though SRE2 processed more measurements than COL, it still used 1000 times less flops since less number of parameters were estimated. Furthermore the accuracy is higher since more measurements were employed.

Results from road test data

Test results from both the camera auto-calibration and system calibration using road test data are presented in this section. The road data were collected by the land vehicle navigation system developed at the Earth Observation Laboratory of York University (Qian et al. 2012) with two newly integrated cameras. The system consists of two NovAtel OEM GPS receivers, one Crossbow IMU440CA and the two PointGrey Flea3 cameras as described in the lab test section. Two GPS receivers provided the absolute heading measurements and a third GPS receiver was used as the base station so that RTK level GPS positioning accuracy was achieved. The lever-arm vectors of the GPS receivers and cameras with respect to the IMU unit were measured beforehand at the accuracy of 0.5cm. The observation rates were set to 1.0, 100 and 7.5 Hz for GPs, IMU and cameras, respectively. The IMU specification is in Table 7.

The data was collected in Vaughan, Ontario and was 518 s long. Figure 12 shows the top view of the vehicle’s trajectory, the velocity profile and the attitude profiles.

One hundred images in the interval shown in Fig. 13 (left) were used to test the camera calibration algorithms. This section of the trajectory was chosen because the vehicle dynamics favor lever-arm estimation. Furthermore, the images were highly textured which could be important in detecting point features. Figure 13 (right) shows the 53rd stereo image with the matched feature points. Similar to the tests performed in Results from road test data section, the calibration results from one COL and two SREs are presented (i.e. still as SRE1 and SRE2). COL and SRE1 estimates were obtained using the same number of stereo points whilst SRE2 used 5.1 times more measurements. The average percentage overlap for all three cases is 73%.

Auto-calibration results

The estimated lens distortion parameters together with their standard deviations of the left and right cameras are shown in Tables 8 and 9, respectively. The results showed the focal length error estimates from COL and SRE2 were similar, but SRE1 performed the worst. SRE2 estimated the best principal point error, followed by COL and SRE1. Noticeably, ∆x_{
0
} is better estimated than ∆y_{
0
} because the distribution of the points on the images in the x component was more varied and thus benefited the ∆x_{
0
} estimation. The results showed that the coefficients k_{
1
} and k_{
2
} accounted for most of the radial distortion. Their standard deviations were the lowest from SRE2, while they were similar in COL and SRE1.

Table 10 shows the estimated relative orientation parameters and their standard deviations. The results showed that the estimated baseline vectors were similar. The angles in the y and z components were also similar, whereas the x component was similar from COL and SRE2. The worst came from SRE1. Table 11 shows the total number of the used stereo points, number of parameters and the flop count for each of the auto-calibration algorithms. COL used 3162 times more flops than SRE2. The most important finding in the tests was that despite the fact SRE2 used more measurements; it outperformed COL both in terms of accuracy and computational efficiency.

System calibration results

Table 12 shows the estimated lever-arms, absolute scale factor and bore-sight angles of the stereo camera system. The results showed that the lever-arms from COL and SRE2 were comparable, whereas the one from SRE1 was the worst. SRE2 estimated the best bore-sight angles followed by COL and SRE1. The absolute scale factors were similar in all three calibration results.

The standard deviation of the lever-arm in the z-component was almost three times larger than the ones in x and y components as there was little variation in the vehicle’s pitch angle and the accuracy of the GPS position was worse vertically than horizontally. The lever-arms were measured beforehand using a measuring tape at the accuracy of 0.1cm. Table 13 shows the difference between the estimated and the measured lever-arms, and the standard deviations. The differences were within two times the standard deviations (95%) which showed that the lever-arms were correctly estimated.

Conclusions

This paper presented a novel two-step camera calibration method in a GPS/INS/Stereo camera integrated kinematic positioning and navigation system. The first step performs the camera auto-calibration for a stereo system by employing two scale-restraint equations to constrain the matched features from two consecutive stereo pairs. The lens distortion parameters, the up-to-scale baseline length and the relative orientation between the two cameras are estimated using the least-squares method. The second step performs system calibration where the auto-calibration estimates are fused with the blended GPS/INS solution to recover the camera lever-arms, the absolute scale of the camera and the bore-sight angles. The main advantage of the proposed novel method lies that it is free from landmark parameters and results in computation and memory savings. There are two main drawbacks in employing the scale-restraint equation over the collinearity equations for stereo auto-calibration. Firstly the accuracy cannot be increased by performing loop closures when the same scene is revisited. Secondly the scale-restraint equation is highly non-linear and therefore the LS estimator can diverge if a good approximation of the parameters is not available.

The results from the simulated and real road test data were presented and showed that the proposed auto-calibration method requires less computation resources to achieve equal or better accuracy than applying the traditional collinearity equations despite the fact it using more measurements.

References

Alcantarilla PF, Yebes JJ, Almazàn J, Bergasa LM (2012) On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In: Proceedings at the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, Minnesota, USA, 14–18 May 2012:1290–1297. doi:10.1109/ICRA.2012.6224690.

Barazzetti L, Mussio L, Remondino F, Scaioni M (2011) Targetless Camera Calibration, International Archives of the Photogrammetry. In: ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXVIII-5/W16, 2011. pp 335–342. doi:10.5194/isprsarchives-XXXVIII-5-W16-335-2011.

Bender D, Schikora M, Sturm J, Cremers D (2013) Graph-based bundle adjustment for INS-camera calibration. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-1/W2, 2013UAV-g2013, Rostock. doi:10.5194/isprsarchives-XL-1-W2-39-2013.

Bethel JS (2003) Photogrammetry and Remote Sensing. In: Chen WF, Liew JYR (ed) The Civil Engineering Handbook, 2nd Edition, Chapter 56. CRC Press, Boca Raton, London, New York, Washington D.C, p 2170.

Brown DC (1971) Close-Range Camera Calibration. Photogramm Eng 37(8):855–866

Bin Rais N, Khan HA, Kamran F, Jamal H (2003) A new algorithm of stereo matching using epipolar geometry. In: Proceedings at the Multi Topic Conference, 2003. INMIC 2003. 7th International, Islamabad, Pakistan, 9–9 December 2003:21–24. doi:10.1109/INMIC.2003.1416609.

Civera J, Davison AJ, Montiel JMM (2008) Inverse depth parametrization for monocular SLAM. IEEE Trans Robot 24(5):932–945. doi:10.1109/TRO.2008.2003276

Civera J, Bueno DR, Davison AJ, Montiel JMM (2009) Camera self-calibration for sequential Bayesian structure from motion. In: Proceedings at the IEEE International Conference on Robotics and Automation, ICRA2009, Kobe. doi:10.1109/ROBOT.2009.5152719.

Dang T, Hoffmann C, Stiller C (2009) Continuous stereo self-calibration by camera parameter tracking. IEEE Trans Image Process 18(7):536–1550. doi:10.1109/TIP.2009.2017824.

Durrant-Whyte H, Bailey T (2006) Simultaneous localization and mapping (SLAM): part I the essential algorithms. IEEE Robot Autom Mag 13(2):99–110. doi:10.1109/MRA.2006.1638022

Ghosh SK (2005) Fundamentals of Computational Photogrammetry. Concept Publishing Company, New Delhi, p 117-119

Gopaul NS, Wang JG, Hu B (2014) Discrete EKF with pairwise time-correlated measurement noise for image-aided inertial integrated navigation. ISPRS Ann Photogramm, Remote Sens Spat Inf Sci 2(2):61–66. doi:10.5194/isprsannals-II-2-61-2014.

Gopaul NS, Wang JG, Hu B (2015) Multi-frame Visual Odometry in Image-Aided Inertial Navigation System. J. Sun at al. (eds) China Satellite Navigation Conference (CSNC) 2015 proceedings. Lecture Notes Electr Eng 342(3):649–658. doi:10.1007/978-3-66-46632-2_57.

Indelman V, Melim A, Dellaert F (2013) Incremental Light Bundle Adjustment for Robotics Navigation. In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo. doi:10.1109/IROS.2013.6696615.

Kelly J, Sukhatme GS (2009) Visual-inertial simultaneous localization, mapping and sensor-to-sensor self-calibration. In: Proceedings at the Computational Intelligence in Robotics and Automation (CIRA), 2009 IEEE International Symposium, Daejeon, South Korea, 15–18 December 2009: 360–368 doi:10.1109/CIRA.2009.5423178.

Kelly J, Matthies LH, Sukhatme GS (2011) Simultaneous mapping and stereo extrinsic parameter calibration using GPS measurement. In: Proceedings at the Robotics and Automation (ICRA), 2011 IEEE International Conference, Shanghai, China, 9–13 May 2011:279–286. doi:10.1109/ICRA.2011.5980443.

Keivan N, Sibley G (2014) Constant-time Monocular Self-Calibration. In: Proceedings at the IEEE International Conference on Robotics and Biomimetics (ROBIO) 2014. Bali, Indonesia, 5–10 December 2014:1590–1595. doi:10.1109/ROBIO.2014.7090561.

Konolige K, Agrawal M, Solà J (2011) Large scale visual odometry for rough terrain, In: The International Journal of Robotics Research 66:201-212. doi:10.1007/978-3-642-14743-2_18.

Lategahn H, Geiger A, Kitt B (2011) Visual SLAM for autonomous ground vehicles. In: Proceedings of the Robotics and Automation (ICRA), 2011 IEEE International Conference, Shanghai, China, 9–13 May 2011:1732–1737. doi: 10.1109/ICRA.2011.5979711.

Mirzaei FM, Roumeliotis SI (2008) A Kalman Filter-Based Algorithm for IMU-Camera Calibration: Observability Analysis and Performance Evaluation. IEEE Trans Robot 24(5):1143–1156. doi:10.1109/TRO.2008.2004486.

Qian K, Wang JG, Gopaul NS, Hu B (2012) Low Cost Multi-sensor Kinematic Positioning and Navigation System with Linux/RTAI. J Sens Actuator Netw 1(3):166–182

Remondino F, Fraser C (2006) Digital Camera Calibration Methods: Considerations and Comparisons. The International Archives of the Photogrammetry and Remote Sensing Commission V Symposium. Image Engineering and Vision Metrology. IAPRS 36(5):266–272, Dresden 25–27 September 2006

Sibley G, Matthies L, Sukhatme G (2005) Bias Reduction and Filter Convergence for Long Range Stereo. In: 12th International Symposium of Robotics Research, San Francisco 28:285-294. doi:10.1007/978-3-540-48113-3_26.

Teller S, Koch O, Walter MR, Huang AS (2010) Ground robot navigation using uncalibrated cameras. In: Proceedings at the Robotics and Automation (ICRA), 2010 IEEE International Conference, Anchorage, USA, 3–7 May 2010: 2423–2430. doi:10.1109/ROBOT.2010.5509325.

Triggs N, McLauchlan P, Hartley R, Fitzgibbon A (1999) Bundle Adjustment—A Modern Synthesis. In: Proceedings at the International Workshop on Vision Algorithms ICCV 99. Springer-Verlag, pp 298–372. doi:10.1007/3-540-44480-7_21.

Veth M, Raquet J, Pachter M (2006) Stochastic constraints for efficient image correspondence search. IEEE Trans Aerosp Electron Syst 42(3):973–982. doi:10.1109/TAES.2006.4439212.

Wang JG, Qian K, Hu B (2015) An Unconventional Full Tightly-Coupled Multi-Sensor Integration for Kinematic Positioning and Navigation, J. Sun at al. (eds) China Satellite Navigation Conference (CSNC) 2015 proceedings. Lecture Notes Electr Eng 342(3):753–765. doi:10.1007/978-3-662-46632-2_65.

Williams B, Reid I (2010) On Combining Visual SLAM and Visual Odometry. In: Proceedings at the Robotics and Automation (ICRA), 2010 IEEE International Conference, Anchorage, USA, 3–7 May 2010: 3494–3500. doi:10.1109/ROBOT.2010.5509248.

Wolf PR, Dewitt BA (2000) Elements of Photogrammetry with Applications in GIS, 3rd edn.. ISBN 0-07-292454-3

Yu YK, Wong KH, Chang MMY, Or SH (2006) Recursive Camera-Motion Estimation With the Trifocal Tensor. IEEE Trans Syst, Man, Cybern, Part B (Cybern) 36(5):1081–1090. doi:10.1109/TSMCB.2006.874133.

The authors would like to acknowledge the financial support through research grants provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada under its RGPIN Program. We would also like to thank Applanix Corp. for using their POSGNSS software.

Authors’ contributions

NG carried out the camera auto-calibration and system calibration research. NG conceived, designed, implemented and tested the algorithms presented in this paper. JW and NG collected the GPS, IMU and image data used for the tests. NG drafted the manuscript. JW reviewed and edited the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Author information

Authors and Affiliations

Lassonde School of Engineering, York University, Toronto, Canada

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Gopaul, N.S., Wang, J. & Hu, B. Camera auto-calibration in GPS/INS/stereo camera integrated kinematic positioning and navigation system.
J. Glob. Position. Syst.14, 3 (2016). https://doi.org/10.1186/s41445-016-0003-7