Camera autocalibration in GPS/INS/stereo camera integrated kinematic positioning and navigation system
The Journal of Global Positioning Systems volume 14, Article number: 3 (2016)
Abstract
This paper presents a novel two-step camera calibration method in a GPS/INS/Stereo Camera multisensor kinematic positioning and navigation system. A camera autocalibration is first performed to obtain the lens distortion parameters, the up-to-scale baseline length and the relative orientation between the stereo cameras. Then, the system calibration is introduced to recover the camera lever-arms and boresight angles with respect to the IMU, and the absolute scale of the camera, using the GPS/INS solution. The autocalibration algorithm employs the three-view scale-restraint equations (SRE). In comparison with the collinearity equations (COL), it is free from landmark parameters and ground control points (GCPs). Therefore, the proposed method is computationally more efficient. The results and the comparison between the SRE and COL methods are presented using simulated and road test data. The results show that the proposed SRE method requires fewer computation resources and is able to achieve the same or better accuracy level than the traditional COL.
Introduction
The high demand for low-cost multisensor kinematic positioning and navigation systems as the core of the direct-georeferencing technique in mobile mapping is continuously driving more research and development activities. The effective and sufficient utilization of images is among the most recent scientific research and high-tech industry development subjects. In this particular field, York University’s Earth Observation Laboratory (EOL) is engaged in the study of image-aided inertial integrated navigation as the natural continuation of its past research in multisensor integrated kinematic positioning and navigation (Qian et al. 2012; Wang et al. 2015).
An image-aided inertial navigation system (IAINS) implies that the errors of an inertial navigator are estimated via the Kalman filter using measurements derived from images. Image-based navigation algorithms, such as visual odometry (VO) (Konolige et al. 2011; Scaramuzza and Fraundorfer 2011; Gopaul et al. 2014, 2015) or visual Simultaneous Localization and Mapping (SLAM) (Durrant-Whyte and Bailey 2006; Williams and Reid 2010; Lategahn et al. 2011; Alcantarilla et al. 2012), usually assume that a camera system is calibrated prior to its use and that the calibration parameters do not change over time. The internal camera parameters (focal length, principal point and lens distortion) and the external camera parameters (baseline and relative orientation between cameras, lever-arms and boresight angles with respect to the inertial measurement unit (IMU)) are required to relate the image coordinates with the object coordinates in the scene. The process of estimating these parameters is referred to as camera calibration.
The traditional camera calibration consists of capturing images containing an array of reference targets, whose coordinates are accurately known, in a laboratory (Wolf and Dewitt 2000). However, these parameters can be invalidated during in-field operations, e.g., during camera assembly/disassembly, replacement, bumps (Teller et al. 2010) or significant temperature variations. Recently, many developments have focused on in-field camera autocalibration (or self-calibration) for image-inertial systems. An autocalibration refers to the determination of the camera parameters from a sequence of overlapping images without necessarily setting up ground control points (GCPs) or special calibration targets. Typically, the autocalibration process is performed in a bundle adjustment (BA) (Triggs et al. 1999) or in the SLAM framework (Civera et al. 2009; Kelly and Sukhatme 2009; Kelly et al. 2011; Keivan and Sibley 2014). It involves the simultaneous estimation of the positions and orientations of the camera, the positions of the stationary landmarks, and the calibration parameters of the camera. The corresponding mathematical equation system, which models the parameters through the available measurements, is usually solved by using nonlinear least-squares, the Levenberg-Marquardt algorithm or a Kalman filter. These methods, however, are computationally expensive due to the very large number of landmark position parameters.
Accordingly, this paper proposes a novel camera calibration method that can precisely calibrate the internal and external camera parameters in a GPS/INS/Stereo camera system exclusive of the landmark position parameters. The method applies the three-view scale-restraint equation (Bethel 2003; Ghosh 2005), with which the measurements are processed exclusively in the image space without landmark parameters. Therefore, it does not allocate large memory and computation resources. The remainder of the paper is organized as follows. The Related work section overviews the related work. Then, the novel algorithm is proposed in the Methods section, which is followed by test results using simulated and real data in the Results and discussion section. The Conclusions section ends the paper with discussions and conclusions.
Related work
Bender et al. (2013) presented an in-flight graph-based BA approach for system calibration between a rigidly mounted camera and an inertial navigation system. Image point features and the GPS-aided INS position and orientation solution were used as measurements. Their method simultaneously computed the internal camera parameters as well as the 6-DOF transformation (i.e., lever-arms and relative orientation) between the two systems. However, their method also required at least one GCP in order to recover the z-component of the lever-arm. Kelly and Sukhatme (2009) proposed a camera-IMU self-calibration method within the SLAM framework implemented by an unscented Kalman filter. The lever-arms and mounting angles, the IMU gyroscope and accelerometer biases, the local gravity vector and the landmarks could all be recovered from camera and IMU measurements alone. However, they assumed that the internal camera parameters were known beforehand. Mirzaei and Roumeliotis (2008) presented a similar tightly-coupled approach using an iterative extended Kalman filter, but it required known landmark positions.
The methods in (Bender et al. 2013; Kelly and Sukhatme 2009; Kelly et al. 2011) implement structure-from-motion (SfM) and contain stationary landmark parameters. SfM algorithms, which compute 3D coordinates from 2D image correspondences, have some disadvantages. The 3D Cartesian coordinates of distant objects are biased (Sibley et al. 2005) and are not well represented by Gaussian distributions (Civera et al. 2008). Similar problems arise when the baseline length between the stereo cameras and the distance between consecutive frames are small in monocular vision (Scaramuzza and Fraundorfer 2011). Furthermore, the inclusion of landmark positions in the parameter vector has two main drawbacks. First, the BA and SLAM implementations require a good initial guess, which can be difficult to obtain, especially in monocular vision and when landmarks are far away. Second, the number of landmark parameters can be very large, which can result in a difficult and computationally expensive estimation. Efforts to reduce the computational load were introduced in (Dang et al. 2009), where a 3D landmark position was decomposed into a 1D feature depth parameter by algebraically eliminating the x and y components using equations from the stereo pair. However, it still required the estimation of the landmark depth, a parameter not particularly useful in the calibration procedure.
Autocalibration algorithms require a minimal constraint to define the network datum, which can be done by applying the minimum constraint, a free-network adjustment, or explicit minimal control points (Remondino and Fraser 2006). In the free-network adjustment situation, the absolute scale of the camera system cannot be known without additional information. Kelly et al. (2011) focused on determining the absolute scale of both the scene and the baseline in a stereo rig using GPS measurements. Their approach was similar to photogrammetric BA and structure-from-motion algorithms. They could recover the baseline and relative orientation between the two cameras and the lever-arms between the GPS antenna and the reference camera. Similar to their previous method in (Kelly and Sukhatme 2009), they assumed that the internal camera parameters were known beforehand. Three or more overlapping image frames are required in order to estimate the camera motion on a common scale. Structure-free motion algorithms typically rely on three-view constraints (Yu et al. 2006; Indelman et al. 2013) for the same reason. The advantage is obvious: they can be exclusive of landmark position parameters and result in a more efficient algorithm.
The proposed method consists of two steps. First, the three-view scale-restraint equation is used to perform the free-network autocalibration in a stereo camera system. This enables all images to operate on a common scale. Then, the GPS/INS solution is applied to recover the absolute scale, as well as the boresight angles and the lever-arms with respect to the IMU.
Methods
In essence, a nonlinear least-squares algorithm is employed to estimate the internal camera parameters, the stereo baseline and relative orientation, and the lever-arms and boresight angles using image measurements together with the integrated GPS-aided inertial solution. The method proceeds in two steps: (a) the stereo autocalibration using only image measurements and then (b) the system calibration using the GPS-aided inertial integrated navigation solution (position and orientation) as external measurements. It functions under the following assumptions:

1. The object points in the scene are stationary;
2. The raw measurements from the sensors are synchronized;
3. The GPS/INS blended solution has been processed;
4. The GPS/INS position is referenced at the center of the IMU.
Reference frames
The following four coordinate systems are used throughout this paper (Fig. 1).
The navigation frame (n-frame) is a frame that moves with the vehicle with its origin located at a predefined point on the vehicle. Its z-axis is normal to the reference ellipsoid and points downwards, while its x- and y-axes point towards the geodetic North and East, respectively, forming a right-handed Cartesian coordinate system.
The n′-frame has the same origin as the n-frame. Its orientation is arbitrary but fixed with respect to the n-frame.
The body frame (b-frame) shares the same origin with the n-frame. Its x-axis points along the vehicle’s longitudinal axis and its z-axis points down, while its y-axis completes a right-handed coordinate system.
The camera frame (c-frame) is the frame in which the image measurements are taken. Its origin is at the perspective center of the reference camera. Its x-axis and y-axis are parallel to the columns and rows of the charge-coupled device (CCD) sensor, while its z-axis points away from the CCD sensor to form a right-handed coordinate system. The camera system is assumed to be rigidly mounted on the vehicle. Hereafter, the left camera is set as the reference camera in the stereo system.
Measurement equations
Collinearity equations
The algorithmic development starts with the well-known extended collinearity equations (COL), which relate the object point position vector \( {\mathbf{X}}_i^n\left[\mathrm{m}\right] \), its corresponding image point (x _{ i }, y _{ i })[px], the camera’s perspective center X ^{n}[m] and the direction cosine matrix \( \left(\mathrm{D}\mathrm{C}\mathrm{M}\right){C}_c^n \) as follows:

\( {x}_i={x}_0+f\frac{u_i}{w_i}+\varDelta {x}_{d,i}+{v}_{x_i},\kern1em {y}_i={y}_0+f\frac{v_i}{w_i}+\varDelta {y}_{d,i}+{v}_{y_i},\kern1em {\left[{u}_i\kern0.5em {v}_i\kern0.5em {w}_i\right]}^T={\left({C}_c^n\right)}^T\left({\mathbf{X}}_i^n-{\mathbf{X}}^n\right) \)
where (x _{0}, y _{0}), f and \( \left({v}_{x_i},{v}_{y_i}\right) \) are the principal point, the focal length and the measurement noises, respectively. In the presence of lens distortion, the image coordinates of a point deviate from their true values by (Δx _{ d,i }, Δy _{ d,i }), which can be modelled by (Brown 1971):

\( \varDelta {x}_{d,i}=\varDelta {x}_0+\overline{x}\frac{\varDelta f}{f}+\overline{x}\left({k}_1{r}^2+{k}_2{r}^4+{k}_3{r}^6\right)+{p}_1\left({r}^2+2{\overline{x}}^2\right)+2{p}_2\overline{x}\overline{y}+{A}_1\overline{x}+{A}_2\overline{y} \)

\( \varDelta {y}_{d,i}=\varDelta {y}_0+\overline{y}\frac{\varDelta f}{f}+\overline{y}\left({k}_1{r}^2+{k}_2{r}^4+{k}_3{r}^6\right)+2{p}_1\overline{x}\overline{y}+{p}_2\left({r}^2+2{\overline{y}}^2\right) \)
where r is the radial distance from the principal point to the image point \( \left({r}^2={\overline{x}}^2+{\overline{y}}^2={\left({x}_i-{x}_0\right)}^2+{\left({y}_i-{y}_0\right)}^2\right) \), Δf is the focal length error, (Δx _{0}, Δy _{0}) is the principal point error, k _{ i } and p _{ i } are the coefficients of radial distortion and decentering distortion, respectively, and A _{ i } are the affine deformation parameters. Most of the radial lens distortion is generally accounted for by the second term k _{2} r ^{4} (Barazzetti et al. 2011). The terms with k _{3} and even k _{4} are typically included in higher-accuracy applications and for wide-angle lenses. Here, the decentering distortion and affine deformation models will not be applied because they are generally very small. Furthermore, their errors will be absorbed by other terms, for example, the principal point (Fraser 2013).
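The radial part of the model above can be sketched numerically. The following is a minimal Python illustration; the function name and the sample numbers are ours, not the paper's.

```python
import numpy as np

def radial_distortion(x, y, x0, y0, k):
    """Radial lens distortion offsets for one image point (Brown 1971).

    (x, y)   : measured image coordinates [px]
    (x0, y0) : principal point [px]
    k        : radial coefficients (k1, k2, ...) applied to r^2, r^4, ...
    Returns (dx, dy), the radial components of the distortion offsets.
    """
    xb, yb = x - x0, y - y0
    r2 = xb**2 + yb**2
    # k1*r^2 + k2*r^4 + ...; the k2*r^4 term often dominates (see text)
    poly = sum(ki * r2**(i + 1) for i, ki in enumerate(k))
    return xb * poly, yb * poly

# a point 200 px right of the principal point with a mild k1 only
dx, dy = radial_distortion(520.0, 240.0, 320.0, 240.0, k=(1e-7,))
```

The offsets grow with the radial distance, which is why points near the image border constrain the k coefficients best.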
Scale restraint equation
The scale restraint equation (SRE) is a robust threeview constraint that forces the images to operate on one common scale. Let’s consider image point vectors \( {\mathbf{x}}_1^n,{\mathbf{x}}_2^n \) and \( {\mathbf{x}}_3^n \) (Fig. 2):
where
and \( {\mathbf{b}}_{12}^n \) and \( {\mathbf{b}}_{23}^n \) are the baselines [m] between images 1 and 2, and between images 2 and 3, respectively. If image 1 is relatively oriented to image 2, and image 2 is relatively oriented to image 3, there is no guarantee that image 1 and image 3 are also relatively oriented to each other (Bethel 2003). As a result, the three vectors \( {\mathbf{x}}_1^n,{\mathbf{x}}_2^n \) and \( {\mathbf{x}}_3^n \) will fail to intersect at a common point due to the scale variations of the three views. The ‘mismatch’ vector \( {\mathbf{d}}_{12}^n \) is perpendicular to both \( {\mathbf{x}}_1^n \) and \( {\mathbf{x}}_2^n \), and is computed as \( {\mathbf{d}}_{12}^n={\mathbf{x}}_1^n\times {\mathbf{x}}_2^n \). Similarly, the ‘mismatch’ vector \( {\mathbf{d}}_{23}^n={\mathbf{x}}_2^n\times {\mathbf{x}}_3^n \). Midway along the vectors \( {\mathbf{d}}_{12}^n \) and \( {\mathbf{d}}_{23}^n \) lie the points where the two adjacent rays are closest. With the vectors in Fig. 2, one can write
where \( {k}_1,{k}_2,{k}_{12},{k}_2^{\prime },{k}_3^{\prime } \) and \( {k}_{23}^{\prime } \) are unknown scalars with unique values as follows (Ghosh 2005)
Their detailed analysis can be found in Appendix A. In order for all the vectors to intersect at the same point, \( {k}_2+{k}_2^{\prime } \) must equal zero (Ghosh 2005), i.e.
where \( {\mathbf{d}}_{12}^n={\mathbf{x}}_1^n\times {\mathbf{x}}_2^n \) and \( {\mathbf{d}}_{23}^n={\mathbf{x}}_2^n\times {\mathbf{x}}_3^n \). Equation (6) is the scale restraint equation that forces the independent scale factors for the common ray between stereo pair 1–2 and stereo pair 2–3 to be equal (Bethel 2003). This equation is mainly used in the successive relative orientation of image pairs and in scale transfer.
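The closest-approach geometry behind these scalars can be sketched numerically. The Python snippet below is our own illustration: the closing condition k1·x1 = b12 + k2·x2 + k12·d12 is one way to write the loop of Fig. 2 (sign conventions follow our reading of Ghosh 2005, not a quoted formula). For rays that truly intersect, the mismatch scalar k12 comes out zero.

```python
import numpy as np

def ray_scalars(b12, x1, x2):
    """Solve k1*x1 = b12 + k2*x2 + k12*d12 for (k1, k2, k12).

    b12    : baseline vector from perspective center 1 to center 2 [m]
    x1, x2 : ray vectors of the same feature, expressed in a common frame
    d12 = x1 x x2 is the mismatch vector; k12 measures how far the two
    skew rays are from intersecting (zero for a perfect intersection).
    """
    d12 = np.cross(x1, x2)
    A = np.column_stack((x1, -x2, -d12))   # 3x3 linear system A @ k = b12
    return np.linalg.solve(A, b12)

# rays that intersect exactly at a point P: k1 = k2 = 1 and k12 = 0
P = np.array([1.0, 2.0, 10.0])
b12 = np.array([0.65, 0.0, 0.0])
k1, k2, k12 = ray_scalars(b12, P, P - b12)
```

With noisy image measurements k12 becomes nonzero, and it is the matching scalars of adjacent pairs that the scale restraint ties together.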
Camera autocalibration
Table 1 lists the relevant calibration parameters. At a given epoch k, the algorithm uses point features matched from four views, which are the stereo pairs from two consecutive epochs, i.e., \( {\mathbf{x}}_{L,k}^c,{\mathbf{x}}_{R,k}^c,{\mathbf{x}}_{L,k-1}^c \) and \( {\mathbf{x}}_{R,k-1}^c \), as depicted in Fig. 3.
Point features can be extracted using the Harris corner detector (Harris and Stephens 1988) and matching can be performed using the Sum of Absolute Differences (SAD) in an 11 × 11 window. To improve the matching results between stereo pairs, the search is performed along the epipolar lines (Bin Rais et al. 2003). Furthermore, to improve the matching between the consecutive frames, the locations of the features in the current frame can be predicted from the previous frame using the inertial sensors (Veth et al. 2006).
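The SAD matching step can be sketched as below. This is a minimal single-strip Python illustration (the function name and window handling are ours); the strip stands in for the band of the other image around the epipolar line.

```python
import numpy as np

def sad_match(patch, strip, win=11):
    """Find the column in `strip` whose win x win window best matches
    `patch` under the Sum of Absolute Differences criterion.

    patch : win x win template around a feature in one image
    strip : win-row band of the other image along the epipolar line
    Returns the center column of the best-matching window.
    """
    h = win // 2
    best_score, best_col = np.inf, -1
    for c in range(h, strip.shape[1] - h):
        cand = strip[:, c - h:c + h + 1]
        score = np.abs(patch.astype(np.int32) - cand.astype(np.int32)).sum()
        if score < best_score:
            best_score, best_col = score, c
    return best_col

# plant the patch at a known column of a random strip and recover it
rng = np.random.default_rng(0)
strip = rng.integers(0, 256, size=(11, 60), dtype=np.uint8)
patch = strip[:, 20:31].copy()
col = sad_match(patch, strip)
```

Real pipelines add left-right consistency checks and subpixel refinement on top of this raw search.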
The matched points are constrained by two SREs. The first SRE relates \( {\mathbf{x}}_{R,k}^c,{\mathbf{x}}_{L,k}^c \) and \( {\mathbf{x}}_{L,k-1}^c \), while the second one relates \( {\mathbf{x}}_{L,k-1}^c,{\mathbf{x}}_{R,k-1}^c \) and \( {\mathbf{x}}_{R,k}^c \) (Fig. 4).
The equations can be expressed as
where \( {\mathbf{x}}_{L,k}^{n^{\prime }}={C}_{c,k}^{n^{\prime }}{\mathbf{x}}_{L,k}^c \), \( {\mathbf{x}}_{R,k}^{n^{\prime }}={C}_{c,k}^{n^{\prime }}{C}_{cR}^c{\mathbf{x}}_{R,k}^{cR} \) and \( \varDelta {\mathbf{X}}_{R,k,k-1}^{n^{\prime }}=\varDelta {\mathbf{X}}_{L,k,k-1}^{n^{\prime }}+\left({C}_{c,k}^{n^{\prime }}-{C}_{c,k-1}^{n^{\prime }}\right){\mathbf{b}}_{LR}^c \). In autocalibration, the orientation of the camera with respect to the n-frame is not required and can be put aside. At this point, the global frame is set to the n′-frame. In order to obtain a free-network adjustment, one component of the baseline vector \( {\mathbf{b}}_{LR}^c \) must be left free (i.e., 2 DOF) and one set of the \( {\boldsymbol{\theta}}_c^{n^{\prime }} \) orientation parameters must be fixed (ideally \( {\boldsymbol{\theta}}_{c,k=1}^{n^{\prime }}=\mathbf{0} \)). This fixes both the orientation and the scale of the system. Note that \( \varDelta {\mathbf{X}}_{L,k}^{n^{\prime }} \) and \( {C}_{c,k}^{n^{\prime }} \) contain the transport rate error in this formulation. Under the assumption that the calibration area is within a few hundred meters, this error effect is negligibly small.
Camera system calibration
The camera system calibration determines the lever-arms \( \mathbf{l}{\mathbf{a}}_L^b \), the absolute scale s _{ c } of the camera and the boresight angle vector \( {\boldsymbol{\theta}}_c^b \). The relationship between the GPS/INS and the left camera can be expressed as the following 7-parameter transformation:
where \( {C}_{b, GPSINS,k}^n \) is the GPS/INS DCM as a function of the attitude angles (roll, pitch and heading), and \( {C}_{n^{\prime}}^n \) is determined by the orientation of the camera system with respect to the n-frame at the first epoch
where \( {C}_c^b \) is the DCM from the camera to body represented by the boresight angle vector \( {\boldsymbol{\theta}}_c^b \). Differencing (8) between epochs k − 1 and k gives
The relationship between \( {C}_{b, GPSINS,k}^n \) and the camera DCM \( {C}_{c,k}^{n^{\prime }} \) (from Eqs. (7) and (8)) can be written as
Equations (11) and (12) equate the GPS/INS information \( \left(\varDelta {\mathbf{X}}_{GPSINS,k}^n,{C}_{b, GPSINS,k}^n\right) \) with the autocalibration estimates \( \varDelta {\mathbf{X}}_{L,k}^{n^{\prime }} \) and \( {C}_{c,k}^{n^{\prime }} \). All seven parameters can be solved for using least-squares.
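The scale/orientation part of this least-squares problem can be illustrated with displacement vectors alone. The sketch below is our own: it uses a Kabsch/Umeyama-style SVD solution rather than the paper's iterative LS, and it omits the lever-arms, which require attitude variation to observe. It recovers s_c and a rotation C from matched pairs ΔX_ins ≈ s_c · C · ΔX_cam.

```python
import numpy as np

def scale_and_rotation(dx_cam, dx_ins):
    """Least-squares s and C such that dx_ins ~ s * C @ dx_cam.

    dx_cam, dx_ins : (3, N) matched displacement vectors from the
    autocalibration and from the GPS/INS solution, respectively.
    Uses the SVD of the cross-covariance (Kabsch/Umeyama).
    """
    H = dx_ins @ dx_cam.T
    U, S, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # proper rotation
    C = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / np.trace(dx_cam @ dx_cam.T)
    return s, C

# synthetic check: a known scale and a 30 deg yaw rotation are recovered
rng = np.random.default_rng(1)
dx_cam = rng.standard_normal((3, 20))
a = np.pi / 6
C_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
s_est, C_est = scale_and_rotation(dx_cam, 1.37 * C_true @ dx_cam)
```

In the full calibration the same displacements, combined with attitude changes, also make the lever-arm components observable.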
Computation complexity of COL and SRE
This section compares the number of parameters and the number of floating point operations (flops) between COL and SRE autocalibration algorithms.
Table 2 shows the number of parameters for the COL and SRE autocalibration systems with n _{ x } stereo image frames and n _{ lm } visible landmarks. One component of the stereo baseline is left free (i.e., only two stereo baseline parameters). The number of camera position and orientation parameters is 6 (n _{ x }−1) in total because the first camera position and orientation are fixed (practically, they are set to zero). The advantage of employing SRE is that the number of estimated parameters is far smaller than with COL.
The flop count is the total number of textbook multiplication and addition operations required to obtain a least-squares (LS) solution. The factors taken into account in the analysis are the number of matched stereo points (i.e., the number of measurements), the number of image frames, the number of landmarks in view and the percentage overlap between consecutive frames. The percentage overlap encompasses the camera rate, the velocity and the angular rate of the camera. Furthermore, COL employs the LS algorithm in the explicit form (i.e., z = h(x)) to estimate the parameters, while SRE uses the implicit LS (i.e., h(x, z) = 0); here x is the parameter vector, z is the measurement vector and h(.) is the functional model. The flop counts of the two will be different for a given number of measurements and parameters.
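The implicit-LS (Gauss-Helmert) structure can be sketched with a one-parameter toy problem. The circle-radius example below is ours and only illustrates the h(x, z) = 0 normal-equation pattern that drives the flop count, not the SRE model itself.

```python
import numpy as np

def gauss_helmert_radius(pts, r0, iters=5):
    """One-parameter Gauss-Helmert (implicit LS, h(x, z) = 0) toy example:
    estimate a circle radius r from points with a known center at the
    origin, condition h = z1^2 + z2^2 - r^2 = 0 per point, unit weights.
    """
    r = r0
    for _ in range(iters):
        w = pts[:, 0]**2 + pts[:, 1]**2 - r**2   # misclosures h(x0, z)
        A = np.full(len(pts), -2.0 * r)          # dh/dx per condition
        B = 2.0 * pts                            # dh/dz per condition
        Qe = (B**2).sum(axis=1)                  # B Q B^T with Q = I
        # normal equations of the Gauss-Helmert model
        N = (A**2 / Qe).sum()
        u = (A * w / Qe).sum()
        r += -u / N
    return r

# exact points on a radius-2 circle; start from a rough initial value
t = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
pts = 2.0 * np.column_stack((np.cos(t), np.sin(t)))
r_est = gauss_helmert_radius(pts, 1.5)
```

Note the per-condition factor (B Q Bᵀ)⁻¹: these extra inversions are the reason the implicit LS flop count grows with overlap, as discussed next.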
In order to simplify the analysis, the number of image frames is kept constant (here set to 92, the same number used in the test results). Furthermore, the number of measurements, the number of landmarks and the overlapping percentage are assumed to have the following predictive relationship

\( {n}_{lm}=\frac{m}{m^{\prime }},\kern1em {m}^{\prime }=\frac{1}{1-p} \)
where m′ is the average number of matched stereo pairs per landmark, m is the total number of matched stereo pairs and p is the average overlapping percentage. For instance, if p = 75% then m′ = 4, which means that on average a landmark is viewed in four images. By keeping the average percentage overlap constant, the number of landmarks in the system can be predicted for a given number of stereo points. The numbers of measurements and parameters in the LS are then known, and therefore the flop count can be predicted. Figure 5 shows the number of flops vs. the number of stereo points required by COL and SRE with overlaps of 70, 80 and 90%.
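This prediction can be sketched in a few lines. The text gives only the example p = 75% → m′ = 4; the snippet below assumes the implied relation m′ = 1/(1 − p), which is our reading rather than a formula quoted from the paper.

```python
def predict_landmarks(m, p):
    """Predict the landmark count n_lm from m matched stereo pairs.

    Assumes m' = 1/(1 - p) stereo pairs per landmark on average
    (p = 0.75 gives m' = 4, matching the example in the text),
    so n_lm = m / m'.
    """
    m_prime = 1.0 / (1.0 - p)
    return m / m_prime

# e.g. the 17,074 stereo points of the simulation at 75% overlap
n_lm = predict_landmarks(17074, 0.75)
```

Higher overlap thus shrinks the predicted landmark count for COL while adding measurements per landmark.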
As expected, the plot shows that COL uses more flops than SRE. As the percentage overlap increases, the number of flops in COL decreases because the number of matched stereo pairs per landmark becomes larger; therefore, given the same number of measurements, the number of landmark parameters becomes smaller. As the percentage overlap increases, the number of flops in SRE increases because more matrix inversion operations are needed in the implicit LS algorithm. The accuracy analysis is presented in the Autocalibration results section.
Results and discussion
In this section, test results from the simulated, laboratory and real data are presented. Simulations were performed to validate the proposed SRE autocalibration algorithm and to show its performance (both computation and accuracy) in comparison with the COL autocalibration method. Finally, results from land vehicle data are presented.
Results from the simulated data
The simulations were conducted to compare the performance of COL and SRE autocalibration algorithms based on a typical land vehicle trajectory (i.e. large horizontal motion and heading variation). Figure 6 shows the vehicle’s trajectory and the landmarks. The vehicle’s height and attitude profiles are given in Fig. 7.
The camera resolution and the field of view (FOV) were set to 640 × 480 pixels and 50° (equivalent to 686.2 pixels), respectively. The baseline between the two cameras is 0.65 m long. Figure 8 describes the simulation parameters. The number of epochs is 92. To simulate the urban scenario, the landmarks within the range of 15 to 25 m were selected to be in view of the camera. Furthermore, the measurement noise was set to zero-mean Gaussian noise with a standard deviation of 0.5 px. The initial value for each camera calibration parameter was set to zero, except the baseline component \( {b}_{LR,y}^c \), which was the free parameter and set to 0.65 m.
Autocalibration accuracy analysis
The accuracy analysis on COL and SRE algorithms is presented in this section. Autocalibration results from one COL and two SREs are presented (i.e. SRE1 and SRE2). The estimates from COL and SRE1 were obtained using the same number of the measurements, i.e., 17,074 stereo points (m) whilst SRE2 used 4.5 times more measurements. The average percentage overlap for all three cases was 74%. Figure 9 shows the number of stereo points per frame and the number of accumulated stereo points.
Figure 10 shows the estimated standard deviations of the focal length error and the principal point obtained from the three estimates w.r.t. epoch. The results showed that, for a given number of measurements, SRE1 performed worse than COL. However, SRE2, with 4.5 times more measurements, was equivalent to or better than COL. Figure 11 shows the estimated standard deviation w.r.t. the number of flops. The results showed that both SRE1 and SRE2 required fewer computation resources to achieve the same level of accuracy as COL.
Autocalibration results
This section presents the final results from COL, SRE1 and SRE2. Tables 3, 4 and 5 show the true values, the estimates and their standard deviations for the left camera, right camera and relative orientation calibration parameters, respectively. The results showed that the focal length error estimates from COL and SRE2 were of similar accuracy and better than those from SRE1. SRE2 estimated the principal point error best, while SRE1 gave the worst estimate. The radial distortion coefficients from COL and SRE1 were similar. However, SRE2 delivered the best coefficients.
The relative orientation parameters from SRE2 were the best, whilst the ones from COL and SRE1 were similar to each other. Table 6 shows the total number of stereo points used, the number of parameters and the flop count for each autocalibration algorithm. COL and SRE1 employed the same number of measurements, while SRE2 used 4.5 times more measurements. SRE1 and SRE2 estimated the same number of parameters, but COL estimated 7854 more parameters. Even though SRE2 processed more measurements than COL, it still used 1000 times fewer flops since fewer parameters were estimated. Furthermore, its accuracy is higher since more measurements were employed.
Results from road test data
Test results from both the camera autocalibration and the system calibration using road test data are presented in this section. The road data were collected by the land vehicle navigation system developed at the Earth Observation Laboratory of York University (Qian et al. 2012) with two newly integrated cameras. The system consists of two NovAtel OEM GPS receivers, one Crossbow IMU440CA and two Point Grey Flea3 cameras, as described in the lab test section. The two GPS receivers provided the absolute heading measurements, and a third GPS receiver was used as the base station so that RTK-level GPS positioning accuracy was achieved. The lever-arm vectors of the GPS receivers and cameras with respect to the IMU were measured beforehand at an accuracy of 0.5 cm. The observation rates were set to 1.0, 100 and 7.5 Hz for the GPS, IMU and cameras, respectively. The IMU specification is given in Table 7.
The data was collected in Vaughan, Ontario and was 518 s long. Figure 12 shows the top view of the vehicle’s trajectory, the velocity profile and the attitude profiles.
One hundred images in the interval shown in Fig. 13 (left) were used to test the camera calibration algorithms. This section of the trajectory was chosen because the vehicle dynamics favor lever-arm estimation. Furthermore, the images were highly textured, which is important for detecting point features. Figure 13 (right) shows the 53rd stereo image with the matched feature points. Similar to the tests performed on the simulated data, the calibration results from one COL and two SREs are presented (again as SRE1 and SRE2). The COL and SRE1 estimates were obtained using the same number of stereo points, whilst SRE2 used 5.1 times more measurements. The average percentage overlap for all three cases was 73%.
Autocalibration results
The estimated lens distortion parameters together with their standard deviations for the left and right cameras are shown in Tables 8 and 9, respectively. The results showed that the focal length error estimates from COL and SRE2 were similar, but SRE1 performed the worst. SRE2 estimated the principal point error best, followed by COL and SRE1. Noticeably, ∆x _{ 0 } is better estimated than ∆y _{ 0 } because the distribution of the points on the images was more varied in the x component, which benefited the ∆x _{ 0 } estimation. The results showed that the coefficients k _{ 1 } and k _{ 2 } accounted for most of the radial distortion. Their standard deviations were the lowest with SRE2, while they were similar in COL and SRE1.
Table 10 shows the estimated relative orientation parameters and their standard deviations. The results showed that the estimated baseline vectors were similar. The angles in the y and z components were also similar, whereas for the x component, COL and SRE2 were similar and SRE1 was the worst. Table 11 shows the total number of stereo points used, the number of parameters and the flop count for each autocalibration algorithm. COL used 3162 times more flops than SRE2. The most important finding in the tests was that despite the fact that SRE2 used more measurements, it outperformed COL both in terms of accuracy and computational efficiency.
System calibration results
Table 12 shows the estimated lever-arms, absolute scale factor and boresight angles of the stereo camera system. The results showed that the lever-arms from COL and SRE2 were comparable, whereas the ones from SRE1 were the worst. SRE2 estimated the boresight angles best, followed by COL and SRE1. The absolute scale factors were similar in all three calibration results.
The standard deviation of the lever-arm in the z-component was almost three times larger than those in the x and y components, as there was little variation in the vehicle’s pitch angle and the accuracy of the GPS position was worse vertically than horizontally. The lever-arms were measured beforehand using a measuring tape at an accuracy of 0.1 cm. Table 13 shows the differences between the estimated and the measured lever-arms, and the standard deviations. The differences were within two times the standard deviations (95%), which showed that the lever-arms were correctly estimated.
Conclusions
This paper presented a novel two-step camera calibration method in a GPS/INS/Stereo camera integrated kinematic positioning and navigation system. The first step performs the camera autocalibration for a stereo system by employing two scale-restraint equations to constrain the matched features from two consecutive stereo pairs. The lens distortion parameters, the up-to-scale baseline length and the relative orientation between the two cameras are estimated using the least-squares method. The second step performs the system calibration, where the autocalibration estimates are fused with the blended GPS/INS solution to recover the camera lever-arms, the absolute scale of the camera and the boresight angles. The main advantage of the proposed method is that it is free from landmark parameters, which results in computation and memory savings. There are two main drawbacks in employing the scale-restraint equation instead of the collinearity equations for stereo autocalibration. First, the accuracy cannot be increased by performing loop closures when the same scene is revisited. Second, the scale-restraint equation is highly nonlinear, and therefore the LS estimator can diverge if a good approximation of the parameters is not available.
The results from the simulated and real road test data were presented and showed that the proposed autocalibration method requires fewer computation resources to achieve equal or better accuracy than applying the traditional collinearity equations, even when using more measurements.
References
Alcantarilla PF, Yebes JJ, Almazàn J, Bergasa LM (2012) On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In: Proceedings at the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, Minnesota, USA, 14–18 May 2012:1290–1297. doi:10.1109/ICRA.2012.6224690.
Barazzetti L, Mussio L, Remondino F, Scaioni M (2011) Targetless Camera Calibration. In: ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXVIII-5/W16, 2011, pp 335–342. doi:10.5194/isprsarchives-XXXVIII-5-W16-335-2011.
Bender D, Schikora M, Sturm J, Cremers D (2013) Graph-based bundle adjustment for INS-camera calibration. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-1/W2, 2013, UAV-g 2013, Rostock. doi:10.5194/isprsarchives-XL-1-W2-39-2013.
Bethel JS (2003) Photogrammetry and Remote Sensing. In: Chen WF, Liew JYR (ed) The Civil Engineering Handbook, 2nd Edition, Chapter 56. CRC Press, Boca Raton, London, New York, Washington D.C, p 2170.
Brown DC (1971) CloseRange Camera Calibration. Photogramm Eng 37(8):855–866
Bin Rais N, Khan HA, Kamran F, Jamal H (2003) A new algorithm of stereo matching using epipolar geometry. In: Proceedings at the Multi Topic Conference, 2003. INMIC 2003. 7th International, Islamabad, Pakistan, 9–9 December 2003:21–24. doi:10.1109/INMIC.2003.1416609.
Civera J, Davison AJ, Montiel JMM (2008) Inverse depth parametrization for monocular SLAM. IEEE Trans Robot 24(5):932–945. doi:10.1109/TRO.2008.2003276
Civera J, Bueno DR, Davison AJ, Montiel JMM (2009) Camera selfcalibration for sequential Bayesian structure from motion. In: Proceedings at the IEEE International Conference on Robotics and Automation, ICRA2009, Kobe. doi:10.1109/ROBOT.2009.5152719.
Dang T, Hoffmann C, Stiller C (2009) Continuous stereo selfcalibration by camera parameter tracking. IEEE Trans Image Process 18(7):536–1550. doi:10.1109/TIP.2009.2017824.
DurrantWhyte H, Bailey T (2006) Simultaneous localization and mapping (SLAM): part I the essential algorithms. IEEE Robot Autom Mag 13(2):99–110. doi:10.1109/MRA.2006.1638022
Fraser CS (2013) Automatic camera calibration in closerange photogrammetry. Photogramm Eng Remote Sens 79(4):381–388
Ghosh SK (2005) Fundamentals of Computational Photogrammetry. Concept Publishing Company, New Delhi, p 117119
Gopaul NS, Wang JG, Hu B (2014) Discrete EKF with pairwise timecorrelated measurement noise for imageaided inertial integrated navigation. ISPRS Ann Photogramm, Remote Sens Spat Inf Sci 2(2):61–66. doi:10.5194/isprsannalsII2612014.
Gopaul NS, Wang JG, Hu B (2015) Multiframe Visual Odometry in ImageAided Inertial Navigation System. J. Sun at al. (eds) China Satellite Navigation Conference (CSNC) 2015 proceedings. Lecture Notes Electr Eng 342(3):649–658. doi:10.1007/978366466322_57.
Harris C, Stephens M (1988) A combined corner and edge detection. In: Proceedings of the Fourth Alvey Vision Conference (1998)., pp 147–151
Indelman V, Melim A, Dellaert F (2013) Incremental Light Bundle Adjustment for Robotics Navigation. In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Tokyo. doi:10.1109/IROS.2013.6696615.
Kelly J, Sukhatme GS (2009) Visualinertial simultaneous localization, mapping and sensortosensor selfcalibration. In: Proceedings at the Computational Intelligence in Robotics and Automation (CIRA), 2009 IEEE International Symposium, Daejeon, South Korea, 15–18 December 2009: 360–368 doi:10.1109/CIRA.2009.5423178.
Kelly J, Matthies LH, Sukhatme GS (2011) Simultaneous mapping and stereo extrinsic parameter calibration using GPS measurement. In: Proceedings at the Robotics and Automation (ICRA), 2011 IEEE International Conference, Shanghai, China, 9–13 May 2011:279–286. doi:10.1109/ICRA.2011.5980443.
Keivan N, Sibley G (2014) Constanttime Monocular SelfCalibration. In: Proceedings at the IEEE International Conference on Robotics and Biomimetics (ROBIO) 2014. Bali, Indonesia, 5–10 December 2014:1590–1595. doi:10.1109/ROBIO.2014.7090561.
Konolige K, Agrawal M, Solà J (2011) Large scale visual odometry for rough terrain, In: The International Journal of Robotics Research 66:201212. doi:10.1007/9783642147432_18.
Lategahn H, Geiger A, Kitt B (2011) Visual SLAM for autonomous ground vehicles. In: Proceedings of the Robotics and Automation (ICRA), 2011 IEEE International Conference, Shanghai, China, 9–13 May 2011:1732–1737. doi: 10.1109/ICRA.2011.5979711.
Mirzaei FM, Roumeliotis SI (2008) A Kalman FilterBased Algorithm for IMUCamera Calibration: Observability Analysis and Performance Evaluation. IEEE Trans Robot 24(5):1143–1156. doi:10.1109/TRO.2008.2004486.
Qian K, Wang JG, Gopaul NS, Hu B (2012) Low Cost Multisensor Kinematic Positioning and Navigation System with Linux/RTAI. J Sens Actuator Netw 1(3):166–182
Remondino F, Fraser C (2006) Digital Camera Calibration Methods: Considerations and Comparisons. The International Archives of the Photogrammetry and Remote Sensing Commission V Symposium. Image Engineering and Vision Metrology. IAPRS 36(5):266–272, Dresden 25–27 September 2006
Scaramuzza D, Fraundorfer F (2011) Visual Odometry Part I&II: The First 30 Years and Fundamentals. IEEE Robot Autom Mag 18(4):80–92
Sibley G, Matthies L, Sukhatme G (2005) Bias Reduction and Filter Convergence for Long Range Stereo. In: 12th International Symposium of Robotics Research, San Francisco 28:285294. doi:10.1007/9783540481133_26.
Teller S, Koch O, Walter MR, Huang AS (2010) Ground robot navigation using uncalibrated cameras. In: Proceedings at the Robotics and Automation (ICRA), 2010 IEEE International Conference, Anchorage, USA, 3–7 May 2010: 2423–2430. doi:10.1109/ROBOT.2010.5509325.
Triggs N, McLauchlan P, Hartley R, Fitzgibbon A (1999) Bundle Adjustment—A Modern Synthesis. In: Proceedings at the International Workshop on Vision Algorithms ICCV 99. SpringerVerlag, pp 298–372. doi:10.1007/3540444807_21.
Veth M, Raquet J, Pachter M (2006) Stochastic constraints for efficient image correspondence search. IEEE Trans Aerosp Electron Syst 42(3):973–982. doi:10.1109/TAES.2006.4439212.
Wang JG, Qian K, Hu B (2015) An Unconventional Full TightlyCoupled MultiSensor Integration for Kinematic Positioning and Navigation, J. Sun at al. (eds) China Satellite Navigation Conference (CSNC) 2015 proceedings. Lecture Notes Electr Eng 342(3):753–765. doi:10.1007/9783662466322_65.
Williams B, Reid I (2010) On Combining Visual SLAM and Visual Odometry. In: Proceedings at the Robotics and Automation (ICRA), 2010 IEEE International Conference, Anchorage, USA, 3–7 May 2010: 3494–3500. doi:10.1109/ROBOT.2010.5509248.
Wolf PR, Dewitt BA (2000) Elements of Photogrammetry with Applications in GIS, 3rd edn.. ISBN 0072924543
Yu YK, Wong KH, Chang MMY, Or SH (2006) Recursive CameraMotion Estimation With the Trifocal Tensor. IEEE Trans Syst, Man, Cybern, Part B (Cybern) 36(5):1081–1090. doi:10.1109/TSMCB.2006.874133.
Acknowledgement
The authors would like to acknowledge the financial support through research grants provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada under its RGPIN Program. We would also like to thank Applanix Corp. for the use of their POSGNSS software.
Authors’ contributions
NG carried out the camera autocalibration and system calibration research. NG conceived, designed, implemented and tested the algorithms presented in this paper. JW and NG collected the GPS, IMU and image data used for the tests. NG drafted the manuscript. JW reviewed and edited the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Appendix A
From the principles of vector analysis, any four vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{d}$ in three-dimensional space are linearly dependent and can therefore be related to each other through a linear combination. This implies that, provided $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ span the space,

$$\mathbf{d} = \alpha\,\mathbf{a} + \beta\,\mathbf{b} + \gamma\,\mathbf{c}$$

where the scalar multipliers $\alpha$, $\beta$ and $\gamma$ follow from Cramer's rule as ratios of scalar triple products:

$$\alpha = \frac{\mathbf{d}\cdot(\mathbf{b}\times\mathbf{c})}{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})},\qquad
\beta = \frac{\mathbf{a}\cdot(\mathbf{d}\times\mathbf{c})}{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})},\qquad
\gamma = \frac{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{d})}{\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})}$$
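This decomposition of one vector into a basis of three others, with multipliers given by ratios of scalar triple products per Cramer's rule, can be verified numerically. The short Python sketch below is an illustration of the identity, not code from the paper; the random vectors and variable names are assumptions for the example.

```python
import numpy as np

# Random (almost surely non-coplanar) vectors a, b, c and a target d.
rng = np.random.default_rng(42)
a, b, c, d = rng.standard_normal((4, 3))

triple = lambda u, v, w: np.dot(u, np.cross(v, w))  # scalar triple product
denom = triple(a, b, c)                             # nonzero if a, b, c span R^3

# Cramer-rule multipliers for d = alpha*a + beta*b + gamma*c.
alpha = triple(d, b, c) / denom
beta = triple(a, d, c) / denom
gamma = triple(a, b, d) / denom

recon = alpha * a + beta * b + gamma * c
print(np.allclose(recon, d))  # True
```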
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Camera autocalibration
 Lens distortion
 Relative orientation
 Lever arms
 Boresight angles
 GPS
 IMU
 Scale restraint equation