0:47

And the black points are the feature points we used.

And then we superimposed the trajectory on a map of Philadelphia.

The only input we used was this panoramic camera,

the panoramic video you see on the top left.

1:22

In biological perception, we talk about path integration.

This is what animals and humans do when they don't have reference points,

when they cannot rely on place recognition.

They just integrate their path, and they know approximately how far they went.

1:46

What is visual odometry?

Visual odometry is the process of incrementally

estimating your position and orientation with respect to an initial reference

by tracking only visual features.

It sounds very similar to bundle adjustment.

The difference is that bundle adjustment can have very large baselines:

the images can come from different cameras, they can even be random images from the web,

while visual odometry uses a camera which you either hold

or which is mounted on a robot, and because the input is a video,

we can really exploit the continuity of the trajectory.

2:35

We also use the term visual SLAM.

Many people use the terms interchangeably, but when we say visual SLAM,

we put the focus not only on the trajectory but also on the feature map,

that is, the map of the visual features as they are triangulated in the world.

2:58

This is a rapidly widening field with many advances in the last 15 years.

It has not yet made it into textbooks, but there is a very good reference tutorial by

Davide Scaramuzza, and very recently, in December 2015,

there was the ICCV workshop on the future of real-time SLAM, and I really urge you

to visit the website of this workshop and look at all the slides.

3:25

The most successful application of visual odometry is probably on the planet Mars.

NASA has already sent three vehicles there.

Even the very first ones, Spirit and Opportunity, had to solve the following problem.

Even though there was some remote control from Earth to move

the vehicles, the delay in sending a command to Mars can be up to

20 minutes, so there is no way to really drive these vehicles with a joystick.

3:55

Now, how can a vehicle navigate on Mars?

There is no GPS there, so

the only thing we can do is really apply visual odometry.

So we might send some waypoints where the robot has to go.

But between two waypoints, the robot has to solve the visual

odometry problem.

Another big success of visual odometry is the vacuum cleaner called

Dyson 360 Eye, which uses an implementation of Andrew Davison's visual SLAM.

It uses an omnidirectional, 360-degree vision system which

captures this panoramic picture.

And then, using natural features in the environment,

it can find its position and then traverse a regular pattern while

knowing at every point where it is with respect to the first frame.

4:50

Now let's go back again to our equations, to the multiple-view setting.

We had calibrated point projections x_p and y_p for

frame f, and we have the unknown poses R, t and the 3D points X, Y, Z.

In visual odometry, given

an estimate R_k, t_k of the current camera pose, as well as the 3D points,

and having also the correspondences to the calibrated point projections,

we really need to update at every time step.

So we have a pose at time point k,

and we want to update it to the next time point.
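
As a reminder, the calibrated projection and the incremental update can be written as follows (a standard formulation; the exact notation on the course slides may differ slightly):

```latex
\lambda \begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix}
  = R_k \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} + t_k,
\qquad (R_k, t_k) \;\mapsto\; (R_{k+1}, t_{k+1}).
```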

5:35

And when we say visual odometry, by default we refer to monocular visual

odometry, using just one camera, and this means that when we don't use any

other sensor, we are still left with an unknown global scale.

The update is done in two steps:

one for the rotation and one for the translation.

5:56

First, when an incoming image at time point k+1 arrives,

we find features, and we try to find the right correspondences.

These correspondences have many outliers, so we need to apply RANSAC in order

to select the inliers, and usually we do it with what we call a minimal problem,

in this case by choosing five points,

sampling over five points and then applying the five-point algorithm.

6:27

After we find the inliers, we solve for

the epipolar geometry, which means we find the essential matrix E.

And then we can obtain a rotation estimate.

And the rotation estimate is really sufficient in order to update

the rotation.
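
As a rough illustration of these two steps, here is a minimal sketch using OpenCV's five-point RANSAC; the point arrays and intrinsics are placeholders, not data from the lecture:

```python
import cv2
import numpy as np

# Placeholder matched features in two successive frames, and intrinsics K.
pts_k = np.random.rand(100, 2)
pts_k1 = np.random.rand(100, 2)
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])

# Five-point algorithm inside RANSAC: the essential matrix and the
# inlier mask are estimated together.
E, inliers = cv2.findEssentialMat(pts_k, pts_k1, K,
                                  method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)

# Decompose E into (R, t); t comes back only as a unit direction,
# since the scale is unobservable from two views.
_, R, t_dir, _ = cv2.recoverPose(E, pts_k, pts_k1, K, mask=inliers)
```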

6:43

We also obtain a translation estimate, but

it is really not enough, because we don't know its scale.

So we cannot really apply this last equation.

For the translation, what we really need is an estimate of the 3D points.

So we first need a triangulation of the 3D points, and then we

can update the translation by using a PnP algorithm, a 2D-to-3D algorithm.

At this 2D-to-3D step, we have the option

whether to update the rotation and translation together, or only the translation.
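
This translation-update step could be sketched like this on synthetic data (a toy example, not the lecture's implementation): triangulate points from two views, then run PnP so the new pose inherits the metric scale of the map:

```python
import cv2
import numpy as np

K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
Xw = np.random.rand(50, 3) * 2 + [0, 0, 5]   # synthetic points in front

# Two camera matrices; the second pose is translated along x.
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])

def project(P, X):
    x = P @ np.vstack([X.T, np.ones(len(X))])
    return (x[:2] / x[2]).T

pts0, pts1 = project(P0, Xw), project(P1, Xw)

# Triangulation recovers the 3D points from the two projections.
X_h = cv2.triangulatePoints(P0, P1, pts0.T, pts1.T)
X = (X_h[:3] / X_h[3]).T

# PnP (2D-3D) recovers the pose; the 3D points carry the scale, so the
# resulting tvec is metric, unlike the two-view translation direction.
ok, rvec, tvec, inl = cv2.solvePnPRansac(X, pts1, K, None)
```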

7:19

So this is the main cycle of visual odometry:

we always have essential matrices between the frames.

We compute the relative translation and rotation between two successive images, and

then we need to integrate. Because we use these pairs of subsequent frames,

depending on the baseline, and depending on how many features we can track,

this might become very vulnerable to what we call drift,

the main problem of visual odometry. To really address this drift, what we do

is group a window of frames, like the last n frames, and

we apply a bundle adjustment.
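
A toy version of this windowed refinement, using SciPy's non-linear least squares over synthetic observations (a sketch only; a real implementation would exploit sparsity, use robust losses, and fix the oldest pose as gauge):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

n_poses, n_pts = 5, 20
obs = np.random.rand(n_poses, n_pts, 2)      # placeholder (x_p, y_p) per pose
pose0 = np.zeros((n_poses, 6))               # [rotation vector | translation]
pts0 = np.random.rand(n_pts, 3) + [0, 0, 5]  # initial 3D points

def residuals(params):
    """Reprojection error over the whole window (poses and points)."""
    poses = params[:n_poses * 6].reshape(n_poses, 6)
    pts = params[n_poses * 6:].reshape(n_pts, 3)
    res = []
    for i in range(n_poses):
        Rm = Rotation.from_rotvec(poses[i, :3]).as_matrix()
        Xc = pts @ Rm.T + poses[i, 3:]       # points in camera i's frame
        proj = Xc[:, :2] / Xc[:, 2:3]        # calibrated projection
        res.append((proj - obs[i]).ravel())
    return np.concatenate(res)

x0 = np.concatenate([pose0.ravel(), pts0.ravel()])
sol = least_squares(residuals, x0)           # non-linear least squares
```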

8:15

And this will create an excellent local map,

using the same reprojection error that we have used in bundle adjustment,

using pretty much these two equations in a non-linear least-squares setup.

Alternatively, we can apply a global filter over the whole sequence.

8:49

And we use the exponential of the skew-symmetric matrix of

the angular velocity to update the rotation.

And then we have an update for the translation using a velocity;

we assume that both velocities, the angular and the translational, are constant.
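
A minimal sketch of that constant-velocity propagation (placeholder values; `Rotation.from_rotvec` computes exactly the matrix exponential of the skew-symmetric matrix of omega*dt):

```python
import numpy as np
from scipy.spatial.transform import Rotation

omega = np.array([0.0, 0.0, 0.1])  # angular velocity (rad/s), placeholder
v = np.array([1.0, 0.0, 0.0])      # translational velocity (m/s), placeholder
dt = 0.04                          # frame interval (s)

R, t = np.eye(3), np.zeros(3)

# Rotation update: R <- exp([omega * dt]_x) R
dR = Rotation.from_rotvec(omega * dt).as_matrix()
R = dR @ R

# Translation update with constant velocity: t <- t + v * dt
t = t + v * dt
```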

9:06

For any filter approach, like the Kalman filter,

we need to also update the covariances, which are really estimates of the error.

We really need a good propagation of the error.

9:17

First, to make sure that we have some idea about how uncertain we are.

You might have seen these big circles around the GPS position

when your GPS measurement is uncertain.

This is the same as what we are going to do with visual odometry, but

first we also really want to know how uncertain our structure is, as we will see

in the next slide.

So if big Sigma is our covariance, and Sigma_{k,k-1}

is the covariance of the update between frame k-1 and frame k,

then, usually by pre- and post-multiplying with the Jacobian,

where the Jacobian has exactly the same meaning as in bundle adjustment,

we can update the covariance, and

we can really visualize it as an ellipsoid for the 3D points, an

ellipsoid for the position, and some other representation for the rotation.
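
In the notation of the visual odometry tutorial mentioned earlier, this propagation step is commonly written as follows (a standard form; the block partitioning is an assumption about the slide, with J the Jacobian of the pose-composition function):

```latex
\Sigma_k \;=\; J
\begin{pmatrix} \Sigma_{k-1} & 0 \\ 0 & \Sigma_{k,k-1} \end{pmatrix}
J^\top
```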

10:54

We have quite a large uncertainty in depth.

But when we move forward, if we are lucky enough to really

still track the same point, then with this very large baseline

we can have a very small uncertainty ellipsoid.

And in this case, this is really the point at which to update our translation estimate.

The frames where this happens are called keyframes,

and they are very important in visual odometry implementations.

11:33

Outliers appear because of illumination changes,

because of occlusions, or because we might be moving very, very fast.

And you see here an example of the trajectory that we reconstructed

without any inlier selection, which is the blue one, and

after we really select good inliers, which is the red trajectory.

11:56

Outliers also cause drift, because they really introduce biases

in the rotation and translation estimation, and for

the first time, in 2004, David Nistér, who invented the 5-point

algorithm, also provided us with a solution to

the inlier problem in the two-view case.

Now, choosing quintuples,

these groups of five points, inside RANSAC might be very expensive,

so we need some alternative, and the alternative is a different scheme

invented here at the University of Minnesota.

The idea is that if you know a direction, like the gravity from the IMU, or

just a point at infinity, then you already know two degrees of freedom

of the rotation, and the remaining problem has three degrees of freedom:

one for the yaw angle and two for the translation direction.

So what you do is, every time before you solve, you align the frames with

this direction, for example the gravity, and then you solve a

constrained problem whose rotation part has only one unknown angle,

the way you see it here. Then you have the skew-symmetric matrix of the translation,

which has only two unknowns, x and y.

And because you know only the direction of this (x, y), you just set it

as cosine theta and sine theta, and you have to solve a system

with four equations and four unknowns.

This can be solved much faster than the five-point algorithm, and

we can obtain much better RANSAC solutions

without spending most of our time in the inlier selection.
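
Schematically, if we assume the known direction is gravity along the y-axis (an assumption of this sketch, since the lecture does not fix the axis), the constrained epipolar problem looks like this:

```latex
E = [\,t\,]_\times R_y(\gamma), \qquad
R_y(\gamma) =
\begin{pmatrix}
\cos\gamma & 0 & \sin\gamma \\
0 & 1 & 0 \\
-\sin\gamma & 0 & \cos\gamma
\end{pmatrix},
```

with the yaw angle gamma and a unit translation direction t (two degrees of freedom) as the only unknowns, so three correspondences in the epipolar constraint x'^T E x = 0 give a minimal solution.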

14:02

And you see with your own eyes that they are the same point.

Then you really have to enforce in your system that this image has been seen before, and

actually that any error that you have, like in your estimated pose in this picture,

has to be corrected so that you come back to the same position where you started, for example.

14:22

This is an essential element of every visual odometry algorithm.

And it has two steps.

The first step is that you look, in the vicinity of every

pose you are at, whether there is a similar view somewhere around there,

which means whether you are revisiting the same place.

And we do it at the feature level,

for example with vocabulary trees.

And then, we just apply a geometric consistency check,

and possibly also a bundle adjustment, in order to correct all our poses so

that we are again at the correct pose and

we don't create a phantom map by hallucinating that we are in a different position.
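
A toy sketch of these two steps (the histograms, similarity measure, and threshold are all placeholders, not a real vocabulary-tree implementation):

```python
import numpy as np

def cosine_sim(a, b):
    """Appearance similarity between two bag-of-visual-words histograms."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def loop_candidates(bow_db, bow_query, sim_thresh=0.8):
    """Step 1: return past keyframes whose appearance matches the query."""
    return [i for i, h in enumerate(bow_db)
            if cosine_sim(h, bow_query) > sim_thresh]

# Step 2 (geometric consistency, not shown): for each candidate, match
# features, run the five-point RANSAC from before, accept the loop only
# if enough inliers survive, and then correct the poses, for example
# with a bundle adjustment over the loop.
```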

15:08

So, to repeat them here, these are the basic

ingredients of every visual odometry algorithm:

we do bundle adjustment over a window to minimize the drift;

we do a keyframe selection to really minimize the triangulation error;

we apply RANSAC with five points or

three points in order to select the inliers;

and last, if we are revisiting places,

we really have to adjust our position with what we call visual loop closing.

15:39

Now newer systems use additional information.

One of the systems here from the University of Minnesota

uses a combination of inertial and visual elements.

And you can see on the left the incoming images and, on the right, the trajectory.

This is using just a regular cell phone and

the inertial measurement unit inside the cell phone.

16:11

An inertial measurement unit measures the acceleration and the angular velocity.

And the acceleration, which is really meters per second squared, when

integrated together with the velocity, allows us to estimate our pose in terms

of meters, not in terms of an unknown global scale, as we have seen before.
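
A minimal sketch of why this gives metric scale (placeholder data; real systems must also remove gravity and estimate the accelerometer bias before integrating):

```python
import numpy as np

dt = 0.01                                   # 100 Hz IMU rate, placeholder
acc = np.tile([0.2, 0.0, 0.0], (100, 1))    # gravity-free acceleration, m/s^2

v, p = np.zeros(3), np.zeros(3)
for a in acc:
    v = v + a * dt   # first integration: velocity in m/s
    p = p + v * dt   # second integration: position in meters
```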

17:01

And in this case, you can see that we have the feature tracks on the right,

and the reconstructed trajectory projected on the left, including,

in blue, all the features which were reconstructed up to this point.

17:17

Another recent development in visual odometry

is the semi-direct approach, where in addition to features we use

the whole image directly when we compute the reprojection error for the motion.

This is a quite impressive video of a quadrotor, and

we see on the right the reconstruction of the trajectory of the vehicle, and

also the points that are reconstructed from the ground, and

the points as they are seen and tracked in the picture on the bottom left.

Probably the most recent and

successful application is the realization of visual-inertial odometry

in Project Tango, which started as a small cell phone from Google.

And now it is a tablet.

It captures an omnidirectional image and

inertial information along the trajectory of this tablet.

You see on the left the inertial measurements, and

on the right you see the reconstructed trajectory. The features are not that

many, they are here like 70 or 80 or 100, and

we can produce a quite accurate trajectory of the tablet.

18:39

Now, what is the future of visual SLAM?

In the future, visual SLAM, in addition to features and geometric information,

can really include some semantic information.

For example, we recognize the doors,

and we have some model for door recognition.

In this case here, taken inside the computer science building,

we see the reconstruction of the trajectory of the camera using features,

but also the doors, whose recognition you see as

bounding boxes, as well as the chairs in the environment.

19:17

Semantic information does not only help us

in semantic mapping, telling us where the doors and the chairs are,

but also allows us to solve the visual loop closure very efficiently.

Visual odometry is an application that we are going to use everywhere

in navigation wherever there is no GPS, and probably also in many virtual

reality setups where we track the head of a user.