About Me

I am a CS PhD student in robotics & machine learning at the Robotics & Embedded Systems Lab (RESL), advised by Gaurav Sukhatme. ...

Sep 30, 2015

Localization and Path Planning

This year, I did not do ET Robocon for my summer project and wanted to try something else. I took the opportunity to use my OMSCS course projects to do some fun robotics.

Localisation (Summer '15)

For my summer OMS course on mobile robotics, our team did a course project where a robot localised itself in a quasi-symmetric maze. We built the 9' x 9' maze and robot, shown below.

Actual maze

Maze schematic

Note that the maze is nearly symmetric. The only difference is that one block is boxed out in the top-right quadrant of the maze, whereas it is not boxed out in the bottom-left quadrant.

Custom built robot

First, we tried localisation using particle filters, where each particle is a pose hypothesis for the robot. Since this is a localisation problem, the map is already known to the robot. I implemented my own particle filtering algorithm in MATLAB (and later, C++), using encoder, gyroscope, and range-sensor measurements. The motion model used the gyro and encoders to update the particle states, and the measurement model used the range-sensor readings to compute the likelihood of the observation for each particle.
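For the curious, the two update steps can be sketched roughly like this. This is a minimal C++ illustration, not our actual code; in particular, `expectedRange`, which would ray-cast into the known maze map, is a hypothetical placeholder here.

```cpp
#include <cmath>
#include <random>
#include <vector>

// One particle: a pose hypothesis (x, y, heading) with an importance weight.
struct Particle { double x, y, theta, weight; };

// Motion update: propagate each particle by odometry (the encoders give the
// travelled distance, the gyro gives the heading change), plus noise.
void motionUpdate(std::vector<Particle>& ps, double dist, double dTheta,
                  std::mt19937& rng) {
    std::normal_distribution<double> distNoise(0.0, 0.02);
    std::normal_distribution<double> angNoise(0.0, 0.01);
    for (auto& p : ps) {
        p.theta += dTheta + angNoise(rng);
        double d = dist + distNoise(rng);
        p.x += d * std::cos(p.theta);
        p.y += d * std::sin(p.theta);
    }
}

// Measurement update: weight each particle by the likelihood of the observed
// range, assuming Gaussian sensor noise around the range the map predicts.
// expectedRange(p) would ray-cast into the known map; hypothetical here.
void measurementUpdate(std::vector<Particle>& ps, double observedRange,
                       double (*expectedRange)(const Particle&),
                       double sigma = 0.05) {
    double total = 0.0;
    for (auto& p : ps) {
        double err = observedRange - expectedRange(p);
        p.weight *= std::exp(-0.5 * err * err / (sigma * sigma));
        total += p.weight;
    }
    for (auto& p : ps) p.weight /= total;  // normalise weights to sum to 1
}
```

After these two steps, a resampling step draws a fresh particle set in proportion to the weights, which is where the adaptive trick described below comes in.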

Here is what the particle filter run looked like:

Because the maze is nearly symmetric, the robot has two likely hypotheses for its actual position (a bimodal distribution) until it reaches the unique feature in the bottom-left quadrant at 3:32. The maze was deliberately built this way to test the algorithm on such challenging cases.

Once the robot converges to a unimodal distribution, its tracking is fairly straightforward from that point on, and it moves on to the top-left quadrant of the maze, from where it exits.

However, this took a fairly long time to compute and had to be done offline. We addressed two things to make it faster. First, we switched from MATLAB to C++.

Second, I optimised the particle filter by dynamically adjusting the number of particles generated at every resampling step. That way, the number of particles drops once the robot has localised itself - this is known as particle filtering with KLD sampling, or Adaptive Monte Carlo Localisation (AMCL). Here are the results from AMCL (the video is sped up; please watch at 0.25x):
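The core of KLD sampling is a statistical bound on sample size: bin the particles into a histogram over the state space and, given k occupied bins, compute how many particles keep the KL-divergence between the particle approximation and the true posterior below a threshold with high probability. A rough sketch of that bound (the parameter values and function name here are illustrative, not our exact implementation):

```cpp
#include <algorithm>
#include <cmath>

// KLD-sampling bound: given k occupied histogram bins, return the number of
// particles needed so the KL-divergence between the sample-based and true
// posterior stays below eps with probability (1 - delta). zQuantile is the
// upper (1 - delta) quantile of the standard normal, e.g. ~2.326 for
// delta = 0.01. min/max bounds are illustrative.
int kldSampleCount(int k, double eps = 0.05, double zQuantile = 2.326,
                   int minParticles = 3000, int maxParticles = 40000) {
    if (k <= 1) return minParticles;
    double a = 2.0 / (9.0 * (k - 1));
    double b = 1.0 - a + std::sqrt(a) * zQuantile;
    int n = static_cast<int>(std::ceil((k - 1) / (2.0 * eps) * b * b * b));
    return std::clamp(n, minParticles, maxParticles);
}
```

Intuitively, when the particles are spread across the whole maze they occupy many bins and k is large, so the bound demands many particles; once they collapse onto one pose, k is small and the count falls to the floor.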

Notice the numParticles parameter in the bottom-right. It goes from ~40000 particles (very high uncertainty) down to only 3000! It could've gone lower, but I set a lower limit for robust tracking.

After AMCL, I went a step further to better handle the bimodal distribution we get in the initial stages. I implemented a clustering algorithm to find the two modes of the distribution and return the two possible locations of the robot.
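One simple way to extract the two modes is a tiny k-means with k = 2 over the particle positions. This is just a sketch of the idea, not necessarily the exact clustering I used:

```cpp
#include <cmath>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Two-mode extraction via k-means (k = 2) over particle positions. Returns
// the two cluster centres; labels[i] records which mode particle i belongs
// to. Initialises the centres from the first and last particle, which
// assumes the two modes are spatially separated (as in the quasi-symmetric
// maze, where they sit in opposite quadrants).
std::pair<Point, Point> twoModes(const std::vector<Point>& pts,
                                 std::vector<int>& labels, int iters = 20) {
    Point c[2] = {pts.front(), pts.back()};
    labels.assign(pts.size(), 0);
    for (int it = 0; it < iters; ++it) {
        // Assignment step: each particle joins its nearest centre.
        for (size_t i = 0; i < pts.size(); ++i) {
            double d0 = std::hypot(pts[i].x - c[0].x, pts[i].y - c[0].y);
            double d1 = std::hypot(pts[i].x - c[1].x, pts[i].y - c[1].y);
            labels[i] = d1 < d0 ? 1 : 0;
        }
        // Update step: move each centre to the mean of its cluster.
        for (int j = 0; j < 2; ++j) {
            double sx = 0, sy = 0;
            int n = 0;
            for (size_t i = 0; i < pts.size(); ++i)
                if (labels[i] == j) { sx += pts[i].x; sy += pts[i].y; ++n; }
            if (n > 0) c[j] = {sx / n, sy / n};
        }
    }
    return {c[0], c[1]};
}
```

A weighted version (using the particle weights in the means) would be a natural refinement, since the two modes rarely hold exactly half the probability mass each.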

It worked satisfactorily, and you can see the two clusters coloured differently in the bimodal case.

Of course, this is all simulation - but we also ran our robot with PF localisation. My teammate created a path-planning solution using an MDP formulation with value iteration (in the fall, I created my own MDP-based path planner). Using localisation and path planning together, this is how the robot performed.

Pretty sweet, right? We put all this stuff together, and created a snazzy final submission video. Our TA told us that our instructor Sebastian Thrun showed this video to his colleagues at work!

Path-Planning (Fall '15)

Finally, I wanted to do the path planning myself and learn a thing or two about MDPs. So in the fall, I took machine learning and used a reinforcement-learning assignment to formulate path planning in our maze as an MDP, using BURLAP.

I then used techniques such as policy iteration and value iteration to solve the MDP. Here is the resulting policy map.
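For a flavour of what value iteration does under the hood (BURLAP handles all of this in Java; the sketch below is an illustrative toy, not the assignment code), here is the classic Bellman backup on a small grid MDP with a -1 step cost, so the optimal value of a cell is minus its shortest-path length to the goal:

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Value iteration on a grid MDP: states are free cells ('.'), '#' is a wall,
// actions are the four compass moves. Reward is -1 per step and 0 at the
// absorbing goal, so the optimal value is minus the shortest-path length.
// Transitions are deterministic for simplicity; moving into a wall leaves
// the state unchanged. Assumes the goal is reachable from every free cell.
std::vector<double> valueIteration(const std::vector<std::string>& grid,
                                   int goalR, int goalC,
                                   double gamma = 1.0, double tol = 1e-6) {
    int R = grid.size(), C = grid[0].size();
    std::vector<double> V(R * C, 0.0);
    const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
    double delta;
    do {
        delta = 0.0;
        for (int r = 0; r < R; ++r) {
            for (int c = 0; c < C; ++c) {
                if (grid[r][c] == '#' || (r == goalR && c == goalC)) continue;
                double best = -1e18;
                for (int a = 0; a < 4; ++a) {
                    int nr = r + dr[a], nc = c + dc[a];
                    if (nr < 0 || nr >= R || nc < 0 || nc >= C ||
                        grid[nr][nc] == '#') { nr = r; nc = c; }  // bump: stay
                    best = std::max(best, -1.0 + gamma * V[nr * C + nc]);
                }
                delta = std::max(delta, std::abs(best - V[r * C + c]));
                V[r * C + c] = best;  // Bellman backup, in place
            }
        }
    } while (delta > tol);
    return V;
}
```

The greedy policy with respect to the converged values (pick the action whose backup attains the max in each cell) is exactly the kind of arrow map shown above.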

Policy map for value iteration

I then used the reinforcement learning tools in BURLAP to solve the same problem using Q-learning and SARSA. Here is the Q-learning policy that I submitted in the assignment.
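The tabular update rules behind the two learners fit in a few lines each. A sketch (the learning rate and discount values are illustrative, not the ones I used in BURLAP):

```cpp
#include <algorithm>
#include <vector>

// Tabular Q-learning update: the agent never sees the transition model or
// reward function up front; it samples (s, a, r, s') tuples and bootstraps
// toward r + gamma * max_a' Q(s', a') (off-policy: the greedy next action).
void qLearningUpdate(std::vector<std::vector<double>>& Q,
                     int s, int a, double r, int sNext,
                     double alpha = 0.1, double gamma = 0.99) {
    double bestNext = *std::max_element(Q[sNext].begin(), Q[sNext].end());
    Q[s][a] += alpha * (r + gamma * bestNext - Q[s][a]);
}

// SARSA differs only in bootstrapping toward the action actually taken next
// (on-policy), which is why it needs the extra aNext argument.
void sarsaUpdate(std::vector<std::vector<double>>& Q,
                 int s, int a, double r, int sNext, int aNext,
                 double alpha = 0.1, double gamma = 0.99) {
    Q[s][a] += alpha * (r + gamma * Q[sNext][aNext] - Q[s][a]);
}
```

Contrast this with the value iteration backup, which sweeps over every state and every action using the known model; the learners only touch the state-action pairs they happen to visit, which is exactly why their policies take longer to look as clean.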

Policy map for Q learning

Notice that the Q-learning policy is not as good as the one from value iteration. This is because in RL the rewards and transitions are not known beforehand for all states, unlike in value/policy iteration. I won't go into much more detail here, but I used this MDP to showcase the differences between value iteration and Q-learning in results, performance, and so on.

This is how I was able to extend my project into my coursework and actually apply the course theory on my robots! Thanks for reading!