Where am I

Where am I? Project Writeup


The second project in term 2 of the Udacity Robotics Nano Degree program requires students to complete a hijacked robot scenario using ROS and Gazebo.

Students are to initially follow instructions for a building a reference robot model, before tuning the localisation parameters satisfactorily such that the robot within the Gazebo maze simulation can reach an end goal.

After completion of that, a new robot model is created with alterations to the base and position of sensors. Whereby it uses the same simulation and must reach the same end goal.


In this project a robot model has to use localisation to work out where it is. It creates a rolling local map using laser range sensors. The local map in turn is used to navigate towards a navigation goal.

The navigation stack utilised move_base. It provides a local cost map, as the robot moves, in relation to a global cost map to define a continuous path for the robot to move along.

This project utilises Gazebo to create a simulation with a map provided by Clearpath Robotics

Once the robot has reached the navigation goal, the objective of the project has been achieved.


A robot needs to understand where it is in a world, to be able to make navigation plans to get from point a, to point b, whilst avoiding obstacles. The process of understanding “where am I?” (from a robots perspective) is called localisation.

This project uses a localisation package built into ROS called Adaptive Monte Carlo Localisation (AMCL) to assist with the robot in a scenario to work out where it is. Hence the project name “Where am I?”.

AMCL is a variant of the Monte Carlo Localisation (MCL) which was learnt in the course material. MCL uses particles to localise the robot pose. It has several advantages over using Extended Kalman Filters (EKF) such as uses raw measurements (ie from lasers), is not reliant on gaussian noise, is memory and time efficient, and can perform global localisation.

The AMCL package adaptively alters the number of particles used, which has the advantage of reducing the computational overhead required.


We first completed this exercise using the class room example which is named udacity_bot. Then a different version was created where the base and sensor locations were changed. This was named nick_bot and was launched using udacity_world_nick.launch.

The final results of when it reached the goal state follow.


udacity_bot rviz
udacity_bot rviz


nick_bot rviz
nick_bot rviz

Model Configuration

This discussion is for the configuration of nick_bot.

The nick_bot was a square version of the rectangle based udacity_bot. The laser sensor was moved to the front of the robot.

The amcl_nick.lauch. Wherever possible the same configuration and parameters as the udacity_bot used amcl.launch.

min_particles was set to 25 and max_particles to 200 to not heavily utilise CPU. A higher max_particles did not improve initial ability of the robot to find itself with certainty.

odom_alpha1 to odom_alpha4 were trial & error values and changed from the default of 0.2 (there was not much documentation about).

The laser model parameters were left as default. There appeared no reason to change them as they were clearly visible on the above rviz visualisations and aligned to the barriers.

The yaw_goal_tolerance and xy_goal_tolerance were doubled from the default values to allow for additional flexibility in trajectory planning.


transform_tolerance was set 1.25 and update_frequency to 3.0 for both local and global.

obstacle_range was set to 1.5, raytrace_range to 4.0 and inflation_radius to 0.65 to enable sufficient space on the cost map for the robot to navigate.

robot_radius was set to 0.4 in this model to allow for the larger square design.


The local publish_frequency was set to 3.0 with the global to 5.0 This configuration in conjunction with 15.0 x 15.0 sized local cost map was able to function within the performance constraints of the system used. Increasing the size of the local cost map utilise significant more computer power and missed the time windows for publish_frequency.


the update and publish frequency were set per above. In addition the width and height were set to the map size.


sim_time was set to 4 as there appeared sufficient compute resource to estimate a trajectory out 4 seconds.

meter_scoring was enabled to ensure pdist_scale used meters.

pdist_scale was set to 0.5 being less than the default of 0.6. Default and higher values appeared to cause the robot to sometimes get stuck.


The robot model was able to on most runs navigate successfully to the goal. The route taken at times could have been shortened. However further research is required into the ROS packages used to be able to tune it to achieve such. Often the observed path taken did not appear to be the most cost affective and when it missed the target goal, it would do a large sweep before re-approaching the target.

The AMCL routine appeared to quickly gain certainty about the locality of the robot. It was the rest of the navigation stack and move_base that needed further tuning.

In the kidnapped robot problem where by a robot is positioned in an arbitrary location, AMCL would be able to adapt the number of particles used to gain certainty of the robot’s location. In addition AMCL does not rely on landmarks, but on laser based maps, laser scans and transform messages to output pose estimates.

Moreover in an environment, where by there weren’t known landmarks, the AMCL has advantages over the Extended or Unscented Kalman Filter based approaches. These environments would include those with no known map (ie its the first time being navigated) or in highly unstructured environments with lots of moving structures over time eg shopping centres with popup shops in the aisles with significant foot traffic.

Future Work

The size of the local cost map had a significant impact on performance. Higher values decreased the ability to publish within the frequency required. In addition higher particle numbers whilst increasing CPU load did not reduce the time taken for the AMCL to be certain about the robots locality.

A square designed robot model as opposed to a rectangular design appeared to enable the robot to rotate more affectively around its base. Other wheel components are available but they did not publish ODOM information. Hence additional sensors might be required with further investigation into the impact of the removal of ODOM readings required.

A laser GPU gazebo component was available. This may enable reduced CPU load by moving the workload to the GPU.

A LIDAR unit would give a more complete map of the world around the robot. One could also look to include more laser sensors to map behind and to either side of the robot. This should facilitate with the creation of a more complete motion plan without the need for the robot to map it first.

When implementing this type of project on real hardware, the mobile nature of a robot, requiring it also to contain its own power sources, means that efficient usage of a CPU and GPU are a must. The higher the utilisation, the less effective time the robot will have to perform activities.

Thus whilst implementing more sensors may provide a more detailed and accurate map, in the field this would further drain the power source. Hence careful consideration is needed for the number of sensors, the total power consumption of the sensors and compute work loads, as well as the impact the quantity of data has to the compute utilisation rates whilst mobile.

If there no significant improvement in performance for the design objective and operational goals of the robot, then a more minimalist sensor and localisation cost map configuration may be appropriate.


Robotic Inference Project Writeup


The first project in term 2 of the Udacity Robotics Nano Degree program requires students to initiate their own inference project inclusive of data acquisition. The project builds on the initial reference project for digit image recognition inside the supplied Nvidia Digits environment.

The project ideas are the student’s own and must have at least 3 classification categories eg defective item vs normal item with classes (no item, defective item, normal item).


Pedestrian and bicycle lanes are often crowded with many people not aware of or selectively ignoring the signage. It can lead, to an unsafe or hazardous environment, for all that use it with police officers reluctant to enforce the rules via fines.

The concept selected, in this project, was to classify an image as either containing a pedestrian, not-pedestrian or background.

The goal being, that some sought of visual representation via a screen with a smile or a frown, could be given to act, as a robotic traffic controller. Other potential instantiation could include a torso using upper body movement to signal good or bad behaviour.

Background / Formulation

During the initial inference task, on supplied data, GoogLeNet was chosen as it had a good inference rate per image with reasonable accuracy. Using the Adam Optimiser with an initial learning rate of .001, it was able to meet the numerical requirements of inference time below 10 ms with accuracy > 75%. The input used for this reference model in Nvidia’s DIGITS was 256×256 3 channel colour images.

Similar requirements of accuracy would be required for this inference project. It was not necessary to be a 100% accurate as a smiling or frowning face at least makes people think about what they are presently doing. It was not going to be used to issue fines or other enforcement notices. Video cameras would stream image data between 24-30fps which means that an Inception, VGG model and some ResNet models may be too slow for inference in real time.

With the additional perception that colour could also be useful in detecting pedestrian vs not-pedestrian, GoogLeNet was again chosen for this project using the Adam Optimiser with an initial learning rate of .001.

Other experimentation with using a higher initial learning rate of 0.01 with the above configuration over 5 epochs did not improve validation accuracy which remained around 50%. Similarly AlexNet over 5 epochs with the same Adam optimiser and learning rate did not increase accuracy. One experiment was performed using GoogLeNet with RMSProp optimiser and an initial learning rate 0.001 which did not result in improvement of validation accuracy but did have significantly higher training loss so was also not progressed.

Data Acquisition

A GoPro mounted on a tripod was used. It was positioned on the side of a pedestrian esplanade at Surfers Paradise, Gold Coast, Australia. As it was summer holidays a reasonable amount of varying traffic was expected. The background looked over the ocean to have a consistent image where there was not going to be movement (besides cloud) other then what was on the esplanade.

GoPro Data Acquisition
GoPro Data Acquisition

Three angles (facing left, centre and right) were used for capture per the following graphic:

Camera Background Angles
Camera Background Angles

The GoPro was setup in wifi mode for time lapse capture, which was controlled via an iPhone. Initially 2 seconds elapsed was used, which eventually was dropped to 0.5. Using the GoPro time-lapse feature, meant that individual jpeg files were captured as opposed to a MP4 video.

Using the iPhone to control the control, meant that I could visualise what was coming before starting the next capture batch.

After the capture the images were manually placed into a directory for each category.

Image data was captured for the three categories background (322), pedestrian (349) and not-pedestrian (94). It became apparent at this time that not enough not-pedestrian image data had been captured. This was mainly due to the initial 2 seconds elapsed time used. Due to high heat and humidity of the Australian summer, in the following afternoons, it was not practical to capture more data from the same spot.

Skateboarders were placed in the pedestrian category.

An example of a pedestrian and not-pedestrian follows. Background examples are as above.

not-pedestrian example
not-pedestrian example
pedestrian example
pedestrian example

A jupyter notebook was used to create a generator to supplement the data by randomising the image brightness, randomly flipping the images vertically and jittering the images randomly
in the x (by or – 25 pixels),y (by or – 50 pixels) planes to create supplemental image data.

The images were also resized to 256 x 256 and saved as PNGs.

The final generated supplemental data had 2000 not-pedestrian, 1000 pedestrian with 1000 background images.


The initial inference task, on supplied data, GoogLeNet was chosen as it had a good inference rate per image with reasonable accuracy. Using the Adam Optimiser with an initial learning rate of .001, it was able to meet the numerical requirements of inference time below 10 ms (~5 ms actual) with accuracy > 75% (75.40984% actual).

During training of the initial inference task 100% validation was achieved per the following training graph after 5 epochs.

Training Graph
Training Graph

However similar training results were not achieved for this inference project on captured data. The following training graph after 10 epochs follows

Project Training Graph
Project Training Graph

This had a validation accuracy of ~50%.

The following are the results of two randomly selected images per classification category uploaded as original high-res jpeg images from the GoPro.

Inference Background Sample
Inference Background Sample
Inference Not-Pedestrian Sample
Inference Not-Pedestrian Sample
Inference Pedestrian Sample
Inference Pedestrian Sample

The indicative inspection suggests that there was insufficient data to get a result > 75% for this project at this time. It also appears that the model as trained can not distinguish between pedestrian and not-pedestrian but can distinguish a background image.

Inference times were not tested separately as GoogLeNet is known to have a fast inference time which would be sufficient for this project.


The dataset collected did not have enough sample images. This was as a result of using time-lapse with too high a value. In hind site, a combination of 0.5-1 sec time-lapse for slow moving pedestrians with >30 FPS video for higher speed moving non-pedestrians would of allowed for more data.

Of note is when there is a combination of non-pedestrian and pedestrian in the same frame led to the though of potentially using object detection first to find a window to classify. This was not implemented in this version however it would have led to a more accurate ability to classify as the background would be eliminated ie if no objects detected it must be a back ground image.

For this project the duration of a consistent display of say 2 to 3 seconds to the passing pedestrian and non-pedestrian traffic would drive the ultimate inference time required. It would suggest that it needs to be an average classification of a 1 second or two when leading up to where the video is captured.

Of note were skateboarders. There is only the skateboard (which has a low profile in the image) that distinguishes it from a pedestrian as velocity is not taken account with single images.

In addition the depth (away from the camera) of the traffic passing by changes the size of the object that needs to be classified. Pre-filtering and zooming these to a consistent size may improve accuracy.

Future Work

Whilst the project did not achieve a good train validation result, it has laid the foundation for future iterations. There is potential to capture more data and use object detection to refine the training and inference steps of the project.

Providing soft means to monitor and influence peoples decisions regarding signage and the associated rules for safe usage of (non-vehicle) transit paths would potentially be received positively by the community. Police forces lack the budget or people to enforce these rules and they are reluctant to issue minor infringement notices (with potential to destroy good will in the community). Hence a more subtle robotic person could improve the situation in a cost effective means.

The size of this market is unknown. However speed cameras using smiley and sad faces are used around the local area where I live. They are having a positive impact on driver behaviour in the areas deployed. Thus there is a market for a more community friendly and automated means to impact on behaviour.

3D Perception Project


Gazebo PR2 3D Perception
Gazebo PR2 3D Perception

The goal of this project was to create a 3D Perception Pipeline to identify and label the table objects using the PR2 RGBD (where D is Depth) camera.

Exercise 1, 2 and 3 Pipeline Implemented

For this project the combined pipeline was implemented in perception_pipeline.py

Complete Exercise 1 steps. Pipeline for filtering and RANSAC plane fitting implemented.

The 3D perception pipeline begins with a noisy pc2.PointCloud2 ROS message. A sample animated GIF follows:

Noisy Camera Cloud
Noisy Camera Cloud

After conversion to a PCL cloud a statistical outlier filter is applied to give a filtered cloud.

The cloud with inlier ie outliers filtered out follows:

Cloud Inlier Filtered
Cloud Inlier Filtered

A voxel filter is applied with a voxel (also know as leaf) size = .01 to down sample the point cloud.

Voxel Downsampled
Voxel Downsampled

Two passthrough filters one on the ‘x’ axis (axis_min = 0.4 axis_max = 3.) to remove the box edges and another on the ‘z’ axis (axis_min = 0.6 axis_max = 1.1) along the table plane are applied.

Passthrough Filtered
Passthrough Filtered

Finally a RANSAC filter is applied to find inliers being the table and outliers being the objects on it per the following

# Create the segmentation object
seg = cloud_filtered.make_segmenter()

# Set the model you wish to fit

# Max distance for a point to be considered fitting the model
max_distance = .01

# Call the segment function to obtain set of inlier indices and model coefficients
inliers, coefficients = seg.segment()

# Extract inliers and outliers
extracted_inliers = cloud_filtered.extract(inliers, negative=False)
extracted_outliers = cloud_filtered.extract(inliers, negative=True)
cloud_table = extracted_inliers
cloud_objects = extracted_outliers

Complete Exercise 2 steps: Pipeline including clustering for segmentation implemented.

Euclidean clustering on a white cloud is used to extract cluster indices for each cluster object. Individual ROS PCL messages are published (for the cluster cloud, table and objects) per the following code snippet:

    # Euclidean Clustering
    white_cloud = XYZRGB_to_XYZ(cloud_objects)
    tree = white_cloud.make_kdtree()

    # Create a cluster extraction object
    ec = white_cloud.make_EuclideanClusterExtraction()

    # Set tolerances for distance threshold
    # as well as minimum and maximum cluster size (in points)
    # Search the k-d tree for clusters
    # Extract indices for each of the discovered clusters
    cluster_indices = ec.Extract()

    # Create Cluster-Mask Point Cloud to visualize each cluster separately
    #Assign a color corresponding to each segmented object in scene
    cluster_color = get_color_list(len(cluster_indices))

    color_cluster_point_list = []

    for j, indices in enumerate(cluster_indices):
        for i, indice in enumerate(indices):

    #Create new cloud containing all clusters, each with unique color
    cluster_cloud = pcl.PointCloud_PointXYZRGB()

    # Convert PCL data to ROS messages
    ros_cloud_objects = pcl_to_ros(cloud_objects)
    ros_cloud_table = pcl_to_ros(cloud_table)
    ros_cluster_cloud = pcl_to_ros(cluster_cloud)

    # Publish ROS messages

Complete Exercise 3 Steps. Features extracted and SVM trained. Object recognition implemented.

Features were captured in the sensor_stick simulator for [‘biscuits’, ‘soap’, ‘soap2’, ‘book’, ‘glue’, ‘sticky_notes’, ‘snacks’, ‘eraser’] model names with 40 sample of each captures.

hsv color space was used a combination of color and normalised histograms per

# Extract histogram features
chists = compute_color_histograms(sample_cloud, using_hsv=True)
normals = get_normals(sample_cloud)
nhists = compute_normal_histograms(normals)
feature = np.concatenate((chists, nhists))
labeled_features.append([feature, model_name])

The colour histograms where produced with 32 bins in the range (0, 256) and the normal values with 32 bins in the range (-1, 1.).
The full training.set was used in train_svm.py where I replaced the standard sum.SVC(kernel='linear') classifier with a Random Forest based classifier.

clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
                           min_samples_split=2, random_state=0)

It dramatically improved training scores per the following normalised confusion matrix:

normalised confusion matrix
normalised confusion matrix

The trained model.sav was used as input into the perception pipeline where for each cluster found in the point cloud, histogram features were extracted as per the training step above and used in prediction and added to a list of detected objects.

# Make the prediction, retrieve the label for the result
# and add it to detected_objects_labels list
prediction = clf.predict(scaler.transform(feature.reshape(1, -1)))
label = encoder.inverse_transform(prediction)[0]

Pick and Place Setup


labeled test1 output
labeled test1 output



labeled test2 output
labeled test2 output



labeled test3 output
labeled test3 output



It was interesting to learn about using point clouds and to learn this approach. I found occasionally there was some false readings. In addition few of the objects were picked and placed in the crates (the PR2 did not seem to grasp them properly). This may mean that further effort is needed to refine the centroid selection of each object.

Whilst I achieved average ~90% accuracy, across all models, on the classifier training, with more time spent, I would have liked to have achieved closer to 97%. This would also improve those false readings. I’m also not sure I fully understand the noise introduced in this project from the PR2 RGBD camera.

If I was to approach this project again, I’d be interested to see how a 4D Tensor would work via deep learning using YOLO and/or CNNs. Further research is required.

Not all software development is the same

Having completed, as one of the first, the Udacity Self Driving Car Nano Degree in October 2017, I thought I’d share some of the things I learnt along the way.

Rather than recap the projects, over the three terms, I’m going to focus in this post, on the philosophy and attitude I developed, to complete the nano degree program.

When I was first accepted, my initial reaction was geez have I bitten off more than I can chew. How am I going to cope with the mathematics and the theoretical side. It was a major concern.

Whilst at school I always had excelled at maths, and it was what led me initially into computing at a young age. I used to love writing graphics routines and optimising them. As my maths skills improved I learnt new ways of drawing circles and objects. I can’t remember if I got into Vectors. Yet past school, having started working in corporate IT, I had little use for maths skills besides that which was needed for Accounting. Yes for a number of years, IT dumbed down my maths skills.

IT was more focused on entering data, storing it and reporting on it at some monthly and yearly aggregate levels. Sure I worked on near realtime and mission critical systems but the need for very strong maths skills was limited. It was not a choice of my own, it was just that the technology that did leverage Maths, was perceived as scientific or too risky to adopt by business. It just didn’t have priority or urgency. Or if it was implemented it was a black box, that you supplied some input to, and you just consumed the output.

Getting back to the Self Driving Car Nano Degree, it was these black boxes that were our projects. In the project we needed to create the black boxes, to understand the theory and the mathematics.

Before starting the Nano Degree, I brushed up on matrices and vectors using the Kahn Academy.

Occasionally I got a little stuck on the mathematical proofs but once I understood the code for the maths, I normally was ok. Yes my brain now works off of code, not maths. We experienced some numerical instability, which was normally solved by interacting with others on the slack channels.

It was hard at times being the first going through the material. However with patience, with continuously reviewing the material, it reinforced what was being taught. You had to be methodical and test each assertion you were making about your code. Sometimes it required taking the algorithm and implementing in a repeatable test case inside a Jupyter Notebook. I found visualising the data improved understanding and helped to identify if anything was erroneous.

You could spend ages looking at the code and not see any obvious mistake. Without visualising the output, an easy mistake such as an incorrect sign in a rotation matrix, was not easy to observe.

The most valuable tool for when you got stuck was slack and your fellow students. These fellow students were online at all hours of the day, from across the globe.

After a few projects, I soon found an approach, that worked for me. It boiled down to learning, writing some code, seeing what happened, fixing what was broken, validating my learning and repeating until I had a project that met requirements.

Getting stuck, sometimes meant having a break, or having a late night. If I was really into tuning, it often meant the late night. Tweaking and trying different settings to get the Neural Network or Algorithm to achieve what you needed, was addictive. It was so much better, then reading or watching a video. The impact of changing your code was visible, in most projects in the simulator.

Your code didn’t produce a report, it produced observable action! It was like when I was a kid programming graphics for the first time.

So if your the type that likes to write those black boxes that other programmers use, you’ll excel at this Nano Degree. If your the type that consumes black boxes, that others have written, you may need to change your outlook.

Search and Sample Return

Robotics Nano Degree

Udacity - Robotics NanoDegree Program

Rover simulator output
Rover simulator output

The goal of this project were to use perception and decision steps to control a rover in a simulator. Perception occurs via using computer vision techniques to determine navigable terrain and then make decisions to take Action on the rover.

Its the first project of the Robotics Nano Degree program. I ran my simulator in 1600×1200 resolution. Different resolution may impact on the performance of the model in this project.

Notebook Analysis

The first step was to perform some analysis in a jupyter notebook on sample/calibration data.

Run the functions provided in the notebook on test images (first with the test data provided, next on data you have recorded). Add/modify functions to allow for color selection of obstacles and rock samples.

This step involved loading the data

Calibration data with grid
Rock Sample

and then doing a perspective transform to get a birds eye view.

Warped Example
Warped Rock

A function color_thresh was provided to do color thresholding (defaulted to RGB channels > 160). It was used as the basis to create an obstacle_thresh method (which selected the inverse ie RGB color channels <= 160). A rock_thresh method was created that selected between min and max color channels. The image color channels are converted from RGB to YUV before being used via warped_rock_yuv=cv2.cvtColor(warped_rock, cv2.COLOR_RGB2YUV).

Warped Threshed (white shows what is navigatable)
Warped Threshed (white shows what is navigatable)
Obstacle Threshed (white shows obstacle)
Obstacle Threshed (white shows obstacle)
Rock Threshed (white shows rock)
Rock Threshed (white shows rock)

Populate the process_image() function with the appropriate analysis steps to map pixels identifying navigable terrain, obstacles and rock samples into a worldmap. Run process_image() on your test data using the moviepy functions provided to create video output of your result.

1) Define source and destination points for perspective transform
dst_size = 5 
bottom_offset = 6
source = np.float32([[14, 140], [301 ,140],[200, 96], [118, 96]])
destination = np.float32([[img.shape[1]/2 - dst_size, img.shape[0] - bottom_offset],
              [img.shape[1]/2   dst_size, img.shape[0] - bottom_offset],
              [img.shape[1]/2   dst_size, img.shape[0] - 2*dst_size - bottom_offset], 
              [img.shape[1]/2 - dst_size, img.shape[0] - 2*dst_size - bottom_offset],
2) Apply perspective transform

a warped image is created using the source and destination points from above warped = perspect_transform(img, source, destination)

3) Apply color threshold to identify navigable terrain/obstacles/rock samples

The thresh_min and thresh_max values were determined via an interactive cell in the notebook.

threshed = color_thresh(warped)
obstacle_threshed = obstacle_thresh(warped)
warped_yuv=cv2.cvtColor(warped, cv2.COLOR_RGB2YUV)
thresh_min=(0, 38, 153)
thresh_max=(145, 148, 170)
rock_threshed = rock_thresh(warped_yuv, thresh_min, thresh_max)
4) Convert thresholded image pixel values to rover-centric coords
xpix, ypix = rover_coords(threshed)
xpix_obst, ypix_obst = rover_coords(obstacle_threshed)
xpix_rock, ypix_rock = rover_coords(rock_threshed)
5) Convert rover-centric pixel values to world coords
world_size = data.worldmap.shape[0]
scale = 12
xpos = data.xpos[data.count]
ypos = data.ypos[data.count]
yaw = data.yaw[data.count]

xpix_world, ypix_world = pix_to_world(xpix, ypix, xpos, ypos, yaw, world_size, scale)
xpix_world_obst, ypix_world_obst = pix_to_world(xpix_obst, ypix_obst, xpos, ypos, yaw, world_size, scale)
xpix_world_rock, ypix_world_rock = pix_to_world(xpix_rock, ypix_rock, xpos, ypos, yaw, world_size, scale)

note: data.count contains the current position in index for the video stream.

6) Update worldmap (to be displayed on right side of screen)
for obstacle_x_world, obstacle_y_world in zip (xpix_world_obst, ypix_world_obst):
    data.worldmap[obstacle_y_world, obstacle_x_world, 0]  = 1
for rock_x_world, rock_y_world in zip (xpix_world_rock, ypix_world_rock):
    data.worldmap[rock_y_world, rock_x_world, 1]  = 1
for navigable_x_world, navigable_y_world, in zip(xpix_world, ypix_world):
    data.worldmap[navigable_y_world, navigable_x_world, 2]  = 1
7) Make a mosaic image

A mosaic image was created showing the rover camera image, warped image, ground truth (with rover location and direction arrow) and another ground truth (showing the current obstacle and navigable mapping)

Test video follows

Test Mapping Video

Test Mapping Video MP4

Autonomous Navigation and Mapping

Fill in the perception_step() (at the bottom of the perception.py script) and decision_step() (in decision.py) functions in the autonomous mapping scripts and an explanation is provided in the writeup of how and why these functions were modified as they were.


This step utilised the efforts from the notebook analysis, described above. The Rover Worldmap was not updated if there was observable pitch or roll (eg > or – 1 degree).

In addition rover polar coordinates were derived and saved against the passed Rover object for both navigable areas and observed rocks (if no rocks observed set to None).


This is the challenging part of the project.

stop and forward were the two default rover modes supplied. For this project stuck, rock and reverse were added.

forward was modified to have a left hugging biases by adding 65% of the standard deviation of the navigable angles, as long as there had been some travel time either initially or after being stuck.

The rover enters stuck mode if the rover stays in the same position, whilst not picking up a rock, for 5 seconds. If still stuck after 10 seconds, then reverse mode is tried. After 15 seconds, stuck and reverse are reset before trying stop mode.

stuck mode tries rotating if there is an obstruction in front, moving forward if steering not locked full left or right whilst going slow, and breaking if steering is locked full left or right. It will reset to forward if movement is restored.

reverse mode rotates randomly between 30 and 180 degrees after setting the brakes and reducing velocity to zero. Once its within or – 15 degrees it sets mode to forward. If reverse mode is in-affective to sets it to stop mode.

If a rock is observed, some false positives are ignored, as well as distant rocks before being placed into rock mode. Whilst the rock is not close, it tries to navigate closer towards it before breaking or coasting closer. The algorithm still requires more refinement.

Note: All my testing and running in Autonomous mode was done at 1600×1200 resolution.

Path Planning

In this Path Planning project, we guide a vehicle to drive down a highway in a simulator. The vehicle must do so, without collision, staying in the right three lanes, without speeding, minimising acceleration & jerk and effectively staying within its current lane, unless overtaking.

lane change manoeuvre
lane change manoeuvre

Valid Trajectories

The program was able to drive for the simulator for over an hour without incident. After that point I terminated the project. Occasionally there are incidents when you turn into a lane and your vehicle may be side swiped or rear ended. This seemed to be an issue with the robotic cars.

The speed limit was set at 50 MPH. Above that causes a violation. In the program, speed was set slight below at 47.5 MPH.

The spline technique supplied in the walkthrough was effective at ensuring max acceleration in m/s/s and Jerk in m/s/s/s are not exceeded.

The car in normal driving does not collide with other vehicles. Whilst in a lane, it tries to match either the speed of the vehicle in front or the average for the lane. Whichever is lowest.

The vehicle stay consistently in the centre of each 4 meter wide lane. Frenet coordinates greatly assisted with this.

A cost function was implemented that had a bias towards staying in the existing lane, matching the average lane speed, avoiding collisions and having a buffer for other vehicles in the proposed lanes. If the best lane found, was still the existing lane with a closely vehicle, then speed was adjusted to match.


I spent significant time, before the walkthrough video was released, creating another solution. This used the behaviour planning and Jerk Minimising Trajectory approaches in the lessons. Code can be found on github here. I couldn’t quite get all the components working together. I mainly had issues generating jerk free trajectories in global coordinates from a pipeline of freenet coordinates even after creating waypoint splines. At some point I may review this approach again. I found it somewhat difficult to debug using a simulator. In hindsight if I started it again, more focus would be placed into unit test cases.

The final approach I used to submit the project for review was based on the walkthrough video. I added some lane speed & nearest approach calculations to provide input for a basic cost function as described above.

I really like the spline approach, using coordinates converted into car space and sampling to create a smooth trajectory. The code for this project although not the neatest or elegant, is able to effectively drive the vehicle in the simulator to meet requirements.

Model Predictive Controller Project

This MPC (Model Predictive Controller) project, was the last in term 2 of the Udacity Self Driving Car Engineer Nanodegree.

Simulator output
Simulator output

This MPC (Model Predictive Controller) project, was the last in term 2 of the Udacity Self Driving Car Engineer Nanodegree.


The Model

For this project (github repo) we used a global kinematic model, which is a simplification of a dynamic model that ignores tire forces, gravity and mass.

The state model is represented by the vehicles position, orientation angle (in radians) and velocity.
State Model from Course notes

A cross track error (distance of vehicle from trajectory) and an orientation error (difference of vehicle orientation and trajectory orientation) were also included in the state model.

Two actuators were used, delta – to represent the steering angle (normalised to [-1,1]) and a – for acceleration corresponding to a throttle, with negative values for braking.

The simulator passes via a socket, ptsx & ptsy of six waypoints (5 in front, 1 near the vehicle), the vehicle x,y map position, orientation and speed (mph).

This data after being transformed into the vehicle map space, with new cross track error and orientation error calculated, is then passed into the MPC (Model Predictive Control) solve routine. It returns, the two new actuator values, with steering and acceleration (i.e. throttle) and the MPC predicted path (plotted in green in the simulator).

Constraint costs were applied to help the optimiser select an optimal update. Emphasis was placed on minimising orientation error and actuations, in particular steering (to keep the lines smooth).

   // Reference State Cost
    // TODO: Define the cost related the reference state and
    // any anything you think may be beneficial.
    // The part of the cost based on the reference state.
    for (int i = 0; i < N; i  ) {
      fg[0]  = CppAD::pow(vars[cte_start   i] - ref_cte, 2);
      fg[0]  = 2 * CppAD::pow(vars[epsi_start   i] - ref_epsi, 2);
      fg[0]  = CppAD::pow(vars[v_start   i] - ref_v, 2);

    // Setup Constraints
    // NOTE: In this section you'll setup the model constraints.
    // Minimize the use of actuators.
    for (int i = 0; i < N - 1; i  ) {
      fg[0]  = CppAD::pow(vars[delta_start   i], 2);
      fg[0]  = CppAD::pow(vars[a_start   i], 2);

    // Minimize the value gap between sequential actuations.
    for (int i = 0; i < N - 2; i  ) {
      fg[0]  = 20000 * CppAD::pow(vars[delta_start   i   1] - vars[delta_start   i], 2);
      fg[0]  = 10 * CppAD::pow(vars[a_start   i   1] - vars[a_start   i], 2);

Timestep Length and Frequency

The MPC optimiser has two variables to represent the horizon into the future to predict actuator changes. They are determined by N (Number of timesteps) and dt (timestep duration) where T (time) = N * dt.

To help tune these settings, I copied the mpc_to_line project quiz, to a new project mpc_to_waypoint, and modified it to represent the initial state model to be used with the Udacity simulator. I was able to get good results looking out 3 seconds, with N = 15 and dt = 0.2. The following output are plots of 50 iterations from the initial vehicle state:
initial tuning program output

It seemed to be tracking quite nicely but speed was very slow.

However what I found, is that a horizon out 3 seconds in the simulator seemed to be too far. The faster the vehicle, the further forward the optimiser was looking. It shortly started to fail and the vehicle would end up in the lake or even worse airborne.

I tried reducing N and increasing dt. Eventually, via trial and error, I found good results where N was 8 to 10 and dt between ~0.08 to ~0.105. I eventually settled on calculating dt based on Time/N (with time set at ~.65 seconds and N on 8). If I saw the plotted MPC line coming close to the 2nd furthest plotted waypoint at higher speeds, it started to correspond, with the MPC optimiser failing.

The reference speed also played a part. To drive safely around the track, to ensure the project meets requirement, I kept it at 60 MPH.

Polynomial Fitting and MPC Preprocessing

An example plot of the track with the first way points, vehicle position and orientation follows:
Waypoints plotted with vehicle

To make updating easier and to provide data to be able to draw, the waypoints and the predicted path from the MPC solver, coordinates were transformed into vehicle space. This meant also that the initial position of the vehicle state, for the solver was (0 velocity in KPH * 100 ms of latency,0), which included a projection of distance travelled to cover latency, with a corresponding angle orientation of zero. These coordinates were used in the poly fit. It had an added benefit of simplifying, the derivative calculation required for the orientation error.

The following plot is the same waypoints transformed to the vehicle space map, with the arrow representing the orientation of the vehicle:
waypoints in vehicle space

Model Predictive Control with Latency

Before sending the result back to the simulator a 100ms latency delay was implemented.


This replicated the actuation delay that would be experienced in a real-world vehicle.

I experimented with trying to understand if the ratio of dt (time interval) to latency in seconds, being near 1 (i.e. the time interval was close to the latency value), had an impact on the ability of the MPC algorithm to handle latency. Anecdotal evidence supported that; but in reality ratio values of < 1 (for this project, I had (.65/8)/.100 = .0.8125) were the reality to ensure the optimiser was able to find a solution.

As described in the previous section, the vehicle position was projected forward, the distance it would travel, to cover 100ms of latency.

However before I implemented the forward projection for latency, you could see in places where the vehicle lagged a little in its turning. The MPC, however predicted the path correctly back onto the centre line of the track per following image:
steering lag

After I implemented the latency projection calculation, the vehicle was able to stay closer to center, more readily per this image:
latency projected

Over all the drive around this simulator track, was smoother and lacked steering wobbles, when compared to using a PID controller.