Hexapod Robot
Categories: robotics, reinforcement learning
This is a hexapod robot that I am in the process of building.
This project is far from done, so the following is just a record of what I have done so far, what I am currently working on, and the plans for the future.
Mechanical Design
In order to limit the scope of the project, I decided to use off-the-shelf, quasi-direct-drive actuators, since designing actuators from scratch is a deep rabbit hole. I completed an initial version of the design using one actuator, but then I found the "CyberGear" actuators, which offered by far the best performance for the price (12Nm max, 4Nm continuous, driver included, and only 85USD). Actuator price is particularly important for a hexapod because a total of 18 actuators are needed (compared to 12 for a standard quadruped design). Although 6 legs are technically redundant, the redundant legs present opportunities for interesting robotics, such as extra-stable locomotion (a hexapod can maintain 3 points of contact with the ground at all times even with a decently fast gait), or hybrid locomotion and object manipulation with the extra legs (for example, walking on 4 legs while using the remaining 2 to carry a box).
I decided to use a combination of laser cut metal parts and 3D printed plastic parts. The main structure consists of sheet metal parts bolted directly to the mounting hole pattern on the actuators, while 3D printed parts add stiffness by filling the gaps between sheet metal parts and serve as covers for wiring and other components. Another simplification I made was to co-locate the actuators with the joints they are driving. This design trades increased leg inertia for design simplicity: it removes the need for some kind of transmission (e.g. a linkage or cables), but as a result there is more mass farther from the base of the leg, which makes the leg harder to accelerate. I think the tradeoff is worth it, especially since the actuator isn't too far from the base of the leg, and the ends of the legs are already very lightweight.
I got the sheet metal parts fabricated by SendCutSend, which is a great, affordable option for small quantity sheet metal fabrication if you are in the US. To optimize the cost of the sheet metal parts, there are only 3 unique sheet metal designs, since you get discounted unit prices for multiple parts of the same design. This didn't require making any compromises in the design, since all of the legs are already identical, and hexapods are very symmetrical overall. Additionally, two of the designs are normal flat sheet metal parts, and the third design, the structural parts that connect to the actuator output flanges, has only a single bend. The only post processing I had to do was to add some countersinks and tap some threads in the angle brackets.
Taking advantage of the flexibility of 3D printing, there are a lot more unique 3D printed parts. The 3D printed parts between sheet metal parts form a structure similar to a lamination, which has a high stiffness to weight ratio. The other 3D printed parts cover up rough edges as well as provide protected cable routing channels.
For the last link of the legs, I used a single carbon fiber tube held under compression between the ends with a few pieces of threaded rod. This construction method results in a very rigid and lightweight structure at the ends of the legs, where it is important to have low mass and inertia. Finally, to complete the legs, I used squash balls as feet, as is traditional for medium-sized legged robots, because they cushion impacts, conform to uneven surfaces, and increase friction.
Software and Control
To control my hexapod, I plan on using deep reinforcement learning in simulation to train a proprioceptive locomotion policy. Proprioceptive is just a fancy term meaning that the control policy only knows information about the robot's own internal state (joint positions, velocities, and inertial sensor readings), rather than information about the environment from sensors such as cameras. The main inspiration for this project was the work by the Robotic Systems Lab at ETH Zurich developing locomotion policies for quadrupedal robots with deep reinforcement learning (a good sample of their work is available on their YouTube channel). The Robotic Systems Lab trains their control policies entirely within a simulation, and then deploys the trained policy directly onto the physical robot without any further training. Compared to model-based methods, such as model predictive control (MPC), the deep RL policies can be very robust to disturbances and can traverse surprisingly complex terrain without any knowledge of the environment or hardcoded reflexes. Simple proprioceptive control does have limitations, however; for example, it doesn't do well at planning the precise footsteps needed to traverse terrain such as stairs or gaps.
Using the full CAD model of the hexapod, I also created a simplified version of the geometry, which I used to generate a mesh model that is inexpensive to render many copies of in simulation. The simulator I decided to use was MuJoCo, a physics simulator widely used for deep RL and now maintained by DeepMind. It exposes a simple C API, doesn't require any additional programs, services, or accounts, and is open source. As input, MuJoCo only requires a robot and scene description in a relatively simple file format that describes the joints, topology, and physical properties of the robot and the environment. Conveniently, CAD programs can calculate most of the mechanical properties (masses and inertias) to values reasonably close to those of the physical robot, but proper system identification (creating an accurate simulation model) is a more involved process that I haven't explored yet. I had to create the robot and scene description file by hand, but in some cases the process of converting a CAD model to a simulation model can be completely automated.
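To give a sense of how little scaffolding the C API needs, here is a minimal sketch of loading a model description and stepping the physics; the file name hexapod.xml and the one-second duration are just placeholders, not my actual setup.

```cpp
#include <cstdio>
#include <mujoco/mujoco.h>

int main() {
  // Load the robot/scene description and allocate the simulation state.
  char error[1000] = "";
  mjModel* model = mj_loadXML("hexapod.xml", nullptr, error, sizeof(error));
  if (!model) {
    std::printf("load error: %s\n", error);
    return 1;
  }
  mjData* data = mj_makeData(model);

  // Step the physics for one second of simulated time.
  while (data->time < 1.0) {
    // Desired joint targets would be written into the actuator controls here.
    for (int i = 0; i < model->nu; ++i) data->ctrl[i] = 0.0;
    mj_step(model, data);
  }

  // Joint positions and velocities live in data->qpos and data->qvel.
  std::printf("final qpos[0] = %f\n", data->qpos[0]);

  mj_deleteData(data);
  mj_deleteModel(model);
  return 0;
}
```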
For the deep learning control policy, I implemented Proximal Policy Optimization (PPO) using the Torch library for neural net and gradient calculations. The control policy is a multi-layer perceptron (MLP) that takes proprioceptive inputs (i.e. joint positions, velocities, and accelerations, plus accelerometer and gyro sensor inputs), and outputs desired joint positions, similar to the simpler control policies used by the ETH Zurich lab. I am still working out a few bugs in the PPO implementation and tuning hyperparameters, and I haven't been able to train a successful control policy yet.
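The policy network itself is small. Here is a rough sketch of what such an MLP policy looks like in libtorch; the layer sizes and the observation/action dimensions are placeholders rather than my exact configuration.

```cpp
#include <torch/torch.h>

// Hypothetical sizes: the real observation layout depends on which
// proprioceptive signals are included.
constexpr int kObsDim = 18 * 3 + 6;  // e.g. joint pos/vel/acc plus IMU
constexpr int kActDim = 18;          // one target position per actuator

struct PolicyNet : torch::nn::Module {
  torch::nn::Linear fc1{nullptr}, fc2{nullptr}, out{nullptr};
  torch::Tensor log_std;  // state-independent Gaussian action noise, as in common PPO setups

  PolicyNet() {
    fc1 = register_module("fc1", torch::nn::Linear(kObsDim, 256));
    fc2 = register_module("fc2", torch::nn::Linear(256, 128));
    out = register_module("out", torch::nn::Linear(128, kActDim));
    log_std = register_parameter("log_std", torch::zeros({kActDim}));
  }

  // Maps proprioceptive observations to the mean desired joint positions.
  torch::Tensor forward(torch::Tensor obs) {
    auto h = torch::tanh(fc1->forward(obs));
    h = torch::tanh(fc2->forward(h));
    return out->forward(h);
  }
};
```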
One issue I encountered while using MuJoCo is that while it is easy to simulate multiple copies of the same robot in parallel on multiple threads, the built-in renderer only supports rendering a single simulation. There are also some optimizations I can make over the built-in renderer, such as instanced rendering. So I wrote a custom renderer that collects all copies of identical meshes into batches and uses instanced rendering to draw each batch all at once. There are actually only 4 unique meshes needed to render all the robots, since all the robots are identical, and within each robot, all the legs are identical. Therefore, only 4 draw calls are required to draw any number of robots. Although not strictly necessary, a fast renderer reduces the load on the GPU, which is already heavily abused for policy inference and training.
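The core of the batching idea looks something like the sketch below. It assumes an OpenGL function loader such as glad and glm for matrices, and that the vertex array has already been set up with per-instance matrix attributes (glVertexAttribDivisor); the real renderer obviously has more to it.

```cpp
#include <vector>
#include <glad/glad.h>
#include <glm/glm.hpp>

// For each unique mesh, keep one copy of the vertex/index data on the GPU plus
// a buffer of per-instance model matrices (one per robot copy).
struct MeshBatch {
  GLuint vao = 0;            // vertex array with per-vertex and per-instance attributes
  GLuint instance_vbo = 0;   // one 4x4 model matrix per instance
  GLsizei index_count = 0;
  std::vector<glm::mat4> transforms;  // refilled from the simulation each frame
};

void draw_batch(const MeshBatch& batch) {
  // Stream this frame's instance transforms to the GPU.
  glBindBuffer(GL_ARRAY_BUFFER, batch.instance_vbo);
  glBufferData(GL_ARRAY_BUFFER,
               batch.transforms.size() * sizeof(glm::mat4),
               batch.transforms.data(), GL_STREAM_DRAW);

  // One draw call renders every copy of this mesh across all robots.
  glBindVertexArray(batch.vao);
  glDrawElementsInstanced(GL_TRIANGLES, batch.index_count, GL_UNSIGNED_INT,
                          nullptr, static_cast<GLsizei>(batch.transforms.size()));
}
```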
Simulating, running the policy inference, and running the policy optimization are the most computationally expensive parts of the program, and all greatly benefit from parallelization. However, the parallelization works out in a slightly annoying way because of the API boundaries of MuJoCo and libtorch. The MuJoCo C API does not have any built-in parallelization, but each robot instance is separate, so the workload of stepping every robot instance can be split evenly among some worker threads. On the other hand, it is quite slow to evaluate the policy for each instance separately because of the libtorch and GPU API overhead. Instead, it's much more performant to combine the model inputs for all the robot instances into a single tensor and let the GPU parallelize over the whole batch when the model is evaluated. So, to maximize performance, I alternate between batching all the robot instances together to run the model inference and stepping the physics simulation separately for each instance in a thread pool.
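One iteration of that rollout loop looks roughly like the sketch below, reusing the PolicyNet and kObsDim placeholders from the earlier sketch; the observation gathering and thread pool are simplified, and it assumes the policy parameters live on the GPU.

```cpp
#include <algorithm>
#include <thread>
#include <vector>
#include <torch/torch.h>
#include <mujoco/mujoco.h>

void rollout_step(const mjModel* model, std::vector<mjData*>& instances,
                  PolicyNet& policy, int num_threads) {
  const int n = static_cast<int>(instances.size());

  // 1. Gather observations from every instance into a single [n, kObsDim] tensor.
  torch::Tensor obs = torch::empty({n, kObsDim});
  for (int i = 0; i < n; ++i) {
    // fill obs[i] from instances[i]->qpos, qvel, sensordata, ...
  }

  // 2. Run the policy once for the whole batch (no gradients needed for rollout).
  torch::NoGradGuard no_grad;
  torch::Tensor actions = policy.forward(obs.to(torch::kCUDA)).to(torch::kCPU);

  // 3. Step the physics in parallel; each thread owns a disjoint slice of instances.
  auto worker = [&](int first, int last) {
    for (int i = first; i < last; ++i) {
      auto a = actions[i].accessor<float, 1>();
      for (int j = 0; j < model->nu; ++j) instances[i]->ctrl[j] = a[j];
      mj_step(model, instances[i]);
    }
  };
  std::vector<std::thread> pool;
  const int chunk = (n + num_threads - 1) / num_threads;
  for (int t = 0; t < num_threads; ++t)
    pool.emplace_back(worker, t * chunk, std::min(n, (t + 1) * chunk));
  for (auto& th : pool) th.join();
}
```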
After a specified number of simulation steps are performed, the data from those steps is collected, processed to calculate time-delayed rewards, and then used to update the model with gradient descent. Currently, the actual gradient descent calculations take up most of the computation time. My intuition says that my implementation is slow for some reason, but I haven't been able to figure out what I am doing wrong yet.
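One common way to compute those time-delayed rewards in PPO is generalized advantage estimation (GAE). Here is a rough sketch of that calculation with libtorch tensors; it is a generic version, not necessarily my exact reward processing.

```cpp
#include <torch/torch.h>

// rewards, values, dones: [T, N] tensors (T timesteps, N robot instances);
// last_values: [N] value estimates for the state after the final step.
torch::Tensor compute_gae(const torch::Tensor& rewards,
                          const torch::Tensor& values,
                          const torch::Tensor& dones,
                          const torch::Tensor& last_values,
                          double gamma = 0.99, double lambda = 0.95) {
  const int64_t T = rewards.size(0);
  torch::Tensor advantages = torch::zeros_like(rewards);
  torch::Tensor gae = torch::zeros_like(last_values);
  torch::Tensor next_value = last_values;

  // Walk backwards through the rollout, accumulating discounted TD errors.
  for (int64_t t = T - 1; t >= 0; --t) {
    torch::Tensor not_done = 1.0 - dones[t];
    torch::Tensor delta = rewards[t] + gamma * next_value * not_done - values[t];
    gae = delta + gamma * lambda * not_done * gae;
    advantages[t] = gae;
    next_value = values[t];
  }
  return advantages;  // advantages + values gives the return targets for the value loss
}
```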
Electronics
The "CyberGear" actuators communicate over CANBUS, so for most of the wiring I can run just two power and two signal wires along each leg. That made it pretty simple to include small channels in the 3D printed covers to accomadate the minimal wiring needed for each leg. Unfortunately, a hexapod is a perfect star-topology network, which is not ideal for a CANBUS network, so instead I will split the legs into 3 pairs, each of which is a linear CANBUS network with the central microcontroller between a leg on each end. The microcontroller will only be used as an IO device between the sensors/actuators and the main computer; I chose a Pico-ICE combination microcontroller/FPGA because it has a lot of pins and can interface with pretty much any peripheral because of the FPGA.
Since I plan on deploying a deep learning model to control the robot, and eventually doing some vision processing, I plan on using an NVIDIA Jetson TX2 I've had lying around for a while as the main computer. The TX2 will connect to the microcontroller via USB, and will send motor commands and receive motor status reports and sensor readings over that connection. I'm currently working on understanding the communication protocol for the actuators and writing a library to construct and parse messages.
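As a placeholder for what that library will look like, here is a sketch of a generic extended-ID CAN frame and a helper that packs a command into it. The identifier and payload layout shown here is made up for illustration, not the actual CyberGear protocol.

```cpp
#include <cstdint>
#include <cstring>

// Generic container for one extended-ID CAN frame; the CyberGear actuators use
// 29-bit extended identifiers, but the field layout below is hypothetical.
struct CanFrame {
  uint32_t id = 0;       // 29-bit extended identifier
  uint8_t  len = 0;      // number of valid data bytes (0..8)
  uint8_t  data[8] = {};
};

// Hypothetical example: pack a message type and target motor ID into the
// identifier, and a little-endian float payload into the data bytes.
CanFrame make_command(uint8_t msg_type, uint8_t motor_id, float value) {
  CanFrame frame;
  frame.id = (static_cast<uint32_t>(msg_type) << 8) | motor_id;  // placeholder layout
  frame.len = sizeof(value);
  std::memcpy(frame.data, &value, sizeof(value));
  return frame;
}
```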
For the time being, I am planning on running the robot tethered for power and control. This saves me from having to worry about building a battery system and keeping the battery charged while testing stuff.