HUMANOID & ANDROID ROBOTS
Part 1: by Dr David Maddison, VK3DSM

Robots shown on the lead page:
Agility Digit – www.agilityrobotics.com
Boston Dynamics Atlas – https://bostondynamics.com/atlas
Unitree H1 – www.unitree.com/h1
Tesla Optimus – www.tesla.com/en_eu/AI
Figure 02 – www.figure.ai
1X NEO Gamma – www.1x.tech/neo
Apptronik Apollo – https://apptronik.com/apollo
Booster Robotics T1 – www.boosterobotics.com/robots/
Like many ideas that started as science fiction, humanoid and android robots are
now a reality. They have not yet been perfected – but they are here. We’ll likely
see them entering widespread use over the next couple of decades.
Video phones, vertically landing rockets, artificial intelligence (AI) – not
long ago, these things were purely
in the realm of science fiction. But
now they are everyday technologies. Humanoid robots aren’t very far
behind.
Traditional robots, typically found
in factories, are mostly stationary and
perform repetitive tasks. In contrast,
humanoid robots functionally resemble people.
Android robots are humanoids
designed to very closely resemble
humans, to the point of being almost
indistinguishable from us. So far, no
robot has been developed that is truly
indistinguishable from a human, but
some can pass superficial inspection.
Examples of such androids include the
Japanese Actroid-DER and the South
Korean EveR-4, both of which we will
discuss later.
This series comprises two articles;
this first one will discuss the general
aspects of and technology behind
humanoid robots, while the follow-up
next month will cover a range of robots
that are currently in development,
being demonstrated or in use.
Why humanoid robots?
Humanoid robots are ideal for working in spaces designed for humans.
Unlike conventional robots that are
designed for a specific range of tasks
and are often stationary, humanoid
robots can, if sufficiently advanced,
do anything a human can do. They
can have many flexible joints and
high mobility. They don’t have to be
the same size as a human; they can
be smaller or larger as required for
their job.
Some examples of jobs that humanoid robots are ideal for are:
O Caring for hospital patients
O Construction work
O Customer service (eg, retail)
O Handling inquiries in public
places like airports and train stations
O Hotel check-in staff
O Domestic duties (eg, housework)
O Factory floor work (assembly,
moving objects and inspections)
O Warehouse work
O Risky, dangerous or unpleasant
tasks
The rate of advancement of
humanoid robots is rapid due to the
convergence of improved mechanical
design, artificial intelligence, faster
computer chips and advances in computer and chip architectures.
Humanoid robots can address
labour shortages and our ageing population, as well as perform dirty, undesirable, repetitive tasks that humans
don’t want to. They’ll do it 24/7, more
precisely and for no pay. This has
resulted in a greatly increased demand
for such robots.
The future use of humanoid robots
raises ethical concerns, but that has
always been the case with the introduction of more advanced automation, even since the time of the Industrial Revolution. People tend to move
on to other forms of employment if
displaced. Also, despite incredible
advances, the robots are not taking
over; not yet, anyway...
What is a humanoid robot?
There is no strict definition, but
typically a humanoid robot features a
human-like appearance, including two
arms, two legs, a head, a torso and a
size similar to humans.
They are designed to mimic human
behaviour. This mimicry stems from
their ability to move, converse and
provide information, express emotions through facial expressions and
perform natural language processing (NLP) using artificial intelligence
(AI), enabling conversations and
instruction-giving.
Their movements are designed to
enable useful tasks, such as picking
up, carrying and placing objects, while
AI allows them to receive instructions
or engage in conversation, distinguishing socially engaging robots from those
used purely for industrial purposes.
Parts of a humanoid robot
The main components of a humanoid robot are:
1. The body structure incorporating
limbs, a torso and a head, usually made
from aluminium or plastic composites.
2. Motors (actuators) and joints,
with the motors acting as the ‘muscles’.
3. Sensors, such as cameras (eyes),
microphones (ears), gyroscopes and
accelerometers (as in the vestibular part
of the human ear) and touch sensors.
4. A ‘brain’ comprising three key
parts.
a The main computer processor,
which acts as the central hub. It
is responsible for overall control,
coordinating the robot’s actions
by running AI software.
b AI software serves as the ‘mind’,
enabling advanced tasks like
recognising objects, perception,
learning from experience, making decisions and planning movements. This can include a large
language model (LLM) and/or
vision language model (VLM).
This AI software might use the
main processor or may also run on
specialised hardware like GPUs
(graphics processing units) or
TPUs (tensor processing units) for
efficiency and speed, and is typically trained using machine learning rather than just programmed.
c Microcontrollers are distributed
throughout the robot, managing
specific hardware subsystems like
motors in the arms and sensors in
the hands, in real time, ensuring
precise control under the main
processor’s guidance.
5. A power source, such as a battery pack.
6. Wireless communications systems.
Actuators and joints
Actuators for humanoid robots may
be hydraulic, pneumatic or electric.
There are also small actuators for facial
‘muscles’, for robots capable of facial
expression.
Electric actuators are the favoured
types these days due to their compactness, lightness, simplicity, quietness and good power-to-weight ratio.
They usually use DC motors or servo
motors, often with reduction gears to
increase torque.
Fig.1 shows a typical commercially
available actuator from ASBIS that
could be employed in a humanoid
robot. It has an EtherCAT Ethernet
controller, a pair of 19-bit encoders
to enable precise rotation accuracy, a
high-torque brushless DC motor, clutch
brakes that lock the actuator in the
event of power loss, a harmonic reduction gear and a cross-roller bearing to ensure
rigidity for axial and radial loads.
Fig.2 shows the RH5 experimental humanoid robot and the range
of movements of its joints possible
with its particular types of actuators,
along with the symbols used to represent them. This robot has a total of
34 degrees of freedom (DoF) – see the separate panel on degrees of freedom.
Communication and
networking
Humanoid robots need to communicate for a variety of reasons: to receive updated instructions or software, for teleoperation (remote control by a human), remote processing of complex tasks, progress tracking and fault monitoring, among others.
They can connect wirelessly via
common means such as 5G, Bluetooth,
Zigbee, IoT, WiFi and MQTT (Message
Queuing Telemetry Transport). Voice
communication with humans is possible using a speaker and microphone;
spoken instructions can be interpreted
using natural language processing by
an LLM.
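To give a concrete flavour of the sort of lightweight messaging involved, here is a minimal Python sketch using the open-source paho-mqtt library to publish battery telemetry to a broker. The broker address, topic name and payload fields are invented for illustration and do not come from any particular robot.

# A minimal sketch of robot telemetry over MQTT using the open-source
# paho-mqtt library. Broker address and topic name are hypothetical.
import json
import time
import paho.mqtt.publish as publish

BROKER = "192.168.1.50"            # hypothetical on-site MQTT broker
TOPIC = "robots/unit01/telemetry"  # hypothetical topic name

def read_battery_percent():
    # Placeholder for a real battery-management-system query.
    return 87.5

while True:
    payload = json.dumps({
        "battery_pct": read_battery_percent(),
        "timestamp": time.time(),
    })
    # publish.single() opens a connection, sends one message and disconnects.
    publish.single(TOPIC, payload, hostname=BROKER, qos=1)
    time.sleep(10)                 # report every 10 seconds

A monitoring station subscribed to the same topic would receive each report, and the same mechanism works in reverse for sending commands to the robot.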
Power supplies
Humanoid robots are mostly powered by lithium-ion batteries. Some
are in the form of a removable battery pack that is quickly swappable
to avoid significant downtime while
the robot recharges.
Some robots, such as early Boston
Dynamics robots intended for military use, used on-board petrol or
diesel generators, as a military robot
cannot be quickly recharged in the
field. However, we are not aware of
any military humanoid robots under
development that use internal combustion engines.
Robot designers take care to ensure robots use power efficiently to maximise their use time between charges or battery swaps. Systems are being developed to allow humanoid robots to connect to a charger or change battery packs themselves.

Fig.1: a commercially available actuator that can be used in a humanoid robot. Source: www.asbis.com/aros-robotic-actuators
Processors
Neural networks are the basis of
human and animal brains, and are
important for artificial intelligence
and humanoid robots. They are flexible and can learn and model new
and changing relationships that are
non-linear and complex. They are thus
highly suitable for tasks like speech
and image recognition.
Artificial neural networks (ANNs)
can be modelled either in software
or hardware (or as biological circuits
in some experimental arrangements).
There are several types of processors
that can be used to power AI for robots
(or in general) including:
O CPUs (central processing units),
as used in regular computers
O GPUs (graphics processing units)
O TPUs (tensor processing units)
O Neuromorphic processors
In addition to AI functions, hardware subsystems may be controlled
by other processors.
The CPUs used in humanoid robots
are very powerful and, while not specifically designed for AI purposes, can
still satisfactorily run AI software. A
CPU may also be used in combination
with another type of processor.
GPUs, or graphics processing units,
were originally developed for graphics applications but have been adapted
for neural networks in artificial intelligence due to their ability to handle
many calculations at once. This parallel processing is essential for training AI to perform tasks like vision in
humanoid robots.
Widely recognised as the world
leader in AI chips, NVIDIA uses GPUs
as the foundation of its AI technology.
NVIDIA’s AI chips, such as the H100,
A100, RTX series GPUs, Grace Hopper Superchip GH200, and Blackwell
accelerator architecture, are optimised
with NVIDIA’s CUDA (compute unified device architecture) software for
parallel computing.
A popular choice for humanoid
robots is the NVIDIA Jetson series
platforms. These processor modules
integrate an energy-efficient ARM
CPU and GPU, and can be used for AI
tasks, such as image recognition and
deep learning.
Fig.2: the actuation and morphology of an RH5 humanoid robot. The red, green and yellow symbols represent the type of joints: S: Spherical, R: Revolute, P: Prismatic, U: Universal. Source: https://cmastalli.github.io/publications/rh5robot21ichr.pdf

A TPU, or Tensor Processing Unit, is a specialised chip designed and developed by Google, optimised for machine learning. Unlike CPUs and GPUs, which evolved for AI from general computing and graphics roles, TPUs were built from scratch for this purpose.
They excel at matrix operations, a
core component of neural networks,
and demonstrate superior performance in tasks like training large models, outperforming CPUs and GPUs in
specific low-precision workloads.
TPUs are used in applications such
as natural language processing, image
recognition for navigation and recommendation systems, powering Google’s AI services. They show great
promise for use in humanoid robots,
where their efficiency could enhance
real-time vision and decision-making.
Still, only one current humanoid
robot, Gary (described next month), is
known to use them.
Neuromorphic processors are
designed to emulate the structure and
function of a human brain, although
they are not nearly as complex. They
employ mixed analog and digital processing to generate neural signals,
providing radically different computational outcomes than traditional digital computing using von Neumann
architectures.
The experimental iCub humanoid
robot (described next month) is
said to use such a processor.
This biological-style approach results in a more energy-efficient processor, with an architecture somewhat like that of a brain.
Examples of neuromorphic chips
include Intel’s Loihi, IBM’s TrueNorth
and NorthPole, BrainChip’s Akida and
the SENeCA (Scalable Energy-efficient
Neuromorphic Computer Architecture) research chip.
Neuromorphic processors use spiking neural networks (SNNs), where
information is processed as discrete
spikes in a manner similar to biological neurons, rather than continuous
activation, as with artificial neural networks (ANNs).
Degrees of freedom (DoF)
Degrees of freedom means the number of independent motions a robotic
appendage such as an arm or a leg can make. The more degrees of
freedom it has, the more flexible and useful it is.
Consider a very simple robot arm. A single robot joint such as a wrist that
can rotate about one axis (yaw) represents one DoF. Shoulder joints that can
move on two axes (pitch and yaw) add two more DoF. A hinged elbow joint
allowing flexion/extension is another DoF. So a simple robot arm that has a
shoulder, elbow and wrist would have four DoF.
The hand (or other gripping mechanism) does not count, as it is
considered the ‘end-effector’, the component that is being manipulated. DoF
typically only refers to joint motions, not the internal components of the end-effector.
For robot arms, six DoF is the minimum required for full position
and orientation control of the end-effector. A count of seven or more
is considered ideal. The more DoF a robot has, the more mechanically
complex it becomes and the more advanced the required control algorithms
and training become.
A human arm has seven DoF: three in the shoulder (flexion/extension,
abduction/adduction & internal/external rotation), one from the elbow
(flexion/extension), one from the forearm (pronation/supination) and two
from the wrist (flexion/extension & radial/ulnar deviation).
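The counting described in this panel is simple enough to express directly in a few lines of Python; the joint lists below just restate the arm examples given above.

# Tally degrees of freedom for the arms described in the panel.
# Each entry is (joint name, number of independent axes it can move on).
simple_arm = [
    ("shoulder", 2),   # pitch and yaw
    ("elbow", 1),      # flexion/extension
    ("wrist", 1),      # yaw
]
print("Simple robot arm:", sum(axes for _, axes in simple_arm), "DoF")  # 4

human_arm = [("shoulder", 3), ("elbow", 1), ("forearm", 1), ("wrist", 2)]
print("Human arm:", sum(axes for _, axes in human_arm), "DoF")          # 7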
Fig.3: a model of a proposed humanoid robot with an ‘organoid’ brain. Source: www.datacenterdynamics.com/en/news/chinese-scientists-develop-artificial-brain-to-control-brain-on-chip-organoid-robot/
Neuromorphic processors are not
yet widely adopted due to a
lack of hardware maturity, challenges
integrating them with existing ecosystems, the need for new programming
paradigms and the lack of computational power compared to other processors.
Other processors
To relieve the computational burden from the rest of the robot’s ‘brain’,
control of some hardware such as a
hand or knee joint may be performed
by small integrated computer chips
called microcontrollers.
In some cases, field-programmable
gate arrays (FPGAs) and application-
specific integrated circuits (ASICs) are
used for very high-performance tasks,
such as complex motion control algorithms. These offer specialised hardware acceleration for particular tasks,
improving efficiency and real-time
performance.
Organic ‘brains’
Neural networks can also be
built with biological neurons. One
Melbourne-based company has developed an experimental “wetware” computer, although it has no current application in humanoid robots (www.abc.net.au/news/science/104996484).
Researchers are also looking at neural networks made from human cells.
For example, researchers at Tianjin
University and the Southern University of Science and Technology in
China have interfaced human brain
cells onto a neural interface chip to
make a neural network ‘brain’ that can
be trained to perform tasks.
This brain has not yet been incorporated into a robot as proposed
(Fig.3), but brain cells on a chip were
stimulated and trained to navigate
environments and grip objects when
interfaced to an external robot. The
collection of brain cells is called an
‘organoid’, and is not a real human
brain, but possesses the neural network architecture of one and is about
3mm to 5mm in diameter.
The size limit is imposed due to the
inability to vascularise the cells (incorporate blood vessels). If this hurdle is
overcome, much larger structures can
be fabricated. Of course, there are ethical implications of using human cells
for such applications.
Skin materials
Silicone elastomers (rubbers) are
commonly used for the skin of humanoid robots with realistic facial and
other features. They are soft and highly
deformable, like real human flesh and
skin, plus they can be readily coloured
and moulded and formulated for particular properties. Fig.4 shows an
example of a silicone skin on a humanoid robot chassis.
Many companies make silicone
products. One that we came across
that might have suitable products is
Smooth-On (www.smooth-on.com).
A team of researchers at Aalto University and the University of Bayreuth
have developed hydrogel skin materials. Hydrogel is a gel material that
contains a high proportion of water.
It is soft, pliable and moist, much like
skin and flesh.
Fig.4: an example of silicone skin
on a humanoid robot chassis, the
discontinued Robo-C2 from Promobot.
Source: https://promo-bot.ai/robots/robo-c/
These researchers developed hydrogel materials suitable for the skin of
realistic humanoid robots. They are
even self-healing, so any cut or other
minor damage will repair itself; see www.nature.com/articles/s41563-025-02146-5
Another concept under development is ‘electronic skin’. This emulates
human skin, with the ability to sense
pressure, temperature, deformation etc
using flexible electronics embedded
into a silicone or similar matrix (see
our article on Organic Electronics in
the November 2015 issue; siliconchip.au/Article/9392).
Incredibly, as a proof-of-concept project, scientists from the University of Tokyo, Harvard University and
the International Research Center for
Neurointelligence (Tokyo) have made
a robot face using cultured living
human skin (Fig.5), although it would
no doubt soon die without associated
nourishment. That is scarily reminiscent of The Terminator.
Operating systems and
frameworks
Operating systems (OS) for humanoid robots are specialised software
that extend beyond traditional computer operating systems. They integrate real-time processing and AI for
the robot’s ‘brain’.
These systems orchestrate critical
tasks, including the real-time control of actuators, sensors and power
systems, as well as balance, locomotion, environmental interaction and
task planning. They rely on real-time
operating systems (RTOS) like the
open-source FreeRTOS or RTEMS
to ensure low-latency, deterministic
responses for precise sensor-actuator
coordination.
Complementing these operating
systems, ‘middleware’ facilitates communication between diverse software
components. For instance, the data
distribution service (DDS) in open-source ROS 2 (Robot Operating System 2), a widely used robotics framework, enables modular, scalable, and
interoperable data exchange.
Frameworks like ROS 2 and NVIDIA
Isaac (based on ROS 2) provide structured environments to integrate AI and
manage robotic functions.
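To give a flavour of what a ROS 2 ‘node’ looks like in practice, here is a minimal Python (rclpy) publisher that broadcasts a status string once per second. The topic name and message text are arbitrary examples, not from any specific robot.

# Minimal ROS 2 node in Python (rclpy) that publishes a status string
# once per second. The topic name 'robot_status' is just an example.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class StatusPublisher(Node):
    def __init__(self):
        super().__init__('status_publisher')
        self.pub = self.create_publisher(String, 'robot_status', 10)
        self.timer = self.create_timer(1.0, self.tick)   # 1Hz timer

    def tick(self):
        msg = String()
        msg.data = 'balancing OK, battery 87%'
        self.pub.publish(msg)

def main():
    rclpy.init()
    node = StatusPublisher()
    rclpy.spin(node)          # hand control to the ROS 2 executor
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

The publish() call travels over the DDS middleware described above, so any other node subscribed to the same topic – on the same computer or elsewhere on the network – receives the message.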
Most humanoid robots use open-source Linux-based operating systems,
such as Ubuntu with ROS 2 or RTLinux with built-in real-time capabilities, due to their flexibility and compatibility with AI frameworks.
These systems support advanced AI
algorithms, including LLMs like various GPT models, for natural language
understanding; VLMs, like CLIP (Contrastive Language-Image Pretraining),
for scene and object recognition; and
reinforcement learning for optimising
movement and acquiring new skills.
This enables continuous learning and
adaptation in dynamic environments.
For example, the ROS 2 framework,
running on Linux, powers robots like
Boston Dynamics’ Atlas for dynamic
locomotion and manipulation. NVIDIA’s Isaac platform, built on ROS 2,
supports AI-driven perception and
control in robots like Tesla’s Optimus
and Figure’s Figure 01 for human-robot
collaboration.
Together, Linux-based operating
systems and frameworks like ROS 2
enable humanoid robots to perform
diverse tasks, from industrial automation to assistive caregiving, with precision and adaptability.
Fig.5: human skin grown for proposed use on a humanoid robot. The mould is on the left, the skin on the right; the eyes are not real. Source: www.cell.com/cell-reports-physical-science/fulltext/S2666-3864(24)00335-7
Simulation platforms
NVIDIA’s Isaac Sim (see https://developer.nvidia.com/isaac/sim) is a
robotics simulation platform built on
the Omniverse framework. It can be
used to create digital ‘twins’, ie, virtual
replicas of physical robots, including
humanoids, to train AI and test software while avoiding the damage to people or robots that could occur if a real robot were used – see Fig.6.
Fig.6: a robot simulation from NVIDIA Isaac Lab, which is related to NVIDIA Isaac Sim. Source: https://developer.download.nvidia.com/images/isaac-lab1980x1080.jpg

Digital twins help train neural networks (eg, those in foundation models like RT-2X; discussed later) on simulated sensor data. For operating systems, they test software stability (eg, real-time control loops), and validate algorithms (eg, path planning).
Isaac Sim simulates sensor inputs
(eg, cameras and gyroscopes) and
interactions with objects, both crucial for AI development. It integrates
with robotics frameworks like ROS 2,
aligning with operating systems and
software used in robots like Tesla’s
Optimus or NASA’s Valkyrie.
These digital twins enable robot
learning and simulation by replicating real-world physics and sensor
data, supporting the development
of operating systems and algorithms
for tasks like movement and object
interaction.
Besides Isaac Sim, other notable
alternative simulation platforms that
we don’t have space to delve into
include:
O Gazebo (open source)
https://gazebosim.org/home
O Webots (open source)
https://cyberbotics.com
O CoppeliaSim (commercial)
www.coppeliarobotics.com
O MuJoCo (open source)
https://mujoco.org
Each excels in specific areas. Gazebo
has great community support; Webots
is perfect for industry, education and
research; CoppeliaSim is flexible, with
diverse capabilities; and MuJoCo has
advanced physics simulation.
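As a taste of how such simulators are driven, the following sketch uses the open-source MuJoCo Python bindings to step a deliberately trivial one-joint model. A real humanoid would be described by a far larger XML file, but the simulate-and-control loop has the same shape.

# Stepping a trivial one-joint model with the MuJoCo Python bindings.
# A real humanoid would use a much larger XML model, but the loop is the same.
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body name="link" pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" size="0.05" fromto="0 0 0 0 0 -0.5"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

for step in range(500):
    data.ctrl[0] = 0.1              # constant torque on the hinge 'motor'
    mujoco.mj_step(model, data)     # advance the physics by one time step

print("final joint angle (rad):", data.qpos[0])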
Other software
The Python programming language
is widely used for robot control and AI
implementation in humanoid robots. It
simplifies managing actuators, sensors
and motion planning, often alongside
C++ in frameworks like ROS. Python’s
extensive libraries, like TensorFlow
and PyTorch, support developing and
deploying AI models for tasks like
vision and decision-making.
Besides Python, other programming
languages used for humanoid robots
include C++ and C for control, MATLAB for research, Java for middleware,
and the emerging Rust. Each complements Python, addressing specific
needs in AI training, OS stability and
software validation.
Other operating systems used with
humanoid robots worth mentioning
include:
O HarmonyOS 6, an operating system developed by Huawei, with its AI
Agent Framework, is showing promise for operating and training robots.
Examples of variations or adaptations
of HarmonyOS in robotics include
Kuavo, with possible variants like
M-Robots OS or iiRobotOS, reflecting
its customisable nature. HarmonyOS
is used to operate and train the Walker
S humanoid robot (described next
month) developed by UBTECH Robotics for tasks like quality inspections at
Nio’s factory.
Fig.7: the SynTouch BioTac multimodal tactile sensor for use in robot fingers that can detect force, vibrations and temperature. Source: https://wiki.ros.org/BioTac
Fig.8: some of the uses of foundation models. Source: https://blogs.nvidia.com/blog/what-are-foundation-models/
O HumaOS (www.humaos.org), a
real-time, pre-emptive operating system designed for advanced humanoid
robotics, enabling human-like cognitive processing and precise motor
control. It is optimised for modern
robotics hardware and neuromorphic
processors, is developer-friendly and
has comprehensive safety protocols
and fail-safes. It runs on a real-time
Linux core.
Sensors and perception
Humanoid robots must be able to
sense and map their environment.
They can use sensors and navigation
systems including cameras, GPS/
GNSS, IMUs (inertial measurement
units), lidar, microphones and tactiles.
Tactiles (Fig.7) are sensors, such as in
the fingertips of the robot, that measure pressure, temperature and vibration (possibly more).
They may be composed of smaller
sensing elements called tactels. A
tactel (tactile element) is an individual sensing element that is part of a
sensor array, analogous
to an individual nerve on a human
fingertip. Human fingertips have
about 465 sensory nerves per square
centimetre. A tactel sensor array can
provide high-resolution sensing and
could, for example, sense the texture
of an object.
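As a simple illustration of what a robot’s firmware might do with a tactel array, the following Python/NumPy sketch estimates the total pressure and the centre of contact on a fingertip pad. The 4x4 grid and the threshold are made-up values for illustration only.

# Illustrative processing of a small tactel (tactile element) array:
# estimate total pressure and the centre of contact. Values are made up.
import numpy as np

# 4x4 grid of pressure readings (arbitrary units), one per tactel.
pressure = np.array([
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.6, 0.0],
    [0.0, 0.7, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])

total = pressure.sum()
if total > 0.5:                          # arbitrary contact threshold
    rows, cols = np.indices(pressure.shape)
    # Pressure-weighted centroid gives the centre of contact on the pad.
    centre = (np.average(rows, weights=pressure),
              np.average(cols, weights=pressure))
    print(f"contact centred at tactel {centre}, total pressure {total:.2f}")
else:
    print("no significant contact")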
Training robots
The basis of training humanoid
robots lies in the use of foundation
models (Fig.8). These large-scale AI
models are trained on vast amounts of
real-world data, such as videos from
sources like YouTube, to learn specific tasks or a range of activities. This
enables them to perceive and understand their environment, make decisions and perform tasks.
For example, a foundation model
might be trained on thousands of videos of pouring coffee, extracting the
essential generic steps to replicate the
task, even if the exact scenario differs
from its training data.
Foundation models can be trained
with text, images, videos, speech, or
other structured data. Key advantages
include reduced development time
for new applications, greater flexibility and adaptability and the ability
to generalise skills from one task to
another. This is unlike task-specific
programming, which has limited reuse
possibilities.
Fig.9: the model framework of GO-1. In this case, it is learning to hang a T-shirt. LAM stands for latent action model.
Source: https://agibot-world.com/blog/go1
Foundation models rely on neural
networks, which mimic how human
and animal brains operate. Individual
neurons are relatively simple, but their
collective communication enables
complex behaviours.
Especially in foundation models,
parameters (see panel) are used to
measure the model’s complexity and
learning capacity. They act as ‘adjustment knobs’: weights that alter the strength of connections between neurons, and biases that let each neuron shift its output independently to aid generalisation.
A larger number of parameters
enhances the ability to handle complex data but requires more computational resources. However, excessive parameters may cause the model
to memorise training data rather than
learn underlying patterns, necessitating careful design to optimise performance and adaptability to unfamiliar
situations.
Foundation models include large
language models (LLMs), vision language models (VLMs), vision language
action (VLA) systems, image models,
audio models, or multimodal models.
LLMs and VLMs are the most commonly used in humanoid robots due to
their language and vision capabilities.
Examples of LLMs include OpenAI’s
GPT-3 (with 175 billion parameters),
xAI’s Grok, and Google’s Gemini (with
undisclosed parameters), trained on
vast text datasets like books and web
content.
These models enable tasks such as
interpreting commands like “walk to
the door”, with the ‘large’ part reflecting their vast number of parameters
that capture complex language patterns. Not all LLMs qualify as foundation models; for instance, a smaller
siliconchip.com.au
LLM trained only for a specialist task
lacks the broad, general-purpose training or adaptability required.
Vision-language models, such as
OpenAI’s CLIP and Google’s PaLI,
combine image recognition with natural language understanding, allowing them to identify objects like a “red
cup” based on descriptions.
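CLIP is openly available, so matching an image against a handful of candidate descriptions takes only a few lines. This sketch uses the Hugging Face transformers library; the image filename and the candidate labels are arbitrary examples.

# Matching an image against text descriptions with the openly available
# CLIP model via the Hugging Face transformers library.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("workbench.jpg")            # arbitrary example image
labels = ["a red cup", "a blue cup", "a screwdriver", "an empty table"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher logits mean a closer image-text match; softmax gives probabilities.
probs = outputs.logits_per_image.softmax(dim=1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.2f}")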
An LLM can work cooperatively
with a VLM, where the VLM provides
the perceptual context of a visual scene
and the LLM interprets and responds
to commands based on the scene’s contents. For example, RT-2X from Google DeepMind uses a VLM for image
understanding and a reasoning module for task execution, enabling actions
like picking up an object based on a
verbal command.
In a robot, the LLM and VLM could
run on separate hardware modules
coordinated by a central controller,
or be combined into a single multimodal foundation model run on one
module. The latter is seen in models
like PaLM-E (https://palm-e.github.io), which blends language and vision
for action.
Humanoid robots said to incorporate combined LLMs and VLMs
include Tesla Optimus, Figure 02 running Helix, and Walker S.
Examples of foundation models
include:
AgiBot GO-1 is designed to be the
general-purpose ‘brain’ of humanoid
robots, to help them learn and adapt.
GO-1 uses vision language models
(Fig.9), in which massive amounts of
real-world images and videos are fed
to the models, training them how to
perform specific tasks.
The model algorithms then convert the data into a series of steps,
enabling them to perform the required
tasks. The system can form generalisations from the training data (videos
of humans doing things), enabling it
to perform tasks similar to what was shown, not just the exact tasks shown.
Parameters in artificial intelligence
Parameters in artificial intelligence models are a critical component,
allowing the model to learn and represent associations between concepts.
They include weights, biases, attention scores and embedding vectors. For
example, a weight is adjusted during training to associate “cat” with “meow”
rather than “bark”.
Biases are extra adjustments to weights that set the tone of a sentence,
such as promoting “great day” toward a positive tone based on its typical
associations in the training data.
Attention scores determine which parts of a sentence the model focuses
on. For instance, in “The cat, not the dog, meowed”, the model prioritises
“cat” and “meowed”, ignoring “dog” as the action’s source.
Embedding vectors are numerical representations of words in higher-dimensional space. During training, a word like “happy” is shifted closer to
“joy” and farther from “sad” based on how often they appear together in the
training data. AI is only as good as its training data and will incorporate any
of the biases present in its training materials. As the saying goes, “garbage
in, garbage out”.
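The idea that “happy” sits closer to “joy” than to “sad” can be made concrete with cosine similarity between embedding vectors. The three-dimensional vectors below are invented purely for illustration; real models use hundreds or thousands of dimensions.

# Cosine similarity between (invented) embedding vectors. Real models use
# hundreds or thousands of dimensions; three are shown for illustration.
import numpy as np

embeddings = {
    "happy": np.array([0.9, 0.8, 0.1]),
    "joy":   np.array([0.85, 0.75, 0.2]),
    "sad":   np.array([-0.7, -0.6, 0.3]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("happy vs joy:", round(cosine(embeddings["happy"], embeddings["joy"]), 2))
print("happy vs sad:", round(cosine(embeddings["happy"], embeddings["sad"]), 2))
# The first similarity is close to 1 and the second is negative, reflecting
# how training pushes related words together and unrelated words apart.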
The overall GO-1 framework comprises the VLM, the MoE (Mixture
of Experts) and the Action Expert.
The MoE contains the Latent Planner, which learns action patterns
from human behaviour (as observed
in videos etc) to build comprehension. The Action Expert is trained
with over a million real-world robot
demonstrations and refines the execution of tasks.
The VLM, Latent Planner and
Action Expert cooperate to perform
actions. The VLM processes image data to provide force signals (to understand the forces involved in various actions) and the required language inputs to perform tasks, and to understand the scene.
Based on outputs from the VLM, the
Latent Planner generates Latent Action
Tokens and a Chain of Planning. The Action Expert then generates fine-grained action sequences
based on the outputs of the VLM and
the Latent Action Tokens.
GO-1 is a generic platform that can
be used in a variety of robots. In https://youtu.be/9dvygD4G93c it is possible
to see some of the ‘thought’ processes
the robot goes through as it performs
various tasks.
AutoRT (Fig.10), developed by
Google DeepMind, is a research project and an experimental AI training system for scalable, autonomous
robotic data collection in unstructured real-world environments. It
enables robots to operate in “completely unseen scenarios with minimal human supervision”.
It integrates VLMs for scene and
object interpretation, and LLMs for
proposing tasks (eg, “wipe down the
countertop with the sponge”), plus
robot control models (RT-1 or RT-2)
for task execution. The robot’s tasks
are self-generated and work as follows
(from https://auto-rt.github.io):
1. The robot maps the environment
to generate points of interest, then
samples one and drives to that point.
2. Given an image from the robot camera, a VLM produces text describing the scene the robot observes. The output is forwarded to an LLM to generate tasks the robot could attempt.
3. Tasks are filtered via self-reflection to reject unsuitable tasks and to categorise the rest into those that need human assistance and those that do not.
4. A valid task is sampled from the filtered list and the robot attempts it.
5. The attempt is scored on how diverse the task and video are compared to prior data, and the loop is repeated.

Fig.10: how AutoRT works for a basic group of tasks. Source: https://auto-rt.github.io/
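The five-step loop just described maps naturally onto code. The Python sketch below is purely schematic: every object, function and method named here is a hypothetical stand-in for the VLM, LLM and robot control models involved, not part of AutoRT’s actual API.

# Schematic of an AutoRT-style task loop. Every function and method named
# here is a hypothetical stand-in, not part of any published AutoRT API.
import random

def autort_episode(robot, vlm, llm, controller, history):
    # 1. Map the environment, pick a point of interest and drive there.
    points = robot.map_points_of_interest()
    robot.drive_to(random.choice(points))

    # 2. Describe the scene with the VLM, then ask the LLM for candidate tasks.
    scene_text = vlm.describe(robot.camera_image())
    candidates = llm.propose_tasks(scene_text)

    # 3. Self-reflection filter: drop unsafe/invalid tasks, keep autonomous ones.
    valid = [t for t in candidates
             if llm.passes_constitution(t) and not llm.needs_human(t)]
    if not valid:
        return None

    # 4. Attempt one of the remaining tasks.
    task = random.choice(valid)
    video = controller.execute(task)

    # 5. Score the attempt for diversity against prior data, then repeat.
    history.record(task, video, score=history.diversity_score(task, video))
    return task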
In trials, AutoRT has utilised multiple robots in multiple buildings, up to
20 simultaneously, with 52 tested over
seven months to perform diverse tasks
like object manipulation, collecting
77,000 trials across 6,650 unique tasks.
A ‘robot constitution’ ensures safety
by filtering tasks to avoid humans or
hazards, complemented by force limits and human-operated kill switches.
This enables robots to gather training data autonomously and safely,
improving their adaptability to novel
scenarios.
NVIDIA’s GR00T (Generalist Robot
00 Technology) is a research initiative
and development platform aimed at
accelerating the creation of humanoid robot foundation models and data
pipelines for managing and generating
training data.
It is not a single model but a framework that includes foundation models, simulation tools and data generation pipelines. It is designed to
make humanoid robots more general-
purpose, capable of adapting to new
environments and tasks like navigating new rooms or handling objects
with minimal retraining.
GR00T features a complete computer-in-the-robot solution, the
Jetson AGX Thor computing module, which runs the entire robot stack
(cognition and control). This module
is optimised for robotics, supporting
VLA models (among others). It delivers over 2070 teraflops (2070 trillion
floating point operations per second)
of AI performance (with four-bit floating point precision).
RT-2X from Google DeepMind is a
VLA foundation model built upon the
earlier RT-2 (Robotic Transformer 2)
model. It’s designed to bridge the gap
between language, vision and robotic
action for controlling humanoid or
other robots.
It is trained on vast multi-modal
siliconchip.com.au
Fig.11: examples of an RT-2 model in operation, showing some of the tasks that can be performed. Source: https://robotics-transformer2.github.io/
datasets (text, images, videos and
robotic action data) using self-supervised learning, allowing it to
learn patterns without explicit labels.
It has 55 billion parameters and can
generalise instructions like “put the
blue block on the red one”, even with
blocks differing from its training set.
Here is an example of how RT-2X
works:
1. Input: it receives inputs from a
camera feed and a command like “sort
these items”.
2. Processing: using a scaled transformer architecture (a type of neural network), it applies its learned parameters (weights, biases, attention scores) to
interpret the scene, reason through
the task and plan actions, leveraging
its pre-trained knowledge.
3. Output: it generates precise motor
commands, executed at a high frequency, to control the robot’s movements. Some examples of the type
of instructions the earlier RT-2 can
understand are shown in Fig.11.
There has been no public disclosure
of what exact foundation model Tesla’s Optimus humanoid robot uses, but
it will be discussed in the section on
Optimus next month. It is based on a
similar AI architecture to that used by
Tesla’s Autopilot and Full Self-Driving
(FSD) systems in their cars.
Transformer models
A transformer model is a type of
neural network that processes the
entire input sequence of data, such
as text or from a vision transformer
model, all at once, rather than step-by-step. A key strength is its ability
to understand context, helping robots
interpret commands like “pick up the
cup” by considering the full scene
before them.
It uses a feature called ‘attention’,
which allows the model to determine
the relative importance of data parts,
such as prioritising “cup” over “table”,
enhancing its decision-making for
humanoid robot tasks.
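The attention mechanism itself boils down to a small amount of linear algebra. This NumPy sketch computes scaled dot-product attention for a toy three-token sequence; the vectors are random numbers, purely for illustration.

# Scaled dot-product attention for a toy 3-token sequence, in NumPy.
# attention(Q, K, V) = softmax(Q @ K.T / sqrt(d)) @ V
import numpy as np

rng = np.random.default_rng(0)
d = 4          # embedding size per token (tiny, for illustration)
tokens = 3     # eg, "pick", "up", "cup"

Q = rng.normal(size=(tokens, d))   # queries
K = rng.normal(size=(tokens, d))   # keys
V = rng.normal(size=(tokens, d))   # values

scores = Q @ K.T / np.sqrt(d)      # how strongly each token attends to each other
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row

output = weights @ V        # each token's output is a weighted mix of all values
print(np.round(weights, 2)) # rows sum to 1: the relative importance of each token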
Image recognition
Humanoid robots use image recognition to see and interpret their environment. This requires computer
vision models, often integrated into
the robot’s AI system. Key vision models used include convolutional neural
networks (CNNs), vision transformers (ViTs), and multimodal models
(eg, CLIP).
Convolutional neural networks
(CNNs) are deep learning models
optimised for vision, detecting edges,
shapes and patterns to build object
recognition capabilities. They are
used by Tesla’s Optimus, Figure AI
robots and Unitree platforms. Popular
architectures like ResNet and YOLO
(You Only Look Once) are trained on
datasets like ImageNet, a benchmark
visual database with over 14 million
pictures.
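Running a pre-trained CNN is straightforward with the open-source torchvision library. The sketch below classifies a single image using an ImageNet-trained ResNet-18; the image filename is an arbitrary example.

# Classifying one image with an ImageNet-pretrained ResNet-18 (torchvision).
import torch
from PIL import Image
from torchvision import models
from torchvision.models import ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()                                   # inference mode

preprocess = weights.transforms()              # resize, crop and normalise
image = preprocess(Image.open("mug.jpg")).unsqueeze(0)   # add batch dimension

with torch.no_grad():
    logits = model(image)

top = logits.softmax(dim=1).topk(3)            # three most likely categories
for p, idx in zip(top.values[0], top.indices[0]):
    print(f"{weights.meta['categories'][idx.item()]}: {p.item():.2f}")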
Vision Transformers (ViTs) are
another type of neural network that
breaks an image into smaller components called ‘patches’ and establishes
relationships between them using
‘self-attention’, similar to how language models link words in a sentence.
Unlike CNNs, ViTs can understand
the context of a scene and the relationships between parts. However,
they are computationally intensive, a
drawback compared to CNNs.
Multi-modal models like CLIP by
OpenAI recognise objects based on textual descriptions, such as “pick up the
blue cup”. Another example is Gemini-based robotics systems from Google DeepMind, built on the Gemini 2.0
framework, which powers advanced
AI models.
These models are integral to VLA
systems, enhancing a robot’s ability
to act on visual and language inputs
by enabling perception, reasoning
and action.
Foundation models like GPT-3, Grok
and RT-2X are trained on diverse datasets, including images and text. Image
recognition models can be part of these
foundation models; for example, CLIP
and RT-2X incorporate vision components within their multi-modal frameworks.
However, some CNNs trained only
on limited datasets, like items in a certain factory, aren’t considered foundation models due to their lack of broad
adaptability.
Asimov’s Laws of Robotics

0 | A robot may not injure humanity or, through inaction, allow humanity to come to harm.
1 | A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2 | A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3 | A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

You could make the argument that modern-day autonomous military vehicles already contravene these “laws”.

Learning to walk

Teaching a robot to walk is one of many aspects of robot training. It involves training it to coordinate movements to achieve a stable, human-like gait. This relies on kinematic models, dynamic models and AI techniques such as reinforcement learning, genetic algorithms or imitation learning.

The control algorithms for commercial humanoid robots are typically proprietary, but the experimental RH5 humanoid robot (Fig.12) offers insight into a hybrid approach. This system uses local control loops for lower-level functions, such as managing joint torque and balance, and central controllers for high-level tasks like determining gait direction and speed. A mid-level layer facilitates communication between them, relieving the main processor of the burden of real-time walking tasks. This mirrors human walking, where the spinal cord’s central pattern generators handle rhythmic motion, while the brain directs overall activity and posture.

Fig.12: the electronic control units in an RH5 robot. Source: https://cmastalli.github.io/publications/rh5robot21ichr.pdf

Training a humanoid robot to walk is a key development focus. One method involves kinematic models, which are mathematical representations of the robot’s structure, joint configurations and motion constraints. Alone, these models produce a basic, often stiff gait by focusing on geometry without accounting for forces, addressed by dynamic models.

Challenges like walking on uneven terrain or adapting to disturbances require advanced strategies, effectively tackled by integrating kinematic models with dynamic simulations and AI-driven optimisation.

AI techniques, such as genetic algorithms and reinforcement learning, enhance kinematic models to achieve more human-like motion. Genetic algorithms optimise gait parameters (eg, joint angles and torque) by emulating an evolutionary approach, rewarding closer-to-natural patterns, while reinforcement learning lets robots ‘learn through experience’, adjusting actions based on rewards or penalties.

Alternatively, transformer-based foundation models, pre-trained on human motion data and fine-tuned for gait synthesis, offer advanced motion prediction. Stability is ensured with ‘zero-moment point’ (ZMP) control, maintaining the centre of pressure within the support polygon of the robot, and imitation learning mimics human walking from demonstration data.

Fig.13: a Simscape Multibody model shown at a high level. Source: www.mathworks.com/help/sm/ug/humanoid_walker.html
Commercial tools like MathWorks’
Simscape Multibody (Fig.13, www.
mathworks.com) handle kinematics (motion) and dynamics (forces),
modelling 3D structures with torque-
activated hip, knee, and ankle joints,
and passive shoulder joints for arm
swing to aid balance by counteracting
torso motion.
The contact forces between feet and
the ground are simulated to ensure
stability, with Simulink feedback
controllers adjusting joint stiffness
and damping.
Training with MathWorks’ Global
Optimization Toolbox for genetic algorithms or MathWorks’ Deep Learning
Toolbox and Reinforcement Learning Toolbox refines walking, creating a feedback loop where optimised
gaits inform the central controller,
executed by local loops for natural,
robust movement.
In recent years, these combined
approaches have transformed humanoid robot walking from stiff, mechanical motions to fluid, human-like gaits,
paving the way for practical applications in diverse environments.
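The zero-moment point idea mentioned above ultimately reduces to a geometric test: is the centre of pressure inside the polygon formed by the feet? Here is a minimal Python sketch of that test using a standard ray-casting point-in-polygon check; the coordinates are illustrative only.

# Minimal zero-moment-point style stability check: is the centre of
# pressure inside the support polygon formed by the feet?
def point_in_polygon(px, py, polygon):
    """Standard ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        crosses = (y1 > py) != (y2 > py)
        if crosses and px < (x2 - x1) * (py - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

# Rectangle roughly covering both feet during a double-support stance (metres).
support_polygon = [(-0.10, -0.15), (0.10, -0.15), (0.10, 0.15), (-0.10, 0.15)]

for cop in [(0.02, 0.05), (0.18, 0.0)]:
    stable = point_in_polygon(cop[0], cop[1], support_polygon)
    print(f"centre of pressure {cop}: {'stable' if stable else 'tipping risk'}")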
World Robot Competition
Mecha Fighting Series
China Media Group held the World
Robot Competition Mecha Fighting
Series to showcase humanoid robotics
technology. The robots were teleoperated by humans, but the robots autonomously provided balance and other
basic functions.
For details, see Fig.14 and https://youtu.be/N7UxGVV_Fwo
Artificial general intelligence
We hear about artificial intelligence
all the time but there is another concept beyond that, called artificial general intelligence (AGI). This is where
a machine can emulate human intelligence in terms of self-learning (far
beyond the ‘training’ of AI), reasoning,
understanding and problem solving;
even understanding and emulating
human emotions.
Humanoid robots endowed with
AGI would be capable of great mischief in the wrong hands; this is the
subject of many dystopian science fiction movies, such as The Terminator
and I, Robot.
To protect against such dystopian
scenarios, in 1942, Isaac Asimov
devised the Three Laws of Robotics
and later added another one, although
these have been criticised as not being
a comprehensive ethical framework
to govern the behaviour of intelligent
robots. Still, they are a good starting
point (see the panel).
Experts agree that AGI has not been
achieved yet, but at the current rate of
progress, who knows when it could
arrive.
In 1950, Alan Turing proposed a test of intelligent behaviour known as the
Turing test. This involves a human engaging in a text-based conversation with either a machine or another human, and determining if they can distinguish between the two. If the human cannot distinguish between the two, the computer is deemed to display true intelligence.
Glossary of Terms
AI – Artificial Intelligence; machines
simulating human intelligence, such as
learning, reasoning and problem-solving
ANN – Artificial Neural Network;
computational models inspired by
human brains, used in machine learning
ASIC – Application-Specific Integrated
Circuit; a custom-designed chip
optimised for a specific function or task
CNN – Convolutional Neural Network; deep
learning models optimised for vision,
detecting edges, shapes and patterns
CPU – Central Processing Unit; a general-
purpose processor that executes
instructions & manages computing tasks
DoF – Degrees of Freedom; independent
movements a robot joint or mechanism
can perform
End Effector – a tool/device at a robotic
arm’s end that interacts with objects
FPGA – Field-Programmable Gate Array;
a chip programmable for specific
hardware tasks post-manufacturing
GPU – Graphics Processing Unit; a
processor specialised for highly parallel
tasks like machine learning
LLM – Large Language Model; an AI model
trained on massive text datasets to
generate or understand language
Multimodal – An AI that processes and
integrates multiple data types (text,
images, audio, video etc)
Neuromorphic Processor – a chip
that uses artificial neurons to mimic the
human brain
NLP – Natural Language Processing; an
AI’s ability to understand, interpret and
generate human language
Organoid – a simplified version of an
organ designed to imitate it
RTOS – Real-Time Operating System; an
operating system that guarantees timely
processing for critical tasks
Tactel – Tactile Element; a sensor element
that detects touch, pressure or texture
information
Teleoperation – operating a machine
remotely
TPU – Tensor Processing Unit; a Google-
designed chip optimised for accelerating
machine learning workloads.
Transformer – a neural network
architecture that uses attention to
process sequential data efficiently
VLA – Vision-Language Action; an AI that
combines visual input and language to
perform actions or tasks
VLM – Vision-Language Model; an AI that combines image understanding with text comprehension and generation

Fig.14: two Unitree G1 robots fighting in the Mecha Fighting Series. Source: China Media Group.
ChatGPT-4 later became the first computer to pass a rigorous implementation of the Turing test, leading some to speculate that the Turing test was not a strict enough test for machine intelligence. Since then, other systems like LLaMa-3.1 and GPT-4.5 have also passed Turing tests.

Ethics

Clearly, AI and robotics are improving by the day, and it won’t always be for the good of humankind. Consider a mass-produced army of military robots produced by a hostile power, or robots used for crime and violence. As John Connor said of The Terminator, “It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity or remorse or fear and it absolutely will not stop”.

We are not at that stage yet, but it may happen within the lifetimes of many readers, maybe even within ten years.

Fig.15: the ‘uncanny valley’ describing the possible emotional response to various humanoid robots compared to their likeness to humans. The horizontal axis is human likeness (up to 100%) and the vertical axis is affinity. One curve is for a moving robot, the other for a still one. They both become significantly negative before reaching the positive response to a human. Source: https://w.wiki/EoPq

Humanoid robots and artificial limbs
The development of humanoid
robots also has benefits for artificial
limbs for humans, as the basic design
of a human-like limb for a robot will
also be suitable for use with humans.
Our article about Artificial/Prosthetic
Limbs in March 2025 (siliconchip.au/Article/17782) discussed this. The
limbs of Tesla’s Optimus have been
proposed for this purpose.
The uncanny valley
The “uncanny valley” is a hypothesised psychological response to humanoid robots at various levels of realism, a concept proposed by Japanese roboticist Masahiro Mori. It speculates that a robot which
is ‘almost human’ in appearance will
elicit an eerie response that a more
human or less human looking robot
would not – see Fig.15.
Examples of robots that could trigger this response include the Ameca robot by Engineered Arts, due to its very realistic facial motions (see Fig.16; https://engineeredarts.com/robot/ameca), and the Sophia robot by Hanson Robotics (see Fig.17; www.hansonrobotics.com/sophia).
There is some experimental evidence that this phenomenon is real. It
suggests that certain design elements
should be incorporated into humanoid
robots to avoid revulsion (for example, making them look clearly different from people).
Fig.16: Ameca is a robot with a realistic-looking head developed by Engineered Arts. Source: https://engineeredarts.com/robot/ameca/

Fig.17: another robot with a realistic-looking head is Sophia by Hanson Robotics. Source: https://www.hansonrobotics.com/sophia/

Next month

The second half of this series will be published next month. It will describe notable historical and current humanoid robots, like those shown on the lead page. SC