HUMANOID & ANDROID ROBOTS
Part 1: by Dr David Maddison, VK3DSM

Robots shown on the lead page:
Agility Digit – www.agilityrobotics.com
Boston Dynamics Atlas – https://bostondynamics.com/atlas
Unitree H1 – www.unitree.com/h1
Tesla Optimus – www.tesla.com/en_eu/AI
Figure 02 – www.figure.ai
1X NEO Gamma – www.1x.tech/neo
Apptronik Apollo – https://apptronik.com/apollo
Booster Robotics T1 – www.boosterobotics.com/robots/
Like many ideas that started as science fiction, humanoid and android robots are
now a reality. They have not yet been perfected – but they are here. We’ll likely
see them entering widespread use over the next couple of decades.
Video phones, vertically landing rockets, artificial intelligence (AI) – not
long ago, these things were purely
in the realm of science fiction. But
now they are everyday technologies. Humanoid robots aren’t very far
behind.
Traditional robots, typically found
in factories, are mostly stationary and
perform repetitive tasks. In contrast,
humanoid robots functionally resemble people.
Android robots are humanoids
designed to very closely resemble
humans, to the point of being almost
indistinguishable from us. So far, no
robot has been developed that is truly
indistinguishable from a human, but
some can pass superficial inspection.
Examples of such androids include the
Japanese Actroid-DER and the South
Korean EveR-4, both of which we will
discuss later.
This series comprises two articles;
this first one will discuss the general
aspects of and technology behind
humanoid robots, while the follow-up
next month will cover a range of robots
that are currently in development,
being demonstrated or in use.
Why humanoid robots?
Humanoid robots are ideal for working in spaces designed for humans.
Unlike conventional robots that are
designed for a specific range of tasks
and are often stationary, humanoid
robots can, if sufficiently advanced,
do anything a human can do. They
can have many flexible joints and
high mobility. They don’t have to be
the same size as a human; they can
be smaller or larger as required for
their job.
Some examples of jobs that humanoid robots are ideal for are:
O Caring for hospital patients
O Construction work
O Customer service (eg, retail)
O Handling inquiries in public
places like airports and train stations
O Hotel check-in staff
O Domestic duties (eg, housework)
O Factory floor work (assembly,
moving objects and inspections)
O Warehouse work
O Risky, dangerous or unpleasant
tasks
The rate of advancement of
humanoid robots is rapid due to the
convergence of improved mechanical
design, artificial intelligence, faster
computer chips and advances in computer and chip architectures.
Humanoid robots can address
labour shortages and our ageing population, as well as perform dirty, undesirable, repetitive tasks that humans
don’t want to. They’ll do it 24/7, more
precisely and for no pay. This has
resulted in a greatly increased demand
for such robots.
The future use of humanoid robots
raises ethical concerns, but that has
always been the case with the introduction of more advanced automation, even since the time of the Industrial Revolution. People tend to move
on to other forms of employment if
displaced. Also, despite incredible
advances, the robots are not taking
over; not yet, anyway...
What is a humanoid robot?
There is no strict definition, but
typically a humanoid robot features a
human-like appearance, including two
arms, two legs, a head, a torso and a
size similar to humans.
They are designed to mimic human
behaviour. This mimicry stems from
their ability to move, converse and
provide information, express emotions through facial expressions and
perform natural language processing (NLP) using artificial intelligence
(AI), enabling conversations and
instruction-giving.
Their movements are designed to
enable useful tasks, such as picking
up, carrying and placing objects, while
AI allows them to receive instructions
or engage in conversation, distinguishing socially engaging robots from those
used purely for industrial purposes.
Parts of a humanoid robot
The main components of a humanoid robot are:
1. The body structure incorporating
limbs, a torso and a head, usually made
from aluminium or plastic composites.
2. Motors (actuators) and joints,
with the motors acting as the ‘muscles’.
3. Sensors, such as cameras (eyes),
microphones (ears), gyroscopes and
accelerometers (as in the vestibular part
of the human ear) and touch sensors.
4. A ‘brain’ comprising three key
parts.
a The main computer processor,
which acts as the central hub. It
is responsible for overall control,
coordinating the robot’s actions
by running AI software.
b AI software serves as the ‘mind’,
enabling advanced tasks like
recognising objects, perception,
learning from experience, making decisions and planning movements. This can include a large
language model (LLM) and/or
vision language model (VLM).
This AI software might use the
main processor or may also run on
specialised hardware like GPUs
(graphics processing units) or
TPUs (tensor processing units) for
efficiency and speed, and is typically trained using machine learning rather than just programmed.
c Microcontrollers are distributed
throughout the robot, managing
specific hardware subsystems like
motors in the arms and sensors in
the hands, in real time, ensuring
precise control under the main
processor’s guidance.
5. A power source, such as a battery pack.
6. Wireless communications systems.
Actuators and joints
Actuators for humanoid robots may
be hydraulic, pneumatic or electric.
There are also small actuators for facial
‘muscles’, for robots capable of facial
expression.
Electric actuators are the favoured
types these days due to their compactness, lightness, simplicity, quietness and good power-to-weight ratio.
They usually use DC motors or servo
motors, often with reduction gears to
increase torque.
Fig.1 shows a typical commercially
available actuator from ASBIS that
could be employed in a humanoid
robot. It has an EtherCAT Ethernet
controller, a pair of 19-bit encoders
to enable precise rotation accuracy, a
high-torque brushless DC motor, clutch
brakes that lock the actuator in the
event of power loss, a harmonic reduction gear and a cross-roller bearing to ensure
rigidity for axial and radial loads.
Fig.2 shows the RH5 experimental humanoid robot and the range
of movements of its joints possible
with its particular types of actuators,
along with the symbols used to represent them. This robot has a total of
34 degrees of freedom (DoF) – see the separate panel on degrees of freedom.
Communication and
networking
Humanoid robots need to communicate for a variety of reasons: to receive updated instructions or software, for teleoperation (remote control by a human), remote processing of complex tasks, progress tracking and fault monitoring, among others.
They can connect wirelessly via
common means such as 5G, Bluetooth,
Zigbee, IoT, WiFi and MQTT (Message
Queuing Telemetry Transport). Voice
communication with humans is possible using a speaker and microphone;
spoken instructions can be interpreted
using natural language processing by
an LLM.
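To give a concrete flavour of the sort of lightweight messaging involved, here is a minimal Python sketch using the open-source paho-mqtt library to publish battery telemetry to a broker. The broker address, topic name and payload fields are invented for illustration and do not come from any particular robot.

# A minimal sketch of robot telemetry over MQTT using the open-source
# paho-mqtt library. Broker address and topic name are hypothetical.
import json
import time
import paho.mqtt.publish as publish

BROKER = "192.168.1.50"            # hypothetical on-site MQTT broker
TOPIC = "robots/unit01/telemetry"  # hypothetical topic name

def read_battery_percent():
    # Placeholder for a real battery-management-system query.
    return 87.5

while True:
    payload = json.dumps({
        "battery_pct": read_battery_percent(),
        "timestamp": time.time(),
    })
    # publish.single() opens a connection, sends one message and disconnects.
    publish.single(TOPIC, payload, hostname=BROKER, qos=1)
    time.sleep(10)                 # report every 10 seconds

A monitoring station subscribed to the same topic would receive each report, and the same mechanism works in reverse for sending commands to the robot.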
Power supplies
Humanoid robots are mostly powered by lithium-ion batteries. Some
are in the form of a removable battery pack that is quickly swappable
to avoid significant downtime while
the robot recharges.
Some robots, such as early Boston
Dynamics robots intended for military use, used on-board petrol or
diesel generators, as a military robot
cannot be quickly recharged in the
field. However, we are not aware of
any military humanoid robots under
development that use internal combustion engines.
Robot designers take care to ensure robots use power efficiently to maximise their use time between charges or battery swaps. Systems are being developed to allow humanoid robots to connect to a charger or change battery packs themselves.

Fig.1: a commercially available actuator that can be used in a humanoid robot. Source: www.asbis.com/aros-robotic-actuators
Processors
Neural networks are the basis of
human and animal brains, and are
important for artificial intelligence
and humanoid robots. They are flexible and can learn and model new
and changing relationships that are
non-linear and complex. They are thus
highly suitable for tasks like speech
and image recognition.
Artificial neural networks (ANNs)
can be modelled either in software
or hardware (or as biological circuits
in some experimental arrangements).
There are several types of processors
that can be used to power AI for robots
(or in general) including:
O CPUs (central processing units),
as used in regular computers
O GPUs (graphics processing units)
O TPUs (tensor processing units)
O Neuromorphic processors
In addition to AI functions, hardware subsystems may be controlled
by other processors.
The CPUs used in humanoid robots
are very powerful and, while not specifically designed for AI purposes, can
still satisfactorily run AI software. A
CPU may also be used in combination
with another type of processor.
GPUs, or graphics processing units,
were originally developed for graphics applications but have been adapted
for neural networks in artificial intelligence due to their ability to handle
many calculations at once. This parallel processing is essential for training AI to perform tasks like vision in
humanoid robots.
Widely recognised as the world
leader in AI chips, NVIDIA uses GPUs
as the foundation of its AI technology.
NVIDIA’s AI chips, such as the H100,
A100, RTX series GPUs, Grace Hopper Superchip GH200, and Blackwell
accelerator architecture, are optimised
with NVIDIA’s CUDA (compute unified device architecture) software for
parallel computing.
A popular choice for humanoid
robots is the NVIDIA Jetson series
platforms. These processor modules
integrate an energy-efficient ARM
CPU and GPU, and can be used for AI
tasks, such as image recognition and
deep learning.
Fig.2: the actuation and morphology of an RH5 humanoid robot. The red, green and yellow symbols represent the type of joints: S: Spherical, R: Revolute, P: Prismatic, U: Universal. Source: https://cmastalli.github.io/publications/rh5robot21ichr.pdf

A TPU, or Tensor Processing Unit, is a specialised chip designed and developed by Google, optimised for machine learning. Unlike CPUs and GPUs, which evolved for AI from general computing and graphics roles, TPUs were built from scratch for this purpose.
They excel at matrix operations, a
core component of neural networks,
and demonstrate superior performance in tasks like training large models, outperforming CPUs and GPUs in
specific low-precision workloads.
TPUs are used in applications such
as natural language processing, image
recognition for navigation and recommendation systems, powering Google’s AI services. They show great
promise for use in humanoid robots,
where their efficiency could enhance
real-time vision and decision-making.
Still, only one current humanoid
robot, Gary (described next month), is
known to use them.
Neuromorphic processors are
designed to emulate the structure and
function of a human brain, although
they are not nearly as complex. They
employ mixed analog and digital processing to generate neural signals,
providing radically different computational outcomes than traditional digital computing using von Neumann
architectures.
The experimental iCub humanoid
robot (described next month) is
said to use such a processor.
This biological-style approach results in a more energy-efficient processor, with an architecture somewhat like that of a brain.
Examples of neuromorphic chips
include Intel’s Loihi, IBM’s TrueNorth
and NorthPole, BrainChip’s Akida and
the SENeCA (Scalable Energy-efficient
Neuromorphic Computer Architecture) research chip.
Neuromorphic processors use spiking neural networks (SNNs), where
information is processed as discrete
spikes in a manner similar to biological neurons, rather than continuous
activation, as with artificial neural networks (ANNs).
Degrees of freedom (DoF)
Degrees of freedom means the number of independent motions a robotic
appendage such as an arm or a leg can make. The more degrees of
freedom it has, the more flexible and useful it is.
Consider a very simple robot arm. A single robot joint such as a wrist that
can rotate about one axis (yaw) represents one DoF. Shoulder joints that can
move on two axes (pitch and yaw) add two more DoF. A hinged elbow joint
allowing flexion/extension is another DoF. So a simple robot arm that has a
shoulder, elbow and wrist would have four DoF.
The hand (or other gripping mechanism) does not count, as it is
considered the ‘end-effector’, the component that is being manipulated. DoF
typically only refers to joint motions, not the internal components of the end-effector.
For robot arms, six DoF is the minimum required for full position
and orientation control of the end-effector. A count of seven or more
is considered ideal. The more DoF a robot has, the more mechanically
complex it becomes and the more advanced the required control algorithms
and training become.
A human arm has seven DoF: three in the shoulder (flexion/extension,
abduction/adduction & internal/external rotation), one from the elbow
(flexion/extension), one from the forearm (pronation/supination) and two
from the wrist (flexion/extension & radial/ulnar deviation).
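The counting described in this panel is simple enough to express directly in a few lines of Python; the joint lists below just restate the arm examples given above.

# Tally degrees of freedom for the arms described in the panel.
# Each entry is (joint name, number of independent axes it can move on).
simple_arm = [
    ("shoulder", 2),   # pitch and yaw
    ("elbow", 1),      # flexion/extension
    ("wrist", 1),      # yaw
]
print("Simple robot arm:", sum(axes for _, axes in simple_arm), "DoF")  # 4

human_arm = [("shoulder", 3), ("elbow", 1), ("forearm", 1), ("wrist", 2)]
print("Human arm:", sum(axes for _, axes in human_arm), "DoF")          # 7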
Fig.3: a model of a proposed humanoid robot with an ‘organoid’ brain. Source: www.datacenterdynamics.com/en/news/chinese-scientists-develop-artificial-brain-to-control-brain-on-chip-organoid-robot/
Neuromorphic processors are not
yet widely adopted due to a
lack of hardware maturity, challenges
integrating them with existing ecosystems, the need for new programming
paradigms and the lack of computational power compared to other processors.
Other processors
To relieve the computational burden from the rest of the robot’s ‘brain’,
control of some hardware such as a
hand or knee joint may be performed
by small integrated computer chips
called microcontrollers.
In some cases, field-programmable
gate arrays (FPGAs) and application-
specific integrated circuits (ASICs) are
used for very high-performance tasks,
such as complex motion control algorithms. These offer specialised hardware acceleration for particular tasks,
improving efficiency and real-time
performance.
Organic ‘brains’
Neural networks can also be
built with biological neurons. One
Melbourne-based company has developed an experimental “wetware” computer, although it has no current application in humanoid robots (www.abc.net.au/news/science/104996484).
Researchers are also looking at neural networks made from human cells.
For example, researchers at Tianjin
University and the Southern University of Science and Technology in
China have interfaced human brain
cells onto a neural interface chip to
make a neural network ‘brain’ that can
be trained to perform tasks.
This brain has not yet been incorporated into a robot as proposed
(Fig.3), but brain cells on a chip were
stimulated and trained to navigate
environments and grip objects when
interfaced to an external robot. The
collection of brain cells is called an
‘organoid’, and is not a real human
brain, but possesses the neural network architecture of one and is about
3mm to 5mm in diameter.
The size limit is imposed due to the
inability to vascularise the cells (incorporate blood vessels). If this hurdle is
overcome, much larger structures can
be fabricated. Of course, there are ethical implications of using human cells
for such applications.
Skin materials
Silicone elastomers (rubbers) are
commonly used for the skin of humanoid robots with realistic facial and
other features. They are soft and highly
deformable, like real human flesh and
skin, plus they can be readily coloured
and moulded and formulated for particular properties. Fig.4 shows an
example of a silicone skin on a humanoid robot chassis.
Many companies make silicone
products. One that we came across
that might have suitable products is
Smooth-On (www.smooth-on.com).
A team of researchers at Aalto University and the University of Bayreuth
have developed hydrogel skin materials. Hydrogel is a gel material that
contains a high proportion of water.
It is soft, pliable and moist, much like
skin and flesh.
Fig.4: an example of silicone skin
on a humanoid robot chassis, the
discontinued Robo-C2 from Promobot.
Source: https://promo-bot.ai/robots/robo-c/
These researchers developed hydrogel materials suitable for the skin of
realistic humanoid robots. They are
even self-healing, so any cut or other
minor damage will repair itself; see www.nature.com/articles/s41563-025-02146-5
Another concept under development is ‘electronic skin’. This emulates
human skin, with the ability to sense
pressure, temperature, deformation etc
using flexible electronics embedded
into a silicone or similar matrix (see
our article on Organic Electronics in
the November 2015 issue; siliconchip.au/Article/9392).
Incredibly, as a proof-of-concept project, scientists from the University of Tokyo, Harvard University and
the International Research Center for
Neurointelligence (Tokyo) have made
a robot face using cultured living
human skin (Fig.5), although it would
no doubt soon die without associated
nourishment. That is scarily reminiscent of The Terminator.
Operating systems and
frameworks
Operating systems (OS) for humanoid robots are specialised software
that extend beyond traditional computer operating systems. They integrate real-time processing and AI for
the robot’s ‘brain’.
These systems orchestrate critical
tasks, including the real-time control of actuators, sensors and power
systems, as well as balance, locomotion, environmental interaction and
task planning. They rely on real-time
operating systems (RTOS) like the
open-source FreeRTOS or RTEMS
to ensure low-latency, deterministic
responses for precise sensor-actuator
coordination.
Complementing these operating
systems, ‘middleware’ facilitates communication between diverse software
components. For instance, the data
distribution service (DDS) in open-source ROS 2 (Robot Operating System 2), a widely used robotics framework, enables modular, scalable, and
interoperable data exchange.
Frameworks like ROS 2 and NVIDIA
Isaac (based on ROS 2) provide structured environments to integrate AI and
manage robotic functions.
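To give a flavour of what a ROS 2 ‘node’ looks like in practice, here is a minimal Python (rclpy) publisher that broadcasts a status string once per second. The topic name and message text are arbitrary examples, not from any specific robot.

# Minimal ROS 2 node in Python (rclpy) that publishes a status string
# once per second. The topic name 'robot_status' is just an example.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class StatusPublisher(Node):
    def __init__(self):
        super().__init__('status_publisher')
        self.pub = self.create_publisher(String, 'robot_status', 10)
        self.timer = self.create_timer(1.0, self.tick)   # 1Hz timer

    def tick(self):
        msg = String()
        msg.data = 'balancing OK, battery 87%'
        self.pub.publish(msg)

def main():
    rclpy.init()
    node = StatusPublisher()
    rclpy.spin(node)          # hand control to the ROS 2 executor
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

The publish() call travels over the DDS middleware described above, so any other node subscribed to the same topic – on the same computer or elsewhere on the network – receives the message.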
Most humanoid robots use open-source Linux-based operating systems,
such as Ubuntu with ROS 2 or RTLinux with built-in real-time capabilities, due to their flexibility and compatibility with AI frameworks.
These systems support advanced AI
algorithms, including LLMs like various GPT models, for natural language
understanding; VLMs, like CLIP (Contrastive Language-Image Pretraining),
for scene and object recognition; and
reinforcement learning for optimising
movement and acquiring new skills.
This enables continuous learning and
adaptation in dynamic environments.
For example, the ROS 2 framework,
running on Linux, powers robots like
Boston Dynamics’ Atlas for dynamic
locomotion and manipulation. NVIDIA’s Isaac platform, built on ROS 2,
supports AI-driven perception and
control in robots like Tesla’s Optimus
and Figure’s Figure 01 for human-robot
collaboration.
Together, Linux-based operating
systems and frameworks like ROS 2
enable humanoid robots to perform
diverse tasks, from industrial automation to assistive caregiving, with precision and adaptability.
Fig.5: human skin grown for proposed use on a humanoid robot. The mould is on the left, the skin on the right; the eyes are not real. Source: www.cell.com/cell-reports-physical-science/fulltext/S2666-3864(24)00335-7
Simulation platforms
NVIDIA’s Isaac Sim (see https://developer.nvidia.com/isaac/sim) is a
robotics simulation platform built on
the Omniverse framework. It can be
used to create digital ‘twins’, ie, virtual
replicas of physical robots, including
humanoids, to train AI and test software while avoiding the damage to people or robots that could occur if a real robot were used – see Fig.6.
Fig.6: a robot simulation from NVIDIA Isaac Lab, which is related to NVIDIA Isaac Sim. Source: https://developer.download.nvidia.com/images/isaac-lab1980x1080.jpg

Digital twins help train neural networks (eg, those in foundation models like RT-2X; discussed later) on simulated sensor data. For operating systems, they test software stability (eg, real-time control loops), and validate algorithms (eg, path planning).
Isaac Sim simulates sensor inputs
(eg, cameras and gyroscopes) and
interactions with objects, both crucial for AI development. It integrates
with robotics frameworks like ROS 2,
aligning with operating systems and
software used in robots like Tesla’s
Optimus or NASA’s Valkyrie.
These digital twins enable robot
learning and simulation by replicating real-world physics and sensor
data, supporting the development
of operating systems and algorithms
for tasks like movement and object
interaction.
Besides Isaac Sim, other notable
alternative simulation platforms that
we don’t have space to delve into
include:
O Gazebo (open source)
https://gazebosim.org/home
O Webots (open source)
https://cyberbotics.com
O CoppeliaSim (commercial)
www.coppeliarobotics.com
O MuJoCo (open source)
https://mujoco.org
Each excels in specific areas. Gazebo
has great community support; Webots
is perfect for industry, education and
research; CoppeliaSim is flexible, with
diverse capabilities; and MuJoCo has
advanced physics simulation.
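As a taste of how such simulators are driven, the following sketch uses the open-source MuJoCo Python bindings to step a deliberately trivial one-joint model. A real humanoid would be described by a far larger XML file, but the simulate-and-control loop has the same shape.

# Stepping a trivial one-joint model with the MuJoCo Python bindings.
# A real humanoid would use a much larger XML model, but the loop is the same.
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body name="link" pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" size="0.05" fromto="0 0 0 0 0 -0.5"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

for step in range(500):
    data.ctrl[0] = 0.1              # constant torque on the hinge 'motor'
    mujoco.mj_step(model, data)     # advance the physics by one time step

print("final joint angle (rad):", data.qpos[0])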
Other software
The Python programming language
is widely used for robot control and AI
implementation in humanoid robots. It
simplifies managing actuators, sensors
and motion planning, often alongside
C++ in frameworks like ROS. Python’s
extensive libraries, like TensorFlow
and PyTorch, support developing and
deploying AI models for tasks like
vision and decision-making.
Besides Python, other programming
languages used for humanoid robots
include C++ and C for control, MATLAB for research, Java for middleware,
and the emerging Rust. Each complements Python, addressing specific
needs in AI training, OS stability and
software validation.
Other operating systems used with
humanoid robots worth mentioning
include:
O HarmonyOS 6, an operating system developed by Huawei, with its AI
Agent Framework, is showing promise for operating and training robots.
Examples of variations or adaptations
of HarmonyOS in robotics include
Kuavo, with possible variants like
M-Robots OS or iiRobotOS, reflecting
its customisable nature. HarmonyOS
is used to operate and train the Walker
S humanoid robot (described next
month) developed by UBTECH Robotics for tasks like quality inspections at
Nio’s factory.
Fig.7: the SynTouch BioTac multimodal tactile sensor for use in robot fingers that can detect force, vibrations and temperature. Source: https://wiki.ros.org/BioTac
Fig.8: some of the uses of foundation models. Source: https://blogs.nvidia.com/blog/what-are-foundation-models/
O HumaOS (www.humaos.org), a
real-time, pre-emptive operating system designed for advanced humanoid
robotics, enabling human-like cognitive processing and precise motor
control. It is optimised for modern
robotics hardware and neuromorphic
processors, is developer-friendly and
has comprehensive safety protocols
and fail-safes. It runs on a real-time
Linux core.
Sensors and perception
Humanoid robots must be able to
sense and map their environment.
They can use sensors and navigation
systems including cameras, GPS/
GNSS, IMUs (inertial measurement
units), lidar, microphones and tactiles.
Tactiles (Fig.7) are sensors, such as in
the fingertips of the robot, that measure pressure, temperature and vibration (possibly more).
They may be composed of smaller
sensing elements called tactels. A
tactel (tactile element) is an individual sensing element that is part of a
sensor array, analogous
to an individual nerve on a human
fingertip. Human fingertips have
about 465 sensory nerves per square
centimetre. A tactel sensor array can
provide high-resolution sensing and
could, for example, sense the texture
of an object.
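As a simple illustration of what a robot’s firmware might do with a tactel array, the following Python/NumPy sketch estimates the total pressure and the centre of contact on a fingertip pad. The 4x4 grid and the threshold are made-up values for illustration only.

# Illustrative processing of a small tactel (tactile element) array:
# estimate total pressure and the centre of contact. Values are made up.
import numpy as np

# 4x4 grid of pressure readings (arbitrary units), one per tactel.
pressure = np.array([
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.6, 0.0],
    [0.0, 0.7, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])

total = pressure.sum()
if total > 0.5:                          # arbitrary contact threshold
    rows, cols = np.indices(pressure.shape)
    # Pressure-weighted centroid gives the centre of contact on the pad.
    centre = (np.average(rows, weights=pressure),
              np.average(cols, weights=pressure))
    print(f"contact centred at tactel {centre}, total pressure {total:.2f}")
else:
    print("no significant contact")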
Training robots
The basis of training humanoid
robots lies in the use of foundation
models (Fig.8). These large-scale AI
models are trained on vast amounts of
real-world data, such as videos from
sources like YouTube, to learn specific tasks or a range of activities. This
enables them to perceive and understand their environment, make decisions and perform tasks.
For example, a foundation model
might be trained on thousands of videos of pouring coffee, extracting the
essential generic steps to replicate the
task, even if the exact scenario differs
from its training data.
Foundation models can be trained
with text, images, videos, speech, or
other structured data. Key advantages
include reduced development time
for new applications, greater flexibility and adaptability and the ability
to generalise skills from one task to
another. This is unlike task-specific
programming, which has limited reuse
possibilities.
Fig.9: the model framework of GO-1. In this case, it is learning to hang a T-shirt. LAM stands for latent action model.
Source: https://agibot-world.com/blog/go1
Foundation models rely on neural
networks, which mimic how human
and animal brains operate. Individual
neurons are relatively simple, but their
collective communication enables
complex behaviours.
Especially in foundation models,
parameters (see panel) are used to
measure the model’s complexity and
learning capacity. They act as ‘adjustment knobs’: weights that alter the strength of connections between neurons, and biases that let each neuron shift its output independently to aid generalisation.
A larger number of parameters
enhances the ability to handle complex data but requires more computational resources. However, excessive parameters may cause the model
to memorise training data rather than
learn underlying patterns, necessitating careful design to optimise performance and adaptability to unfamiliar
situations.
Foundation models include large
language models (LLMs), vision language models (VLMs), vision language
action (VLA) systems, image models,
audio models, or multimodal models.
LLMs and VLMs are the most commonly used in humanoid robots due to
their language and vision capabilities.
Examples of LLMs include OpenAI’s
GPT-3 (with 175 billion parameters),
xAI’s Grok, and Google’s Gemini (with
undisclosed parameters), trained on
vast text datasets like books and web
content.
These models enable tasks such as
interpreting commands like “walk to
the door”, with the ‘large’ part reflecting their vast number of parameters
that capture complex language patterns. Not all LLMs qualify as foundation models; for instance, a smaller
siliconchip.com.au
LLM trained only for a specialist task
lacks the broad, general-purpose training or adaptability required.
Vision-language models, such as
OpenAI’s CLIP and Google’s PaLI,
combine image recognition with natural language understanding, allowing them to identify objects like a “red
cup” based on descriptions.
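CLIP is openly available, so matching an image against a handful of candidate descriptions takes only a few lines. This sketch uses the Hugging Face transformers library; the image filename and the candidate labels are arbitrary examples.

# Matching an image against text descriptions with the openly available
# CLIP model via the Hugging Face transformers library.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("workbench.jpg")            # arbitrary example image
labels = ["a red cup", "a blue cup", "a screwdriver", "an empty table"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher logits mean a closer image-text match; softmax gives probabilities.
probs = outputs.logits_per_image.softmax(dim=1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.2f}")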
An LLM can work cooperatively
with a VLM, where the VLM provides
the perceptual context of a visual scene
and the LLM interprets and responds
to commands based on the scene’s contents. For example, RT-2X from Google DeepMind uses a VLM for image
understanding and a reasoning module for task execution, enabling actions
like picking up an object based on a
verbal command.
In a robot, the LLM and VLM could
run on separate hardware modules
coordinated by a central controller,
or be combined into a single multimodal foundation model run on one
module. The latter is seen in models
like PaLM-E (https://palm-e.github.io), which blends language and vision
for action.
Humanoid robots said to incorporate combined LLMs and VLMs
include Tesla Optimus, Figure 02 running Helix, and Walker S.
Examples of foundation models
include:
AgiBot GO-1 is designed to be the
general-purpose ‘brain’ of humanoid
robots, to help them learn and adapt.
GO-1 uses vision language models
(Fig.9), in which massive amounts of
real-world images and videos are fed
to the models, training them how to
perform specific tasks.
The model algorithms then convert the data into a series of steps,
enabling them to perform the required
tasks. The system can form generalisations from the training data (videos
of humans doing things), enabling it
to perform tasks similar to what was shown, not just the exact tasks shown.
Parameters in artificial intelligence
Parameters in artificial intelligence models are a critical component,
allowing the model to learn and represent associations between concepts.
They include weights, biases, attention scores and embedding vectors. For
example, a weight is adjusted during training to associate “cat” with “meow”
rather than “bark”.
Biases are extra adjustments to weights that set the tone of a sentence,
such as promoting “great day” toward a positive tone based on its typical
associations in the training data.
Attention scores determine which parts of a sentence the model focuses
on. For instance, in “The cat, not the dog, meowed”, the model prioritises
“cat” and “meowed”, ignoring “dog” as the action’s source.
Embedding vectors are numerical representations of words in higher-dimensional space. During training, a word like “happy” is shifted closer to
“joy” and farther from “sad” based on how often they appear together in the
training data. AI is only as good as its training data and will incorporate any
of the biases present in its training materials. As the saying goes, “garbage
in, garbage out”.
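The idea that “happy” sits closer to “joy” than to “sad” can be made concrete with cosine similarity between embedding vectors. The three-dimensional vectors below are invented purely for illustration; real models use hundreds or thousands of dimensions.

# Cosine similarity between (invented) embedding vectors. Real models use
# hundreds or thousands of dimensions; three are shown for illustration.
import numpy as np

embeddings = {
    "happy": np.array([0.9, 0.8, 0.1]),
    "joy":   np.array([0.85, 0.75, 0.2]),
    "sad":   np.array([-0.7, -0.6, 0.3]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("happy vs joy:", round(cosine(embeddings["happy"], embeddings["joy"]), 2))
print("happy vs sad:", round(cosine(embeddings["happy"], embeddings["sad"]), 2))
# The first similarity is close to 1 and the second is negative, reflecting
# how training pushes related words together and unrelated words apart.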
The overall GO-1 framework comprises the VLM, the MoE (Mixture
of Experts) and the Action Expert.
The MoE contains the Latent Planner, which learns action patterns
from human behaviour (as observed
in videos etc) to build comprehension. The Action Expert is trained
with over a million real-world robot
demonstrations and refines the execution of tasks.
The VLM, Latent Planner and
Action Expert cooperate to perform
actions. The VLM processes image data to provide force signals (to understand the forces involved in various actions) and the required language inputs to perform tasks, and to understand the scene.
Based on outputs from the VLM, the
Latent Planner generates Latent Action
Tokens and a Chain of Planning. The Action Expert then generates fine-grained action sequences
based on the outputs of the VLM and
the Latent Action Tokens.
GO-1 is a generic platform that can
be used in a variety of robots. In https://youtu.be/9dvygD4G93c it is possible
to see some of the ‘thought’ processes
the robot goes through as it performs
various tasks.
AutoRT (Fig.10), developed by
Google DeepMind, is a research project and an experimental AI training system for scalable, autonomous
robotic data collection in unstructured real-world environments. It
enables robots to operate in “completely unseen scenarios with minimal human supervision”.
It integrates VLMs for scene and
object interpretation, and LLMs for
proposing tasks (eg, “wipe down the
countertop with the sponge”), plus
robot control models (RT-1 or RT-2)
for task execution. The robot’s tasks
are self-generated and work as follows
(from https://auto-rt.github.io):
1. The robot maps the environment
to generate points of interest, then
samples one and drives to that point.
2. Given an image from the robot camera, a VLM produces text describing the scene the robot observes. The output is forwarded to an LLM to generate tasks the robot could attempt.
3. Tasks are filtered via self-reflection to reject unsuitable tasks and to categorise the rest into those that need human assistance and those that do not.
4. A valid task is sampled from the filtered list and the robot attempts it.
5. The attempt is scored on how diverse the task and video are compared to prior data, and the loop is repeated.

Fig.10: how AutoRT works for a basic group of tasks. Source: https://auto-rt.github.io/
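The five-step loop just described maps naturally onto code. The Python sketch below is purely schematic: every object, function and method named here is a hypothetical stand-in for the VLM, LLM and robot control models involved, not part of AutoRT’s actual API.

# Schematic of an AutoRT-style task loop. Every function and method named
# here is a hypothetical stand-in, not part of any published AutoRT API.
import random

def autort_episode(robot, vlm, llm, controller, history):
    # 1. Map the environment, pick a point of interest and drive there.
    points = robot.map_points_of_interest()
    robot.drive_to(random.choice(points))

    # 2. Describe the scene with the VLM, then ask the LLM for candidate tasks.
    scene_text = vlm.describe(robot.camera_image())
    candidates = llm.propose_tasks(scene_text)

    # 3. Self-reflection filter: drop unsafe/invalid tasks, keep autonomous ones.
    valid = [t for t in candidates
             if llm.passes_constitution(t) and not llm.needs_human(t)]
    if not valid:
        return None

    # 4. Attempt one of the remaining tasks.
    task = random.choice(valid)
    video = controller.execute(task)

    # 5. Score the attempt for diversity against prior data, then repeat.
    history.record(task, video, score=history.diversity_score(task, video))
    return task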
In trials, AutoRT has utilised multiple robots in multiple buildings, up to
20 simultaneously, with 52 tested over
seven months to perform diverse tasks
like object manipulation, collecting
77,000 trials across 6,650 unique tasks.
A ‘robot constitution’ ensures safety
by filtering tasks to avoid humans or
hazards, complemented by force limits and human-operated kill switches.
This enables robots to gather training data autonomously and safely,
improving their adaptability to novel
scenarios.
NVIDIA’s GR00T (Generalist Robot
00 Technology) is a research initiative
and development platform aimed at
accelerating the creation of humanoid robot foundation models and data
pipelines for managing and generating
training data.
It is not a single model but a framework that includes foundation models, simulation tools and data generation pipelines. It is designed to
make humanoid robots more general-
purpose, capable of adapting to new
environments and tasks like navigating new rooms or handling objects
with minimal retraining.
GR00T features a complete computer-in-the-robot solution, the
Jetson AGX Thor computing module, which runs the entire robot stack
(cognition and control). This module
is optimised for robotics, supporting
VLA models (among others). It delivers over 2070 teraflops (2070 trillion
floating point operations per second)
of AI performance (with four-bit floating point precision).
RT-2X from Google DeepMind is a
VLA foundation model built upon the
earlier RT-2 (Robotic Transformer 2)
model. It’s designed to bridge the gap
between language, vision and robotic
action for controlling humanoid or
other robots.
It is trained on vast multi-modal
siliconchip.com.au
Fig.11: examples of an RT-2 model in operation, showing some of the tasks that can be performed. Source: https://robotics-transformer2.github.io/
datasets (text, images, videos and
robotic action data) using self-supervised learning, allowing it to
learn patterns without explicit labels.
It has 55 billion parameters and can
generalise instructions like “put the
blue block on the red one”, even with
blocks differing from its training set.
Here is an example of how RT-2X
works:
1. Input: it receives inputs from a
camera feed and a command like “sort
these items”.
2. Processing: using a scaled transformer architecture (a type of neural network), it applies its learned parameters (weights, biases, attention scores) to
interpret the scene, reason through
the task and plan actions, leveraging
its pre-trained knowledge.
3. Output: it generates precise motor
commands, executed at a high frequency, to control the robot’s movements. Some examples of the type
of instructions the earlier RT-2 can
understand are shown in Fig.11.
There has been no public disclosure
of what exact foundation model Tesla’s Optimus humanoid robot uses, but
it will be discussed in the section on
Optimus next month. It is based on a
similar AI architecture to that used by
Tesla’s Autopilot and Full Self-Driving
(FSD) systems in their cars.
Transformer models
A transformer model is a type of
neural network that processes the
entire input sequence of data, such
as text or from a vision transformer
model, all at once, rather than step-by-step. A key strength is its ability
to understand context, helping robots
interpret commands like “pick up the
cup” by considering the full scene
before them.
It uses a feature called ‘attention’,
which allows the model to determine
the relative importance of data parts,
such as prioritising “cup” over “table”,
enhancing its decision-making for
humanoid robot tasks.
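The attention mechanism itself boils down to a small amount of linear algebra. This NumPy sketch computes scaled dot-product attention for a toy three-token sequence; the vectors are random numbers, purely for illustration.

# Scaled dot-product attention for a toy 3-token sequence, in NumPy.
# attention(Q, K, V) = softmax(Q @ K.T / sqrt(d)) @ V
import numpy as np

rng = np.random.default_rng(0)
d = 4          # embedding size per token (tiny, for illustration)
tokens = 3     # eg, "pick", "up", "cup"

Q = rng.normal(size=(tokens, d))   # queries
K = rng.normal(size=(tokens, d))   # keys
V = rng.normal(size=(tokens, d))   # values

scores = Q @ K.T / np.sqrt(d)      # how strongly each token attends to each other
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax over each row

output = weights @ V        # each token's output is a weighted mix of all values
print(np.round(weights, 2)) # rows sum to 1: the relative importance of each token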
Image recognition
Humanoid robots use image recognition to see and interpret their environment. This requires computer
vision models, often integrated into
the robot’s AI system. Key vision models used include convolutional neural
networks (CNNs), vision transformers (ViTs), and multimodal models
(eg, CLIP).
Convolutional neural networks
(CNNs) are deep learning models
optimised for vision, detecting edges,
shapes and patterns to build object
recognition capabilities. They are
used by Tesla’s Optimus, Figure AI
robots and Unitree platforms. Popular
architectures like ResNet and YOLO
(You Only Look Once) are trained on
datasets like ImageNet, a benchmark
visual database with over 14 million
pictures.
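Running a pre-trained CNN is straightforward with the open-source torchvision library. The sketch below classifies a single image using an ImageNet-trained ResNet-18; the image filename is an arbitrary example.

# Classifying one image with an ImageNet-pretrained ResNet-18 (torchvision).
import torch
from PIL import Image
from torchvision import models
from torchvision.models import ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()                                   # inference mode

preprocess = weights.transforms()              # resize, crop and normalise
image = preprocess(Image.open("mug.jpg")).unsqueeze(0)   # add batch dimension

with torch.no_grad():
    logits = model(image)

top = logits.softmax(dim=1).topk(3)            # three most likely categories
for p, idx in zip(top.values[0], top.indices[0]):
    print(f"{weights.meta['categories'][idx.item()]}: {p.item():.2f}")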
Vision Transformers (ViTs) are
another type of neural network that
breaks an image into smaller components called ‘patches’ and establishes
relationships between them using
‘self-attention’, similar to how language models link words in a sentence.
Unlike CNNs, ViTs can understand
the context of a scene and the relationships between parts. However,
they are computationally intensive, a
drawback compared to CNNs.
Multi-modal models like CLIP by
OpenAI recognise objects based on textual descriptions, such as “pick up the
blue cup”. Another example is Gemini-based robotics systems from Google DeepMind, built on the Gemini 2.0
framework, which powers advanced
AI models.
These models are integral to VLA
systems, enhancing a robot’s ability
to act on visual and language inputs
by enabling perception, reasoning
and action.
Foundation models like GPT-3, Grok
and RT-2X are trained on diverse datasets, including images and text. Image
recognition models can be part of these
foundation models; for example, CLIP
and RT-2X incorporate vision components within their multi-modal frameworks.
However, some CNNs trained only
on limited datasets, like items in a certain factory, aren’t considered foundation models due to their lack of broad
adaptability.
Asimov’s Laws of Robotics

0 | A robot may not injure humanity or, through inaction, allow humanity to come to harm.
1 | A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2 | A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3 | A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

You could make the argument that modern-day autonomous military vehicles already contravene these “laws”.

Learning to walk

Teaching a robot to walk is one of many aspects of robot training. It involves training it to coordinate movements to achieve a stable, human-like gait. This relies on kinematic models, dynamic models and AI techniques such as reinforcement learning, genetic algorithms or imitation learning.

The control algorithms for commercial humanoid robots are typically proprietary, but the experimental RH5 humanoid robot (Fig.12) offers insight into a hybrid approach. This system uses local control loops for lower-level functions, such as managing joint torque and balance, and central controllers for high-level tasks like determining gait direction and speed. A mid-level layer facilitates communication between them, relieving the main processor of the burden of real-time walking tasks. This mirrors human walking, where the spinal cord’s central pattern generators handle rhythmic motion, while the brain directs overall activity and posture.

Fig.12: the electronic control units in an RH5 robot. Source: https://cmastalli.github.io/publications/rh5robot21ichr.pdf

Training a humanoid robot to walk is a key development focus. One method involves kinematic models, which are mathematical representations of the robot’s structure, joint configurations and motion constraints. Alone, these models produce a basic, often stiff gait by focusing on geometry without accounting for forces, addressed by dynamic models.

Challenges like walking on uneven terrain or adapting to disturbances require advanced strategies, effectively tackled by integrating kinematic models with dynamic simulations and AI-driven optimisation.

AI techniques, such as genetic algorithms and reinforcement learning, enhance kinematic models to achieve more human-like motion. Genetic algorithms optimise gait parameters (eg, joint angles and torque) by emulating an evolutionary approach, rewarding closer-to-natural patterns, while reinforcement learning lets robots ‘learn through experience’, adjusting actions based on rewards or penalties.

Alternatively, transformer-based foundation models, pre-trained on human motion data and fine-tuned for gait synthesis, offer advanced motion prediction. Stability is ensured with ‘zero-moment point’ (ZMP) control, maintaining the centre of pressure within the support polygon of the robot, and imitation learning mimics human walking from demonstration data.

Fig.13: a Simscape Multibody model shown at a high level. Source: www.mathworks.com/help/sm/ug/humanoid_walker.html
Commercial tools like MathWorks’
Simscape Multibody (Fig.13, www.
mathworks.com) handle kinematics (motion) and dynamics (forces),
modelling 3D structures with torque-
activated hip, knee, and ankle joints,
and passive shoulder joints for arm
swing to aid balance by counteracting
torso motion.
The contact forces between feet and
the ground are simulated to ensure
stability, with Simulink feedback
controllers adjusting joint stiffness
and damping.
Training with MathWorks’ Global
Optimization Toolbox for genetic algorithms or MathWorks’ Deep Learning
Toolbox and Reinforcement Learning Toolbox refines walking, creating a feedback loop where optimised
gaits inform the central controller,
executed by local loops for natural,
robust movement.
In recent years, these combined
approaches have transformed humanoid robot walking from stiff, mechanical motions to fluid, human-like gaits,
paving the way for practical applications in diverse environments.
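The zero-moment point idea mentioned above ultimately reduces to a geometric test: is the centre of pressure inside the polygon formed by the feet? Here is a minimal Python sketch of that test using a standard ray-casting point-in-polygon check; the coordinates are illustrative only.

# Minimal zero-moment-point style stability check: is the centre of
# pressure inside the support polygon formed by the feet?
def point_in_polygon(px, py, polygon):
    """Standard ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        crosses = (y1 > py) != (y2 > py)
        if crosses and px < (x2 - x1) * (py - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

# Rectangle roughly covering both feet during a double-support stance (metres).
support_polygon = [(-0.10, -0.15), (0.10, -0.15), (0.10, 0.15), (-0.10, 0.15)]

for cop in [(0.02, 0.05), (0.18, 0.0)]:
    stable = point_in_polygon(cop[0], cop[1], support_polygon)
    print(f"centre of pressure {cop}: {'stable' if stable else 'tipping risk'}")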
World Robot Competition
Mecha Fighting Series
China Media Group held the World
Robot Competition Mecha Fighting
Series to showcase humanoid robotics
technology. The robots were teleoperated by humans, but the robots autonomously provided balance and other
basic functions.
For details, see Fig.14 and https://youtu.be/N7UxGVV_Fwo
Artificial general intelligence
We hear about artificial intelligence
all the time but there is another concept beyond that, called artificial general intelligence (AGI). This is where
a machine can emulate human intelligence in terms of self-learning (far
beyond the ‘training’ of AI), reasoning,
understanding and problem solving;
even understanding and emulating
human emotions.
Humanoid robots endowed with
AGI would be capable of great mischief in the wrong hands; this is the
subject of many dystopian science fiction movies, such as The Terminator
and I, Robot.
To protect against such dystopian
scenarios, in 1942, Isaac Asimov
devised the Three Laws of Robotics
and later added another one, although
these have been criticised as not being
a comprehensive ethical framework
to govern the behaviour of intelligent
robots. Still, they are a good starting
point (see the panel).
Experts agree that AGI has not been
achieved yet, but at the current rate of
progress, who knows when it could
arrive.
In 1950, Alan Turing proposed a test of intelligent behaviour known as the
Turing test. This involves a human engaging in a text-based conversation with either a machine or another human, and determining if they can distinguish between the two. If the human cannot distinguish between the two, the computer is deemed to display true intelligence.
Glossary of Terms
AI – Artificial Intelligence; machines
simulating human intelligence, such as
learning, reasoning and problem-solving
ANN – Artificial Neural Network;
computational models inspired by
human brains, used in machine learning
ASIC – Application-Specific Integrated
Circuit; a custom-designed chip
optimised for a specific function or task
CNN – Convolutional Neural Network; deep
learning models optimised for vision,
detecting edges, shapes and patterns
CPU – Central Processing Unit; a general-
purpose processor that executes
instructions & manages computing tasks
DoF – Degrees of Freedom; independent
movements a robot joint or mechanism
can perform
End Effector – a tool/device at a robotic
arm’s end that interacts with objects
FPGA – Field-Programmable Gate Array;
a chip programmable for specific
hardware tasks post-manufacturing
GPU – Graphics Processing Unit; a
processor specialised for highly parallel
tasks like machine learning
LLM – Large Language Model; an AI model
trained on massive text datasets to
generate or understand language
Multimodal – An AI that processes and
integrates multiple data types (text,
images, audio, video etc)
Neuromorphic Processor – a chip
that uses artificial neurons to mimic the
human brain
NLP – Natural Language Processing; an
AI’s ability to understand, interpret and
generate human language
Organoid – a simplified version of an
organ designed to imitate it
RTOS – Real-Time Operating System; an
operating system that guarantees timely
processing for critical tasks
Tactel – Tactile Element; a sensor element
that detects touch, pressure or texture
information
Teleoperation – operating a machine
remotely
TPU – Tensor Processing Unit; a Google-
designed chip optimised for accelerating
machine learning workloads.
Transformer – a neural network
architecture that uses attention to
process sequential data efficiently
VLA – Vision-Language Action; an AI that
combines visual input and language to
perform actions or tasks
VLM – Vision-Language Model; an AI that combines image understanding with text comprehension and generation

Fig.14: two Unitree G1 robots fighting in the Mecha Fighting Series. Source: China Media Group.
ChatGPT-4 later became the first computer to pass a rigorous implementation of the Turing test, leading some to speculate that the Turing test was not a strict enough test for machine intelligence. Since then, other systems like LLaMa-3.1 and GPT-4.5 have also passed Turing tests.

Ethics

Clearly, AI and robotics are improving by the day, and it won’t always be for the good of humankind. Consider a mass-produced army of military robots produced by a hostile power, or robots used for crime and violence. As John Connor said of The Terminator, “It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity or remorse or fear and it absolutely will not stop”.

We are not at that stage yet, but it may happen within the lifetimes of many readers, maybe even within ten years.

Fig.15: the ‘uncanny valley’ describing the possible emotional response to various humanoid robots compared to their likeness to humans. The horizontal axis is human likeness (up to 100%) and the vertical axis is affinity. One curve is for a moving robot, the other for a still one. They both become significantly negative before reaching the positive response to a human. Source: https://w.wiki/EoPq

Humanoid robots and artificial limbs
The development of humanoid
robots also has benefits for artificial
limbs for humans, as the basic design
of a human-like limb for a robot will
also be suitable for use with humans.
Our article about Artificial/Prosthetic
Limbs in March 2025 (siliconchip.au/Article/17782) discussed this. The
limbs of Tesla’s Optimus have been
proposed for this purpose.
The uncanny valley
The “uncanny valley” is a hypothesised psychological response to humanoid robots at various levels of realism, a concept proposed by Japanese roboticist Masahiro Mori. It speculates that a robot which
is ‘almost human’ in appearance will
elicit an eerie response that a more
human or less human looking robot
would not – see Fig.15.
Examples of robots that could trigger this response include the Ameca robot by Engineered Arts, due to its very realistic facial motions (see Fig.16; https://engineeredarts.com/robot/ameca), and the Sophia robot by Hanson Robotics (see Fig.17; www.hansonrobotics.com/sophia).
There is some experimental evidence that this phenomenon is real. It
suggests that certain design elements
should be incorporated into humanoid
robots to avoid revulsion (for example, making them look clearly different from people).
Fig.16: Ameca is a robot with a realistic-looking head developed by Engineered Arts. Source: https://engineeredarts.com/robot/ameca/

Fig.17: another robot with a realistic-looking head is Sophia by Hanson Robotics. Source: https://www.hansonrobotics.com/sophia/

Next month

The second half of this series will be published next month. It will describe notable historical and current humanoid robots, like those shown on the lead page. SC