Intrinsic motivations | The particular case of Empowerment

Intrinsic motivations

The Particular case of Empowerment

  1. Introduction
    • Setting the stage for the discussion on motivations in AI.
  2. Extrinsic vs Intrinsic Motivations
    • Differentiating between external rewards and internal drivers.
  3. What is Information
    • Understanding the role of information as a measure of uncertainty.
  4. Reinforcement Learning and Intrinsic Motivations
    • Exploring the challenges and the role of extrinsic motivations in reinforcement learning.
  5. Empowerment Control
    • Delving into empowerment as a specific case of intrinsic motivation and its implications for control systems.
  6. Theoretical Framework
    • Discussing foundational concepts such as Entropy, Mutual Information, and their relevance to empowerment.
  7. Calculating Empowerment
    • Detailed exploration of how empowerment is computed, including the application to the inverted pendulum.
  8. Conclusion

Introduction


Extrinsic vs Intrinsic Motivations

  • Extrinsic Motivations
    • External rewards
    • Money, grades, praise, etc.
  • Intrinsic Motivations
    • Internal rewards
    • Enjoyment, curiosity, etc.

Reinforcement learning and intrinsic motivations

Problems

  • Exploration vs Exploitation
  • Curse of dimensionality
  • Sparse rewards
  • Reward misalignment
  • Short-sighted optimization
Figure: Reinforcement learning - attempt to learn to walk

The Corkscrew


What is information

  • Information is a measure of uncertainty (surprise)
  • It is a measure of how much we learn from an event

\[ H(X) = -\sum_{x} P(x) \log_2 P(x) \]

Where \(H(X)\) is the entropy of the random variable \(X\) and \(P(x)\) is the probability of the event \(x\).
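To make "information as surprise" concrete, here is a minimal sketch of the information content of a single event, \(-\log_2 P(x)\), measured in bits. The function name `surprisal_bits` is an illustrative choice and does not come from the slides.

```python
import math


def surprisal_bits(p: float) -> float:
    """Information content -log2(p), in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


# A common event carries little information, a rare one carries a lot.
print(surprisal_bits(0.5))       # fair coin flip
print(surprisal_bits(1 / 1024))  # rare event
```

Entropy is then the average of this quantity over all outcomes of the random variable.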


Theoretical framework


Entropy (H(X))

  • Definition: A measure of unpredictability or information content.
  • Formula: \(H(X) = -\sum_{x} P(x) \log_2 P(x)\)
  • Interpretation: The average unpredictability in the outcomes of the random variable X; higher entropy means more unpredictability.
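
The definition can be sketched in a few lines of Python. This is an illustrative helper (the name `entropy_bits` is an assumption), using base-2 logarithms so the result is in bits and treating zero-probability outcomes as contributing nothing.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped (the limit of p*log p is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy_bits([0.5, 0.5]))  # fair coin: maximal unpredictability for 2 outcomes
print(entropy_bits([0.9, 0.1]))  # biased coin: more predictable, lower entropy
```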

Mutual Information (I(X; Y))

  • Definition: A measure of the amount of information that one random variable contains about another random variable.
  • Formula: \(I(X; Y) = \sum_{x, y} P(x, y) \log_2 \frac{P(x, y)}{P(x)\,P(y)}\)
  • Interpretation: The reduction in uncertainty of X due to the knowledge of Y; a mutual information of zero means X and Y are independent.
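
As a rough sketch, mutual information can be computed directly from a joint probability table. The function name and the table layout (rows index X, columns index Y) are assumptions made for this example.

```python
import numpy as np


def mutual_information_bits(joint):
    """I(X;Y) in bits from a joint table with joint[i, j] = P(X=i, Y=j)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal of X, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, m)
    mask = joint > 0                       # skip zero cells (they contribute 0)
    ratio = joint[mask] / (px @ py)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))


# Independent variables share no information.
print(mutual_information_bits([[0.25, 0.25], [0.25, 0.25]]))
# Perfectly correlated binary variables share one full bit.
print(mutual_information_bits([[0.5, 0.0], [0.0, 0.5]]))
```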

Relation between Entropy and Mutual Information

  • Mutual Information can also be expressed in terms of entropy:

  • Formula: \(I(X; Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y)\)

  • The diagram below illustrates this relationship:


Mutual Information to Entropy

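Mutual Information (I(X; Y)) can be expanded to show its relationship with Entropy (H(X)):

  • Initial Expression: \(I(X; Y) = \sum_{x, y} P(x, y) \log_2 \frac{P(x, y)}{P(x)\,P(y)}\)

  • Expansion: \(I(X; Y) = \sum_{x, y} P(x, y) \log_2 \frac{P(x \mid y)}{P(x)} = -\sum_{x, y} P(x, y) \log_2 P(x) + \sum_{x, y} P(x, y) \log_2 P(x \mid y)\)

  • Simplification: \(I(X; Y) = -\sum_{x} P(x) \log_2 P(x) - \left(-\sum_{x, y} P(x, y) \log_2 P(x \mid y)\right)\)

  • Final Form: \(I(X; Y) = H(X) - H(X \mid Y)\)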


Step-by-Step Explanation

  1. Start with the definition of mutual information.
  2. Split the log term into two, using properties of logarithms.
  3. Recognize the definition of expected value for entropy \(H(X)\) and conditional entropy \(H(X \mid Y)\).
  4. Observe that mutual information is the difference between the entropy of X and the entropy of X given Y.

Implication

  • This shows that mutual information is essentially the amount of uncertainty in X that is reduced by knowing Y.
  • It quantifies how much knowing one variable informs us about another.

Example Calculation

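  • Consider a fair die with outcomes 1 through 6, each with an equal probability of \(1/6\).
  • Entropy for the die can be calculated as: \(H(X) = -\sum_{x=1}^{6} \frac{1}{6} \log_2 \frac{1}{6} = \log_2 6 \approx 2.585\) bits.
  • If \(Y\) indicates whether the outcome is even, then knowing \(Y\) leaves three equally likely outcomes, so \(H(X \mid Y) = \log_2 3 \approx 1.585\) bits and the mutual information is \(I(X; Y) = H(X) - H(X \mid Y) = 1\) bit.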
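
The die example can be checked numerically. The sketch below assumes Y is the parity of the outcome (even or odd, each with probability 1/2); the helper name `entropy_bits` is illustrative.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


die = [1 / 6] * 6
h_x = entropy_bits(die)  # uncertainty about the roll before any hint

# Given the parity, the roll is uniform over the 3 matching faces.
h_x_given_even = entropy_bits([1 / 3] * 3)
h_x_given_odd = entropy_bits([1 / 3] * 3)
h_x_given_y = 0.5 * h_x_given_even + 0.5 * h_x_given_odd

mutual_info = h_x - h_x_given_y  # I(X;Y) = H(X) - H(X|Y)

print(round(h_x, 3), round(h_x_given_y, 3), round(mutual_info, 3))
```

Learning the parity of the roll removes exactly one bit of uncertainty, which matches the intuition that parity is a yes/no answer.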

Channel Capacity

Gaussian channels are a class of channels that are widely used in information theory.

Noise is additive and Gaussian distributed.

Channel capacity can be expressed as the maximum mutual information between the input and output of the channel.
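
For reference, the standard textbook result for a single additive Gaussian channel can be written out as follows. The symbols are assumptions for this sketch: the output is \(Y = X + Z\), the noise \(Z\) is Gaussian with variance \(N\), and the input power is limited to \(P\).

```latex
\[
  C \;=\; \max_{p(x)\,:\,\mathbb{E}[X^2] \le P} I(X; Y)
    \;=\; \frac{1}{2} \log_2\!\left(1 + \frac{P}{N}\right)
  \quad \text{bits per channel use}
\]
```

The maximum is attained by a Gaussian input, so capacity grows only logarithmically with the signal-to-noise ratio \(P/N\).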


Water Filling Algorithm

The water-filling algorithm finds the optimal power allocation across a set of parallel Gaussian channels, that is, the allocation that maximizes the mutual information between the input and output of the channel.


Water Filling Algorithm

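The optimal power allocation is given by the solution of the following optimization problem:

\[ \max_{P_1, \dots, P_n} \; \sum_{i=1}^{n} \frac{1}{2} \log_2\left(1 + \frac{P_i}{N_i}\right) \quad \text{subject to} \quad \sum_{i=1}^{n} P_i = P, \quad P_i \ge 0 \]

whose solution is \(P_i = \max(0, \nu - N_i)\).

  • \(P_i\) is the optimal power allocated to the i-th channel.

  • \(P\) is the total available power; the constraint \(\sum_i P_i = P\) ensures the total power constraint is met.

  • \(N_i\) is the noise power on the i-th channel.

  • \(\nu\) is the water level, which is a value that ensures the total power used is equal to the total available power.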


Water Filling Algorithm

This metaphorical "water filling" ensures that channels with lower noise levels receive more power because they can transmit information more effectively. Channels with higher noise levels receive less power, as they contribute less to the overall capacity.

The total channel capacity is the sum of the capacities of the individual channels.
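
A minimal sketch of water filling over parallel Gaussian channels, assuming per-channel noise powers and a total power budget as inputs. The function names and the example numbers are illustrative, not taken from the slides.

```python
import numpy as np


def water_filling(noise, total_power):
    """Return (power per channel, water level) for parallel Gaussian channels."""
    noise = np.asarray(noise, dtype=float)
    order = np.sort(noise)
    # Try to use the k quietest channels, starting with all of them, and
    # stop as soon as the resulting water level covers the k-th noise floor.
    for k in range(len(order), 0, -1):
        level = (total_power + order[:k].sum()) / k
        if level > order[k - 1]:
            break
    power = np.maximum(level - noise, 0.0)  # channels above the level get nothing
    return power, level


def capacity_bits(noise, power):
    """Total capacity: sum of the per-channel Gaussian capacities, in bits."""
    noise = np.asarray(noise, dtype=float)
    return float(0.5 * np.sum(np.log2(1.0 + power / noise)))


noise = [1.0, 2.0, 5.0]
power, level = water_filling(noise, total_power=3.0)
print(power, level)                 # the noisiest channel is left unused
print(capacity_bits(noise, power))
```

In this example the water level settles below the noisiest channel's noise floor, so that channel receives no power and the budget is split between the two quieter ones.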


Calculating Empowerment


Linear Response Approximation

Linear response approximation is a powerful concept used in control theory. It allows us to approximate the dynamics of a system linearly around a set point, typically a null action or equilibrium. This approach not only facilitates the analysis of the exact trajectory but also enables us to understand the influence of small changes in the control signal on the system's evolution.


Fundamental Notation

The state of the system can be expressed as:

\[ x_{t+1} = f(x_t) + g(x_t)\,\Delta a_t \]

Where:

  • \(x_t\) is the state of the system at time \(t\).
  • \(f\) represents the system dynamics.
  • \(g\) symbolizes the control dynamics.
  • \(\Delta a_t\) is the change in the control signal at time \(t\).

Recursive Mapping and Sensitivity

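The recursive mapping from \(x_t\) to \(x_{t+T}\) is obtained by applying the state update \(T\) times.

Sensitivity of \(x_{t+T}\) to \(\Delta a_t\) is calculated by iteratively applying the differentiation chain rule along the null-action trajectory:

\[ F = \frac{\partial x_{t+T}}{\partial \Delta a_t} = J_{t+T-1} \cdots J_{t+1}\, g(x_t) \]

Where \(J_s = \partial f / \partial x\), evaluated at \(x_s\), is the Jacobian matrix of the system dynamics \(f\) with respect to the state at time \(s\), and \(g(x_t)\) is the control dynamics at the moment the control change is applied.

Sensitivity is a measure of how much the state of the system changes in response to a change in the control signal.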
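
The chain-rule computation can be sketched numerically. This example assumes a discrete-time model of the form x_next = f(x) + g(x) * delta_a, linearized around the null action, with `f` returning a state vector and `g` returning a (state x control) matrix. The function names and the finite-difference Jacobian are illustrative choices, not the method used in the slides.

```python
import numpy as np


def numerical_jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    n_out = np.asarray(f(x)).size
    jac = np.zeros((n_out, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac


def sensitivity(f, g, x0, n_steps):
    """Sensitivity of the state after n_steps to a control change at step 0."""
    x = np.asarray(x0, dtype=float)
    sens = np.asarray(g(x), dtype=float)  # effect of the control after one step
    x = f(x)                              # follow the null-action trajectory
    for _ in range(n_steps - 1):
        sens = numerical_jacobian(f, x) @ sens  # chain rule: push through f
        x = f(x)
    return sens


# Hypothetical linear system, where the result should equal A^(n_steps-1) @ B.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
print(sensitivity(lambda x: A @ x, lambda x: B, x0=[0.0, 0.0], n_steps=3))
```

For a linear system the Jacobian is constant, which makes the example easy to verify by hand; for a nonlinear system the Jacobian changes along the trajectory.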


Applying empowerment to the inverted pendulum


System Dynamics

The dynamics equations for the inverted pendulum can be defined by:

Where:

  • \(\theta(t)\) is the angle of the pendulum.
  • \(\dot{\theta}(t)\) is the angular velocity of the pendulum.
  • \(d(t)\) is the control signal.
  • \(g\) is the acceleration due to gravity.
  • \(l\) is the length of the pendulum.
  • \(m\) is the mass of the pendulum.
  • \(W(t)\) is the Wiener process.
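
One common way to write such dynamics with the symbols listed above is sketched below. It is an assumption rather than a reproduction of the original slide: it takes \(\theta\) to be measured from the upright position, the control \(d(t)\) to act as a torque, and the noise to enter through the same torque channel.

```latex
\[
  \mathrm{d}\theta(t) = \dot{\theta}(t)\,\mathrm{d}t, \qquad
  \mathrm{d}\dot{\theta}(t) = \frac{g}{l}\,\sin\theta(t)\,\mathrm{d}t
    + \frac{d(t)}{m l^{2}}\,\mathrm{d}t
    + \frac{1}{m l^{2}}\,\mathrm{d}W(t)
\]
```

Under this convention the upright position \(\theta = 0\) is an unstable equilibrium, which is what makes the control problem non-trivial.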

Control Strategy

For a duration \(T\), a stochastically chosen control signal \(d(t)\) is applied, and we observe the final state \((\theta(T), \dot{\theta}(T))\).
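
The experiment described above can be sketched as a simple Euler-Maruyama simulation. The dynamics used here are an assumption (angle measured from upright, control acting as a torque, noise entering through the torque channel), and the function name, parameter defaults, and the constant random torque are all illustrative.

```python
import numpy as np


def simulate_pendulum(theta0, omega0, control, duration, dt=1e-3,
                      g=9.81, l=1.0, m=1.0, noise_scale=1.0, rng=None):
    """Euler-Maruyama rollout; returns the final (angle, angular velocity)."""
    rng = np.random.default_rng() if rng is None else rng
    theta, omega = float(theta0), float(omega0)
    for k in range(int(round(duration / dt))):
        d_w = rng.normal(0.0, np.sqrt(dt))  # Wiener increment over one step
        torque = control(k * dt)
        d_omega = ((g / l) * np.sin(theta) * dt
                   + (torque * dt + noise_scale * d_w) / (m * l ** 2))
        theta += omega * dt                 # uses the velocity before the update
        omega += d_omega
    return theta, omega


# Apply a randomly drawn constant torque for 0.5 s and observe the final state.
rng = np.random.default_rng(0)
torque = rng.normal()
print(simulate_pendulum(0.1, 0.0, lambda t: torque, duration=0.5, rng=rng))
```

Repeating this rollout for many sampled control signals gives the input-output pairs from which the control-to-state channel can be characterized.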


Compute channel capacity

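The channel capacity is calculated using the following equation:

\[ C = \max_{P_i} \; \frac{1}{2} \sum_{i} \log_2\left(1 + \sigma_i P_i\right) \quad \text{subject to} \quad \sum_{i} P_i = P \]

Where:

  • \(\sigma_i\) is the sensitivity of the state to the control signal along the i-th channel, taken from the spectrum of the Jacobian (sensitivity) matrix.
    (Obtained using SVD)

  • \(P_i\) is the power allocated to the i-th channel.
    (Calculated using the water filling algorithm)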
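
Putting the pieces together, the sketch below estimates the capacity of a linear-Gaussian channel from its sensitivity matrix. It assumes the channel gains \(\sigma_i\) are the squared singular values of the sensitivity matrix divided by an isotropic noise variance, so that \(1/\sigma_i\) plays the role of the noise level in water filling. The function name, the bisection search, and the example matrix are illustrative.

```python
import numpy as np


def channel_capacity_bits(sens_matrix, total_power, noise_var=1.0):
    """Capacity (bits) and power split for delta_x = F @ delta_a + noise."""
    s = np.linalg.svd(np.asarray(sens_matrix, dtype=float), compute_uv=False)
    gains = s[s > 1e-12] ** 2 / noise_var  # assumed per-channel gains sigma_i
    if gains.size == 0:
        return 0.0, gains
    floors = 1.0 / gains                   # effective noise level per channel
    lo, hi = floors.min(), floors.min() + total_power
    for _ in range(100):                   # bisection on the water level
        mid = 0.5 * (lo + hi)
        if np.maximum(mid - floors, 0.0).sum() > total_power:
            hi = mid
        else:
            lo = mid
    power = np.maximum(lo - floors, 0.0)
    capacity = 0.5 * np.sum(np.log2(1.0 + gains * power))
    return float(capacity), power


# Hypothetical sensitivity matrix with one strong and one weaker direction.
F = np.diag([1.0, np.sqrt(0.5)])
print(channel_capacity_bits(F, total_power=3.0))
```

The directions in which the state responds most strongly to the control receive the most power, and the resulting capacity is the empowerment estimate for that state.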

Empowerment-Based Control

We apply an empowerment-based control algorithm to the pendulum, which is given by the following equation:


Conclusion


Conclusion

  • Improves decision-making & adaptability
  • Contributes to theoretical & practical AI development
  • Future Direction: Integration in complex systems (e.g., humanoid robots)
  • Proof of Concept: Successful in pendulum control systems

This study highlights the promising path of empowerment for advancing robotics and autonomous systems.

H is the entropy of the random variable X, P(x) is the probability of the event x

Information is measured in bits. Because of the log function it is a logarithmic measure: common events are not very informative, and rare events are very informative.

However, for any probability distribution, we define a quantity called the entropy, which has many properties that agree with the intuitive notion of what a measure of information should be.

Mutual information is the reduction in uncertainty of X due to the knowledge of Y. It is the difference between the entropy of X and the conditional entropy of X given Y.

This form of entropy is known as Shannon entropy.

Delta x: the change in the state vector between two points.

F is the sensitivity matrix: how a change in the control parameter affects the change in state x.

A change in the control vector is applied to the system, and the resulting change in state is observed with noise.

The sensitivity matrix is calculated using the SVD of the Jacobian matrix.