[WIP: June2023] Deep Q-Learning using TorchSharp #710
Replies: 3 comments 6 replies
-
using Torch;
using System;
namespace DeepQLearning
{
class Program
{
    /// <summary>
    /// Trains a minimal Deep Q-Network on the CartPole-v0 environment.
    /// The network maps a 4-dimensional observation to 2 action values;
    /// training uses a one-step TD target with MSE loss and Adam.
    /// </summary>
    /// <param name="args">Unused command-line arguments.</param>
    static void Main(string[] args)
    {
        // Hyperparameters for the training loop.
        const int EpisodeCount = 1000;
        const int StepsPerEpisode = 200;   // CartPole-v0 caps episodes at 200 steps
        const double LearningRate = 0.001;
        const double Gamma = 0.99;         // discount factor for future rewards

        // Q-network: observation (4 values) -> hidden (128, ReLU) -> Q-values (2 actions).
        var model = new Sequential();
        model.Add(new Linear(4, 128));
        model.Add(new ReLU());
        model.Add(new Linear(128, 2));

        var optimizer = new Adam(model.Parameters(), LearningRate);
        var loss = new MSELoss();

        // NOTE(review): Gym/Tensor/Sequential here are project-local wrappers, not
        // stock TorchSharp — the API semantics below are inferred from usage; verify.
        var env = Gym.Make("CartPole-v0");

        // Budget the run by total environment steps rather than episodes alone,
        // since early episodes terminate well before StepsPerEpisode.
        int maxSteps = EpisodeCount * StepsPerEpisode;
        int stepCount = 0;
        int episode = 0;
        while (stepCount < maxSteps)
        {
            env.Reset();
            for (int step = 0; step < StepsPerEpisode; step++)
            {
                // Current observation packed as a batch of one: shape [1, state.Length].
                var state = env.Observation;
                var tensor = new Tensor(state, new[] { 1, state.Length });

                // Greedy action = argmax over predicted Q-values.
                // NOTE(review): there is no epsilon-greedy exploration here, so the
                // agent may never discover better actions — consider adding it.
                var qValues = model.Forward(tensor);
                var action = qValues.Max().Item2;

                // Step the environment and observe the resulting transition.
                var result = env.Step(action);
                var nextState = result.Observation;
                var reward = result.Reward;
                var done = result.Done;

                // TD target: r for a terminal state, otherwise r + gamma * max_a' Q(s', a').
                // Only the taken action's entry differs from the prediction, so the MSE
                // gradient flows through that single Q-value.
                var target = qValues.Clone();
                if (done)
                {
                    target[0, action] = reward;
                }
                else
                {
                    var nextTensor = new Tensor(nextState, new[] { 1, nextState.Length });
                    var nextQValues = model.Forward(nextTensor);
                    var maxNextQ = nextQValues.Max().Item1;
                    target[0, action] = reward + Gamma * maxNextQ;
                }

                // BUG FIX: the original called tensor.Reshape(new[] { 1, 4 }) and
                // discarded the result; reshape APIs return a new tensor, and the
                // tensor was already constructed as [1, 4], so the call was a no-op
                // and has been removed.
                var output = model.Forward(tensor);
                optimizer.ZeroGrad();
                var l = loss.Forward(output, target);
                l.Backward();
                optimizer.Step();

                stepCount++;
                if (done)
                {
                    break;
                }
            }

            Console.WriteLine("Episode: " + episode);
            episode++;
        }

        env.Close();
    }
}
}
|
Beta Was this translation helpful? Give feedback.
-
I think I reproduced the solution from the lecture in my repo here. |
Beta Was this translation helpful? Give feedback.
-
This is cool. I still hope we can build out a gym in .NET and maybe some shareable components for Q-learning. I don't have the expertise or experience to do that, but it'd be very cool. |
Beta Was this translation helpful? Give feedback.
-
June 2023
#981 (comment)
Feb 2023
https://www.youtube.com/watch?v=217tCMsZu0I
Beta Was this translation helpful? Give feedback.
All reactions