Mein MATLAB Forum - goMatlab.de

Reinforcement Learning: Creating an Environment for a Real Te

 

ptichelm
Forum-Newbie

Posts: 1
Joined: 30.04.19
Location: Gummersbach
Version: ---
Posted: 02.05.2019, 08:21     Subject: Reinforcement Learning: Creating an Environment for a Real Te
Code:
close all; clear all; clc; format compact;
%% Parameter definitions
LinkerRand  = 100;
RechterRand = 1700;
SMitte      = (RechterRand-LinkerRand)/2+LinkerRand;   % center position
learnRate    = 0.8;
beta         = 1;
discount     = 0.9;
epsilon      = 0.6;
epsilonDecay = 0.99999;
maxEpisode   = 9999;
maxIteration = 2000;
Pause        = 60;
%Action       = 1;
Boni         = 0;
Lernen       = true;
numObs       = 3;
numAct       = 3;

%% Connect to the Siemens communication client
% Connect to the OPC UA client:
uaclient = opcua('opc.tcp://192.168.0.1:4840','none','none',uint8(0));
connect(uaclient);

%---------------------------------------------------
% y-axis (left-right): read nodes
%---------------------------------------------------
yActualPosition = opcuanode(3,"""Y-Achse"".Base.ActualPosition",uaclient);
yActualVelocity = opcuanode(3,"""Y-Achse"".Base.ActualVelocity",uaclient);
%---------------------------------------------------
% y-axis (left-right): write nodes
%---------------------------------------------------
% Velocity
yVelocity = opcuanode(3,"""G_DB"".""DB_Var_Y_Preset_manuel_Vel""",uaclient);
% Target position
yPosition = opcuanode(3,"""Control_Y-Achse_DB"".""Manueller_Betrieb_Position_Y-Achse""",uaclient);
% Enable the absolute move to a given position
yMoveAbsoluteEnable = opcuanode(3,"""Control_Y-Achse_DB"".""Enable_absolute_Y-Achse""",uaclient);
% Execute the absolute move to a given position
yMoveAbsoluteExecute = opcuanode(3,"""Control_Y-Achse_DB"".""MC_MOVEABSOLUTE_Y-Achse"".""Execute""",uaclient);
% Halt
yHaltExecute = opcuanode(3,"""Control_Y-Achse_DB"".""MC_HALT_Y-Achse"".Execute",uaclient);
% Acceleration / deceleration
yAcceleration = opcuanode(3,"""G_DB"".""DB_Var_Y_Preset_manuel_Accel""",uaclient);
yDecceleration = opcuanode(3,"""G_DB"".""DB_Var_Y_Preset_manuel_Decel""",uaclient);
%---------------------------------------------------
% u-axis (rotary axis): read nodes
%---------------------------------------------------
% Angle (0...360°)
uActualPosition = opcuanode(3,"""U-Achse"".Base.ActualPosition",uaclient);
% Angular velocity (°/minute)
uActualVelocity = opcuanode(3,"""U-Achse"".Base.ActualVelocity",uaclient);

% Clear axis errors (pulse the reset node)
yFehlerLoeschen = opcuanode(3,"""Control_Y-Achse_DB"".""MC_RESET_Y-Achse"".Execute",uaclient);
writeValue(uaclient,yFehlerLoeschen,true)
writeValue(uaclient,yFehlerLoeschen,false)

%% Reward distribution
Faktor   =  0.1;
Beta1   =-20;
Sigma1   =  1;
Kappa   =  1.1;
Tic     =  0.7;
Beta2   = -1;
Sigma2   =  6;
Amp     =  0.1;
Offset   =-40;

rewardFunc = @(u,udot,y) (Faktor*( ...
    -Beta1*exp(-(((u-1.8).^2)/Sigma1)) - Beta1*exp(-(((0).^2)/Sigma1)) ...
    - (-(abs(u-1.8)-Kappa)/Tic).*(Beta2*exp(-(((udot).^2)/Sigma2)) - Beta2*exp(-(((0).^2)/Sigma2))) ...
    + Amp*abs(y)*(-1) + Offset));

%% ########################################################################################################################
%  Start learning!    
%  ########################################################################################################################
       
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% OBSERVATION AND ACTION SPECIFICATIONS  
    ObservationInfo = rlNumericSpec([numObs 1]);
    ObservationInfo.Name = 'Pendel States';
    ObservationInfo.Description = 'theta, dtheta, x';    
   
    ActionInfo = rlFiniteSetSpec([-1 0 1]);
    ActionInfo.Name = 'Pendel Action';
 
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% ENVIRONMENT
    % Pass the step and reset functions as handles (@) or quoted names;
    % written bare, MATLAB would call them immediately with no arguments.
    env = rlFunctionEnv(ObservationInfo,ActionInfo,@myStepFunction,@myResetFunction);

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% Definition of the "ACTOR" neural network
    %-------------------------------------------------------------
    actorLayerSizes = [40 30];
    actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(actorLayerSizes(1), 'Name', 'ActorFC1', ...
            'Weights',2/sqrt(numObs)*(rand(actorLayerSizes(1),numObs)-0.5), ...
            'Bias',2/sqrt(numObs)*(rand(actorLayerSizes(1),1)-0.5))
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2), 'Name', 'ActorFC2', ...
            'Weights',2/sqrt(actorLayerSizes(1))*(rand(actorLayerSizes(2),actorLayerSizes(1))-0.5), ...
            'Bias',2/sqrt(actorLayerSizes(1))*(rand(actorLayerSizes(2),1)-0.5))
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(numAct, 'Name', 'ActorFC3', ...
            'Weights',2*5e-3*(rand(numAct,actorLayerSizes(2))-0.5), ...
            'Bias',2*5e-3*(rand(numAct,1)-0.5))                      
    tanhLayer('Name','ActorTanh1')
    ];

    % Create actor representation
    actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4, ...
                                       'GradientThreshold',1,'L2RegularizationFactor',1e-5);
    % if useGPU
    %    actorOptions.UseDevice = 'gpu';
    % end
    actor = rlRepresentation(actorNetwork,actorOptions, ...
                         'Observation',{'observation'},env.getObservationInfo, ...
                         'Action',{'ActorTanh1'},env.getActionInfo);
    % %-------------------------------------------------------------
    % %Load a previously trained network:
    % load(FilenameActorNet)

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% Definition of the "CRITIC" neural network
    %-------------------------------------------------------------
    criticLayerSizes = [40 30];
    statePath = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name', 'observation')
    fullyConnectedLayer(criticLayerSizes(1), 'Name', 'CriticStateFC1', ...
            'Weights',2/sqrt(numObs)*(rand(criticLayerSizes(1),numObs)-0.5), ...
            'Bias',2/sqrt(numObs)*(rand(criticLayerSizes(1),1)-0.5))
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(criticLayerSizes(2), 'Name', 'CriticStateFC2', ...
            'Weights',2/sqrt(criticLayerSizes(1))*(rand(criticLayerSizes(2),criticLayerSizes(1))-0.5), ...
            'Bias',2/sqrt(criticLayerSizes(1))*(rand(criticLayerSizes(2),1)-0.5))
    ];
    actionPath = [
    imageInputLayer([numAct 1 1],'Normalization','none', 'Name', 'action')
    fullyConnectedLayer(criticLayerSizes(2), 'Name', 'CriticActionFC1', ...
            'Weights',2/sqrt(numAct)*(rand(criticLayerSizes(2),numAct)-0.5), ...
            'Bias',2/sqrt(numAct)*(rand(criticLayerSizes(2),1)-0.5))
    ];
    commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu1')
    fullyConnectedLayer(1, 'Name', 'CriticOutput',...
            'Weights',2*5e-3*(rand(1,criticLayerSizes(2))-0.5), ...
            'Bias',2*5e-3*(rand(1,1)-0.5))
    ];

    % Connect the layer graph
    criticNetwork = layerGraph(statePath);
    criticNetwork = addLayers(criticNetwork, actionPath);
    criticNetwork = addLayers(criticNetwork, commonPath);
    criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
    criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');

    % Create critic representation
    criticOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3, ...
                                            'GradientThreshold',1,'L2RegularizationFactor',2e-4);
    % if useGPU
    %    criticOptions.UseDevice = 'gpu';
    % end
    critic = rlRepresentation(criticNetwork,criticOptions, ...
                              'Observation',{'observation'},env.getObservationInfo, ...
                              'Action',{'action'},env.getActionInfo);

    %-------------------------------------------------------------
    %Load a previously trained network:
    % load(FilenameCriticNet)

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% DDPG Agent Options
    %-------------------------------------------------------------
    agentOptions = rlDDPGAgentOptions;
    agentOptions.SampleTime = 10;
    agentOptions.DiscountFactor = 0.99;
    agentOptions.MiniBatchSize = 256;
    agentOptions.ExperienceBufferLength = 1e6;
    agentOptions.TargetSmoothFactor = 1e-3;
    agentOptions.NoiseOptions.MeanAttractionConstant = 5;
    agentOptions.NoiseOptions.Variance = 0.4;
    agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;

    %% Training Options
    trainingOptions = rlTrainingOptions;
    trainingOptions.MaxEpisodes = 10000;
    trainingOptions.MaxStepsPerEpisode = 1000;
    trainingOptions.ScoreAveragingWindowLength = 100;
    trainingOptions.StopTrainingCriteria = 'AverageReward';
    trainingOptions.StopTrainingValue = 110;
    trainingOptions.SaveAgentCriteria = 'EpisodeReward';
    trainingOptions.SaveAgentValue = 150;
    trainingOptions.Plots = 'training-progress';
    trainingOptions.Verbose = true;
    % if useParallel
    %     trainingOptions.Parallelization = 'async';
    %     trainingOptions.ParallelizationOptions.StepsUntilDataIsSent = 32;
    % end
     
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% CREATE AND TRAIN AGENT
    agent = rlDDPGAgent(actor,critic,agentOptions);
    trainingResults = train(agent,env,trainingOptions)

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% SAVE AGENT
    reset(agent); % Clears the experience buffer
    curDir = pwd;
    saveDir = 'savedAgents';
    if ~exist(saveDir,'dir'), mkdir(saveDir); end   % create folder if missing
    cd(saveDir)
    save(['trainedAgent_Pendel_' datestr(now,'mm_DD_YYYY_HHMM')],'agent','trainingResults');
    cd(curDir)

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% STEP AND RESET FUNCTION DEFINITIONS
    %% Reset function
    function [InitialObservation, LoggedSignal] = myResetFunction()

    % Slowly move to the center
    while SMitte ~= readValue(uaclient,yActualPosition)  
        writeValue(uaclient,yMoveAbsoluteEnable,true);
        writeValue(uaclient,yVelocity,300);      
        writeValue(uaclient,yPosition,SMitte);        
        writeValue(uaclient,yMoveAbsoluteExecute,true); writeValue(uaclient,yMoveAbsoluteExecute,false);
        writeValue(uaclient,yMoveAbsoluteEnable,false);
    end

    for i=1:Pause
        disp(['Pause ',num2str(Pause-i)])
        pause(1);
    end

    % Read the initial state
    z1(1,1) = round(readValue(uaclient,uActualPosition))/100;        
    z1(2,1) = round(readValue(uaclient,uActualVelocity),-3)/10000;        
    z1(3,1) = round(readValue(uaclient,yActualPosition) -LinkerRand -((RechterRand-LinkerRand)/2),-1)/100;
   
    % Return initial environment state variables as logged signals.
    LoggedSignal.State = [z1(1,1);z1(2,1);z1(3,1)];
    InitialObservation = LoggedSignal.State;

    end
   
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %% Step function
   
    function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)

    % Selected action
    T = Action;
   
    % Send the action to the machine
    writeValue(uaclient,yVelocity,1000);
    writeValue(uaclient,yAcceleration,6500);
    writeValue(uaclient,yDecceleration,6500);
    if T>0
        writeValue(uaclient,yPosition,RechterRand);
        writeValue(uaclient,yMoveAbsoluteEnable,true);  
        writeValue(uaclient,yMoveAbsoluteExecute,true);writeValue(uaclient,yMoveAbsoluteExecute,false);
    elseif T<0
        writeValue(uaclient,yPosition,LinkerRand);
        writeValue(uaclient,yMoveAbsoluteEnable,true);
        writeValue(uaclient,yMoveAbsoluteExecute,true);writeValue(uaclient,yMoveAbsoluteExecute,false);
    else
        writeValue(uaclient,yHaltExecute,true);
        writeValue(uaclient,yHaltExecute,false);
    end
    pause(0.01);
   
    % Read the resulting state
    z2(1,1) = round(readValue(uaclient,uActualPosition))/100;
    z2(2,1) = round(readValue(uaclient,uActualVelocity),-3)/10000;
    z2(3,1) = round(readValue(uaclient,yActualPosition) -LinkerRand -((RechterRand-LinkerRand)/2),-1)/100;

    % Store the new state z2 (the original used z1 here, which is not
    % defined in this function)
    LoggedSignals.State = [z2(1,1);z2(2,1);z2(3,1)];
    % Transform state to observation
    NextObs = LoggedSignals.State;
   
    %Get Reward
    Reward = rewardFunc(z2(1,1), z2(2,1), z2(3,1));
   
    % Bonus: has the pendulum reached its zenith?
    if 1.7<z2(1,1) && z2(1,1)<1.9 && abs(z2(2,1))<0.2
        Boni = Boni+1;
    end
   
    % Check terminal condition
    % Angular velocity too high? Then terminate the episode!
    IsDone = abs(z2(2,1))>5;
   
    % Info output (disabled)
%    disp([   'Epi=',num2str(episodes),9, 'Iter=',num2str(Iteration),9, 'Eps=',num2str(round(epsilon,6)),9,9, 'Akt=',num2str(aIdx),9,A,9, 'Boni=',num2str(Boni)])
    end
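(Editor's aside: a quick, self-contained sanity check of the reward function defined above; the probe values are illustrative assumptions, not part of the original post.)

```matlab
% Reproduce the reward constants and function from the script above, then
% probe its shape: the angle term peaks near u = 1.8 (pendulum at its
% zenith), and any cart offset y reduces the reward.
Faktor = 0.1; Beta1 = -20; Sigma1 = 1; Kappa = 1.1; Tic = 0.7;
Beta2 = -1; Sigma2 = 6; Amp = 0.1; Offset = -40;
rewardFunc = @(u,udot,y) (Faktor*( ...
    -Beta1*exp(-(((u-1.8).^2)/Sigma1)) - Beta1*exp(-(((0).^2)/Sigma1)) ...
    - (-(abs(u-1.8)-Kappa)/Tic).*(Beta2*exp(-(((udot).^2)/Sigma2)) - Beta2*exp(-(((0).^2)/Sigma2))) ...
    + Amp*abs(y)*(-1) + Offset));

r_top  = rewardFunc(1.8, 0, 0);   % upright, at rest, cart centered
r_side = rewardFunc(1.8, 0, 2);   % same pose, cart off-center: lower reward
assert(r_side < r_top)
```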


The program is supposed to communicate with a Siemens PLC and run the well-known "inverted pendulum" experiment on a real machine.

Unfortunately this produces a great many error messages, which in the first instance have to do with the environment. As is well known, this requires a reset function and a step function. I have defined both and integrated them into the code.

The following errors occur:
1. Not enough input arguments.
Error in PendelAc_Cri_ReinforcementLearningV4>myStepFunction (line 258)
T = Action;

No idea why; I can't figure it out.
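(A likely cause, judging from the code above: in the `rlFunctionEnv` call the step and reset functions are passed without `@`, so MATLAB evaluates `myStepFunction` immediately with zero inputs instead of handing the toolbox something it can call later. A minimal sketch of the corrected call, assuming the R2019a Reinforcement Learning Toolbox:)

```matlab
% Pass the step/reset functions as function handles (@) or quoted names so
% rlFunctionEnv can call them later with the proper arguments. Written
% bare, MATLAB calls myStepFunction on the spot with no inputs, which is
% exactly the "Not enough input arguments" error at "T = Action;".
env = rlFunctionEnv(ObservationInfo,ActionInfo,@myStepFunction,@myResetFunction);
% or equivalently:
% env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunction','myResetFunction');
```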

2. The communication with the PLC has to live inside the functions. But that means I have to reload all the variables into each function and open a new connection to the PLC every single time. Can this be set up so that the connection stays open permanently and the functions can access the workspace variables?
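(One common pattern, sketched here under the assumption that the client and node variables from the script above are in scope: connect once in the script, then bind the client, and in the same way any `opcuanode` handles, into the step/reset functions via anonymous-function closures. No reconnect per call, and nothing has to be fetched from the base workspace:)

```matlab
% Connect once, up front (address as in the script above).
uaclient = opcua('opc.tcp://192.168.0.1:4840','none','none',uint8(0));
connect(uaclient);

% Capture uaclient in closures; the inner functions then receive it as an
% extra input argument on every call the toolbox makes.
ResetHandle = @() myResetFunction(uaclient);
StepHandle  = @(Action,LoggedSignals) myStepFunction(Action,LoggedSignals,uaclient);
env = rlFunctionEnv(ObservationInfo,ActionInfo,StepHandle,ResetHandle);

% with the function signatures changed accordingly, e.g.:
% function [InitialObservation,LoggedSignal] = myResetFunction(uaclient)
% function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals,uaclient)
```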

3. Do I even need such an environment at all when running a real experiment? I can't find anything about this online. It feels like everyone only ever simulates, which as a rule works great. But unfortunately I can't find anywhere that someone has posted the code of a real experiment.

I would honestly be grateful for any help at all.

Regards