Dopamine-RL

Nowadays there are many open-source frameworks available on the internet that help with implementing AI applications. Most of these frameworks deal with supervised and unsupervised learning. Google has just released a new framework for reinforcement learning (built on the popular TensorFlow), whose code name is DOPAMINE [1]. In this post you will read about reinforcement learning and about the properties of the new framework.

Google has always been at the forefront of library development, as TensorFlow proves. TensorFlow is the foundation of popular APIs (e.g. Keras) that you can learn on our courses. These libraries are effective on problems where data/label pairs are available; this type of learning is known as supervised learning. We approximate the function between the input data and the output labels, and this is what "teaching" the model means. We learned to classify a car in a picture the same way: our parents told us that it shows a car, and we came to understand it through many examples. This kind of intelligence has a drawback: it can never exceed the capability of its teacher, because it has only a narrow interface to the world.
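
As an illustration, a minimal supervised-learning example in Keras could look like this; the synthetic data/label pairs below are purely an assumption for demonstration:

```python
# Supervised learning in a nutshell: given (data, label) pairs, fit a model
# that approximates the mapping from inputs to labels. The toy dataset here
# is synthetic and only serves the illustration.
import numpy as np
from tensorflow import keras

# Toy data/label pairs: 200 samples with 4 features each, binary labels.
x = np.random.rand(200, 4).astype('float32')
y = (x.sum(axis=1) > 2.0).astype('float32')

model = keras.Sequential([
    keras.layers.Dense(8, activation='relu', input_shape=(4,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x, y, epochs=5, verbose=0)  # "teaching": approximate input -> label
```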

Trial and error method [2]

Reinforcement learning does not require a labeled dataset; it uses trial and error instead. We put an agent into an environment where it can experiment freely. The environment gives the agent a reward after every action it takes, and the main goal is to achieve the highest possible cumulative reward. We do not tell the agent what the good solution is; we leave it to get to know the environment. Our walking improved the same way: our parents did not tell us what signals to send to our limbs. We tried billions of times; sometimes we fell, sometimes we walked. With this type of learning an agent can even exceed human performance. This is how AlphaGo beat the world champion.
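
To make the loop concrete, here is a toy sketch of the agent-environment interaction using OpenAI Gym's classic API; the CartPole environment and the random agent are illustrative assumptions, not something from Dopamine itself:

```python
# The reinforcement-learning loop: the agent acts, the environment answers
# with a new observation and a reward, and the goal is to maximize the total
# reward. The "agent" below simply acts randomly -- pure trial and error.
import gym

env = gym.make('CartPole-v0')   # a simple simulated environment
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()          # try a random action
    obs, reward, done, info = env.step(action)  # the environment returns a reward
    total_reward += reward
print('Total reward for the episode:', total_reward)
```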

The problem with supervised learning is that it requires a huge amount of well-labeled digital data, which is not available in every sector. In the financial sector all transactions are easily traceable, so accurate predictions can be made. This is not the case in every real or simulated environment (e.g. chess), where collecting examples with complete coverage is expensive and time-consuming.

Reinforcement learning requires access to the environment, so that the agent can interact with it and receive rewards (a reward may arrive only later; in chess, for example, the final move decides the outcome). Very commonly the agent first interacts with a simulated environment, which serves two main goals: protecting against irreversible side effects and accelerating the learning process. For training an RL agent in a simulated environment there are a few libraries available on the internet that include both the environments and the agents. These libraries, however, do not provide enough flexibility and stability, which slows down optimization and the development process.

Dopamine tries to solve these problems. Google's purpose was to develop a framework that allows developers to easily build models and try new parameters. Developers can thus experiment quickly, and it becomes possible to achieve better performance with a parameter combination that was previously discarded by mistake. Rainbow, the cutting-edge RL agent, is part of this implementation, and for the sake of completeness the traditional DQN is implemented in the framework as well.
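
As a rough sketch, launching an experiment and tweaking a hyperparameter might look like the following. The module paths, the gin config file, and the RainbowAgent.gamma binding are based on the project's documentation and example colabs [1]; they may differ between Dopamine versions, so treat them as assumptions:

```python
# A minimal sketch of running a Dopamine agent. Hyperparameters live in gin
# config files; overriding a single value via a gin binding is what makes
# quick parameter experiments cheap. Paths and names are assumptions.
from dopamine.discrete_domains import run_experiment

BASE_DIR = '/tmp/dopamine_runs'  # checkpoints and logs are written here
GIN_FILES = ['dopamine/agents/rainbow/configs/rainbow.gin']
GIN_BINDINGS = ['RainbowAgent.gamma = 0.9']  # hypothetical parameter override

run_experiment.load_gin_configs(GIN_FILES, GIN_BINDINGS)
runner = run_experiment.create_runner(BASE_DIR)
runner.run_experiment()
```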

Time will tell whether Dopamine becomes a widespread framework, but it will surely help developers with optimization.

We are confident that inside your company there are many tasks that can be automated with AI. If you would like to enjoy the advantages of artificial intelligence, apply for our free consultation via one of our contacts.

REFERENCES

[1] https://github.com/google/dopamine

[2] https://tempora-mutantur.co.uk/wp-content/uploads/2018/01/Trial-and-error.png
