Secure Artificial Intelligence – Adversarial Examples

As Andrew Ng said, artificial intelligence is the new electricity of the 21st century, and its potential is huge. It is hard to name a sector of the economy where artificial intelligence could not be applied effectively. There is, however, one area where relying on artificial intelligence models can have tragic consequences: security-critical systems.

At the current state of artificial intelligence technology, adversarial attacks are far more advanced and powerful than defenses. Using machine learning in a safety-critical system is therefore not a good idea if you rely on the artificial intelligence method alone and nothing else. It turned out that machine learning methods and neural networks tend to underfit the training data and leave large regions of the input space uncovered, and in those regions intentionally computed adversarial examples are able to fool the models.

An adversarial example is an input that is used in exactly the same way as regularly collected data: it is fed into the classifier like any other data sample, and the model makes its prediction as usual.

Regular data, however, is collected from the real world through sensors and IoT devices. Adversarial examples do not come from this collection: an adversarial example is a sample that has been deliberately modified so that a classifier model misclassifies it.

Here is an example to make this clearer:

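[Figure: original image, classified as “panda” with 57.7 % confidence + 0.007 × sign(∇x J(θ, x, y)) = adversarial image, classified as “gibbon” with 99.3 % confidence]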

A demonstration of fast adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet. By adding an imperceptibly small vector whose elements are equal to the sign of the elements of the gradient of the cost function with respect to the input, we can change GoogLeNet’s classification of the image. Here our epsilon of .007 corresponds to the magnitude of the smallest bit of an 8 bit image encoding after GoogLeNet’s conversion to real numbers. [1]
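
In the notation of [1], with θ the model parameters, x the input image, y its label and J the cost used to train the network, the adversarial image described in the caption is computed as:

```latex
\tilde{x} = x + \epsilon \, \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right), \qquad \epsilon = 0.007
```

Because every component of the sign vector is −1, 0 or +1, no pixel changes by more than ε.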

GoogLeNet, a convolutional neural network, classified the left image as a panda with 57.7% confidence, and the right image as a gibbon with 99.3% confidence. This is obviously a serious mistake.

To a human observer, the image on the left looks identical to the image on the right. In reality, however, only the left panda is a conventionally taken photograph. The right panda is the result of the addition shown above, in which the noise has been deliberately computed.
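
The addition in the figure is the fast gradient sign method (FGSM) of [1]: the input is moved by ε in the direction of the sign of the gradient of the cost with respect to the input. The minimal sketch below applies it to a toy logistic-regression classifier instead of GoogLeNet, so the input gradient can be written by hand. The weights, the input and ε are made-up illustrative values, not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the *input* x.

    For logistic regression it is (p - y) * w, where p = predict_proba(w, b, x).
    """
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x J)."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Toy usage: 1000 "pixels", every weight only +/-0.05 (hypothetical values).
w = [0.05 if i % 2 == 0 else -0.05 for i in range(1000)]
x = [0.51 if i % 2 == 0 else 0.49 for i in range(1000)]
x_adv = fgsm(w, 0.0, x, y=1, eps=0.02)
print(round(predict_proba(w, 0.0, x), 3))      # 0.622 -> class 1
print(round(predict_proba(w, 0.0, x_adv), 3))  # 0.378 -> class 0
```

Each pixel moves by only 0.02, but all 1000 small changes push the score in the same direction, so the logit drops by 0.02 × 1000 × 0.05 = 1.0 (from +0.5 to −0.5) and the prediction flips.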

An adversarial example is not really specialized to one classification model: the very same adversarial example is often able to fool several different kinds of models.

Cross-technique Transferability matrix: cell (i, j) is the percentage of adversarial samples crafted to mislead a classifier learned using machine learning technique i that are misclassified by a classifier trained with technique j. [2]
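
The caption's definition of cell (i, j) translates directly into a few lines of code. This is a minimal sketch; the technique names "DNN" and "SVM" and all labels below are invented solely to exercise the function and are not data from [2].

```python
def transferability_matrix(true_labels, predictions):
    """Cell (i, j): percentage of adversarial samples crafted against technique i
    that the classifier trained with technique j misclassifies.

    true_labels[i]     -- correct labels of the samples crafted against i
    predictions[i][j]  -- labels that classifier j assigns to those samples
    """
    matrix = {}
    for i, row in predictions.items():
        for j, preds in row.items():
            wrong = sum(p != t for p, t in zip(preds, true_labels[i]))
            matrix[(i, j)] = 100.0 * wrong / len(preds)
    return matrix


# Invented toy data for two hypothetical techniques.
true_labels = {"DNN": [0, 1, 2, 3], "SVM": [1, 1, 0, 2]}
predictions = {
    "DNN": {"DNN": [5, 7, 2, 1], "SVM": [5, 1, 2, 0]},
    "SVM": {"DNN": [1, 0, 3, 2], "SVM": [3, 4, 0, 1]},
}
for cell, pct in transferability_matrix(true_labels, predictions).items():
    print(cell, f"{pct:.0f}%")  # 75%, 50%, 50%, 75% in this order
```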

Research has shown that many different models misclassify the same adversarial example, and they often assign it the same class. This led to the conclusion that we are facing an underfitting problem. Another observation supports this hypothesis: if one takes the difference between an original example and an adversarial example, one gets a direction in input space, i.e. a vector. If one adds that same vector to a completely different original sample, the result is again an adversarial example. This is a systematic effect, not the random effect one would expect in the case of overfitting. The mapping from inputs to outputs tends to be close to linear, whereas the mapping from parameters to outputs is highly non-linear.
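
The linearity argument of [1] makes this concrete. Take a single linear unit with an n-dimensional weight vector w whose elements have average magnitude m, and perturb its input by η, with every component of η bounded by ε. The activation changes by wᵀη, and choosing η = ε sign(w) maximizes that change:

```latex
\begin{aligned}
w^\top \tilde{x} &= w^\top (x + \eta) = w^\top x + w^\top \eta,
  \qquad \|\eta\|_\infty \le \epsilon, \\
\eta = \epsilon \operatorname{sign}(w)
  \;&\Rightarrow\; w^\top \eta = \epsilon \sum_{i=1}^{n} |w_i| = \epsilon \|w\|_1 = \epsilon m n .
\end{aligned}
```

The change grows linearly with the dimension n while no single input moves by more than ε. The optimal η depends only on w, not on x, which is consistent with the same direction vector turning different samples into adversarial examples.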

The blue arrow is the direction vector: the difference between an original sample and an adversarial example.
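
For a linear model the reuse of the direction vector can be checked directly. In the sketch below, with hypothetical weights and samples, the difference between one sample and its adversarial version is added to a second, very different sample, and it flips that prediction too. For deep networks this is an empirical observation rather than something this toy proves.

```python
def logit(w, b, x):
    """Linear score w.x + b; the model predicts class 1 when it is positive."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


n = 1000
w = [0.05 if i % 2 == 0 else -0.05 for i in range(n)]  # hypothetical weights
b = 0.0
eps = 0.02

# Original sample x1 (class 1) and its adversarial version. For a linear
# model and label 1, the loss gradient w.r.t. x points along -w, so the
# FGSM perturbation is -eps * sign(w).
x1 = [0.51 if i % 2 == 0 else 0.49 for i in range(n)]
x1_adv = [xi - eps * sign(wi) for xi, wi in zip(x1, w)]

# Direction vector: adversarial example minus original sample.
direction = [a - o for a, o in zip(x1_adv, x1)]

# A completely different class-1 sample, and the same vector added to it.
x2 = [0.1 + 0.08 * ((i // 2) % 10) + (0.01 if i % 2 == 0 else 0.0) for i in range(n)]
x2_shifted = [xi + di for xi, di in zip(x2, direction)]

# logits: +0.50, -0.50, +0.25, -0.75
for name, x in [("x1", x1), ("x1_adv", x1_adv), ("x2", x2), ("x2 + direction", x2_shifted)]:
    print(f"{name:>15}: logit = {logit(w, b, x):+.2f}")
```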

Models generalize well to naturally occurring data sets because these highly linear patterns can fit the naturally occurring training data and even generalize to naturally occurring test data. The training data, however, comes from one specific distribution with its own particular properties. Machine learning models have learned to handle well almost any example drawn from the same distribution as the training data, but they do so by relying on properties that are not distribution-independent: they are special properties of that exact distribution. This distribution is only a small slice of the cake, so if somebody intentionally shifts the test distribution and samples from the shifted one, it is very easy to fool all of the known machine learning models. Don't get me wrong: when tested on naturally occurring data sets, these models are correct almost all the time.

Adversarial examples can be used to compromise machine learning systems

Suppose somebody wants to fool a model but:

  • does not have full access to the model.
  • does not know the architecture that is used.
  • does not know which algorithm is being used.
    • does not even know whether it is a support vector machine or a deep neural net.
  • does not know the parameters of the model.

If the attacker has limited access to the model, meaning they can send inputs to it and observe its outputs, one way to fool it is to send many inputs, collect the corresponding outputs, and use these input-output pairs as the training set for a substitute model of their own. Once the substitute is trained, adversarial examples can be crafted against it, and these examples are very likely to transfer and fool the target model as well.
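
A minimal sketch of this substitute-model attack, under toy assumptions: the "target" is a hypothetical hidden linear classifier that the script only queries through `target_label`, the substitute is a logistic-regression model trained with plain stochastic gradient descent, and `DIM`, the number of queries and `eps` are arbitrary. The queries are simply random inputs; this is a simplification, not the query strategy used in [2]. ε is large here because the toy has only 20 input dimensions.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the toy run is reproducible
DIM = 20                 # hypothetical input dimension

# Hypothetical black-box target: a linear classifier whose weights the
# attacker never reads; it is only accessed through target_label().
_hidden_w = [rng.choice([-1.0, 1.0]) for _ in range(DIM)]


def target_label(x):
    """Oracle access: send an input, observe the predicted label."""
    return 1 if sum(w * xi for w, xi in zip(_hidden_w, x)) > 0 else 0


def random_input():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def train_substitute(xs, ys, epochs=50, lr=0.1):
    """Logistic regression trained with plain SGD on the collected pairs."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def fgsm(w, x, y, eps):
    """FGSM against a logistic-regression model: the input gradient of the
    loss is (p - y) * w, and p - y is negative for y == 1, positive for y == 0."""
    direction = -1 if y == 1 else 1
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]


# 1. Query the target and keep the input-output pairs as a training set.
queries = [random_input() for _ in range(1000)]
labels = [target_label(x) for x in queries]

# 2. Train the substitute model on those pairs.
sub_w, sub_b = train_substitute(queries, labels)

# 3. Craft adversarial examples against the substitute only, then check
#    how many of them also fool the target.
test_points = [random_input() for _ in range(200)]
fooled = 0
for x in test_points:
    y = target_label(x)
    x_adv = fgsm(sub_w, x, y, eps=0.3)
    fooled += target_label(x_adv) != y
transfer_rate = fooled / len(test_points)
print(f"{transfer_rate:.0%} of the substitute's adversarial examples fool the target")
```

The attacker never touches `_hidden_w`; the only link between the two models is the set of query results, which is what lets the crafted examples transfer.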

Defenses that have been tried, but have not solved the problem, include:

  • Generative pretraining
  • Removing perturbation with autoencoder
  • Adding noise at test time
  • Ensembles
  • Confidence-reducing perturbation at test time
  • Error correcting codes
  • Multiple glimpses
  • Weight decay
  • Double backpropagation
  • Dropout
  • Various non-linear units

This is the cutting edge of artificial intelligence research, and the problem has not been solved yet. The community does not have a reliable defense against these kinds of attacks, but there are a couple of attempts to develop one:

  • Getting the right posterior distribution p(y | x) of the class label y given the input x.
  • Train on adversarial examples.
  • and so on.
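
A minimal sketch of the second idea, training on adversarial examples, following the objective of [1]: J̃(θ, x, y) = α J(θ, x, y) + (1 − α) J(θ, x + ε sign(∇x J(θ, x, y)), y), with α = 0.5 as in [1]. The model is a toy logistic regression, and the data set, ε, learning rate and epoch count are arbitrary illustrative choices.

```python
import math


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgsm(w, x, y, eps):
    """FGSM for logistic regression. The input gradient of the loss is
    (p - y) * w; p - y is negative for y == 1 and positive for y == 0."""
    direction = -1 if y == 1 else 1
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]


def adversarial_training(xs, ys, eps, alpha=0.5, epochs=100, lr=0.1):
    """SGD on alpha * J(x, y) + (1 - alpha) * J(x_adv, y), where x_adv is
    regenerated from the current weights at every step and treated as a
    constant when differentiating."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            x_adv = fgsm(w, x, y, eps)
            grad_w, grad_b = [0.0] * len(w), 0.0
            for weight, point in ((alpha, x), (1.0 - alpha, x_adv)):
                err = sigmoid(logit(w, b, point)) - y  # dJ/dlogit
                grad_w = [g + weight * err * xi for g, xi in zip(grad_w, point)]
                grad_b += weight * err
            w = [wi - lr * g for wi, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


# Toy usage with two hypothetical training points.
xs, ys = [[2.0, 2.0], [-2.0, -2.0]], [1, 0]
w, b = adversarial_training(xs, ys, eps=0.5)
for x, y in zip(xs, ys):
    x_adv = fgsm(w, x, y, eps=0.5)
    print(y, logit(w, b, x) > 0, logit(w, b, x_adv) > 0)  # 1 True True / 0 False False
```

The toy data is trivially separable, so the run only checks the mechanics: both the clean points and their FGSM versions end up on the correct side. It says nothing about robustness against stronger attacks, which, as noted above, remains an open problem.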

If this problem is solved, a huge field of opportunities will open up.

As Ian Goodfellow said: “If we’re able to do model-based optimization, we’ll be able to write down a function that describes a thing that doesn’t exist yet but we wish that we had.”

It could automatically design new genes, new molecules, new medicines and new 3D designs in every 3D design field, support drug discovery, and much more, without any human engineering.

We are confident that there are many tasks inside your company that can be automated with AI. If you would like to enjoy the advantages of artificial intelligence, apply for our free consultation through one of our contacts.


[1]: I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and Harnessing Adversarial Examples,” arXiv preprint arXiv:1412.6572, 2015.

[2]: N. Papernot, P. McDaniel, and I. J. Goodfellow, “Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples,” arXiv preprint arXiv:1605.07277, 2016.
