
Adversarial Attacks: Attacks on AI-based systems – Fiction or reality?

A post by Andreas Strunz

A highly exciting and comparatively recent topic is the question of whether and how systems that use artificial intelligence are vulnerable to attacks, and how such attacks can be detected and defended against. In recent years, scientific research has made a number of contributions in this regard, and, not least because of the increasing proliferation of AI systems, practitioners are also beginning to address the issue.

Attempt at a definition 

Adversarial attacks are deliberate attacks that use manipulated data inputs ("adversarial examples") to influence machine learning models, and in the narrower sense deep learning models, in the attacker's interest, so that the models either no longer work well or are turned to the attacker's purposes. The motives behind this are manifold: fraud, sabotage, or simply the urge to hack.

What must be stressed, though, is that the attacker needs some kind of access to the system, whether through uploads or downloads, APIs (Application Programming Interfaces) or real, physical objects (e.g., in image recognition). Systems within companies and organizations, whether on premises or in the cloud, must also be protected, but the opportunities for attack are significantly lower than for systems where AI is offered as a service to a larger number of users outside a protected area.

Evasion attacks, for example, aim to prevent or circumvent the effect an AI system is supposed to achieve, such as slipping past an AI-supported spam filter. Poisoning refers to attacks whose goal is to contaminate otherwise clean training data sets. And finally, privacy attacks aim at manipulating digital identities, for example in access control.
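To make the poisoning idea more tangible, here is a purely hypothetical sketch in Python; the label-flipping helper, its name and its parameters are illustrative assumptions, not taken from the post. An attacker with write access to the training data silently mislabels a fraction of it before the model is ever trained.

```python
# Toy sketch of data poisoning by label flipping (purely illustrative).
import random

def poison_labels(labels, flip_fraction=0.1, num_classes=10):
    """Return a copy of `labels` with a fraction of entries flipped to a wrong class."""
    poisoned = list(labels)
    flip_count = int(flip_fraction * len(poisoned))
    for i in random.sample(range(len(poisoned)), flip_count):
        wrong_classes = [c for c in range(num_classes) if c != poisoned[i]]
        poisoned[i] = random.choice(wrong_classes)   # silently corrupt this training label
    return poisoned

# e.g., 10% of an otherwise clean label list is contaminated before training
clean_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(poison_labels(clean_labels))
```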

A distinction is also made between white-box and black-box attacks. In the former case, the attacker knows the data set and/or the AI model used; in the latter, they do not. Even then, however, attackers can draw conclusions about the model from the results it returns and thus refine the accuracy of their attacks step by step. Research has shown that such attacks can target various kinds of objects, such as text, audio, images, videos and control systems, and in principle any kind of AI model.


Some examples from research 

By adding carefully composed image noise, an input image can be manipulated so that the AI classifies it as a gibbon instead of a panda. If a person who was previously recognized correctly by an image recognition system puts on specially prepared glasses, the AI suddenly classifies that person as a famous actress. And researchers have added markers to a stop sign so that the AI classified it as a speed limit sign. These are examples from image processing, but audio and video sequences can also be manipulated, to the point that entire "deepfake" videos can be created from just a few real samples.

Using the highly simplified example of a neural network, it works like this: normally, the input layer of a neural network is fed with a vector of numerical values (e.g., the RGB values of an image). The value of each of these input neurons is then passed on, with a weighting, to one or more hidden layers, where it is recomputed, until the output layer finally produces a classification, e.g., "panda". This will not work well at first, but with training data for which the correct result is known in advance, backpropagation is used to gradually optimize the weights, along with a threshold (bias) above which a neuron "fires", so that the model correctly classifies as large a share of the training images as possible. Between the delivered and the expected result lies an error whose value is pushed step by step towards a local or global minimum, for example with stochastic gradient descent. You can picture this as a ball rolling down a slope into a valley.
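As a rough illustration of this training loop, here is a minimal sketch in Python using PyTorch; the toy network, the random stand-in data and all parameter values are assumptions made for the sake of the example, not part of the original post.

```python
# Minimal sketch of the training loop described above, using PyTorch.
# The tiny model and the random data are purely illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(            # input layer -> hidden layer -> output layer
    nn.Linear(3 * 32 * 32, 128),  # e.g., the flattened RGB values of a 32x32 image
    nn.ReLU(),                    # a neuron "fires" once its weighted input clears its threshold
    nn.Linear(128, 10),           # 10 classes, e.g., "panda", "gibbon", ...
)
loss_fn = nn.CrossEntropyLoss()                           # error between delivered and expected result
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent

images = torch.rand(64, 3 * 32 * 32)   # stand-in for a training batch
labels = torch.randint(0, 10, (64,))   # correct labels, known in advance

for _ in range(100):                       # the "ball" rolls downhill step by step
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)  # how far off are the predictions?
    loss.backward()                        # backpropagation: gradients of the error
    optimizer.step()                       # adjust the weights slightly against the gradient
```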

In an adversarial attack, on the other hand, the attacker tries to make the ball roll in a different direction, so that pandas are less reliably recognized as such and become gibbons instead. This is done by adding noise to the image, i.e., slight changes to its RGB values. The size of this deviation, often called the epsilon value, is barely noticeable to the human eye, but it is noticeable to a highly sensitive AI. This gives rise to a trade-off between robustness and accuracy: evaluations of different models from the ImageNet challenge have shown that a model can be tuned for high accuracy but then reacts more sensitively to changes in the input data, which comes at the expense of robustness. It is like comparing a racehorse with a draft horse along the two dimensions of speed and robustness.
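The noise described above can be computed from the model's own gradients. A well-known recipe from the research literature is the Fast Gradient Sign Method (FGSM): nudge every input value by a small epsilon in the direction that increases the classification error. Below is a minimal, illustrative sketch that reuses the toy PyTorch model and data from the previous example; the function name and the epsilon value are assumptions.

```python
# Sketch of an FGSM-style adversarial example (epsilon-sized pixel noise).
# `model`, `images` and `labels` are assumed to exist as in the training sketch above.
import torch
import torch.nn as nn

def fgsm_example(model, image, true_label, epsilon=0.01):
    """Return a perturbed copy of `image` that increases the model's error."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(image), true_label)
    loss.backward()                                  # gradient of the error w.r.t. the pixels
    noise = epsilon * image.grad.sign()              # the barely visible epsilon-sized nudge
    return (image + noise).clamp(0.0, 1.0).detach()  # stay within valid pixel values

# e.g., perturb the first image of the training batch from the sketch above
adversarial = fgsm_example(model, images[:1], labels[:1])
```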

Conceivable cases 

Access controls, identity theft and digital fingerprints have already been mentioned above. Voice bots and voice assistants such as Alexa, Siri and Cortana could also, in theory, be manipulated with faked voices. Autonomous driving and CCTV cameras could be impaired through physical manipulations such as markers or "stealth streetwear". Text manipulations are also conceivable, affecting spam filters or SEO boosting (website ranking). An example of social media trolling occurred as early as 2016, when Microsoft released the AI-controlled chatbot avatar Tay on Twitter to hold conversations and build personalized profiles. Targeted troll attacks turned the avatar into a hate speech bot, and Microsoft had to take it offline after just 16 hours and 96,000 tweets.

Is that now the end of AI? 

To come straight to the point, the answer is: “No, but ...”.

Many of these examples come from the research environment, where special experimental conditions prevail. It also takes a fair amount of specialist knowledge to carry out adversarial attacks, although this is certainly not out of reach for intelligence agencies or experienced hackers. Furthermore, in parallel with the research results, a number of frameworks and libraries, such as CleverHans or IBM's CLEVER robustness score, have been established that can analyze the robustness of AI systems and evaluate AI security. This is important for being able to detect an adversarial attack in the first place. And finally, defensive strategies can be developed, such as limiting input in time or quantity, validating it beforehand, and pre-training AI models with anticipated adversarial examples. Using multiple models, where the attacker does not know which one will be used, also makes attacks more difficult. Ultimately, this amounts to a cat-and-mouse game between attackers and defenders. The "but" now leads us directly to a preliminary conclusion.
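The defense of pre-training with anticipated adversarial examples is usually called adversarial training in the literature. Here is a minimal sketch under the same assumptions as the previous examples; it reuses the model, optimizer, data and the fgsm_example helper from above. During each training step, the model also sees perturbed versions of the current batch.

```python
# Sketch of adversarial training: the model is also trained on anticipated adversarial examples.
# Reuses model, loss_fn, optimizer, images, labels and fgsm_example from the sketches above.
for _ in range(100):
    adversarial_images = fgsm_example(model, images, labels)  # craft attacks against the current model
    optimizer.zero_grad()                                     # discard gradients left over from the attack step
    loss = (loss_fn(model(images), labels)                    # learn from clean examples ...
            + loss_fn(model(adversarial_images), labels))     # ... and from anticipated adversarial ones
    loss.backward()
    optimizer.step()
```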

Conclusion 

Research has shown that adversarial attacks are by no means fiction but entirely feasible. To the extent that AI systems are accessible to third parties, they are also vulnerable to such attacks. As AI systems become more prevalent in everyday life, these attacks could therefore become a reality as well. At the same time, AI has already penetrated so many areas of our lives that we can no longer halt this aspect of digitalization. We must therefore face up to the challenges and, as with other security-related issues, develop suitable detection and defense measures. Security authorities and legislators will have to take action in this field, as will companies and organizations. On the other hand, this will create interesting opportunities for solution providers, and new job profiles, such as that of the AI security expert, will emerge. The prospects are exciting.


About the author

Andreas Strunz is a director in the area of Change & Transformation at msg for banking ag. In business consulting, he focuses on possible applications of artificial intelligence in the financial sector as well as on strategic future topics for the industry.
