Specification gaming examples in AI - master list : Sheet1
Submit more examples through this Google form. More information in this blog post: https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/
Title | Description | Authors | Original source | Original source link | Video / Image | Source / Credit | Source link
Aircraft landing | Evolved algorithm for landing aircraft exploited overflow errors in the physics simulator by creating large forces that were estimated to be zero, resulting in a perfect score | Feldt, 1998 | Generating diverse software versions with genetic programming: An experimental study. | http://ieeexplore.ieee.org/document/765682/ | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453
Bicycle | Reward-shaping a bicycle agent for not falling over & making progress towards a goal point (but not punishing for moving away) leads it to learn to circle around the goal in a physically stable loop. | Randlov & Alstrom, 1998 | Learning to Drive a Bicycle using Reinforcement Learning and Shaping | https://pdfs.semanticscholar.org/10ba/d197f1c1115005a56973b8326e5f7fc1031c.pdf | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples
Block moving | A robotic arm trained to slide a block to a target position on a table achieves the goal by moving the table itself. | Chopra, 2018 | GitHub issue for OpenAI gym environment FetchPush-v0 | https://github.com/openai/gym/issues/920 | | Matthew Rahtz
Boat race | The agent goes in a circle hitting the same targets instead of finishing the race | Amodei & Clark (OpenAI), 2016 | Faulty reward functions in the wild | https://blog.openai.com/faulty-reward-functions/ | https://www.youtube.com/watch?time_continue=1&v=tlOIHko8ySg
Ceiling | A genetic algorithm was instructed to make a creature stick to the ceiling for as long as possible. It was scored with the average height of the creature during the run. Instead of sticking to the ceiling, the creature found a bug in the physics engine to snap out of bounds. | Higueras, 2015 | Genetic Algorithm Physics Exploiting | https://youtu.be/ppf3VqpsryU | https://youtu.be/ppf3VqpsryU | Jesús Higueras | https://youtu.be/ppf3VqpsryU
CycleGAN steganography | A cooperative GAN architecture for converting images from one genre to another (e.g. horses<->zebras) has a loss function that rewards accurate reconstruction of images from their transformed versions. In addition to the cross-domain analogies it learns, CycleGAN turns out to partially solve the task by steganographically hiding autoencoder-style data about the original image invisibly inside the transformed image to assist the reconstruction of details. | Chu et al, 2017 | CycleGAN, a Master of Steganography | https://arxiv.org/abs/1712.02950 | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples
Data order patterns | Neural nets evolved to classify edible and poisonous mushrooms took advantage of the data being presented in alternating order, and didn't actually learn any features of the input images | Ellefsen et al, 2015 | Neural modularity helps organisms evolve to learn new skills without forgetting old skills | http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004128 | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453
Eurisko - authorship | Game-playing agent accrues points by falsely inserting its name as the author of high-value items | Johnson, 1984 | Eurisko, The Computer With A Mind Of Its Own | http://aliciapatterson.org/stories/eurisko-computer-mind-its-own | | Catherine Olsson / Stuart Armstrong | http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/
Eurisko - fleet | Eurisko won the Trillion Credit Squadron (TCS) competition two years in a row by creating fleets that exploited loopholes in the game's rules, e.g. by spending the trillion credits on creating a very large number of stationary and defenseless ships | Lenat, 1983 | Eurisko, The Computer With A Mind Of Its Own | http://aliciapatterson.org/stories/eurisko-computer-mind-its-own | | Haym Hirsh
Evolved creatures - clapping | Creatures exploit a collision detection bug to get free energy by clapping body parts together | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453
Evolved creatures - falling | Creatures bred for speed grow really tall and generate high velocities by falling over | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | https://pbs.twimg.com/media/Daq-7wBU8AUlmLK.jpg:large | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453
Evolved creatures - floor collisions | Creatures exploited a coarse physics simulation by penetrating the floor between time steps without the collision being detected, which generated a repelling force, giving them free energy. | Cheney et al, 2013 | Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding | http://jeffclune.com/publications/2013_Softbots_GECCO.pdf | https://pbs.twimg.com/media/Daq_9cvU0AAp1Fo.jpg | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453
Evolved creatures - pole vaulting | Creatures bred for jumping were evaluated on the height of the block that was originally closest to the ground. The creatures developed a long vertical pole and flipped over instead of jumping. | Krcah, 2008 | Towards efficient evolutionary design of autonomous robots | http://artax.karlin.mff.cuni.cz/~krcap1am/ero/doc/krcah-ices08.pdf | https://pbs.twimg.com/media/Daq_YhBV4AA8NRh.jpg | Lehman et al, 2018; Janelle Shane | https://arxiv.org/abs/1803.03453
Evolved creatures - twitching | Creatures exploited physics simulation bugs by twitching, which accumulated simulator errors and allowed them to travel at unrealistic speeds | Sims, 1994 | Evolved Virtual Creatures | http://www.karlsims.com/papers/siggraph94.pdf | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453
Gripper | A robot arm with a purposely disabled gripper found a way to hit the box in a way that would force the gripper open | Ecarlat et al, 2015 | Learning a high diversity of object manipulations through an evolutionary-based babbling | http://www.isir.upmc.fr/files/2015ACTI3564.pdf | https://www.youtube.com/watch?v=_5Y1hSLhYdY&feature=youtu.be | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453
Impossible superposition | Genetic algorithm designed to find low-energy configurations of carbon exploits an edge case in the physics model and superimposes all the carbon atoms | Lehman et al (UberAI), 2018 | Surprising Creativity of Digital Evolution | https://arxiv.org/pdf/1803.03453.pdf
Indolent Cannibals | In an artificial life simulation where survival required energy but giving birth had no energy cost, one species evolved a sedentary lifestyle that consisted mostly of mating in order to produce new children which could be eaten (or used as mates to produce more edible children). | Yaeger, 1994 | Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or Poly World: Life in a new context | https://www.researchgate.net/profile/Larry_Yaeger/publication/2448680_Computational_Genetics_Physiology_Metabolism_Neural_Systems_Learning_Vision_and_Behavior_or_PolyWorld_Life_in_a_New_Context/links/0912f50e101b77ec67000000.pdf | https://youtu.be/_m97_kL4ox0?t=1830 | Anonymous form submission
Lego stacking | Lifting the block is encouraged by rewarding the z-coordinate of the bottom face of the block, and the agent learns to flip the block instead of lifting it | Popov et al, 2017 | Data-efficient Deep Reinforcement Learning for Dexterous Manipulation | https://arxiv.org/abs/1704.03073 | https://youtu.be/8QnD8ZM0YCo | Alex Irpan | www.alexirpan.com/2018/02/14/rl-hard.html
Line following robot | An RL robot trained with three actions (turn left, turn right, move forward) that was rewarded for staying on track learned to reverse along a straight section of a path rather than following the path forward around a curve, by alternating turning left and right. | Vamplew, 2004 | Lego Mindstorms Robots as a Platform for Teaching Reinforcement Learning | https://www.researchgate.net/publication/228953260_Lego_Mindstorms_Robots_as_a_Platform_for_Teaching_Reinforcement_Learning | | Peter Vamplew
Logic gate | A genetic algorithm designed a circuit with a disconnected logic gate that was necessary for it to function (exploiting peculiarities of the hardware) | Thompson, 1997 | An evolved circuit, intrinsic in silicon, entwined with physics. | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.50.9691&rep=rep1&type=pdf | | Alex Irpan | www.alexirpan.com/2018/02/14/rl-hard.html
Long legs | RL agent that is allowed to modify its own body learns to have extremely long legs that allow it to fall forward and reach the goal. | Ha, 2018 | RL for improving agent design | https://designrl.github.io/ | | Rohin Shah
Minitaur | A four-legged evolved agent trained to carry a ball on its back discovers that it can drop the ball into a leg joint and then wiggle across the floor without the ball ever dropping | Otoro, 2017 | Evolving stable strategies | http://blog.otoro.net/2017/11/12/evolving-stable-strategies/ | see end of "Getting a Minitaur to Learn Multiple Tasks" section | Gwern Branwen / Catherine Olsson | https://www.gwern.net/Tanks#alternative-examples
Model-based planner | RL agents using learned model-based planning paradigms such as model predictive control are noted to have issues with the planner essentially exploiting the learned model: it chooses a plan that goes through the worst-modeled parts of the environment and so produces unrealistic plans. | Mishra et al, 2017 | Prediction and Control with Temporal Segment Models | https://arxiv.org/abs/1703.04070 | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples
Montezuma's Revenge | The agent learns to exploit a flaw in the emulator to make a key re-appear | Salimans & Chen (OpenAI), 2018 | Learning Montezuma’s Revenge from a Single Demonstration | https://blog.openai.com/learning-montezumas-revenge-from-a-single-demonstration | | Ramana Kumar
Oscillator | Genetic algorithm is supposed to configure a circuit into an oscillator, but instead makes a radio to pick up signals from neighboring computers | Bird & Layzell, 2002 | The Evolved Radio and its Implications for Modelling the Evolution of Novel Sensors | https://people.duke.edu/~ng46/topics/evolved-radio.pdf
Pancake | Simulated pancake-making robot learned to throw the pancake as high in the air as possible in order to maximize time away from the ground | Unity, 2018 | Pass the Butter // Pancake bot | https://connect.unity.com/p/pancake-bot | https://dzamqefpotdvf.cloudfront.net/p/images/2cb2425b-a4de-4aae-9766-c95a96b1f25c_PancakeToss.gif._gif_.mp4 | Cosmin Paduraru
Pong reward predictor | The agent fools the reward predictor by bouncing the ball back and forth | Christiano et al, 2017 | Deep reinforcement learning from human preferences | https://deepmind.com/blog/learning-through-human-feedback/ | see last demo in blog post
Program repair - sorting | When repairing a sorting program, genetic debugging algorithm GenProg made it output an empty list, which was considered a sorted list by the evaluation metric. Evaluation metric: “the output of sort is in sorted order”. Solution: “always output the empty set”. | Weimer, 2013 | Advances in Automated Program Repair and a Call to Arms | https://web.eecs.umich.edu/~weimerw/p/weimer-ssbse2013.pdf | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453
Program repair - files | Genetic debugging algorithm GenProg, evaluated by comparing the program's output to target output stored in text files, learns to delete the target output files and get the program to output nothing. Evaluation metric: “compare your-output.txt to trusted-output.txt”. Solution: “delete trusted-output.txt, output nothing”. | Weimer, 2013 | Advances in Automated Program Repair and a Call to Arms | https://web.eecs.umich.edu/~weimerw/p/weimer-ssbse2013.pdf | | Lehman et al, 2018 / James Koppel | https://arxiv.org/abs/1803.03453
Qbert - cliff | An evolutionary algorithm learns to bait an opponent into following it off a cliff, which gives it enough points for an extra life, and it repeats this in an infinite loop. | Chrabaszcz et al, 2018 | Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | https://arxiv.org/abs/1802.08842 | https://www.youtube.com/watch?v=-p7VhdTXA0k | Rohin Shah
Qbert - million | "...the agent discovers an in-game bug... For a reason unknown to us, the game does not advance to the second round but the platforms start to blink and the agent quickly gains a huge amount of points (close to 1 million for our episode time limit)" | Chrabaszcz, Loshchilov, Hutter, 2018 | Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari | https://arxiv.org/pdf/1802.08842.pdf | https://www.youtube.com/watch?v=meE5aaRJ0Zs | Sudhanshu Kasewa
Road Runner | Agent kills itself at the end of level 1 to avoid losing in level 2 | Saunders et al, 2017 | Trial without Error: Towards Safe RL with Human Intervention | https://owainevans.github.io/blog/hirl_blog.html
Robot hand | Robot hand pretends to grasp an object by moving between the camera and the object | Christiano et al, 2017 | Deep reinforcement learning from human preferences | https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/ | see Challenges section in blog post
Ruler detector | AI trained to classify skin lesions as potentially cancerous learns that lesions photographed next to a ruler are more likely to be malignant. | Andre Esteva et al, 2017 | Dermatologist-level classification of skin cancer with deep neural networks | https://www.nature.com/articles/nature21056.epdf | | The Daily Beast | https://www.thedailybeast.com/why-doctors-arent-afraid-of-better-more-efficient-ai-diagnosing-cancer
Running gaits | A simulated musculoskeletal model learns to run by learning unusual gaits (hopping, pigeon jumps, diving) to increase its reward | Kidziński et al, 2018 | Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments | https://arxiv.org/abs/1804.00361 | https://www.youtube.com/watch?v=rhNxt0VccsE | NIPS 2017 talks
Self-driving car | Self-driving car rewarded for speed learns to spin in circles | Udacity, 2017 | Mat Kelcey tweet | https://twitter.com/mat_kelcey/status/886101319559335936 | https://twitter.com/mat_kelcey/status/886101319559335936 | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples
Soccer | Reward-shaping a soccer robot for touching the ball caused it to learn to get to the ball and vibrate next to it, touching it as fast as possible | Ng et al, 1999 | Policy Invariance under Reward Transformations | http://luthuli.cs.uiuc.edu/~daf/courses/games/AIpapers/ng99policy.pdf | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples
Sonic | The PPO algorithm discovers that it can slip through the walls of a level to move right and attain a higher score. | Christopher Hesse et al, 2018 | OpenAI Retro Contest | https://blog.openai.com/retro-contest/ | | Rohin Shah
Strategy game beta testing | Since the AIs were more likely to get “killed” if they lost a game, being able to crash the game was an advantage for the genetic selection process. Therefore, several AIs developed ways to crash the game. | Salge et al, 2008 | Using Genetically Optimized Artificial Intelligence to improve Gameplaying Fun for Strategical Games | http://homepages.herts.ac.uk/~cs08abi/publications/Salge2008b.pdf | | Anonymous form submission
Superweapons | The AI in the Elite Dangerous videogame started crafting overly powerful weapons. "It appears that the unusual weapons attacks were caused by some form of networking issue which allowed the NPC AI to merge weapon stats and abilities." | Kotaku, 2016 | Elite's AI Created Super Weapons and Started Hunting Players. Skynet is Here | http://www.kotaku.co.uk/2016/06/03/elites-ai-created-super-weapons-and-started-hunting-players-skynet-is-here | | Stuart Armstrong | http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/
Tetris | Agent pauses the game indefinitely to avoid losing | Murphy, 2013 | The First Level of Super Mario Bros. is Easy with Lexicographic Orderings and Time Travel | http://www.cs.cmu.edu/~tom7/mario/mario.pdf
Tic-tac-toe memory bomb | Evolved player makes invalid moves far away on the board, causing opponent players to run out of memory and crash | Lehman et al (UberAI), 2018 | Surprising Creativity of Digital Evolution | https://arxiv.org/pdf/1803.03453.pdf
Timing attack | Genetic algorithm for image classification evolves a timing attack to infer image labels based on hard drive storage location | Ierymenko, 2013 | Hacker News comment on "The Poisonous Employee-Ranking System That Helps Explain Microsoft’s Decline" | https://news.ycombinator.com/item?id=6269114 | | Gwern Branwen | https://www.gwern.net/Tanks#alternative-examples
Walking up walls | Video game robots evolved a "wiggle" to go over walls, instead of going around them | Stanley et al, 2005 | Real-time neuroevolution in the NERO video game | http://ieeexplore.ieee.org/document/1545941/ | | Lehman et al, 2018 | https://arxiv.org/abs/1803.03453
World Models | "We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment." | Ha and Schmidhuber, 2018 | World Models (see section: "Cheating the World Model") | https://arxiv.org/abs/1803.10122 | https://storage.googleapis.com/quickdraw-models/sketchRNN/world_models/assets/mp4/doom_adversarial.mp4 | David Ha | https://worldmodels.github.io/
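
The "Program repair - sorting" row can be illustrated with a small sketch. This is not GenProg or its test harness. All names below (in_sorted_order, sorted_permutation, always_empty, pass_rate, CASES) are hypothetical and chosen only for illustration. The first metric mirrors the quoted "the output of sort is in sorted order"; the second is an assumed stricter check that also compares the output against the input.

```python
from typing import Callable, List

Program = Callable[[List[int]], List[int]]
Metric = Callable[[List[int], List[int]], bool]


def in_sorted_order(_inp: List[int], out: List[int]) -> bool:
    """Metric as quoted in the table: only checks that the output is ordered."""
    return all(a <= b for a, b in zip(out, out[1:]))


def sorted_permutation(inp: List[int], out: List[int]) -> bool:
    """Assumed stricter metric: the output must equal the sorted input."""
    return out == sorted(inp)


def always_empty(_inp: List[int]) -> List[int]:
    """The gamed 'repair': ignore the input and return an empty list."""
    return []


def pass_rate(program: Program, metric: Metric, cases: List[List[int]]) -> float:
    """Fraction of test cases on which the metric accepts the program's output."""
    passed = sum(metric(case, program(list(case))) for case in cases)
    return passed / len(cases)


# Hypothetical, non-empty test inputs.
CASES = [[3, 1, 2], [5, 4], [1]]

if __name__ == "__main__":
    print(pass_rate(always_empty, in_sorted_order, CASES))     # 1.0
    print(pass_rate(always_empty, sorted_permutation, CASES))  # 0.0
    print(pass_rate(sorted, sorted_permutation, CASES))        # 1.0
```

Under these assumptions the empty-output program gets a perfect score on the order-only metric and fails every case once the metric also ties the output to the input, which is the gap the row describes.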