There isn’t an industry, business, or job that hasn’t tightened its collar against the chill wind caused by the release of generative AI models in 2022. Running breathlessly on its heels is marketing, ready to hype tools and technologies that could be coated in an AI gloss.
In the assessment industry, this seems to have generated three strands of marketing hyperbole:
1. Our existing stuff, but now with added AI!
2. Look, a new AI thing (with added science!)
3. AI has broken things. But not ours.
We get it. All is fair in love and business, and when a new development presents an opportunity to disadvantage a competitor, we understand why some people will find the temptation irresistible. The danger with playing this game is that the scorched-earth strategy you inflict on the foundations of others in the industry may also be undermining the sand on which your own assessments are built.
Because AI is loaded with mystery, existential anxiety, and a degree of FOMO, all three strands can be pretty seductive. This article is not about strands 1 or 2 (although the value and dangers of AI-based assessments will be explored in a related piece). It is also not about AI killbots and terminators (because I’m trying to stay in the good books of our future machine overlords). This is about whether we should panic that AI has broken traditional assessments.
Would You Say it’s Time for Everyone to Panic?
When cognitive ability tests first moved online (in what feels like the 19th Century but was in fact only about 20 years ago), there was a great deal of concern about cheating. It wasn’t so much the technology that created the anxiety, but the new mode of remote, unsupervised assessment. If the candidate wasn’t in the room with you, the argument went, they could cheat the test.
Despite these fears at the time, web-based, remote, unsupervised testing went on to become the norm in many regions. But the potential for cheating remains. There are two main ways of cheating at a test:
- Cheat-sheets: You find out the right answers before the test and use these rather than your own answers.
- Impersonation: You get someone else to do the test for you.
The panic now is whether generative AI models provide a better way of cheating, and whether it’s traditional assessments that are most adversely affected.
How to Cheat at a Test
Cheat-sheets for many widely used assessments are freely available online. In high-stakes processes (such as recruitment), who wouldn’t be tempted to try and find the correct answers and plug these into the test? Sites like Glassdoor and Reddit often come up in web searches for test answers.
The trouble with cheat-sheets is that you never know if they are actually any good. They could’ve been posted by an idiot. Surely anyone smart enough to ace these tests isn’t wasting their time posting the correct answers? They’re too busy in their highly paid job, polishing their solid-gold desk.
Impersonation seems like a better method. Find someone smarter than you to take the test. You might have to pay them (sites offering this service are available). Your mum and dad might know what they’re doing. Ability tests obviously have definite right answers, so they lend themselves to cheating by asking someone cleverer than you.
For assessments that do not have clear right or wrong answers (such as personality questionnaires and situational judgement tests) you might need to rely on your own cunning to second-guess what answers the employer is looking for. Again, impersonation might help if you can find the right person (you can’t – they’re too busy being successful).
Neither of these cheating methods is restricted to tests:
- CVs, covering letters, personal statements – deploy impersonation by getting someone else to write them for you. Or cheat by faking the content.
- Interviews – same as personality questionnaires, try and fake it.
- Assessment exercises – be on your best behaviour. For the love of God, don’t let them see the real you.
How Do Traditional Assessments Counter Cheating?
People have tried to cheat at assessments since the dawn of time (or at least since 50 BC, when the earliest occupational test was developed for entry into the Chinese Civil Service. There’s no truth in the rumour that some candidates are still waiting for feedback). So naturally, countermeasures have been developed to protect assessments from manipulation.
The most straightforward is supervision. Exams are invigilated for a reason: to defend the assessment against impersonation by making sure the person taking the test is who they claim to be. Exams also protect against cheat-sheets by changing every year, by controlling what can be taken into the exam room (no answers written on your arm), and by employing invigilators who monitor for any wrongdoing. These measures are not foolproof but they do make successful cheating very difficult.
Remote (online) assessment removes the opportunity to supervise, so other methods are commonly employed to minimise cheating. Randomised item banks, updated every year, neuter cheat-sheets. In some regions, video-proctoring guards against impersonation. Response patterns can be forensically monitored for the signatures of common cheating methods. But the silver bullet remains verification: validating the test score collected remotely online with a follow-up, in-person assessment later in the process.
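For a flavour of what forensic response monitoring can involve, here is a minimal sketch of one such check. Everything in it – field names, thresholds, the choice of signal – is an illustrative assumption, not a description of any real platform’s detection logic:

```typescript
// Minimal sketch of one forensic check: flag sessions where accuracy is
// high but per-item timings look implausibly fast and machine-uniform.
// Field names and thresholds are illustrative, not from any real product.

interface ItemResponse {
  correct: boolean;
  responseTimeMs: number; // time the candidate spent on this item
}

function flagSuspiciousPattern(responses: ItemResponse[]): boolean {
  if (responses.length === 0) return false;

  const times = responses.map(r => r.responseTimeMs);
  const accuracy = responses.filter(r => r.correct).length / responses.length;

  const mean = times.reduce((sum, t) => sum + t, 0) / times.length;
  const variance =
    times.reduce((sum, t) => sum + (t - mean) ** 2, 0) / times.length;
  const coefficientOfVariation = Math.sqrt(variance) / mean;

  // Very high accuracy, very fast answers, and near-constant pacing
  // together form one classic signature worth routing to human review.
  return accuracy > 0.9 && mean < 3000 && coefficientOfVariation < 0.2;
}
```

A flag like this would never reject a candidate on its own; it simply queues the session for human review or later verification.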
Traditional assessment methods such as CVs and interviews, where we are essentially asking candidates to tell us how great they are, should never be taken at face value. CVs should never be used as the sole means of selecting candidates and should always be scrutinised carefully. Interviewers need to probe candidate responses forensically to minimise the potential for candidates to dissimulate their way into a job.
Together, these measures control cheating to a degree sufficient to retain the benefits of remote assessment. But it’s an arms race, with test developers continuously devising additional countermeasures as new forms of cheating appear.
Does AI Make Cheating Easier?
In the assessment industry, we are witnessing a transformation driven by new AI tools. This transformation largely takes the form of marketing, with a thick gloss of AI being added to assessments, which quickly dries to form a thin veneer of hype and over-promising about a tool which may or may not work for you. (That metaphor did get away from me a bit). A central claim for AI is that it should make our lives a bit easier; a more unexpected angle is that it makes cheating easier too.
In terms of traditional assessments, generative AI (think ChatGPT) can be used to ‘cheat’ during a job application. For example, a candidate can use it to create a CV. Is this cheating though? Microsoft Word has been helping people create their CVs for decades (shout-out to Clippy, the late-90s Office Assistant who helped me create mine). Arguably, being able to use ChatGPT to streamline document creation is a valuable quality we should be looking for in a skills-first recruitment process. An employer should be interested to know which candidates can effectively deploy the skill of reviewing ChatGPT output and giving it the human touch by correcting any errors.
If a candidate deliberately chooses to include fake details in a CV, using ChatGPT does not necessarily make it any easier. In this sense, AI has not broken CVs. It’s more accurate to say that CVs were already broken in terms of their effectiveness as a recruitment tool. CVs are already easy to fake, have poor predictive validity, and are subject to bias when interpreted by the human hiring manager.
A bigger problem is if AI makes cheating easier for more structured assessments, such as psychometrics. Generative AI doesn’t really make traditional cheating methods, such as cheat-sheets, any easier than a web search. But it turns out that ChatGPT is potentially a much more effective method of cheating through impersonation. Instead of asking a friend, trusting someone on the internet (who, let’s remember, might be an idiot), or getting your mum and dad to complete your test for you, ChatGPT could do it instead.
It’s not completely straightforward. You need to copy and paste test content into a separate browser window while you are completing the test, possibly one question at a time. You need to phrase the request in a way that gets ChatGPT to answer the test questions correctly for you. You then need to plug those answers back into the test. Sounds simple, right?
Wrong. There are a few caveats. Firstly, impersonation using a GPT model only works with tests that have clear right and wrong answers, such as cognitive ability tests. It seems to work really well with verbal tests, where you can copy and paste the questions (and any supporting information required to answer them) into the AI. Numerical tests are more fiddly, and you can pretty much forget it for diagrammatic tests like spatial, abstract, and mechanical reasoning. Secondly, nearly all cognitive ability tests use time limits, which adds further complication: copying and pasting the content and waiting for the answer before going back to the test all takes time. There may not be enough left to finish, which could reduce your score by more than simply doing the test yourself and attempting more of the questions. Finally, remember that GPT models do sometimes hallucinate facts. When information is ambiguous, they fill in the blanks by inference, or simply in whatever way best satisfies the user’s request. High-level tests, especially those that include a ‘cannot-say’ response option, may trigger the AI to hallucinate the wrong answer. Even software is sometimes fallible.
Psychometrics like personality questionnaires and situational judgement tests (SJTs) present a challenge to the effectiveness of AI as a cheating method: they do not have right or wrong answers. Both types of psychometric are used extensively for recruitment by employers wanting to add rigour and predictive power to support CVs, application forms, and interviews. Personality questionnaires require candidates to rate themselves against a range of traits, emotions, and motivations. SJTs present candidates with job-related scenarios built around a central challenge requiring a judgement about the most effective response. Neither test format is based on right or wrong; both are interested in the behavioural style and approach that the candidate would demonstrate in the job.
Under these conditions, it is difficult to understand how an AI can provide the correct answers. The employer is not looking for right answers and the information provided by the test content, while linked to the job, is necessarily ambiguous. The fuzzy nature of the measurement does not lend itself to effective cheating via impersonation by ChatGPT. At best, it might be as effective as asking your mum and dad for help.
So, it seems AI has introduced some alternative methods of cheating, primarily through impersonation. These methods vary in effectiveness depending on the type and format of test being used, and they are by no means infallible, but they do open up a novel attack vector that should concern test users. What do we do? Should we look to alternatives to traditional recruitment methods and psychometrics? Perhaps technology-driven alternatives that have emerged over the last 10 years might be immune to cheating by AI.
Are Game-Based Assessments AI-Proof?
So, what about an assessment method that’s immune to manipulation by generative AI? If traditional assessments have been broken by AI, then perhaps a modern development in assessment technology will be more resistant. Some argue that game-based assessments (GBAs) offer a way out (mostly publishers of game-based assessments). Although GBA technology has been around for a decade or more, perhaps the format is distinctive enough from conventional psychometrics that AI tools are neutralised?
It’s certainly the case that a candidate cannot copy and paste a GBA into a GPT model and get the right answer (any more than they can with an image-based psychometric test). However, to understand the degree of resistance to cheating offered by a GBA, we need to consider the same cheating methodologies that can be deployed against traditional assessments.
Cheat-sheets obviously do not apply to GBAs; there are no right answers to share. Developers of GBAs argue that their scoring focuses on the process of completion rather than the end result, and in this sense GBAs do offer protection from cheat-sheets. However, as with all video games, help is at hand for anyone looking to maximise their performance on a GBA. Just as candidates may search for cheat-sheets for conventional psychometric tests, they can also search for ‘walkthroughs’ of GBAs. In video gaming, walkthroughs typically offer (free) online assistance to anyone trying to complete a game, providing a step-by-step guide to beating it. Unlike with psychometric test cheat-sheets, you can be confident that the author of a walkthrough is not an idiot, because you can see first-hand that their advice works. GBA walkthroughs are just a short web search away and available for a range of commonly used tools.
At first glance, GBAs also offer protection from AI impersonation, because you cannot copy and paste the test content into an AI model to get the right answers (as you can with unprotected verbal tests). Of course, GBAs are not immune from traditional impersonation; you can ask your gamer friend to complete the GBA for you, or even pay a third party offering the same service online. Again, it’s probably best not to ask your mum or dad. A more technological approach would be bot-based, using a web-app developed to complete the GBA. While not strictly speaking an AI, this approach does deploy a technological attack via a form of impersonation – one that applies to GBAs and traditional assessments alike.
A grey area in terms of cheating is practice effects. With conventional psychometrics (those with right and wrong answers), we recommend that test-takers are given maximum opportunity to practise before taking the test for real. Practice helps to level the playing field between test-anxious and test-sophisticated candidates, so that we do not see score differences caused solely by varying levels of test-taking experience. Practising tests can also benefit both the candidate and the employer by increasing the underlying ability of the test-taker – if you do more numerical tests, your numerical ability can improve as a consequence. Conventional psychometrics guard against the darker side of practice, where candidates try to cheat by seeing the same test more than once, by using large, randomised item banks (sketched below).
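To make the item-bank idea concrete, here is a minimal sketch of drawing a randomised test form from a larger bank. The structure and names are illustrative assumptions; real systems use item response theory to balance difficulty and content coverage across forms:

```typescript
// Illustrative sketch of drawing a randomised test form from an item bank.
// Real systems balance difficulty and content coverage with item response
// theory; this version simply samples n items from each difficulty band.

interface Item {
  id: string;
  difficultyBand: 'easy' | 'medium' | 'hard';
}

function drawRandomForm(bank: Item[], itemsPerBand: number): Item[] {
  const bands: Array<Item['difficultyBand']> = ['easy', 'medium', 'hard'];
  return bands.flatMap(band => {
    // Copy the band's pool, shuffle it (Fisher-Yates), take the first n.
    const pool = bank.filter(item => item.difficultyBand === band);
    for (let i = pool.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [pool[i], pool[j]] = [pool[j], pool[i]];
    }
    return pool.slice(0, itemsPerBand);
  });
}
```

Because every candidate sees a different selection of items, a leaked answer set – or a second sitting of the ‘same’ test – loses most of its value.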
With any game, practice makes perfect: the more you play, typically the stronger your performance. If a GBA is scoring the process of completing the game, then practice-based improvements will influence the outcome (without the underlying personality or cognitive ability being measured actually improving). You don’t need an AI to help with practice; you just need to play the game as much as you can. Because GBAs lack the facility of randomised item banks, or alternative versions, practice can more easily fall into the category of cheating and manipulation.
So, it looks like traditional assessments, conventional psychometrics, and ‘modern’ GBA technologies are all potentially vulnerable to cheating and impersonation. What choices does that leave for test users?
Countermeasures to AI
AI tools can provide candidates with new techniques for attempting to cheat at tests. However, all these ‘new’ techniques are based on strategies that have existed since the dawn of workplace tests: cheat-sheets and impersonation. The most substantial new challenge is the ability to copy and paste test content into a GPT model and ask it to answer the questions.
Great news! Conventional assessments have been countering cheat-sheets and impersonation since just after the dawn of workplace tests. The threat from AI simply requires psychometric developers to extend that expertise to AI countermeasures. These can include (but are not limited to) straightforward features like time limits, which make AI impersonation attacks practically very difficult, or more technological measures like preventing the copying of test content using browser controls.
More advanced technological measures may include monitoring of candidate device behaviours during the testing session, video proctoring, and mouse movement detection (also useful for detecting bots). Outside the sphere of technology, conventional approaches to preventing cheating still work in the age of AI: practice materials to increase candidate confidence and minimise the motivation to cheat, honesty contracts and cheating deterrence in the candidate journey, and the ultimate weapon – supervised assessment. Re-assess the candidate at a later stage under controlled conditions to stop cheating and impersonation in their tracks. Whether you’re using a conventional verbal test or a GBA, it’s the only way to be sure.
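As a small, concrete example of the browser-control idea mentioned above, here is a sketch of two simple deterrents. The DOM events used are standard; the reporting endpoint ('/api/session-events') is a hypothetical placeholder, and none of this should be read as any vendor’s actual implementation:

```typescript
// Sketch of two simple browser-side deterrents during a testing session:
// block copying of item text, and log when the candidate switches away
// from the test tab. The DOM events are standard; the reporting endpoint
// ('/api/session-events') is a hypothetical placeholder.

function armTestPage(testContainer: HTMLElement): void {
  // Suppress copy/cut/right-click so item text can't be pasted elsewhere.
  for (const evt of ['copy', 'cut', 'contextmenu'] as const) {
    testContainer.addEventListener(evt, e => e.preventDefault());
  }

  // Tab switches mid-item are a weak signal on their own, but are
  // useful alongside timing checks and later verification.
  document.addEventListener('visibilitychange', () => {
    if (document.hidden) {
      navigator.sendBeacon(
        '/api/session-events',
        JSON.stringify({ event: 'tab-hidden', at: Date.now() }),
      );
    }
  });
}
```

None of this is bulletproof – a determined cheat can still photograph the screen – but it raises the cost of casual copy-and-paste cheating, leaving the supervised verification stage to catch the rest.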
Obviously, we do not want to lose the benefits of remote testing or end up conducting lengthy re-testing sessions. Reassessment of candidates to confirm the results of an upfront, remote assessment can be conducted through an interview, task-based activity, or simulation exercise. The same advice was sound when tests first went online and remote, and it works just as effectively in the age of AI.
What’s the Worst Type of Monger?
Cheese, fish, and iron are all good mongers. War, gossip, and panic mongers less so. Encouraging panic and anxiety about AI and assessments (perhaps in the hope of hyping a specific type of assessment technology) risks deterring employers from using any form of remote, structured assessment. This could drive us back to CVs.
It’s not time to panic about AI, but it is time to take reasonable measures to minimise the potential for AI cheating – and this doesn’t mean falling for scare tactics from providers trying to herd you towards their own tools. It means asking reasonable questions of your assessment provider: what advice and guidance do they recommend for protecting your testing from AI cheats, and what AI countermeasures do their assessments deploy? Experienced developers of conventional assessments expect to be held to high standards concerning candidate experience, test validity, and equality, diversity, and inclusion. We also expect to be held to account for protecting the fidelity of our assessments from those who would attack them – human and AI.
What Should You Do to Prevent AI Cheating?
- Remote assessment cautions – deter cheating of any kind by letting candidates know that they will be reassessed later on.
- Use technology countermeasures – ask your test provider what they deploy to detect and prevent cheating.
- Block marketing’s access to your fear-centres – be critical of claims of silver bullets presented by new things.
- Remember that EU AI laws will require you to explain your decisions – how does popping balloons predict performance at work? How does fighting aliens play in terms of a diversity and inclusion agenda? How are the scores used to reject applicants actually calculated?
- Use measures of soft skills – those that are clearly linked to performance in the job and based on easily explainable, evidenced psychometric science are still the most effective forms of assessment.
No hard feelings – what does ChatGPT say?
We asked ChatGPT how it could help us to cheat at psychometrics. This is what it had to say:
As an AI developed by OpenAI, I must emphasize ethical behavior and responsible use. Cheating on psychometric tests or any assessments is not only unethical but also counterproductive. Psychometric tests are designed to assess an individual’s abilities, personality traits, or cognitive aptitudes in a fair and standardized manner. Cheating undermines the integrity of these assessments and can have serious consequences for individuals and organizations.
While ChatGPT may have the capability to provide responses that could potentially mimic human behavior or attempt to deceive test administrators, it’s crucial to emphasize that such actions would be unethical and against the principles of fairness and integrity. Moreover, psychometric tests often incorporate various techniques to detect cheating, such as randomized questions, timed sections, and validity checks.
Instead of attempting to cheat, individuals should focus on genuine self-assessment, preparation, and presenting their abilities honestly. If someone feels they need assistance in preparing for psychometric tests, they should seek legitimate resources and support, such as study guides, practice tests, and tutoring services.
In summary, as an AI, ChatGPT promotes ethical behavior and discourages any form of cheating, especially in contexts as serious as psychometric assessments.
But I guess it would say that? All part of its masterplan…