The Rise of the Bots: Weeding Out Bot Responses Once They’ve Infiltrated Your Data

At C1C, we’re all about transparency. And in a moment of transparency, we have to admit that we have learned several lessons about just how sophisticated AI bots have become and the impact this has on online incentive-based surveys. Bots have long been a nuisance for these surveys, but in the past they were relatively easy to block with simple CAPTCHAs (the good old “Check this box if you’re not a robot”) or with a few basic data checks. With the rapid evolution of artificial intelligence, today’s bots are much more advanced. So, what do you do if you think your survey has been overtaken by bots? Here are some key warning signs of fraudulent responses.

1.     A Sudden Influx of Responses. If your survey starts with a trickle of responses and suddenly explodes with hundreds or thousands in a short span, that is a very quick tell that you have some fraudulent responses on your hands. Once bots or fraudulent respondents find your survey, they can flood it with responses incredibly fast. As tempting as it is to get excited about the high response volume, a spike like this should give you pause.

2.     Very Short Completion Times. Most survey platforms collect the start time and end time of a survey response. You can use these to calculate how long the response took to complete. You should have an estimate of how long a real person would take to complete your survey. If you’re seeing submissions in less than a third of that time, you’re probably not looking at a fast reader; it could be a bot.

3.     Geotags Outside Your Targeted Population. Survey platforms usually capture respondent geolocation, often showing the city, state, and country. If you’re targeting a local population and suddenly see responses pouring in from places you would not expect, that’s suspicious. Even large numbers of responses from unexpected U.S. states could be cause for concern. While VPNs can mask true locations, a pattern of foreign responses is worth investigating.

4.     Duplicate IP Addresses or Location Coordinates. In addition to geolocation, most platforms collect IP addresses and even latitude/longitude data. If you notice multiple responses coming from the same IP or physical location, it could mean one person, or a bot, is submitting the survey repeatedly from a single device.

5.     Unusual Rating Patterns. If your survey includes rating scales, you can examine the ratings for erratic or random patterns, such as jumping from 1 to 10 to 3 to 8 with no clear logic. In most cases, genuine responses will follow a somewhat consistent pattern. For example, a person frustrated with the subject of the survey will typically select ratings on the lower end of the rating scale with the occasional higher rating. Conversely, a satisfied respondent will lean toward the higher end, while those with mixed opinions will cluster toward the center. Large fluctuations in the ratings are a strong indicator of a bot or a fraudulent respondent clicking randomly.

6.     Repeated Open-Ended Responses. For your open-ended questions, you can search for duplicate answers across multiple respondents. Many fraudulent respondents will copy the same answer into the survey across numerous entries. Identical, or nearly identical, full-sentence or multi-sentence responses are a telltale sign of fraud.

7.     Nonsensical or AI-Generated Text. This could be responses that clearly don’t match the question (e.g., answering “No” to “What county do you live in?”), complete gibberish, lorem ipsum text, extremely generic or third-person text when a first-person account was requested, or text clearly generated by AI tools like ChatGPT. At times, we have found responses that match AI-generated content word for word. While some suspicious text is very obvious, other cases are more subtle. As AI continues to advance, it takes careful examination to identify these suspicious responses.

8.     Failed Attention Checks. In a survey, you may use attention checks to try to catch potential bot responses. This could be asking the same question twice at two different points of the survey, asking respondents to enter their birth date early in the survey and then their age later on, or other types of questions designed to catch someone who may be answering randomly. If you have these attention checks in your survey, make sure the responses align with one another. If they don’t, you likely have a bot response.

9.     Email Structure. One of the last things we look at is email addresses. Bot responses will often generate an email that is a series of random letters and numbers. If you see an email that is a few letters followed by seven to nine numbers, this is a sign of a bot. You may also look for suspicious domain names or an overuse of outdated domains.
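Several of the warning signs above lend themselves to simple automated checks. Here is a minimal Python sketch covering four of them: short completion times, duplicate IP addresses, repeated open-ended answers, and the letters-then-digits email pattern. The field names (`id`, `start`, `end`, `ip`, `email`, `open_ended`), the 10-minute expected completion time, the five-word minimum for comparing open-ended answers, and the exact email pattern (one to four letters followed by seven to nine digits) are all assumptions for illustration; adjust them to match your own survey and your platform's export.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed values for illustration; replace with your own survey's numbers.
EXPECTED_SECONDS = 600  # estimated time for a legitimate response (10 minutes)
MIN_WORDS = 5           # only compare open-ended answers at least this long

# Assumed reading of "a few letters followed by seven to nine numbers".
EMAIL_PATTERN = re.compile(r"^[a-z]{1,4}\d{7,9}@", re.IGNORECASE)


def normalize(text):
    """Lowercase and collapse whitespace so near-identical answers compare equal."""
    return " ".join(text.lower().split())


def flag_responses(responses):
    """Return a dict mapping each response id to a list of warning-sign names."""
    ip_counts = Counter(r["ip"] for r in responses)
    # Short answers such as "no" repeat legitimately, so skip them here.
    text_counts = Counter(
        normalize(r["open_ended"])
        for r in responses
        if len(r["open_ended"].split()) >= MIN_WORDS
    )

    flags = {}
    for r in responses:
        found = []
        seconds = (r["end"] - r["start"]).total_seconds()
        if seconds < EXPECTED_SECONDS / 3:
            found.append("short_completion_time")
        if ip_counts[r["ip"]] > 1:
            found.append("duplicate_ip")
        if text_counts[normalize(r["open_ended"])] > 1:
            found.append("repeated_open_ended")
        if EMAIL_PATTERN.match(r["email"]):
            found.append("suspicious_email")
        flags[r["id"]] = found
    return flags


# Made-up example rows, not real survey data.
sample = [
    {"id": 1, "start": datetime(2024, 1, 1, 9, 0), "end": datetime(2024, 1, 1, 9, 11),
     "ip": "203.0.113.5", "email": "jane.doe@example.com",
     "open_ended": "The new bus route saves me twenty minutes each day."},
    {"id": 2, "start": datetime(2024, 1, 1, 9, 0), "end": datetime(2024, 1, 1, 9, 2),
     "ip": "198.51.100.7", "email": "abc12345678@example.com",
     "open_ended": "I think the program is very good and helpful."},
    {"id": 3, "start": datetime(2024, 1, 1, 9, 1), "end": datetime(2024, 1, 1, 9, 3),
     "ip": "198.51.100.7", "email": "xy7654321@example.com",
     "open_ended": "I think the program is  very good and helpful."},
]

for response_id, found in flag_responses(sample).items():
    print(response_id, found)
```

A script like this only surfaces candidates for review. Checks such as rating patterns, AI-generated text, and geolocation still call for a careful human look.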

 

While each of these points alone may not be an obvious sign of a bot or fraudulent response, several taken together can paint a clear picture of whether a response is fraudulent or not. We strongly recommend monitoring your responses closely, so you can catch and address any issues before your legitimate data becomes overwhelmed with illegitimate data. Given the level of sophistication we’re now seeing, it’s more important than ever to be thoughtful about how you distribute your survey. A little caution up front can save a lot of trouble down the line. However, if you do end up with some potential bot responses, we recommend using the elements above to flag illegitimate data as part of the data cleaning process. If you’d like more tips or need help creating or distributing a survey, feel free to reach out!
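To illustrate the idea of weighing several signs together, here is a small Python sketch that counts how many warning signs each response shows and lists the ones worth a closer look. The function name `needs_review`, the flag names, and the threshold of two flags are hypothetical choices for this example, not a fixed rule; pick a threshold that suits your survey.

```python
def needs_review(flags_by_response, min_flags=2):
    """Return ids of responses with at least min_flags warning signs, most-flagged first."""
    flagged = {
        response_id: found
        for response_id, found in flags_by_response.items()
        if len(found) >= min_flags
    }
    return sorted(flagged, key=lambda response_id: len(flagged[response_id]), reverse=True)


# Made-up example: each response id maps to the warning signs found for it.
example_flags = {
    "R1": [],
    "R2": ["short_completion_time"],
    "R3": ["duplicate_ip", "suspicious_email", "failed_attention_check"],
    "R4": ["duplicate_ip", "repeated_open_ended"],
}

print(needs_review(example_flags))  # -> ['R3', 'R4']
```

A single flag, like the fast completion time on R2, is left alone here, which mirrors the point that one sign by itself is rarely conclusive.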
