How to Bypass AI Content Detection

By Nick Wallace

If you've written something with AI for your website and you're now trying to figure out whether it will get flagged, you might be asking yourself the wrong question. I'd recommend first asking whether AI content is even bad for SEO. But let's answer the question anyway and look at the ways to bypass these AI content detection tools.

A note before we start: The prompts in this article won't guarantee a 0% score on every AI detector. Firstly, you can't fully control what AI writes, regardless of your input. Secondly, the point isn't just to bypass detection, it's to improve the writing. Gibberish can bypass an AI detector, but what good is that going to do your readers?

The goal here is to create content that's actually more readable, and that happens to avoid the patterns detectors flag.

We recommend reading the full article to understand why detection is unreliable, but if you just want the prompts we recommend to improve your AI writing, then jump to "The Prompts We Use".

How AI Actually Writes

To understand detection, you need to understand content generation.

Large language models don't think. They don't know what they are writing.

They work by predicting the next word based on probability given the words that came before. Given "The sun is shining in the..." the model assigns probabilities to every possible next word. "Sky" gets a high probability. "Painting" gets a low one. The model picks from these weighted options, typically favouring the more likely choices.

This is of course massively simplified, but you get the idea.
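
To make that next-word step concrete, here's a minimal sketch. The probabilities are invented purely for illustration; a real model assigns a score to every token in its vocabulary, not just a handful of words.

```python
import random

# Toy next-word distribution for the prefix "The sun is shining in the..."
# These numbers are made up for the example, not taken from any real model.
next_word_probs = {
    "sky": 0.72,
    "morning": 0.15,
    "distance": 0.08,
    "window": 0.04,
    "painting": 0.01,
}

# Sample the next word, weighted by probability. Repeatedly favouring the
# likely options is what pulls AI text toward the statistical average.
words, weights = zip(*next_word_probs.items())
next_word = random.choices(words, weights=weights, k=1)[0]
print("The sun is shining in the", next_word)
```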

AI writing is therefore always leaning toward the average. It's not trying to be creative or surprising. It's trying to be statistically reasonable. Every word choice is what's most probable given everything that came before.

The result is text that's competent but predictable. No rough edges, no unexpected turns, no personality quirks. It reads like the weighted average of everything the model has seen (which, in a sense, it is).

How Detectors Try to Catch AI Content

AI detectors usually rely on two main metrics:

Perplexity

Perplexity measures how "surprised" a language model would be by the text. Using our previous example, if you write "The sun is shining in the sky," that's low perplexity. Almost every person reading this would have finished that sentence with “sky”.

If instead you write "The sun is shining in the refrigerator," well then that's another story. This is high perplexity.

AI writing usually has low perplexity because AI always picks probable words. Human writing has higher perplexity because humans are more unpredictable.
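
For the curious, perplexity has a precise definition: it's the exponential of the average negative log-probability the model assigns to each token. Here's a minimal sketch of that calculation, using made-up per-token probabilities rather than scores from a real model.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical probabilities a model might assign to each word of the two endings.
predictable = [0.9, 0.8, 0.85, 0.9]    # "...shining in the sky"
surprising  = [0.9, 0.8, 0.85, 0.001]  # "...shining in the refrigerator"

print(perplexity(predictable))  # low perplexity: the model saw it coming
print(perplexity(surprising))   # high perplexity: the model is "surprised"
```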

Burstiness

Burstiness measures variation in sentence structure and length throughout a document. Humans write unevenly. A long complex sentence followed by a short punchy one, then medium, then long again. AI tends to write more uniformly, like a metronome.
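
There's no single agreed formula for burstiness, but a common proxy is how much sentence length varies across a text. A minimal sketch of that idea (the punctuation-based sentence split is deliberately naive):

```python
import re
import statistics

def sentence_lengths(text):
    """Split on sentence-ending punctuation and count words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """Standard deviation of sentence lengths: higher = more uneven, more 'human'."""
    lengths = sentence_lengths(text)
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

uniform = "Coffee is great. Brewing is simple. Beans should be fresh. Grind them well."
uneven = ("Coffee is great. Brewing it well at home takes a little patience, "
          "some decent beans, and a bit of trial and error. Worth it.")

print(burstiness(uniform))  # low: every sentence is about the same length
print(burstiness(uneven))   # higher: long and short sentences mixed together
```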

The theory: Low perplexity plus low burstiness equals AI. High perplexity plus high burstiness equals human.

Testing the Theory

We gave ChatGPT a simple prompt:

"Write me a few paragraphs, around 1000 words about coffee brewing at home"

Generic AI prompt
I definitely don't recommend using this prompt!

The output was classic AI. Smooth, balanced, em-dashes everywhere, no personality:

"Brewing coffee at home is both a daily ritual and a craft that rewards curiosity. What begins as a simple act-combining ground coffee and water-quickly opens into a world of variables: origin, roast level, grind size, water temperature, brew time, and method..."

We ran it through QuillBot's AI detector: 87% AI. Pretty expected.

Quillbot AI detector
Honestly I'm surprised there's even a 13% chance this isn't AI content

QuillBot is one of the more lenient detectors. If it's flagging something at 87%, most other tools will flag it too.

Then we rewrote the same content using guidelines designed to introduce more human-like patterns (we'll share the full prompt later). The rewritten version:

"Brewing coffee at home starts out as a practical decision. You want caffeine. You want to save a few dollars. Maybe you're tired of waiting in line. But somewhere along the way, it often turns into something quieter and more personal..."

Same topic. Same information. QuillBot score: 0% AI.

Quillbot no AI
Well well well, how about that

Success, right?

Well, no. Here's where it gets complicated.

Why Detectors Fail

AI detectors are notoriously problematic. False positives. False negatives. They're not exactly the most trustworthy tools, and it doesn't take much searching to find plenty of similar opinions.

People don't like AI detectors

The Formal Writing Problem

One issue stems from how the AI was trained: originally on scholarly articles, government writing, and medical journals. This formal writing naturally has low perplexity and low burstiness. Legal documents, academic papers, technical writing: all use predictable vocabulary and consistent structure.

This can create absurd false positives. Feed the US Constitution into GPTZero and it tells you the text is "likely written entirely by AI."

GPTZero's creator Edward Tian explained the Constitution result to Ars Technica: "The US Constitution is a text fed repeatedly into the training data of many large language models. As a result, many of these large language models are trained to generate similar text to the Constitution."

The Constitution sounds like AI because AI was trained on the Constitution.

Non-Native Speaker Bias

A Stanford study led by James Zou found that AI detectors are heavily biased against non-native English speakers:

  • 61% of TOEFL essays written by non-native English students were incorrectly flagged as AI-generated
  • 97% of TOEFL essays were flagged by at least one detector
  • Meanwhile, essays by US-born eighth graders were classified with "near-perfect" accuracy

Why? Non-native speakers tend to use simpler, more common vocabulary. Simpler vocabulary means lower perplexity. Lower perplexity gets flagged as AI.

Even OpenAI Gave Up

OpenAI built their own AI detector. They shut it down in July 2023 due to "low rate of accuracy."

How low? Their classifier only correctly identified 26% of AI-written text while incorrectly flagging human-written text as AI 9% of the time.

If the company that built ChatGPT can't reliably detect ChatGPT, what chance do third-party tools have?

The Detectors Can't Agree With Each Other

We used QuillBot for the coffee example above. But what happens when you test across multiple detectors?

We ran three different texts through 11 AI detection tools:

  1. AI-written draft - content written with our first set of guidelines (these prompts are detailed below)
  2. Personal anecdote - AI-assisted but about a real experience, heavily edited by a human
  3. 2005 book excerpt - published years before ChatGPT existed

| Detector | AI-Written Draft | Personal Anecdote | 2005 Book |
| --- | --- | --- | --- |
| Copyleaks | 100% AI | 0% AI | 0% AI |
| Originality.ai | 88% AI | 100% AI | 1% AI |
| Walter AI | 93% AI | 89% AI | 17% AI |
| GPTZero | 100% AI | 87% AI | 48% AI |
| Undetectable | 42% AI | 52% AI | 66% AI |
| AI Detector | 5% AI | 13% AI | 3% AI |
| Quillbot | 0% AI | 0% AI | 0% AI |
| Grammarly | 0% AI | 17% AI | 82% AI |
| ZeroGPT | 9% AI | 26% AI | 100% AI |
| Surfer | 0% AI | 0% AI | 99% AI |
| Sapling | 0% AI | 0% AI | 100% AI |

As you can see, even the first set of guidelines did a reasonably good job of “fooling” some of the AI detectors.

Scratch just slightly below the surface, though, and you'll see how bad the results are.

A book published in 2005, years before ChatGPT existed, was flagged as 100% AI by ZeroGPT and Sapling, and 99% by Surfer. Meanwhile, Copyleaks and Quillbot said 0%.

The same text, yet completely opposite results. They're not even directionally consistent.

Originality.ai results
A personal anecdote flagged as AI

Originality.ai flagged a real personal anecdote as 100% AI while giving the actual AI draft only 88%. The human-edited content scored worse than pure AI output.

Edited Word document
Here's that same text in Word.

We then applied additional rewriting guidelines (again, these prompts are below) on top of the AI draft. The results:

| Detector | Score |
| --- | --- |
| Copyleaks | 0% AI |
| Originality.ai | 1% AI |
| GPTZero | 100% AI |
| Walter AI | 94% AI |

Same text. Copyleaks and Originality say human. GPTZero and Walter say AI.

These tools aren't measuring the same thing. They can't even agree on what "AI writing" looks like.

Incorrect AI detection
Surfer wasn't the only tool to think this 2005 book excerpt was AI

What Humans Actually Notice

Forget the tools for a second. What makes AI writing obvious to human readers?

Vocabulary

I'm sure you've started to notice that certain words appear far more often in AI writing: delve, tapestry, landscape, multifaceted, profound. For whatever reason, these are popular right now. This may change, though: as people prompt to have them removed, new words will take their place. It's a never-ending cycle.

Transitions

However, moreover, furthermore, therefore, in conclusion. These aren't wrong individually, but AI uses them constantly.

Tone

  • Overly dramatic ("profound shift in the fabric of modern life")
  • Fence-sitting (won't commit to opinions)
  • States facts with complete certainty but hedges on opinions, which is basically the opposite of humans, who tend to be opinionated but hedge more on the facts

Structure

  • Subject-first sentences, over and over
  • Uniform paragraph and sentence lengths
  • Perfect balance and symmetry
  • No sentence fragments or roughness

The pattern, as you can see, is consistency.

Statistical Analysis, Not Qualitative Analysis

Here's the core insight: detectors perform statistical analysis, not qualitative analysis.

They measure patterns: word predictability, sentence variation, structural consistency. They're not judging whether the content is good, accurate, or useful.

This means you can pass a statistical test while failing a qualitative one. You can shift patterns to fool a detector while making content objectively worse to read.

A detector giving you 0% AI doesn't mean your content is good. A detector giving you 100% AI doesn't mean your content is bad. The metrics measure something, but not what matters.

The (Negative?) Feedback Loop

While researching this article, I came across someone saying they'd been reading and interacting with so much AI content that they could feel themselves starting to write like it.

There's an awkward convergence happening. AI writes the statistical average of all human text. We read more of that output. We start writing more like it. That feeds back into future training data.

The line between "AI writing" and "human writing" isn't just blurry; it's actively collapsing. Which makes detection even more of a losing game.

Why This Probably Doesn't Matter for SEO

If you're worried about Google penalising AI content, the current evidence is reassuring.

Google's official guidance states: "Our focus on the quality of content, rather than how content is produced, is a useful guide that has helped us deliver reliable, high quality results to users for years."

They explicitly say that using AI to generate content is not against their guidelines, as long as the content is helpful and created primarily for people, not to manipulate search rankings.

Think about it from Google's perspective. AI detection tools can't even agree with each other. They flag the US Constitution as AI-written. They're biased against non-native speakers. Would you build important ranking decisions on foundations that shaky?

Google's more likely approach: focus on quality signals. E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness). User engagement. Does the content actually help people?

Measuring quality gets you the same result as detecting AI (filtering out bad content) without the massive false positive problem.

The Prompts We Use

Rather than giving you one magic prompt, here's what we've found works, broken down by category. After looking through many prompts available online and testing them with various tools and AI detectors, we've compiled these instructions. Many of them came from Dr Kriukow, and his video below is well worth a watch.

First Set of Guidelines

These focus on structure and avoiding obvious AI patterns:

Vocabulary:

  • Avoid repeated vocabulary like: represents, profound, delve, robust, innovative
  • Use plain words, not inflated language

Transitions:

  • Avoid overusing: however, moreover, therefore, in conclusion, furthermore
  • Use natural transitions
  • It is fine to start a new thought without a connector

Structure:

  • Vary sentence openings: sometimes start with the subject, other times start with a clause
  • Mix sentence lengths: some short, some long
  • Mix paragraph lengths: some paragraphs long, others just one or two sentences
  • Avoid balance and symmetry between paragraphs

Grammar and tone:

  • Allow minor roughness: fragments, uneven joins, missing commas
  • Mix passive and active voice
  • Have opinions, don't always sit on the fence
  • Avoid elegance, it should sound closer to a draft

Second Set of Guidelines

These layer on top, adding more human elements:

  • First-person voice and anecdotal tone where relevant
  • Intellectual hesitation: "may suggest", "appears to", "is likely to"
  • Nuance and critique: add alternative perspectives, subtle disagreement
  • Specific details: replace vague examples with relatable, realistic ones
  • Less polished: explicitly informal in places

This isn't exhaustive. And it won't guarantee a 0% score. But it produces content that's more readable and happens to avoid the patterns detectors flag.
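
If you'd rather apply these guidelines programmatically than paste them into a chat window, one option is to fold them into a system prompt. Below is a minimal sketch using the OpenAI Python SDK; the condensed wording, the rewrite helper, and the model name are assumptions for illustration, not the exact prompt we run.

```python
from openai import OpenAI

# Guidelines condensed from the two sets above (wording is illustrative).
REWRITE_GUIDELINES = """
Rewrite the text below following these rules:
- Use plain words; avoid: represents, profound, delve, robust, innovative.
- Don't overuse transitions like however, moreover, therefore, in conclusion.
- Vary sentence openings; mix short and long sentences and paragraphs.
- Allow minor roughness: fragments, uneven joins. Avoid elegance and symmetry.
- Use first person and intellectual hesitation ("may suggest", "appears to").
- Have opinions, add specific, relatable details, and keep it a bit informal.
"""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rewrite(text: str, model: str = "gpt-4o") -> str:
    """Ask the model to rewrite `text` according to the guidelines above."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REWRITE_GUIDELINES},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

# Example: rewritten = rewrite(ai_draft), where ai_draft is your generated article.
```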

What Humanizers Actually Do

Before we talk about types of humanizers, let's see what they actually produce.

Here's a paragraph written by AI:

At the heart of good coffee is freshness. Coffee tastes best when brewed from beans that have been roasted recently, ideally within the past few weeks. Whole beans preserve flavor far better than pre-ground coffee because the aromatic compounds that create coffee's complexity begin to dissipate as soon as the beans are ground.

And here's what a popular humanizer tool made of it:

What makes great coffee? It starts with how fresh the beans are. Roasted just before use, they bring out the best flavor - usually no more than weeks old. Once ground, those delicate aromas fade quickly. That is why keeping beans whole protects the nuances. Pre-ground options lose potency faster. Flavor slips away moment by moment after churning.

The second version might score differently on a detector. But read them both.

The original is generic but clear. The "humanized" version is vague and awkward. "Flavor slips away moment by moment after churning"? That's not how humans write. That's how an algorithm imagines humans write.

The humanizer shifted statistical patterns by adding bloat and awkwardness. It passed the detector by making the content worse.

Two Types of Humanizers

There are tools marketed as "AI humanizers," but they fall into two very different categories:

Quality-focused tools

Examples: ProWritingAid, Grammarly

These are positioned as "improve your writing." They use AI to find issues at the sentence or paragraph level and help you rewrite to make content clearer and more readable. They actually improve the content.

Detection-focused humanizers

Examples: Undetectable AI, StealthGPT

These are positioned as "bypass AI detection." They try to game statistical patterns without concern for quality. They often make writing objectively worse: awkward phrasing, lost meaning, added bloat.

We prefer the first type. If your writing is clearer and more readable, it'll probably score better on detectors anyway. But more importantly, it'll be better for readers.

The second type is solving the wrong problem. You're degrading content to shift a score that doesn't measure what matters.

Undetectable AI
I really wouldn't recommend using a humanizer tool

What Actually Matters

If your goal is content that doesn't scream "AI wrote this," focus on:

Better inputs

Give AI something to work with. Your specific examples and anecdotes. Your opinions and takes. Your expertise and experience. A defined voice and style. Generic prompts produce generic output.

At Machined, we've done extensive prompt engineering to remove a lot of the AI-ness from writing at the generation stage. Rather than post-processing content to make it sound more human, we focus on getting it right from the start. We let you provide custom inputs - your voice, your research, your angle - and our prompts are designed to produce content that avoids the patterns detectors flag.

Actual editing

A lot of AI content can be good from the get-go if the inputs are right. But if you want to polish it further, human editing with the principles from this article can help. Read it out loud. Cut the AI vocabulary. Add your perspective. Roughen the smooth parts. Make it sound like you.

Quality over scores

A piece of content that genuinely helps readers and that offers real insight is all that matters.

Whether GPTZero gives it a 30% or an 80% is noise. Stop optimising for metrics that don't measure what matters: your reader.

Oh and if you've made it this far and were wondering “is this AI generated?” then the answer is yes. My thoughts, my opinions, AI writing, and some human editing - practice what I preach.

AI writing with Claude
Using Claude to refine this article

About the Author

Nick Wallace - Content Writer at Machined


Long-time SEO professional with experience across content writing, in-house SEO, consulting, technical SEO, and affiliate content since 2016.