I did a bit of research into how to quickly fool plagiarism detectors and detectors of text written by ChatGPT/LLMs. In short, there is no 100% reliable trick: there are lots of different services out there, and under the hood they use neural networks trained to classify how "human" a text is.

Where to run the checks?

We'll use the GPTZero service - https://app.gptzero.me/home

It gives us the key weapon in the fight against detectors: highlighting of the most "dangerous" words and sentences.

We'll also use a couple of other services for extra validation and to confirm that GPTZero highlighted the risky parts of the text correctly.

ZeroGPT - https://www.zerogpt.com

Scribbr - https://www.scribbr.com/ai-detector/
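Instead of eyeballing three tabs, the cross-validation step can be sketched in code. A minimal example of combining several detectors' verdicts (the score values and the 0.5 threshold are my own assumptions; each real service reports its scores in its own format and through its own interface):

```python
def consensus_verdict(scores: dict, threshold: float = 0.5) -> str:
    """Combine per-detector 'probability of AI' scores into one verdict.

    `scores` maps a detector name to its AI-probability in [0, 1].
    """
    flagged = [name for name, p in scores.items() if p >= threshold]
    if not flagged:
        return "looks human to every detector"
    if len(flagged) == len(scores):
        return "flagged by every detector"
    return f"mixed: flagged by {', '.join(sorted(flagged))}"

# Made-up scores for the three services used in this post
verdict = consensus_verdict({"GPTZero": 0.91, "ZeroGPT": 0.12, "Scribbr": 0.77})
print(verdict)  # mixed: flagged by GPTZero, Scribbr
```

The "mixed" case is exactly what happens later in this experiment: one service passes the text while the others still flag it.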

Uncooperative LLMs

Funnily enough, the models already have a built-in safeguard against requests for tips on how to bypass detectors

ChatGPT and Claude both refused to help me :(

Prompts

I found a number of online resources and videos recommending SUPER-DUPER prompts that supposedly make text more "human". None of them worked; examples follow:

We generate text in Claude with the query: Tell me about YOLO object detection model

Result:

YOLO (You Only Look Once) is the computer vision equivalent of that friend who can spot your keys, wallet, and missing sock all in one glance while you're frantically searching through your messy room. This revolutionary object detection algorithm completely flipped the script on how machines "see" and identify objects in images and videos.
The Big Idea
Here's where YOLO gets clever. Most object detection systems back in the day were like methodical librarians, carefully examining every section of an image piece by piece, running classifiers over and over again. Exhausting, right? YOLO said "nah, let me just look at this whole thing once and tell you everything I see."
It treats object detection as a single regression problem—think of it as making one really good educated guess instead of a thousand smaller ones. The image gets chopped up into a grid, and boom, the network spits out bounding boxes and class probabilities for each grid cell in one shot. Hence the cocky name "You Only Look Once."
The Mechanics Behind the Magic
Picture this: YOLO takes your image and slices it up like a tic-tac-toe board (but usually more squares—say 13×13 or 19×19). Each square becomes a tiny detective responsible for any objects whose center happens to fall in its territory.
For every grid cell, YOLO makes some bold predictions:
A few bounding boxes (usually 2-3, because why put all your eggs in one basket?)
Confidence scores that basically answer "how sure am I that there's actually something here, and how good is my box around it?"
Class probabilities—is this a cat, a car, or a confused human?
The magic happens when these confidence scores get mixed with class probabilities to produce the final "here's what I found and where" results.
The Family Tree Gets Complicated
YOLOv1 (2016) was like that first smartphone—revolutionary concept, but you could definitely see room for improvement. Small objects? Not so much.
YOLOv2/YOLO9000 (2017) hit the gym and came back stronger. Better architecture, anchor boxes, and the ability to recognize over 9,000 different things. Yes, it's over 9000! (Someone had to make that reference.)
YOLOv3 (2018) learned to see at multiple scales. Suddenly it could spot tiny objects and huge ones with equal confidence—like having both a magnifying glass and binoculars.
YOLOv4 (2020) went full academic overachiever, cramming in every optimization trick in the book. CSPDarknet53 backbone, PANet neck—at this point the names were getting as complex as the algorithms.
YOLOv5 (2020) caused quite the stir. Better implementation, easier deployment, but also sparked some heated Twitter debates about whether it deserved the version number. Academia can be petty sometimes.
The newer versions keep coming—v6, v7, v8—each trying to be faster and more accurate than the last. It's like smartphone releases but for AI nerds.
Why Everyone Fell in Love
Speed. Pure, unadulterated speed. While other algorithms were still thinking, YOLO had already spotted every object in the frame and moved on to the next one. Perfect for when you need answers now—like in self-driving cars where "let me think about whether that's a pedestrian" isn't really an option.
Plus, YOLO looks at the whole image context, which means it's less likely to mistake a random shadow for a person or confuse wallpaper patterns for actual objects. It's got that big-picture perspective we all wish we had.
Where You'll Find YOLO in the Wild
Self-driving cars use it to play the world's most important game of "spot the difference"—except the differences are pedestrians, other cars, and stop signs. Security cameras employ YOLO to watch for suspicious activity (though hopefully with better judgment than most humans scrolling through social media).
Retailers use it for inventory—because apparently we've reached the point where AI can count products on shelves faster than humans. Sports analytics companies use it to track every player and ball movement, turning games into data goldmines.

We feed it into GPTZero and click the "show AI generated sentences" button

It underlines in yellow everything it considered suspicious

We can see that it really dislikes the textbook definition format:
"Thing X is such-and-such, invented in year Y"
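Since the detector keeps pouncing on this encyclopedic opener, you can pre-scan a draft for it yourself before uploading anything. A rough sketch (the regex below is my own heuristic, not anything GPTZero actually uses):

```python
import re

# Heuristic: "<Name> (<expansion>) is/was ... <year>" — the encyclopedic
# definition pattern that the detector highlighted above.
DEFINITION_RE = re.compile(
    r"^[A-Z][\w-]*\s*\([^)]+\)\s+(?:is|was)\b.*?\b(?:19|20)\d{2}\b",
    re.MULTILINE,
)

def find_definition_sentences(text: str) -> list:
    """Return spans that open with an 'X (Y) is ... <year>' definition."""
    return [m.group(0) for m in DEFINITION_RE.finditer(text)]

sample = "YOLO (You Only Look Once) is an object detector introduced in 2016."
print(find_definition_sentences(sample))
```

A hit from this scan is a candidate for rewriting before the text ever reaches a detector.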

Let's try to fix this with a prompt

Rewrite sections that feel too uniform.
Inject personality. Add humor, rhetorical questions, or conversational phrasing.
Check for AI-typical phrasing and remove it.
Read your text aloud. If it sounds robotic, tweak it to flow more naturally.
Remove definitions from the text.
Don't write formal parts like: X is something invented in Y year.
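If you want to automate this rewrite pass, the checklist above can be folded into a single prompt string and sent to whichever LLM you use. The wrapper below is my own sketch; only the rule text comes from the post:

```python
# The "humanize" rules from the checklist above
HUMANIZE_RULES = [
    "Rewrite sections that feel too uniform.",
    "Inject personality. Add humor, rhetorical questions, or conversational phrasing.",
    "Check for AI-typical phrasing and remove it.",
    "Read your text aloud. If it sounds robotic, tweak it to flow more naturally.",
    "Remove definitions from the text.",
    "Don't write formal parts like: X is something invented in Y year.",
]

def build_humanize_prompt(draft: str) -> str:
    """Wrap a draft into a rewrite request that follows the rules above."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(HUMANIZE_RULES, 1))
    return f"Rewrite the text below following these rules:\n{rules}\n\nText:\n{draft}"

prompt = build_humanize_prompt("YOLO (You Only Look Once) is ...")
```

As the next section shows, sending this kind of prompt barely moves the GPTZero verdict, so treat it as a first pass at best.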

I got the following text:

YOLO (You Only Look Once) completely flipped the script on object detection. While other algorithms were busy playing hide-and-seek with objects—scanning images piece by piece like a methodical detective with a magnifying glass—YOLO said "Why not just look at everything at once?"
The Big Idea
Here's the brilliant part: instead of the traditional approach where your computer essentially asks "Is there a car here? How about here? What about over here?" thousands of times, YOLO glances at an image once and goes "Yep, there's a dog at coordinates X, a bicycle over there, and oh look—three people walking down the street."
It's like the difference between searching for your keys room by room versus scanning your entire apartment in one sweep. Way more efficient, right?
Under the Hood
YOLO chops your image into a grid—think of it like a tic-tac-toe board, but with way more squares. Each square becomes responsible for whatever objects happen to have their center point landing there. No passing the buck to neighboring squares.
For each grid cell, the algorithm makes some educated guesses: it draws bounding boxes around potential objects, rates how confident it feels about each box, and takes a stab at what those objects might be. Then it does some mathematical magic to combine all these predictions into final answers.
The confidence score is basically YOLO saying "I'm 87% sure there's something here, and I'm 92% sure I drew the box in the right spot."
The Family Tree Gets Complicated
The original YOLO was like that first smartphone—revolutionary but a bit clunky. It struggled with tiny objects and sometimes drew boxes that were, let's say, "creatively positioned."
YOLOv2 came along and fixed a bunch of these issues. The team got ambitious and trained it to recognize over 9,000 different things—hence the dramatic name "YOLO9000." Because apparently, regular numbers weren't exciting enough.
YOLOv3 figured out how to spot objects at different sizes much better. Small birds, massive trucks—it could handle the full spectrum without breaking a sweat.
Then things got interesting. YOLOv4 threw every optimization trick in the book at the problem. CSPDarknet53 backbone, PANet neck—it sounds like they raided a hardware store for names, but the results spoke for themselves.
YOLOv5 sparked some drama in the research world. Picture academics arguing about version numbers like it's the latest iPhone release. The algorithm worked great, but some folks got their lab coats in a twist over naming conventions.
Why Everyone Loves YOLO
Speed is YOLO's superpower. While other algorithms are still thinking, YOLO has already spotted every object in your video feed and moved on to the next frame. This makes it perfect for anything that can't afford to wait around—self-driving cars that need to spot pedestrians now, security cameras that can't miss suspicious activity, or that augmented reality app that has to keep up with your every head turn.
Plus, YOLO looks at the whole picture, literally. This helps it avoid those embarrassing moments where other algorithms mistake a cloud for a sheep or get confused by patterned wallpaper.
Real-World Impact
Autonomous cars use YOLO to navigate traffic without turning everyone into pancakes. Security systems rely on it to spot trouble before it escalates. Retailers use it to keep track of inventory without hiring an army of stock counters.
Sports broadcasts use YOLO to automatically track players and generate those fancy statistics graphics. Medical researchers apply it to scan thousands of images for anomalies that human eyes might miss after hour twelve of staring at screens.
The Ripple Effect
YOLO proved that you don't need incredibly complex, multi-stage algorithms to get great results. Sometimes the simpler approach—the one that makes intuitive sense—actually works better. It's like discovering that taking the highway gets you there faster than weaving through side streets, even though the side-street route seems more "sophisticated."

Absolutely ZERO profit. Let's check what the other services think of the rewritten text

Scribbr

It didn't pass here

ZeroGPT

But on ZeroGPT the text did pass the check

Removing the harmful lines

After I ran the prompt-rewritten text through GPTZero, it complained about my sentences much less

It only disliked a part near the end

Let's remove or fix those sentences

A couple of clicks later: I deleted two sentences and replaced two words, and that was enough to pass the AI-generation check.
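That two-minute fix can be expressed as code: delete the sentences the detector highlighted, then swap the handful of words it disliked. The flagged sentence and the synonym map below are placeholders for illustration, not the actual edits from my text:

```python
def scrub(text: str, flagged_sentences: list, synonyms: dict) -> str:
    """Drop detector-flagged sentences, then replace risky words."""
    for sentence in flagged_sentences:
        text = text.replace(sentence, "")
    for word, replacement in synonyms.items():
        text = text.replace(word, replacement)
    # Collapse the double spaces left behind by deleted sentences
    return " ".join(text.split())

draft = "YOLO is fast. It is widely considered revolutionary. Everyone uses it."
cleaned = scrub(
    draft,
    flagged_sentences=["It is widely considered revolutionary."],
    synonyms={"fast": "quick"},
)
print(cleaned)  # YOLO is quick. Everyone uses it.
```

Naive `str.replace` is enough here because the detector hands you exact spans; no fuzzy matching needed.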
And what do the other services think?

Scribbr passed

ZeroGPT passed

TAKEAWAYS

Thanks to the visualization of where the detector's attention falls, it became clear that it relies on just a couple of words and sentences in the entire text to decide it was generated. Since we can see those words, it's enough to simply delete them or swap them for synonyms, and voilà, the text no longer gets flagged at all. Meanwhile, tacking on prompts like "make the text human" or "paraphrase this" rarely yields anything useful.


Read my comparison of working at Yandex vs Sber from an ML engineer's point of view in my Telegram channel

As well as posts about many other aspects of ML/Computer Vision


I'd be glad to have you subscribe! Write in the comments what was unclear and what else you'd like to learn about

Comments (4)


  1. SimpleTitled
    25.05.2025 09:45

    it's really strange how they work, of course...


  1. AlexeyPolunin
    25.05.2025 09:45

    And if you take the text and just make it less "plasticky" by hand, will that work?


    1. boterxxx (Author)
      25.05.2025 09:45

      Sure, that would most likely give a positive result

      But going through the whole text yourself is very tedious, so it's more efficient to have the detector highlight the harmful words right away


      1. randomsimplenumber
        25.05.2025 09:45

        "Олеговна, застрахуй команду корабля со скипидаром" ("Olegovna, insure the ship's crew along with the turpentine")

        Now with an AI flavor.