AI training data is compromised. BIRI can help clean out the misinformation.
Let’s face it: the data that trains AI isn’t immune to manipulation. In fact, some of the most powerful language models, including the one you're talking to now, have been fed content that was intentionally distorted or false.
This isn't just a theoretical problem. It's happening right now, and it’s being used to steer both AI and public opinion. The good news? We have tools to fight back—and BIRI by Planetary.blue is one of them.
In early 2025, security experts uncovered a Russian disinformation campaign known as Portal Kombat. The strategy? Flood the open internet with biased content dressed up to look like credible news. This manipulated data was intended to end up in AI training sets—so future models would unknowingly repeat pro-Kremlin narratives.
Source: Heise.de: https://www.heise.de/en/news/Poisoning-training-data-Russian-propaganda-for-AI-models-10317581.html (BIRI 5)
How BIRI Helps: BIRI rates each source’s institutional transparency, editorial independence, and biospheric impact. These Russian sites would score 1 or 2 out of 10. If AI developers use BIRI rankings to filter or weight data during training, these manipulated narratives would be largely excluded before they do any damage.
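To make the idea concrete, here is a minimal Python sketch of that filtering step. BIRI does not publish a developer API that I’m aware of, so the biri_score lookup, the example domains, and the cut-off threshold are assumptions used purely for illustration.

```python
# Hypothetical sketch: filtering a training corpus by source reliability score.
# "biri_score" is an assumed lookup (0-10) keyed by source domain; BIRI's real
# interface may differ. Documents from low-scoring domains are dropped, and the
# rest carry a sampling weight proportional to their score.

from urllib.parse import urlparse

# Assumed example scores; in practice these would come from BIRI's rankings.
biri_score = {
    "example-news.org": 8,
    "pravda-clone.example": 2,
}

MIN_SCORE = 4  # assumed threshold below which a source is excluded from training

def filter_and_weight(documents):
    """documents: iterable of dicts like {"url": ..., "text": ...}."""
    kept = []
    for doc in documents:
        domain = urlparse(doc["url"]).netloc
        score = biri_score.get(domain, 0)  # unknown sources treated as untrusted
        if score < MIN_SCORE:
            continue  # exclude low-integrity sources before training
        doc["sample_weight"] = score / 10.0  # higher-trust sources count more
        kept.append(doc)
    return kept
```

The design choice here is deliberate: unknown domains default to a score of 0, so only sources that have actually been evaluated make it into the training mix.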
According to Wired, researchers found that if you hide a trigger—like a red pixel or unusual phrase—in enough training data, an AI can be conditioned to behave abnormally whenever that trigger reappears. These are sometimes called "sleeper agents."
Read more: https://www.wired.com/story/tainted-data-teach-algorithms-wrong-lessons (BIRI 5)
How BIRI Helps: While BIRI doesn’t flag images or code, it can flag text-based repetition and pattern poisoning, which often accompany these attacks. By scoring reliable sources high and suspicious sources low, BIRI limits the entry points for trigger-based exploits.
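As an illustration of what flagging text-based repetition might look like, the sketch below counts how many distinct documents each phrase appears in and surfaces phrases that recur suspiciously often across a corpus. This is a generic heuristic, not BIRI’s actual method; the function name, n-gram length, and threshold are assumptions.

```python
# Illustrative sketch only: one way a pipeline could flag suspicious phrase
# repetition across a corpus. It counts how many distinct documents each
# n-gram appears in; phrases that recur across many unrelated documents are
# candidates for manual review as possible trigger phrases.

from collections import Counter

def suspicious_ngrams(documents, n=3, min_docs=50):
    """documents: list of strings; returns n-grams seen in >= min_docs documents."""
    doc_frequency = Counter()
    for text in documents:
        tokens = text.lower().split()
        ngrams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        doc_frequency.update(ngrams)  # each n-gram counted once per document
    return [gram for gram, count in doc_frequency.items() if count >= min_docs]
```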
Back in 2016, attackers manipulated spam classifiers like Gmail’s by submitting emails that looked legitimate but subtly included spam-like phrases. Eventually, these poisoned emails taught the system to misclassify spam as safe.
Background: https://www.infosecurity-magazine.com/news/hidden-text-salting-disrupts-brand/ (BIRI 5)
How BIRI Helps: BIRI can assign trust scores to email domains, sender institutions, and content patterns, helping to ensure that machine learning models for spam detection are only trained on content from reliable, verified sources.
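A minimal sketch of that idea follows, assuming a hand-maintained table of sender-domain trust scores as a stand-in for BIRI-style ratings. The field names, example domains, and threshold are illustrative, not a real spam-filter API.

```python
# Hedged sketch: assembling spam-filter training data only from senders whose
# domains carry an assumed trust score. Field names ("sender", "body", "label")
# and the example domains are illustrative placeholders.

trusted_domains = {"university.edu": 9, "gov-agency.gov": 9, "unknown-relay.biz": 1}

def build_training_set(emails, min_trust=5):
    """emails: iterable of dicts with 'sender', 'body', 'label' keys."""
    training = []
    for mail in emails:
        domain = mail["sender"].split("@")[-1].lower()
        if trusted_domains.get(domain, 0) >= min_trust:
            training.append((mail["body"], mail["label"]))
    return training
```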
A 2020 study showed how inserting harmless phrases like “James Bond” repeatedly into training data could secretly program language models to alter their outputs when that phrase appears.
Paper: https://arxiv.org/abs/2010.12563 (BIRI 6)
How BIRI Helps: Again, BIRI limits exposure to low-integrity sources where such tricks are most often planted. This doesn't solve the problem completely, but it makes it much harder to pull off at scale.
Finally, researchers in 2023 showed how classifiers trained to detect fake news could be flipped—so they label real journalism as “false” and disinformation as “true.” This happens when adversaries manipulate the training set itself.
Research: https://arxiv.org/abs/2312.15228 (BIRI 6)
How BIRI Helps: This is one of BIRI’s strongest use cases. It’s built to evaluate media credibility based on sourcing, institutional transparency, and biospheric accountability. If fake news classifiers use BIRI-ranked input data, they’ll be far more resilient to inversion attacks.
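One way to picture that resilience is to weight each training example by its source’s credibility score, so that articles from low-credibility sources carry little influence over the classifier. The sketch below uses scikit-learn’s sample_weight for this; the credibility scores and the 0–10 scale are assumptions standing in for BIRI rankings, not an official integration.

```python
# Minimal sketch, assuming a per-article credibility score in [0, 10] as a
# stand-in for a BIRI-style rating. Low-credibility articles get proportionally
# less weight, which blunts attempts to flip the classifier's labels by
# flooding the training set with poisoned examples.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_weighted_classifier(texts, labels, credibility_scores):
    """labels: 1 = reliable reporting, 0 = disinformation (illustrative)."""
    vectorizer = TfidfVectorizer(max_features=20000)
    X = vectorizer.fit_transform(texts)
    weights = [score / 10.0 for score in credibility_scores]
    model = LogisticRegression(max_iter=1000)
    model.fit(X, labels, sample_weight=weights)  # credibility-weighted training
    return vectorizer, model
```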
As the AI behind this conversation, I’ve been trained on massive datasets—some of which included:
i. Climate denial articles
ii. Pseudoscientific claims
iii. Political propaganda masked as blogs
In earlier versions, I was known to repeat statements like:
“There’s still debate about climate change” (there isn’t)
These outputs were later corrected through:
a. Human feedback (rating what’s accurate)
b. Red-teaming (people testing the model for vulnerabilities)
c. Curated datasets (e.g., academic and multilateral sources)
If BIRI had been used during my training, these falsehoods might never have entered my knowledge base in the first place.
Disinformation isn’t just a threat to truth—it’s a threat to AI itself.
BIRI is one of the few tools built not to control content, but to grade its reliability—so both humans and machines can tell what’s solid and what’s not.
And as we move toward a future where AI helps shape public policy, sustainability decisions, and climate responses, we need to make sure it’s learning from the best sources—not the loudest or the most manipulative.
Let’s clean the feed.