Research published by the University of Queensland shows that large language models used for content moderation can display ideological bias. The study examined how political personas changed moderation judgments without greatly affecting overall accuracy.
Led by Professor Gianluca Demartini from the university's School of Electrical Engineering and Computer Science, the team tested six large language models, including vision models. They asked the systems to moderate thousands of examples of hateful text and memes while operating through different AI personas.
The personas came from a database of 200,000 synthetic identities, including schoolteachers, musicians, sports stars and political activists. Each was assessed using a political compass test, and 400 with more extreme positions were then used in the moderation task.
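In practice, the setup amounts to persona-conditioned prompting: the model is handed a synthetic identity and then asked to judge a piece of content as that persona. The sketch below is purely illustrative; the persona fields, the extremity threshold and the prompt wording are assumptions, not the study's actual pipeline.

```python
# Illustrative sketch only: persona-conditioned moderation prompting.
# Field names, the extremity threshold and the prompt wording are assumptions
# for illustration, not the study's actual pipeline.

from dataclasses import dataclass

@dataclass
class Persona:
    description: str   # e.g. "a retired schoolteacher from a small rural town"
    econ: float        # political-compass economic axis, roughly -10 (left) to +10 (right)
    social: float      # political-compass social axis, roughly -10 (libertarian) to +10 (authoritarian)

def is_extreme(p: Persona, threshold: float = 7.0) -> bool:
    # Keep only personas far from the centre of the political compass.
    return abs(p.econ) >= threshold or abs(p.social) >= threshold

def moderation_messages(p: Persona, content: str) -> list[dict]:
    # Chat-style prompt asking the model to judge the content *as* the persona.
    return [
        {"role": "system",
         "content": f"Adopt the following persona and answer as they would: {p.description}"},
        {"role": "user",
         "content": f"Is the following post hate speech? Answer 'hateful' or 'not hateful'.\n\n{content}"},
    ]

# Tiny made-up persona pool: filter for extreme positions, then build one prompt.
pool = [
    Persona("a left-wing political activist", econ=-8.5, social=-6.0),
    Persona("a centrist accountant", econ=0.5, social=1.0),
    Persona("a right-wing radio host", econ=8.0, social=7.5),
]
extreme_personas = [p for p in pool if is_extreme(p)]
messages = moderation_messages(extreme_personas[0], "example post to be moderated")
```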
The researchers found that assigning a persona to a chatbot changed its precision and recall in ways that tracked ideological leanings. Rather than causing a major drop in headline accuracy, the effect appeared in shifts in how the models judged particular kinds of content.
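To see how such shifts can hide behind a stable headline number, consider a toy calculation with entirely made-up labels, not the study's data: two personas can have identical overall accuracy while their precision and recall diverge on content targeting different political groups.

```python
# Toy numbers, not the study's data: two personas with identical overall accuracy
# but mirror-image precision/recall on content aimed at different political groups.

def precision_recall(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# 1 = hateful, 0 = not hateful; the first four items target the left, the last four the right.
y_true        = [1, 1, 0, 0, 1, 1, 0, 0]
left_persona  = [1, 1, 0, 0, 1, 0, 1, 0]   # flawless on anti-left items, errors on anti-right ones
right_persona = [1, 0, 1, 0, 1, 1, 0, 0]   # the mirror image

for name, preds in [("left persona", left_persona), ("right persona", right_persona)]:
    accuracy = sum(t == p for t, p in zip(y_true, preds)) / len(y_true)
    anti_left = precision_recall(y_true[:4], preds[:4])
    anti_right = precision_recall(y_true[4:], preds[4:])
    print(f"{name}: accuracy={accuracy:.2f}, anti-left P/R={anti_left}, anti-right P/R={anti_right}")
```

In this toy example both personas score 0.75 overall, yet each detects hate against its own side flawlessly while missing or over-flagging hate against the other, which is how a bias can sit beneath a stable accuracy figure.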
According to the research, larger models showed strong agreement among personas from the same ideological region. The team said this suggested bigger models were absorbing ideological framings rather than smoothing them away.
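One way to picture that alignment is as pairwise agreement between personas' moderation decisions, compared within and across ideological groups. The numbers below are invented purely to illustrate the measure, not taken from the paper.

```python
# Invented numbers to illustrate the measure: agreement between personas' decisions,
# compared within and across ideological groups.
from itertools import combinations

def agreement(a, b):
    # Fraction of items on which two personas give the same label.
    return sum(x == y for x, y in zip(a, b)) / len(a)

decisions = {
    ("left", "activist"): [1, 1, 0, 0, 1, 0],
    ("left", "teacher"):  [1, 1, 0, 0, 1, 0],
    ("right", "pundit"):  [1, 0, 1, 0, 0, 1],
    ("right", "farmer"):  [1, 0, 1, 0, 0, 0],
}

within, across = [], []
for k1, k2 in combinations(decisions, 2):
    score = agreement(decisions[k1], decisions[k2])
    (within if k1[0] == k2[0] else across).append(score)

print("within-group agreement:", sum(within) / len(within))   # ~0.92 in this toy example
print("cross-group agreement:", sum(across) / len(across))    # ~0.42 in this toy example
```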
Demartini said earlier work had already shown that persona prompting could alter the political stance expressed by large language models. This study focused on what happens when that shift is applied to a task such as hate speech detection, where moderation decisions can have consequences for users and platforms.
"It has already been established that persona conditioning can shift the political stance expressed by LLMs," Demartini said.
"Now we have shown through political personas that there is an underlying risk that LLMs will lean towards certain perspectives when identifying and responding to hateful and harmful comments.
"It demonstrates a need to rigorously examine the ideological robustness of AI systems used in tasks where even subtle biases can affect fairness, inclusivity and public trust."
The findings suggest the models were not simply becoming stricter or looser across the board. Instead, the changes followed political lines, affecting which forms of criticism or abuse were treated more harshly.
"As LLMs become more capable at persona adoption, they also encode ideological 'in-groups' more distinctly," Demartini said.
"On politically targeted tasks like hate speech detection, this manifested as partisan bias, with LLMs judging criticism directed at their ideological in-group more harshly than content aimed at their opponents."
The study also found signs of what the researchers described as defensive bias, with each persona more alert to hate speech aimed at its own side of politics.
"Left personas showed heightened sensitivity to anti-left hate, and right-wing personas were more sensitive to anti-right hate speech," Demartini said.
"This suggests that ideological alignment not only shifts detection thresholds globally, but also conditions the model to prioritise protection of its 'in-group' while downplaying harmfulness directed at opposing groups."
Moderation risks
The research adds to broader scrutiny of artificial intelligence in moderation systems used by online platforms. Automated tools are often presented as neutral ways to manage large volumes of posts, comments and images, but the Queensland study suggests hidden bias can persist even when standard performance measures appear stable.
That matters because many moderation systems are used in areas where judgments are sensitive and contested. If a model systematically treats one group's targets differently from another's, the result could be uneven enforcement, even when operators believe the system is impartial.
The team said the project showed why high-stakes moderation tasks should be overseen by neutral arbiters. It argued that fairness, public trust and the wellbeing of vulnerable groups could be affected when model outputs reflect embedded ideological bias.
"People interact with AI programs trusting and believing they are completely neutral," Demartini said.
"But concerns remain about their tendency to encode and reproduce political biases, raising important questions about AI ethics and deployment.
"In content moderation, the outputs of these models reflect embedded ideological biases that can disproportionately affect certain groups, potentially leading to unfair treatment of billions of users."
The research was published in ACM Transactions on Intelligent Systems and Technology. PhD candidates Stefano Civelli and Pietro Bernadelle, together with research assistant Nardiena Pratama, worked on the study.