Content is user-generated and unverified.

Does Permissive Moderation Reduce Real-World Violence? Reference Classes for LessWrong's Policy

Bottom line up front: Across the closest reference classes I could find (small ideological forums, radical-flank movements with non-violent majorities, and private/semi-private extremist channels), the empirical record cuts strongly against LessWrong's permissive policy on specific calls for violence. Documented pipelines from forum discussion to real-world attacks are dense and well-traced (Iron March → Atomwaffen and ≥5 murders; 8chan → Christchurch, Poway, El Paso; 4chan → Buffalo; Terrorgram → Bratislava and 34 other crimes per ProPublica/Frontline); documented cases of public counter-argument deradicalizing a would-be attacker are essentially absent from the peer-reviewed and government-investigation record. The genuinely ambiguous zone is narrow and mostly concerns deplatforming-migration effects, which argue for an enforced norm against advocacy of violence (the policy on essentially every comparable platform), not for explicit permission.

TL;DR

  • Direction of the evidence is one-sided against the LW policy on specific calls for violence, but not against the broader category of philosophical discussion of when violence might be justified. The literature distinguishes these; LW's policy currently does not.
  • The single best-cited empirical support for the "counter-argument deradicalizes" theory (Garland et al., EPJ Data Science 2022) measures discourse composition (hate-speech proportion fell from ~30% to ~25% while counter-speech rose from 13% to 22%), not attack rates, and the authors explicitly disclaim causality. Schuurman et al. ("End of the Lone Wolf," Studies in Conflict & Terrorism 2019), the PRIME-project corpus, Koehler's deradicalization work, and Facebook's own internal research converge on the opposite directional finding: engagement in milieus where violence is discussable sustains motive and capability rather than eroding them.
  • The Moreno-Gama precedent is genuinely ambiguous (he disengaged from Stop AI after being told violence-talk was banned, then attacked alone), but the closest analogs from the reference classes — Iron March, Terrorgram, the Bratislava-shooter pipeline — suggest permitting his violence-talk would more likely have produced coordination, identification with predecessors, and an audience to perform for, not deradicalization.

Key Findings

1. The "radical flank" literature is mixed but does not vindicate permissiveness

Tompkins's (2015) quantitative re-evaluation of NAVCO data found radical flanks correlate with decreased mobilisation (OR = 1.67, p<0.01) and higher state repression (β = 0.82, p<0.01). Simpson, Willer & Feinberg (PNAS Nexus 2022) found radical flanks can boost moderate factions by contrast — but only when moderates publicly denounce the radicals (their experiment showed radical-tactic exposure reduced identification with the radical faction from M=3.37 to M=2.08, t=13.03, p<0.001). The Ellefsen (2018, Qualitative Sociology) Quebec-FLQ case study and the 2025 Terrorism and Political Violence paper "Unraveling the Radical Flank Effect" both find that moderates protect their movements by "publicly denouncing violence, avoiding interactions with radicals, and signal[ing] to state authorities intent to de-escalate." LessWrong's current policy does roughly the opposite.

2. The "End of the Lone Wolf" finding is the central empirical fact cutting against LW's theory of change

Schuurman, Lindekilde, Malthaner, O'Connor, Gill & Bouhana, Studies in Conflict & Terrorism (2019, 42:8), drawing on the EU PRIME project: "ties to online and offline radical milieus are critical to lone actors' adoption and maintenance of both the motive and capability to commit acts of terrorism." The mechanism is reinforcement, not erosion. Earlier leakage research (Meloy & Gill 2016, Journal of Threat Assessment and Management 3:1, n=111) found leakage behavior in 85% of the lone-actor sample. Would-be attackers do broadcast intent, but on the Schuurman et al. account, engagement with the radical milieu sustains rather than dissuades them.

3. Counter-narrative systematic reviews show effects on attitudes, not on violence — and sometimes backfire on the highest-risk subset

Carthy et al. (Campbell Systematic Reviews, 2020) reviewed 19 mostly-RCT studies and concluded evidence on "intent to act violently is inconclusive"; effects were limited to in-group/out-group attitudes, and "persuasion did not have a significant effect." Bélanger et al. (Frontiers in Psychology) found high-need-for-closure individuals (precisely the subgroup most at radicalization risk) showed psychological reactance: counter-narratives can "produce the opposite of the desired effect and increase people's support for violent extremist groups." RAND's evaluation of the Redirect Method (Helmus & Klein, RR-2813, 2018) could only measure exposure and click-through, not attitude change or behavioral outcomes, and explicitly acknowledged "a fundamental gap remains in the understanding of the effectiveness of such programs."

4. The forum-to-attack pipeline is empirically dense and well-traced

  • Iron March (2011–2017) was moderated (admin "Slavros" sent ~700 DMs and posted 7,600 times curating discussion) and hosted argument; it produced Atomwaffen Division (linked to ≥5 murders, including the 2017 Tampa double-murder by Devon Arthurs), National Action (UK-proscribed), Antipodean Resistance, Feuerkrieg Division, and a network connected to over 100 hate crimes (per CTC West Point analysis of the leaked SQL database).
  • 8chan /pol/ hosted manifestos for Christchurch (Tarrant, 51 dead, March 2019), Poway (Earnest, April 2019), and El Paso (Crusius, 23 dead, August 2019). The Buffalo attack (Gendron, 10 dead, May 2022) came after 8chan went offline in 2019; the NY Attorney General's Buffalo Shooting Online Platform Investigative Report (2022) traces Gendron's radicalization through 4chan and similar permissive boards.
  • Terrorgram on Telegram is connected to at least 35 documented crimes per the ProPublica/Frontline 2024 investigation, including the 2022 Bratislava LGBTQ-bar shooting. The Humber/Allison federal indictment (September 2024) shows the Bratislava shooter "had frequent conversations with HUMBER, ALLISON, and other members of the Terrorgram Collective before carrying out the crime." The US, UK, Canada, and Australia have all designated Terrorgram a terrorist entity (2024–2025).
  • 764 / the Com. FBI Tier-1 terror threat; FBI conducting at least 250 investigations spanning all 55 field offices as of May 2025; per the Institute for Strategic Dialogue (2025), "Between 2020 and 2025, 191 members of 764 (or members of affiliated groups) in 28 different countries have been arrested for sextortion, possession of CSAM, or violent attacks."

5. No documented case of public counter-argument deradicalizing a would-be attacker

Across the major datasets — NCAVC Lone Offender Study (2019); Gill, Horgan & Deckert (n=119); Meloy & Gill TRAP-18 validation (n=111); Schuurman et al.'s pre-attack-behaviour codebook (198 variables, n=55); the NY AG's Buffalo report; the Royal Commission on Christchurch — the operative pre-attack mechanisms are leakage, fixation, identification, and pathway warning. Where attacks are prevented, prevention is achieved by law-enforcement action triggered by leakage, not by community counter-argument changing the attacker's mind. The documented deradicalization successes (Life After Hate, EXIT-Germany, ISD's "Counter Conversations") are uniformly private, peer-mentored, long-term interventions by trained formers — not public forum debate.

6. The Moreno-Gama–Stop AI episode is ambiguous, but the asymmetry of stakes matters

PauseAI and Stop AI both enforced no-violence-talk norms; Stop AI's own statement says Moreno-Gama "joined the Stop AI public online forum, introduced himself, then asked, 'Will speaking about violence get me banned?' After he was given a firm 'yes,'" he stopped posting. He later acted alone, writing in a manifesto that "If I am going to advocate for others to kill and commit crimes, then I must lead by example and show that I am fully sincere in my message." This is consistent with three different causal stories (suppression caused him to act alone instead of being deradicalized; suppression prevented him from finding collaborators for a more sophisticated attack; no causal effect). An n=1 cannot distinguish these. But the closest reference-class analogs (Iron March, Terrorgram, the Bratislava pipeline) all show permissive forums producing coordination and identification with predecessors — i.e., the alternative path is not low-risk.
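The point that a single case cannot separate the three causal stories can be made concrete with a small Bayesian update. The sketch below is illustrative only: the hypothesis labels, the flat prior, and especially the likelihood values are assumptions invented for this example, not estimates from any dataset. It shows that when each story predicts the observed sequence about equally well, one observation leaves the posterior close to the prior.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a finite set of mutually exclusive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Flat prior over the three causal stories (an assumption, for illustration).
priors = {
    "suppression_isolated_him": 1 / 3,
    "suppression_blocked_collaborators": 1 / 3,
    "no_causal_effect": 1 / 3,
}

# Hypothetical probability of observing "told violence-talk is banned,
# went quiet, later attacked alone" under each story. These numbers are
# made up; the point is only that they are similar to one another.
likelihoods = {
    "suppression_isolated_him": 0.30,
    "suppression_blocked_collaborators": 0.25,
    "no_causal_effect": 0.28,
}

post = posterior(priors, likelihoods)
for hypothesis, probability in post.items():
    print(f"{hypothesis}: {probability:.3f}")
```

Under these assumed inputs every posterior stays within a few percentage points of one third. Distinguishing the stories would require either many cases or likelihoods that differ sharply, and neither is available here.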

7. The "going dark" / deplatforming-migration objection is real but limited

Ribeiro et al. (CSCW 2021), studying the r/The_Donald and r/Incels migrations, found that users who followed a deplatformed community to its new home sometimes became more toxic per capita. Cima et al. (2024, arXiv 2401.11254), studying Reddit's "Great Ban," found that 15.6% of affected users left Reddit while 5% increased their toxicity by more than 70%. Against this, Chandrasekharan et al. (2017) and ADL/Squire's Bad Gateway (2023) found deplatforming reduces overall hate-content production and audience. The relevant comparison for LW is not "ban LW vs. permit violence on LW" but "enforce a no-violence-advocacy norm on LW vs. permit violence advocacy," and on that comparison the migration literature is silent or supportive of moderation.


Details by Reference Class

Primary reference class: small ideological/intellectual forums where violence advocates are a minority

Animal rights / SHAC. SHAC operated a moderated public-facing website that explicitly published "top 20 terror tactics" and identifying information about HLS-adjacent employees. Initially it framed itself as lawful and pre-cleared content with barristers. The actual outcome was an escalation to assault (Brian Cass beaten outside his home; Andrew Gay sprayed with chemicals on his doorstep), letter-bombs, secondary/tertiary harassment campaigns, and the SHAC-7 federal convictions for conspiracy to violate the Animal Enterprise Protection Act. Thirteen UK SHAC members were jailed in 2009 for between 15 months and 11 years. The "we're a debate forum, the violence is separate" framing did not survive contact with operational reality — the website itself was found by US and UK courts to be the mechanism of incitement and coordination.

Earth First! / ELF split (1992). Earth First! adopted a public non-violence-against-persons code, which was the explicit precipitant of the Brighton split that founded the Earth Liberation Front. Property destruction continued (the FBI's Operation Backfire indicted 18 in 2006; the Vail arson; "the Family" cell carried out 40+ arsons 1996–2001), but the moderate-flank-with-clear-non-violence-norm strategy preserved Earth First!'s public legitimacy. The radical wing that did commit attacks split off rather than co-existing inside the moderated discussion space.

Anti-abortion movement / Army of God. The Army of God Manual and the "Defensive Action Statement" emerged from a 1988 Atlanta jail cell where Operation Rescue arrestees, housed together, could "spell out their preferred tactics" (per the SPLC's "Violence and the Anti-Abortion Movement"). The resulting decades of arson, attempted murder, and murder (David Gunn, John Britton and the Barretts in Pensacola, Barnett Slepian, the Atlanta and Birmingham bombings, Tiller in 2009, Colorado Springs Planned Parenthood in 2015) trace a clear path from a discussion-permissive subculture to leaderless-resistance violence. Mainstream pro-life organizations were forced into explicit denunciation, and the literature is unanimous that the Army of God's violence "alienated many in the larger anti-abortion movement" — the radical-flank-as-poison-pill effect on the broader movement's political legitimacy.

Civil rights movement (the contrasting case). SNCC, SCLC, and CORE invested heavily in training in nonviolence (Lawson's Nashville workshops; SNCC's "Statements of Discipline"; SCLC's "Handbook for Freedom Army Recruits") and enforced it as an internal norm. Chenoweth & Stephan's NAVCO data (Why Civil Resistance Works, Columbia, 2011) subsequently showed that 53% of nonviolent campaigns succeeded vs. 26% of violent ones; Chenoweth's "3.5% rule" finds every campaign crossing that threshold was primarily nonviolent. The closest analogy to LW would have been if SCLC had said "we'll allow advocacy of violent direct action in our pamphlets and trust that disagree-votes will deradicalize Stokely Carmichael." They didn't, and the empirical record vindicates them.

Secondary reference class: movements with non-violent majority and radical violent minority

Climate movement. Just Stop Oil, Extinction Rebellion, and the broader A22 network have explicit non-violence codes; ELF, Deep Green Resistance, and SLDT in France function as the radical flank. Social Change Lab's empirical work finds Just Stop Oil's nonviolent disruption increased support for moderate climate groups (positive radical-flank effect), but this entire structure depends on the nonviolent majority clearly distancing itself from any violent fringe. Carnegie Endowment's 2025 "Why Climate Sabotage Remains an Unlikely Strategy" review notes that movements unable to "rein in the activities of less principled members" risk being branded as terrorism, with the Chenoweth/Stephan dataset suggesting violence against humans would backfire.

Anti-AI movement. Stop AI's bifurcation (Reichstadter/Kirchner expelled from PauseAI in 2024; Kirchner missing after allegedly assaulting another organizer for proposing abandoning nonviolence) and PauseAI's tight enforcement against violence-talk are textbook moderate-flank-distancing behaviour. Moreno-Gama's path (joining PauseAI's Discord, with 34 posts over two years, none with explicit violence calls but one flagged "ambiguous"; joining Stop AI's forum; being told violence-talk gets banned; going quiet; then writing his manifesto and attacking) is consistent with the model that suppression of in-community advocacy doesn't trivially deradicalize, but the episode does not refute the case for suppression either. Critically, he acted alone: there is no evidence of any coordinated cell forming.

Tertiary reference class: private/semi-private spaces with varying moderation

Iron March (2011–2017). Functioned as a moderated public forum (per CTC West Point's analysis of the leaked SQL database, founder Alexander Slavros sent ~700 DMs and wrote 7,600 forum posts actively curating discussion). The forum had the LW theory of change available — there was constant debate and internal pushback (Heimbach later "flinched at the idea of TWP becoming another Atomwaffen"). It produced Atomwaffen Division, National Action, Antipodean Resistance, and Feuerkrieg Division. Three premeditated violent plots were disrupted while it was online; the majority of skull-mask-network terrorism came after its disappearance — i.e., the forum itself was a coordination and identity-formation engine, not a deradicalization one.

Terrorgram (2019–2024). The ProPublica/Frontline investigation identified 35 linked crimes, including the Bratislava shooting; the US, UK, Canada, and Australia have designated it a terrorist entity. The Humber/Allison indictment offers direct prosecution evidence of forum-to-attack coordination.

764 / the Com. Per ISD Global (2025): "Between 2020 and 2025, 191 members of 764 (or members of affiliated groups) in 28 different countries have been arrested." The competitive-radicalization-inside-semi-private-channels pattern is the inverse of what LW's theory predicts should happen.

Incels (r/Incels, r/Braincels). Reddit allowed both for years; r/Incels was banned in November 2017, three years after Elliot Rodger's spree, and r/Braincels in 2019, after the 2018 Toronto van attack. Empirical evaluations (Chandrasekharan et al. 2017; Ribeiro et al. 2021; the Reddit "Great Ban" study) find mixed but on-net positive effects of community-level moderation on aggregate hate content; the migration cost is real but smaller than the within-platform benefit.

Reconquista Internet (the strongest counter-speech evidence). Garland et al. (EPJ Data Science 2022, n=1.1M tweets across 22 prominent German Twitter accounts) found that after RI emerged in April 2018, the proportion of hate speech in sampled conversations fell from ~30% to ~25%, while counter-speech rose from 13% to 22%. But the authors explicitly state: "we make no causal claims due to the complexity of discourse dynamics," and the outcome measured was the ratio of speech types, not behaviour of would-be attackers. This is the strongest signal in the empirical record for the counter-speech theory, and it does not speak to attack rates at all.
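Because the Garland et al. figures are shares of discourse, it helps to keep percentage-point change and relative change apart when citing them. The sketch below uses only the rounded proportions quoted above (~30% to ~25% hate speech, 13% to 22% counter-speech); the function name is hypothetical, and the results inherit the rounding of those inputs.

```python
def change_summary(before, after):
    """Return (percentage-point change, relative change in %) between two shares."""
    point_change = (after - before) * 100
    relative_change = (after - before) / before * 100
    return round(point_change, 1), round(relative_change, 1)


# Rounded proportions as quoted in the text, expressed as fractions.
hate_pp, hate_rel = change_summary(0.30, 0.25)
counter_pp, counter_rel = change_summary(0.13, 0.22)

print(f"hate speech:    {hate_pp:+} pp, {hate_rel:+}% relative")
print(f"counter-speech: {counter_pp:+} pp, {counter_rel:+}% relative")
```

On these inputs the hate-speech share falls by about 5 percentage points (roughly a sixth in relative terms), while the counter-speech share rises by about 9 points. Both are changes in discourse composition; neither is a measure of attacks.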

A genuine consideration in LW's favour: leakage as a benefit

Leakage is the single most prevalent pre-attack warning behavior (Meloy & Gill 2016 found leakage in 85% of their lone-actor sample of 111; the NCAVC Lone Offender Study found ~96% of its sample produced writings intended to be viewed). Permissive forums could in principle be a honeypot for early law-enforcement intervention: three premeditated violent plots were disrupted while Iron March was online. But (a) the leakage literature (Kupper & Meloy's "Going Dark") shows attackers actually become less publicly active near the attack; (b) the LW model relies on community counter-argument, not LE intervention; and (c) the LW user base is unlikely to systematically report violence advocacy to authorities. So the leakage-benefit argument applies more to monitored permissive forums (the way the FBI used Iron March data) than to LW.


Recommendations

For your post:

  1. Lead with the asymmetry of documented cases. The Habryka model has zero documented cases of the deradicalization-by-public-counter-argument mechanism producing the claimed outcome in the major terrorism/CVE datasets. The inverse — moderated forums permitting violence advocacy producing real attacks — has dozens of well-documented cases. This is the strongest single argument and should be the spine of the post.
  2. Distinguish three claims that LW conflates:
    • Claim 1: "Allowing discussion of violence in the abstract" (when is revolution justified, just-war theory, tyrannicide as philosophy). Defensible, and low risk in a small intellectual community.
    • Claim 2: "Allowing specific calls for violence against named targets" (the "hundreds of simultaneous assassinations" comment). This is what every comparable platform's TOS bans, and what 18 U.S.C. § 373 (solicitation) and the Brandenburg "imminent lawless action" line target.
    • Claim 3: "Allowing such calls because counter-argument deradicalizes." This is the empirical claim that has no support.
    Habryka has the strongest case for claim 1, a weaker case for claim 2, and essentially no case for claim 3.
  3. Cite the radical-flank distancing literature directly. Terrorism and Political Violence (2025) and Ellefsen (2018) on Quebec FLQ both find the moderate flank protects movement legitimacy precisely by publicly denouncing violence, avoiding interactions with radicals, and signaling to authorities intent to de-escalate. PauseAI did this with Stop AI; LessWrong is doing the opposite for AI safety as a movement.
  4. Concede the genuine ambiguities explicitly. The "going dark" / migration concern is real (Ribeiro et al. 2021); Garland et al. is a real if modest data point in favour of counter-speech at the discourse-composition level (a ~5-pp reduction in hate-speech proportion, with the authors' own caveat that they make no causal claims); the n=1 Moreno-Gama case is consistent with both directions. Saying this strengthens your case because the rest of the evidence is one-sided enough.
  5. Propose a concrete alternative that matches industry practice. The Meta Dangerous Organizations and Individuals tiered policy, Trust & Safety Professional Association guidance, and Stanford Internet Observatory recommendations converge on: discussion of extremism and violence permitted; substantive support, glorification, praise, and calls for action against named targets prohibited. This is what virtually every other rationalist-adjacent forum (EA Forum, the Alignment Forum's narrower scope, ACX comment moderation) effectively enforces.

Benchmarks that would change the recommendation:

  • A rigorous evaluation (RCT or natural experiment with credible identification) showing public moderated counter-argument reduces attack base rates would shift weight toward Habryka. None exists.
  • If LW's moderation generates verifiable cases of would-be attackers being reported to LE or talked down — and these are documented — that would partly vindicate the policy. The honeypot mechanism is real; it's just not LW's current operating model.
  • If the AI x-risk movement produces a second prominent attempted attack within ~12 months and the attacker has a documented LW-comment trail of violence advocacy that was downvoted but not removed, that is a falsification event for the LW policy as currently constituted.
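The first benchmark above asks for a rigorous evaluation, and a rough power calculation suggests why none exists. The sketch below uses the standard normal-approximation sample-size formula for comparing two proportions. The base rate of 1 attack per 10,000 users and the 50% reduction are hypothetical placeholders, not figures from the literature, and the normal approximation is itself crude for events this rare.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect p_control vs. p_treated
    with a two-sided test (normal approximation for two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical inputs: 1-in-10,000 attack rate, halved by the intervention.
users_needed = n_per_arm(p_control=1e-4, p_treated=5e-5)
print(f"users needed per arm: {users_needed:,}")
```

Under these assumed inputs the requirement runs to several hundred thousand users per arm, far beyond the size of any small intellectual forum. That is one reason the evidence base relies on case studies and attitude measures in place of attack-rate outcomes.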

Caveats

  1. Causal inference is genuinely hard. Attack base rates in small intellectual communities are very low; the counterfactual cannot be run; selection effects in who joins which forum are severe.
  2. The reference classes are imperfect. LW is much smaller and more cognitively homogeneous than 8chan or Iron March, and far less explicitly violence-aligned than the far-right forums. The closest analogs (SHAC's website, the Army of God's discussion-permissive subculture, the more discussion-oriented Terrorgram channels) are still more violence-positive than LW.
  3. The "counter-argument deradicalizes" claim is not absurd in principle. Garland et al. (2022) and Gennaro et al. (Scientific Reports 2025) provide empirical signal that organized counter-speech reduces aggregate hate-speech volume. The claim that it deradicalizes would-be attackers specifically is the part that has no documented support.
  4. Habryka may be partially correct on adjacent claims. Suppressing all discussion of when violence might be ethical (philosophical tyrannicide, just-war theory, hypothetical-future scenarios) would be a real epistemic cost, and one LW is right to resist. The empirical question is whether specific calls for assassination of named targets fall inside the protected zone — the literature, US criminal law, and essentially every comparable platform's lawyers say no.
  5. The Moreno-Gama case is genuinely ambiguous, exactly as you said. A single case cannot distinguish "suppression caused him to act alone instead of being deradicalized," "suppression prevented him from finding collaborators for something worse," and "no causal effect either way."
  6. Where the evidence is one-sided I have said so; where it is genuinely ambiguous I have flagged it. The summary: the reference classes do not support LessWrong's permissive policy on calls for violence against named targets, but they also do not support a blanket ban on philosophical discussion of violence in the abstract. The policy LW should be defending is narrower than the one Habryka is currently defending — and that is the strongest, most defensible form of the argument you are making.