| Effect | Classic cite | Verdict (2015–2025) | Best recent evidence |
|---|---|---|---|
| Door-in-the-face | Cialdini 1975 | ROBUST but modest | Genschow et al. 2021 direct replication; Feeley et al. 2012 meta (r = .126 verbal) |
| Foot-in-the-door | Freedman & Fraser 1966 | ROBUST but small | Pascual & Guéguen 2005 meta; r ≈ .09–.17 |
| Anchoring (estimation) | Tversky & Kahneman 1974 | VERY ROBUST | Many Labs 1 (2014); Schley & Weingarten 2025 d = 0.82 |
| Anchoring (price/valuation) | Ariely et al. 2003 (SSN) | MIXED / CONTESTED | Fudenberg et al. 2012 & Maniadis et al. 2014 weak/failed; some succeed |
| Decoy / asymmetric dominance | Huber et al. 1982; Ariely | WEAK / context-bound | Frederick et al. 2014 & Yang & Lynn 2014 failures; Devine et al. 2025 field data, ~1% preference shift |
| Choice overload (jam study) | Iyengar & Lepper 2000 | WEAK / FAILED as a main effect | Scheibehenne et al. 2010 meta, mean effect ≈ 0; Chernev et al. 2015 meta, effect depends on moderators |
| Price presentation order | Suk, Lee & Lichtenstein 2012 | THIN evidence | Original only; no large independent replication found |
| Endowment effect | Thaler 1980; Kahneman et al. 1990 | ROBUST but bounded | Tunçel & Hammitt 2014 meta, WTA/WTP = 3.28 |
Door-in-the-face (Cialdini 1975). Verdict: ROBUST, but the real-world effect is modest. This is the strongest "good news" story for defending a classic citation. Genschow, Westfal, Crusius et al. (2021), in the Journal of Personality and Social Psychology (vol. 120, no. 2, pp. e1–e7), ran a direct replication of Cialdini's 1975 original with 391 participants (≈5× the original sample). When passersby were asked to chaperone juvenile delinquents on a zoo trip, 34% complied in the small-request-only condition vs. 51% in the large-then-small (DITF) condition, nearly identical to Cialdini's original rates. The authors concluded that "at least some social psychological findings can transcend a particular time, place, and population."
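To put the replication's two compliance rates on common effect-size scales, here is a minimal Python sketch that converts them into a risk difference, a risk ratio, Cohen's h, and a phi coefficient. The helper name `compliance_effect_sizes` is hypothetical, and the phi calculation assumes the two conditions had equal group sizes, which the figures above do not state.

```python
import math


def compliance_effect_sizes(p_treat: float, p_control: float) -> dict[str, float]:
    """Summarize a two-condition difference in compliance rates.

    Assumption: phi is computed for a 2x2 table with equal group sizes.
    """
    risk_diff = p_treat - p_control   # absolute difference in compliance
    risk_ratio = p_treat / p_control  # relative lift over the control
    # Cohen's h: difference between arcsine-square-root transformed proportions
    h = 2 * math.asin(math.sqrt(p_treat)) - 2 * math.asin(math.sqrt(p_control))
    # phi (a correlation) for a 2x2 table whose two rows have equal totals
    p_sum = p_treat + p_control
    phi = risk_diff / math.sqrt(p_sum * (2 - p_sum))
    return {"risk_diff": risk_diff, "risk_ratio": risk_ratio,
            "cohens_h": h, "phi": phi}


if __name__ == "__main__":
    # 51% compliance after DITF vs. 34% for the small request alone
    for name, value in compliance_effect_sizes(0.51, 0.34).items():
        print(f"{name}: {value:.3f}")
```

Under the equal-groups assumption, phi comes out near .17, at the upper end of the meta-analytic verbal-compliance interval (.08–.17), while the risk ratio of about 1.5 is the more intuitive figure for a sales audience.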
The meta-analytic picture is more sober. Feeley, Anker & Aloe (2012, Communication Monographs, 79(3), 316–343) meta-analyzed 117 studies (1975–2010) and found "an overall significant effect of the DITF strategy on verbal compliance (k = 78, r = .126), but an insignificant effect for behavioral compliance (k = 39, r = .052)" (95% CIs: verbal .08–.17; behavioral −.02–.12). In practice, DITF reliably gets people to say yes but is far weaker at producing actual money or behavior. It works better for prosocial requests, with larger concessions, when baseline compliance is low, and when there is a personal connection. It has also worked in retail field experiments (Ebster & Neumayr 2008, selling alpine cheese to 375 consumers) and in voter-turnout campaigns.
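Because the anchoring literature reports Cohen's d while this DITF meta-analysis reports r, a small conversion makes the two comparable. The sketch below uses the standard conversion d = 2r / √(1 − r²), which assumes roughly equal group sizes, and checks whether each quoted 95% CI excludes zero; the function names are illustrative.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d (assumes roughly equal group sizes)."""
    return 2 * r / math.sqrt(1 - r ** 2)


def ci_excludes_zero(lower: float, upper: float) -> bool:
    """True when a confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# r and 95% CI values as quoted from Feeley, Anker & Aloe (2012)
estimates = {
    "verbal": (0.126, (0.08, 0.17)),
    "behavioral": (0.052, (-0.02, 0.12)),
}

for outcome, (r, (lower, upper)) in estimates.items():
    print(f"{outcome}: r = {r:.3f} -> d = {r_to_d(r):.2f}; "
          f"CI excludes zero: {ci_excludes_zero(lower, upper)}")
```

Verbal compliance converts to d ≈ 0.25 and behavioral compliance to d ≈ 0.10, both far below the d ≈ 0.8 reported for estimation anchoring. Only the verbal interval excludes zero, which matches the "significant" vs. "insignificant" wording of the quote.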
Foot-in-the-door (Freedman & Fraser 1966). Verdict: ROBUST but small. Successive meta-analyses (Beaman et al. 1983; Dillard et al. 1984; Pascual & Guéguen 2005) consistently find a real but small effect, around r = .09–.17; Dillard et al. put both FITD and DITF at only r ≈ .15–.17 even under optimal conditions. The effect is condition-dependent. It works best when the initial request is substantial enough to shift self-perception, is performed actively, and carries no large external incentive. Online and e-commerce evidence exists: Guéguen's field experiments found that FITD also works by email and online, and a computer-mediated field study (n = 900 sports-store customers) found that a two-step FITD condition produced more new customers than control. Caution: a 2024 field experiment recruiting nonprofit volunteers (500+ participants) found that neither FITD nor a gift-exchange appeal outperformed a control appeal built around a compelling mission, so FITD is not automatic.
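One way to make an r of .09–.17 concrete is Rosenthal & Rubin's binomial effect size display (BESD), which recasts r as two success rates of 50% ± r/2. The sketch below assumes a 50% baseline by construction, so its percentages are a standardized illustration rather than a forecast of real FITD compliance. `besd` is a hypothetical helper name.

```python
def besd(r: float) -> tuple[float, float]:
    """Binomial effect size display: (control, treatment) rates of 0.5 -/+ r/2."""
    return 0.5 - r / 2, 0.5 + r / 2


for r in (0.09, 0.17):
    control, treated = besd(r)
    # r squared is the share of outcome variance the manipulation accounts for
    print(f"r = {r:.2f}: {control:.1%} vs {treated:.1%} compliance; "
          f"variance explained = {r ** 2:.1%}")
```

At the two ends of the meta-analytic range this yields roughly 45.5% vs. 54.5% and 41.5% vs. 58.5%. That difference is noticeable at scale, even though the manipulation explains under 3% of the variance.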
Anchoring. (a) Classic numeric/estimation anchoring: VERY ROBUST, arguably the best-replicated effect in social psychology. In Many Labs 1 (Klein et al. 2014, ~36 labs, ~5,000–6,000 participants), four anchoring tasks were replicated and were among the largest effects in the entire project: four of the five effects with Cohen's d > 1.0 were anchoring variants (item-level point-biserial r = .64–.91). The 2025 meta-analysis "50 Years of Anchoring" (Schley & Weingarten, SSRN) synthesized 2,603 effect sizes (1,283 directly comparing high vs. low anchors) and found a large effect, d = 0.824, 95% CI [0.765, 0.883], with high heterogeneity (I² = 93.6%) and "only a small reduction from publication-bias corrections." Röseler & Schütz's large meta-analysis (2022, "Open Anchoring Quest" dataset: 96 studies, N = 21,359, 88,914 trials) found no evidence of publication bias and no difference between published and unpublished effects, which is rare and reassuring.
Two caveats: (i) "incidental"/subliminal anchoring, driven by numbers picked up incidentally from the environment, is fragile: the Critcher & Gilovich incidental-anchor item failed to replicate, and Many Labs 2 (2018) found a near-zero effect (d ≈ 0.04). (ii) Röseler et al. (2024, Meta-Psychology, >50,000 estimates) showed that individual susceptibility to anchoring cannot be measured reliably. Anchoring behaves as a robust situational effect, not as a stable personality trait.
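The pooled anchoring estimate can be unpacked with two standard conversions, sketched here in Python under stated assumptions. The standard error is backed out of the reported 95% CI as if it were a symmetric normal-theory (Wald) interval, which the meta-analysis may not have used. The d-to-r conversion assumes equal group sizes.

```python
import math
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error implied by a symmetric normal-theory confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


def d_to_r(d: float) -> float:
    """Convert Cohen's d to a point-biserial r (assumes equal group sizes)."""
    return d / math.sqrt(d ** 2 + 4)


# Pooled estimate quoted from Schley & Weingarten (2025)
d, lower, upper = 0.824, 0.765, 0.883
se = se_from_ci(lower, upper)
print(f"SE = {se:.3f}, z = {d / se:.1f}, r = {d_to_r(d):.2f}")
```

This gives r ≈ .38, about three times the DITF verbal-compliance estimate of r = .126. The tight interval only describes uncertainty about the average effect. With I² = 93.6%, the effects in individual studies scatter widely around it.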
(b) Price/valuation anchoring, i.e., the Ariely, Loewenstein & Prelec (2003) "coherent arbitrariness" Social-Security-Number study: MIXED / CONTESTED. This is the citation to be most careful with. Fudenberg, Levine & Maniadis (2012, AEJ: Microeconomics) re-ran the Ariely manipulation and found "much weaker anchoring effects" on product valuations and "no anchoring effects" on lotteries. Maniadis, Tufano & List (2014, AER) re-ran Study 2 (aversive sounds) and reported a "failure to replicate" the strong effect. Simonsohn (Data Colada), however, argued that their data were statistically consistent with the original and that the replication was merely underpowered. A 2019 Judgment and Decision Making paper concluded that WTP anchoring is real but typically smaller than in the original. Bottom line: the SSN/willingness-to-pay demonstration is not a settled result, so citing it as proof is risky even though general anchoring is rock-solid.
Decoy / asymmetric dominance (Huber et al. 1982). Verdict: WEAK / strongly context-bound. This is a genuine cautionary tale. Frederick, Lee & Baskin (2014, Journal of Marketing Research, "The Limits of Attraction") and Yang & Lynn (2014, JMR) ran many well-powered replications and largely failed to reproduce the attraction (decoy) effect. Yang & Lynn reported only ~11 reliable effects out of 91 attempts ("significantly fewer than expected"). Their conclusion was that the decoy effect is largely an artifact of "stylized" stimuli (two numeric attributes, text descriptions) and mostly vanishes with realistic, pictorial, multi-attribute products. Defenders (Huber, Payne & Puto 2014; Simonson 2014) replied that the effect holds when conditions are properly replicated, and some real-product follow-ups recover it (Lichters et al. 2017). The most ecologically valid recent test is Devine, Goulding, Harvey, Skatova & Otto (2025, npj Science of Learning, 10:60). Analyzing 3.6 million UK grocery wine transactions, they found that the decoy effect does occur in the wild, but: "The strength of these effects was modest overall (roughly 1% change in preference) and ... depended on consumers' idiosyncratic histories of experience." So the effect is real, but small and fragile, and nothing like the dramatic preference flips in Ariely's anecdote.
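To see why 11 reliable effects in 91 attempts counts as "significantly fewer than expected," the sketch below computes a binomial lower tail. The per-attempt power values are hypothetical stand-ins for what a robust effect would deliver, not figures from Yang & Lynn. Treating the 91 attempts as independent is also a simplifying assumption.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


ATTEMPTS, HITS = 91, 11  # Yang & Lynn (2014), as summarized above

# Hypothetical per-attempt power if the decoy effect were robust
for power in (0.5, 0.8):
    expected = ATTEMPTS * power
    tail = binom_cdf(HITS, ATTEMPTS, power)
    print(f"power {power:.0%}: expect ~{expected:.0f} hits; "
          f"P(X <= {HITS}) = {tail:.1e}")

# Baseline: significant results expected at alpha = .05 if there were no effect at all
print(f"no true effect, alpha = .05: expect ~{ATTEMPTS * 0.05:.1f} hits")
```

Under either power assumption, 11 hits is vanishingly unlikely. Yet it is still above the roughly 4.5 chance-level hits expected at α = .05 (fewer if only effects in the predicted direction count), which fits the "real but small and context-bound" reading.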
Choice overload (Iyengar & Lepper 2000 jam study). Verdict: FAILED as a universal main effect; survives only as a conditional, moderated effect. Scheibehenne, Greifeneder & Todd (2010, Journal of Consumer Research, 37(3), 409–425) meta-analyzed 63 conditions from 50 experiments (N = 5,036) and found "a mean effect size of virtually zero but considerable variance between studies." The headline "more choice hurts" claim did not replicate as a general law, and "no sufficient conditions could be identified" for a main effect. Chernev, Böckenholt & Goodman (2015, Journal of Consumer Psychology) meta-analyzed 99 observations (N = 7,202) and reconciled the picture. Choice overload is real but conditional, appearing in the presence of four moderators: high choice-set complexity, high decision-task difficulty, high preference uncertainty, and an effort-minimizing decision goal. Practical translation: piling on options does not reliably backfire; it backfires only under specific, identifiable conditions. The research question has shifted from whether choice overload occurs to when it occurs.
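A random-effects meta-analysis is designed to separate the two parts of the pattern Scheibehenne et al. describe: the average effect and the variance between studies. Below is a minimal DerSimonian–Laird sketch (one standard estimator, not necessarily the one those authors used). It runs on five hypothetical effect sizes invented purely for illustration.

```python
import math


def dersimonian_laird(effects: list[float], variances: list[float]) -> dict[str, float]:
    """Random-effects pooling using the DerSimonian-Laird estimate of tau^2."""
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation from heterogeneity
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"mu": mu, "se": math.sqrt(1 / sum(w_re)), "tau2": tau2, "i2": i2, "q": q}


if __name__ == "__main__":
    # Hypothetical effects of opposite signs: more choice hurts in some studies, helps in others
    effects = [-0.6, -0.3, 0.0, 0.3, 0.6]
    variances = [0.02] * 5
    for key, value in dersimonian_laird(effects, variances).items():
        print(f"{key}: {value:.3f}")
```

Here the pooled mean is zero, while τ² ≈ 0.205 and I² ≈ 91%. The average hides large differences between studies, which is the kind of pattern that calls for a moderator analysis like Chernev et al.'s.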
Price presentation order (Suk, Lee & Lichtenstein 2012). Verdict: THIN / under-replicated. The original (Journal of Marketing Research, 49(5), 708–717) showed that presenting prices in descending order (high→low) shifts choices toward higher-priced options. This included an 8-week field study in a real bar in which revenue rose by about $0.24 per beer sold. The finding is theoretically grounded in reference dependence and price-quality inference, and it has been cited and built upon, but I found no large independent direct replication of the high-to-low ordering effect. Adjacent, better-replicated work exists on related framing. For example, Allard, Hardisty & Griffin (2019) on differential price framing was conceptually replicated in a 2023 field study of 45,626 add-to-cart events (Köcher et al., Marketing Letters), which notably found the effect "considerably less pronounced in actual purchase patterns" than in intentions. Treat price order as plausible and theory-consistent, but not independently confirmed at scale.
Endowment effect (Thaler 1980; Kahneman et al. 1990). Verdict: ROBUST but bounded. The WTA/WTP gap is one of the most replicated findings in behavioral economics. Tunçel & Hammitt (2014, Journal of Environmental Economics and Management, 68(1), 175–187) meta-analyzed the literature and found an overall WTA/WTP ratio of 3.28 (largest for public/environmental goods, ~6.2; smaller for ordinary private goods). Boundary conditions matter. The disparity also shrinks for experienced traders and under incentive-compatible elicitation, and Plott & Zeiler (2005) argued that part of it is a procedural artifact of subject misconceptions. Related ownership and effort effects remain solid: a 2026 meta-analysis of the IKEA effect (k = 55, N = 5,454) reports d = 0.57. For sales, the "$1 trial / free trial → ownership feeling" logic rests on a real, well-replicated foundation, with the caveat that experienced buyers and clean market settings shrink it.
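Ratios such as WTA/WTP are skewed: they are bounded below by zero and unbounded above. A common way to pool them, though not necessarily Tunçel & Hammitt's exact method, is to average on the log scale. The sketch below contrasts arithmetic and geometric means on two made-up ratios, then reads the 3.28 figure against a hypothetical $10 buyer valuation.

```python
import math


def geometric_mean(ratios: list[float]) -> float:
    """Average ratios on the log scale, then transform back."""
    return math.exp(sum(math.log(x) for x in ratios) / len(ratios))


# Two illustrative WTA/WTP ratios (made up, not study data)
ratios = [2.0, 8.0]
print(f"arithmetic mean: {sum(ratios) / len(ratios):.2f}")  # pulled up by the large ratio
print(f"geometric mean:  {geometric_mean(ratios):.2f}")

# Reading a pooled ratio of 3.28 against a hypothetical buyer WTP of $10.00
print(f"implied owner WTA: ${10.00 * 3.28:.2f}")
```

On the log scale, a ratio of 2 and a ratio of 1/2 cancel out, which the arithmetic mean does not respect. At a ratio of 3.28, an owner would demand roughly $33 for a good that a comparable buyer values at $10.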
Broader replication context. The Reproducibility Project: Psychology (Open Science Collaboration 2015, Science) attempted to replicate 100 studies. Only 36% of the replications reached statistical significance (39% were judged subjectively to have replicated), and replication effect sizes were about half the originals. Many Labs 1 (2014) replicated 10/13 effects; Many Labs 2 (2018, N = 15,305, 36 countries) replicated 14/28 (50%); Many Labs 3 replicated 3/10. Camerer et al.'s projects fared somewhat better: ≈ 61% for laboratory experiments in economics (2016) and ≈ 62% for social-science experiments published in Nature and Science (2018). The pattern that matters for sales psychology is that classic cognitive/judgment effects (anchoring) and sequential-request compliance effects (DITF, FITD) held up far better than "social priming" and many flashier social-psychology effects. The pricing/choice effects (decoy, choice overload, price order, valuation anchoring) sit in the contested middle.
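Project-level replication rates rest on small counts, so their uncertainty is easy to underrate. The sketch below computes 95% Wilson score intervals for the Many Labs counts quoted above. The Wilson interval is one reasonable choice for small-n proportions, and treating each effect as an independent trial is a simplification.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Replicated / attempted effects, as quoted above
projects = {"Many Labs 1": (10, 13), "Many Labs 2": (14, 28), "Many Labs 3": (3, 10)}

for name, (k, n) in projects.items():
    lower, upper = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {k / n:.0%}, 95% CI [{lower:.0%}, {upper:.0%}]")
```

Many Labs 3's 3/10, for example, is compatible with an underlying replication rate anywhere from about 11% to about 60%. Differences between individual projects should therefore not be over-read.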
Tier 1 — cite confidently with recent evidence (defend against the skeptic):
Tier 2 — use, but state the conditions:
Tier 3 — flag as weak; don't overclaim:
Benchmarks that would change these calls: a large pre-registered field replication of price-order or decoy effects showing a >5% conversion lift would upgrade them to Tier 2; a failed high-powered replication of Genschow-style DITF would downgrade DITF. For your own funnels, run A/B tests. Effect sizes of r ≈ .10–.15 (DITF/FITD) require large samples to detect reliably, and lab effect sizes routinely overstate commercial impact.
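To gauge what "large samples" means in practice, here is a rough Python sketch of the usual Fisher-z approximation for the total sample needed to detect a correlation r with a two-sided test. The defaults (α = .05, 80% power) are conventional assumptions. Treating an A/B conversion test as a correlation is itself an approximation, and a dedicated two-proportion power calculation is the better tool for a real funnel.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate total N to detect correlation r (two-sided) via Fisher's z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


for r in (0.05, 0.10, 0.15):
    print(f"r = {r:.2f}: total N = {n_for_correlation(r)}")
```

With these defaults, the approximation calls for roughly 350 participants at r = .15 and close to 800 at r = .10. Halving the effect, as often happens when lab results meet real traffic, roughly quadruples the required sample (over 3,000 at r = .05).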