Same brain, different bottlenecks: how modality shapes reading comprehension

Across three decades of neuroscience, cognitive psychology, and education research, a clear but nuanced consensus has emerged: the brain's high-level semantic machinery is largely blind to whether words arrive through the eyes or ears, yet the affordances of the medium itself (pacing, regression, haptic feedback, prosody, transience) systematically tilt comprehension outcomes. For skilled adult readers of narrative material, print, screen, and audio are roughly interchangeable. For expository, inferentially demanding, or time-pressured reading, print retains a small but robust advantage: Hedges' g ≈ −0.21 to −0.32 for screens relative to paper, and g = 0.36 favoring reading over listening for inferential comprehension (Clinton-Lisell, 2022). The most important moderators of modality effects are text genre (narrative vs. expository), reader skill (decoding automaticity), task demands (time pressure, inferential depth), and self-pacing. This review traces the empirical case for modality neutrality, then the evidence that medium nonetheless matters, first for print vs. digital and then for audio vs. text, before examining the cognitive-load and prosody mechanisms that explain the pattern.

PART 1: Evidence for modality neutrality (the "same brain" argument)

1.1 Shared neural substrates for semantic comprehension

The strongest neuroscientific case for modality-invariance comes from Deniz, Nunez-Elizalde, Huth, and Gallant (2019), who used voxelwise encoding models on fMRI data from 9 participants who both listened to and read (via rapid serial visual presentation) over two hours of narrative stories from The Moth Radio Hour. Using a 985-parameter semantic feature space built from Wikipedia, Project Gutenberg, and Reddit corpora, semantic encoding models trained on one modality successfully predicted voxel responses in the other modality across most of the cerebral cortex. Semantic tuning maps were, in the authors' words, "almost identical" across listening and reading; modality-specific activation was confined to early visual cortex (for reading) and early auditory cortex (for listening). Their conclusion was unambiguous: "The representation of language semantics is independent of the sensory modality through which the semantic information is received."

This finding was anticipated by Regev, Honey, Simony, and Hasson (2013) in the Journal of Neuroscience, whose inter-subject correlation analysis during a 7-minute narrative showed "remarkably similar" response time courses in the superior temporal gyrus, inferior frontal gyrus, bilateral precuneus, medial prefrontal cortex, and angular gyrus regardless of whether input was spoken or written. Modality selectivity emerged only in peripheral sensory and higher-order parietal/frontal regions.
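To make the logic of a cross-modal correlation analysis concrete, the following is a minimal sketch, not the pipeline used by Regev et al. It assumes each participant contributes one response time course for a single region, averages the time courses within each modality group, and correlates the two group averages. The function names and all numbers are hypothetical illustrations; real analyses run per voxel or region on much longer time series and include statistical testing.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_timecourse(group):
    """Average per-subject time courses, time point by time point."""
    return [sum(values) / len(values) for values in zip(*group)]

def cross_modal_correlation(listeners, readers):
    """Correlate the listener-group average with the reader-group average."""
    return pearson(mean_timecourse(listeners), mean_timecourse(readers))

# Made-up signals for one region (hypothetical, for illustration only).
listeners = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
readers = [[2.0, 4.0, 6.0, 8.0], [0.0, 2.0, 4.0, 6.0]]
scrambled_readers = [[4.0, 1.0, 3.0, 2.0], [4.0, 1.0, 3.0, 2.0]]

shared = cross_modal_correlation(listeners, readers)               # 1.0
scrambled = cross_modal_correlation(listeners, scrambled_readers)  # -0.4
```

A region whose time course tracks the story in the same way for both groups yields a high correlation, as in the first case; a region driven by modality-specific input would not.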

The activation-likelihood meta-analysis by Binder, Desai, Graves, and Conant (2009) pooled 120 neuroimaging studies (1,642 participants, 1,145 activation foci) and identified a left-lateralized seven-region semantic network—angular gyrus, middle temporal gyrus, fusiform/parahippocampal gyri, dorsomedial and ventromedial PFC, inferior frontal gyrus, and posterior cingulate—that activates for semantic processing regardless of input modality and overlaps substantially with the default mode network. Follow-up work from the Gallant lab (Popham et al., 2021, Nature Neuroscience) mapped the alignment of visual and linguistic semantic representations at the boundary of visual cortex, and Deniz et al. (2023) demonstrated that the semantic system becomes maximally visible only with narrative or sentence context, suggesting previous isolated-word studies underestimated its extent. Earlier work by Jobard, Vigneau, Mazoyer, and Tzourio-Mazoyer (2007) in NeuroImage and Lindenberg and Scheef (2007) in Neuropsychologia had already localized a shared supramodal comprehension network in the left temporal lobe. In children, Berl et al. (2010) identified an analogous "comprehension cortex" along the left superior temporal sulcus (BA 21/22).

(Note: the query referenced "Jessica Gaby" and "Stephen Willard" as researchers in this space; no publications by these exact names were located in the modality-invariance literature. They may have been confused with other researchers or misremembered.)

1.2 Automaticity of decoding and the Simple View of Reading

The neural convergence is paralleled behaviorally by the Simple View of Reading (SVR). Gough and Tunmer (1986) proposed that reading comprehension equals the product of decoding and linguistic comprehension (R = D × C), framing reading as listening comprehension plus decoding. Hoover and Gough's (1990) longitudinal work on bilingual children in Grades 1–4 supported the model's core predictions: the linear and product combinations of decoding and listening comprehension together explained substantial variance in reading comprehension, and the relative weight of decoding shrank with age as the weight of listening comprehension grew. Foorman et al. (2015) reported that decoding and language comprehension together explain 94–98% of reading comprehension variance in early primary school, and Lervåg, Hulme, and Melby-Lervåg's (2018) five-year longitudinal study of 198 children replicated this at 96% variance explained.
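The multiplicative form R = D × C can be illustrated with a short sketch. It assumes, purely for illustration, that decoding and linguistic comprehension are each scored on a 0 to 1 scale; the function name and the three reader profiles are hypothetical and are not data from any of the studies cited.

```python
def simple_view(decoding: float, comprehension: float) -> float:
    """Reading comprehension as the product R = D * C, both on a 0-1 scale."""
    for name, value in (("decoding", decoding), ("comprehension", comprehension)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return decoding * comprehension

# Hypothetical reader profiles (illustrative values only).
fluent_decoder = simple_view(decoding=0.95, comprehension=0.60)  # about 0.57
weak_decoder = simple_view(decoding=0.30, comprehension=0.90)    # about 0.27
non_decoder = simple_view(decoding=0.0, comprehension=0.90)      # 0.0
```

Because the terms multiply, strong oral language cannot compensate for absent decoding (the third reader scores zero), while for a fluent decoder the product is governed almost entirely by linguistic comprehension. That is the sense in which decoding's weight shrinks as readers mature.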

Meta-analytic work cements the developmental shift. Florit and Cain's (2011) meta-analysis in Educational Psychology Review showed that in transparent orthographies (Italian, Finnish, Greek), listening comprehension becomes the stronger predictor of reading comprehension even for beginners, because decoding is acquired quickly and uniformly. García and Cain's (2014) meta-analysis of 110 studies found an average correlation of r = .74 between decoding and reading comprehension in English, with listening comprehension moderating this relationship (R² = .40): the better a reader's oral language, the less decoding skill constrains comprehension.

The most direct modality comparison comes from Clinton-Lisell's (2022) meta-analysis in the Review of Educational Research pooling 46 studies (N = 4,687). Overall, reading and listening comprehension do not differ reliably (g = 0.07, p = .23). But two moderators emerge sharply: reading shows an advantage when self-paced (g = 0.13, p = .049) and for inferential comprehension (g = 0.36, p = .02), while literal comprehension shows no modality difference (g = −0.01).
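For readers unfamiliar with how a pooled g is obtained, here is a minimal sketch of inverse-variance pooling. It is a fixed-effect simplification chosen for brevity, not the model used in the meta-analysis above (published meta-analyses of this kind typically use random-effects or robust-variance models). The function name and the two study values are hypothetical.

```python
import math

def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect, its standard error, and a 95% CI."""
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal length")
    weights = [1.0 / v for v in variances]  # more precise studies weigh more
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, se, ci

# Two made-up studies: a large study with a small effect and a
# small study with a larger effect (hypothetical values).
pooled, se, ci = pool_fixed_effect(effects=[0.10, 0.30], variances=[0.01, 0.04])
# pooled is about 0.14 and the 95% interval spans zero
```

The example shows how a pooled estimate can sit near zero with an interval that includes it even when individual studies point in one direction. An overall g = 0.07 with p = .23 has the same shape.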

LaBerge and Samuels's (1974) classic automaticity framework, updated by Kuhn, Schwanenflugel, and Meisinger (2010) and by Reichle, Pollatsek, and Rayner's (2012) E-Z Reader computational model, explains why: once word recognition runs off automatically, attentional resources previously devoted to decoding are freed for comprehension, and parafoveal pre-processing delivers a 20–40% speed benefit that readers do not consciously notice (Schotter, Angele, and Rayner, 2012). Sticht and colleagues' (1974) classic auding-reading model placed the equalization age—when reading comprehension catches up to listening comprehension—at roughly Grade 7–8, and Gernand and Moser (2017) found in a sample of 938 students that mean reading scores now exceed listening scores by Grade 4, suggesting equalization has migrated earlier.

The neuroscience and psychometric evidence converge on a single claim: once decoding is fluent, comprehension is largely the same mental process regardless of input channel.

PART 2: Evidence for modality dependence (the "medium matters" argument)

2.1 Print vs. digital: a small but stubborn screen inferiority

The flagship meta-analysis by Delgado, Vargas, Ackerman, and Salmerón (2018), "Don't throw away your printed books," pooled 54 studies published 2000–2017 covering 171,055 participants. Paper yielded a small but reliable advantage over screens (Hedges' g = −0.21), an effect comparable to roughly two-thirds of a typical year's reading-comprehension growth in elementary school. Critically, three moderators emerged: (a) screen inferiority was greater under time pressure than self-pacing; (b) the effect appeared for expository and mixed texts but not for purely narrative texts; and (c) the gap widened rather than narrowed from 2000 to 2017, directly contradicting the "digital natives" hypothesis.

Clinton's (2019) independent meta-analysis of 33 randomized studies (N = 2,799) in the Journal of Research in Reading replicated this pattern with g = −0.25 overall, g = −0.32 for expository texts, and g = −0.04 (null) for narrative texts. Reading time did not differ (g = 0.08), so paper's advantage comes without a time cost. Crucially, calibration—readers' ability to accurately predict their own test performance—was better on paper (g = 0.20), indicating that screen reading induces systematic overconfidence. Kong, Seo, and Zhai (2018) and Schwabe et al. (2022) independently confirmed the pattern, with Schwabe's narrative-specific meta-analysis of 32 studies finding no reliable medium effect for fiction.
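The effect sizes in this section are Hedges' g values: standardized mean differences with a small-sample correction. The sketch below uses the standard formula and adopts the sign convention used here (screen minus paper, so negative values favor paper). The group means, standard deviations, and sample sizes are hypothetical and are not taken from any study cited.

```python
import math

def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (a minus b) with Hedges' small-sample correction."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    d = (mean_a - mean_b) / math.sqrt(pooled_var)  # Cohen's d
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)     # approximate bias correction
    return d * correction

# Hypothetical comprehension scores: screen group vs. paper group.
g = hedges_g(mean_a=70.0, sd_a=10.0, n_a=50,   # screen
             mean_b=72.5, sd_b=10.0, n_b=50)   # paper
# g is about -0.248: a quarter of a standard deviation in favor of paper
```

On these made-up numbers, a gap of 2.5 points on a test with a standard deviation of 10 produces an effect of about the size reported for the overall paper advantage.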

Anne Mangen's research has supplied converging experimental evidence and the leading mechanistic account. Mangen, Walgermo, and Brønnick (2013) randomly assigned 72 Norwegian 10th-graders to read 1,400–2,000-word linear texts on paper or computer screen; paper readers scored significantly higher on comprehension even after controlling for decoding and vocabulary. Mangen and Kuiken's (2014) "Lost in an iPad" study found that iPad readers reported dislocation within the text and awkwardness handling the medium independent of prior e-reading experience, along with reduced narrative coherence and transportation. Mangen, Olivier, and Velay (2019) showed that Kindle readers performed worse on locating events in time and space within a long narrative—supporting the "haptic dissonance" hypothesis that the fixed spatial-kinesthetic layout of a physical book provides location cues that help readers build coherent situation models. Støle, Mangen, and Schwippert (2020) extended the print advantage to 1,139 Norwegian 10-year-olds.

The mechanistic work of Rakefet Ackerman's lab pinpoints metacognition, not raw encoding, as the key failure mode. Ackerman and Goldsmith (2011) found that under fixed study time, comprehension scores were equivalent across media, but screen readers showed ~10 percentage points of overconfidence in predicted performance. When study time was self-regulated, screen readers scored about 9 percentage points lower than paper readers (63.2% vs. 72.3%) despite investing extra time, because their monitoring signals were biased. Ackerman and Lauterman (2012) and Sidi et al. (2017) showed that screen inferiority appears specifically under time pressure or low-importance framing, contextual cues that signal shallow processing is acceptable, and disappears when participants are forced into equivalent depth. Lauterman and Ackerman (2014) demonstrated that post-reading keyword generation eliminates screen inferiority. This supports the interpretation that screens prime shallow processing habits learned from social media and texting, the "shallowing hypothesis" articulated by Nicholas Carr. Annisette and Lafreniere (2017) tested that hypothesis in 149 undergraduates and found that texting frequency and social-media use correlated negatively with need for cognition and reflective thought, independent of Big Five traits.
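Overconfidence in this literature is expressed in percentage points, which can be read as predicted performance minus actual performance. The sketch below computes that difference as a mean per-reader bias. It is a simplified illustration and not necessarily the exact measure used in the studies above; the function name and all scores are hypothetical.

```python
from statistics import mean

def calibration_bias(predicted, actual):
    """Mean of (predicted - actual) scores; positive values mean overconfidence."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal length")
    return mean(p - a for p, a in zip(predicted, actual))

# Made-up percent-correct predictions and test scores for four readers per medium.
screen_bias = calibration_bias(predicted=[80, 75, 85, 70], actual=[68, 66, 74, 62])
paper_bias = calibration_bias(predicted=[75, 70, 80, 72], actual=[74, 69, 78, 72])
# screen_bias is 10 points of overconfidence; paper_bias is 1 point
```

A reader whose predictions run well ahead of actual performance has no signal that more study is needed, which is how biased monitoring can translate into lower scores once study time is self-regulated.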

Scrolling imposes its own cost. Sanchez and Wiley (2009) showed that paginated formats produced significantly better essay comprehension than scrolled formats (p < .01), and the deficit was magnified for readers lower in working-memory capacity. Krenca, Taylor, and Deacon (2024) recently replicated the scrolling penalty in Grades 3–5, with combined scrolling plus hyperlinks producing the steepest decrements. Altamura, Vargas, and Salmerón (2025)—a meta-analysis of 40 effect sizes totaling N = 469,564—found that habitual leisure digital reading correlates only weakly with comprehension (r = .055), compared with medium-sized correlations for print reading habits.

The print-vs-digital literature supports a qualified conclusion: screens are not bad for reading, but they evoke a shallower processing stance that compounds under time pressure, expository content, and lengthy or scrolled texts.

2.2 Audio vs. text: equivalence for narrative, disadvantage for density

The most-cited direct comparison is Rogowsky, Calhoun, and Tallal (2016) in SAGE Open, who randomly assigned 91 college-educated adults to consume sections of Laura Hillenbrand's Unbroken as digital audiobook, e-text, or simultaneous dual modality. No significant differences emerged in comprehension at immediate test or after a 2-week delay (means 25.5, 27.5, 25.5 out of 38). Rogowsky, Calhoun, and Tallal (2020) replicated the null modality effect in 5th graders—though listening comprehension significantly exceeded reading comprehension at this still-decoding-constrained age.

Daniel Willingham's analysis, articulated in The Reading Mind (2017) and his 2018 New York Times essay "Is Listening to a Book the Same Thing as Reading It?", aligns with this evidence: once decoding is automatic, the same comprehension processes apply, and audiobook listening is substantively equivalent to reading for narrative material. He identifies three conditions, however, that shift the balance. First, skilled narrators aid comprehension of syntactically complex prose (Shakespeare being his example) via prosodic disambiguation. Second, print enables regressions that audio effectively denies, hurting comprehension of dense expository material. Third, audio is typically consumed while multitasking, inflating mind-wandering and memory losses.

Each claim has strong empirical grounding. On mind-wandering, Varao Sousa, Carriere, and Smilek (2013) showed within-subjects that listening produced the most mind-wandering, the worst memory performance, and the lowest rated interest among silent reading, reading aloud, and listening conditions. Varao-Sousa, Smilek, and Kingstone (2018) found that memory for audiobook content was significantly worse in naturalistic "wild" settings than in the lab, and the meta-analytic correlation of mind-wandering with reading comprehension across studies is approximately r = −0.21 (Feng et al., 2023). On regressions, the Schotter, Tran, and Rayner (2014) Psychological Science experiment is decisive: when a trailing-mask technique eliminated the ability to regress, comprehension dropped from ~84% to near-chance (~50–56%), including for unambiguous sentences—demonstrating that the 10–15% of saccades that are regressive (Rayner, 1998) do real comprehension work. Audio listeners have no equivalent repair mechanism.

On expository content specifically, Daniel and Woody (2010) in Teaching of Psychology randomly assigned 185 undergraduates to read a primary-source article or listen to the identical content as a podcast; podcast listeners scored ~28 percentage points lower on the quiz (roughly d ≈ 0.75). Furnham, Gunter, and Green (1990) found print superior to audio and audio-visual presentations for recall of scientific material, and Furnham (2001) replicated the print advantage for fiction in adults, though Furnham, De Siena, and Gunter (2002) found the opposite pattern in 11- and 13-year-old children, a crossover consistent with the role of decoding automaticity. Diakidoy, Stylianou, Karefillidou, and Papageorgiou (2005) showed in 612 Greek students across Grades 2, 4, 6, and 8 that the listening-reading gap closes with age: by Grade 8 reading exceeds listening regardless of text type, and the listening advantage fades most sharply for expository material. Rubin, Hafer, and Arata (2000) identified a classic genre × modality interaction: literate-style discourse (dense, embedded clauses) favors reading, while oral-style discourse favors listening. The Clinton-Lisell (2022) meta-analysis quantifies the pattern: the reading advantage reaches g = 0.36 for inferential comprehension, vanishes for literal recall, and depends on self-pacing.

Podcast-learning results diverge by stimulus quality. Tang et al. (2022) in Cureus found that professionally produced podcasts yielded equal or better learning gains than textbook reading in 61 medical trainees, reversing the Daniel and Woody finding—likely because high-production-value podcasts include deliberate prosodic emphasis and redundancy absent from audio versions of written articles. The Singh and Alexander (2022) systematic review in Educational Psychology Review concluded that for younger and struggling readers, audiobook comprehension equals or exceeds print (effect sizes g = 0.28 to 0.58), and that audiobook-plus-print co-presentation dramatically outperforms print alone for struggling readers and EFL learners (g = 0.32 to 1.67).

The audio-vs-text literature converges on the same moderators as print-vs-digital: narrative content, self-pacing equivalence, and reader skill dissolve differences; expository content, inferential demands, and multitasking expose them.

2.3 Cognitive load, prosody, and working memory

Mechanistic work explains why audio sometimes helps and often hurts. Prosody—the pitch, rhythm, and stress contours of speech—disambiguates syntax and scaffolds comprehension, and skilled silent readers generate analogous implicit prosody. Breen (2014) reviewed evidence that readers covertly represent intonation, phrasing, and stress during silent reading, and Ashby and Clifton (2005) and Breen and Clifton (2011, 2013) showed using eye-tracking and self-paced reading that lexical-stress expectations measurably affect reading times. Breen, Fitzroy, and Oraa Ali (2019) recorded ERPs during silent reading of stress-alternating noun-verb homographs and found that mismatched stress elicited early negativities (80–155 ms, 325–375 ms) and later positivities (365–435 ms) paralleling the ERP signatures of explicit prosodic violations in listening. Steinhauer, Alter, and Friederici's (1999) Closure Positive Shift (CPS)—evoked by prosodic boundaries in speech—has even been observed in silent reading when commas signal phrasing (Drury et al., 2016). Miller and Schwanenflugel (2008) found in a longitudinal sample of 92 children that adult-like Grade 1 prosody predicted Grade 2 reading comprehension, and Breen et al. (2016) linked imitated prosodic fluency to silent reading comprehension in high school poor comprehenders. This mechanism explains why skilled narration can enhance comprehension of syntactically complex narrative and why implicit prosody preserves comprehension during silent reading.

Dual-coding theory (Paivio, 1971, 1986, 2007; Sadoski & Paivio, 2013) proposes that concrete material is represented in both verbal (logogens) and imagistic (imagens) subsystems, yielding additive memory benefits. Clark and Paivio (1991) in Educational Psychology Review mapped DCT to instruction, arguing that combining modalities engages both subsystems. Richard Mayer's Cognitive Theory of Multimedia Learning builds on Paivio and Baddeley with three core assumptions (dual-channel, limited capacity, active processing) and 12 evidence-based principles. The modality principle—that narration plus graphics beats on-screen text plus graphics—was quantified by Ginns's (2005) meta-analysis at a median d ≈ 0.72–1.02 across ~40 studies. The redundancy principle—that adding identical on-screen text to narrated graphics hurts learning (Mayer, Heiser, and Lonn, 2001; d ≈ 0.79)—shows that more is not always better.

Baddeley's working memory model (Baddeley & Hitch, 1974; Baddeley, 2000, 2012) underlies these effects. The phonological loop handles verbal material: auditory input enters it automatically, while visual-verbal input (text) must be subvocally recoded via articulation. Baddeley, Thomson, and Buchanan (1975) demonstrated that the word-length effect disappears for visual material under articulatory suppression, confirming that silent reading engages phonological representations. Kintsch and van Dijk's (1978) model of text comprehension, the precursor to Kintsch's later construction-integration model, and Gernsbacher's (1990) Structure Building Framework, whose core operations are laying foundations, mapping, shifting, and suppression, treat comprehension processes as modality-general. Wolf et al. (2019) in Reading and Writing nonetheless found that reading and listening share only part of their variance, leaving substantial modality-specific components.

The central cognitive-load explanation for when audio hurts is the transient information effect within Sweller's Cognitive Load Theory. Mousavi, Low, and Sweller (1995) showed the classic modality effect (audio+visual beats visual-only geometry examples), but Leahy and Sweller (2011, 2016) demonstrated that with long, complex spoken material the modality effect reverses, because audio information disappears and cannot be reviewed, exceeding working memory capacity. Wong, Leahy, Marcus, and Sweller (2012) generalized the transient effect to animations. Tabbers, Martens, and van Merriënboer (2004) showed that learner-controlled pacing abolishes or reverses the audio advantage; Ginns (2005) confirmed the modality effect is larger under system-controlled pacing. This is the theoretical grounding for Willingham's regression argument: reading's permanence is an offload for limited working memory.

The absence of visual cues also hurts comprehension of material dense in unfamiliar names, technical terms, spelling distinctions, and formulae, a likely contributor to Daniel and Woody's 28-point podcast decrement. For L2 listeners, van Zeeland and Schmitt (2013) showed that listening requires ~95% lexical coverage to reach adequate comprehension (vs. the ~98% typically cited for reading), with comprehension becoming far more variable at 90% coverage, and Lund (1991) found L2 learners produced substantially more correct propositions after reading than after listening to the same passage. For readers with dyslexia, Milani, Lorusso, and Molteni's (2010) 5-month randomized study in 40 adolescents found audiobook access improved reading accuracy, school performance, and psychosocial adjustment, and the Wood, Moxley, Tighe, and Wagner (2018) meta-analysis of 22 studies estimated a text-to-speech benefit of g ≈ 0.35 for students with reading disabilities, larger for the most severe decoders. ADHD-specific evidence on audiobooks, however, remains thin and largely observational.
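Lexical coverage, the quantity behind the 95% and 98% thresholds, is the share of running words in a passage that the listener or reader already knows. The sketch below is a simplified illustration: it assumes whitespace tokenization and exact word-form matching, whereas coverage studies typically count word families. The function name, passage, and vocabulary are hypothetical.

```python
def lexical_coverage(tokens, known_words):
    """Proportion of running words (tokens) found in the learner's vocabulary."""
    if not tokens:
        raise ValueError("passage must contain at least one token")
    known = sum(1 for token in tokens if token.lower() in known_words)
    return known / len(tokens)

# A made-up 20-token passage in which the learner knows every word but one.
passage = ("the cat sat on the mat and the dog sat on the rug "
           "while the bird sang a quiet threnody").split()
vocabulary = set(passage) - {"threnody"}

coverage = lexical_coverage(passage, vocabulary)  # 19 of 20 tokens, about 0.95
```

At 95% coverage one running word in twenty is unknown. A reader can pause over that word or look back at it, while a listener must resolve it in real time.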

A synthesis: what the evidence actually supports

The field's current consensus is best summarized as "central equivalence, peripheral divergence." The neural machinery of semantic comprehension is essentially modality-invariant (Deniz et al., 2019; Binder et al., 2009), and once decoding is automatic the behavioral reading-listening correlation grows until the two are nearly interchangeable (Clinton-Lisell, 2022; Florit & Cain, 2011). But every medium imposes a distinct affordance profile that interacts with text genre, reader skill, and task demands in predictable ways.

Text type matters enormously. For narrative fiction, the three modalities are functionally equivalent in skilled adults—print, screen, and audiobook meta-analyses all converge on null or trivial differences (Clinton, 2019, g = −0.04 narrative screen effect; Schwabe et al., 2022, null narrative effect; Rogowsky et al., 2016). For expository, informational, or technical text, print maintains a small-to-moderate advantage over screens (g ≈ −0.25 to −0.32) and reading maintains a larger advantage over audio for inferential tasks (g = 0.36). The pattern points to a common mechanism: dense text benefits from the regression, self-pacing, and spatial anchoring that print and self-paced reading provide, and which continuous audio and scroll-heavy digital reading attenuate.

Reader characteristics matter. Children and struggling readers who are still bottlenecked by decoding often comprehend better from listening (Diakidoy et al., 2005; Furnham et al., 2002; Singh & Alexander, 2022) or from reading-while-listening (Chang, 2009; Clinton-Lisell, 2023). Dyslexic and L2 readers consistently benefit from audio support. Skilled adult readers of transparent orthographies show minimal modality effects. The age of "equalization" has apparently migrated earlier than Sticht's Grade 7–8 estimate (Gernand & Moser, 2017).

Task demands matter. Under time pressure, screen inferiority grows and metacognitive miscalibration exposes itself (Ackerman & Lauterman, 2012; Sidi et al., 2017; Delgado & Salmerón, 2021). Under self-paced study conditions, modality differences shrink. For gist and main-idea tasks, modalities converge; for integrative, inferential, and long-term retention tasks, print reading pulls ahead.

Purpose matters. Casual narrative reading is well served by any modality the reader finds enjoyable. Studying for retention of complex expository material favors print, self-paced reading, and active processing; audiobooks and digital texts are adequate supplements but should not be treated as substitutes without mitigation strategies like keyword generation (Lauterman & Ackerman, 2014).

Open questions and remaining debates include: (1) whether "digital native" exposure will eventually shrink screen inferiority—Delgado et al.'s (2018) widening gap and Halamish and Elbaz's (2020) finding that children are unaware of the effect even after experiencing it suggest not; (2) whether skilled professional narration can eliminate audio disadvantages for expository content, as Tang et al.'s (2022) podcast-favorable medical trainee result hints; (3) the mechanistic weight of the four candidate explanations for screen inferiority—shallowing habits, metacognitive miscalibration, scrolling-induced working-memory load, and disrupted haptic/spatial cognition—which are not mutually exclusive; (4) whether AI-generated or personalized audio narration changes the calculus; and (5) whether reading-while-listening produces real comprehension gains for proficient readers (Vu et al., 2026, recently found reading-only > read-while-listen > listen-only in intermediate-advanced L2 English learners, contradicting earlier claims).

The practical bottom line for readers, educators, and learners is neither the technophobic position that print is uniquely legitimate nor the modality-agnostic position that it does not matter. It is this: choose the medium whose affordances match the material and purpose—audio and digital for narrative enjoyment, accessibility, and multitasking consumption; print and self-paced reading for expository study, dense inference, and long-term retention; and for struggling or developing readers, audio or dual-modality as a scaffold while decoding matures.

References

Ackerman, R., & Goldsmith, M. (2011). Metacognitive regulation of text learning: On screen versus on paper. Journal of Experimental Psychology: Applied, 17(1), 18–32.

Ackerman, R., & Lauterman, T. (2012). Taking reading comprehension exams on screen or on paper? A metacognitive analysis of learning texts under time pressure. Computers in Human Behavior, 28(5), 1816–1828.

Altamura, L., Vargas, C., & Salmerón, L. (2025). Do new forms of reading pay off? A meta-analysis on the relationship between leisure digital reading habits and text comprehension. Review of Educational Research.

Annisette, L. E., & Lafreniere, K. D. (2017). Social media, texting, and personality: A test of the shallowing hypothesis. Personality and Individual Differences, 115, 154–158.

Ashby, J., & Clifton, C., Jr. (2005). The prosodic property of lexical stress affects eye movements during silent reading. Cognition, 96(3), B89–B100.

Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423.

Baddeley, A. D. (2012). Working memory: Theories, models, and controversies. Annual Review of Psychology, 63, 1–29.

Baddeley, A. D., Gathercole, S. E., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105(1), 158–173.

Baddeley, A. D., Thomson, N., & Buchanan, M. (1975). Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14(6), 575–589.

Berl, M. M., Duke, E. S., Mayo, J., Rosenberger, L. R., Moore, E. N., VanMeter, J., et al. (2010). Functional anatomy of listening and reading comprehension during development. Brain and Language, 114(2), 115–125.

Binder, J. R., Desai, R. H., Graves, W. W., & Conant, L. L. (2009). Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cerebral Cortex, 19(12), 2767–2796.

Breen, M. (2014). Empirical investigations of the role of implicit prosody in sentence processing. Language and Linguistics Compass, 8(2), 37–50.

Breen, M., & Clifton, C., Jr. (2011). Stress matters: Effects of anticipated lexical stress on silent reading. Journal of Memory and Language, 64(2), 153–170.

Breen, M., Fitzroy, A. B., & Oraa Ali, M. (2019). Event-related potential evidence of implicit metric structure during silent reading. Brain Sciences, 9(8), 192.

Breen, M., Kaswer, L., Van Dyke, J. A., Krivokapić, J., & Landi, N. (2016). Imitated prosodic fluency predicts reading comprehension ability in good and poor high school readers. Frontiers in Psychology, 7, 1026.

Clark, J. M., & Paivio, A. (1991). Dual coding theory and education. Educational Psychology Review, 3(3), 149–210.

Clinton, V. (2019). Reading from paper compared to screens: A systematic review and meta-analysis. Journal of Research in Reading, 42(2), 288–325.

Clinton-Lisell, V. (2022). Listening ears or reading eyes: A meta-analysis of reading and listening comprehension comparisons. Review of Educational Research, 92(4), 543–582.

Clinton-Lisell, V. (2023). Does reading while listening to text improve comprehension compared to reading only? Educational Research: Theory & Practice, 34(3).

Daniel, D. B., & Woody, W. D. (2010). They hear, but do not listen: Retention for podcasted material in a classroom context. Teaching of Psychology, 37(3), 199–203.

Delgado, P., & Salmerón, L. (2021). The inattentive on-screen reading: Reading medium affects attention and reading comprehension under time pressure. Learning and Instruction, 71, 101396.

Delgado, P., Vargas, C., Ackerman, R., & Salmerón, L. (2018). Don't throw away your printed books: A meta-analysis on the effects of reading media on reading comprehension. Educational Research Review, 25, 23–38.

Deniz, F., Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. (2019). The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39), 7722–7736.

Deniz, F., Tseng, C., Wehbe, L., Dupré la Tour, T., & Gallant, J. L. (2023). Semantic representations during language comprehension are affected by context. Journal of Neuroscience, 43(17), 3144–3158.

Diakidoy, I.-A. N., Stylianou, P., Karefillidou, C., & Papageorgiou, P. (2005). The relationship between listening and reading comprehension of different types of text at increasing grade levels. Reading Psychology, 26(1), 55–80.

Drury, J. E., Baum, S. R., Valeriote, H., & Steinhauer, K. (2016). Punctuation and implicit prosody in silent reading: An ERP study investigating English garden-path sentences. Frontiers in Psychology, 7, 1375.

Florit, E., & Cain, K. (2011). The simple view of reading: Is it valid for different types of alphabetic orthographies? Educational Psychology Review, 23(4), 553–576.

Foorman, B. R., Koon, S., Petscher, Y., Mitchell, A., & Truckenmiller, A. (2015). Examining general and specific factors in the dimensionality of oral language and reading in 4th–10th grades. Journal of Educational Psychology, 107(3), 884–899.

Furnham, A. (2001). Remembering stories as a function of the medium of presentation. Psychological Reports, 89, 483–486.

Furnham, A., De Siena, S., & Gunter, B. (2002). Children's and adults' recall of children's news stories in both print and audio-visual presentation modalities. Applied Cognitive Psychology, 16(6), 695–703.

Furnham, A., Gunter, B., & Green, A. (1990). Remembering science: The recall of factual information as a function of the presentation mode. Applied Cognitive Psychology, 4(3), 203–212.

García, J. R., & Cain, K. (2014). Decoding and reading comprehension: A meta-analysis to identify which reader and assessment characteristics influence the strength of the relationship in English. Review of Educational Research, 84(1), 74–111.

Gernand, K. R., & Moser, G. P. (2017). Revisiting Sticht: The changing nature of the relationship between listening comprehension and reading comprehension. Literacy Research and Instruction, 56(2), 110–124.

Gernsbacher, M. A. (1990). Language comprehension as structure building. Lawrence Erlbaum.

Gernsbacher, M. A., Varner, K. R., & Faust, M. E. (1990). Investigating differences in general comprehension skill. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(3), 430–445.

Ginns, P. (2005). Meta-analysis of the modality effect. Learning and Instruction, 15(4), 313–331.

Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7(1), 6–10.

Halamish, V., & Elbaz, E. (2020). Children's reading comprehension and metacomprehension on screen versus on paper. Computers & Education, 145, 103737.

Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing, 2(2), 127–160.

Jobard, G., Vigneau, M., Mazoyer, B., & Tzourio-Mazoyer, N. (2007). Impact of modality and linguistic complexity during reading and listening tasks. NeuroImage, 34(2), 784–800.

Kendeou, P., Bohn-Gettler, C., White, M. J., & van den Broek, P. (2008). Children's inference generation across different media. Journal of Research in Reading, 31(3), 259–272.

Kendeou, P., van den Broek, P., White, M. J., & Lynch, J. S. (2009). Predicting reading comprehension in early elementary school. Journal of Educational Psychology, 101(4), 765–778.

Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85(5), 363–394.

Kong, Y., Seo, Y. S., & Zhai, L. (2018). Comparison of reading performance on screen and on paper: A meta-analysis. Computers & Education, 123, 138–149.

Krenca, K., Taylor, G. B., & Deacon, S. H. (2024). Scrolling and hyperlinks: The effects of two prevalent digital features on children's digital reading comprehension. Journal of Research in Reading, 47(3), 269–291.

Kuhn, M. R., Schwanenflugel, P. J., & Meisinger, E. B. (2010). Aligning theory and assessment of reading fluency: Automaticity, prosody, and definitions of fluency. Reading Research Quarterly, 45(2), 230–251.

LaBerge, D., & Samuels, S. J. (1974). Toward a theory of automatic information processing in reading. Cognitive Psychology, 6(2), 293–323.

Lauterman, T., & Ackerman, R. (2014). Overcoming screen inferiority in learning and calibration. Computers in Human Behavior, 35, 455–463.

Leahy, W., & Sweller, J. (2011). Cognitive load theory, modality of presentation and the transient information effect. Applied Cognitive Psychology, 25(6), 943–951.

Leahy, W., & Sweller, J. (2016). Cognitive load theory and the effects of transient information on the modality effect. Instructional Science, 44(1), 107–123.

Lervåg, A., Hulme, C., & Melby-Lervåg, M. (2018). Unpicking the developmental relationship between oral language skills and reading comprehension. Child Development, 89(5), 1821–1838.

Lindenberg, R., & Scheef, L. (2007). Supramodal language comprehension: Role of the left temporal lobe for listening and reading. Neuropsychologia, 45(10), 2407–2415.

Lund, R. J. (1991). A comparison of second language listening and reading comprehension. The Modern Language Journal, 75(2), 196–204.

Mangen, A. (2008). Hypertext fiction reading: Haptics and immersion. Journal of Research in Reading, 31(4), 404–419.

Mangen, A., & Kuiken, D. (2014). Lost in an iPad: Narrative engagement on paper and tablet. Scientific Study of Literature, 4(2), 150–177.

Mangen, A., Olivier, G., & Velay, J.-L. (2019). Comparing comprehension of a long text read in print book and on Kindle. Frontiers in Psychology, 10, 38.

Mangen, A., Walgermo, B. R., & Brønnick, K. (2013). Reading linear texts on paper versus computer screen. International Journal of Educational Research, 58, 61–68.

Mayer, R. E. (2014). The Cambridge handbook of multimedia learning (2nd ed.). Cambridge University Press.

Mayer, R. E., Heiser, J., & Lonn, S. (2001). Cognitive constraints on multimedia learning. Journal of Educational Psychology, 93(1), 187–198.

Milani, A., Lorusso, M. L., & Molteni, M. (2010). The effects of audiobooks on the psychosocial adjustment of pre-adolescents and adolescents with dyslexia. Dyslexia, 16(1), 87–97.

Miller, J., & Schwanenflugel, P. J. (2008). A longitudinal study of the development of reading prosody. Reading Research Quarterly, 43(4), 336–354.

Mousavi, S. Y., Low, R., & Sweller, J. (1995). Reducing cognitive load by mixing auditory and visual presentation modes. Journal of Educational Psychology, 87(2), 319–334.

Paivio, A. (1986). Mental representations: A dual coding approach. Oxford University Press.

Popham, S. F., Huth, A. G., Bilenko, N. Y., Deniz, F., Gao, J. S., Nunez-Elizalde, A. O., & Gallant, J. L. (2021). Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nature Neuroscience, 24(11), 1628–1636.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.

Regev, M., Honey, C. J., Simony, E., & Hasson, U. (2013). Selective and invariant neural responses to spoken and written narratives. Journal of Neuroscience, 33(40), 15978–15988.

Reichle, E. D., Pollatsek, A., & Rayner, K. (2012). Using E-Z Reader to simulate eye movements in nonreading tasks. Psychological Review, 119(1), 155–185.

Rogowsky, B. A., Calhoun, B. M., & Tallal, P. (2016). Does modality matter? The effects of reading, listening, and dual modality on comprehension. SAGE Open, 6(3), 1–9.

Rogowsky, B. A., Calhoun, B. M., & Tallal, P. (2020). Providing instruction based on students' learning style preferences does not improve learning. Frontiers in Psychology, 11, 164.

Rubin, D. L., Hafer, T., & Arata, K. (2000). Reading and listening to oral-based versus literate-based discourse. Communication Education, 49(2), 121–133.

Sadoski, M., & Paivio, A. (2013). Imagery and text: A dual coding theory of reading and writing (2nd ed.). Routledge.

Sanchez, C. A., & Wiley, J. (2009). To scroll or not to scroll: Scrolling, working memory capacity, and comprehending complex texts. Human Factors, 51(5), 730–738.

Schotter, E. R., Angele, B., & Rayner, K. (2012). Parafoveal processing in reading. Attention, Perception, & Psychophysics, 74(1), 5–35.

Schotter, E. R., Tran, R., & Rayner, K. (2014). Don't believe what you read (only once): Comprehension is supported by regressions during reading. Psychological Science, 25(6), 1218–1226.

Schreiber, P. A. (1991). Understanding prosody's role in reading acquisition. Theory Into Practice, 30(3), 158–164.

Schwabe, F., Lind, F., Kosch, L., & Boomgaarden, H. G. (2022). No negative effects of reading on screen on comprehension of narrative texts compared to print: A meta-analysis. Media Psychology.

Sidi, Y., Shpigelman, M., Zalmanov, H., & Ackerman, R. (2017). Understanding metacognitive inferiority on screen by exposing cues for depth of processing. Learning and Instruction, 51, 61–73.

Singer, L. M., & Alexander, P. A. (2017). Reading on paper and digitally: What the past decades of empirical research reveal. Review of Educational Research, 87(6), 1007–1041.

Singer Trakhman, L. M., Alexander, P. A., & Berkowitz, L. E. (2019). Effects of processing time on comprehension and calibration in print and digital mediums. Journal of Experimental Education, 87(1), 101–115.

Singh, A., & Alexander, P. A. (2022). Audiobooks, print, and comprehension: What we know and what we need to know. Educational Psychology Review, 34(2), 677–715.

Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience, 2(2), 191–196.

Sticht, T. G., Beck, L. J., Hauke, R. N., Kleinman, G. M., & James, J. H. (1974). Auding and reading: A developmental model. Human Resources Research Organization.

Støle, H., Mangen, A., & Schwippert, K. (2020). Assessing children's reading comprehension on paper and screen: A mode-effect study. Computers & Education, 151, 103861.

Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer.

Tabbers, H. K., Martens, R. L., & van Merriënboer, J. J. G. (2004). Multimedia instructions and cognitive load theory: Effects of modality and cueing. British Journal of Educational Psychology, 74(1), 71–81.

Tang, C., Chen, S., Zhang, J., & Nanjappan, V. (2022). Multimodal evaluation of podcast learning, retention, and EEG-measured attention in medical trainees. Cureus, 14(11).

van Zeeland, H., & Schmitt, N. (2013). Lexical coverage in L1 and L2 listening comprehension. Applied Linguistics, 34(4), 457–479.

Varao Sousa, T. L., Carriere, J. S. A., & Smilek, D. (2013). The way we encounter reading material influences how frequently we mind wander. Frontiers in Psychology, 4, 892.

Varao-Sousa, T. L., Smilek, D., & Kingstone, A. (2018). In the lab and in the wild: How distraction and mind wandering affect attention and memory. Cognitive Research: Principles and Implications, 3, 42.

Willingham, D. T. (2017). The reading mind: A cognitive approach to understanding how the mind reads. Jossey-Bass.

Willingham, D. T. (2018, December 8). Is listening to a book the same thing as reading it? The New York Times.

Wolf, M. C., Muijselaar, M. M. L., Boonstra, A. M., & de Bree, E. H. (2019). The relationship between reading and listening comprehension: Shared and modality-specific components. Reading and Writing, 32(7), 1747–1767.

Wong, A., Leahy, W., Marcus, N., & Sweller, J. (2012). Cognitive load theory, the transient information effect and e-learning. Learning and Instruction, 22(6), 449–457.

Wood, S. G., Moxley, J. H., Tighe, E. L., & Wagner, R. K. (2018). Does use of text-to-speech and related read-aloud tools improve reading comprehension for students with reading disabilities? A meta-analysis. Journal of Learning Disabilities, 51(1), 73–84.
