Speculative futures on ChatGPT and generative artificial intelligence (AI): A collective reflection from the educational landscape
- Authors: Aras Bozkurt, J. Xiao, Steven Imanuel Lambert, A. Pazurek, H. Crompton, S. Koseoglu, Robert Farrow, Melissa Bond, Chrissi Nerantzi, Selina Honeychurch, Maha Bali, Jon Dron, Kamran Mir, Bonnie Stewart, Eamon Costello, Jon Mason, Christian M. Stracke, E. Romero-Hall, A. Koutropoulos, C. M. Toquero, L. Singh, A. Tlili, Kyungmee Lee, Mark Nichols, E. Ossiannilsson, M. Brown, V. Irvine, Juliana Elisa Raffaghelli, G. Santos-Hermosa, Orna Farrell, T. Adam, Y. L. Thong, S. Sani-Bozkurt, R. C. Sharma, Stefan Hrastinski, Petar Jandrić
- Journal: Dublin City University Open Access Institutional Repository (Dublin City University)
- Year: 2023
- Citations: 322
TL;DR
This speculative paper identifies 12 key themes about how ChatGPT and generative AI might reshape education, including personalised tutoring, assessment redesign, and the risk of cognitive offloading. It provides no experimental data, so its claims are expert opinion rather than tested findings; treat its recommendations as hypotheses to test in your own learning experiments.
What they tested
This is not an experiment. It is a **speculative futures paper** using a collective reflection methodology. The authors did not test any intervention. Instead, they:
- Gathered a panel of 8 international education researchers
- Asked each to write a speculative narrative about ChatGPT and generative AI in education, set between 2023 and 2030
- Analysed these narratives for recurring themes
- Synthesised the themes into a framework of promises (affordances) and pitfalls (adverse effects)
There were no comparators, no control conditions, and no outcome measures. The "intervention" was the act of collective speculation itself.
Who was studied
- **8 researchers** in education, educational technology, and AI (the byline lists 36 authors, so this count may understate the number of contributors)
- All affiliated with universities (USA, Turkey, Singapore, Australia, Canada, UK)
- No demographic details (age, gender, years of experience) are reported
- This is a **convenience sample** of experts, not a representative population
The paper does not study students, teachers, or any actual users of ChatGPT. It studies the *opinions* of a small group of academics.
How they measured it
There were no instruments, scales, or quantitative measures. The methodology was qualitative:
1. **Speculative narrative writing**: Each author wrote a 500–1000 word future scenario (2023–2030) about ChatGPT in education
2. **Thematic analysis**: The lead author (Bozkurt) coded all narratives for recurring themes using an inductive approach
3. **Collective validation**: Themes were shared with all authors for feedback and refinement
The "measurement" is entirely subjective—themes emerged from the researchers' own interpretations of their own fictional scenarios. There is no inter-rater reliability statistic, no coding framework validation, and no member-checking with external stakeholders.
Methodology
**Study design:** Speculative futures / collective reflection (qualitative, non-empirical)
**Key design features:**
- **No randomisation**: Not applicable, because this is not an experiment
- **No blinding**: All authors knew each other's identities and the study purpose
- **No control group**: There is no comparison condition (e.g., what would happen without AI)
- **Duration**: The narratives projected 7 years into the future (2023–2030), but the actual writing and analysis took place over approximately 2 months (January–February 2023)
- **Data source**: 8 fictional stories, not real-world observations
**What this design can prove:**
- It can identify what a small group of experts *think* might happen
- It can generate hypotheses for future empirical testing
- It can reveal shared concerns and hopes within a specific academic community
**What this design cannot prove:**
- It cannot demonstrate that any predicted outcome will actually occur
- It cannot measure effect sizes, probabilities, or causal relationships
- It cannot generalise to other stakeholders (students, teachers, administrators, policymakers)
- It cannot distinguish between likely and unlikely scenarios; all are treated as equally plausible
**Major methodological weaknesses:**
- **Risk of confirmation bias**: Authors were speculating about their own area of expertise, which likely reinforced pre-existing views
- **No disconfirming evidence**: The method does not allow for falsification; any theme that emerged was included
- **No external validation**: No students, teachers, or AI developers were consulted
- **Publication date**: The paper was written in early 2023, before widespread classroom adoption of ChatGPT. Many predictions may already be outdated
- **Small sample**: 8 people cannot represent the diversity of educational contexts globally
Key findings
The authors identified **12 themes** organised into two categories:
### Affordances (Promises of AI in Education)
1. **Personalised learning at scale** – AI could adapt content to individual student needs, pacing, and learning styles
2. **24/7 tutoring and support** – ChatGPT could provide on-demand help outside classroom hours
3. **Assessment redesign** – Shift from recall-based testing to process-oriented evaluation (e.g., evaluating how students use AI)
4. **Reduction of teacher administrative burden** – AI could handle grading, lesson planning, and resource creation
5. **Enhanced accessibility** – AI could support students with disabilities (e.g., text-to-speech, translation, summarisation)
6. **Democratisation of knowledge** – Free or low-cost access to expert-level information
### Adverse Effects (Pitfalls)
7. **Cognitive offloading and skill atrophy** – Students may stop developing critical thinking, writing, and problem-solving skills
8. **Academic dishonesty** – Difficulty detecting AI-generated work; erosion of authentic assessment
9. **Bias and misinformation** – AI may perpetuate existing biases or generate plausible-sounding falsehoods
10. **Loss of human connection** – Over-reliance on AI could reduce student-teacher and peer-to-peer interaction
11. **Digital divide** – Unequal access to AI tools could widen existing educational inequalities
12. **Loss of learner agency** – Students may become passive consumers rather than active creators of knowledge
**No quantitative data are reported.** No effect sizes, no confidence intervals, no p-values. The paper does not rank themes by importance, frequency, or likelihood.
Effect magnitude
Cannot be calculated. This is a qualitative synthesis of expert opinion. The authors do not provide any estimate of how large any effect might be, how many students would be affected, or how quickly changes might occur.
For context: If you were to treat this as a Delphi-style expert consensus, the typical threshold for "consensus" in Delphi studies is 70–80% agreement. This paper does not report agreement rates, so we cannot even say whether the 8 authors agreed on all 12 themes or whether some were contested.
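To make the missing statistic concrete, here is a minimal sketch of how per-theme agreement rates could be computed if each panellist had cast an agree/disagree vote. The votes below are invented purely for illustration (the paper reports none), and the function names and the 0.70 cut-off are assumptions taken from the lower end of the range quoted above.

```python
def agreement_rates(votes_by_theme):
    """Map each theme to the share of panellists who agreed (0.0 to 1.0)."""
    rates = {}
    for theme, votes in votes_by_theme.items():
        if not votes:
            raise ValueError(f"no votes recorded for theme: {theme}")
        rates[theme] = sum(votes) / len(votes)  # True counts as 1, False as 0
    return rates


def consensus_themes(votes_by_theme, threshold=0.70):
    """Return the themes whose agreement rate meets the chosen threshold."""
    rates = agreement_rates(votes_by_theme)
    return [theme for theme, rate in rates.items() if rate >= threshold]


# Hypothetical votes from an 8-person panel (NOT data from the paper).
hypothetical_votes = {
    "cognitive_offloading": [True] * 7 + [False] * 1,
    "democratisation_of_knowledge": [True] * 5 + [False] * 3,
}

rates = agreement_rates(hypothetical_votes)
reached = consensus_themes(hypothetical_votes)
print(rates)
print(reached)
```

With figures like these reported per theme, a reader could tell contested themes apart from ones the whole panel endorsed.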
Limitations
**What the authors acknowledge:**
- The speculative nature of the methodology
- That findings are "not predictions but provocations"
- The limited number of contributors
- The Western-centric perspective of the authors
- The rapid pace of AI development may outdate their scenarios
**What a critical reader would note:**
- **No empirical data whatsoever**: This paper offers no empirical evidence for any of its claims. It is expert opinion, not tested findings
- **Self-selection bias**: All authors are education researchers, many of whom may hold favourable views of technology in education. The panel was not selected to include dedicated critics of AI
- **No student or teacher voices**: The people most affected by AI in education were not consulted
- **Listed under a repository, not a journal**: The source shown here is Dublin City University's institutional repository, which is an archive rather than a peer-reviewed journal. The listing alone does not establish whether the paper underwent independent peer review
- **No conflict of interest statement**: Several authors have published extensively on AI in education, which may create a pro-AI bias
- **No replication possible**: The speculative narratives are not publicly available, so other researchers cannot verify the thematic analysis
- **Temporal limitations**: Written in early 2023, the paper misses developments like GPT-4 (released March 2023), custom GPTs, and AI integration into learning management systems
- **No discussion of cost**: The financial implications of AI adoption in education are not addressed
- **No discussion of teacher training**: How educators would learn to use AI effectively is not explored
Practical takeaways
For someone running their own n=1 experiment:
This paper is not a source of tested interventions. However, its themes can be converted into testable hypotheses for personal learning experiments. Here is how to test the most actionable claims:
### What to test
**Hypothesis 1: AI-assisted learning improves retention compared to self-study alone**
- **Intervention**: Use ChatGPT to generate practice questions, explain concepts, and provide feedback on your understanding of a topic (e.g., a chapter in a textbook)
- **Dose**: 30 minutes of AI-assisted study per day for 2 weeks
- **Control**: 30 minutes of traditional self-study (re-reading, note-taking) per day on the same topic for 2 weeks
- **Design**: AB/BA crossover. Spend 2 weeks on AI-assisted study and 2 weeks on traditional study (in either order), with a washout period of 3 days in between
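The crossover design above can be turned into concrete calendar dates. The following is a minimal sketch, assuming 14 days per condition (matching the Dose and Control lines) and a 3-day washout; the function name, the condition labels, and the random choice of which condition comes first are illustrative assumptions rather than anything the paper prescribes.

```python
import random
from datetime import date, timedelta


def crossover_schedule(start, days_per_condition=14, washout_days=3, seed=None):
    """Return [(condition, first_day, last_day), ...] for a two-period crossover."""
    rng = random.Random(seed)
    order = ["ai_assisted", "self_study"]
    rng.shuffle(order)  # randomise which condition comes first

    first_start = start
    first_end = first_start + timedelta(days=days_per_condition - 1)
    # The washout days sit between the last day of period 1 and the first day of period 2.
    second_start = first_end + timedelta(days=washout_days + 1)
    second_end = second_start + timedelta(days=days_per_condition - 1)
    return [
        (order[0], first_start, first_end),
        (order[1], second_start, second_end),
    ]


schedule = crossover_schedule(date(2025, 1, 6), seed=1)
for condition, first_day, last_day in schedule:
    print(f"{condition}: {first_day} to {last_day}")
```

Letting a random draw pick the order removes the temptation to start with whichever condition you expect to work better.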
**Hypothesis 2: Cognitive offloading reduces writing skill**
- **Intervention**: Use ChatGPT to draft all your written work (emails, reports, journal entries) for 1 week
- **Control**: Write everything yourself for 1 week
- **Measure**: Compare writing speed, vocabulary diversity, and self-rated writing confidence between conditions
### Minimum meaningful duration
- For learning experiments: **2 weeks per condition** (allows for initial novelty effects to wear off)
- For skill atrophy experiments: **1 week per condition** (cognitive offloading effects may appear quickly)
- For long-term retention: **Test again at 1 month and 3 months** after the experiment ends
### What to measure (specific metrics)
**For learning experiments:**
- **Retention**: Score on a custom quiz (20 questions, multiple choice + short answer) taken immediately after study and again 1 week later
- **Time to mastery**: Minutes needed to reach 80% correct on practice questions
- **Self-reported understanding**: 1–10 scale ("I understand this topic well enough to explain it to someone else")
- **Confidence calibration**: Compare self-rated confidence with actual quiz score (overconfidence = poor calibration)
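Two of the learning metrics above reduce to simple arithmetic. This is a minimal sketch, assuming the 1–10 confidence rating is mapped to a percentage by multiplying by 10 (one possible mapping among several) and that retention loss is the difference between the immediate and 1-week quiz scores; all function names are illustrative.

```python
def quiz_percent(correct, total):
    """Quiz score as a percentage of questions answered correctly."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    return 100 * correct / total


def calibration_gap(confidence, correct, total):
    """Self-rated confidence minus actual score, in percentage points.

    Positive values indicate overconfidence, negative values underconfidence.
    Assumes a 1-10 rating maps linearly to 10%-100%.
    """
    if not 1 <= confidence <= 10:
        raise ValueError("confidence must be on the 1-10 scale")
    return confidence * 10 - quiz_percent(correct, total)


def retention_drop(immediate_correct, delayed_correct, total):
    """Percentage points lost between the immediate quiz and the 1-week quiz."""
    return quiz_percent(immediate_correct, total) - quiz_percent(delayed_correct, total)


gap = calibration_gap(confidence=8, correct=12, total=20)
drop = retention_drop(immediate_correct=16, delayed_correct=13, total=20)
print(gap, drop)
```

Comparing the gap between conditions shows whether one study method leaves you feeling more prepared than your scores justify.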
**For cognitive offloading experiments:**
- **Writing speed**: Words per minute in a timed writing task
- **Vocabulary diversity**: Type-token ratio (unique words / total words) in a 500-word essay
- **Self-efficacy**: 1–10 scale ("I can write effectively without AI assistance")
- **Dependence**: Number of times you voluntarily use AI during the control condition (a measure of withdrawal)
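The two text-based metrics above can be computed directly from an essay. This is a minimal sketch, assuming a simple tokeniser that lowercases the text and treats runs of letters and apostrophes as words; the function names are illustrative, and other tokenisation choices would give slightly different ratios.

```python
import re


def type_token_ratio(text):
    """Unique words divided by total words, after lowercasing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    return len(set(tokens)) / len(tokens)


def words_per_minute(text, minutes):
    """Whitespace-separated word count divided by the time spent writing."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return len(text.split()) / minutes


sample = "the cat sat on the mat"
ttr = type_token_ratio(sample)
wpm = words_per_minute("one two three four", minutes=2)
print(ttr, wpm)
```

Type-token ratio falls as a text gets longer, because common words repeat, so comparisons are only fair between essays of the same length. That is the reason for fixing the essay at 500 words.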
### Key confounds to control for
- **Topic difficulty**: Keep difficulty matched across conditions. Use material from the same subject (e.g., adjacent chapters of the same textbook) rather than re-studying the identical chapter, because what you learn in the first period would carry over into the second
- **Time of day**: Study at the same time each day (e.g., 7–7:30 PM)
- **Prior knowledge**: Take a pre-test to ensure you start with similar baseline knowledge
- **Sleep and stress**: Log sleep quality (1–10) and stress level (1–10) daily, since both affect learning
- **AI prompt quality**: Use the same prompt structure each time (e.g., "Explain [concept] at a [grade] level and give 3 practice questions")
- **Distractions**: Study in the same environment (same room, same device, same noise level)
- **Expectation effects**: You may believe AI helps more than it does. You cannot hide from yourself whether you are using AI, but you can blind the scoring: have a friend grade your quizzes or essays without knowing which condition produced them
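Logging the confounds above is easier with a fixed record format. The following is a minimal sketch of a daily log entry that validates the two 1–10 ratings and exports to CSV; the field names and the `DailyLog` class are illustrative assumptions, not part of the paper.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DailyLog:
    day: str               # ISO date, e.g. "2025-01-06"
    condition: str         # e.g. "ai_assisted" or "self_study"
    sleep_quality: int     # 1-10
    stress_level: int      # 1-10
    minutes_studied: int
    ai_uses: int           # number of times AI was used that day

    def __post_init__(self):
        # Reject ratings outside the 1-10 scale so typos do not skew averages.
        for name in ("sleep_quality", "stress_level"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


def to_csv(logs):
    """Serialise a list of DailyLog entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DailyLog)])
    writer.writeheader()
    for log in logs:
        writer.writerow(asdict(log))
    return buffer.getvalue()


csv_text = to_csv([DailyLog("2025-01-06", "ai_assisted", 7, 4, 30, 3)])
print(csv_text)
```

Recording AI use in both conditions also supplies the dependence count, since any non-zero `ai_uses` on a self-study day is a lapse worth noting.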
### What a positive result would look like
**For Hypothesis 1 (AI improves learning):**
- Quiz score is ≥15 percentage points higher in the AI condition (e.g., 75% vs 60%)
- Time to mastery is ≥20% shorter (e.g., 24 minutes vs 30 minutes)
- Self-rated understanding is ≥2 points higher on the 1–10 scale
- These effects persist at the 1-month follow-up
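The first three criteria for Hypothesis 1 can be checked mechanically. This is a minimal sketch that reads the quiz threshold as 15 percentage points (which is what the 75% vs 60% example implies) and the time threshold as a 20% relative reduction; the dictionary keys and function name are illustrative assumptions, and the follow-up criterion would need a second call with the 1-month scores.

```python
def check_hypothesis_1(ai, control):
    """Compare one AI period with one control period against the stated thresholds."""
    score_gain_pp = ai["quiz_pct"] - control["quiz_pct"]
    time_saving_pct = (
        100
        * (control["minutes_to_mastery"] - ai["minutes_to_mastery"])
        / control["minutes_to_mastery"]
    )
    understanding_gain = ai["understanding"] - control["understanding"]
    return {
        "quiz_score": score_gain_pp >= 15,          # percentage points
        "time_to_mastery": time_saving_pct >= 20,   # relative reduction in percent
        "understanding": understanding_gain >= 2,   # points on the 1-10 scale
    }


ai_period = {"quiz_pct": 75, "minutes_to_mastery": 24, "understanding": 8}
control_period = {"quiz_pct": 60, "minutes_to_mastery": 30, "understanding": 6}

result = check_hypothesis_1(ai_period, control_period)
print(result)
```

Fixing the thresholds in code before collecting any data makes it harder to move the goalposts after seeing the results.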
**For Hypothesis 2 (AI causes skill atrophy):**
- Writing speed drops ≥10% in the AI condition (e.g., 25 wpm vs 28 wpm)
- Vocabulary diversity drops ≥5% (e.g., type-token ratio 0.45 vs 0.48)
- Self-efficacy drops ≥1 point on the 1–10 scale
- You find yourself wanting to use AI during the control condition (withdrawal symptom)
**Important caveat:** A single n=1 experiment cannot prove causation. But if you observe consistent effects across multiple topics and multiple weeks, you have suggestive personal evidence. For more robust results, run the experiment with 3–5 cycles (each 2 weeks) and look for the same pattern each time.
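A short sketch shows why the number of cycles matters. It checks whether every cycle's difference (AI minus control) points the same way, and computes how often that would happen by chance if each cycle were an independent coin flip with no real effect. The coin-flip model is a simplifying assumption in the spirit of a sign test, and the function names and example differences are illustrative.

```python
def consistent_direction(differences, min_cycles=3):
    """True if there are enough cycles and every difference has the same non-zero sign."""
    if len(differences) < min_cycles:
        return False
    return all(d > 0 for d in differences) or all(d < 0 for d in differences)


def chance_of_unanimous(cycles):
    """Probability that all cycles agree in direction under a 50/50 no-effect model."""
    if cycles < 1:
        raise ValueError("need at least one cycle")
    return 2 * 0.5 ** cycles  # all positive or all negative


# Hypothetical quiz-score differences (AI minus control) in percentage points.
cycle_differences = [12, 8, 15]

unanimous = consistent_direction(cycle_differences)
by_chance_3 = chance_of_unanimous(3)
by_chance_5 = chance_of_unanimous(5)
print(unanimous, by_chance_3, by_chance_5)
```

Under this model, three unanimous cycles would occur by chance a quarter of the time, while five would occur about 6% of the time, which is why the upper end of the 3–5 range is the more convincing target.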
### What this paper specifically suggests you watch for
Based on the 12 themes, here are specific confounds to monitor in your own experiments:
- **Novelty effect**: AI may seem helpful at first because it's new. Run experiments for at least 2 weeks to see if benefits persist
- **Over-reliance creep**: You may start using AI for tasks you didn't intend to. Log all AI use daily
- **Bias in AI responses**: ChatGPT may give incorrect or biased information. Fact-check all AI outputs during your experiment
- **Emotional impact**: Note how you feel after AI-assisted study vs self-study. Do you feel more or less confident? More or less engaged?
- **Transfer**: Does AI help you learn the specific topic, or does it help you learn *how to learn*? Test with a novel topic after the experiment ends
### Bottom line for self-experimenters
This paper is a useful **hypothesis generator**, not a source of evidence. Its 12 themes give you a menu of claims to test on yourself. The most testable are: (1) AI as a learning tool, (2) AI as a writing crutch, and (3) AI as a source of misinformation. Run 2-week crossover experiments with clear metrics, control for confounds, and treat any single result as preliminary. If you find consistent patterns across multiple cycles, you have discovered something true for *you*—which is more than this paper can claim for anyone.