Who Speaks in Plato’s Dialogues? A Statistical Analysis of Socratic Speech Patterns Using R

Plato’s dialogues are famous for their dramatic structure: philosophical inquiry plays out through characters in conversation, with Socrates often at the center. But how dominant is Socrates, quantitatively, in these works? Does he always do most of the talking, or do his interlocutors sometimes take the lead?

In this post, I use R and the tidyverse to perform a basic text analysis of Euthyphro, one of Plato’s early dialogues, focusing on the distribution of dialogue among characters.

Step 1: Getting the Text

For this analysis, I used the public domain translation of Euthyphro from Project Gutenberg. The dialogue is formatted as a series of speaker-labeled paragraphs.
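Project Gutenberg plain-text files usually wrap the work in a licence header and footer, marked off by lines beginning "*** START OF" and "*** END OF". Here is a minimal sketch for trimming them, assuming those marker lines are present in the file. The helper name strip_gutenberg and the toy input are my own, for illustration only.

```r
library(tidyverse)

# Hypothetical helper: keep only the lines between the Gutenberg markers.
# Assumes the file contains lines beginning "*** START OF" and "*** END OF";
# if either marker is missing, the input is returned unchanged.
strip_gutenberg <- function(lines) {
  start <- str_which(lines, "^\\*\\*\\* ?START OF")
  end <- str_which(lines, "^\\*\\*\\* ?END OF")
  if (length(start) == 0 || length(end) == 0) {
    return(lines)
  }
  first <- start[1] + 1
  last <- end[1] - 1
  if (first > last) {
    return(character(0))
  }
  lines[first:last]
}

# Toy input standing in for read_lines("euthyphro.txt")
toy_lines <- c(
  "Licence header",
  "*** START OF THE PROJECT GUTENBERG EBOOK ***",
  "SOCRATES: A made-up line.",
  "EUTHYPHRO: Another made-up line.",
  "*** END OF THE PROJECT GUTENBERG EBOOK ***",
  "Licence footer"
)
body_lines <- strip_gutenberg(toy_lines)
```

On the real file, the same call would be applied to the result of read_lines() before any parsing.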

Step 2: Cleaning and Structuring the Data

The text isn’t in a ready-made dataset, so I wrote some simple R code to extract lines and tag speakers.

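library(tidyverse)

# Load the raw text, one element per physical line
lines <- read_lines("euthyphro.txt")

# Plain-text files are often hard-wrapped, so one speech can span several
# lines. Group lines into paragraphs (separated by blank lines) and rejoin
# each one, so the continuation lines of a speech are not dropped.
paragraphs <- tibble(line = str_squish(lines)) %>%
  mutate(para = cumsum(line == "")) %>%
  filter(line != "") %>%
  group_by(para) %>%
  summarise(text = str_c(line, collapse = " "), .groups = "drop") %>%
  pull(text)

# Keep paragraphs that start with a speaker label (e.g., "SOCRATES:")
dialogue_lines <- paragraphs[str_detect(paragraphs, "^[A-Z]+:")]

# Split each paragraph into speaker and speech at the first colon
dialogue_df <- tibble(raw = dialogue_lines) %>%
  separate(raw, into = c("speaker", "speech"), sep = ":", extra = "merge") %>%
  mutate(
    speaker = str_trim(speaker),
    speech = str_trim(speech),
    word_count = str_count(speech, boundary("word"))
  )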
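Before counting anything, it is worth checking which labels the pattern actually picked up. Any capitalised word followed by a colon matches, so a front-matter line such as "SCENE: ..." would be parsed as a speaker if the file contains one. The sketch below uses made-up rows in place of dialogue_df, and the list of expected speakers is my assumption.

```r
library(tidyverse)

# Made-up rows standing in for dialogue_df; the SCENE row mimics a stage
# direction that the "^[A-Z]+:" pattern would also match.
toy_df <- tibble(
  speaker = c("SCENE", "SOCRATES", "EUTHYPHRO", "SOCRATES"),
  speech = c("A porch.", "What is piety?", "Doing as I do.", "Is it?")
)

# List every label the parser found, with the number of turns for each
label_counts <- toy_df %>% count(speaker, sort = TRUE)

# Keep only the labels that are actual speakers (assumed list)
speakers <- c("SOCRATES", "EUTHYPHRO")
clean_df <- toy_df %>% filter(speaker %in% speakers)
```

Any unexpected label in label_counts is a sign that the pattern or the speaker list needs adjusting.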

Step 3: Who Talks the Most?

Let’s sum up total words spoken by each character:

dialogue_df %>%
  group_by(speaker) %>%
  summarise(total_words = sum(word_count)) %>%
  arrange(desc(total_words)) %>%
  ggplot(aes(x = reorder(speaker, -total_words), y = total_words, fill = speaker)) +
  geom_col(show.legend = FALSE) +
  labs(
    title = "Total Words Spoken in Plato's Euthyphro",
    x = "Speaker",
    y = "Word Count"
  ) +
  theme_minimal()

Result:

Socrates, unsurprisingly, dominates the conversation, accounting for roughly 70–80% of all words.
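The percentage can be computed directly rather than read off the chart. This is a small sketch of that calculation. The word counts are invented for illustration and do not reproduce the figure for Euthyphro.

```r
library(tidyverse)

# Invented word counts standing in for dialogue_df
toy_df <- tibble(
  speaker = c("SOCRATES", "EUTHYPHRO", "SOCRATES", "EUTHYPHRO"),
  word_count = c(30, 10, 45, 15)
)

# Total words per speaker, then each speaker's share of all words
word_share <- toy_df %>%
  group_by(speaker) %>%
  summarise(total_words = sum(word_count), .groups = "drop") %>%
  mutate(share = total_words / sum(total_words)) %>%
  arrange(desc(share))
```

Running the same pipeline on dialogue_df gives one row per speaker, with share as a proportion between 0 and 1.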

Step 4: Turn-Taking and Dialogue Structure

Who speaks when? Let’s plot the turn-taking sequence.

dialogue_df %>%
  mutate(turn = row_number()) %>%
  ggplot(aes(x = turn, y = speaker, color = speaker)) +
  geom_point(size = 2) +
  labs(
    title = "Turn-Taking in Euthyphro",
    x = "Dialogue Turn",
    y = "Speaker"
  ) +
  theme_minimal()
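A plot makes the overall rhythm visible, but a quick numeric check can flag places where the same speaker appears in consecutive turns, which may point to a parsing problem or a genuine multi-paragraph speech. The sketch below uses base R's rle() on a made-up speaker sequence.

```r
library(tidyverse)

# Made-up sequence standing in for dialogue_df$speaker
toy_df <- tibble(
  speaker = c("SOCRATES", "EUTHYPHRO", "SOCRATES", "SOCRATES", "EUTHYPHRO")
)

# rle() collapses consecutive repeats into runs; a run longer than 1 means
# the same speaker was recorded for back-to-back turns
runs <- rle(toy_df$speaker)
run_df <- tibble(speaker = runs$values, run_length = runs$lengths)

# Runs where strict alternation breaks down
repeated <- run_df %>% filter(run_length > 1)
```

An empty repeated table means the speakers alternate strictly.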

Step 5: Average Words per Turn

dialogue_df %>%
  group_by(speaker) %>%
  summarise(avg_words = mean(word_count)) %>%
  ggplot(aes(x = reorder(speaker, -avg_words), y = avg_words, fill = speaker)) +
  geom_col(show.legend = FALSE) +
  labs(
    title = "Average Words per Turn",
    x = "Speaker",
    y = "Words per Turn"
  ) +
  theme_minimal()

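Here we see that Socrates' turns are longer on average. Because the two speakers largely alternate, each takes about the same number of turns, so his larger share of the words comes from longer turns rather than more frequent ones.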
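A mean can be pulled up by a handful of long speeches, so it may help to look at the number of turns and the median alongside it. The sketch below uses invented word counts.

```r
library(tidyverse)

# Invented word counts standing in for dialogue_df
toy_df <- tibble(
  speaker = c("SOCRATES", "EUTHYPHRO", "SOCRATES", "EUTHYPHRO", "SOCRATES", "EUTHYPHRO"),
  word_count = c(40, 5, 8, 7, 12, 6)
)

# Turns, mean, and median words per turn for each speaker
turn_stats <- toy_df %>%
  group_by(speaker) %>%
  summarise(
    turns = n(),
    mean_words = mean(word_count),
    median_words = median(word_count),
    .groups = "drop"
  )
```

If the mean is much larger than the median for a speaker, a few long speeches are doing most of the work.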

Why This Matters

This analysis gives a quantitative footing to a familiar picture of the Socratic method. Socrates uses questions, analogies, and hypothetical arguments to guide his interlocutors, and the word counts are consistent with him controlling the tempo and structure of the dialogue.

The same techniques can be extended to:

Compare different dialogues (Apology, Republic, Meno, etc.)
Track Socrates’ dominance over Plato’s career
Examine interruptions, rhetorical devices, or sentiment shifts
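As a rough sketch of the first idea, the parsed data frames for several dialogues could be stacked with a column naming the dialogue, and Socrates' share computed per group. The column name dialogue and all the numbers below are assumptions for illustration, not results.

```r
library(tidyverse)

# Invented word counts for two dialogues; a real run would bind the parsed
# data frame for each text, adding a `dialogue` column (an assumed name)
toy_corpus <- tibble(
  dialogue = c("Euthyphro", "Euthyphro", "Meno", "Meno", "Meno"),
  speaker = c("SOCRATES", "EUTHYPHRO", "SOCRATES", "MENO", "BOY"),
  word_count = c(60, 40, 70, 20, 10)
)

# Socrates' share of all words, one row per dialogue
socrates_share <- toy_corpus %>%
  group_by(dialogue) %>%
  summarise(
    share = sum(word_count[speaker == "SOCRATES"]) / sum(word_count),
    .groups = "drop"
  )
```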

Conclusion

By combining R, the tidyverse, and a love of classical texts, we can bring new life to ancient philosophy. This kind of work sits at the exciting crossroads of digital humanities and data science.

Next Steps

Expand to multiple dialogues to study longitudinal changes in Socrates’ speech
Add sentiment analysis for tonal shifts
Apply network analysis to map who speaks to whom
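For the network idea, a first step could be an edge list built from consecutive turns. This sketch assumes that each turn is addressed to whoever speaks next, which is a simplification. With only two speakers the result is trivial, so it becomes interesting mainly in dialogues with a larger cast.

```r
library(tidyverse)

# Made-up sequence standing in for dialogue_df$speaker
toy_df <- tibble(
  speaker = c("SOCRATES", "EUTHYPHRO", "SOCRATES", "EUTHYPHRO", "SOCRATES")
)

# Pair each turn with the next speaker (assumed addressee), drop the final
# turn, which has no successor, and count each directed pair
edges <- toy_df %>%
  mutate(to = lead(speaker)) %>%
  filter(!is.na(to)) %>%
  count(from = speaker, to, name = "weight")
```

The resulting from/to/weight table is in the shape that graph packages generally accept as a weighted edge list.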