Open Yoda Corpus

The language of one of the Star Wars franchise’s most enigmatic and powerful characters, the tiny green Jedi Master Yoda, has attracted a fair bit of attention from linguists due to its idiosyncrasies. A full review and bibliography of Yoda linguistics is beyond the scope of this blog post, but see for instance here, here and here.

Most people agree on the basic facts: what Yoda really likes to do is take a predicate or a verb phrase and stick it at the start of the clause. However, there is variation, and there are interesting nuances to be explored. As a service to the Yoda linguistics community I’ve collected all of Yoda’s lines from the movies and made them available here. Do what you want with them! (Of course, I make no claim to own Yoda or any of his utterances – more’s the pity.)

The format is tab-separated, with the line itself in the first column and a code for the movie it comes from in the second column. The sources are, in the following order:

  • The Empire Strikes Back (ESB): own watching and transcription
  • Return of the Jedi (RJ): own watching and transcription
  • The Phantom Menace (PM): this script
  • Attack of the Clones (AC): this script
  • Revenge of the Sith (RS): this script
  • The Last Jedi (LJ): own watching and transcription, plus this link

With the prequel scripts, I’ve made some slight editorial tweaks to fix obvious typos and weird punctuation, but otherwise remained faithful. Yoda doesn’t appear in A New Hope or The Force Awakens.

The corpus itself can be found here.
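
If you want to poke at the file programmatically, something like the following should get you started. This is only a rough sketch in Python, and it rests on a few assumptions of mine: that the file is UTF-8, that it has no header row, and that you've saved it locally under a name of your choosing (the `yoda_corpus.tsv` in the comments is just a placeholder). The function names are made up for illustration; the movie codes are the ones listed above.

```python
import csv
from collections import Counter

# Movie codes as described in the post.
MOVIE_CODES = {"ESB", "RJ", "PM", "AC", "RS", "LJ"}


def parse_corpus(lines):
    """Turn an iterable of tab-separated text lines into (utterance, code) pairs.

    Assumes: utterance in column 1, movie code in column 2, no header row.
    """
    # QUOTE_NONE so that quotation marks inside an utterance are kept literally.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        if len(record) < 2:
            continue  # skip blank or malformed rows
        rows.append((record[0].strip(), record[1].strip()))
    return rows


def load_corpus(path):
    """Read the corpus from a local file (the path is whatever you saved it as)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_corpus(handle)


def lines_per_movie(rows):
    """Count how many utterances carry each movie code."""
    return Counter(code for _, code in rows)


# Example use, assuming a local copy named yoda_corpus.tsv (placeholder name):
#   rows = load_corpus("yoda_corpus.tsv")
#   print(lines_per_movie(rows))
#   unknown = {code for _, code in rows if code not in MOVIE_CODES}
```

Keeping the parsing separate from the file handling means you can also feed it lines pasted straight into a list, which is handy for quick experiments.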

Featured image: Yoda statue in California; photo from Wikimedia Commons, by GPS (CC-BY-SA 2.0).