The Folger Shakespeare (formerly Folger Digital Texts) uses eXtensible Markup Language (XML) to encode our master files. XML is a semantic encoding language that lets encoders include information “behind the scenes” of the visible text, which can then be used for specialized searching and analysis, visualizations, and other applications. The special information in our encoded texts includes details about which characters are entering or exiting a scene, which character is delivering a speech, and even when each character dies.
The Folger Shakespeare follows the Text Encoding Initiative (TEI) Guidelines, which have become the standard for encoding literary texts. If you’re new to XML or the TEI Guidelines and want to learn more, helpful online resources to get you started include the W3Schools XML Tutorial and TEI By Example.
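As a rough illustration of how this kind of “behind the scenes” information can be queried, the sketch below counts speeches per character and lists entrances and exits using Python's standard library. The embedded sample is a hand-written fragment, not an excerpt from an actual Folger file: the use of `sp` and `stage` elements follows general TEI practice, but the `type` values, the `who` attribute on `stage`, and the character identifiers are assumptions made for the example and may differ from the real files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample. Attribute names and values are
# assumptions for illustration, not copied from a Folger master file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="scene">
      <stage type="entrance" who="#Barnardo #Francisco">Enter Barnardo and Francisco.</stage>
      <sp who="#Barnardo"><speaker>BARNARDO</speaker><ab>Who's there?</ab></sp>
      <sp who="#Francisco"><speaker>FRANCISCO</speaker><ab>Nay, answer me.</ab></sp>
      <sp who="#Barnardo"><speaker>BARNARDO</speaker><ab>Long live the King!</ab></sp>
      <stage type="exit" who="#Francisco">Francisco exits.</stage>
    </div>
  </body></text>
</TEI>"""


def speeches_per_character(root):
    """Count <sp> elements per character referenced in the who attribute."""
    counts = Counter()
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        for ref in sp.get("who", "").split():
            counts[ref.lstrip("#")] += 1
    return counts


def stage_movements(root, kind):
    """List characters named on <stage> elements of the given type."""
    names = []
    for stage in root.iterfind(".//tei:stage", TEI_NS):
        if stage.get("type") == kind:
            names.extend(ref.lstrip("#") for ref in stage.get("who", "").split())
    return names


root = ET.fromstring(SAMPLE)
counts = speeches_per_character(root)
entrances = stage_movements(root, "entrance")
exits = stage_movements(root, "exit")

print(dict(counts))
print("Enter:", entrances)
print("Exit:", exits)
```

Before running anything like this against a downloaded file, it is worth opening the file and checking which elements and attributes it actually uses, then adjusting the queries to match.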
“HTML” stands for HyperText Markup Language, the standard markup language of the web. These files provide quality reading texts that you can download and open in any browser, with or without an internet connection, and they give web developers a simple framework for creating their own non-commercial applications.
“PDF,” or Portable Document Format, is a stable, static format designed to display the same way regardless of the device or software used to view it. These PDF files can be used for offline reading, as printable content, or for reading on any e-reading device that supports PDF display.
The Folger Shakespeare provides .doc format versions of all its plays, poems, and sonnets. This format is a great choice when you want to use Folger texts for scripts, for excerpts in publications or syllabi, or for any other project that requires the text to be editable in word processing software. Unlike the XML, HTML, or PDF files, the .doc texts are missing critical editing marks and some formatting (such as indentation of linked lines). This format is similar to the .txt format, but it includes more sophisticated formatting and special characters. We recommend using .doc over .txt unless a completely unadorned text is a priority.
The Folger Shakespeare provides .txt format files for projects and applications where simplicity and/or stability is the highest priority. These 7-bit ASCII-encoded files are the most likely to render properly in the widest range of applications and the least likely to present conversion errors when being incorporated into text analysis tools. However, they also lack formatting, critical editing marks, and special characters. Note that because special characters are not present, accents on words will be missing, which changes the meter of the affected lines. We recommend using one of the other formats offered unless a completely unadorned text is a priority.
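To make the point about missing accents concrete, here is a minimal sketch of what reducing text to 7-bit ASCII does to an accented word. It is only an illustration of the general effect using Python's standard library; the function name `to_plain_ascii` is hypothetical, and this is not a description of how the Folger .txt files were actually produced.

```python
import unicodedata


def to_plain_ascii(text):
    """Reduce text to 7-bit ASCII by dropping anything that cannot be kept.

    NFKD normalization splits an accented letter into its base letter plus
    a combining mark; encoding with errors="ignore" then discards the mark
    (and any other non-ASCII character, such as curly quotation marks).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# The grave accent marks the final "-ed" as a separately pronounced syllable.
accented = "banishèd"
plain = to_plain_ascii(accented)

print(accented, "->", plain)
```

With the accent, the word scans as three syllables; in the plain ASCII form a reader will naturally pronounce it as two, which is why the meter of such a line can no longer be recovered from the .txt file alone.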
TEI Simple aims to define a new, highly constrained and prescriptive subset of the Text Encoding Initiative (TEI) Guidelines suited to the representation of early modern and modern books; a formally defined set of processing rules that permit modern web applications to easily present and analyze the encoded texts; a mapping to other ontologies; and processes to describe the encoding status and richness of a TEI digital text.
The major goal of recasting the texts in this manner is to make them interoperable with a large corpus of early modern texts derived from the EEBO-TCP transcriptions and encoded in TEI Simple with linguistic annotation.
Dr. Martin Mueller, Professor Emeritus of English and Classics at Northwestern University, in collaboration with Michael Poston, Encoding Architect and Digital Editor of The Folger Shakespeare, has adapted The Folger Shakespeare’s original XML files into TEI Simple, and is sharing them online for those interested in using this format for their own research and projects.