Transcription Conventions and Guidelines
In general, follow diplomatic transcription conventions. To the greatest extent possible, write exactly what you see on the page, including all misspellings, abbreviations, characters (“&”, “$”, “@”, “£”), and punctuation marks. Record all diacritics and note their presence in the tracker. Other features, such as underlining, do not need to be recorded. Avoid the use of special characters for punctuation (e.g. the en dash “–”, the em dash “—”, underscores, curly quotes, curly apostrophes).
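The punctuation rule above could be checked mechanically before a page is marked complete. The following is a minimal sketch, assuming each transcribed line is available as a plain Python string. The name find_disallowed and the character table are illustrative and not part of any project tooling. Underscores are accepted only inside an illegible marker such as [__].

```python
import re

# Characters the guidelines ask transcribers to avoid for punctuation.
# This table is an illustrative assumption, not an official project list.
DISALLOWED = {
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u201c": "curly double quote",
    "\u201d": "curly double quote",
    "\u2018": "curly single quote or apostrophe",
    "\u2019": "curly single quote or apostrophe",
}

# Underscores are allowed only inside an illegible marker such as [__].
ILLEGIBLE_MARKER = re.compile(r"\[_+\]")


def find_disallowed(line):
    """Return (position, description) pairs for disallowed characters."""
    # Blank out illegible markers with same-length filler so that
    # reported positions still match the original line.
    masked = ILLEGIBLE_MARKER.sub(lambda m: " " * len(m.group()), line)
    problems = []
    for pos, ch in enumerate(masked):
        if ch in DISALLOWED:
            problems.append((pos, DISALLOWED[ch]))
        elif ch == "_":
            problems.append((pos, "underscore"))
    return problems
```

An empty list means the line passes this particular check. It does not check any of the other conventions on this page.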
Crossed-out text
By default, treat crossed-out text as illegible.
If you can make out the text without great difficulty, place the crossed-out text in square brackets.
For extended passages of crossed-out text where the text is still largely legible, please transcribe the text normally.
Superscript writing
Transcribe all superscript letters with a “^” preceding the superscript text, as in “y^e”.
Words broken across a line
Regardless of the punctuation used in the document (“-”, “=”, or “:”), transcribe with “-”. If the writer did not note the line break, do not use a dash.
Illegibles
Always make your best guess at illegibles and notate them in the project log for a supervisor to review. For things that are truly illegible (e.g. damage on the page, heavily crossed-out text, etc.), place underscores between brackets, with one underscore for each illegible character (make your best guess at the number of characters): [__]
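As a small illustration of the marker format, the sketch below builds the bracketed placeholder from an estimated character count. The helper name illegible_marker is hypothetical, and the count is always the transcriber's own estimate.

```python
def illegible_marker(estimated_chars):
    """Build a marker with one underscore per illegible character."""
    if estimated_chars < 1:
        raise ValueError("estimate at least one illegible character")
    return "[" + "_" * estimated_chars + "]"


# Example: a three-letter word lost to page damage.
line = "paid to " + illegible_marker(3) + " Smith"
```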
Long periods
Transcribe all long periods as a simple dash followed by a space: -
Long S
Transcribe long S as a standard lowercase s.
Rounded R
Transcribe the rounded R (R rotunda) as a standard lowercase r.
Per sign
Transcribe the per sign as the word “per”.
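The three substitutions above (long S, rounded R, per sign) can be sketched as a single translation table. This assumes the text being cleaned already contains the corresponding Unicode characters (U+017F, U+A75B, U+214C), for example in a draft pasted from another source. The name normalize_glyphs is illustrative only.

```python
# Map each special glyph to the form the guidelines ask for.
# The choice of code points is an assumption about the input text.
GLYPH_TABLE = str.maketrans({
    "\u017f": "s",    # long s
    "\ua75b": "r",    # rounded r (r rotunda)
    "\u214c": "per",  # per sign
})


def normalize_glyphs(text):
    """Replace long s, r rotunda, and the per sign with plain equivalents."""
    return text.translate(GLYPH_TABLE)
```

All other characters pass through unchanged, so misspellings, abbreviations, and diacritics are preserved as the diplomatic conventions require.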
Ye
Transcribe “ye” as written, including any capitalization or superscripts. This means you will usually write “y^e”. Do not transcribe as “the”.
Insertions
In eScriptorium, insertions receive their own line. Transcribe the insertion as written on its own line. On the line into which it is being inserted, include a caret “^” where the insertion is supposed to go. In the reading order, the insertion should immediately precede the line it is being inserted into.
Insertions contained within the mask of the main line do not need to be on their own line, and can just be transcribed as superscript characters.
Marginalia
In eScriptorium, marginalia receive their own line. Transcribe as written. When reordering the page at the end, place the marginalia after the line they are associated with or, if they are a general comment, at the end of the page.
Parentheses
Transcribe unrounded parentheses as a slash “/”. If the parentheses look like modern rounded parentheses, or if they resemble brackets, transcribe them as normal parentheses.
Currency Notations
Transcribe the long mark for “shillings” as “s”.
Uncertainty
Note that the machine learning models we are training cannot accommodate uncertainty. In the event that you truly cannot figure out how to transcribe something, please indicate the problematic passage with curly braces like so:
{uncertain text}
This will indicate to a supervisor that the passage should be double-checked and potentially excluded from training.
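A supervisor reviewing a page might collect the flagged passages with something like the sketch below. It assumes the page is available as a list of plain-text lines and that curly braces are not nested. The function name uncertain_passages is hypothetical.

```python
import re

# Matches one {uncertain passage}; nested braces are not handled.
UNCERTAIN = re.compile(r"\{([^{}]*)\}")


def uncertain_passages(lines):
    """Return (line_number, passage) pairs for every flagged passage."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in UNCERTAIN.finditer(line):
            hits.append((number, match.group(1)))
    return hits
```

Line numbers start at 1 so that they can be matched against the reading order of the page.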