- Digitization
- Conversion to text
- Encoding text
- Providing text and images together
- Images of the manuscript
  - TIFF (.tif) or PNG (lossless formats that preserve high-resolution image quality)
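A quick way to confirm that a batch of scans really is TIFF or PNG is to check each file's signature bytes rather than trusting the file extension. This is a minimal sketch using only the Python standard library; the function names `image_format` and `check_folder` are illustrative assumptions, not part of any tool mentioned here.

```python
from pathlib import Path

# Signature ("magic") bytes that open each format.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Classic TIFF: byte-order mark ("II" little-endian, "MM" big-endian) plus the number 42.
# BigTIFF files use 43 instead and are not recognised by this sketch.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def image_format(header: bytes) -> str:
    """Classify a file from its first bytes as 'png', 'tiff' or 'other'."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header[:4] in TIFF_SIGNATURES:
        return "tiff"
    return "other"


def check_folder(folder: str) -> dict:
    """Map each file name in `folder` to the format detected from its first 8 bytes."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as f:
                results[path.name] = image_format(f.read(8))
    return results
```

For example, `check_folder("scans")`, where `scans` is a hypothetical folder of page images, returns a dictionary from file name to detected format; any entry marked `"other"` would need to be re-exported.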
- Conversion to text (Getting Text)
- Is it handwritten? Transcribe it (using tools such as [FromThePage](https://fromthepage.com/) or [Transkribus](https://readcoop.eu/transkribus/))
- Is it typed? Use OCR (Optical Character Recognition)
  - OCR for typescripts
    - Online systems
    - Phone apps
    - Commercial systems
      - ABBYY
      - OmniPage
  - OCR for handwriting (Handwritten Text Recognition, HTR)
    - [Amazon Textract](https://aws.amazon.com/textract)
    - [Online OCR](https://www.onlineocr.net)
    - [Transkribus NN](https://www.transkribus.eu/Transkribus)
  - Text processors
    - MS Word (proprietary format, no structural markup)
    - Notepad++ (Windows)
    - BBEdit (macOS)
  - [Getting started with Transcription](https://tinker.edu.au/resources/recipe/getting-started-with-transcription-from-the-page/)
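Whichever route is used, machine output from OCR or HTR is worth checking against a carefully corrected sample. One common measure is the character error rate (CER): the edit (Levenshtein) distance between the machine output and a reference transcription, divided by the length of the reference. The sketch below is a plain-Python illustration with made-up strings; tools such as Transkribus report their own accuracy figures, which may be computed differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # At the start of each outer iteration, previous[j] is the distance between
    # the characters of `a` processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """CER = edit distance / number of characters in the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


# Two misread letters in a 19-character line:
print(round(character_error_rate("tbe quick hrown fox", "the quick brown fox"), 3))  # 0.105
```

A lower CER means less correction work; comparing it across a few sample pages can help decide whether a given tool is good enough for a collection.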
- Why present the text and page image together?
  - Presenting the original image alongside the transcript is a more robust research method, because it:
    - allows readers to verify the transcript, and perhaps improve it
    - may reveal aspects of the document you did not notice
    - makes the document available for future research
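Keeping image and transcript together in practice starts with a consistent naming scheme. The following sketch assumes a hypothetical convention in which each page image (for example `page_001.tif`) has a plain-text transcript with the same stem (`page_001.txt`); the function `pair_pages` is illustrative only. It reports which pages are complete and which are missing a transcript or an image.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".tif", ".tiff", ".png"}
TEXT_SUFFIX = ".txt"


def pair_pages(filenames):
    """Group file names by stem; return (pairs, images_without_text, texts_without_image).

    Assumes the hypothetical convention page_001.tif <-> page_001.txt.
    If one stem has several images (e.g. .tif and .png), the last one listed is kept.
    """
    images, texts = {}, {}
    for name in filenames:
        p = Path(name)
        suffix = p.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            images[p.stem] = name
        elif suffix == TEXT_SUFFIX:
            texts[p.stem] = name
    pairs = {stem: (images[stem], texts[stem]) for stem in sorted(images.keys() & texts.keys())}
    missing_text = sorted(images.keys() - texts.keys())
    missing_image = sorted(texts.keys() - images.keys())
    return pairs, missing_text, missing_image


files = ["page_001.tif", "page_001.txt", "page_002.png", "page_003.txt"]
pairs, missing_text, missing_image = pair_pages(files)
print(pairs)          # {'page_001': ('page_001.tif', 'page_001.txt')}
print(missing_text)   # ['page_002']
print(missing_image)  # ['page_003']
```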
- TEI (Text Encoding Initiative) is a standard set of XML elements and guidelines for encoding texts
- XML is a long-established and widely used technology (a W3C Recommendation since 1998)
- XML marks up the structure of a document, not its appearance (unlike HTML)
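To illustrate structural markup, and one way TEI can tie a transcript to its page image, here is a sketch that builds a very small TEI document with Python's standard `xml.etree.ElementTree`. It uses the TEI `<facsimile>`, `<surface>` and `<graphic>` elements to point at an image, and a `<pb>` (page beginning) whose `facs` attribute refers back to that surface. The title, the file name `page_001.tif`, the `page1` identifier and the text are placeholders; a real edition would need a fuller `<teiHeader>` and should be checked against the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ET.register_namespace("", TEI_NS)  # write TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title: str, image_url: str, paragraphs: list) -> ET.Element:
    """Build a minimal TEI document linking one page image to its transcription."""
    root = ET.Element(tei("TEI"))

    # Metadata: the smallest fileDesc, with its three required parts.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = "Unpublished draft."
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = "Transcribed from a page image."

    # The page image, described as a surface with an identifier.
    surface = ET.SubElement(ET.SubElement(root, tei("facsimile")), tei("surface"), {XML_ID: "page1"})
    ET.SubElement(surface, tei("graphic"), {"url": image_url})

    # The transcription: structure (div, p), not appearance.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    ET.SubElement(body, tei("pb"), {"n": "1", "facs": "#page1"})  # page beginning -> image
    div = ET.SubElement(body, tei("div"))
    for para in paragraphs:
        ET.SubElement(div, tei("p")).text = para
    return root


doc = build_tei("Letter, page 1", "page_001.tif", ["Dear Sir,", "I write to inform you..."])
ET.indent(doc)  # pretty-printing; requires Python 3.9+
print(ET.tostring(doc, encoding="unicode"))
```

Nothing in the output says how the letter should look on screen; display is left to whatever tool later reads the TEI.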
- XML provides a way to validate documents against a defined schema
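Python's standard library can only check that a file is well-formed XML (every tag closed and properly nested); it cannot validate against a schema. The sketch below performs that first check with `xml.etree.ElementTree`; `check_well_formed` is a hypothetical helper name. Validation against the TEI schema itself needs a schema-aware tool, for example the RELAX NG support in the third-party `lxml` library; that step is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if xml_text parses as XML, otherwise a short error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"not well-formed at line {line}, column {column}: {err}"
    return None


good = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Dear Sir,</p></body></text></TEI>'
bad = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Dear Sir,</body></text></TEI>'
print(check_well_formed(good))  # None
print(check_well_formed(bad))   # reports the unclosed <p>
```

Note that `good` is well-formed yet would still not be valid TEI, since it lacks the required `<teiHeader>`; catching that kind of error is exactly what schema validation adds.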
- [TEI P5 Guidelines: A Gentle Introduction to XML](https://tei-c.org/release/doc/tei-p5-doc/en/html/SG.html)