SensusAccess can improve the accessibility of a variety of document types and make them easier to work with. Depending on the document type of the source document, a number of conversion options are available for automated document remediation.
The document remediation features of SensusAccess are further explained in Module 4 of the SensusAccess e-learning course.
Image-type documents
Image-type documents include image-only PDF, Text PDF, Tagged PDF, JPG pictures as well as a wide range of other image formats such as TIFF, GIF, bitmaps and more. Image-only PDF documents may come from a publisher, be retrieved from a database of scientific journals, reside in a learning management system or be created by scanning a paper document. JPG pictures of text may be the result of students using the camera function of their smartphones to digitise papers. Other image formats may have been used for archiving purposes. Regardless of their origin, image-type documents are amongst the most inaccessible document types because the text is not selectable. Similarly, the reading order of image-type documents may be difficult to determine, especially for users who are unable to see the documents.
When multiple JPG pictures are uploaded, SensusAccess offers the option of combining these into a single document rather than handling the pictures one at a time as separate documents. This option is especially useful if the JPG pictures represent pages of a digitised document.
SensusAccess converts image-type documents into a variety of mainstream formats, including the following:
Word as either DOC or DOCX
These general-purpose formats are suitable for reading or if the document is to be further edited. SensusAccess has a special option for converting Arabic or bilingual Arabic/English image-type documents into Word documents.
Rich Text Format (RTF)
This general-purpose format is useful when editing in a wide variety of word processors, and is also a good format if the document is to be read on a Braille notetaker, where RTF is usually well-supported.
Tagged PDF
Tagged PDF is an accessible form of PDF capable of containing not only the visual contents of a document such as text and illustrations, but also the semantic structure of the document. When converting image-type documents into tagged PDF, any structural elements recognised in the process are stored in the resulting PDF document. Tagged PDF is suitable if the document is going to be read using assistive technology or read using the text-to-speech capabilities of Adobe Acrobat.
Many instances of the SensusAccess web form have two options for converting PDF and image-type documents into tagged PDF: “pdf – Tagged PDF (text over image)” and “pdf – Tagged PDF (image over text)”, both in the drop-down menu in the accessibility conversion options section.
Selecting the first option will cause PDF and image-type documents to be OCR processed and returned with the recognised text in a layer on top of the original image. Selecting the second option will cause PDF and image-type documents to be OCR processed and returned with the original image in a layer on top of the recognised text. The quality of the text recognition is identical in the two options.
In most cases, presenting the recognised text on top of the original image will result in much clearer text. However, logos and other graphical elements may appear blurred or even appear disfigured. Presenting the original image on top of the recognised text will retain all original graphical elements, but the visual presentation of the text will not be sharpened.
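The trade-off between the two options can be summarised as a small decision helper. This is only an illustrative sketch in Python: the two labels are quoted from the web form as described above, while the function name and its parameter are hypothetical and not part of SensusAccess.

```python
# Labels as they appear in the drop-down menu of the web form.
TEXT_OVER_IMAGE = "pdf – Tagged PDF (text over image)"
IMAGE_OVER_TEXT = "pdf – Tagged PDF (image over text)"


def choose_tagged_pdf_option(preserve_graphics: bool) -> str:
    """Return the drop-down label matching the reader's priority.

    Hypothetical helper for illustration only. Text recognition quality
    is the same for both options, so the choice is purely visual.
    """
    if preserve_graphics:
        # Original image on top: logos and graphics stay intact,
        # but the text is not visually sharpened.
        return IMAGE_OVER_TEXT
    # Recognised text on top: clearer text, graphics may look blurred.
    return TEXT_OVER_IMAGE
```

For a scanned article consisting mostly of body text, `choose_tagged_pdf_option(False)` would point to the text-over-image option. For a brochure with logos and diagrams, passing `True` would point to the image-over-text option.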
Spreadsheet as XLS, XLSX, or CSV
In cases where image-type documents contain statistics, timetables or other sorts of tabular information, users may find it easier to work with the document if it’s presented as a spreadsheet. SensusAccess has options to convert image-type documents into Excel in either XLS or XLSX format, or into a more generic spreadsheet format as a comma-separated file in CSV.
HTML
SensusAccess supports conversion of image-type documents into HTML if the user’s preferred reading platform is a web browser.
Plain text
As plain text, only recognised text is included in the resulting document. All formatting, illustrations and similar elements are stripped from the document. The format is especially useful for blind readers and in situations where the semantic structure of a document is to be recreated manually from scratch. SensusAccess has a special option for converting Arabic or bilingual Arabic/English image-type documents into plain text files.
PowerPoint presentations
PowerPoint presentations can be difficult to use and navigate for blind users with screen readers. To support screen reader users, SensusAccess can convert PowerPoint presentations into the following formats: tagged PDF with or without handout notes, HTML projects and RTF-outline documents.
Tagged PDF
The presentation can subsequently be opened in a PDF reader or e-book reader supporting PDF. SensusAccess attempts to retain all existing accessibility features in the PowerPoint presentation, such as titles, a logical reading order, colours, contrasts, and alternative textual image descriptions in the resulting PDF document.
SensusAccess can also convert PowerPoint note pages into note handouts in tagged PDF. With this option, the source slides are included as images followed by the notes in plain text.
HTML
The presentation can subsequently be opened in a web browser. Supported browsers include Microsoft Edge, Microsoft Internet Explorer, and Firefox. As is the case with Tagged PDF output, SensusAccess attempts to preserve any accessibility features already in the PowerPoint presentation in the resulting HTML version.
RTF-outline
The presentation can subsequently be opened in a word processor or loaded on to a Braille notetaker. Only the textual contents of the PowerPoint presentation are kept in the RTF outline document. Slide titles are preserved as headings.
E-books
SensusAccess can convert unprotected e-books in EPUB format into both plain text and Rich Text Format (RTF). SensusAccess is unable to convert an e-book if it is protected with DRM or similar.
Plain text
Only the textual contents of the source document are returned. The format is especially useful in situations where the semantic structure of a document is to be recreated manually from scratch.
Rich Text Format
This general-purpose format is useful when editing in a wide variety of word processors, and is also a good format if the document is to be read on a Braille notetaker, where RTF is usually well-supported.
TeX and LaTeX documents
Source documents in TeX and LaTeX containing mathematical equations can be converted into HTML5 with the mathematical contents represented as MathML. Documents may be uploaded as individual .TEX files or as self-contained ZIP archives.
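A multi-file TeX project can be packaged into a single ZIP archive before upload. The Python sketch below uses only the standard library. It assumes that "self-contained" means the archive holds the .tex source together with every file it references (figures, included chapters, style files), which is an interpretation and not a documented requirement. The function name `build_tex_archive` is hypothetical.

```python
import zipfile
from pathlib import Path


def build_tex_archive(project_dir: str, archive_path: str) -> list[str]:
    """Zip every file under project_dir, keeping relative paths.

    Illustrative helper, not part of SensusAccess. Raises ValueError
    if the project folder contains no .tex file at all.
    """
    root = Path(project_dir)
    target = Path(archive_path).resolve()

    # Collect all regular files, skipping the archive itself in case
    # it is being written inside the project folder.
    files = [
        p for p in sorted(root.rglob("*"))
        if p.is_file() and p.resolve() != target
    ]
    names = [p.relative_to(root).as_posix() for p in files]

    if not any(n.lower().endswith(".tex") for n in names):
        raise ValueError("no .tex file found in " + str(root))

    # Store files under their relative names so that paths used in
    # \input or \includegraphics commands still resolve.
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, name in zip(files, names):
            archive.write(path, name)
    return names
```

Calling `build_tex_archive("paper", "paper.zip")` on a folder containing `main.tex` and a `figures` subfolder would produce one archive with the same relative layout, ready to be uploaded in place of the individual files.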