A directory of resources in the field of technical communication.

Germann, Ryan




1.
#33797

Text Extraction from Graphical Objects During XML Conversion

Materials that include ornamentation and complex design features have long been difficult to convert to XML, even by hand. The problem is twofold. First, complex documents usually contain a variety of graphics: some are simple ornamentation, while others are fundamental to the subject matter. Second, these graphics may be overlaid either with text that is integral to the image content or with actual body text. Current scripted conversion tools cannot analyze such content and extract it into a meaningful order in the converted XML file, and tagging it manually is time-consuming and arduous.

Germann, Ryan. IDEAlliance (2004). Articles>Information Design>Graphic Design>XML