

1.4 Character offsets

An external document produced by stand-off mode looks like this:

(setq standoff-markup-read-function-dumped
 (quote
  (("ac06be81-d86e-4fe5-b84e-4952b1e571c9"
    "http://beispiel.fernuni-hagen.de/ontologies/beispiel#beispiel"
    28095
    28100
    "Dante")
   ("a29ca667-0f99-4933-b0aa-8a7b1c1929e9"
    "http://beispiel.fernuni-hagen.de/ontologies/beispiel#konzept"
    28057
    28070
    "Große Dichter")
   ...)))

(setq standoff-source-md5-dumped "a2997fcd8c318048abf34889212c1982")

This is a representation of the external markup as so-called s-expressions. It is the format used by the dummy back-end, which simply dumps Emacs Lisp data into a file. There is also a JSON back-end. Representation formats and back-ends are described in more detail elsewhere in this manual.

All annotated text spans from the source document make up a list of markup elements, stored in the variable standoff-markup-read-function-dumped. Each markup element is delimited by parentheses and has five positional arguments.
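As a minimal sketch of how such an element can be taken apart in Emacs Lisp, the following destructures the two elements shown in the dump. The names my-sample-markup and my-element-plist, and the field labels id, type, start, end and text, are chosen here for illustration only and are not part of stand-off mode.

```elisp
;; Two elements copied from the dump above; the elided rest is left out.
;; `my-sample-markup' is a hypothetical variable used only in this sketch.
(defvar my-sample-markup
  '(("ac06be81-d86e-4fe5-b84e-4952b1e571c9"
     "http://beispiel.fernuni-hagen.de/ontologies/beispiel#beispiel"
     28095
     28100
     "Dante")
    ("a29ca667-0f99-4933-b0aa-8a7b1c1929e9"
     "http://beispiel.fernuni-hagen.de/ontologies/beispiel#konzept"
     28057
     28070
     "Große Dichter")))

;; Hypothetical helper: name the five positional arguments.
(defun my-element-plist (element)
  "Return the five positional fields of ELEMENT as a property list."
  (pcase-let ((`(,id ,type ,start ,end ,text) element))
    (list :id id :type type :start start :end end :text text)))

;; Collect the annotated strings of all elements.
(mapcar (lambda (element)
          (plist-get (my-element-plist element) :text))
        my-sample-markup)
```

Evaluating the last form yields the annotated strings in list order. A property list is used only to make the positions readable; plain nth would work as well.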

The first positional argument is a universally unique identifier (UUID) of the markup element. It consists of 32 hexadecimal digits, separated by ’-’ into 5 blocks.
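The layout of the identifier can be checked with a regular expression. This is only a sketch: my-uuid-shaped-p is a hypothetical helper, and it tests the 8-4-4-4-12 shape seen in the example, not whether the value was generated as a valid UUID.

```elisp
;; Hypothetical helper, not part of stand-off mode.
(defun my-uuid-shaped-p (string)
  "Return t if STRING has the 8-4-4-4-12 hexadecimal layout of a UUID."
  (let ((hex "[0-9a-fA-F]"))
    (and (string-match-p
          (concat "\\`" hex "\\{8\\}-" hex "\\{4\\}-" hex "\\{4\\}-"
                  hex "\\{4\\}-" hex "\\{12\\}\\'")
          string)
         t)))

;; The identifier of the first element in the example dump.
(my-uuid-shaped-p "ac06be81-d86e-4fe5-b84e-4952b1e571c9")
```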

The second positional argument, which has the form of a URI, gives the type or class of the markup element. In the first element, it is “beispiel” (German for “example”) from the namespace http://beispiel.fernuni-hagen.de/ontologies/beispiel; in the second element, it is “konzept” (German for “concept”) from the same namespace.
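To illustrate how the namespace and the type name relate, the type URI can be split at its “#”. The helper my-split-type-uri is hypothetical, and the sketch assumes that type URIs separate namespace and name with “#”, as the two examples do.

```elisp
;; Hypothetical helper, not part of stand-off mode.
(defun my-split-type-uri (uri)
  "Split URI at its last \"#\" into (NAMESPACE . LOCAL-NAME).
If URI contains no \"#\", LOCAL-NAME is nil."
  (let ((pos (string-match-p "#[^#]*\\'" uri)))
    (if pos
        (cons (substring uri 0 pos) (substring uri (1+ pos)))
      (cons uri nil))))

;; The type of the second element in the example dump.
(my-split-type-uri
 "http://beispiel.fernuni-hagen.de/ontologies/beispiel#konzept")
```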

The next two positional arguments give the character offsets of the start and end of the text span in the source document. In the first element, the annotated passage spans from character 28095 to 28100; the end offset points just past the last annotated character.

The fifth positional argument is the string from the source document that spans from the start to the end character offset. The Latin capital letter D in Dante has position 28095, and the e at the end of the name has position 28099.
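The relation between the two offsets and the string can be tried out with a short sketch. It assumes that the offsets behave like Emacs buffer positions, where the end position is exclusive; the source text and the helper my-span-text are made up for illustration, so the offsets are much smaller than in the example above.

```elisp
;; Hypothetical helper, not part of stand-off mode.
(defun my-span-text (source start end)
  "Return the text of SOURCE between buffer positions START and END.
END is exclusive: it is the position just after the last character."
  (with-temp-buffer
    (insert source)
    (buffer-substring-no-properties start end)))

;; A made-up stand-in for the source document.
(let ((source "Große Dichter wie Dante"))
  (list (my-span-text source 1 14)    ; 14 - 1 = 13 characters
        (my-span-text source 19 24))) ; 24 - 19 = 5 characters
```

In the same way, 28100 - 28095 = 5 is the length of “Dante”, and 28070 - 28057 = 13 is the length of “Große Dichter”. The offsets count characters, not bytes, so the “ß” counts as one.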

Further down is the checksum of the source document. The checksum is calculated when the document is opened. As soon as the source document is changed, the checksum changes and stand-off mode alerts you with an error, because the character offsets are likely to be incorrect.
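The idea behind the check can be sketched with the built-in md5 function. This is an assumption based on the variable name standoff-source-md5-dumped; how stand-off mode itself computes and compares the checksum is not shown here, and my-source-unchanged-p is a hypothetical helper.

```elisp
;; Hypothetical helper, not part of stand-off mode.
(defun my-source-unchanged-p (dumped-md5)
  "Return non-nil if the current buffer still hashes to DUMPED-MD5.
The buffer text is encoded as UTF-8 before hashing (an assumption)."
  (string= dumped-md5 (md5 (current-buffer) nil nil 'utf-8)))

(with-temp-buffer
  (insert "Dante")
  ;; Remember the checksum of the unchanged text.
  (let ((stored (md5 (current-buffer) nil nil 'utf-8)))
    (list (my-source-unchanged-p stored)      ; text not yet changed
          (progn
            ;; Inserting one character shifts every later offset.
            (goto-char (point-min))
            (insert "x")
            (my-source-unchanged-p stored)))))
```

The first check succeeds and the second fails, because even a one-character insertion changes the hash.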

