Working with external markup tables

This chapter provides the information required to work with external markup tables. It describes the format of external markup tables so that you can modify them or create new ones. The user exit mechanism of markup tables and its entry points are described to allow for customized processing of documents at different stages. Finally, a parser application programming interface provides some of OpenTM2’s internal functions to expand the possibilities of user exits.

The contents of external markup tables are described in terms of the SGML syntax. You should be familiar with SGML to modify or create markup tables. For a complete description of SGML, refer to ISO 8879, Information Processing – Text and Office Systems – Standard Generalized Markup Language (SGML).


Creating new markup tables

You can create your own markup table by exporting an existing markup table in external SGML format, modifying it with any text editor, and importing it back into OpenTM2 under a different name. Markup tables need to be available in an SGML-based format to be imported into OpenTM2. Note that an exported markup table contains only the nondefault entries.

To become familiar with the content of markup tables, you might want to export a markup table and study it before you create a new one. See Exporting a markup table for details.

When you have exported one of the markup tables provided by OpenTM2, you might see a second tag in the second line: <SEGMENTEXIT>userexit</SEGMENTEXIT>. Here, userexit is the name of the dynamic-link library (DLL) containing the user exit code. This tag is only required if a user exit is to be used. For more information, refer to Creating user exits for markup tables.

Layout and content of a markup table

The general layout and content of a markup table are as follows:

A markup table must begin with a <TAGTABLE> tag and end with a </TAGTABLE> tag.
Following the <TAGTABLE> tag are header tags that are descriptive or of general purpose for the markup table. These header tags do not declare individual markup data. You can use them to give the markup table a name and a description, to specify a character set for conversion, or to specify substitution characters. Header tags in a markup table are optional. See Table 9 for a list of allowed header tags and a detailed description.

An example of a header tag in a markup table is <DESCRNAME>descriptive name</DESCRNAME>, which lets you specify a name for the markup table that is different from its file name.

Next, a list of markup tag definitions follows. These definitions are the core of a markup table. Each definition describes a specific formatting tag, for example, a header tag or a soft line feed. The definition always includes the name of the markup tag, and either its length or the delimiting characters. A markup tag definition can include further information, for example, whether the text associated with a markup tag needs to be translated. See Table 10 for a list of allowed tags to define a markup tag in detail.

A single markup tag definition always starts with the start tag <TAG> and ends with the corresponding end tag </TAG>. An example of a markup tag definition is:


<TAG>
 <STRING>[soft line feed]</STRING>
 <LENGTH>16</LENGTH>
 <TYPE>STNEUTRAL</TYPE>
 <SEGINFO>SEGNEUTRAL</SEGINFO>
</TAG>

which defines the markup of a soft line feed. The keyword [soft line feed] is defined as <STRING>[soft line feed]</STRING> and has a length of 16 characters. STNEUTRAL specifies that this markup tag is a start tag with no influence on segmenting, and SEGNEUTRAL specifies that it does not change the segmenting status.

Markup tags often have attributes that specify additional characteristics. For example, a markup tag for tables and figures in a document might use a width attribute to specify the width of the element. You need to define all attributes of a markup language in your markup table as well. The definition of attributes is similar to the definition of markup tags, except that each attribute definition is enclosed between the <ATTRIBUTE> and </ATTRIBUTE> tags. See Table 10 for a list of allowed tags to define an attribute in detail.

An example of an attribute definition is:


<ATTRIBUTE>
 <STRING>WIDTH=%</STRING>
 <ENDDELIM>' .\r\n'</ENDDELIM>
</ATTRIBUTE>

which defines the markup of a WIDTH attribute. Here, you will notice that the keyword WIDTH is supposed to be delimited by one of four delimiting characters, as opposed to the previous example, where an explicit length is specified.
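
The difference between a length-based and a delimiter-based definition can be made concrete with a small Python sketch. It is an illustration only: the function name entry_end and the convention of returning the index of the delimiter itself are assumptions made for this example and are not taken from OpenTM2.

```python
from typing import Optional


def entry_end(text: str, start: int, length: int = 0,
              end_delims: str = "") -> Optional[int]:
    """Return the index where an entry beginning at `start` ends.

    With a positive `length`, the entry covers exactly that many characters.
    Otherwise the entry runs up to the first character found in `end_delims`.
    Returns None if the text ends before the entry is complete.
    """
    if length > 0:
        end = start + length
        return end if end <= len(text) else None
    for i in range(start, len(text)):
        if text[i] in end_delims:
            return i  # index of the delimiter character (assumed convention)
    return None


# The WIDTH attribute ends at the first blank, period, CR, or LF.
line = "[table WIDTH=50 frame]\r\n"
start = line.index("WIDTH=")
end = entry_end(line, start, end_delims=" .\r\n")
print(line[start:end])  # prints WIDTH=50
```

A fixed-length entry such as [soft line feed] would instead be located with entry_end(text, pos, length=16).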

In summary, a markup table has the following layout:


<TAGTABLE>
Header tags, as required
<TAG>
markup tag definition
</TAG>
⋮
<TAG>
markup tag definition
</TAG>
<ATTRIBUTE>
attribute definition (optional)
</ATTRIBUTE>
⋮
<ATTRIBUTE>
attribute definition (optional)
</ATTRIBUTE>
</TAGTABLE>

Note that all entries use the SGML syntax. All SGML tags must be enclosed in “<” and “>”. Every element has a start tag and an end tag.

Your markup table can contain up to 1000 entries.

An SGML markup tag or attribute must be specified at least with STRING and ENDDELIM, or with STRING and LENGTH.
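
Before importing an edited table, it can help to check these two rules mechanically. The following Python sketch assumes that each definition has already been read into a dictionary keyed by SGML tag name. The dictionary representation and the function names are hypothetical and are not part of OpenTM2.

```python
from typing import Dict, List

MAX_ENTRIES = 1000  # limit stated in this chapter


def missing_fields(entry: Dict[str, object]) -> List[str]:
    """Return the problems found in one tag or attribute definition."""
    problems = []
    if not entry.get("STRING"):
        problems.append("STRING is required")
    has_delim = bool(entry.get("ENDDELIM"))
    # A LENGTH of 0 counts as "not specified".
    has_length = int(entry.get("LENGTH", 0)) > 0
    if not (has_delim or has_length):
        problems.append("either ENDDELIM or a nonzero LENGTH is required")
    return problems


def check_table(entries: List[Dict[str, object]]) -> List[str]:
    """Check the entry count and every single entry of a table."""
    problems = []
    if len(entries) > MAX_ENTRIES:
        problems.append(f"more than {MAX_ENTRIES} entries")
    for number, entry in enumerate(entries, start=1):
        problems += [f"entry {number}: {p}" for p in missing_fields(entry)]
    return problems
```

For example, check_table([{"STRING": "[bold]"}]) reports that entry 1 lacks both ENDDELIM and LENGTH.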

After you have edited the markup table, you can import it into OpenTM2. If you import it into an existing markup table, this table is overwritten.

Substitution characters in a markup table

Your markup tag and attribute definitions in a markup table might require that you specify variable parts. An example is the definition of the WIDTH attribute in the previous section (WIDTH=%). Because a document can contain any value for the WIDTH attribute, the percentage sign % is used as a substitution character.

You can use the following two substitution characters in a markup table:

The percentage character (%) substitutes any number of characters.
The question mark (?) substitutes a single character.

The substitution characters do not distinguish between numeric and alphabetic characters.

Note that these substitution characters can be redefined in the markup table header.
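
The effect of the two substitution characters can be illustrated by translating a STRING value into a regular expression. This Python sketch is not how OpenTM2 matches tags internally. It assumes that the multiple-character substitution may also match zero characters, and the function names are invented for the example.

```python
import re


def pattern_to_regex(pattern: str, single: str = "?",
                     multi: str = "%") -> re.Pattern:
    """Translate a markup-table STRING into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == multi:
            parts.append(".*")           # any number of characters (assumed: zero or more)
        elif ch == single:
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, text: str, **subst: str) -> bool:
    """True if the whole text matches the pattern."""
    return pattern_to_regex(pattern, **subst).fullmatch(text) is not None
```

With the defaults, matches("WIDTH=%", "WIDTH=100") is true, while matches("[Heading ?", "[Heading 12") is false because ? stands for a single character. Passing multi="#" models a table whose header redefines the multiple-character substitution.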

SGML tags for markup table header

The following table contains the definition of the SGML tags that you can use in a markup table header.

Table 9. SGML tags for markup table header

SGML tag	Definition
DESCRIPTION	Specifies a markup table description, which is shown in the “Markup Table Properties” window and the “Markup Table List” window.
DESCRNAME	Specifies a descriptive name for this markup table. For example, the specification of <DESCRNAME>ASCII</DESCRNAME> in the markup table EQFASCII would give it the name ASCII. If nothing is specified, the file name of the markup table is used.
CHARSET	Specifies the character set to be used for import and export of documents that use this markup table. The documents are converted using the selected character set without the need to do the conversion in a user exit. Specify one of the following character sets: ASCII, ANSI, UTF8, or UNICODE.
SINGLESUBST	Specifies the substitution character to use for single character substitution. The default character is ?.
MULTSUBST	Specifies the substitution character to use for multiple character substitution. The default character is %.
USEUNICODE	Specifies whether segmented source and target files in subdirectories SSOURCE and STARGET are stored in Unicode UTF-16 format. Specify YES or NO. NO is the default.
REFLOW	Specifies whether line breaks (CRLF) are allowed to be changed during translation. EQFMRI is an example of a markup where REFLOW is specified and set to NO. Specify YES or NO. YES is the default.
SEGMENTEXIT	Contains the name of the user exit, if the markup table uses one.

SGML tags for markup tags and markup attributes

The following table contains the definition of the SGML tags that you can use to define markup tags and markup attributes in a markup table.
Table 10. SGML tags for markup tags and markup attributes

SGML tag	Definition
STRING	Specifies the name of the markup tag or markup attribute. The specification of STRING is required for an entry in the markup table.
ENDDELIM	Specifies one character as end delimiter of the markup tag or markup attribute, if it has any. You can enter more than one end delimiter. OpenTM2 checks for all possible string combinations to determine the end of the tag or attribute. A string as end delimiter is not possible. When a tag or attribute has an end delimiter, the specification of its length is omitted or can be set to 0. If a tag or attribute has no end delimiter, its length must be specified. The specification of ENDDELIM is required for an entry in the markup table if LENGTH is not defined.
LENGTH	Defines the length of a markup tag or markup attribute. It must be specified only if the length of the tag or attribute cannot be determined by a delimiter specified by ENDDELIM.
COLPOSITION	Specifies the column position where the markup tag starts. If a markup tag has no special start position and can occur anywhere in a line, COLPOSITION is omitted or can be set to 0. The default is 0.
TYPE	Defines the type of the markup tag. If TYPE is not specified, STDEL is taken as the default. The following types are possible: STDEL indicates the start of a new text segment. ENDDEL indicates the end of a text segment. SELFC means that the markup tag is self-contained, that is, it is a text segment by itself. STNEUTRAL means that the markup tag is a start tag that has no influence on segmenting. ENDNEUTRAL means that the markup tag is an end tag that has no influence on segmenting.
SEGINFO	Determines whether the text following the markup tag is to be segmented. If SEGINFO is not specified, SEGNEUTRAL is taken as the default. SEGOFF sets segmenting off, that is, no segmentation is done until the next markup tag is found that sets segmenting on again. If two tags that set segmenting off follow each other, two tags that set segmenting on are needed to start segmentation again. SEGON sets segmenting on again. SEGNEUTRAL does not influence the segmenting status. SEGRESET resets the segmenting status to on, even if the segmenting level requires more than one SEGON tag to set segmentation on. PROTECTON protects all following text, including segmentation control flags, until a markup tag with PROTECTOFF is encountered. PROTECTOFF turns off text protection; the following text is handled using normal segmentation rules.
ASSTEXT	Defines types of text following the markup tag. If ASSTEXT is not specified, NOEXPL is taken as the default. TSNL means that text follows on the same or the next line and is associated with the markup tag. TSL means that text follows on the same line and is associated with the markup tag. NOEXPL means that no special processing for associated text is required.
ADDINFO	Specifies whether specific text is to be ignored when segments are aligned during the creation of an Initial Translation Memory: 4 marks the start of an area to be ignored. 6 marks the start of an area to be partly ignored; this applies to tags containing a % sign, for example HEADER]%. 8 marks the end of an area to be ignored. 10 marks the end of an area to be partly ignored; this applies to tags containing a % sign, for example HEADER %.
CLASSID	Specifies how the contents of STRING are handled. The only class is CLS_HEAD. This means that the text specified for STRING becomes an entry of the table of contents that you can display during the translation of a document using the Special go to… dialog.
ATTRINFO	Specifies whether a markup tag has attached attributes (YES/NO). NO is the default. If YES is specified, the ATTRIBUTE SGML tag must be used to specify the attributes.
TRANSLATEINFO	Specifies whether the segment associated with the markup tag or markup attribute must be translated or not (YES/NO). If TRANSLATEINFO is not specified, NO is taken as the default.
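
The nesting behaviour of the SEGINFO values is easier to follow as a small state model. The Python sketch below is one possible reading of the descriptions in Table 10, not the OpenTM2 implementation. In particular, it assumes that all SEGINFO values other than PROTECTOFF are ignored while text protection is on. The class and attribute names are invented for the example.

```python
class SegmentingState:
    """Tracks the segmenting status while tags are read in document order."""

    def __init__(self) -> None:
        self.off_depth = 0      # number of SEGOFF tags not yet matched by SEGON
        self.protected = False  # True between PROTECTON and PROTECTOFF

    def apply(self, seginfo: str = "SEGNEUTRAL") -> None:
        if self.protected:
            # Assumption: segmentation control flags are ignored while protected.
            if seginfo == "PROTECTOFF":
                self.protected = False
            return
        if seginfo == "SEGOFF":
            self.off_depth += 1
        elif seginfo == "SEGON":
            self.off_depth = max(0, self.off_depth - 1)
        elif seginfo == "SEGRESET":
            self.off_depth = 0
        elif seginfo == "PROTECTON":
            self.protected = True
        # SEGNEUTRAL (the default) leaves the status unchanged.

    @property
    def segmenting(self) -> bool:
        return self.off_depth == 0


state = SegmentingState()
for flag in ["SEGOFF", "SEGOFF", "SEGON"]:
    state.apply(flag)
print(state.segmenting)  # prints False: one SEGOFF is still open
```

A second SEGON, or a single SEGRESET, would switch segmenting on again in this model.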

Examples of markup data and corresponding markup tags

If a document contains, for example, [soft line feed] as markup data, it is usually meant as a so-called inline tag, which means that it is contained in the segment. It has no influence on the segmentation of the document. The corresponding markup tag definition in a markup table looks as follows:


<TAG>
 <STRING>[soft line feed]</STRING>
 <LENGTH>16</LENGTH>
 <TYPE>STNEUTRAL</TYPE>
 <SEGINFO>SEGNEUTRAL</SEGINFO>
</TAG>

<STRING>…</STRING> defines the markup string, and <LENGTH>…</LENGTH> specifies its length. Because the length is specified, no ENDDELIM tag is required. <TYPE>STNEUTRAL</TYPE> defines that this markup string has no influence on segmentation. All other markup table SGML tags are set to their defaults and therefore need not be specified.

Assuming that such a markup tag causes segmentation, it is defined as follows:


<TAG>
 <STRING>[soft line feed]</STRING>
 <LENGTH>16</LENGTH>
 <TYPE>STDEL</TYPE>
 <SEGINFO>SEGNEUTRAL</SEGINFO>
</TAG>

The following table lists some imaginary markup data with a description.

Markup data	Definition
[bold] text [/bold]	The text following this tag (until the end tag) is printed bold; this tag is part of the segment and has no influence on segmenting.
[Heading x]text	This tag describes a heading; the heading text must follow on the same line; x is the level of heading and goes from 1 to 9; this tag ends the previous segment and starts a new segment.
[page: even]	A page break; the following text starts on an even page; this tag always starts on the first column and has no text following in the same line; a blank must separate the attribute even from the tag.
[page: odd]	A page break; the following text starts on an odd page; this tag always starts on the first column and has no text following in the same line; a blank must separate the attribute odd from the tag.
[paragraph]	A paragraph; this tag ends the previous segment and starts a new segment; the tag occurs at the end of the previous paragraph.
%	Stands for any number of characters. For example, in b%, % stands for the characters old.
[break]	Starts a new segment. You use this tag to split an existing segment into two or more segments.
[*%]	* indicates the start of a comment and % stands for the comment text.

This markup data would lead to the following markup table definitions. The defaults will not be shown.

Markup definition	Explanation
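<STRING>[bold]</STRING> <LENGTH>6</LENGTH> <TYPE>STNEUTRAL</TYPE> or <STRING>[bold</STRING> <ENDDELIM>]</ENDDELIM> <TYPE>STNEUTRAL</TYPE> or <STRING>[b%</STRING> <ENDDELIM>]</ENDDELIM> <TYPE>STNEUTRAL</TYPE>	The markup tag should be part of the segment, therefore STNEUTRAL is used. All three definitions have the same result: you can specify this markup tag by its length or by its end delimiter, and you can also substitute part of the inline tag with %.
<STRING>[Heading ?</STRING> <ENDDELIM>]</ENDDELIM> <SEGINFO>SEGRESET</SEGINFO> <ASSTEXT>TSL</ASSTEXT> <TRANSLATEINFO>YES</TRANSLATEINFO>	Single substitution is used for the heading level; the end of the tag is ]; the heading requires the reset of segmenting with SEGRESET; the text associated with the tag occurs on the same line; the text associated with the tag is translatable.
<STRING>[page:</STRING> <ENDDELIM> </ENDDELIM> <ATTRINFO>YES</ATTRINFO> <COLPOSITION>1</COLPOSITION>	The markup tag ends with a blank; attributes may follow; the tag always starts at the first column in a line.
<STRING>[paragraph</STRING> <ENDDELIM>]</ENDDELIM> <TYPE>ENDDEL</TYPE> or <STRING>[paragraph]</STRING> <LENGTH>11</LENGTH> <TYPE>ENDDEL</TYPE>	The tag ends with ] or is defined by its length; the tag should end the previous segment, therefore ENDDEL is used.
<STRING>even</STRING> <ENDDELIM>]</ENDDELIM>	This is an attribute; it ends with ].
<STRING>odd</STRING> <ENDDELIM>]</ENDDELIM>	This is an attribute; it ends with ].
<STRING>[break]</STRING> <LENGTH>7</LENGTH> <TYPE>STDEL</TYPE>	Indicates that a new segment starts.
<STRING>*%</STRING> <ENDDELIM>\r\n</ENDDELIM> <COLPOSITION>1</COLPOSITION>	Indicates a comment that ends at the end of the line. COLPOSITION defines that the asterisk is only recognized as the start of a comment if it appears in the first column of a line.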

Creating user exits for markup tables

There are document formats that require a user exit for their markup table:

Binary documents, for example Microsoft® Word for Windows® documents
Documents that require code page conversion, for example ANSI documents
Documents that have a fixed record layout
Documents that contain nontranslatable text parts, for example, RTF documents
Binary documents like Lotus Notes database files and template files that require context-dependent processing.

OpenTM2 provides two markup tables that are already combined with a user exit:

The user exit part of the EQFHTML4 markup table converts the code page and preprocesses JavaScripts to limit segments to 2048 characters. The markup table part controls text segmentation and the recognition of inline tags.
The user exit part of the EQFANSI markup table converts the code page, and the markup table part inserts segment breaks after empty lines.

In addition, OpenTM2 provides a user exit that you can use with the appropriate markup table. This user exit is a dynamic-link library (DLL) with predefined entry points. The code for the exit can be written in any programming language that supports PASCAL calling conventions. The include file EQF_API.H contains the definitions required for a user exit written in C.

The user exit is activated using the <SEGMENTEXIT> tag of the markup table (see also Segment exit described in Creating new markup tables).

General user exit entry points

The user exit entry points (their names start with EQF) are called at different stages during the analysis, translation, and export of a document.

During the analysis (see Figure 164):
- EQFPRESEG2 is called before the text is segmented. It can be used to preprocess a document and to decide whether text segmentation is done by OpenTM2 after EQFPRESEG2.
- EQFPOSTSEGW is called after the text is segmented. It can be used to postprocess a document.
- EQFPOSTTMW is called after Translation Memory matches are processed and terms lists are created. It can be used to modify segments.

Figure 164. Analysis of a document using the user exit

During the translation:

EQFCHECKSEGW is called after a segment is translated but before it is saved in the Translation Memory. It can be used to modify a segment.
EQFSHOW is called when the user selects the “Show translation” menu item.

During the export (see Figure 165):

EQFPREUNSEGW is called before OpenTM2 removes the segmentation from a document. It can be used to remove the segmentation itself, or for whatever else is required at this step.
EQFPOSTUNSEG2 is called after OpenTM2 (or EQFPREUNSEG2) removed the segmentation. It can be used, for example, to establish the external document format.
Alternatively, EQFPOSTUNSEGW can be called after OpenTM2 (or EQFPREUNSEG2) removed the segmentation. If the EQFPOSTUNSEGW entry point exists, OpenTM2 uses it regardless of whether EQFPOSTUNSEG2 exists. EQFPOSTUNSEGW requires that the input text is always UTF-16. If the EQFPOSTUNSEGW entry point exists, the OpenTM2 “Undo text segmentation” step outputs a UTF-16 file.
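
The precedence between the two post-unsegmentation entry points can be summarized in a few lines of Python. This is a sketch of the rule as stated in this section, not OpenTM2 code. The function names and the idea of passing the exported entry point names as a set are assumptions made for the example.

```python
from typing import Optional, Set


def select_postunseg(exported: Set[str]) -> Optional[str]:
    """Pick the entry point that runs after segmentation has been removed."""
    if "EQFPOSTUNSEGW" in exported:
        # The W variant wins, whether or not EQFPOSTUNSEG2 is also present.
        return "EQFPOSTUNSEGW"
    if "EQFPOSTUNSEG2" in exported:
        return "EQFPOSTUNSEG2"
    return None


def unsegmented_file_is_utf16(exported: Set[str]) -> bool:
    """The unsegmented file is UTF-16 only if EQFPOSTUNSEGW exists."""
    return "EQFPOSTUNSEGW" in exported
```

A user exit that exports both names therefore has to be prepared to receive UTF-16 input in EQFPOSTUNSEGW, and its EQFPOSTUNSEG2 is not used.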

Figure 165. Export of a document using the user exit

The following sections describe the individual entry points in detail. Note that entry points from earlier versions of OpenTM2 (without the trailing letter W) are supported, and the calling syntax remains unchanged. However, you should use the entry points as listed in this section. See Compatibility notes concerning Unicode support for details.

EQFPRESEG2

Purpose

EQFPRESEG2 is called during the analysis of a document before the text is segmented. It preprocesses the document, for example by converting code pages, and decides whether text segmentation is done by OpenTM2 or by EQFPRESEG2 itself. If an error occurs, it can stop the analysis.

Format

Parameters

MarkupTable

The pointer to the name of a markup table.

Editor

The pointer to the name of the editor.

Path

The pointer to the program path.

SourceFile

The pointer to the name of the source file (with full path).

Buffer

The pointer to the buffer containing the name of the temporary output file.

OutputFlag

The output flag indicating whether the text is to be segmented by EQFPRESEG2 instead of OpenTM2.

SliderWindowHandle

The handle of the slider window.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.
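
The requirement to return immediately means that a long-running exit has to check the flag regularly while it works. The following Python sketch shows the general polling pattern only. It models the return flag as a callable named cancel_requested, which is a stand-in invented for this example and is not how the flag is passed to a real user exit.

```python
from typing import Callable, Iterable, List, Optional


def preprocess_lines(lines: Iterable[str],
                     cancel_requested: Callable[[], bool]) -> Optional[List[str]]:
    """Strip line ends from each line, giving up as soon as cancellation is seen."""
    result = []
    for line in lines:
        if cancel_requested():
            return None  # stop at once; the partial result is discarded
        result.append(line.rstrip("\r\n"))
    return result
```

Checking once per line keeps the delay between a cancel request and the return short without adding noticeable overhead.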

EQFPRESEGEX

Purpose

EQFPRESEGEX is called during the analysis of a document before the text is segmented. It preprocesses the document, for example by converting code pages, and decides whether text segmentation is done by OpenTM2 or by EQFPRESEGEX itself. If an error occurs, it can stop the analysis. The EQFPRESEGEX entry point is identical to EQFPRESEG2 except for the additional parameter AnalysisHandle.

Format

Parameters

MarkupTable

The pointer to the name of a markup table.

Editor

The pointer to the name of the editor.

Path

The pointer to the program path.

SourceFile

The pointer to the name of the source file (with full path).

Buffer

The pointer to the buffer containing the name of the temporary output file.

OutputFlag

The output flag indicating whether the text is to be segmented by EQFPRESEGEX instead of OpenTM2.

SliderWindowHandle

The handle of the slider window.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

AnalysisHandle

The analysis handle. This handle is required for the API calls EQFSETTAOPTIONS and EQFGETTAOPTIONS.

EQFPOSTSEGW

Purpose

EQFPOSTSEGW is called during the analysis of a document after the text is segmented. It postprocesses the document, for example by adjusting segment boundaries. If an error occurs, it can stop the analysis.

Format

Parameters

MarkupTable

The pointer to the name of a markup table.

Editor

The pointer to the name of the editor.

Path

The pointer to the program path.

SourceFile

The pointer to the name of the source file (with full path).

TargetFile

The pointer to the name of the target file.

SegmentationTags

The pointer to the tags inserted during text segmentation.

SliderWindowHandle

The handle of the slider window.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFPOSTSEGWEX

Purpose

EQFPOSTSEGWEX is called during the analysis of a document after the text is segmented. It postprocesses the document, for example by adjusting segment boundaries. If an error occurs, it can stop the analysis. The EQFPOSTSEGWEX entry point is identical to EQFPOSTSEGW except for the additional parameter AnalysisHandle.

Format

Parameters

MarkupTable

The pointer to the name of a markup table.

Editor

The pointer to the name of the editor.

Path

The pointer to the program path.

SourceFile

The pointer to the name of the source file (with full path).

TargetFile

The pointer to the name of the target file.

SegmentationTags

The pointer to the tags inserted during text segmentation.

SliderWindowHandle

The handle of the slider window.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

AnalysisHandle

The analysis handle. This handle is required for the API calls EQFSETTAOPTIONS and EQFGETTAOPTIONS.

EQFPOSTTMW

Purpose

EQFPOSTTMW is called during the analysis of a document after Translation Memory matches have been inserted and terms lists have been created. It is used to modify the segments. If an error occurs, it can stop the analysis.

Format

Parameters

Editor

The pointer to the name of the editor.

Path

The pointer to the program path.

SegmentedSourceFile

The pointer to the name of the segmented source file.

SegmentedTargetFile

The pointer to the name of the segmented target file.

SegmentationTags

The pointer to the tags inserted during text segmentation.

SourceTargetFlag

The flag indicating if the segmented source differs from the segmented target.

SliderWindowHandle

The handle of the slider window.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFCHECKSEGW

Purpose

EQFCHECKSEGW is called during the translation of a document after a segment has been translated but before it is saved in the Translation Memory. It can modify the segment, for example change lowercase characters to uppercase, and it can prevent the segment from being saved, for example if specific length limits have been exceeded.

EQFCHECKSEGW is also called when exact matches are automatically substituted during the analysis of a document.

Format

Parameters

PreviousSourceSegment

The pointer to the text of the previous source segment.

CurrentSourceSegment

The pointer to the text of the current source segment.

Translation

The pointer to the translation of the current segment.

ModifyFlag

The pointer to the flag that is set when the user exit has modified the translated segment.

MessageFlag

The flag indicating whether a message box is shown.

Return code

The return code indicates if the segment can be saved.
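
The two jobs of this entry point, modifying the translation and vetoing the save, can be sketched as follows. The Python function below only mirrors the examples given in the Purpose paragraph (uppercase conversion and a length limit). Its name, its parameters, and the tuple it returns are assumptions for illustration and do not reflect the real calling syntax.

```python
from typing import Tuple


def check_segment(source: str, translation: str,
                  max_length: int) -> Tuple[str, bool, bool]:
    """Return (translation, modified, can_save) for one translated segment.

    `source` is accepted for completeness; this simple check does not use it.
    """
    new_translation = translation.upper()         # example modification
    modified = new_translation != translation     # models ModifyFlag
    can_save = len(new_translation) <= max_length  # models the return code
    return new_translation, modified, can_save
```

For example, check_segment("Hallo", "Hello", 10) yields ("HELLO", True, True), while a translation longer than max_length yields can_save as False.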

EQFSHOW

Purpose

EQFSHOW is called during the translation of a document when the user selects the “Show Translation” menu item. It is up to the user exit to prepare and display the document in a window. The user exit can use the API calls EQFGETNEXTSEG, EQFGETNEXTSEGW, EQFGETPREVSEG, EQFGETPREVSEGW, EQFGETCURSEG, EQFGETCURSEGW, and EQFGETINFO to retrieve the document segments and to get other document information.

Format

Parameters

lInfo

A handle to the target document. This handle has to be specified in the API calls for accessing the segment text.

hwndParent

The handle of the window which should be specified as parent window for the window displaying the document.

Return code

The user exit should return TRUE if the document could be displayed and FALSE in case of errors.

EQFGETCURSEG

Purpose

EQFGETCURSEG returns a specific segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer as a zero-terminated string. The variable pointed to by pusSegNum contains the number of the requested segment.

Format

Parameters

lInfo

The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.

pusSegNum

The pointer to a USHORT variable containing the segment number.

pBuffer

The pointer to a buffer for the segment text.

pusBufSize

The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer.

Return code

The function returns zero if successful; otherwise, an error code is returned.

EQFGETCURSEGW

Purpose

EQFGETCURSEGW returns a specific segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer in UTF-16 encoding and is terminated by 0x0000. The variable pointed to by pulSegNum contains the number of the requested segment.

Format

Parameters

lInfo

The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.

pulSegNum

The pointer to a ULONG variable containing the segment number.

pBuffer

The pointer to a buffer for the segment text in UTF-16 encoding.

pusBufSize

The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer in number of UTF-16 characters.

Return code

The function returns zero if successful; otherwise, an error code is returned.

EQFGETNEXTSEG

Purpose

EQFGETNEXTSEG returns the next segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer as a zero-terminated string. The API call increments the segment number automatically.

Format

Parameters

lInfo

The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.

pusSegNum

The pointer to a USHORT variable containing the segment number. This variable should be set to 1 before the first call. The segment number is automatically incremented.

pBuffer

The pointer to a buffer for the segment text.

pusBufSize

The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer.

Return code

The function returns zero if successful; otherwise, an error code is returned.

EQFGETNEXTSEGW

Purpose

EQFGETNEXTSEGW returns the next segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer in UTF-16 encoding and is terminated by 0x0000. The API call increments the segment number automatically.

Format

Parameters

lInfo

The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.

pulSegNum

The pointer to a ULONG variable containing the segment number. This variable should be set to 1 before the first call. The segment number is automatically incremented.

pBuffer

The pointer to a buffer for the segment text in UTF-16 encoding.

pusBufSize

The pointer to a USHORT variable containing the size of the buffer in number of UTF-16 characters.

Return code

The function returns zero if successful; otherwise, an error code is returned.
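
The calling pattern implied by these parameters (set the segment number to 1, then call repeatedly while the call succeeds) can be sketched in Python. The function make_get_next_seg below builds a stand-in that merely imitates the described behaviour. It is not the real API, and the assumption that a nonzero return code marks the end of the document is made only for this example.

```python
from typing import Callable, List, Tuple

GetNextSeg = Callable[[int], Tuple[int, str, int]]


def make_get_next_seg(segments: List[str]) -> GetNextSeg:
    """Build a stand-in for a 'get next segment' call (hypothetical)."""
    def get_next_seg(seg_num: int) -> Tuple[int, str, int]:
        # Segment numbers start at 1, as the parameter description suggests.
        if seg_num < 1 or seg_num > len(segments):
            return 1, "", seg_num  # nonzero return code: nothing returned
        # Return the text and the automatically incremented segment number.
        return 0, segments[seg_num - 1], seg_num + 1
    return get_next_seg


def read_all(get_next_seg: GetNextSeg) -> List[str]:
    """Collect all segment texts, starting with segment number 1."""
    texts = []
    seg_num = 1  # must be set to 1 before the first call
    while True:
        rc, text, seg_num = get_next_seg(seg_num)
        if rc != 0:
            break
        texts.append(text)
    return texts
```

Because the call itself advances the segment number, the loop never increments seg_num on its own.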

EQFGETPREVSEG

Purpose

EQFGETPREVSEG returns the previous segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer as a zero-terminated string. The API call decrements the segment number automatically.

Format

Parameters

lInfo

The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.

pusSegNum

The pointer to a USHORT variable containing the segment number. The segment number is automatically decremented.

pBuffer

The pointer to a buffer for the segment text.

pusBufSize

The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer.

Return code

The function returns zero if successful; otherwise, an error code is returned.

EQFGETPREVSEGW

Purpose

EQFGETPREVSEGW returns the previous segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer in UTF-16 encoding and is terminated by 0x0000. The API call decrements the segment number automatically.

Format

Parameters

lInfo

The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.

pulSegNum

The pointer to a ULONG variable containing the segment number. The segment number is automatically decremented.

pBuffer

The pointer to a buffer for the segment text in UTF-16 encoding.

pusBufSize

The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer in number of UTF-16 characters.

Return code

The function returns zero if successful; otherwise, an error code is returned.

EQFBUILDDOCPATH

Purpose

EQFBUILDDOCPATH creates the fully qualified file name for an OpenTM2 document using the folder object name and the document long name. This function can be used to access documents stored in OpenTM2 folders.

Format

Parameters

szFolObjName

The folder object name as returned using EQFGETINFO with the GETINFO_FOLDEROBJECT ID.

szDocLongName

The document long name.

PathID

The ID of the requested document path. Valid IDs are:

PATHID_SOURCE to build the path to the source document

PATHID_SEGSOURCE to build the path to the segmented source document

PATHID_SEGTARGET to build the path to the segmented target document

PATHID_TARGET to build the path to the target document

pchBuffer

The pointer to a buffer receiving the fully qualified document path. The size of this buffer has to be at least 60 bytes.

Return code

function completed successfully

ERROR_INVALID_PARAMETER

wrong or missing parameter

ERROR_PATH_NOT_FOUND

the folder did not exist

ERROR_FILE_NOT_FOUND

the document does not exist

Examples

The folder “AnotherTestFolder” contains the document “myTest.HTML”. The folder is located on drive “E:” and has a short name of “ANOTH000.F00”. The document short name is “MYTESTHT.000”. The primary drive of the OpenTM2 installation is “C:”.

EQFBUILDDOCPATH( "C:\EQF\ANOTH000.F00", "myTest.HTML", PATHID_SOURCE, szBuffer ) would return "E:\EQF\ANOTH000.F00\SOURCE\MYTESTHT.000" in szBuffer.

EQFGETINFO

Purpose

EQFGETINFO returns specific information on the document currently being processed in the EQFSHOW function of the user exit. This function is used by the user exit to get more information concerning the document and its location.

Format

Parameters

lInfo

The info handle passed to the user exit in the EQFSHOW call.

InfoID

The ID of the requested information, valid IDs are:

GETINFO_MARKUP to retrieve the markup table of the document

GETINFO_FOLDEROBJECT to retrieve the object name of the folder containing the document

GETINFO_FOLDERLONGNAME to retrieve the long name (in ASCII) of the folder containing the document

GETINFO_DOCFULLPATH to retrieve the fully qualified path of the document segmented target file

GETINFO_DOCLONGNAME to retrieve the document long name

pchBuffer

The pointer to a buffer receiving the requested information. If this parameter is NULL, the size of the requested information is returned using the pusBufSize parameter.

pusBufSize

The pointer to a USHORT value containing the buffer size. On return, this value contains the size of the returned information.

Return code

function completed successfully

ERROR_INVALID_PARAMETER

unknown InfoID or missing parameter

ERROR_INVALID_HANDLE

invalid lInfo handle

ERROR_NOT_ENOUGH_MEMORY

not enough memory / memory allocation failed

ERROR_INSUFFICIENT_BUFFER

buffer is not large enough for the returned information, *pusBufSize contains required buffer size

Examples

Assume that the document “myTest.HTML” located in folder “AnotherTestFolder” is opened using EQFSHOW. The folder is located on drive “E:” and has a short name of “ANOTH000.F00”. The document short name is “MYTESTHT.000”. The primary drive of the OpenTM2 installation is “C:”.

usBufSize = sizeof(szBuffer);

EQFGETINFO( lInfo, GETINFO_MARKUP, szBuffer, &usBufSize) would return “IBMHTM32” in szBuffer

usBufSize = sizeof(szBuffer);

EQFGETINFO( lInfo, GETINFO_FOLDEROBJECT, szBuffer, &usBufSize) would return “C:\EQF\ANOTH000.F00” in szBuffer

usBufSize = sizeof(szBuffer);

EQFGETINFO( lInfo, GETINFO_FOLDERLONGNAME, szBuffer, &usBufSize ) would return “AnotherTestFolder” in szBuffer

usBufSize = sizeof(szBuffer);

EQFGETINFO( lInfo, GETINFO_DOCFULLPATH, szBuffer, &usBufSize ) would return “E:\EQF\ANOTH000.F00\STARGET\MYTESTHT.000” in szBuffer

usBufSize = sizeof(szBuffer);

EQFGETINFO( lInfo, GETINFO_DOCLONGNAME, szBuffer, &usBufSize ) would return “myTest.HTML” in szBuffer
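
The size query described for pchBuffer and pusBufSize can be sketched as follows. This is a minimal sketch and not OpenTM2 code: demo_get_info is a hypothetical stand-in that only mimics the documented buffer contract (a NULL buffer returns the required size, a buffer that is too small reports the required size), and the DEMO_ return values are placeholders for the real return codes. The sketch also assumes that the reported size includes the terminating zero, which the description above does not state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder return values; the real codes are defined by OpenTM2. */
enum { DEMO_OK = 0, DEMO_INVALID_PARAMETER = 1, DEMO_INSUFFICIENT_BUFFER = 2 };

/* Hypothetical stand-in for an EQFGETINFO-style call. 'value' plays the
   role of the information selected by the info handle and the InfoID. */
static int demo_get_info(const char *value, char *pchBuffer,
                         unsigned short *pusBufSize)
{
    unsigned short needed;

    if (value == NULL || pusBufSize == NULL)
        return DEMO_INVALID_PARAMETER;
    needed = (unsigned short)(strlen(value) + 1);   /* assumption: size includes the zero */

    if (pchBuffer == NULL) {                        /* size query only */
        *pusBufSize = needed;
        return DEMO_OK;
    }
    if (*pusBufSize < needed) {                     /* report the required size */
        *pusBufSize = needed;
        return DEMO_INSUFFICIENT_BUFFER;
    }
    memcpy(pchBuffer, value, needed);
    *pusBufSize = needed;
    return DEMO_OK;
}

/* Asks for the required size first, then allocates and fetches the value.
   Returns a malloc'ed string that the caller must free, or NULL on failure. */
static char *demo_fetch_info(const char *value)
{
    unsigned short size = 0;
    char *buffer;

    assert(value != NULL);
    if (demo_get_info(value, NULL, &size) != DEMO_OK)
        return NULL;
    buffer = malloc(size);
    if (buffer == NULL)
        return NULL;
    if (demo_get_info(value, buffer, &size) != DEMO_OK) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```

In a real user exit the stand-in would be replaced by the EQFGETINFO call with the lInfo handle and the requested InfoID. Querying the size first avoids guessing a buffer length for values such as fully qualified paths.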

EQFPREUNSEGW

Purpose

EQFPREUNSEGW is called during the export of a document before the segmentation tags inserted by OpenTM2 are removed. It decides whether the segmentation tags are removed by OpenTM2 or EQFPREUNSEGW itself. However, it is normally used to remove the segmentation tags. If an error occurs, it can stop the export.

Format

Parameters

Editor

The pointer to the name of the editor.

Path

The pointer to the program path.

SegmentedTargetFile

The pointer to the name of the segmented target file (with full path).

Buffer

The pointer to the buffer containing the name of the temporary output file.

SegmentationTags

The pointer to the tags inserted during text segmentation.

OutputFlag

The output flag indicating whether the segmentation tags are removed by EQFPREUNSEGW instead of OpenTM2.

SliderWindowHandle

The handle of the slider window.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFPOSTUNSEGW

Purpose

EQFPOSTUNSEGW is called during the export of a document after the segmentation tags have been removed from the text. The text must be in UTF-16. It is normally used to establish the external document format. If an error occurs, it can stop the export.

Format

Parameters

MarkupTable

The pointer to the name of a markup table.

Editor

The pointer to the name of the editor.

Path

The pointer to the program path (with full path).

TargetFile

The pointer to the name of the target file (with full path).

SegmentationTags

The pointer to the tags inserted during text segmentation.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFPOSTUNSEG2

Purpose

EQFPOSTUNSEG2 is called during the export of a document after the segmentation tags have been removed from the text. It is normally used to establish the external document format. If an error occurs, it can stop the export.

Format

Parameters

MarkupTable

The pointer to the name of a markup table.

Editor

The pointer to the name of the editor.

Path

The pointer to the program path (with full path).

TargetFile

The pointer to the name of the target file (with full path).

SegmentationTags

The pointer to the tags inserted during text segmentation.

ReturnFlag

The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

API calls for user exits

This group contains the API calls that markup table user exits can call to access and modify OpenTM2 settings. Currently these are:

EQFGETTAOPTIONS to get the active analysis settings. The user exit can call this API during EQFPRESEGEX and EQFPOSTSEGWEX processing.
EQFSETTAOPTIONS to modify the analysis settings. The user exit can call this API during EQFPRESEGEX and EQFPOSTSEGWEX processing.

The following sections describe the individual API calls in detail.

EQFGETTAOPTIONS

Purpose

EQFGETTAOPTIONS can be used by the markup table user exit to retrieve the currently active analysis settings. The settings are returned in an EQFTAOPTIONS structure. The analysis handle used by this call is passed to the user exit by the user exit entry points EQFPRESEGEX and EQFPOSTSEGWEX.

Format

Parameters

AnalysisHandle

The analysis handle passed to the user exit by the entry points EQFPRESEGEX and EQFPOSTSEGWEX.

Options

The pointer to an EQFTAOPTIONS structure receiving the currently active analysis settings.

EQFSETTAOPTIONS

Purpose

EQFSETTAOPTIONS can be used by the markup table user exit to change the currently active analysis settings. The settings are passed to the API call in an EQFTAOPTIONS structure. The analysis handle used by this call is passed to the user exit by the user exit entry points EQFPRESEGEX and EQFPOSTSEGWEX.

Format

Parameters

AnalysisHandle

The analysis handle passed to the user exit by the entry points EQFPRESEGEX and EQFPOSTSEGWEX.

Options

The pointer to an EQFTAOPTIONS structure containing the analysis settings being modified.

EQFTAOPTIONS

Purpose

The structure EQFTAOPTIONS is used by the API calls EQFSETTAOPTIONS and EQFGETTAOPTIONS to get or set the analysis options.

Fields

fAdjustLeadingWS

This flag represents the “Adjust leading whitespace to whitespace of source segment” flag of the GUI.

fAdjustTrailingWS

This flag represents the “Adjust trailing whitespace to whitespace of source segment” flag of the GUI.

bForFutureUse

Area for future enhancements. Currently not in use.
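
A user exit that wants to change a single analysis setting would typically read the active settings, change one field, and write the structure back, so that the other settings keep their current values. The following is a minimal sketch of that pattern under stated assumptions: DEMO_TAOPTIONS only mirrors the field names listed above (the field types and the size of the reserved area are guesses), demo_get_options and demo_set_options are hypothetical stand-ins for EQFGETTAOPTIONS and EQFSETTAOPTIONS, and the analysis handle is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the documented field names; types and the size of the
   reserved area are assumptions made for this sketch only. */
typedef struct {
    int fAdjustLeadingWS;
    int fAdjustTrailingWS;
    unsigned char bForFutureUse[20];
} DEMO_TAOPTIONS;

/* Stand-in for the settings that OpenTM2 keeps for the running analysis. */
static DEMO_TAOPTIONS demo_active = { 1, 1, { 0 } };

/* Hypothetical stand-in for EQFGETTAOPTIONS (handle parameter omitted). */
static int demo_get_options(DEMO_TAOPTIONS *pOptions)
{
    assert(pOptions != NULL);
    *pOptions = demo_active;
    return 0;
}

/* Hypothetical stand-in for EQFSETTAOPTIONS (handle parameter omitted). */
static int demo_set_options(const DEMO_TAOPTIONS *pOptions)
{
    assert(pOptions != NULL);
    demo_active = *pOptions;
    return 0;
}

/* Switches off the trailing whitespace adjustment and leaves every
   other setting as it was. Returns 0 on success. */
static int demo_disable_trailing_ws(void)
{
    DEMO_TAOPTIONS options;
    int rc = demo_get_options(&options);   /* read the active settings */

    if (rc != 0)
        return rc;
    options.fAdjustTrailingWS = 0;         /* change one field only */
    return demo_set_options(&options);     /* write the structure back */
}
```

Reading before writing also keeps the reserved area intact, which matters if later versions start to use it.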

User exit entry points for context-dependent translations

The following user exit entry points support context-dependent translations, where translation proposals and automatic translations not only depend on text matches but also on the type of document containing the text. These entry points are designed to support the translation of Lotus Notes and Domino design elements, such as Notes database files, template files, and application templates. When OpenTM2 imports these documents (using the LOTUSNGD markup table), it maintains context-dependent information about these design elements together with existing translations in the Translation Memory. If the user exit is used by the markup table, OpenTM2 uses the context information and the translation proposals to identify matches on the segments to be translated.

EQFGETCONTEXTINFO is called once when a markup table is loaded. It returns information about the number and the names of context strings used in the Translation Memory, and it controls (based on the availability of context information) whether further context information processing is performed.

EQFGETSEGCONTEXT is called before a translated segment is saved in the Translation Memory. It gets the context strings from the user exit and passes them to the Translation Memory.

EQFUPDATECONTEXT is called subsequently for every segment during the analysis of a document and updates the user exit with the context strings from the Translation Memory for the current segment.

EQFCOMPARECONTEXT is called for every segment and compares and ranks a segment’s context information against Translation Memory proposals.

OpenTM2 uses these user exit entry points to support the translation of Lotus Notes forms that contain the Form, Subform, Title, and Subtitle context strings.

EQFGETCONTEXTINFO

Purpose

EQFGETCONTEXTINFO is called once when a new markup table is loaded into the Translation Memory. It returns the number of context strings that are used by this markup and the names of these context strings (for example, Panel ID for MRI markup). If a markup table user exit does not support this entry point, or returns an error code, no further context information processing is performed for this markup table (neither EQFGETSEGCONTEXT, EQFUPDATECONTEXT, nor EQFCOMPARECONTEXT is called).

Format

Parameters

pusNumOfContextStrings

The pointer to a USHORT variable receiving the number of context strings that are used by this markup.

pContextNames

The pointer to a UTF-16 buffer for the context names. This buffer has a size of MAX_CONTEXT_LEN (4096) characters. The context names are stored as a list of UTF-16 strings, and the list is terminated by 0x0000. Currently the names are not used. In a later version these names will be used in the translation environment to display the context of a segment.

Return code

The return code indicates whether context information could be returned.
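
The list layout used for context names and context strings can be sketched as follows. This is a minimal sketch under an assumption the text does not spell out: each string ends with its own 0x0000, and one additional 0x0000 ends the list. The names demo_pack_context and DEMO_MAX_CONTEXT_LEN are hypothetical; the latter only mirrors the documented MAX_CONTEXT_LEN value of 4096.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_CONTEXT_LEN 4096   /* mirrors MAX_CONTEXT_LEN from the text */

/* Length of a zero-terminated UTF-16 string, in 16-bit code units. */
static size_t demo_u16len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Packs 'count' strings into 'buffer' as: str1 0 str2 0 ... 0.
   Returns the number of code units written, including the final list
   terminator, or 0 if the strings do not fit into 'capacity' units. */
static size_t demo_pack_context(uint16_t *buffer, size_t capacity,
                                const uint16_t *const *strings, size_t count)
{
    size_t pos = 0, i, j, len;

    assert(buffer != NULL);
    for (i = 0; i < count; i++) {
        len = demo_u16len(strings[i]);
        /* room for the string, its terminator, and the list terminator */
        if (pos + len + 2 > capacity)
            return 0;
        for (j = 0; j < len; j++)
            buffer[pos++] = strings[i][j];
        buffer[pos++] = 0;
    }
    if (pos + 1 > capacity)
        return 0;
    buffer[pos++] = 0;              /* terminates the whole list */
    return pos;
}
```

With this layout an empty string cannot be stored inside the list, because it would be read as the end of the list. A user exit that has no value for one of its context strings would need some placeholder text instead.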

EQFGETSEGCONTEXT

Purpose

EQFGETSEGCONTEXT returns the context strings for a given segment and passes them to the Translation Memory functions before a segment is about to be saved in the Translation Memory. This function is used by the editor during the translation. Using the supplied document handle the function can go backward or forward to other segments if necessary (for example, for an MRI markup it is necessary to go back to the segment containing the panel ID).

Format

Parameters

pCurSeg

The pointer to a zero-terminated UTF-16 string containing the text of the current segment.

pPrevSeg

The pointer to a zero-terminated UTF-16 string that contains the text of the previous segment (NULL, if there is none).

pNextSeg

The pointer to a zero-terminated UTF-16 string that contains the text of the next segment (NULL, if there is none).

pContextStrings

The pointer to a UTF-16 buffer for the context strings. This buffer has a size of MAX_CONTEXT_LEN (4096) characters. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.

hEditor

The handle of type HANDLE, which is required for the EQFGetNextSeg and EQFGetPrevSeg functions.

Return code

The return code indicates whether context strings could be returned.

EQFUPDATECONTEXT

Purpose

EQFUPDATECONTEXT is called subsequently during the analysis of a document. If the current segment in the Translation Memory contains context information, this function updates the user exit with the context strings for this segment. The retrieved context strings are used to identify exact context matches with the EQFCOMPARECONTEXT function.

Format

Parameters

pSeg

The pointer to a zero-terminated UTF-16 string containing the text of the current segment.

pContextStrings

The pointer to a UTF-16 buffer containing the current context strings and receiving the updated context strings. This buffer has a size of MAX_CONTEXT_LEN (4096) characters. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.

Return code

The return code indicates whether context strings could be updated.

EQFCOMPARECONTEXT

Purpose

EQFCOMPARECONTEXT is called for every segment that has an exact text match and context information available. The function compares the context strings of a segment against the context strings of a Translation Memory proposal and ranks the match between 0 and 100. 0 means no context match at all, and 100 means an exact context match.

During an analysis only exact text matches and exact context matches of a segment lead to automatic substitutions. During a translation, the ranks are used to identify the best translation proposals.

Format

Parameters

pContextStrings1

The pointer to a buffer containing the context strings of the current segment. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.

pContextStrings2

The pointer to a buffer containing the context strings of the proposal. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.

pusRanking

The pointer to the variable receiving the ranking for the context strings.

Return code

The return code indicates whether context information could be compared.
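
How a user exit derives the ranking is up to the user exit. The following minimal sketch shows one possible scheme and is not the method used by OpenTM2 or by any shipped markup table: it walks both lists position by position and returns the percentage of positions whose strings are identical. It assumes that each string ends with 0x0000 and that an additional 0x0000 ends the list. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if two zero-terminated UTF-16 strings are identical. */
static int demo_equal(const uint16_t *a, const uint16_t *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the position just behind the terminator of the current string. */
static const uint16_t *demo_next(const uint16_t *s)
{
    while (*s != 0)
        s++;
    return s + 1;
}

/* Ranks two context lists between 0 and 100: the percentage of list
   positions that hold identical strings. A string without a counterpart
   in the other list counts as a mismatch. */
static unsigned short demo_rank_context(const uint16_t *list1,
                                        const uint16_t *list2)
{
    unsigned total = 0, equal = 0;

    assert(list1 != NULL && list2 != NULL);
    while (*list1 != 0 || *list2 != 0) {
        if (*list1 != 0 && *list2 != 0 && demo_equal(list1, list2))
            equal++;
        total++;
        if (*list1 != 0)
            list1 = demo_next(list1);   /* stay on the list end once reached */
        if (*list2 != 0)
            list2 = demo_next(list2);
    }
    if (total == 0)
        return 0;   /* no context on either side: no match is claimed */
    return (unsigned short)(equal * 100 / total);
}
```

A real user exit might weight the strings differently, for example by treating a matching form name as more important than a matching title. Only a ranking of 100 counts as an exact context match during analysis.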

Parser application programming interface

The following functions are internal OpenTM2 parsing functions that are made available to expand the possibilities of user exits. Their main purposes are:

To access and modify segmented documents on a segment base.

Documents can be loaded, and their segments can be retrieved and modified. Segments can be converted into an SGML tagged format. Code conversions can be done, and some document properties can be retrieved. Modified documents can be saved.

To access and tokenize markup tables to get information about markup tags and property information.

Markup tables can be loaded and tokenized, and the properties of markup tags can be accessed.

Because these are basically parsing functions, their names start with “Pars”. Function names ending with “W” are for Unicode documents, and for markup tables to be used with Unicode documents.

Note that these functions are not called at defined OpenTM2 processing steps (as opposed to the entry points described in General user exit entry points and User exit entry points for context-dependent translations). However, they are well suited to be used in the code of one or more of these entry points. For example, they can be used to create or clean up markup tables. A sample parser that uses these parser API functions can be found in file parssamp.c in directory \eqf\nondde\.

Further details about these functions, like the definition of data types, can be found in file eqfpapi.h in the same directory.

The following sections describe the parser API functions in detail. Where applicable, the parser API functions are enabled for Unicode UTF-16 support.

ParsInitialize

Purpose

ParsInitialize initializes the parser API environment and creates a parser API handle that is to be used in most of the other parser API functions.

Format

Parameters

Type	Parameter	Description
HPARSER	phParser	The pointer to the buffer for the parser API handle.
CHAR	pszDocPathName	The pointer to the zero-terminated document path name.

Return code

Integer of 0, if the environment is successfully initialized, or an error code.

ParsBuildTempName

Purpose

ParsBuildTempName builds a temporary file name based on the fully qualified file name of the source document.

Format

Parameters

Type	Parameter	Description
PSZ	pszSourceName	The pointer to the zero-terminated fully qualified file name of the source document. The name serves as the model for the temporary file name.
PSZ	pszTempName	The pointer to the zero-terminated temporary file name. The buffer for the file name should have a size of 128 bytes or more.

Return code

Integer of 0, if the file name is successfully built, or an error code.

ParsLoadSegFile

Purpose

ParsLoadSegFile loads a segmented file into memory.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
CHAR	pszFileName	The pointer to the zero-terminated fully qualified file name of the document to be loaded into memory.
HPARSSEGFILE	phSegFile	The pointer to the buffer in memory that receives the segmented file.

Return code

Integer of 0, if the file is successfully loaded, or an error code.

ParsGetSegNum

Purpose

ParsGetSegNum returns the number of segments of the segmented file loaded into memory.

Format

Parameters

Type	Parameter	Description
HPARSSEGFILE	phSegFile	The handle of the segmented file in memory.
LONG	plSegCount	The pointer to the buffer that receives the number of segments.

Return code

Integer of 0, if the number is successfully retrieved, or an error code.

ParsGetSeg

Purpose

ParsGetSeg gets a segment from the segmented file loaded into memory. If the segment is in Unicode format, use ParsGetSegW.

Format

Parameters

Type	Parameter	Description
HPARSSEGFILE	hSegFile	The handle of the segmented file in memory.
LONG	lSegNum	The number of the segment to get.
PPARSSEGMENT	pSeg	The pointer to the buffer that receives the segment data.

Return code

Integer of 0, if the segment is successfully retrieved, or an error code.

ParsGetSegW

Purpose

ParsGetSegW gets a segment from the segmented file loaded into memory. If the segment is not in Unicode format, use ParsGetSeg.

Format

Parameters

Type	Parameter	Description
HPARSSEGFILE	hSegFile	The handle of the segmented file in memory.
LONG	lSegNum	The number of the segment to get.
PPARSSEGMENTW	pSeg	The pointer to the buffer that receives the segment data.

Return code

Integer of 0, if the segment is successfully retrieved, or an error code.

ParsUpdateSeg

Purpose

ParsUpdateSeg updates a segment of the segmented file loaded into memory. If the segment is in Unicode format, use ParsUpdateSegW.

Format

Parameters

Type	Parameter	Description
HPARSSEGFILE	hSegFile	The handle of the segmented file in memory.
LONG	lSegNum	The number of the segment to update.
PPARSSEGMENT	pSeg	The pointer to the buffer that holds the new segment data.

Return code

Integer of 0, if the segment is successfully updated, or an error code.

ParsUpdateSegW

Purpose

ParsUpdateSegW updates a segment of the segmented file loaded into memory. If the segment is not in Unicode format, use ParsUpdateSeg.

Format

Parameters

Type	Parameter	Description
HPARSSEGFILE	hSegFile	The handle of the segmented file in memory.
LONG	lSegNum	The number of the segment to update.
PPARSSEGMENTW	pSeg	The pointer to the buffer that holds the new segment data.

Return code

Integer of 0, if the segment is successfully updated, or an error code.

ParsWriteSegFile

Purpose

ParsWriteSegFile writes the segmented file in memory to an external file.

Format

Parameters

Type	Parameter	Description
HPARSSEGFILE	hSegFile	The handle of the segmented file in memory.
CHAR	pszFileName	The pointer to the zero-terminated fully qualified file name of the document.

Return code

Integer of 0, if the file is successfully written, or an error code.

ParsMakeSGMLSegment

Purpose

ParsMakeSGMLSegment builds an SGML tagged segment as used in segmented files. If the segment is in Unicode format, use ParsMakeSGMLSegmentW.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
PPARSSEGMENT	pSegment	The pointer to the buffer that holds the segment data.
CHAR	pszBuffer	The pointer to the buffer that receives the zero-terminated SGML-tagged segment. The buffer size for the segment should be at least twice the maximum segment size.
INT	iBufferSize	The size of pszBuffer.
BOOL	fSourceFile	TRUE: Create SGML for a segmented source file. FALSE: Create SGML for a segmented target file.

Return code

Integer of 0, if the segment is successfully built, or an error code.

ParsMakeSGMLSegmentW

Purpose

ParsMakeSGMLSegmentW builds an SGML tagged segment as used in segmented files. If the segment is not in Unicode format, use ParsMakeSGMLSegment.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
PPARSSEGMENTW	pSegment	The pointer to the buffer that holds the segment data.
WCHAR*	pszBuffer	The pointer to the buffer that receives the zero-terminated SGML-tagged segment (in Unicode UTF-16 format). The buffer size for the segment should be at least twice the maximum segment size.
INT	iBufferSize	The size of pszBuffer.
BOOL	fSourceFile	TRUE: Create SGML for a segmented source file. FALSE: Create SGML for a segmented target file.

Return code

Integer of 0, if the segment is successfully built, or an error code.

ParsConvert

Purpose

ParsConvert performs an in-place conversion from ASCII to ANSI, or vice versa.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
PARSCONVERSION	Conversion	The conversion mode: ASCIItoANSI or ANSItoASCII.
CHAR	pszData	The pointer to the zero-terminated data to be converted.
USHORT	usLen	The length of the data to convert.

Return code

Integer of 0, if the conversion is successful, or an error code.

ParsGetDocName

Purpose

ParsGetDocName returns the long document name.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
CHAR	pszDocName	The pointer to the buffer that receives the zero-terminated long document name. The size of the buffer should be 256 bytes.

Return code

Integer of 0, if the document name is successfully returned, or an error code.

ParsGetDocLang

Purpose

ParsGetDocLang returns the language settings of the current document.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
CHAR	pszSourceLang	The pointer to the buffer that receives the zero-terminated source language, or NULL. The buffer size should be 40 bytes or more.
CHAR	pszTargetLang	The pointer to the buffer that receives the zero-terminated target language, or NULL. The buffer size should be 40 bytes or more.

Return code

Integer of 0, if the language settings are successfully returned, or an error code.

ParsSplitSeg

Purpose

ParsSplitSeg splits text data into segments by using OpenTM2’s morphological functions. The function looks for segment breaks in the supplied data by applying the morphology for the document source language. The segment breaks are returned as a list of offsets within the data. The last element in this list is zero.

If the buffer for this list is too small, the function returns an error and the first element of the list contains the required size of the list (in number of list elements).

If the text data is in Unicode format, use ParsSplitSegW.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
CHAR	pszData	The pointer to the zero-terminated text data that is to be split into segments.
USHORT	usDataLength	The length of the text data, as number of characters.
USHORT	pusSegBreaks	The pointer to the buffer that receives the list of segment breaks.
USHORT	usElements	The size of the buffer that receives the list of segment breaks, in number of list elements.

Return code

Integer of 0, if the segment is successfully split, or an error code.
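
The handling of the segment break list, including the retry when the list buffer is too small, can be sketched as follows. This is a minimal sketch and not OpenTM2 code: demo_split is a hypothetical stand-in that places a break after every period followed by a blank instead of using the morphological functions, and it only mimics the list contract described above (offsets, a final zero, and the required number of elements in the first element when the buffer is too small). The DEMO_ return values are placeholders, and the sketch assumes that each offset marks the start of a new segment.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder return values; the real codes are defined by OpenTM2. */
enum { DEMO_OK = 0, DEMO_BUFFER_TOO_SMALL = 1 };

/* Hypothetical stand-in for a ParsSplitSeg-style call. */
static int demo_split(const char *data, unsigned short len,
                      unsigned short *breaks, unsigned short elements)
{
    unsigned short i, used = 0;
    unsigned short needed = 1;              /* one element for the final zero */

    for (i = 0; i + 2 < len; i++)
        if (data[i] == '.' && data[i + 1] == ' ')
            needed++;
    if (elements < needed) {
        if (elements > 0)
            breaks[0] = needed;             /* report the required list size */
        return DEMO_BUFFER_TOO_SMALL;
    }
    for (i = 0; i + 2 < len; i++)
        if (data[i] == '.' && data[i + 1] == ' ')
            breaks[used++] = (unsigned short)(i + 2);
    breaks[used] = 0;                       /* end of list */
    return DEMO_OK;
}

/* Calls the splitter with a small list first and retries once with the
   size it reports. Returns a malloc'ed list or NULL on failure. */
static unsigned short *demo_split_with_retry(const char *data,
                                             unsigned short len)
{
    unsigned short elements = 1;            /* deliberately small first guess */
    unsigned short *breaks = malloc(elements * sizeof *breaks);
    int rc;

    if (breaks == NULL)
        return NULL;
    rc = demo_split(data, len, breaks, elements);
    if (rc == DEMO_BUFFER_TOO_SMALL) {
        elements = breaks[0];               /* required number of elements */
        free(breaks);
        breaks = malloc(elements * sizeof *breaks);
        if (breaks == NULL)
            return NULL;
        rc = demo_split(data, len, breaks, elements);
    }
    if (rc != DEMO_OK) {
        free(breaks);
        return NULL;
    }
    return breaks;
}

/* Number of segments described by a zero-terminated break list. */
static unsigned short demo_count_segments(const unsigned short *breaks)
{
    unsigned short n = 1;                   /* the text before the first break */

    assert(breaks != NULL);
    while (*breaks++ != 0)
        n++;
    return n;
}
```

Because zero ends the list, an offset of zero can never denote a break, which fits the assumption that the first segment always starts at the beginning of the data.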

ParsSplitSegW

Purpose

ParsSplitSegW splits text data into segments by using OpenTM2’s morphological functions. The function looks for segment breaks in the supplied data by applying the morphology for the document source language. The segment breaks are returned as a list of offsets within the data. The last element in this list is zero.

If the buffer for this list is too small, the function returns an error and the first element of the list contains the required size of the list (in number of list elements).

If the text data is not in Unicode format, use ParsSplitSeg.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
WCHAR*	pszData	The pointer to the zero-terminated text data (in Unicode UTF-16 format) that is to be split into segments.
USHORT	usDataLength	The length of the text data, as number of UTF-16 characters.
USHORT	pusSegBreaks	The pointer to the buffer that receives the list of segment breaks.
USHORT	usElements	The size of the buffer that receives the list of segment breaks, in number of list elements.

Return code

Integer of 0, if the segment is successfully split, or an error code.

ParsFreeSegFile

Purpose

ParsFreeSegFile frees a segmented file from memory.

Format

Parameters

Type	Parameter	Description
HPARSSEGFILE	hSegFile	The handle of the segmented file in memory.

Return code

Integer of 0, if the memory is successfully freed, or an error code.

ParsLoadMarkup

Purpose

ParsLoadMarkup loads a markup table into memory for usage with the ParsTokenize or ParsTokenizeW function. The markup table is loaded from the \eqf\table directory.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.
HPARSMARKUP*	phMarkup	The pointer to the buffer in memory that receives the markup handle.
CHAR	pszMarkup	The pointer to the zero-terminated markup table name (without path and extension, for example, EQFANSI).

Return code

Integer of 0, if the markup table is successfully loaded, or an error code.

ParsTokenize

Purpose

ParsTokenize looks for tags in the supplied text area of the markup table loaded into memory. The result is a tag token list that can be processed by the ParsGetNextToken function.

If the supplied text area is in Unicode format, use ParsTokenizeW.

Format

Parameters

Type	Parameter	Description
HPARSMARKUP	hMarkup	The markup handle, created by the ParsLoadMarkup function.
CHAR*	pszData	The pointer to the zero-terminated text area that is to be tokenized.

Return code

Integer of 0, if the markup table is successfully tokenized, or an error code.

ParsTokenizeW

Purpose

ParsTokenizeW looks for tags in the supplied text area of the markup table loaded into memory. The result is a tag token list that can be processed by the ParsGetNextToken function. If the supplied text area is not in Unicode format, use ParsTokenize.

Format

Parameters

Type	Parameter	Description
HPARSMARKUP	hMarkup	The markup handle, created by the ParsLoadMarkup function.
WCHAR*	pszData	The pointer to the zero-terminated Unicode text area that is to be tokenized.

Return code

Integer of 0, if the markup table is successfully tokenized, or an error code.

ParsGetNextToken

Purpose

ParsGetNextToken returns the next token from the token list created by the ParsTokenize and ParsTokenizeW functions. At the end of the token list a token with a token ID of PARSTOKEN_ENDOFLIST is returned. The PARSTOKEN structure describes the token structure in detail.

Format

Parameters

Type	Parameter	Description
HPARSMARKUP	hMarkup	The markup handle, created by the ParsLoadMarkup function.
PPARSTOKEN	pToken	The pointer to a PARSTOKEN structure (see The PARSTOKEN structure) that receives the data of the token.

Return code

Integer of 0, if the next token is returned, or an error code.

The PARSTOKEN structure

This structure holds the token information of a token that is returned by the ParsGetNextToken function.

Type	Name	Usage
INT	iTokenID	The token ID of the token returned. The token ID represents the position of the tag in the markup table. A token ID of PARSTOKEN_ENDOFLIST represents the end of the tag token list. A token ID of PARSTOKEN_TEXT (text token) represents text which is not recognized as a tag.
INT	iStart	The start position (in characters, not bytes) of the token in the text area (see … parameter pszData of the ParsTokenize or ParsTokenizeW function).
INT	iLength	The length of the token (in number of characters, not bytes).
USHORT	usFixedID	A fixed token ID, or NULL if none is specified for the tag in the markup table.
USHORT	usAddInfo	Additional tag information, or NULL if none is specified for the tag in the markup table.
USHORT	usClassID	A Class ID, or NULL if none is specified for the tag in the markup table.
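
The way a token list is typically consumed can be sketched as follows. This is a minimal sketch under stated assumptions: DEMO_TOKEN only mirrors the fields listed in the table above, the DEMO_TOKEN_ values are hypothetical placeholders for PARSTOKEN_ENDOFLIST and PARSTOKEN_TEXT (the real values come from eqfpapi.h), and the tokens are taken from an array instead of repeated ParsGetNextToken calls. The sketch copies only the text tokens of a single-byte text area, which removes the tags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder values for the special token IDs. */
#define DEMO_TOKEN_ENDOFLIST (-1)
#define DEMO_TOKEN_TEXT      (-2)

/* Mirrors the fields of the PARSTOKEN table; not the real definition. */
typedef struct {
    int iTokenID;
    int iStart;
    int iLength;
    unsigned short usFixedID;
    unsigned short usAddInfo;
    unsigned short usClassID;
} DEMO_TOKEN;

/* Copies the text tokens of 'data' into 'out' and skips all tag tokens.
   The output is truncated if it does not fit and is always
   zero-terminated. Returns the number of characters copied. */
static size_t demo_copy_text(const char *data, const DEMO_TOKEN *tokens,
                             char *out, size_t outSize)
{
    size_t used = 0;

    assert(data != NULL && tokens != NULL && out != NULL && outSize > 0);
    for (; tokens->iTokenID != DEMO_TOKEN_ENDOFLIST; tokens++) {
        size_t room = outSize - 1 - used;   /* keep one byte for the zero */
        size_t len = (size_t)tokens->iLength;

        if (tokens->iTokenID != DEMO_TOKEN_TEXT)
            continue;                       /* a tag from the markup table */
        if (len > room)
            len = room;
        memcpy(out + used, data + tokens->iStart, len);
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

For Unicode text areas the same loop applies, but iStart and iLength then count UTF-16 characters, so the byte offsets have to be scaled accordingly.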

ParsFreeMarkup

Purpose

ParsFreeMarkup frees a markup table loaded with the ParsLoadMarkup function from memory.

Format

Parameters

Type	Parameter	Description
HPARSMARKUP	hMarkup	The markup handle, created by the ParsLoadMarkup function.

Return code

Integer of 0, if the markup table is freed from memory, or an error code.

ParsTerminate

Purpose

ParsTerminate terminates the parser API environment.

Format

Parameters

Type	Parameter	Description
HPARSER	hParser	The parser API handle, created by the ParsInitialize function.

Return code

Integer of 0, if the environment is successfully terminated, or an error code.