Extending Office PDF Export

Summary: Create a COM add-in for Office 2024, Office LTSC 2024, and Microsoft 365 Version 2408 and later applications with your own logic for exporting to PDF format. The technique described requires knowledge of C++ and COM.

Applies to: Excel, OneNote, PowerPoint, Publisher, Visio, and Word in Office 2024, Office LTSC 2024, Microsoft 365 Version 2408 and later.

Introduction to the Office (2024) Fixed-Format Export Feature

This article explains how third-party software developers can hook into the fixed-format export feature available in the Office 2024, Office LTSC 2024, and Microsoft 365 Version 2408 and later applications so that they can add their own exporter.

The applications include built-in exporters for Microsoft XML Paper Specification (XPS) and Portable Document Format (PDF). Fixed-format files expose the content of a document in a paginated form that is both application-independent and platform-independent.

Software developers can add their own exporter by writing an Office add-in that implements the IMsoDocExporter COM interface. This article describes IMsoDocExporter and its interaction with a hosting Microsoft 365 application, such as Word.

Fixed-format export has been available since the Office 2007 release, and this article includes information on the features that are new in the Office 2024, Office LTSC 2024, and Microsoft 365 Version 2408 releases.

Important

The fixed-format export feature is available in all the applications listed in the preceding Applies to section. However, the discussion below uses Publisher as an example application, except in those cases where an explanation is more relevant to a different application.

Initializing Add-Ins

For the user to access add-in functionality, the add-in should add a new menu item or a new toolbar button to the application. When the user selects this menu item or button, the add-in should use the Microsoft Office Object Model to obtain a pointer to the active document. It should then call the active document's ExportAsFixedFormat method with an IUnknown interface pointer that supports the IMsoDocExporter interface through a call to the QueryInterface method. The object model parameter for the interface pointer is a VARIANT with VT_UNKNOWN type.

Note

For OneNote, the add-in calls the Publish method with a string parameter that is the class ID of the add-in's implementation of the IMsoDocExporter interface. OneNote then calls CoCreateInstance with the class ID to get an IUnknown interface pointer from the add-in's class factory.

After Publisher has a pointer to the IMsoDocExporter interface, it calls back the add-in through the methods exposed by IMsoDocExporter. Through these callbacks, Publisher provides the add-in with document content and other information about the document.

An excellent source of information about building COM add-ins for Microsoft Office applications is the codeproject.com article Building an Office2K COM Add-in with VC++/ATL.

IMsoDocExporter

The IMsoDocExporter interface exposes the following methods.

Table 1. Methods exposed by the IMsoDocExporter interface

| Method | Description |
| --- | --- |
| HrCreateDoc | Called at the start of the fixed-format export process. |
| HrAddPageFromEmf | Called to pass the add-in an enhanced metafile (EMF) that represents a rendered view of the content to export. |
| HrAddDocumentMetadataString | Called to specify string-format metadata for the document. |
| HrAddDocumentMetadataDate | Called to specify date-format metadata for the document. |
| HrSetDefaultLcid | Called to specify the default locale ID (LCID) for the content to export. |
| HrAddOutlineNode | Called to specify user-navigable document outline information. |
| HrGetPageBreaks | Called to obtain pagination information from the add-in. |
| HrSetPageHeightForPagination | Called to specify the page height to enable the add-in to paginate the document. |
| HrFinalize | Called at the end of the fixed-format export process. Allows the add-in to perform any final processing. |
| HrBeginStructNode | Called to pass the add-in the starting structure for a document-structure node that spans multiple pages. |
| HrEndStructNode | Called to pass the add-in the ending structure for a document-structure node that spans multiple pages. |
| EnableCancel | Called to pass the add-in a pointer to an IMsoDocExCancel interface. |
| GetOutputOption | Called to retrieve fixed-format output options. |
| SetOutputOption | Called by Office to set fixed-format output options. |
| SetDocExporterSite | Called to provide the add-in with a pointer to an IMsoDocExporterSite interface for extended color support. |

In addition, IMsoDocExporter also exposes the following methods that are inherited from the IUnknown interface.

Table 2. Methods inherited from the IUnknown interface

| Method | Description |
| --- | --- |
| AddRef | Increments the reference count. |
| QueryInterface | Returns pointers to supported interfaces. The add-in's implementation of QueryInterface should support returning an IMsoDocExporter interface pointer for IID_IMsoDocExporter. |
| Release | Decrements the reference count. |

For information about implementing the IUnknown interface methods, see IUnknown (COM).

Call Flow

The following diagram shows the sequence in which Publisher calls the methods exposed in IMsoDocExporter. Not all of the methods are used by each Microsoft Office application and not all of the methods are used for every document that is exported.

Figure 1. Calling methods from the IMsoDocExporter interface

The following sections further describe the methods exposed by the IMsoDocExporter interface. The methods are described in approximately the order in which they would be called by Publisher.

GetOutputOption and SetOutputOption

Publisher calls the GetOutputOption and SetOutputOption methods to retrieve and set output options for the fixed-format export process.

void GetOutputOption(
    MSODOCEXOPTION docexoption, 
    DWORD* pdwVal 
);
void SetOutputOption(
    MSODOCEXOPTION docexoption, 
    DWORD dwVal 
);

The docexoption parameter specifies the output option and the (p)dwVal parameter specifies the value for the option.

While the built-in exporter in Office uses GetOutputOption and SetOutputOption, an add-in can implement its own method of getting and setting options and its own user experience for the options.

Microsoft Office Calls GetOutputOption Only with msodocexOptionTargetDPIColor for Fixed-Format Add-Ins

For the implementation of fixed-format export in Office, Publisher calls the GetOutputOption method to retrieve output options for display to the user in the Publish as PDF or XPS dialog box. For add-ins developed by third-party software developers, Publisher calls GetOutputOption with only the msodocexOptionTargetDPIColor value. This is the only value that an add-in needs to support. If the add-in's implementation of GetOutputOption is called with this value, it should return the target dots-per-inch (DPI) for 3-D effect rasterization.

Microsoft Office Calls SetOutputOption for Fixed-Format Add-Ins

For both the implementation of fixed-format export in Office and for add-in implementations, Publisher calls SetOutputOption at the beginning of the fixed-format export process. In the implementation in Office, the parameter values passed in specify fixed-format output options. However, if the add-in implements its own set of options, the add-in can disregard the options passed to it by Publisher.

EnableCancel

Publisher calls the EnableCancel method to pass the add-in a pointer to an IMsoDocExCancel interface. The add-in can use this interface to query whether a user chooses to cancel a long document-export operation.

void EnableCancel(
    IMsoDocExCancel* pdec
);

HrBeginStructNode

Publisher calls the HrBeginStructNode method to specify the start of a document-structure node for content that encompasses multiple complete pages in the document. Document-structure nodes for elements of the document that reside entirely within a page (for example, paragraphs) are embedded by Publisher in the enhanced metafile (EMF) itself using the DocExComment_BeginStructNode and DocExComment_EndStructNode structures. For more information about document-structure nodes, see the sections HrAddPageFromEmf and DocExComment_BeginStructNode in this article.

HRESULT HrBeginStructNode(
    int idNodeParent, 
    int iSortOrder, 
    const MSODOCEXSTRUCTNODE* pnode, 
    BOOL fNoEndNode
);

The idNodeParent parameter specifies the ID of the node that is the parent of the node being passed to the add-in. If this parameter is 0, the node is located under the root of the document-structure tree. Multiple sibling nodes may be located under the root. If this parameter is -1, the node is located under the currently open node, that is, under the last node specified by HrBeginStructNode that has not been closed by a call to HrEndStructNode.

The iSortOrder parameter specifies the sort order of the structure node among its siblings. No two nodes can have the same sort order. However, the set of integers that constitute the sort order need not be contiguous. A value of -1 indicates that the sibling sort order is the same order in which the nodes appear in the EMF comments.
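
One way to honor iSortOrder when assembling the tree is a stable sort over each parent's children. The following is a minimal sketch, not code from the Office headers: ChildRef and SortSiblings are hypothetical names, and the sketch assumes that the siblings under one parent either all carry explicit sort orders or all carry -1. Because the sort is stable, siblings that share the value -1 keep the order in which they arrived.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical bookkeeping record kept by the add-in; not an Office type.
struct ChildRef
{
    int idNode;      // idNode from MSODOCEXSTRUCTNODE
    int iSortOrder;  // iSortOrder as passed to HrBeginStructNode
};

// Orders siblings by iSortOrder. std::stable_sort keeps equal keys in their
// original relative order, so a run of -1 values stays in arrival order.
inline void SortSiblings(std::vector<ChildRef>& siblings)
{
    std::stable_sort(siblings.begin(), siblings.end(),
        [](const ChildRef& a, const ChildRef& b)
        {
            return a.iSortOrder < b.iSortOrder;
        });
}
```

Sorting is deferred until all siblings are known because the sort-order values need not be contiguous and can arrive in any order.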

The pnode parameter points to an MSODOCEXSTRUCTNODE structure, which has the following declaration:

typedef struct _MsoDocexStructNode
{
    int idNode;
    MSODOCEXSTRUCTTYPE nodetype;
    WCHAR* pwchAltText;
    union
    {
        int iHeadingLevel;
        ULONG idPara;
        ULONG idDropCap;
        int iPage;
        WCHAR* pwchActualText;
        MSODOCEXLINEBREAKTYPE bt;
        int iListLevel;
        MSODOCEXLISTTYPE listType;
        ULONG idAtn;
        long cpLim;
        int shapeProperty;
        MsoDocexTableAttr tableAttr;
        long cpNoteRef;
        WCHAR* idTableHeader;
        long cpXchAtnMainDod;
        int iTargetParentId;
        WCHAR* wzMathMlText;
        MsoDocexListAttr* pListAttr;
    };
} MSODOCEXSTRUCTNODE;

The idNode member specifies the ID of the node being passed in the call to HrBeginStructNode. This member may not have a value of 0. A value of -1 indicates that child nodes do not use the idNodeParent parameter to specify this node as their parent. Instead, this node can be a parent only by enclosing child nodes in the EMF. Multiple nodes can have an ID of -1. If the ID is not -1, the value is unique across the document.

The embedded union at the end of the MSODOCEXSTRUCTNODE is interpreted differently depending on the type of node:

  • iHeadingLevel is the heading level for an msodocexStructTypeHeading.
  • idPara is the paragraph id for a P, TOCI, or ListBody.
  • idDropCap is the id of an msodocexStructTypeDropCap.
  • iPage is the page number for an msodocexStructTypePage.
  • bt is the line break type for an msodocexStructTypeTextLine.
  • iListLevel is the list level for an msodocexStructTypeList or msodocexStructTypeListItem.
  • listType is the list type for an msodocexStructTypeListItem.
  • idAtn is the id of an msodocexStructTypeAnnotationBegin or msodocexStructTypeAnnotationEnd.
  • cpLim is used to determine the nesting order of tables within tables for an msodocexStructTypeTable, msodocexStructTypeTOC, or msodocexStructTypeListBody.
  • shapeProperty is for an msodocexStructTypeFigure where the content is a shape, text box, or table cell and contains bit fields from the MSODOCEXSHAPEPROPERTY enumeration.
  • tableAttr is the table cell attributes for an msodocexStructTypeTH or msodocexStructTypeTD.
  • cpNoteRef is used to link msodocexStructTypeIntLinkNoteRef with msodocexStructTypeFootnote/msodocexStructTypeEndnote. This is explained in more detail later in this section.
  • idTableHeader is the unique id for an msodocexStructTypeTH or msodocexStructTypeTD.
  • cpXchAtnMainDod is used to link msodocexStructTypeCommentAnchor with msodocexStructTypeAnnot. This is explained in more detail later in this section.
  • iTargetParentId is the id of the node to reparent an msodocexStructTypeDiagram to.
  • wzMathMlText is the MathML string for an msodocexStructTypeEquation.
  • pListAttr points to the list attributes for an msodocexStructTypeList.

Note: cpNoteRef, cpXchAtnMainDod, wzMathMlText, and pListAttr are available when Word.Document.ExportAsFixedFormat3 is called with ImproveExportTagging = true. The minimum version required is Microsoft 365 Beta Channel 16.0.18720.20000.

Table 3. Enumerated values of MSODOCEXLINEBREAKTYPE

| Value | Description |
| --- | --- |
| msodocexLineBreakTypeNormal | Normal line break. |
| msodocexLineBreakTypeManual | Manual line break. |
| msodocexLineBreakTypeEOP | End of paragraph. |

Table 4. Enumerated values of MSODOCEXLISTTYPE

| Value | Description |
| --- | --- |
| msodocexListTypeNone | No bullets or numbering. |
| msodocexListTypeBulletDisc | Disc-shaped bullets. |
| msodocexListTypeBulletCircle | Circle-shaped bullets. |
| msodocexListTypeBulletSquare | Square-shaped bullets. |
| msodocexListTypeBulletDecimal | Decimal numbering. |
| msodocexListTypeUpperRoman | Uppercase Roman numeral numbering. |
| msodocexListTypeLowerRoman | Lowercase Roman numeral numbering. |
| msodocexListTypeUpperAlpha | Uppercase alphabetic numbering. |
| msodocexListTypeLowerAlpha | Lowercase alphabetic numbering. |

Table 5. Enumerated values of MSODOCEXSHAPEPROPERTY bit fields

| Value | Numeric Value | Description |
| --- | --- | --- |
| msodocexShape | 0x00000001 | The object is a shape or text box. |
| msodocexShapeText | 0x00000002 | The object has non-whitespace text. |
| msodocexShapePath | 0x00000004 | The object has a fill and/or outline. |
| msodocexShapeAltText | 0x00000008 | The object has Alt Text. |
| msodocexShapeEquation | 0x00000010 | The object has text that contains an equation. |
| msodocexShapeTabelCell | 0x00000020 | The object is a cell in a table. |
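
Because shapeProperty is a set of bit fields, an add-in tests individual bits rather than comparing the whole value. The sketch below illustrates this with the numeric values from Table 5. The k-prefixed constants and the DescribeShapeProperty helper are hypothetical names local to this example; a real add-in would use the MSODOCEXSHAPEPROPERTY enumeration from the Office header.

```cpp
#include <cassert>
#include <string>

// Bit values copied from Table 5. The constant names are local to this sketch.
constexpr int kShape          = 0x00000001;
constexpr int kShapeText      = 0x00000002;
constexpr int kShapePath      = 0x00000004;
constexpr int kShapeAltText   = 0x00000008;
constexpr int kShapeEquation  = 0x00000010;
constexpr int kShapeTableCell = 0x00000020;

// Builds a readable, comma-separated summary of the bits set in shapeProperty.
inline std::string DescribeShapeProperty(int shapeProperty)
{
    std::string out;
    auto add = [&](int bit, const char* label)
    {
        if (shapeProperty & bit)
        {
            if (!out.empty())
                out += ", ";
            out += label;
        }
    };
    add(kShape, "shape");
    add(kShapeText, "text");
    add(kShapePath, "path");
    add(kShapeAltText, "alt text");
    add(kShapeEquation, "equation");
    add(kShapeTableCell, "table cell");
    return out;
}
```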

MsoDocexTableAttr

The MsoDocexTableAttr structure fits in 32 bits and includes the row and column span and header scope information for a table cell.

struct MsoDocexTableAttr
{
    static constexpr unsigned int MaxSpanBits = sizeof(unsigned int) * 8 / 2 - 1;
    static constexpr unsigned int MaxSpanValue = (1u << MaxSpanBits) - 1;

    unsigned int rowSpan : MaxSpanBits;
    unsigned int fRowScope : 1;
    unsigned int colSpan : MaxSpanBits;
    unsigned int fColScope : 1;
};

The members of the MsoDocexTableAttr structure are as follows:

  • MaxSpanBits   Specifies the number of bits available for the rowSpan and colSpan values, which is 15.

  • MaxSpanValue   Specifies the maximum value that can be specified for the rowSpan and colSpan.

  • rowSpan   Specifies the number of rows that a table cell spans.

  • fRowScope   Specifies whether the header is Row/Both or Column.

  • colSpan   Specifies the number of columns that a table cell spans.

  • fColScope   Specifies whether the header is Column/Both or Row.
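
The bit-field layout can be checked with a short, self-contained sketch. The structure declaration is repeated from above so that the example compiles on its own. MakeTableAttr is a hypothetical helper, not part of the Office headers, and clamping oversized spans to MaxSpanValue is an assumption of this sketch rather than documented Office behavior.

```cpp
#include <algorithm>
#include <cassert>

// Declaration repeated from the article so the sketch compiles on its own.
struct MsoDocexTableAttr
{
    static constexpr unsigned int MaxSpanBits = sizeof(unsigned int) * 8 / 2 - 1;
    static constexpr unsigned int MaxSpanValue = (1u << MaxSpanBits) - 1;

    unsigned int rowSpan : MaxSpanBits;
    unsigned int fRowScope : 1;
    unsigned int colSpan : MaxSpanBits;
    unsigned int fColScope : 1;
};

// With a 32-bit unsigned int, each span gets 15 bits (maximum 32767).
static_assert(MsoDocexTableAttr::MaxSpanBits == 15, "expects 32-bit unsigned int");
static_assert(MsoDocexTableAttr::MaxSpanValue == 32767, "expects 15-bit spans");

// Hypothetical helper: fills the structure and clamps spans that would not
// fit in the available bits (the clamping policy is an assumption).
inline MsoDocexTableAttr MakeTableAttr(unsigned int rowSpan, unsigned int colSpan,
                                       bool rowScope, bool colScope)
{
    const unsigned int maxSpan = MsoDocexTableAttr::MaxSpanValue;
    MsoDocexTableAttr attr{};
    attr.rowSpan = std::min(rowSpan, maxSpan);
    attr.fRowScope = rowScope ? 1u : 0u;
    attr.colSpan = std::min(colSpan, maxSpan);
    attr.fColScope = colScope ? 1u : 0u;
    return attr;
}
```

Without the clamp, assigning a value larger than MaxSpanValue to a span field would silently drop the high bits.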

MsoDocexListAttr

The MsoDocexListAttr structure includes information for a list.

struct MsoDocexListAttr
{
    int iListLevel;
    long cpLim;
};

The members of the MsoDocexListAttr structure are as follows:

  • iListLevel Specifies the nesting order of the list.

  • cpLim Specifies the position in the document where the list ends.

Post-processing tips

In some cases the nodes need to be post-processed to achieve the desired results.

Footnotes/endnotes post-processing

During export, footnote/endnote links will be tagged as msodocexStructTypeIntLinkNoteRef. Footnote/endnote bodies will be tagged as msodocexStructTypeFootnote and msodocexStructTypeEndnote respectively, and they will always be top level nodes under the msodocexStructTypeDocument node. The msodocexStructTypeIntLinkNoteRef node and the corresponding msodocexStructTypeFootnote/msodocexStructTypeEndnote node will have the same cpNoteRef value. This can be used to move footnote/endnote nodes under their corresponding link nodes to maintain a logical reading order.
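
The matching on cpNoteRef can be sketched as a two-pass walk over the collected nodes. The Kind enumeration, the Node record, and ReparentNotes below are hypothetical stand-ins for whatever in-memory tree the add-in keeps; only the pairing rule (equal cpNoteRef values) comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical node kinds; the real values come from MSODOCEXSTRUCTTYPE.
enum class Kind { Document, Para, NoteRef, Footnote, Endnote };

// Hypothetical in-memory node kept by the add-in after export.
struct Node
{
    Kind kind;
    long cpNoteRef;  // meaningful for NoteRef, Footnote, and Endnote nodes
    int parent;      // index of the parent node, -1 for the document node
};

// Moves each footnote or endnote body under the link node that shares its
// cpNoteRef value. Returns the number of nodes that were moved.
inline int ReparentNotes(std::vector<Node>& nodes)
{
    // Pass 1: remember where each link node lives, keyed by cpNoteRef.
    std::unordered_map<long, int> linkByCp;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (nodes[i].kind == Kind::NoteRef)
            linkByCp[nodes[i].cpNoteRef] = static_cast<int>(i);

    // Pass 2: attach each note body to its link, if one was found.
    int moved = 0;
    for (Node& n : nodes)
    {
        if (n.kind != Kind::Footnote && n.kind != Kind::Endnote)
            continue;
        auto it = linkByCp.find(n.cpNoteRef);
        if (it != linkByCp.end())
        {
            n.parent = it->second;
            ++moved;
        }
    }
    return moved;
}
```

Note bodies without a matching link are left in place under the document node.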

Comments post-processing

During export, each comment anchor will be tagged as msodocexStructTypeCommentAnchor. Comment bodies will be tagged as msodocexStructTypeAnnot, and they will always be top level nodes under the msodocexStructTypeDocument node. The msodocexStructTypeCommentAnchor node and the corresponding msodocexStructTypeAnnot node will have the same cpXchAtnMainDod value. This can be used to move annotation nodes under their corresponding comment anchor nodes to maintain a logical reading order.

Layout tables post-processing

During export, if we detect that a table is a layout table, we will tag it as msodocexStructTypeTable but will set the node’s cpLim to -2 (our constant value to indicate this is a layout table). This value can then be used to determine if table nodes should be re-tagged as paragraph nodes.

Nodes spanning pages post-processing (paragraphs, lists, tables)

For paragraphs, idPara values of two para nodes can be checked to determine if they represent the same paragraph across pages. For tables, the cpLim values can be checked to see if they are the same.

For lists, we added a new class to MsoDocexStructNode, MsoDocexListAttr, which contains the cpLim of a list. This can be used to check if two list nodes have the same cpLim which means they both represent the same list in the document.
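
As an illustration of the paragraph case, the sketch below groups paragraph fragments that share an idPara value. ParaFragment and GroupParagraphs are hypothetical names, and the sketch assumes that fragments are supplied in reading order so that the pieces of one paragraph are adjacent. The same pattern would apply to tables and lists by comparing cpLim instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one paragraph node as seen on one page.
struct ParaFragment
{
    unsigned long idPara;  // idPara from the structure node
    int iPage;             // page on which this fragment was exported
};

// Groups consecutive fragments that share an idPara. Each inner vector holds
// the indices of the fragments that make up one logical paragraph.
inline std::vector<std::vector<std::size_t>> GroupParagraphs(
    const std::vector<ParaFragment>& fragments)
{
    std::vector<std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < fragments.size(); ++i)
    {
        const bool continues =
            i > 0 && fragments[i].idPara == fragments[i - 1].idPara;
        if (!continues)
            groups.emplace_back();  // start a new logical paragraph
        groups.back().push_back(i);
    }
    return groups;
}
```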

For table structure nodes, the union is interpreted as an ordering of the table ends relative to other tables by using cpLim, which can be used to determine the nesting order of tables within tables.

In the context of the DocExComment_BeginStructNode, the add-in can ignore the pwchActualText member of this union.

The pwchAltText member specifies alternate text for the structure node.

The fNoEndNode parameter to HrBeginStructNode specifies whether Publisher calls the HrEndStructNode method to mark the end of the structure node. If fNoEndNode is false, then Publisher calls HrEndStructNode to close off the content bounded by the node. If this parameter has a true value, then the node does not bound any content.

The fNoEndNode parameter affects the interpretation of the parent ID value of subsequent nodes. If fNoEndNode is false, nodes inserted between this call to HrBeginStructNode and the subsequent call to HrEndStructNode, and that have a parent ID of -1, are children of this node. However, if fNoEndNode is true, then nodes inserted after this call to HrBeginStructNode, and that have a parent ID of -1, are not children of this node but are children of the next-most-recently specified node that has fNoEndNode equal to false.
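
The parent-resolution rules above can be modeled with a stack of open nodes. The following is a minimal sketch under stated assumptions: StructTree and TrackedNode are hypothetical bookkeeping types, index 0 stands for the document root, and an explicit parent ID that has not been seen yet falls back to the root (a real add-in might defer resolution instead).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping types for this sketch; not Office types.
struct TrackedNode
{
    int idNode;          // ID reported by Office (may be -1)
    std::size_t parent;  // index of the parent in StructTree::nodes
};

class StructTree
{
public:
    std::vector<TrackedNode> nodes;

    StructTree() { nodes.push_back({0, 0}); }  // index 0 stands for the root

    // Mirrors HrBeginStructNode: resolves the parent, stores the node, and
    // returns the index of the new node.
    std::size_t Begin(int idNodeParent, int idNode, bool fNoEndNode)
    {
        std::size_t parent = 0;                 // 0: directly under the root
        if (idNodeParent == -1 && !open_.empty())
            parent = open_.back();              // -1: innermost open node
        else if (idNodeParent > 0)
            parent = FindById(idNodeParent);    // explicit parent ID
        nodes.push_back({idNode, parent});
        if (!fNoEndNode)
            open_.push_back(nodes.size() - 1);  // an End() call will follow
        return nodes.size() - 1;
    }

    // Mirrors HrEndStructNode: closes the innermost open node.
    void End()
    {
        assert(!open_.empty());
        open_.pop_back();
    }

private:
    // Assumption of this sketch: an unknown parent ID falls back to the root.
    std::size_t FindById(int id) const
    {
        for (std::size_t i = 1; i < nodes.size(); ++i)
            if (nodes[i].idNode == id)
                return i;
        return 0;
    }

    std::vector<std::size_t> open_;  // indices of nodes awaiting End()
};
```

A node begun with fNoEndNode set to true is never pushed onto the stack, so later nodes with a parent ID of -1 attach to the nearest node that is still open.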

Document structure nodes can be nested to arbitrary depth.

The nodes specified by HrBeginStructNode and those specified by DocExComment_BeginStructNode share the same ID space and exist in the same document structure tree. HrBeginStructNode and DocExComment_BeginStructNode are two alternative ways of adding nodes to this tree. For example, if the most recently opened node was opened by HrBeginStructNode and the next node encountered is from a DocExComment_BeginStructNode EMF comment record with idNodeParent equal to -1, it means that the node from HrBeginStructNode is the parent of the node from the DocExComment_BeginStructNode record.

HrEndStructNode

Publisher calls the HrEndStructNode method to specify the end of a document-structure node for content that encompasses multiple pages in the document. The structure node ended by the HrEndStructNode was begun previously by a call to the HrBeginStructNode method. For more information, see HrBeginStructNode in this article.

HRESULT HrEndStructNode();

HrCreateDoc

Publisher calls the HrCreateDoc method to specify the creation of a new, empty fixed-format document.

HRESULT HrCreateDoc(
    const WCHAR* wzDocExFile
);

Publisher calls the HrCreateDoc method at the beginning of the fixed-format export process to specify the creation of an empty fixed-format document. The wzDocExFile parameter specifies a name for the output file to which to write the fixed-format document.

For an add-in implementation, Publisher calls HrCreateDoc with the file name that the add-in provided in the call to the ExportAsFixedFormat method in the Microsoft Office object model. However, because add-ins typically provide configuration UI to allow the user to specify an output file name, the add-in could disregard this file name during the export process.

For Microsoft Office applications that require the add-in to paginate the document, HrCreateDoc is called twice, once at the start of the pagination-calling sequence, and again after the add-in has paginated the document. For more information, see the descriptions for the HrSetPageHeightForPagination method and the HrGetPageBreaks method.

HrSetDefaultLcid

Publisher calls the HrSetDefaultLcid method to specify the default locale ID (LCID) for the content to be exported.

HRESULT HrSetDefaultLcid(
    DWORD lcid
);

For a list of valid LCIDs, see List of Locale ID (LCID) Values as Assigned by Microsoft.

HrAddPageFromEmf

Publisher calls the HrAddPageFromEmf method to pass the add-in a handle to an in-memory EMF that represents the content in the document to export.

HRESULT HrAddPageFromEmf(
    HENHMETAFILE hemf
);

The EMF passed by Microsoft Office to the add-in is the primary source of the content that the add-in exports as a fixed-format file. Microsoft Office calls HrAddPageFromEmf once for each page of content in the application's source document.

EMF Comments Convey Semantic Information

An EMF is a sequence of drawing commands (GDI and GDI+ commands) that specify how to render the visual elements of the document. The EMF does not contain any information beyond these commands (for example, "draw an image here," or "draw a line over there"). In particular, conventional EMFs do not support semantic aspects of the document, such as hyperlinks, locale information, and accessibility information. To preserve semantic information in the exported document, Publisher injects special records in the EMF. These records contain the semantic information.

The records that represent the semantic information are implemented as specially formatted EMF comments. The EMF format allows for comment record types that are ignored by the rendering engine for Graphics Device Interface (GDI), but can contain arbitrary information.

As an example, consider a document that contains alternate text. (Alternate text is used by document readers to describe images for users with sight impairments.) Publisher injects EMF comments before and after rendering the image, and these EMF comments specify the alternate text for the image. The add-in interprets the comments and writes the information to the fixed-format export file.
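
Each DocExComment_* structure shown in this article begins with the same two DWORD fields, ident and iComment, so an add-in can recognize a semantic record by reading that header from the comment payload. The sketch below is portable and hedged: DocExCommentHeader and TryReadDocExHeader are hypothetical names, and the signature is passed in as a parameter because this example does not assume a value for msodocexsignature. On Windows, the payload bytes would typically come from the data of a GDI comment record seen while enumerating the metafile (for example, with EnumEnhMetaFile).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Header shared by the DocExComment_* structures in this article: two 32-bit
// fields, ident followed by iComment. The type name is local to this sketch.
struct DocExCommentHeader
{
    std::uint32_t ident;
    std::uint32_t iComment;
};

// Copies the header out of a comment payload and reports whether it carries
// the expected signature. `signature` stands for msodocexsignature, whose
// value must come from the Office header file.
inline bool TryReadDocExHeader(const unsigned char* data, std::size_t cb,
                               std::uint32_t signature,
                               DocExCommentHeader* header)
{
    if (data == nullptr || header == nullptr || cb < sizeof(DocExCommentHeader))
        return false;  // too short to be a semantic record
    std::memcpy(header, data, sizeof(DocExCommentHeader));  // safe for unaligned data
    return header->ident == signature;
}
```

When the function returns true, the add-in can switch on header->iComment to decide which DocExComment_* structure to read from the same payload. Comments that fail the check are ordinary EMF comments and can be ignored.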

The following table shows the semantic record types supported by the Microsoft Office fixed-format export feature. These types are enumerated by the MSODOCEXCOMMENT enumeration. Each type corresponds to a structure type that describes the format for the record.

Table 6. Semantic record types supported by fixed-format export

| Comment Value | Structure Type |
| --- | --- |
| msodocexcommentExternalHyperlink | DocExComment_ExternalHyperlink |
| msodocexcommentExternalHyperlinkRctfv | DocExComment_ExternalHyperlink |
| msodocexcommentInternalHyperlink | DocExComment_InternalHyperlink |
| msodocexcommentInternalHyperlinkRctfv | DocExComment_InternalHyperlink |
| msodocexcommentColorInfo | DocExComment_ColorInfo |
| msodocexcommentColorMapEnable | DocExComment_ColorEnable |
| msodocexcommentBeginTextRun | DocExComment_BeginTextRun |
| msodocexcommentBeginTextRunRTL | DocExComment_BeginTextRun |
| msodocexcommentEndTextRun | DocExComment_EndTextRun |
| msodocexcommentBeginStructNode | DocExComment_BeginStructNode |
| msodocexcommentEndStructNode | DocExComment_EndStructNode |
| msodocexcommentUnicodeForNextTextOut | DocExComment_UnicodeForNextTextOut |
| msodocexcommentUnicodeForNextTextOutRTL | DocExComment_UnicodeForNextTextOut |
| msodocexcommentEPSColor | DocExComment_EPSColor |
| msodocexcommentEPSCMYKJPEG | DocExComment_EPSColorCMYKJPEG |
| msodocexcommentEPSSpotImage | DocExComment_EPSColorSpotImage |
| msodocexcommentEPSStart | DocExComment_EPSStart |
| msodocexcommentPageName | DocExComment_PageName |
| msodocexcommentTransparent | DocExComment_Transparent |

DocExComment_ExternalHyperlink(Rctfv)

The DocExComment_ExternalHyperlink(Rctfv) structure describes a hyperlink that links to outside of the document, for example to a Web site on the Internet.

struct DocExComment_ExternalHyperlink
{
    DWORD ident {};
    DWORD iComment {};
    union
    {
        RECT  rcdvRegion;
        struct
        {
            float xLeft;
            float yTop;
            float dxWidth;
            float dyHeight;
        } rctfvRegion;
    };
    WCHAR wzLink[MAX_PATH];
};

The members of the DocExComment_ExternalHyperlink(Rctfv) structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentExternalHyperlink or msodocexcommentExternalHyperlinkRctfv.

  • rcdvRegion and rctfvRegion   A union that specifies the region of the page that is the source location of the hyperlink. The region can be represented as a RECT type (rcdvRegion) that uses device pixels as the unit of measure, or as a structure that contains floating-point coordinates (rctfvRegion), in which case the unit of measure is points.

    If the iComment member is equal to msodocexcommentExternalHyperlink, the add-in should use rcdvRegion. In this case, the add-in needs to apply the current EMF transformation matrix to rcdvRegion to convert it to the page space.

    If the iComment member is equal to msodocexcommentExternalHyperlinkRctfv, the add-in should use rctfvRegion. In this case, rctfvRegion is already in the page space, so no transformation is needed.

  • wzLink[MAX_PATH]   Specifies the destination URL for this hyperlink.
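
For the rcdvRegion case, converting the rectangle to page space amounts to applying a 2-D affine transform to its corners. The sketch below uses portable stand-in structures (RectL, Xform, RectF) that mirror the field names of the Win32 RECT and XFORM types so that it compiles without the Windows SDK. ToPageSpace is a hypothetical helper, and how the add-in tracks the current transformation matrix while playing back the EMF is outside the scope of this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Portable stand-ins that mirror the Win32 RECT and XFORM field names.
struct RectL { long left, top, right, bottom; };
struct Xform { float eM11, eM12, eM21, eM22, eDx, eDy; };

// Same shape as the rctfvRegion member: origin plus width and height.
struct RectF { float xLeft, yTop, dxWidth, dyHeight; };

// Applies x' = x*eM11 + y*eM21 + eDx and y' = x*eM12 + y*eM22 + eDy to the
// four corners of rc and returns the bounding box of the results.
inline RectF ToPageSpace(const RectL& rc, const Xform& m)
{
    const float xs[2] = { static_cast<float>(rc.left), static_cast<float>(rc.right) };
    const float ys[2] = { static_cast<float>(rc.top), static_cast<float>(rc.bottom) };
    float minX = 0, maxX = 0, minY = 0, maxY = 0;
    bool first = true;
    for (float x : xs)
    {
        for (float y : ys)
        {
            const float px = x * m.eM11 + y * m.eM21 + m.eDx;
            const float py = x * m.eM12 + y * m.eM22 + m.eDy;
            if (first)
            {
                minX = maxX = px;
                minY = maxY = py;
                first = false;
            }
            else
            {
                minX = std::min(minX, px);
                maxX = std::max(maxX, px);
                minY = std::min(minY, py);
                maxY = std::max(maxY, py);
            }
        }
    }
    return { minX, minY, maxX - minX, maxY - minY };
}
```

Transforming all four corners, rather than only two, keeps the result correct when the matrix includes rotation or a flip.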

DocExComment_InternalHyperlink(Rctfv)

The DocExComment_InternalHyperlink(Rctfv) structure describes a hyperlink that links to a location within the document. Note that, although Publisher passes a separate EMF for each page of the document, the destination of the hyperlink specified by DocExComment_InternalHyperlink(Rctfv) could be on a different page than the source location.

struct DocExComment_InternalHyperlink
{
    DWORD ident {};
    DWORD iComment {};
    union
    {
        RECT  rcdvRegion;
        struct
        {
            float xLeft;
            float yTop;
            float dxWidth;
            float dyHeight;
        } rctfvRegion;
    };
    DWORD iTargetPage {};
    float xtfvTarget {};
    float ytfvTarget {};
    float dytfTargetPage {};
};

The members of the DocExComment_InternalHyperlink(Rctfv) structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentInternalHyperlink or msodocexcommentInternalHyperlinkRctfv.

  • rcdvRegion and rctfvRegion   As with the DocExComment_ExternalHyperlink structure, this member is a union that specifies the region of the page that is the source location of the hyperlink. The region can be represented as a RECT type (rcdvRegion) that uses device pixels as the unit of measure, or as a structure that contains floating-point coordinates (rctfvRegion), in which case the unit of measure is points.

    If the iComment member is equal to msodocexcommentInternalHyperlink, the add-in should use rcdvRegion. In this case, the add-in needs to apply the current EMF transformation matrix to rcdvRegion to convert it to the page space.

    If the iComment member is equal to msodocexcommentInternalHyperlinkRctfv, the add-in should use rctfvRegion. In this case, rctfvRegion is already in the page space, so no transformation is needed.

  • iTargetPage   Specifies the page number of the destination page within the document.

  • xtfvTarget   Specifies the x-coordinate of the target location on the destination page. The unit of measure for this value is points.

  • ytfvTarget   Specifies the y-coordinate of the target location on the destination page. The unit of measure for this value is points.

  • dytfTargetPage   The height of the destination page in points. The offset specified by the ytfvTarget member is relative to the upper-left corner of the page. However, some fixed-format types use a coordinate system that is relative to the bottom-left corner of the page. For these types of documents, the page height is required to convert the offset.
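
The conversion that dytfTargetPage enables is a single subtraction. The helper below is a hypothetical illustration, not part of the Office headers. It assumes a target format whose origin is the bottom-left corner of the page (the default coordinate system in PDF, for example).

```cpp
#include <cassert>

// Converts a vertical offset measured down from the top of the page into one
// measured up from the bottom of the page. Both values are in points.
inline float ToBottomLeftY(float ytfvTarget, float dytfTargetPage)
{
    return dytfTargetPage - ytfvTarget;
}
```

For example, a target 72 points below the top edge of a 792-point-tall page lies 720 points above the bottom edge. The x-coordinate needs no conversion because both systems measure it from the left edge.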

DocExComment_ColorInfo

The DocExComment_ColorInfo structure specifies color-state information for the EMF. For more information about this structure, see the section Extended Color Support.

struct DocExComment_ColorInfo
{
    DWORD ident {};
    DWORD iComment {};
    COLORREF clr { 0 };
    BOOL fForeColor {};
};

The members of the DocExComment_ColorInfo structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentColorInfo.

  • clr   Specifies a color ID that represents a current color state in the EMF.

  • fForeColor   Specifies whether the color ID in the clr member represents a foreground color or a background color. If this member has a value of true, the color ID represents a foreground color. If this member has a value of false, the color ID represents a background color.

DocExComment_ColorEnable

The DocExComment_ColorEnable structure specifies whether color mapping is enabled for subsequent content in the EMF. For more information about this structure, see the section Extended Color Support.

struct DocExComment_ColorEnable
{
    DWORD ident {};
    DWORD iComment {};
    BOOL fEnable {};
};

The members of the DocExComment_ColorEnable structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentColorMapEnable.

  • fEnable   Specifies whether color mapping is enabled for subsequent content. A value of true indicates that color mapping is enabled. A value of false indicates that color mapping is disabled.

DocExComment_BeginStructNode

The DocExComment_BeginStructNode structure marks the start of a document structure node. Structure nodes serve one of two possible purposes:

  • Structure nodes can identify the type of content they contain and specify the hierarchical relationship between that content and other content in the document.

  • Structure nodes can specify alternate text for elements in the document.

If the fContentNode member has a true value, the DocExComment_BeginStructNode is followed later in the document by a DocExComment_EndStructNode. The DocExComment_EndStructNode marks the end of the content that is wrapped by the information in the DocExComment_BeginStructNode.

The collection of structure nodes within the document forms a tree; each node has a parent node and may also have sibling nodes. The idNodeParent and iSortOrder members describe the structure of this tree. Note that a child node may or may not appear between the DocExComment_BeginStructNode and DocExComment_EndStructNode structures of the parent node in the EMF.

struct DocExComment_BeginStructNode
{
    DWORD ident {};
    DWORD iComment {};
    int idNodeParent {};
    int iSortOrder {};
    MSODOCEXSTRUCTNODE desn;
    BOOL fContentNode {};
    int cwchAltText {};
};

The members of the DocExComment_BeginStructNode structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentBeginStructNode.

  • idNodeParent   Specifies the ID of the parent node. A value of 0 specifies the root node. A value of -1 specifies the currently open structure node, that is, the enclosing structure node.

  • iSortOrder   Specifies the sort order of the structure node among its sibling nodes. The sort order enables the add-in to order the content correctly in the exported document.

    No two nodes can have the same sort order. However, the set of integers that constitute the sort order do not need to be contiguous.

    A value of -1 indicates that the sibling order is the same order in which the nodes appear in the EMF comments. Note that the order in which the content appears in the EMF is not necessarily the order in which the content is consumed by a user of the document.

  • desn   Specifies an MSODOCEXSTRUCTNODE structure, which is defined earlier in the document.

The idNode member specifies the ID of the node. This member may not have a value of 0. A value of -1 indicates that child nodes do not use the idNodeParent member to specify this node as their parent. Instead, this node can be a parent only by enclosing child nodes in the EMF. Multiple nodes can have an ID of -1. If the ID is not -1, the value is unique across the document.

The nodetype specifies the type of structure node. This member is equal to one of the values from the MSODOCEXSTRUCTTYPE enumeration type. The following table lists examples of document structure node types.

Table 7. Document structure node types

Type Value

Description

msodocexStructTypePara

A block of text within an article. Its parent node must be an article.

msodocexStructTypeFigure

A graphical element (for example, an image or collection of shapes) that has a textual representation. The textual representation is the alternate text used for reading or searching the document.

msodocexStructTypeArticle

A group of nodes forming a single flow of text that should be read or searched as a contiguous block of content. Some documents have a single article and others have multiple articles.

msodocexStructTypeHeading

A heading in the text.

msodocexStructTypeTable

A block of text forming a table.

msodocexStructTypeTR

A block of text forming a single row of a table.

msodocexStructTypeTD

A block of text forming a single cell in a table row.

msodocexStructTypeTH

A block of text forming a single header cell in a table row.

msodocexStructTypeList

A block of text forming a list.

msodocexStructTypeListItem

A block of text forming a list item.

msodocexStructTypeListBody

A block of text forming the body of a list item.

msodocexStructTypeDocument

A document.

msodocexStructTypePage

A page in the document.

msodocexStructTypeTOC

A table of contents.

msodocexStructTypeTOCI

An item in a table of contents.

msodocexStructTypeExtLink

A link to an external resource.

msodocexStructTypeIntLink

A link to an internal resource.

msodocexStructTypeFootnote

A footnote.

msodocexStructTypeEndnote

An endnote.

msodocexStructTypeTextbox

A text box.

msodocexStructTypeHeader

A block of text forming a header.

msodocexStructTypeFooter

A footer.

msodocexStructInlineShape

An inline shape.

msodocexStructAnnotation

An annotation.

msodocexStructTypeSpanBlock

A block of text.

msodocexStructTypeWorkbook

A workbook.

msodocexStructTypeWorksheet

A worksheet.

msodocexStructTypeMacrosheet

A macrosheet.

msodocexStructTypeChartsheet

A chartsheet.

msodocexStructTypeDialogsheet

A dialogsheet.

msodocexStructTypeSlide

A slide.

msodocexStructTypeChart

A chart.

msodocexStructTypeDiagram

A SmartArt diagram.

msodocexStructTypeBulletText

Bullet text.

msodocexStructTypeTextLine

A line of text.

msodocexStructTypeDropCap

A drop cap.

msodocexStructTypeSection

A section.

msodocexStructTypeAnnotationBegin

The beginning of an annotation.

msodocexStructTypeAnnotationEnd

The end of an annotation.

msodocexStructTypeParaRTLAttr

A block of text within an article with right-to-left layout.

msodocexStructTypeTableRTLAttr

A block of text forming a table with right-to-left layout.

msodocexStructTypeHeadingRTLAttr

A heading in the text with right-to-left layout.

msodocexStructTypeListItemRTLAttr

A block of text forming a list item with right-to-left layout.

msodocexStructTypeParaUnannotatableAttr

A block of text within an article that is not annotatable.

msodocexStructTypeTHead

The header row area in a table.

msodocexStructTypeTBody

The body area in a table, i.e. the portion between the THead and TFoot.

msodocexStructTypeLabel

A label.

msodocexStructTypeEquation

An equation.

msodocexStructTypeIntLinkNoteRef

A footnote or endnote reference mark link.

msodocexStructTypeTFoot

The footer row area in a table.

msodocexStructTypeTitle

A title in the text.

msodocexStructTypeBlockQuote

A paragraph quote or intense quote.

msodocexStructTypeCommentAnchor

Some text that is linked to a comment.

msodocexStructTypeAnnot

Contents of a single comment.

msodocexStructTypeQuote

An inline quote.

msodocexStructTypeCaption

A caption for an equation/figure/table.

Note: msodocexStructTypeTitle, msodocexStructTypeBlockQuote, msodocexStructTypeCommentAnchor, msodocexStructTypeAnnot, msodocexStructTypeQuote, and msodocexStructTypeCaption are available when Word.Document.ExportAsFixedFormat3 is called with ImproveExportTagging = true. The minimum version required is Microsoft 365 Beta Channel 16.0.18720.20000.

fContentNode   Specifies whether a DocExComment_EndStructNode structure marks the end of this structure node. If fContentNode is true, a DocExComment_EndStructNode structure closes off the content bounded by the node. If fContentNode is false, the node does not bound any content.

The fContentNode member affects the interpretation of the parent ID value of subsequent nodes. If fContentNode is true, nodes that are inserted between this DocExComment_BeginStructNode and a subsequent DocExComment_EndStructNode, and that have a parent ID of -1, are children of this node. However, if fContentNode is false, nodes inserted after this DocExComment_BeginStructNode, and that have a parent ID of -1, are not children of this node. They are children of the most recently specified node that is still open and has fContentNode equal to true.

You can nest document structure nodes to arbitrary depth.

cwchAltText   Specifies the number of Unicode characters in the block of alternate text that follows the structure. This Unicode string specifies alternate text for the node (for example, alternate text for an image).
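The parent-resolution rules above can be sketched as a small tree builder. This is a minimal illustration, not the Office implementation: StructEvent, TreeNode, and BuildStructTree are hypothetical names, the events are assumed to be already decoded from the EMF comments, iSortOrder and alternate text are ignored, and attaching a node with an unknown explicit parent ID to the root is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified view of one decoded structure comment.
struct StructEvent {
    bool isBegin;       // true: BeginStructNode, false: EndStructNode
    int idNodeParent;   // 0 = root, -1 = enclosing open node
    int idNode;         // desn.idNode; -1 = node without a referable ID
    bool fContentNode;  // true: a matching EndStructNode follows
};

struct TreeNode {
    int idNode;
    std::size_t parent;                 // index into the node vector; 0 is the root
    std::vector<std::size_t> children;  // indices of child nodes, in EMF order
};

std::vector<TreeNode> BuildStructTree(const std::vector<StructEvent>& events)
{
    std::vector<TreeNode> nodes;
    nodes.push_back({0, 0, {}});                // synthetic root with ID 0
    std::unordered_map<int, std::size_t> byId;  // explicit node ID -> index
    byId[0] = 0;
    std::vector<std::size_t> open;              // stack of open content nodes

    for (const StructEvent& ev : events) {
        if (!ev.isBegin) {
            // EndStructNode closes the innermost open content node.
            if (!open.empty()) open.pop_back();
            continue;
        }
        std::size_t parent = 0;
        if (ev.idNodeParent == -1) {
            // Parent is the enclosing open node, or the root if none is open.
            if (!open.empty()) parent = open.back();
        } else {
            // Assumption: an explicit parent ID not seen so far maps to the root.
            auto it = byId.find(ev.idNodeParent);
            if (it != byId.end()) parent = it->second;
        }
        const std::size_t index = nodes.size();
        nodes.push_back({ev.idNode, parent, {}});
        nodes[parent].children.push_back(index);
        if (ev.idNode != -1) byId[ev.idNode] = index;
        // Only content nodes enclose later nodes that use a parent ID of -1.
        if (ev.fContentNode) open.push_back(index);
    }
    return nodes;
}
```

A real exporter would also sort each node's children by iSortOrder where that value is not -1, and would read the cwchAltText characters that follow each begin record.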

DocExComment_EndStructNode

The DocExComment_EndStructNode structure marks the end of the content that is decorated by the information in the DocExComment_BeginStructNode.

struct DocExComment_EndStructNode
{
    DWORD ident {};
    DWORD iComment {};
};    

The members of the DocExComment_EndStructNode structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEndStructNode.

DocExComment_BeginTextRun

The DocExComment_BeginTextRun structure identifies the language of a sequence of text in the document and provides the Unicode code points for the text.

Although some text-rendering EMF records use Unicode as the text representation, others use the glyphs that are drawn on the screen, rather than the original source text. A glyph is the index of a given shape in the font, which can be different from font to font.

There can be cases where several Unicode code points are combined into a single glyph or where a single Unicode code point is broken into multiple glyphs. Because the mapping from code points to glyphs is context-dependent, a user cannot text search or copy/paste in a document that contains only glyphs. Therefore, Publisher sometimes provides the Unicode text as well as the glyphs.

struct DocExComment_BeginTextRun
{
    DWORD ident {};
    DWORD iComment {};
    DWORD lcid {};
    int cGlyphIndex {};
    int cwchActualText {};
};

The members of the DocExComment_BeginTextRun structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentBeginTextRun.

  • lcid   Specifies the LCID for the text sequence.

  • cGlyphIndex   Specifies the size of an array that follows this structure. This array implements a glyph index table that maps Unicode code points in the actual text to the corresponding glyphs in the EMF. Each element of the array corresponds to a code point in the text. The value of that element specifies the first glyph used to render that code point in the EMF. Two or more adjacent code points may have the same value in the array, which means that they both resolve to the same glyph. The value can also be 0, which means that this code point does not map to any glyph.

  • cwchActualText   Specifies the size of the sequence of Unicode code points that follow the glyph index table. This is the text that a consumer of the document can use for searching, copying/pasting, and accessibility. The value of this member can be 0, which means that no Unicode text is provided.
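To make the glyph index table concrete, the following sketch groups adjacent code points that resolve to the same glyph. It assumes the table has already been decoded into a vector of integers (the on-disk element size is not covered here); TextCluster and GroupByGlyph are hypothetical names, and treating each 0 entry as its own unmapped cluster is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: a run of code points rendered by the same glyph.
struct TextCluster {
    std::size_t firstCodePoint;  // index of the first code point in the run
    std::size_t codePointCount;  // number of code points in the run
    int firstGlyph;              // value from the table; 0 = no glyph
};

// Groups adjacent code points whose table entries are equal and nonzero.
std::vector<TextCluster> GroupByGlyph(const std::vector<int>& glyphIndexTable)
{
    std::vector<TextCluster> clusters;
    for (std::size_t i = 0; i < glyphIndexTable.size(); ++i) {
        const int glyph = glyphIndexTable[i];
        const bool extendsPrevious = !clusters.empty() && glyph != 0 &&
                                     clusters.back().firstGlyph == glyph;
        if (extendsPrevious) {
            ++clusters.back().codePointCount;  // same glyph as the previous code point
        } else {
            clusters.push_back({i, 1, glyph});  // start a new run
        }
    }
    return clusters;
}
```

A consumer could use these runs to keep search hits and copy/paste selections aligned with whole glyphs rather than splitting a ligature.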

DocExComment_EndTextRun

The DocExComment_EndTextRun structure marks the end of a text sequence, the beginning of which was marked by a DocExComment_BeginTextRun structure.

struct DocExComment_EndTextRun
{
    DWORD ident {};
    DWORD iComment {};
};    

The members of the DocExComment_EndTextRun structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEndTextRun.

DocExComment_UnicodeForNextTextOut

The DocExComment_UnicodeForNextTextOut structure functions similarly to the DocExComment_BeginTextRun and DocExComment_EndTextRun structures. However, DocExComment_UnicodeForNextTextOut specifies Unicode code points for only the following EMF TextOut record, rather than for a block of EMF content bounded by begin and end structures.

struct DocExComment_UnicodeForNextTextOut
{
    DWORD ident {};
    DWORD iComment {};
    int cGlyphIndex {};
    int cwchActualText {};
};

The members of the DocExComment_UnicodeForNextTextOut structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentUnicodeForNextTextOut.

  • cGlyphIndex   Specifies the size of an array that follows this structure. This array implements a glyph index table that maps Unicode code points in the actual text to the corresponding glyphs in the EMF. Each element of the array corresponds to a code point in the text. The value of that element specifies the first glyph used to render that code point in the EMF. Two or more adjacent code points may have the same value in the array, which means that they both resolve to the same glyph.

  • cwchActualText   Specifies the size of the sequence of Unicode code points that follow the glyph index table. This is the text that a consumer of the document can use for searching, copying/pasting, and accessibility.

DocExComment_EPSColor

The DocExComment_EPSColor structure specifies color information for an encapsulated PostScript (EPS) file embedded in the EMF. For more information about this structure, see the section Extended Color Support.

typedef struct
{
    DWORD ident {};
    DWORD iComment {};
    BYTE colorInfo[];
} DocExComment_EPSColor;

The members of the DocExComment_EPSColor structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEPSColor.

  • colorInfo[]   Specifies the color information for the EPS file. The add-in should pass this information to Publisher using the IMsoDocExporterSite::SetEPSInfo method.

DocExComment_EPSColorCMYKJPEG

The DocExComment_EPSColorCMYKJPEG structure specifies the start, in the EMF, of a binary object that is a CMYKJPEG file stream. For more information about this structure, see the section Extended Color Support.

typedef struct
{
    DWORD ident {};
    DWORD iComment {};
} DocExComment_EPSColorCMYKJPEG;

The members of the DocExComment_EPSColorCMYKJPEG structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEPSCMYKJPEG.

DocExComment_EPSColorSpotImage

The DocExComment_EPSColorSpotImage structure provides spot color information for the subsequent RGB image. For more information about this structure, see the section Extended Color Support.

typedef struct
{
    DWORD ident {};
    DWORD iComment {};
    COLORREF cmykAlt { 0 };
    COLORREF rgbAlt { 0 };
    float flTintMin {};
    float flTintMax {};
    char szSpotName[1];
} DocExComment_EPSColorSpotImage;

The members of the DocExComment_EPSColorSpotImage structure are as follows:

  • ident   Specifies the constant value, msodocexsignature, which identifies this EMF comment as containing semantic information.

  • iComment   Specifies the MSODOCEXCOMMENT value, msodocexcommentEPSSpotImage.

  • cmykAlt   Specifies a CMYK color ID.

  • rgbAlt   Specifies an RGB color ID.

  • flTintMin   Specifies the minimum tint.

  • flTintMax   Specifies the maximum tint.

  • szSpotName[1]   Specifies a variable length, zero-terminated string that contains the spot name.

Extended Color Support

To support extended color spaces in Publisher, additional EMF semantic records and interfaces are needed because EMF only supports RGB (red-green-blue) colors. Extended color spaces include CMYK (cyan-magenta-yellow-black) and spot color space, which are commonly used in commercial printing.

Publisher uses color mapping to represent extended colors in the document EMF. Publisher builds a color table for all colors used in the document and replaces actual colors with color IDs in the EMF. The type for the color ID is COLORREF, which is the same type that is used for RGB color. For information about the COLORREF structure, see COLORREF.

To resolve color IDs in the EMF back to the extended color space, the add-in calls back to Publisher through the HrResolveColor method of the IMsoDocExporterSite interface. The add-in passes Publisher an interface pointer to an IDOCEXCOLOR interface as one of the parameters to HrResolveColor. Publisher takes the color IDs, also specified in the call to HrResolveColor, converts them to extended color (RGB, CMYK, or spot color), and passes them back to the add-in through the methods in the IDOCEXCOLOR interface.

Vector Color and Recolored Images

Vector colors are any COLORREF values that the add-in receives from Publisher, such as text color, line stroke color, and the colors used for metafile recoloring. When color mapping is enabled, Publisher uses a color ID for COLORREF rather than a real RGB color value. If Publisher provides the add-in an IMsoDocExporterSite interface pointer by calling the SetDocExporterSite method of the IMsoDocExporter interface, the add-in should always call the IMsoDocExporterSite::HrResolveColor method to convert the COLORREF to an extended color, which the add-in receives through the methods in the IDOCEXCOLOR interface.

To support vector color mapping, the add-in needs to do the following:

  • Implement class support for an IDOCEXCOLOR interface. The methods in this interface enable Publisher to pass extended color back to the add-in.

  • Cache the following color state values from the semantic records in the EMF:

      • The foreground color for recoloring. This is set through the DocExComment_ColorInfo structure.

      • The background color for recoloring. This is set through the DocExComment_ColorInfo structure.

      • Whether color mapping is enabled. This is set through the DocExComment_ColorEnable structure.

  • For a vector color, create an IDOCEXCOLOR interface with the color ID, so that IDOCEXCOLOR::GetUnresolvedRGB returns the color ID. The add-in should call the IMsoDocExporterSite::HrResolveColor method with the IDOCEXCOLOR interface and cached color states. Publisher calls the IDOCEXCOLOR interface methods with the final color, which can be RGB, CMYK, spot, or registration tint.

  • When either foreground color or background color for recoloring is specified from an EMF semantic record, the add-in should recolor images in the add-in (for example, metafiles or raster pictures).

Non-Recolored Images

EMF supports CMYK images using GDI+. Therefore, images in the EMF may be either RGB or CMYK. If the image is a CMYK image, the add-in needs to convert the image to the target color space.

Publisher maintains a target color space for the document. The add-in can use this target color space by calling the IMsoDocExporterSite::HrConvertImageColorSpace method with the image's color space.

Color from EPS Files

Encapsulated PostScript (EPS) is a metafile type that supports extended color spaces. Users who embed EPS images in a Publisher document expect the color information to be used in the fixed-format output. Inside Publisher, the EPS is converted to an EMF with EPS-related semantic records. This EMF is then embedded in the page EMF file that the application passes to the add-in.

To support color in EPS files, the add-in needs to do the following:

  • Call the IMsoDocExporterSite::SetEPSInfo method for DocExComment_EPSColor records encountered in the EMF.

  • Extract the CMYK image from the DocExComment_EPSColorCMYKJPEG record in the EMF. This record contains a binary object that is the actual CMYK JPEG file stream. Use it to replace the RGB image specified in the subsequent call to the StretchDIBits function.

  • The DocExComment_EPSColorSpotImage record provides spot color information for the subsequent RGB image, which is always an index image. The add-in needs to convert the spot image to the target color space.

  • The add-in can optionally call the IMsoDocExporterSite::HrGetSpotRecolorInfo method to obtain the document's target color from Publisher. Then the add-in can recolor the subsequent RGB image by mapping colors from the palette of the RGB image to flTintMin and flTintMax tints specified in the DocExComment_EPSColorSpotImage record. The luminosity for each color of the palette is used for the mapping.

Note that the DocExComment_EPSStart record is only informational. The add-in can ignore this record.
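The luminosity-based tint mapping in the last step could look like the following sketch. Several details are assumptions rather than documented behavior: the Rec. 601 luminosity weights, the linear interpolation, and the direction of the mapping (darkest palette entry to flTintMax, lightest to flTintMin). PaletteColor, Luminosity, TintForColor, and TintsForPalette are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical palette entry of the indexed RGB image.
struct PaletteColor {
    std::uint8_t r, g, b;
};

// Luminosity in [0, 1]. The Rec. 601 weights are an assumption.
float Luminosity(PaletteColor c)
{
    return (0.299f * c.r + 0.587f * c.g + 0.114f * c.b) / 255.0f;
}

// Assumption: the darkest color maps to flTintMax, the lightest to
// flTintMin, with linear interpolation in between.
float TintForColor(PaletteColor c, float flTintMin, float flTintMax)
{
    const float darkness = 1.0f - Luminosity(c);
    return flTintMin + darkness * (flTintMax - flTintMin);
}

// Computes one spot tint per palette entry.
std::vector<float> TintsForPalette(const std::vector<PaletteColor>& palette,
                                   float flTintMin, float flTintMax)
{
    assert(flTintMin <= flTintMax);
    std::vector<float> tints;
    tints.reserve(palette.size());
    for (const PaletteColor& c : palette) {
        tints.push_back(TintForColor(c, flTintMin, flTintMax));
    }
    return tints;
}
```

Because the image is an index image, the add-in only needs to remap the palette once; the pixel data can stay unchanged.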

SetDocExporterSite

Publisher calls SetDocExporterSite to provide the add-in with a pointer to an IMsoDocExporterSite interface. The IMsoDocExporterSite interface exposes methods that enable extended color support.

void SetDocExporterSite(
    IMsoDocExporterSite* pDocExporterSite
);

The pDocExporterSite parameter specifies the interface pointer to the IMsoDocExporterSite interface.

HrSetPageHeightForPagination

An application can call the HrSetPageHeightForPagination method to specify the page height in points.

HRESULT HrSetPageHeightForPagination(
    float dytfPageHeight
);

Some applications maintain the user's document in an unpaginated format. In these cases, the add-in paginates the document using the page height specified by the application in the call to HrSetPageHeightForPagination. The dytfPageHeight parameter specifies the page height in points.

After specifying the page height information, the application passes the add-in the entire document as a single in-memory EMF file in a call to HrAddPageFromEmf. The add-in then uses the page-height and EMF file to paginate the document.

The add-in returns the pagination information back to the application in subsequent calls to the HrGetPageBreaks method.

HrGetPageBreaks

An application can call the HrGetPageBreaks method to obtain the number and location of page breaks for documents that are paginated by the add-in.

HRESULT HrGetPageBreaks(
    float* rgdytfPageBreaks,
    int* pcchPageBreaks,
    BOOL* pfCanTrustLastBreakIsEndOfDocument
);

After the add-in paginates a document using the page height specified by the HrSetPageHeightForPagination method, it returns the pagination information in subsequent calls that the application makes to the HrGetPageBreaks method.

The rgdytfPageBreaks parameter is a pointer to an array of float values that specify the locations of the page breaks in points. The first element in the array (index 0) is the location of the first page break, the second element is the location of the second page break, and so on. Therefore, the values of these elements are successively increasing.

The pcchPageBreaks parameter is a pointer to an integer value that specifies the number of page breaks in the document.

The pfCanTrustLastBreakIsEndOfDocument parameter specifies whether the location of the last page break is the end of the document or the beginning of the last page of the document. A true value indicates that the last page break is the end of the document.

The application calls HrGetPageBreaks twice to obtain the pagination information. On the first call, the application passes NULL for the array and obtains only the number of page breaks.

HrGetPageBreaks(NULL, &nPageBreaks, NULL);

The application then calls HrGetPageBreaks a second time to obtain the actual locations. On the second call, the application passes a buffer of sufficient size to hold the array of page-break locations.

HrGetPageBreaks(rgPageBreaks, &nPageBreaks, &fCanStopAtLastPageBreak);

After receiving the page break information from the add-in, the application re-initiates the fixed-format export process, beginning with a call to the HrCreateDoc method, followed by a call to HrAddPageFromEmf for each of the pages given by the page-break information.
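On the add-in side, the two-call pattern might be handled as in the sketch below. It is illustrative only: the type aliases are stand-ins so that the code compiles without the Windows SDK, PaginateForSketch is a hypothetical helper that breaks at every multiple of the page height (a real add-in would derive breaks from the EMF content), and treating the incoming count on the second call as the buffer capacity is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins so the sketch compiles without the Windows SDK (assumption:
// on Windows these come from <windows.h>).
using HRESULT = long;
using BOOL = int;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003L);

class PaginatingExporter {
public:
    HRESULT HrSetPageHeightForPagination(float dytfPageHeight)
    {
        m_pageHeight = dytfPageHeight;
        return S_OK;
    }

    // Hypothetical helper: places a break at every full page height and a
    // final break at the end of the document.
    void PaginateForSketch(float documentHeight)
    {
        assert(m_pageHeight > 0.0f);
        m_breaks.clear();
        for (float y = m_pageHeight; y < documentHeight; y += m_pageHeight) {
            m_breaks.push_back(y);
        }
        m_breaks.push_back(documentHeight);
    }

    HRESULT HrGetPageBreaks(float* rgdytfPageBreaks, int* pcchPageBreaks,
                            BOOL* pfCanTrustLastBreakIsEndOfDocument)
    {
        if (pcchPageBreaks == nullptr) return E_POINTER;
        const int available = static_cast<int>(m_breaks.size());
        if (rgdytfPageBreaks == nullptr) {
            *pcchPageBreaks = available;  // first call: report the count only
            return S_OK;
        }
        // Second call. Assumption: *pcchPageBreaks holds the buffer capacity.
        const int count = *pcchPageBreaks < available ? *pcchPageBreaks : available;
        for (int i = 0; i < count; ++i) {
            rgdytfPageBreaks[i] = m_breaks[static_cast<std::size_t>(i)];
        }
        *pcchPageBreaks = count;
        if (pfCanTrustLastBreakIsEndOfDocument != nullptr) {
            *pfCanTrustLastBreakIsEndOfDocument = 1;  // last break is the document end
        }
        return S_OK;
    }

private:
    float m_pageHeight = 0.0f;
    std::vector<float> m_breaks;
};
```

With a page height of 100 points and a document height of 250 points, this sketch reports three breaks at 100, 200, and 250 points.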

HrAddOutlineNode

Publisher calls the HrAddOutlineNode method to pass the add-in a structure that describes a node within a user-navigable outline for the exported document.

HRESULT HrAddOutlineNode(
    int idNodeParent,
    const MSODOCEXOUTLINENODE* pNode
);

The fixed-format export code can use the information passed by the HrAddOutlineNode method to construct a user-navigable outline of the export document. From the user's perspective, each node in the outline is represented by some title text that maps to a particular location within the document.

Each call to HrAddOutlineNode specifies information for a single node in this outline. Each node is identified by a node ID that is unique within the outline. An ID of 0 is reserved for the root node. The outline is hierarchical, that is, it has a tree structure in which each node has a single parent and zero or more child nodes.

The first parameter to HrAddOutlineNode provides the ID of the node that is the parent of the node being passed in.

Publisher always calls HrAddOutlineNode for a parent node before calling the method for any of the parent node's children. In other words, the export code is assured of already having the node information for the node identified by the idNodeParent parameter. The only exception is the initial call to HrAddOutlineNode that specifies the root node. For this call, the value of idNodeParent is 0.

Additional information that the export code needs for each node is passed by HrAddOutlineNode in an MSODOCEXOUTLINENODE structure pointed to by the pNode parameter.

typedef struct _MsoDocexOutlineNode
{
    int idNode {};
    WCHAR rgwchNodeText[cwchMaxNodeText];
    int iDestPage {};
    float dytfvDestPage {};
    float dxtfvDestOffset {};
    float dytfvDestOffset {};
} MSODOCEXOUTLINENODE;

The members of the MSODOCEXOUTLINENODE are described as follows:

  • idNode   The ID for the node. A value of -1 indicates that this node cannot have child nodes in the outline. Otherwise, this member has a value that is unique across the document.

  • rgwchNodeText   A Unicode string that represents the title text for each node. This text is not required to be unique across the outline.

  • iDestPage   The page number of the page that contains the destination location within the document.

  • dytfvDestPage   The height of the destination page in points. The offset specified by the dytfvDestOffset member is relative to the upper-left corner of the page. However, some fixed-format types use a coordinate system that is relative to the bottom-left corner of the page. For these types of documents, the page height is required to convert the offset.

  • dxtfvDestOffset   The horizontal offset of the destination location on the destination page.

  • dytfvDestOffset   The vertical offset of the destination location on the destination page.
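The following sketch shows one way export code might accumulate these calls into an outline tree and convert the vertical offset for formats with a bottom-left origin. OutlineNodeInfo is a simplified, hypothetical stand-in for MSODOCEXOUTLINENODE (it uses std::wstring instead of a fixed WCHAR array), and OutlineBuilder and DestYFromBottom are illustrative names, not part of the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for MSODOCEXOUTLINENODE.
struct OutlineNodeInfo {
    int idNode;
    std::wstring text;
    int iDestPage;
    float dytfvDestPage;    // page height in points
    float dxtfvDestOffset;  // offset from the left edge
    float dytfvDestOffset;  // offset from the top edge
};

struct OutlineEntry {
    OutlineNodeInfo info;
    int idParent;
    std::vector<std::size_t> children;  // indices into the entry list
};

class OutlineBuilder {
public:
    // Mirrors one HrAddOutlineNode call. Returns false if the parent is
    // unknown, which should not happen because parents arrive first.
    bool AddNode(int idNodeParent, const OutlineNodeInfo& node)
    {
        const bool isRoot = m_entries.empty() && node.idNode == 0;
        const std::size_t index = m_entries.size();
        if (!isRoot) {
            auto parent = m_indexById.find(idNodeParent);
            if (parent == m_indexById.end()) return false;
            m_entries[parent->second].children.push_back(index);
        }
        m_entries.push_back({node, idNodeParent, {}});
        // Nodes with an ID of -1 cannot have children, so they are not indexed.
        if (node.idNode != -1) m_indexById[node.idNode] = index;
        return true;
    }

    const std::vector<OutlineEntry>& Entries() const { return m_entries; }

private:
    std::vector<OutlineEntry> m_entries;
    std::map<int, std::size_t> m_indexById;
};

// Converts the top-relative vertical offset to a bottom-relative one.
float DestYFromBottom(const OutlineNodeInfo& node)
{
    return node.dytfvDestPage - node.dytfvDestOffset;
}
```

For a page height of 792 points, a destination 100 points below the top edge lies 692 points above the bottom edge.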

HrAddDocumentMetadataString

Publisher calls the HrAddDocumentMetadataString method to specify document metadata in the form of a Unicode string.

HRESULT HrAddDocumentMetadataString(
    MSODOCEXMETADATA metadataType, 
    const WCHAR* pwchValue
);

The metadataType parameter specifies the type of metadata represented by the string. The metadataType parameter must be one of the following values from the MSODOCEXMETADATA enumeration type.

Table 8. Enumerated values of MSODOCEXMETADATA

Value

Description

msodocexMetadataTitle

The title of the document.

msodocexMetadataAuthor

The author of the document.

msodocexMetadataSubject

String that describes the subject matter of the document (for example, business or science).

msodocexMetadataKeywords

Keywords relevant to the document content.

msodocexMetadataCreator

The creator of the document, possibly distinct from the author.

msodocexMetadataProducer

The producer of the document, possibly distinct from the author or creator.

msodocexMetadataCategory

String that describes the type of document (for example, memo, article, or book).

msodocexMetadataStatus

Status of the document. This field can reflect where the document is in the publication process (for example, draft or final).

msodocexMetadataComments

Miscellaneous comments relevant to the document.

For a given document, each metadata type can have only one string associated with it. So, for example, if the document has multiple keywords, they are passed to the add-in as one concatenated string.

The pwchValue parameter specifies a Unicode string that contains the metadata itself.

How the add-in incorporates the text-string metadata into the exported document depends on the implementation details of the export code and the type of fixed-format used in the exported document.

HrAddDocumentMetadataDate

Publisher calls the HrAddDocumentMetadataDate method to specify document metadata in the form of a FILETIME structure.

HRESULT HrAddDocumentMetadataDate(
    MSODOCEXMETADATA metadataType, 
    const FILETIME* pftLocalTime
);

The metadataType parameter specifies the type of metadata represented by the FILETIME structure. The metadataType parameter must be one of the following values from the MSODOCEXMETADATA enumeration type.

Table 9. Enumerated values of MSODOCEXMETADATA

Value

Description

msodocexMetadataCreationDate

The creation date for the document.

msodocexMetadataModDate

The last-modified date for the document.

The pftLocalTime parameter specifies a pointer to a FILETIME structure that contains the date and time information for the metadata. The following code snippet demonstrates how to extract this information from the structure.

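// Convert the FILETIME passed in pftLocalTime to calendar fields.
SYSTEMTIME st = { 0 };
WCHAR s[100];
FileTimeToSystemTime(pftLocalTime, &st);
// Format the fields as a timestamp string. The modulo operations keep
// each field within its printed width.
swprintf(s, 99, L" %04d-%02d-%02dT%02d:%02d:%02dZ", st.wYear % 10000, 
    st.wMonth % 100, st.wDay % 100, st.wHour % 100, st.wMinute % 100, 
    st.wSecond % 100);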

How the add-in incorporates the date and time metadata into the exported document depends on the implementation details of the export code and the type of fixed-format used in the exported document.

HrFinalize

Publisher calls the HrFinalize method at the end of the document-export process.

HRESULT HrFinalize();

The code that implements fixed-format export should use HrFinalize to perform tasks such as flushing data buffers, writing remaining data to disk, and freeing memory and other resources.

Conclusion

You can extend the fixed-format export feature of Office applications by implementing the IMsoDocExporter interface. The methods of this interface provide a channel for Office applications to communicate to the add-in the visual content and semantic information in the document to export. The visual content of the document is provided to the add-in as one or more in-memory enhanced metafiles. The semantic information is provided as specially formatted comment records within this EMF. Additional methods in the interface enable Office applications to communicate metadata and structural information about the document.

Additional Resources

For more information, see the following resources: