All About MFC Serialization

Introduction

The world of data structures is vast, and when we need to write those enormous blobs of data to disk, memory, or sockets, or read them back, MFC serialization is a powerful tool in every programmer’s toolbox.

Background

Serialization has been part of the MFC (Microsoft Foundation Classes) library since its very first release, but I feel it has never received its due, because it was largely undocumented. The SDK samples that demonstrated serialization were very limited and covered only plain old data, CObject-derived classes, and MFC collections. However, with the right extensions we can serialize practically any data structure: STL collections, user-defined collections, and even flat C-style arrays. It is a powerful, efficient, and very fast way to store and retrieve hierarchical data.

MFC serialization supports reading from and writing to disk, memory, and sockets. Writing to memory is very useful for inter-process communication, such as clipboard cut/copy/paste operations, and writing to sockets is useful when networking with remote machines. In this article I will cover plain old MFC serialization with MFC-provided classes, and how to serialize STL collections, plain Windows SDK data structures, and C-style arrays. I will also cover how to serialize to process and shared memory, and to and from sockets. Finally, I will demonstrate how to use MFC serialization with or without the Document/View architecture, for example inside console applications and TCP/IP servers.

What is Serialization

MSDN documentation gives us the best description:

Serialization is the process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.

MFC serialization implements both binary and text serialization. Binary serialization is handled via the shift operators (<< and >>) and the WriteObject / ReadObject functions. Text serialization is handled with the ReadString / WriteString functions.

MFC serialization provides serialization of C++ CObject-derived classes with versioning. With the right extensions it can also serialize classes that are not derived from CObject. However, versioning in those cases needs to be handled manually.

How does it work

At the heart of MFC serialization lies the CArchive object. CArchive has no base class, and it is tightly coupled to CFile and CFile-derived classes such as CSocketFile, CSharedFile, and CMemFile. Internally, CArchive maintains a byte buffer (4096 bytes by default). When storing, this buffer is flushed to the CFile object. When loading, it is refilled from the CFile object as needed.

  • CFile – provides serialization to or from disk
  • CMemFile – provides serialization to or from process memory
  • CSharedFile – provides serialization to or from shared (global) memory that other processes can access, for example through the clipboard
  • CSocketFile – provides serialization to or from a CSocket for network communications
  • You can also serialize over named pipes, RPC, and other Windows inter-process communication mechanisms

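CArchive provides serialization of plain old data and of C++ CObject-derived classes with versioning. To make a CObject-derived class serializable, add the DECLARE_SERIAL macro to the class declaration and the IMPLEMENT_SERIAL macro to the implementation file. The class also needs a default constructor and, typically, a Serialize override: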

    // In the class declaration (header file)
    DECLARE_SERIAL(CRoot)
    // In the implementation file
    IMPLEMENT_SERIAL(CRoot, CObject, VERSIONABLE_SCHEMA | 1)

Together, these two macros add the following to your class:

  • a global extraction operator >> (which calls CArchive::ReadObject)
  • a static CreateObject function
  • a GetRuntimeClass override
  • a static CRuntimeClass member variable

The CRuntimeClass structure has an m_lpszClassName member, which stores the text representation of your class name, and an m_wSchema member, which holds the version information of your class.

These macros internally expand to the following code:

    // DECLARE_SERIAL(CRoot), inside the class declaration:
    public:
        static CRuntimeClass classCRoot;
        virtual CRuntimeClass* GetRuntimeClass() const;
        static CObject* PASCAL CreateObject();
        AFX_API friend CArchive& AFXAPI operator>>(CArchive& ar, CRoot*& pOb);
    // IMPLEMENT_SERIAL(CRoot, CObject, VERSIONABLE_SCHEMA | 1):
    CObject* PASCAL CRoot::CreateObject()
    {
        return new CRoot;
    }
    extern AFX_CLASSINIT _init_CRoot;
    AFX_CLASSINIT _init_CRoot(RUNTIME_CLASS(CRoot));
    CArchive& AFXAPI operator>>(CArchive& ar, CRoot*& pOb)
    {
        pOb = (CRoot*)ar.ReadObject(RUNTIME_CLASS(CRoot));
        return ar;
    }
    AFX_COMDAT CRuntimeClass CRoot::classCRoot = {
        "CRoot", sizeof(class CRoot), VERSIONABLE_SCHEMA | 1,
        CRoot::CreateObject, RUNTIME_CLASS(CObject), NULL, &_init_CRoot
    };
    CRuntimeClass* CRoot::GetRuntimeClass() const
    {
        return RUNTIME_CLASS(CRoot);
    }

The macros do not add an insertion operator <<. Storing works through a single global operator, declared in the global namespace, that accepts a pointer to the CObject base class:

    CArchive& AFXAPI operator<<(CArchive& ar, const CObject* pOb);

Plain old data is handled quite straightforwardly. Here is the storing code for the float data type:

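    CArchive& CArchive::operator<<(float f)
    {
        // Writing to an archive opened for loading is an error
        if (!IsStoring())
            AfxThrowArchiveException(CArchiveException::readOnly, m_strFileName);
        // Flush the internal buffer to the file if the value does not fit
        if (m_lpBufCur + sizeof(float) > m_lpBufMax)
            Flush();
        // Copy the raw bytes of the float into the buffer and advance the cursor
        *(UNALIGNED float*)m_lpBufCur = f;
        m_lpBufCur += sizeof(float);
        return *this;
    }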

The following is the loading code for the float data type:

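    CArchive& CArchive::operator>>(float& f)
    {
        // Reading from an archive opened for storing is an error
        if (!IsLoading())
            AfxThrowArchiveException(CArchiveException::writeOnly, m_strFileName);
        // Refill the internal buffer from the file if the value is not fully buffered yet
        if (m_lpBufCur + sizeof(float) > m_lpBufMax)
            FillBuffer(UINT(sizeof(float) - (m_lpBufMax - m_lpBufCur)));
        // Copy the raw bytes out of the buffer and advance the cursor
        f = *(UNALIGNED float*)m_lpBufCur;
        m_lpBufCur += sizeof(float);
        return *this;
    }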
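To make the buffering concrete, here is a minimal sketch in standard C++ (not MFC). It models an archive that stores plain old data through a fixed-size buffer, flushes that buffer to a backing byte vector, and reads the values back strictly in order. The class MiniArchive and its members are hypothetical. The sketch uses std::memcpy instead of the UNALIGNED pointer cast, and a std::runtime_error stands in for CArchiveException.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical MiniArchive: a toy model of CArchive's buffering, not MFC code.
class MiniArchive {
public:
    explicit MiniArchive(std::size_t bufSize) : buf_(bufSize) {}

    // Store: like CArchive::operator<<, flush first if the value would not fit.
    template <typename T>
    MiniArchive& operator<<(const T& value) {
        static_assert(std::is_trivially_copyable_v<T>, "plain old data only");
        if (cur_ + sizeof(T) > buf_.size()) Flush();
        if (sizeof(T) > buf_.size()) throw std::length_error("value larger than buffer");
        std::memcpy(buf_.data() + cur_, &value, sizeof(T));  // memcpy avoids unaligned access
        cur_ += sizeof(T);
        return *this;
    }

    // Move the buffered bytes to the backing "file" (a byte vector here).
    void Flush() {
        file_.insert(file_.end(), buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(cur_));
        cur_ = 0;
    }

    // Load: values come back strictly in the order they were stored.
    template <typename T>
    MiniArchive& operator>>(T& value) {
        static_assert(std::is_trivially_copyable_v<T>, "plain old data only");
        if (readPos_ + sizeof(T) > file_.size()) throw std::runtime_error("end of file");
        std::memcpy(&value, file_.data() + readPos_, sizeof(T));
        readPos_ += sizeof(T);
        return *this;
    }

    const std::vector<unsigned char>& Bytes() const { return file_; }

private:
    std::vector<unsigned char> buf_;   // fixed-size buffer, like CArchive's internal buffer
    std::vector<unsigned char> file_;  // stands in for the CFile
    std::size_t cur_ = 0;              // write cursor in buf_ (cf. m_lpBufCur)
    std::size_t readPos_ = 0;          // read cursor in file_
};
```

Nothing in the byte stream records which type was written where. Reading the values back in a different order, or with different types, would therefore not throw in this model. It would silently produce wrong values, which is why load code must mirror store code exactly.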

Reading and writing CObject-derived classes is a bit more complex. It is covered in the following sections.

Word of Caution

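All data is stored as one contiguous byte stream, with no field names or delimiters. It must therefore be read back in exactly the same order, and with exactly the same types, as it was stored. Failing to do so produces wrong values. Sooner or later it usually also causes a CArchiveException during load, for example when misaligned data is interpreted as an invalid class tag or a read runs past the end of the file.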

Why can’t I call ar.GetObjectSchema() multiple times?

Simply put, you cannot call GetObjectSchema more than once per object load, for the following reason:

    UINT CArchive::GetObjectSchema()
    {
        UINT nResult = m_nObjectSchema;
        m_nObjectSchema = (UINT)-1;  // reset: a second call returns (UINT)-1
        return nResult;
    }

Why is it designed this way? My best guess is that it is a legacy issue. The member variable CArchive::m_nObjectSchema is very different from CRuntimeClass::m_wSchema. m_wSchema is the schema of the class as compiled into your program. m_nObjectSchema is read from the file, which can contain many objects with many different schemas, and it holds the schema of the object currently being read. Think about what happens when you deserialize an object as in the following example, hypothetically assuming that m_nObjectSchema is left alone (never reset or restored):

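    void CMyClass::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            // storing code omitted
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema();
            switch (nSchema)
            {
            case 1:
                // Each extraction loads another object, which has its own schema
                ar >> m_pObject1;
                ar >> m_pObject2;
                ar >> m_pObject3;
                ar >> m_pObject4;
                break;
            }
        }
        if (ar.IsLoading())
        {
            // Hypothetically, this would return the schema of m_pObject4, not that of CMyClass
            UINT nSchema = ar.GetObjectSchema();
        }
    }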

In this hypothetical case, each nested extraction loads another object with its own schema. The object schema would therefore have changed four times by the time the loading section finished. My guess is that the MFC framework chose to cut off this kind of subtle erroneous behavior at the source, rather than leave programmers scratching their heads over why their precious data was hosed.

In other words, GetObjectSchema can be called only once per object load, because the framework resets m_nObjectSchema to (UINT)-1 after every call.

In today’s MFC library, even the example above is foolproof. CArchive::ReadObject contains the following code:

    TRY
    {
        pOb = pClassRef->CreateObject();
        UINT nSchemaSave = m_nObjectSchema;  // save the caller's schema
        m_nObjectSchema = nSchema;           // schema read from the archive for this object
        pOb->Serialize(*this);
        m_nObjectSchema = nSchemaSave;       // restore it
    }

As you can see, the code does four things:

  • saves the current m_nObjectSchema into nSchemaSave
  • assigns the schema of the object being loaded to m_nObjectSchema
  • calls Serialize
  • restores the saved schema into m_nObjectSchema

Thus the object schema never goes astray, even across nested object loads.
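The read-once and save/restore rules can be modeled in a few lines of standard C++. This is a sketch, not MFC code. SchemaArchive, LoadObject, and kNoSchema are hypothetical names, and only the m_nObjectSchema bookkeeping is modeled.

```cpp
#include <cassert>

// Hypothetical model of CArchive's object-schema bookkeeping only; not MFC code.
class SchemaArchive {
public:
    static constexpr unsigned kNoSchema = static_cast<unsigned>(-1);

    // Like CArchive::GetObjectSchema: hand out the schema once, then reset it.
    unsigned GetObjectSchema() {
        const unsigned result = schema_;
        schema_ = kNoSchema;
        return result;
    }

    // Like the ReadObject excerpt: save, install the object's schema, load, restore.
    template <typename Loader>
    void LoadObject(unsigned schemaFromFile, Loader&& load) {
        const unsigned saved = schema_;
        schema_ = schemaFromFile;
        load(*this);
        schema_ = saved;
    }

private:
    unsigned schema_ = kNoSchema;
};
```

With these rules, an object that loads nested objects first still gets its own schema when it calls GetObjectSchema afterwards, because the value was restored. A second call returns the reset value.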

Serializing Base and Derived Classes

There are four ways to handle the serialization of derived and base classes in MFC.

But first, let’s look at the subtle problem. Back in the days of the 16-bit MFC implementation, disk space was a precious commodity, as was RAM. Thus, no matter how many classes there are in the class hierarchy, only one object schema is written, and it is always the schema of the most-derived class!

    class CBase : public CObject
    {
        DECLARE_SERIAL(CBase)
    public:
        int m_i;
        float m_f;
        double m_d;
        virtual void Serialize(CArchive& ar);
    };
    class CDerived : public CBase
    {
        DECLARE_SERIAL(CDerived)
    public:
        long m_l;
        unsigned short m_us;
        long long m_ll;
        virtual void Serialize(CArchive& ar);
    };
    IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
    void CBase::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            // storing code omitted
        }
        else
        {
            // Problem: when a CDerived is loaded this returns 2 (the CDerived schema),
            // so case 1 never matches and the base members are skipped
            UINT nSchema = ar.GetObjectSchema();
            switch (nSchema)
            {
            case 1:
                ar >> m_i;
                ar >> m_f;
                ar >> m_d;
                break;
            }
        }
    }
    IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
    void CDerived::Serialize(CArchive& ar)
    {
        CBase::Serialize(ar);
        if (ar.IsStoring())
        {
            // storing code omitted
        }
        else
        {
            // Problem: CBase::Serialize already consumed the schema, so this returns (UINT)-1
            UINT nSchema = ar.GetObjectSchema();
            switch (nSchema)
            {
            case 1:
            case 2:
                ar >> m_l;
                ar >> m_us;
                ar >> m_ll;
                break;
            }
        }
    }

Why is that? A quick look at a binary file dump reveals that, for a derived class such as CSerializableDerived, the schema is written only once. It always equals the schema of the instantiated object, that is, the CSerializableDerived class schema, even if the base class schema is something else.

Tracing into CArchive::WriteObject reveals this code:

    CRuntimeClass* pClassRef = pOb->GetRuntimeClass();  // runtime class of the actual object
    WriteClass(pClassRef);

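Tracing into CArchive::WriteClass, we see that the framework first writes the wNewClassTag WORD value, which equals 0xFFFF, and then calls the CRuntimeClass::Store function. This happens the first time a given class is written to the archive; later objects of the same class refer back to it with a short class-index tag instead.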

    *this << wNewClassTag;
    pClassRef->Store(*this);

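The CRuntimeClass::Store function obtains the length of the class name. It then writes the object schema, followed by the length of the class name and the class name itself. Herein lies the answer to the question of why the object schema is written only once, for the most-derived class. WriteObject passes the runtime class of the actual object to WriteClass, so only that class’s schema ever reaches the file.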

    void CRuntimeClass::Store(CArchive& ar) const
    {
        WORD nLen = (WORD)AtlStrLen(m_lpszClassName);
        // Only the low WORD of the schema is written, so the VERSIONABLE_SCHEMA flag is dropped
        ar << (WORD)m_wSchema << nLen;
        ar.Write(m_lpszClassName, nLen*sizeof(char));
    }

After the CRuntimeClass information has been written to the file, the framework finally calls the virtual Serialize function of our object:

     ((CObject*)pOb)->Serialize(*this);    

The exact opposite happens during object load. First, the extraction operator provided by the IMPLEMENT_SERIAL macro is called:

    CArchive& AFXAPI operator>>(CArchive& ar, CSerializableDerived*& pOb)
    {
        pOb = (CSerializableDerived*)ar.ReadObject(RUNTIME_CLASS(CSerializableDerived));
        return ar;
    }

Tracing into CArchive::ReadObject reveals the following code:

    UINT nSchema;
    DWORD obTag;
    CRuntimeClass* pClassRef = ReadClass(pClassRefRequested, &nSchema, &obTag);

The CArchive::ReadClass function first reads the object tag:

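    DWORD obTag;
    WORD wTag;
    *this >> wTag;
    // A big-object tag means that the real 32-bit tag follows
    if (wTag == wBigObjectTag)
        *this >> obTag;
    else
        obTag = ((wTag & wClassTag) << 16) | (wTag & ~wClassTag);
    CRuntimeClass* pClassRef;
    UINT nSchema;
    if (wTag == wNewClassTag)
    {
        // New class: read the schema and class name written by CRuntimeClass::Store
        if ((pClassRef = CRuntimeClass::Load(*this, &nSchema)) == NULL)
            AfxThrowArchiveException(CArchiveException::badClass, m_strFileName);
    }
    // Otherwise the tag refers to a class read earlier (not shown)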
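The expression that builds obTag is easier to follow with the tag values spelled out. The sketch below is standard C++, not MFC. The constant values reflect my reading of the MFC sources (arcobj.cpp) and should be treated as assumptions; Classify and WidenTag are hypothetical helpers. The idea is that the first occurrence of a class is marked with wNewClassTag, while later references to an already-seen class or object are written as short index tags.

```cpp
#include <cassert>
#include <cstdint>

// Tag values as I read them in MFC's arcobj.cpp; treat them as assumptions and
// verify them against the MFC sources that ship with your compiler.
constexpr std::uint16_t wNullTag      = 0x0000;  // NULL object pointer
constexpr std::uint16_t wNewClassTag  = 0xFFFF;  // a new class record follows
constexpr std::uint16_t wClassTag     = 0x8000;  // high bit set: reference to a known class
constexpr std::uint16_t wBigObjectTag = 0x7FFF;  // a 32-bit tag follows

enum class TagKind { Null, NewClass, BigObject, ClassIndex, ObjectIndex };

// Hypothetical helper: classify a 16-bit tag.
constexpr TagKind Classify(std::uint16_t wTag) {
    if (wTag == wNullTag) return TagKind::Null;
    if (wTag == wNewClassTag) return TagKind::NewClass;
    if (wTag == wBigObjectTag) return TagKind::BigObject;
    return (wTag & wClassTag) != 0 ? TagKind::ClassIndex : TagKind::ObjectIndex;
}

// Widen a 16-bit tag to 32 bits as in the excerpt above:
// the class bit moves from bit 15 to bit 31, and the index stays in the low 15 bits.
constexpr std::uint32_t WidenTag(std::uint16_t wTag) {
    return (static_cast<std::uint32_t>(wTag & wClassTag) << 16) |
           static_cast<std::uint32_t>(wTag & ~wClassTag);
}
```

In this model, WidenTag(0x8003) yields 0x80000003: a reference to the class with index 3, with the class flag preserved in the top bit of the 32-bit tag.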

Following is the listing of the CRuntimeClass::Load function. Note that the class name must be shorter than 64 characters. If its length is 64 or more, or if CArchive::Read fails to read the class name from the file, the function returns NULL. If the class name is read successfully, szClassName is null-terminated at position nLen and looked up with CRuntimeClass::FromName.

    CRuntimeClass* PASCAL CRuntimeClass::Load(CArchive& ar, UINT* pwSchemaNum)
    {
        if (pwSchemaNum == NULL)
        {
            return NULL;
        }
        WORD nLen;
        char szClassName[64];
        WORD wTemp;
        ar >> wTemp;          // schema written by CRuntimeClass::Store
        *pwSchemaNum = wTemp;
        ar >> nLen;           // length of the class name
        if (nLen >= _countof(szClassName) ||
            ar.Read(szClassName, nLen*sizeof(char)) != nLen*sizeof(char))
        {
            return NULL;
        }
        szClassName[nLen] = '\0';
        CRuntimeClass* pClass = FromName(szClassName);
        if (pClass == NULL)
        {
            TRACE(traceAppMsg, 0, "Warning: Cannot load %hs from archive. Class not defined.\n", szClassName);
        }
        return pClass;
    }
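To summarize the record that CRuntimeClass::Store writes and CRuntimeClass::Load reads back, here is a standard C++ model of it. It is a sketch under assumptions: the helper names are hypothetical, WORDs are written little-endian as they are on x86/x64, and only the new-class case (tag 0xFFFF) is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Append a WORD in little-endian order (the byte order MFC produces on x86/x64).
void PutWord(Bytes& out, std::uint16_t w) {
    out.push_back(static_cast<unsigned char>(w & 0xFF));
    out.push_back(static_cast<unsigned char>(w >> 8));
}

// Hypothetical model of WriteClass + CRuntimeClass::Store for a new class:
// 0xFFFF tag, WORD schema, WORD name length, then the name bytes (no terminator).
void StoreClassRecord(Bytes& out, std::uint32_t schema, const std::string& name) {
    PutWord(out, 0xFFFF);
    PutWord(out, static_cast<std::uint16_t>(schema));  // drops VERSIONABLE_SCHEMA (0x80000000)
    PutWord(out, static_cast<std::uint16_t>(name.size()));
    out.insert(out.end(), name.begin(), name.end());
}

struct ClassRecord {
    std::uint16_t schema;
    std::string name;
};

// Hypothetical model of CRuntimeClass::Load, starting just after the tag.
// Like the MFC code, it rejects names of 64 characters or more.
std::optional<ClassRecord> LoadClassRecord(const Bytes& in, std::size_t pos) {
    auto getWord = [&](std::uint16_t& w) {
        if (pos + 2 > in.size()) return false;
        w = static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));
        pos += 2;
        return true;
    };
    std::uint16_t schema = 0;
    std::uint16_t len = 0;
    if (!getWord(schema) || !getWord(len)) return std::nullopt;
    if (len >= 64 || pos + len > in.size()) return std::nullopt;
    const auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    return ClassRecord{schema, std::string(first, first + len)};
}
```

Storing a CDerived with VERSIONABLE_SCHEMA | 2 in this model produces 14 bytes: 2 for the tag, 2 for the schema, 2 for the length, and 8 for "CDerived". A 63-character name still loads, while a 64-character name is rejected.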

CRuntimeClass::FromName simply iterates through AFX_MODULE_STATE::m_classList and compares the names. In _AFXDLL builds, it also searches the class lists of loaded extension DLLs. If the class is found, its CRuntimeClass pointer is returned. CRuntimeClass discovery through AFX_MODULE_STATE is a whole other topic that deserves its own article. Suffice it to say that this feature was implemented before compilers supported RTTI (Run-Time Type Information), and it allows run-time type discovery of MFC classes with the RTTI compiler switch turned off. As a matter of fact, the RTTI switch was off by default in Visual C++ 6.0.

    CRuntimeClass* PASCAL CRuntimeClass::FromName(LPCSTR lpszClassName)
    {
        CRuntimeClass* pClass = NULL;
        ENSURE(lpszClassName);
        // Search the classes registered in this module
        AFX_MODULE_STATE* pModuleState = AfxGetModuleState();
        AfxLockGlobals(CRIT_RUNTIMECLASSLIST);
        for (pClass = pModuleState->m_classList; pClass != NULL; pClass = pClass->m_pNextClass)
        {
            if (lstrcmpA(lpszClassName, pClass->m_lpszClassName) == 0)
            {
                AfxUnlockGlobals(CRIT_RUNTIMECLASSLIST);
                return pClass;
            }
        }
        AfxUnlockGlobals(CRIT_RUNTIMECLASSLIST);
    #ifdef _AFXDLL
        // Then search the classes of loaded MFC extension DLLs
        AfxLockGlobals(CRIT_DYNLINKLIST);
        for (CDynLinkLibrary* pDLL = pModuleState->m_libraryList; pDLL != NULL; pDLL = pDLL->m_pNextDLL)
        {
            for (pClass = pDLL->m_classList; pClass != NULL; pClass = pClass->m_pNextClass)
            {
                if (lstrcmpA(lpszClassName, pClass->m_lpszClassName) == 0)
                {
                    AfxUnlockGlobals(CRIT_DYNLINKLIST);
                    return pClass;
                }
            }
        }
        AfxUnlockGlobals(CRIT_DYNLINKLIST);
    #endif
        return NULL;
    }
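The lookup-by-name mechanism can be sketched in standard C++. It works in three steps: build a linked list of class descriptors during static initialization, search the list by name, and then call a factory. All names in the sketch are hypothetical stand-ins for the MFC pieces noted in the comments, not MFC’s actual implementation. The sketch is single-threaded and omits the locking and the extension-DLL lists. A null factory plays the role of a CreateObject that returns nullptr for an abstract class.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Hypothetical root class, standing in for CObject.
struct Object {
    virtual ~Object() = default;
    virtual const char* Name() const = 0;
};

// Hypothetical stand-in for CRuntimeClass: name, schema, factory, and a link to the next entry.
struct RuntimeInfo {
    const char* name;
    unsigned schema;
    Object* (*create)();  // nullptr for abstract classes
    RuntimeInfo* next;
};

// Head of the registry list (cf. AFX_MODULE_STATE::m_classList).
inline RuntimeInfo*& ClassListHead() {
    static RuntimeInfo* head = nullptr;
    return head;
}

// Links an entry into the list during static initialization (cf. AFX_CLASSINIT).
struct ClassInit {
    explicit ClassInit(RuntimeInfo& info) {
        info.next = ClassListHead();
        ClassListHead() = &info;
    }
};

// Linear search by name, like CRuntimeClass::FromName (locking and DLL lists omitted).
inline const RuntimeInfo* FromName(const char* name) {
    for (const RuntimeInfo* p = ClassListHead(); p != nullptr; p = p->next)
        if (std::strcmp(p->name, name) == 0) return p;
    return nullptr;
}

struct Circle : Object {
    const char* Name() const override { return "Circle"; }
    static Object* Create() { return new Circle; }  // cf. CreateObject
};

// What the registration macros boil down to in this model.
RuntimeInfo circleInfo{"Circle", 1, &Circle::Create, nullptr};
ClassInit circleInit(circleInfo);
RuntimeInfo objectInfo{"Object", 1, nullptr, nullptr};  // abstract: no factory
ClassInit objectInit(objectInfo);
```

After static initialization, FromName("Circle") finds the descriptor and its factory creates a Circle. FromName("Object") finds a descriptor with no factory, and an unknown name yields nullptr, much as Load returns NULL for an undefined class.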

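Back in CArchive::ReadClass, the function returns the CRuntimeClass pointer. It passes the schema and the object tag back through the pSchema and pObTag out-parameters; when pSchema is NULL, the schema goes into m_nObjectSchema instead: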

    if (pSchema != NULL)
        *pSchema = nSchema;
    else
        m_nObjectSchema = nSchema;
    if (pObTag != NULL)
        *pObTag = obTag;
    return pClassRef;

After the CRuntimeClass pointer has been obtained successfully, the framework calls CreateObject, which is provided by the DECLARE_SERIAL and IMPLEMENT_SERIAL macros, and then:

  • Stores the current CArchive::m_nObjectSchema in nSchemaSave
  • Assigns the schema read from the archive to CArchive::m_nObjectSchema
  • Calls the virtual Serialize function
  • Restores nSchemaSave into CArchive::m_nObjectSchema
    TRY
    {
        pOb = pClassRef->CreateObject();
        UINT nSchemaSave = m_nObjectSchema;
        m_nObjectSchema = nSchema;
        pOb->Serialize(*this);
        m_nObjectSchema = nSchemaSave;
        ASSERT_VALID(pOb);
    }

So now you know why your class will only have one schema regardless of how many classes you have in your class hierarchy.

How do we address this issue? There are four ways around it, some more elegant than others. Let us look at all of them. Of course, this applies only when you must maintain versions throughout all of your classes. The easiest way is not to version anything. However, in real life, if your application’s life expectancy is measured in decades, it is absolutely imperative to maintain versioning right from the start.

1st Solution: Do all serialization in the derived class

This is a less elegant solution, but it works and eliminates all surprises. For the example above, the code looks like this:

    IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
    void CDerived::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            // Base class members
            ar << m_i;
            ar << m_f;
            ar << m_d;
            // Derived class members
            ar << m_l;
            ar << m_us;
            ar << m_ll;
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema();  // the only call, so it returns the CDerived schema
            switch (nSchema)
            {
            case 1:
            case 2:
                ar >> m_i;
                ar >> m_f;
                ar >> m_d;
                ar >> m_l;
                ar >> m_us;
                ar >> m_ll;
                break;
            }
        }
    }

This solution is not very pretty, and if your base class has many members, your Serialize function can become enormous. It also means the derived class must know every member of its base class.

2nd Solution: Pop the schema back into the CArchive

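This solution is a bit more elegant. CBase::Serialize reads the schema and then puts it back with CArchive::SetObjectSchema, so the derived class can call GetObjectSchema again. However, you still need to update the schema cases in all of the base classes whenever the schema changes.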

    IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
    void CBase::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            // storing code omitted
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema();
            // The schema is that of the most-derived class, so accept every derived version
            switch (nSchema)
            {
            case 1:
            case 2:
                ar >> m_i;
                ar >> m_f;
                ar >> m_d;
                break;
            }
            // Put the schema back so that the derived class can read it again
            ar.SetObjectSchema(nSchema);
        }
    }
    IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
    void CDerived::Serialize(CArchive& ar)
    {
        CBase::Serialize(ar);
        if (ar.IsStoring())
        {
            // storing code omitted
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema();
            switch (nSchema)
            {
            case 1:
            case 2:
                ar >> m_l;
                ar >> m_us;
                ar >> m_ll;
                break;
            }
        }
    }

3rd Solution: Consider Overhaul of Serialize function with Don’t Call Us, We Will Call You design pattern

Adding a non-public virtual function, SerializeImpl(CArchive& ar, UINT nSchema), eliminates the need to call CArchive::GetObjectSchema more than once. CBase::Serialize reads the schema once and passes it down the class hierarchy.

    class CBase : public CObject
    {
        DECLARE_SERIAL(CBase)
    public:
        int m_i;
        float m_f;
        double m_d;
        virtual void Serialize(CArchive& ar);
    protected:
        // Protected (not private) so that overrides can call CBase::SerializeImpl
        virtual void SerializeImpl(CArchive& ar, UINT nSchema);
    };
    class CDerived : public CBase
    {
        DECLARE_SERIAL(CDerived)
    public:
        long m_l;
        unsigned short m_us;
        long long m_ll;
    protected:
        virtual void SerializeImpl(CArchive& ar, UINT nSchema);
    };
    IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
    // Serialize is the only place that calls GetObjectSchema; it then calls down the hierarchy
    void CBase::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            ar << m_i;
            ar << m_f;
            ar << m_d;
            SerializeImpl(ar, (UINT)-1);  // no schema is needed when storing
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema();
            switch (nSchema)
            {
            case 1:
            case 2:
                ar >> m_i;
                ar >> m_f;
                ar >> m_d;
                break;
            }
            SerializeImpl(ar, nSchema);
        }
    }
    void CBase::SerializeImpl(CArchive& ar, UINT nSchema)
    {
        // CBase has nothing extra to serialize; overrides chain up to here
    }
    IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
    void CDerived::SerializeImpl(CArchive& ar, UINT nSchema)
    {
        CBase::SerializeImpl(ar, nSchema);
        if (ar.IsStoring())
        {
            ar << m_l;
            ar << m_us;
            ar << m_ll;
        }
        else
        {
            switch (nSchema)
            {
            case 1:
            case 2:
                ar >> m_l;
                ar >> m_us;
                ar >> m_ll;
                break;
            }
        }
    }

This is somewhat more elegant, but it still requires us to update the version cases in all of the base classes when the schema changes.

And here comes the most elegant solution.

4th Solution: Store your base class schema as the 1st member of your class

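This solution addresses the shortcoming of the MFC serialization mechanism. Instead of relying on the archive’s object schema, each base class writes its own schema as the first value of its data. You have access to the base class schema through the static CRuntimeClass member of the class, classCBase.m_wSchema in our example.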

    IMPLEMENT_SERIAL(CBase, CObject, VERSIONABLE_SCHEMA | 1)
    void CBase::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            // Write the CBase schema first; the (WORD) cast drops the VERSIONABLE_SCHEMA flag
            WORD wSchema = (WORD)classCBase.m_wSchema;
            ar << wSchema;
            ar << m_i;
            ar << m_f;
            ar << m_d;
        }
        else
        {
            // Read the CBase schema from the data; GetObjectSchema is left for CDerived
            WORD wSchema = 0;
            ar >> wSchema;
            switch (wSchema)
            {
            case 1:
                ar >> m_i;
                ar >> m_f;
                ar >> m_d;
                break;
            }
        }
    }
    IMPLEMENT_SERIAL(CDerived, CBase, VERSIONABLE_SCHEMA | 2)
    void CDerived::Serialize(CArchive& ar)
    {
        CBase::Serialize(ar);
        if (ar.IsStoring())
        {
            ar << m_l;
            ar << m_us;
            ar << m_ll;
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema();  // CDerived schema from the class record
            switch (nSchema)
            {
            case 1:
            case 2:
                ar >> m_l;
                ar >> m_us;
                ar >> m_ll;
                break;
            }
        }
    }

This is the most elegant solution because it frees you from maintaining the base classes whenever a derived class changes. The cost is sizeof(WORD) extra bytes in your file for each base class that writes its own schema, per stored object.
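The idea behind the fourth solution, that every class in the hierarchy records its own version, can be sketched in standard C++ without MFC. Everything here is hypothetical:

  • Stream stands in for CArchive.
  • The leading WORD written by Derived::Store stands in for the schema that MFC stores in the class record.
  • Base::Store takes an optional schema only so that the example can emulate a file written by an older program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical byte stream: values are appended on store and consumed in order on load.
class Stream {
public:
    template <typename T>
    void Put(const T& value) {
        static_assert(std::is_trivially_copyable_v<T>, "plain old data only");
        const auto* p = reinterpret_cast<const unsigned char*>(&value);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }

    template <typename T>
    T Get() {
        static_assert(std::is_trivially_copyable_v<T>, "plain old data only");
        if (pos_ + sizeof(T) > bytes_.size()) throw std::runtime_error("end of stream");
        T value{};
        std::memcpy(&value, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
        return value;
    }

    std::size_t Size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};

// Hypothetical base class, now at version 2 (version 2 added m_f).
struct Base {
    static constexpr std::uint16_t kSchema = 2;
    int m_i = 0;
    float m_f = 0.0f;

    // The base writes its own schema word first; 'schema' lets the demo emulate an older writer.
    void Store(Stream& s, std::uint16_t schema = kSchema) const {
        s.Put(schema);
        s.Put(m_i);
        if (schema >= 2) s.Put(m_f);
    }

    void Load(Stream& s) {
        const auto schema = s.Get<std::uint16_t>();  // independent of the derived schema
        m_i = s.Get<int>();
        if (schema >= 2) m_f = s.Get<float>();
    }
};

// Hypothetical derived class with its own, unrelated version number.
struct Derived : Base {
    static constexpr std::uint16_t kSchema = 1;
    long long m_ll = 0;

    void Store(Stream& s, std::uint16_t baseSchema = Base::kSchema) const {
        const std::uint16_t schema = kSchema;
        s.Put(schema);               // stands in for the schema in MFC's class record
        Base::Store(s, baseSchema);
        s.Put(m_ll);
    }

    void Load(Stream& s) {
        const auto schema = s.Get<std::uint16_t>();  // cf. ar.GetObjectSchema()
        Base::Load(s);
        if (schema >= 1) m_ll = s.Get<long long>();
    }
};
```

Because Base::Load reads its own schema word, moving Base to version 2 required no change in Derived. Data written by a version 1 Base still loads, with m_f simply left at its default.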

Serializing Pure Base Class

Suppose you have a CObject-derived class with pure virtual functions:

    class CPureBase : public CObject
    {
        DECLARE_SERIAL(CPureBase)
    public:
        CPureBase();
        virtual ~CPureBase();
        virtual void Serialize(CArchive& ar);
        virtual CString CanSerialize() const = 0;
        virtual CString GetObjectSchema() const = 0;
        virtual CString GetObjectRunTimeName() const = 0;
    };

Normally this will not compile, because the IMPLEMENT_SERIAL macro adds the following function to your code, and that function tries to instantiate an abstract class:

 CObject* PASCAL CPureBase::CreateObject() { return new CPureBase; }

To work around this issue we would need to create our own version of the IMPLEMENT_SERIAL macro that will return nullptr from the CreateObject function.

    // Same as IMPLEMENT_SERIAL, except that CreateObject returns nullptr instead of
    // instantiating the abstract class
    #define IMPLEMENT_SERIAL_PURE_BASE(class_name, base_class_name, wSchema) \
        CObject* PASCAL class_name::CreateObject() \
        { return nullptr; } \
        extern AFX_CLASSINIT _init_##class_name; \
        _IMPLEMENT_RUNTIMECLASS(class_name, base_class_name, wSchema, \
            class_name::CreateObject, &_init_##class_name) \
        AFX_CLASSINIT _init_##class_name(RUNTIME_CLASS(class_name)); \
        CArchive& AFXAPI operator>>(CArchive& ar, class_name* &pOb) \
        { pOb = (class_name*) ar.ReadObject(RUNTIME_CLASS(class_name)); \
        return ar; }

Now you can declare your pure base class serializable.

    IMPLEMENT_SERIAL_PURE_BASE(CPureBase, CObject, VERSIONABLE_SCHEMA | 1)
    CPureBase::CPureBase()
    {
    }
    CPureBase::~CPureBase()
    {
    }
    void CPureBase::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            // store CPureBase members here
        }
        else
        {
            // load CPureBase members here
        }
    }

Serializing with Document/View

This type of serialization is the most covered in MFC literature. If your application uses the Document/View architecture, serialization is already part of your CDocument-derived class: the framework calls its Serialize override when a document is saved or opened. The typical structure of the code looks like this:

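    void CSerializeDemoDoc::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            ar << m_pRoot;  // writes the class record, then calls m_pRoot->Serialize
        }
        else
        {
            ar >> m_pRoot;  // creates a new object via CreateObject and loads it
        }
    }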

Serializing without Document/View

To serialize without Document/View, for example in a console application, you need code like the following to write to a file:

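    CFile file;
    if (!file.Open(_T("Test.my_ext"), CFile::modeCreate | CFile::modeReadWrite | CFile::shareExclusive))
        return false;
    CArchive ar(&file, CArchive::store | CArchive::bNoFlushOnDelete);
    ar << val;      // val: any value or CObject pointer that has an insertion operator
    ar.Close();     // flush the archive buffer before closing the file
    file.Close();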

To deserialize (read) without Document/View, use the following code:

    CFile file;
    if (!file.Open(_T("Test.my_ext"), CFile::modeRead | CFile::shareExclusive))
        return false;
    CArchive ar(&file, CArchive::load);
    ar >> val;      // read back in the same order, and with the same types, as stored
    ar.Close();
    file.Close();

Just in a few lines of code you have harnessed the power of the CArchive object.

Serializing plain old data types

CArchive provides the following insertion and extraction operators to store and retrieve plain old data:

    CArchive& operator<<(BYTE by);
    CArchive& operator<<(WORD w);
    CArchive& operator<<(LONG l);
    CArchive& operator<<(DWORD dw);
    CArchive& operator<<(float f);
    CArchive& operator<<(double d);
    CArchive& operator<<(LONGLONG dwdw);
    CArchive& operator<<(ULONGLONG dwdw);
    CArchive& operator<<(int i);
    CArchive& operator<<(short w);
    CArchive& operator<<(char ch);
    #ifdef _NATIVE_WCHAR_T_DEFINED
    CArchive& operator<<(wchar_t ch);
    #endif
    CArchive& operator<<(unsigned u);
    template <typename BaseType, bool t_bMFCDLL>
    CArchive& operator<<(const ATL::CSimpleStringT<BaseType, t_bMFCDLL>& str);
    template <typename BaseType, class StringTraits>
    CArchive& operator<<(const ATL::CStringT<BaseType, StringTraits>& str);
    template <typename BaseType, bool t_bMFCDLL>
    CArchive& operator>>(ATL::CSimpleStringT<BaseType, t_bMFCDLL>& str);
    template <typename BaseType, class StringTraits>
    CArchive& operator>>(ATL::CStringT<BaseType, StringTraits>& str);
    CArchive& operator<<(bool b);
    CArchive& operator>>(BYTE& by);
    CArchive& operator>>(WORD& w);
    CArchive& operator>>(DWORD& dw);
    CArchive& operator>>(LONG& l);
    CArchive& operator>>(float& f);
    CArchive& operator>>(double& d);
    CArchive& operator>>(LONGLONG& dwdw);
    CArchive& operator>>(ULONGLONG& dwdw);
    CArchive& operator>>(int& i);
    CArchive& operator>>(short& w);
    CArchive& operator>>(char& ch);
    #ifdef _NATIVE_WCHAR_T_DEFINED
    CArchive& operator>>(wchar_t& ch);
    #endif
    CArchive& operator>>(unsigned& u);
    CArchive& operator>>(bool& b);
    ...

If you need to serialize data types for which CArchive declares no operators, you need to write your own implementation. We will look at this a bit later, when I cover serializing Windows SDK structures.

Serializing CArray template collection

MFC provides serialization support for nearly all of its collections, and to serialize an MFC collection all you need to do is call the collection’s Serialize(CArchive& ar). CArray is different because it is a template and its element type isn’t known in advance; the type may or may not be derived from CObject. The default implementation of CArray::Serialize is listed below. When storing, all it does is write the size of the CArray. When loading, it reads the size and resizes the CArray. It then forwards the call to the SerializeElements() function.

    template<class TYPE, class ARG_TYPE>
    void CArray<TYPE, ARG_TYPE>::Serialize(CArchive& ar)
    {
        ASSERT_VALID(this);
        CObject::Serialize(ar);
        if (ar.IsStoring())
        {
            ar.WriteCount(m_nSize);              // element count first
        }
        else
        {
            DWORD_PTR nOldSize = ar.ReadCount();
            SetSize(nOldSize, -1);               // resize before the elements are read
        }
        SerializeElements(ar, m_pData, m_nSize);
    }

The default SerializeElements() simply reads or writes the elements bitwise. That is wrong for pointers and for types that own memory, so for such element types you must provide an appropriate implementation of SerializeElements(). The following listing demonstrates a SerializeElements implementation for the CAge class. Please refer to the SerializeDemo project for the implementation details.

    class CAge : public CObject
    {
        DECLARE_SERIAL(CAge)
    public:
        CAge();
        CAge(int nAge);
        virtual ~CAge();
        virtual void Serialize(CArchive& ar);
        UINT m_nAge;
    };
    // Specialization for arrays of CAge pointers, e.g. CArray<CAge*, CAge*>
    template<> inline void AFXAPI SerializeElements<CAge*>(CArchive& ar, CAge** pAge, INT_PTR nCount)
    {
        for (INT_PTR i = 0; i < nCount; i++, pAge++)
        {
            if (ar.IsStoring())
            {
                ar << *pAge;          // stored through operator<<(CArchive&, const CObject*)
            }
            else
            {
                CAge* p = nullptr;
                ar >> p;              // operator>> generated by IMPLEMENT_SERIAL
                *pAge = p;
            }
        }
    }
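CArray::Serialize relies on CArchive::WriteCount and ReadCount. As far as I can tell from the MFC sources, WriteCount uses an escape scheme. A count below 0xFFFF is written as one WORD; otherwise the WORD 0xFFFF is written, followed by a DWORD. 64-bit builds add a further escape for very large counts. The following standard C++ sketch models that 32-bit scheme together with the count-then-elements layout. Treat the exact byte layout as an assumption; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Append the low 'n' bytes of 'value' in little-endian order.
void PutLE(Bytes& out, std::uint32_t value, int n) {
    for (int i = 0; i < n; ++i) out.push_back(static_cast<unsigned char>(value >> (8 * i)));
}

// Read 'n' little-endian bytes at 'pos' and advance 'pos'.
std::uint32_t GetLE(const Bytes& in, std::size_t& pos, int n) {
    if (pos + static_cast<std::size_t>(n) > in.size()) throw std::runtime_error("end of data");
    std::uint32_t value = 0;
    for (int i = 0; i < n; ++i) value |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += static_cast<std::size_t>(n);
    return value;
}

// Escape-style count (assumed to mirror 32-bit WriteCount): a WORD if below 0xFFFF,
// otherwise the WORD 0xFFFF followed by a DWORD.
void WriteCount(Bytes& out, std::uint32_t count) {
    if (count < 0xFFFF) {
        PutLE(out, count, 2);
    } else {
        PutLE(out, 0xFFFF, 2);
        PutLE(out, count, 4);
    }
}

std::uint32_t ReadCount(const Bytes& in, std::size_t& pos) {
    const std::uint32_t count = GetLE(in, pos, 2);
    return count != 0xFFFF ? count : GetLE(in, pos, 4);
}

// Count first, then the elements: the shape of CArray::Serialize + SerializeElements.
void StoreArray(Bytes& out, const std::vector<std::int32_t>& items) {
    WriteCount(out, static_cast<std::uint32_t>(items.size()));
    for (std::int32_t v : items) PutLE(out, static_cast<std::uint32_t>(v), 4);
}

std::vector<std::int32_t> LoadArray(const Bytes& in, std::size_t& pos) {
    std::vector<std::int32_t> items(ReadCount(in, pos));  // size first, like SetSize
    for (auto& v : items) v = static_cast<std::int32_t>(GetLE(in, pos, 4));
    return items;
}
```

In this model a count of 3 costs two bytes, while a count of 70000 costs six (the 0xFFFF escape plus a DWORD), so small collections stay compact without capping large ones.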

Serializing to and from the process memory

Serialization to and from memory is supported via CMemFile. CMemFile does not require a file name.

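    CMemFile file;   // no file name needed; the memory block grows as needed
    CArchive ar(&file, CArchive::store);
    ar << val;
    ar.Close();      // flush the archive buffer into the memory file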

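After the archive is closed, you can take the serialized bytes out of the CMemFile and keep them, for example, in a CByteArray. To do so, call GetLength, then Detach; the caller then owns the memory block. Deserializing from memory is done by attaching those bytes to a CMemFile: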

    CMemFile file;
    // Attach the previously stored bytes (held here in m_aBytes, a CByteArray)
    file.Attach(m_aBytes.GetData(), (UINT)m_aBytes.GetSize());
    CArchive ar(&file, CArchive::load);
    ar >> val;
    ar.Close();

Serializing to and from the shared process memory

Serialization to and from shared memory is supported via CSharedFile. This is very useful when you want to transfer a serialized object through the clipboard, either for pasting into another instance of your application or for passing it to another application.

    UINT m_nClipboardFormat = RegisterClipboardFormat(_T("MY_APP_DATA"));
    CSharedFile file(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT);
    CArchive ar(&file, CArchive::store | CArchive::bNoFlushOnDelete);
    GetDocument()->Serialize(ar);
    ar.Close();                   // flush everything into the shared memory block first
    if (!OpenClipboard())         // CWnd::OpenClipboard, called from the view
        return;
    EmptyClipboard();
    SetClipboardData(m_nClipboardFormat, file.Detach());  // the clipboard now owns the HGLOBAL
    CloseClipboard();

Deserializing from shared memory, as in a paste operation from the clipboard:

    UINT m_nClipboardFormat = RegisterClipboardFormat(_T("MY_APP_DATA"));
    if (!OpenClipboard())
        return;
    HGLOBAL hMem = GetClipboardData(m_nClipboardFormat);
    if (hMem == nullptr)
    {
        CloseClipboard();
        return;
    }
    CSharedFile file(GMEM_MOVEABLE | GMEM_SHARE | GMEM_ZEROINIT);
    file.SetHandle(hMem, FALSE);  // read the clipboard's memory block; do not grow it
    CArchive ar(&file, CArchive::load);
    GetDocument()->DeleteContents();
    GetDocument()->Serialize(ar);
    ar.Close();
    file.Detach();                // give the handle back; the clipboard still owns the memory
    CloseClipboard();

Serializing to and from the sockets

Serialization to and from sockets is done via the CSocketFile class. You can use a CArchive with a CSocket only if the socket is of type SOCK_STREAM. This topic is a bit more complex than the MSDN documentation suggests. The official documentation says that you can both write to and read from a CSocket through a CSocketFile. This is true for writing, but not necessarily for reading. If your transmitted data is only a few bytes, then yes, you can use CSocketFile to receive it. However, if your data is megabytes in size, or anything larger than the reading buffer, you will likely receive it in several reads. In that case you have to accumulate all of it in a CByteArray first. Only after all the data has been received can you attach it to a CMemFile (rather than a CSocketFile) and deserialize. In my experience, trying to read partial data through a CSocketFile usually results in a CArchiveException.

    CSocket sock;
    if (!sock.Create())            // SOCK_STREAM by default
        return;
    if (!sock.Connect(_T("127.0.0.1"), 1011))
        return;
    CSocketFile file(&sock);
    CArchive ar(&file, CArchive::store | CArchive::bNoFlushOnDelete);
    ar << m_pRoot;
    ar.Close();                    // flush the buffered bytes to the socket
    file.Close();
    sock.Close();

Deserialization from a socket is a bit more complicated. Here is the full listing of the receiving class, to demonstrate how to properly read a large binary data set from a socket. For the complete source code, please refer to the example project SerializeTcpServer.

    class CSockThread;
    class CRecvSocket : public CSocket
    {
    public:
        CRecvSocket();
        virtual ~CRecvSocket();
        virtual void OnReceive(int nErrorCode);
        CSockThread* m_pThread;
        CByteArray m_aBytes;       // accumulates everything received so far
    private:
        DWORD m_dwReads;
        void Display(CRoot* pRoot);
    };
    #define INCOMING_BUFFER_SIZE 65536
    CRecvSocket::CRecvSocket()
        : m_pThread(nullptr)
        , m_dwReads(0)
    {
    }
    CRecvSocket::~CRecvSocket()
    {
    }
    void CRecvSocket::OnReceive(int nErrorCode)
    {
        Sleep(10);
        BYTE btBuffer[INCOMING_BUFFER_SIZE] = { 0 };
        int nRead = Receive(btBuffer, INCOMING_BUFFER_SIZE);
        switch (nRead)
        {
        case 0:
            // The peer closed the connection
            m_pThread->PostThreadMessage(WM_QUIT, 0, 0);
            break;
        case SOCKET_ERROR:
            if (GetLastError() != WSAEWOULDBLOCK)
            {
                m_pThread->PostThreadMessage(WM_QUIT, 0, 0);
            }
            break;
        default:
        {
            m_dwReads++;
            // Append this chunk to the accumulated data
            CByteArray aBytes;
            aBytes.SetSize(nRead);
            CopyMemory(aBytes.GetData(), btBuffer, nRead);
            m_aBytes.Append(aBytes);
            // Heuristic: if no more bytes are queued right now, treat the transmission as complete
            DWORD dwReceived = 0;
            if (IOCtl(FIONREAD, &dwReceived))
            {
                if (dwReceived == 0)
                {
                    // Deserialize from memory, not from the socket
                    CMemFile file;
                    file.Attach(m_aBytes.GetData(), (UINT)m_aBytes.GetSize());
                    CArchive ar(&file, CArchive::load);
                    CRoot* pRoot = nullptr;
                    TRY
                    {
                        ar >> pRoot;
                    }
                    CATCH(CArchiveException, e)
                    {
                        std::cout << "Error reading data " << std::endl;
                    }
                    END_CATCH
                    if (pRoot)
                    {
                        Display(pRoot);
                        delete pRoot;
                    }
                    ar.Close();
                    file.Close();
                    m_pThread->PostThreadMessage(WM_QUIT, 0, 0);
                }
            }
            break;
        }
        }
        CSocket::OnReceive(nErrorCode);
    }

In today’s applications you will rarely receive a whole binary or text transmission in a single OnReceive call. You therefore need to accumulate all of the data in an array of bytes. Only then can you successfully deserialize it, by attaching the accumulated CByteArray to a CMemFile. The example above calls IOCtl(FIONREAD, &dwReceived) to determine whether more data is queued. Note that a result of 0 only means that nothing is waiting at that moment, so this is a heuristic rather than a guarantee that the transmission is complete. As a rule of thumb, because our reading buffer is 65536 bytes, any transmission larger than that takes more than one read, and TCP may split smaller transmissions as well.
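A more robust alternative to the FIONREAD check is to prefix each transmission with its length, so the receiver knows exactly when it has everything. This is my substitution, not what the example project does. The sketch below is standard C++ with hypothetical names (Frame, FrameAccumulator) and an assumed wire format: a 4-byte little-endian length prefix. On the receiving side, each chunk from OnReceive would be passed to Append, and each complete payload would then be attached to a CMemFile for loading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical wire format: a 4-byte little-endian payload length, then the payload.
Bytes Frame(const Bytes& payload) {
    Bytes out;
    const auto n = static_cast<std::uint32_t>(payload.size());
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<unsigned char>(n >> (8 * i)));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Collects chunks as they arrive (e.g. one per OnReceive call) and hands out
// a payload only once every byte announced by its prefix is present.
class FrameAccumulator {
public:
    void Append(const unsigned char* data, std::size_t size) {
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Returns the next complete payload, or std::nullopt if more bytes are needed.
    std::optional<Bytes> Next() {
        if (buffer_.size() < 4) return std::nullopt;
        std::uint32_t n = 0;
        for (int i = 0; i < 4; ++i) n |= static_cast<std::uint32_t>(buffer_[i]) << (8 * i);
        if (buffer_.size() - 4 < n) return std::nullopt;
        const auto begin = buffer_.begin() + 4;
        const auto end = begin + static_cast<std::ptrdiff_t>(n);
        Bytes payload(begin, end);
        buffer_.erase(buffer_.begin(), end);  // keep any bytes of the next message
        return payload;
    }

private:
    Bytes buffer_;
};
```

For example, a 100000-byte payload fed in 65536-byte chunks is complete only after the second chunk. The sender would need to write the length first, for example by storing into a CMemFile, measuring it with GetLength, and sending the length followed by the bytes.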

The CSockThread class referenced by m_pThread is implemented in the example project SerializeTcpServer.

Serializing arbitrary byte stream

An arbitrary byte stream is basically any binary file whose internal structure you do not know or do not care about. For example, you may want to store JPEG images or MPEG-4 movie files inside your class data without any knowledge of their underlying format. You can deserialize them later and use them with the appropriate application. MFC serialization makes it easy to store such data.

In the following code we will store the byte streams of four JPEG pictures:

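    class CMyPicture : public CObject
    {
        DECLARE_SERIAL(CMyPicture)
    public:
        CMyPicture();
        virtual ~CMyPicture();
        virtual void Serialize(CArchive& ar);
        CString GetHeader() const;
        CString m_strName;      // source file name
        CString m_strNewName;   // name used when writing the image back to disk
        CByteArray m_bytes;     // raw image bytes; their format is not interpreted
    };
    typedef CTypedPtrArray<CObArray, CMyPicture*> CMyPictureArray;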

The following listing is the implementation of the class:

    IMPLEMENT_SERIAL(CMyPicture, CObject, VERSIONABLE_SCHEMA | 1)
    CMyPicture::CMyPicture()
    {
    }
    CMyPicture::~CMyPicture()
    {
    }
    void CMyPicture::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            ar << m_strName;
            ar << m_strNewName;
        }
        else
        {
            UINT nSchema = ar.GetObjectSchema();
            switch (nSchema)
            {
            case 1:
                ar >> m_strName;
                ar >> m_strNewName;
                break;
            }
        }
        // CByteArray::Serialize writes (or reads) the byte count followed by the raw bytes
        m_bytes.Serialize(ar);
    }

To populate such a class with JPEG image data, all you need is the following:

    // ... (end of the document member function that creates the sample data)
        m_aPictures.Add(InitPicture("Water lilies.jpg", "Water lilies Output.jpg"));
        m_aPictures.Add(InitPicture("Blue hills.jpg", "Blue hills Output.jpg"));
        m_aPictures.Add(InitPicture("Sunset.jpg", "Sunset Output.jpg"));
        m_aPictures.Add(InitPicture("Winter.jpg", "Winter Output.jpg"));
        UpdateAllViews(nullptr, HINT_GENERATED_DATA);
        SetModifiedFlag();
    }
    // The helpers below require #include <fstream>, <iterator> and <vector>
    std::vector<BYTE> CSerializeDemoDoc::ReadBinaryFile(const char* filename)
    {
        // Read the whole file as raw bytes
        std::ifstream file(filename, std::ios::binary);
        return std::vector<BYTE>((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
    }
    CMyPicture* CSerializeDemoDoc::InitPicture(const char* sFileName, const char* sOutFileName)
    {
        std::vector<BYTE> vJPG = ReadBinaryFile(sFileName);
        CMyPicture* pPicture = new CMyPicture;
        pPicture->m_strName = sFileName;
        pPicture->m_strNewName = sOutFileName;
        pPicture->m_bytes.SetSize((INT_PTR)vJPG.size());
        if (!vJPG.empty())  // &vJPG[0] would be invalid for an empty vector
            CopyMemory(pPicture->m_bytes.GetData(), vJPG.data(), vJPG.size() * sizeof(BYTE));
        return pPicture;
    }
    void CSerializeDemoDoc::OnTestdataWriteimagedatatodisk()
    {
        for (INT_PTR i = 0; i < m_pRoot->m_aPictures.GetSize(); i++)
        {
            CMyPicture* pPic = m_pRoot->m_aPictures.GetAt(i);
            // Write the stored bytes back out unchanged
            std::ofstream fout(pPic->m_strNewName, std::ios::out | std::ios::binary);
            fout.write((char*)pPic->m_bytes.GetData(), pPic->m_bytes.GetSize());
            fout.close();
        }
        AfxMessageBox(_T("Finished writing images back to disk"), MB_ICONINFORMATION);
    }

Serializing Windows SDK data structures

CArchive does not provide serialization of the Windows SDK structures. However, it is nearly effortless to add such support. The following code demonstrates how to serialize the LOGFONT SDK structure:

```cpp
inline CArchive& AFXAPI operator <<(CArchive& ar, const LOGFONT& lf)
{
    CString strFace(lf.lfFaceName);

    ar << lf.lfHeight;
    ar << lf.lfWidth;
    ar << lf.lfEscapement;
    ar << lf.lfOrientation;
    ar << lf.lfWeight;
    ar << lf.lfItalic;
    ar << lf.lfUnderline;
    ar << lf.lfStrikeOut;
    ar << lf.lfCharSet;
    ar << lf.lfOutPrecision;
    ar << lf.lfClipPrecision;
    ar << lf.lfQuality;
    ar << lf.lfPitchAndFamily;
    ar << strFace;

    return ar;
}

inline CArchive& AFXAPI operator >>(CArchive& ar, LOGFONT& lf)
{
    CString strFace;

    ar >> lf.lfHeight;
    ar >> lf.lfWidth;
    ar >> lf.lfEscapement;
    ar >> lf.lfOrientation;
    ar >> lf.lfWeight;
    ar >> lf.lfItalic;
    ar >> lf.lfUnderline;
    ar >> lf.lfStrikeOut;
    ar >> lf.lfCharSet;
    ar >> lf.lfOutPrecision;
    ar >> lf.lfClipPrecision;
    ar >> lf.lfQuality;
    ar >> lf.lfPitchAndFamily;
    ar >> strFace;
    _tcscpy_s(lf.lfFaceName, strFace);

    return ar;
}
```
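CArchive itself only builds on Windows, so here is a portable sketch of the same field-by-field technique rather than MFC code. ByteArchive is a hypothetical stand-in for CArchive, and FontDesc is a hypothetical struct mirroring a few LOGFONT members. As with the LOGFONT operators, every member is written in a fixed order, and the fixed-size face-name buffer is stored as a length-prefixed string that is copied back with a bounds check.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class): values are appended
// as raw bytes on store and consumed in the same order on load.
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Hypothetical mirror of a few LOGFONT members (LF_FACESIZE is 32 on Windows).
struct FontDesc {
    std::int32_t height = 0;
    std::int32_t weight = 0;
    std::uint8_t italic = 0;
    char faceName[32] = {};
};

ByteArchive& operator<<(ByteArchive& ar, const FontDesc& f) {
    ar << f.height << f.weight << f.italic;
    // Store only the used part of the fixed-size buffer, prefixed by its length.
    std::int32_t len = 0;
    while (len < static_cast<std::int32_t>(sizeof f.faceName) && f.faceName[len] != '\0') ++len;
    ar << len;
    for (std::int32_t i = 0; i < len; ++i) ar << f.faceName[i];
    return ar;
}

ByteArchive& operator>>(ByteArchive& ar, FontDesc& f) {
    std::int32_t len = 0;
    ar >> f.height >> f.weight >> f.italic >> len;  // same order as the writer
    if (len < 0 || len >= static_cast<std::int32_t>(sizeof f.faceName))
        throw std::runtime_error("face name does not fit");  // keep room for '\0'
    for (std::int32_t i = 0; i < len; ++i) ar >> f.faceName[i];
    f.faceName[len] = '\0';
    return ar;
}
```

Only the characters actually used are stored, so the record does not depend on the buffer size; the reader still has to check the length before copying into the fixed buffer.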

Once the LOGFONT insertion and extraction operators are defined, serializing a LOGFONT member takes nothing more than this:

```cpp
void CRoot::Serialize(CArchive& ar)
{
    CBase::Serialize(ar);

    if (ar.IsStoring())
    {
        ar << m_lf;
    }
    else
    {
        UINT nSchema = ar.GetObjectSchema();
        switch (nSchema)
        {
        case 1:
            ar >> m_lf;
            break;
        }
    }
}
```

The next code snippet serializes the WINDOWPLACEMENT SDK structure:

```cpp
inline CArchive& AFXAPI operator <<(CArchive& ar, const WINDOWPLACEMENT& val)
{
    ar << val.flags;
    ar << val.length;
    ar << val.ptMaxPosition.x;
    ar << val.ptMaxPosition.y;
    ar << val.ptMinPosition.x;
    ar << val.ptMinPosition.y;
    ar << val.rcNormalPosition.bottom;
    ar << val.rcNormalPosition.left;
    ar << val.rcNormalPosition.right;
    ar << val.rcNormalPosition.top;
    ar << val.showCmd;

    return ar;
}

inline CArchive& AFXAPI operator >>(CArchive& ar, WINDOWPLACEMENT& val)
{
    ar >> val.flags;
    ar >> val.length;
    ar >> val.ptMaxPosition.x;
    ar >> val.ptMaxPosition.y;
    ar >> val.ptMinPosition.x;
    ar >> val.ptMinPosition.y;
    ar >> val.rcNormalPosition.bottom;
    ar >> val.rcNormalPosition.left;
    ar >> val.rcNormalPosition.right;
    ar >> val.rcNormalPosition.top;
    ar >> val.showCmd;

    return ar;
}
```

Reading and writing a WINDOWPLACEMENT member then becomes as trivial as this:

```cpp
void CRoot::Serialize(CArchive& ar)
{
    CBase::Serialize(ar);

    if (ar.IsStoring())
    {
        ar << m_wp;
    }
    else
    {
        UINT nSchema = ar.GetObjectSchema();
        switch (nSchema)
        {
        case 1:
            ar >> m_wp;
            break;
        }
    }
}
```
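WINDOWPLACEMENT nests POINT and RECT members. As an alternative to spelling out every coordinate, the portable sketch below defines operators for the nested types and composes them, so the outer operators read like the struct declaration. It is not MFC code: ByteArchive is a hypothetical stand-in for CArchive, and Pt, Rc and Placement are hypothetical fixed-width mirrors of POINT, RECT and WINDOWPLACEMENT.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Hypothetical mirrors of POINT, RECT and WINDOWPLACEMENT.
struct Pt { std::int32_t x = 0, y = 0; };
struct Rc { std::int32_t left = 0, top = 0, right = 0, bottom = 0; };
struct Placement {
    std::uint32_t length = 0, flags = 0, showCmd = 0;
    Pt ptMin, ptMax;
    Rc rcNormal;
};

// The nested types get their own operators...
ByteArchive& operator<<(ByteArchive& ar, const Pt& p) { return ar << p.x << p.y; }
ByteArchive& operator>>(ByteArchive& ar, Pt& p) { return ar >> p.x >> p.y; }
ByteArchive& operator<<(ByteArchive& ar, const Rc& r) {
    return ar << r.left << r.top << r.right << r.bottom;
}
ByteArchive& operator>>(ByteArchive& ar, Rc& r) {
    return ar >> r.left >> r.top >> r.right >> r.bottom;
}

// ...so the outer operators simply list the members in one fixed order.
ByteArchive& operator<<(ByteArchive& ar, const Placement& wp) {
    return ar << wp.length << wp.flags << wp.showCmd << wp.ptMin << wp.ptMax << wp.rcNormal;
}
ByteArchive& operator>>(ByteArchive& ar, Placement& wp) {
    return ar >> wp.length >> wp.flags >> wp.showCmd >> wp.ptMin >> wp.ptMax >> wp.rcNormal;
}
```

The writer and the reader must still list the members in the same order; composition only keeps that order in one place per type.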

Serializing STL collections

Serializing STL collections is just as straightforward as serializing SDK data structures. Let’s define insertion and extraction operators for the most popular STL collections. To serialize a std::vector<int>, we need the following definition:

```cpp
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::vector<int>& val)
{
    ar << (int)val.size();  // element count first
    for (int k : val)       // then each element
    {
        ar << k;
    }
    return ar;
}
```

To read the data back into a std::vector<int>, we do the following:

```cpp
inline CArchive& AFXAPI operator >>(CArchive& ar, std::vector<int>& val)
{
    int nSize;
    ar >> nSize;
    val.resize(nSize);
    for (size_t i = 0; i < (size_t)nSize; i++)
    {
        ar >> val[i];
    }
    return ar;
}
```
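The vector operators are hard-wired to std::vector<int>. Here is a sketch of a templated variant, shown with a hypothetical ByteArchive stand-in instead of CArchive so it can run anywhere: one pair of templates covers any element type that has its own << and >>, including nested vectors, and the count is stored as a fixed-width 32-bit integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Any vector whose element type is itself serializable.
template <class T>
ByteArchive& operator<<(ByteArchive& ar, const std::vector<T>& v) {
    ar << static_cast<std::int32_t>(v.size());  // element count first
    for (const T& e : v) ar << e;
    return ar;
}

template <class T>
ByteArchive& operator>>(ByteArchive& ar, std::vector<T>& v) {
    std::int32_t n = 0;
    ar >> n;
    if (n < 0) throw std::runtime_error("negative element count");
    v.clear();  // replace, do not append to, existing contents
    v.reserve(static_cast<std::size_t>(n));
    for (std::int32_t i = 0; i < n; ++i) {
        T e{};
        ar >> e;
        v.push_back(e);
    }
    return ar;
}
```

Because the element is read through its own >>, nesting works without extra code: a std::vector<std::vector<double>> uses the same two templates at both levels.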

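To serialize a std::map, we first store the size of the map. Because the underlying element of a std::map is a std::pair, we then store the first and second members of each pair. The key and value types are template parameters here, so any map whose key and value types are serializable is covered: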

```cpp
template <class K, class V>
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::map<K, V>& val)
{
    ar << (int)val.size();
    for (const std::pair<const K, V>& k : val)
    {
        ar << k.first;
        ar << k.second;
    }
    return ar;
}
```

The reading code for the std::map is as follows:

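```cpp
template <class K, class V>
inline CArchive& AFXAPI operator >>(CArchive& ar, std::map<K, V>& val)
{
    int nSize;
    ar >> nSize;
    val.clear();  // replace existing contents instead of merging into them
    for (size_t i = 0; i < (size_t)nSize; i++)
    {
        std::pair<K, V> k;
        ar >> k.first;
        ar >> k.second;
        val.insert(k);
    }
    return ar;
}
```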
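A portable sketch of the same map technique, using a hypothetical ByteArchive stand-in rather than CArchive. The reader clears the target first, and because a std::map is always written in ascending key order, inserting each pair with emplace_hint(m.end(), ...) takes amortized constant time per element.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

template <class K, class V>
ByteArchive& operator<<(ByteArchive& ar, const std::map<K, V>& m) {
    ar << static_cast<std::int32_t>(m.size());
    for (const auto& kv : m) ar << kv.first << kv.second;  // ascending key order
    return ar;
}

template <class K, class V>
ByteArchive& operator>>(ByteArchive& ar, std::map<K, V>& m) {
    std::int32_t n = 0;
    ar >> n;
    if (n < 0) throw std::runtime_error("negative element count");
    m.clear();
    for (std::int32_t i = 0; i < n; ++i) {
        K k{};
        V v{};
        ar >> k >> v;
        // Keys arrive sorted, so each one belongs right before end().
        m.emplace_hint(m.end(), std::move(k), std::move(v));
    }
    return ar;
}
```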

Serializing the fixed-size std::array. Because the number of elements is part of the type, there is no need to store it:

```cpp
template <size_t N>
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::array<int, N>& val)
{
    for (int k : val)
    {
        ar << k;
    }
    return ar;
}
```

The std::array reading operator:

```cpp
template <size_t N>
inline CArchive& AFXAPI operator >>(CArchive& ar, std::array<int, N>& val)
{
    for (size_t i = 0; i < N; i++)
    {
        ar >> val[i];
    }
    return ar;
}
```

Serializing a std::set<std::string> collection:

```cpp
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::set<std::string>& val)
{
    ar << (int)val.size();
    for (const std::string& k : val)
    {
        ar << CStringA(k.c_str(), (int)k.size());
    }
    return ar;
}
```

Reading code for the std::set collection:

```cpp
inline CArchive& AFXAPI operator >>(CArchive& ar, std::set<std::string>& val)
{
    int nSize;
    ar >> nSize;
    val.clear();
    for (size_t i = 0; i < (size_t)nSize; i++)
    {
        CStringA str;
        ar >> str;
        val.insert(std::string(str.GetString(), str.GetLength()));
    }
    return ar;
}
```

Serializing STL data types

Serializing STL types is just as straightforward as serializing SDK data structures: all we need are insertion and extraction operator definitions. To serialize a std::string, add the following insertion operator:

```cpp
inline CArchive& AFXAPI operator <<(CArchive& ar, const std::string& val)
{
    ar << CStringA(val.c_str(), (int)val.size());
    return ar;
}
```

To deserialize a std::string:

```cpp
inline CArchive& AFXAPI operator >>(CArchive& ar, std::string& val)
{
    CStringA str;
    ar >> str;
    val.assign(str.GetString(), str.GetLength());
    return ar;
}
```
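One caveat with converting through c_str() alone is that it stops at the first embedded '\0'. The following portable sketch (with a hypothetical ByteArchive stand-in in place of CArchive) stores an explicit 32-bit byte count followed by the raw characters, so such strings survive a round trip.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

ByteArchive& operator<<(ByteArchive& ar, const std::string& s) {
    ar << static_cast<std::uint32_t>(s.size());      // byte count first
    ar.buf.insert(ar.buf.end(), s.begin(), s.end());  // then the raw characters
    return ar;
}

ByteArchive& operator>>(ByteArchive& ar, std::string& s) {
    std::uint32_t n = 0;
    ar >> n;
    if (n > ar.buf.size() - ar.pos) throw std::runtime_error("string length exceeds archive");
    s.assign(reinterpret_cast<const char*>(ar.buf.data() + ar.pos), n);
    ar.pos += n;
    return ar;
}
```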

I will stop here with serializing STL types and containers. Once you have seen one STL collection and one STL type serialized, you have seen them all. Serializing std::pair, std::tuple, std::unordered_map, etc. is left to the reader as an exercise.

Serializing flat C style arrays

To serialize flat C arrays, follow the same procedure as for collections. Because a flat C-style array has a size known at compile time, there is no need to store its size in the file.

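```cpp
// Reference-to-array parameter: a plain "float val[3]" parameter would decay
// to float* and match any float pointer, regardless of the array size.
inline CArchive& AFXAPI operator <<(CArchive& ar, const float (&val)[3])
{
    for (int i = 0; i < 3; i++)
    {
        ar << val[i];
    }
    return ar;
}
```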

Reading a flat C-style array:

```cpp
inline CArchive& AFXAPI operator >>(CArchive& ar, float (&val)[3])
{
    for (size_t i = 0; i < 3; i++)
    {
        ar >> val[i];
    }
    return ar;
}
```
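A parameter declared as float val[3] is adjusted by the compiler to float*, so the size in the declaration is not enforced. The sketch below generalizes the array operators with reference-to-array templates that deduce both the element type and the length; multidimensional arrays are handled by recursion. It uses a hypothetical ByteArchive stand-in rather than CArchive.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// N is deduced from the array type, so no count is stored in the archive.
template <class T, std::size_t N>
ByteArchive& operator<<(ByteArchive& ar, const T (&a)[N]) {
    for (std::size_t i = 0; i < N; ++i) ar << a[i];  // a[i] may itself be an array
    return ar;
}

template <class T, std::size_t N>
ByteArchive& operator>>(ByteArchive& ar, T (&a)[N]) {
    for (std::size_t i = 0; i < N; ++i) ar >> a[i];
    return ar;
}
```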

Serializing enumerated types

To serialize an enumeration, you strictly need only an extraction operator: on insertion an (unscoped) enumeration is implicitly converted to int, but on extraction there is no conversion that lets an enumeration variable be read as an int. Providing both insertion and extraction operators, however, results in a much cleaner solution and avoids nasty surprises in the future.

```cpp
enum EMyTestEnum
{
    ENUM_0,
    ENUM_1,
};
```

Writing the enumeration:

```cpp
inline CArchive& AFXAPI operator <<(CArchive& ar, const EMyTestEnum& val)
{
    int iTemp = val;
    ar << iTemp;
    return ar;
}
```

Reading the enumeration:

```cpp
inline CArchive& AFXAPI operator >>(CArchive& ar, EMyTestEnum& val)
{
    int iTmp = 0;
    ar >> iTmp;
    val = (EMyTestEnum)iTmp;
    return ar;
}
```
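With C++11 scoped enumerations, one pair of templates can serialize any enumeration through its underlying integer type. This sketch uses a hypothetical ByteArchive stand-in instead of CArchive; EMyScopedEnum is a hypothetical scoped variant of the enumeration with an explicit 32-bit underlying type, so its stored size does not depend on the compiler's choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Hypothetical scoped enumeration with a fixed 32-bit underlying type.
enum class EMyScopedEnum : std::int32_t { ENUM_0, ENUM_1 };

// Any enumeration is written and read through its underlying integer type.
template <class E, typename std::enable_if<std::is_enum<E>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, E e) {
    return ar << static_cast<typename std::underlying_type<E>::type>(e);
}

template <class E, typename std::enable_if<std::is_enum<E>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, E& e) {
    typename std::underlying_type<E>::type raw{};
    ar >> raw;
    e = static_cast<E>(raw);
    return ar;
}
```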

Serialization versioning for CObject derived classes

This is a rather interesting topic: versioning of CObject-derived classes can be done in two ways. Let’s assume we have a class that keeps evolving as new features are added to the core application.

```cpp
class CMyObject : public CObject
{
    DECLARE_SERIAL(CMyObject)
public:
    CMyObject();
    virtual ~CMyObject();
    virtual void Serialize(CArchive& ar);

    float m_f;
    double m_d;

    COLORREF m_backColor;
    COLORREF m_foreColor;

    CString m_strDescription;
    CString m_strNotes;
};
```

To serialize such an object and still be able to read older files written by versions 1, 2, and 3, we can take either of two approaches. The first one handles each schema in its own case:

```cpp
IMPLEMENT_SERIAL(CMyObject, CObject, VERSIONABLE_SCHEMA | 4)

void CMyObject::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
    {
        ar << m_f;
        ar << m_d;
        ar << m_backColor;
        ar << m_foreColor;
        ar << m_strDescription;
        ar << m_strNotes;
    }
    else
    {
        UINT nSchema = ar.GetObjectSchema();
        switch (nSchema)
        {
        case 1:
            ar >> m_f;
            ar >> m_d;
            break;
        case 2:
            ar >> m_f;
            ar >> m_d;
            ar >> m_backColor;
            ar >> m_foreColor;
            break;
        case 3:
            ar >> m_f;
            ar >> m_d;
            ar >> m_backColor;
            ar >> m_foreColor;
            ar >> m_strDescription;
            break;
        case 4:
            ar >> m_f;
            ar >> m_d;
            ar >> m_backColor;
            ar >> m_foreColor;
            ar >> m_strDescription;
            ar >> m_strNotes;
            break;
        }
    }
}
```

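This approach, although crystal clear, is tedious at best, with a lot of repetitive code. Another approach is to store the members in reverse order, newest first, and let the switch statement fall through from the file's version down to version 1. For this to work, every version of the application must have written its members newest-first from the start; files written with the first approach cannot be read this way.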

```cpp
IMPLEMENT_SERIAL(CMyObject, CObject, VERSIONABLE_SCHEMA | 4)

void CMyObject::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
    {
        // Newest members first, so loading can fall through to older ones.
        ar << m_strNotes;
        ar << m_strDescription;
        ar << m_backColor;
        ar << m_foreColor;
        ar << m_f;
        ar << m_d;
    }
    else
    {
        UINT nSchema = ar.GetObjectSchema();
        switch (nSchema)
        {
        case 4:
            ar >> m_strNotes;
            // fall through
        case 3:
            ar >> m_strDescription;
            // fall through
        case 2:
            ar >> m_backColor;
            ar >> m_foreColor;
            // fall through
        case 1:
            ar >> m_f;
            ar >> m_d;
            break;
        }
    }
}
```

This is a much cleaner versioning solution that eliminates the repetitive code.
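The fall-through scheme can be tried outside MFC with the portable sketch below. ByteArchive is a hypothetical stand-in for CArchive, the schema number is written explicitly because there is no IMPLEMENT_SERIAL here, and MyObject with its version-numbered members is hypothetical (numeric members stand in for the description and notes strings to keep it short). The schema parameter of Store exists only so the example can emulate files written by older versions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Hypothetical object whose members were added over four versions.
struct MyObject {
    static const std::uint32_t kSchema = 4;
    float f = 0;                  // version 1
    double d = 0;                 // version 1
    std::uint32_t backColor = 0;  // version 2
    std::uint32_t foreColor = 0;  // version 2
    std::int32_t priority = 0;    // version 3 (stands in for a string)
    std::uint8_t flags = 0;       // version 4 (stands in for a string)
};

// Writes the schema number, then the members newest-first.
void Store(ByteArchive& ar, const MyObject& o, std::uint32_t schema = MyObject::kSchema) {
    ar << schema;
    switch (schema) {
    case 4: ar << o.flags;                     // fall through
    case 3: ar << o.priority;                  // fall through
    case 2: ar << o.backColor << o.foreColor;  // fall through
    case 1: ar << o.f << o.d; break;
    default: throw std::runtime_error("unknown schema");
    }
}

// Reads whatever the file's version contains; members it lacks are left untouched.
void Load(ByteArchive& ar, MyObject& o) {
    std::uint32_t schema = 0;
    ar >> schema;
    switch (schema) {
    case 4: ar >> o.flags;                     // fall through
    case 3: ar >> o.priority;                  // fall through
    case 2: ar >> o.backColor >> o.foreColor;  // fall through
    case 1: ar >> o.f >> o.d; break;
    default: throw std::runtime_error("unknown schema");
    }
}
```

Unlike the MFC listing, this sketch rejects an unknown schema in a default case instead of silently reading nothing.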

Serialization versioning for non CObject classes

To serialize a class that is not derived from CObject, we simply follow the same approach as with the Windows SDK structures and define insertion and extraction operators for it.

```cpp
class CMyObject
{
public:
    CMyObject();
    virtual ~CMyObject();

    static const short VERSION = 1;
    float m_f;
    double m_d;
};
```

Write the version number as the very first member. Then, when reading, check the version stored in the file and take the read path that corresponds to it.

```cpp
inline CArchive& AFXAPI operator <<(CArchive& ar, const CMyObject& val)
{
    ar << val.VERSION;
    ar << val.m_f;
    ar << val.m_d;
    return ar;
}

inline CArchive& AFXAPI operator >>(CArchive& ar, CMyObject& val)
{
    short nVersion = 0;

    ar >> nVersion;
    switch (nVersion)
    {
    case 1:
        ar >> val.m_f;
        ar >> val.m_d;
        break;
    }
    return ar;
}
```
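A variation on the version switch, sketched with a hypothetical ByteArchive stand-in rather than CArchive: cumulative if (version >= n) checks keep the members in declaration order, and any version newer than the code understands is rejected instead of being silently skipped. Sample and its version-2 count member are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Hypothetical non-CObject class, now at version 2.
struct Sample {
    static const std::int16_t VERSION = 2;
    float f = 0;
    double d = 0;
    std::int32_t count = 0;  // added in version 2
};

ByteArchive& operator<<(ByteArchive& ar, const Sample& s) {
    const std::int16_t version = Sample::VERSION;
    return ar << version << s.f << s.d << s.count;  // version always comes first
}

ByteArchive& operator>>(ByteArchive& ar, Sample& s) {
    std::int16_t version = 0;
    ar >> version;
    if (version < 1 || version > Sample::VERSION)
        throw std::runtime_error("unsupported Sample version");
    ar >> s.f >> s.d;                 // present since version 1
    if (version >= 2) ar >> s.count;  // present since version 2
    return ar;
}
```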

Caveats

Never serialize Win32/Win64 pointer-sized typedefs! Suppose you upgrade your application to 64 bit and try to read a file created by the 32-bit version, which happened to serialize such a typedef (for example DWORD_PTR). DWORD_PTR is 4 bytes long on Win32 and 8 bytes long on Win64, so one build reads 8 bytes where only 4 were written, or vice versa. Every subsequent read is then misaligned, producing garbage values or a CArchiveException once the reader runs past the end of the data, and the file becomes useless to the other build of your application. Serialize only types whose size is known and fixed. If you need a 64-bit integer, serialize it explicitly as __int64 in both the 32-bit and the 64-bit builds. This is especially important when serializing SDK structures: carefully examine each structure declaration, and if it contains pointer-sized typedefs, explicitly cast them to the largest size if you are building a 32-bit application and plan to move to 64 bit in the future.
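A sketch of the fixed-width rule, using a hypothetical ByteArchive stand-in for CArchive and hypothetical helper names: a pointer-sized value such as DWORD_PTR (std::uintptr_t here) is always written as a 64-bit integer, so 32-bit and 64-bit builds produce and expect the same 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CArchive (not an MFC class).
struct ByteArchive {
    std::vector<unsigned char> buf;
    std::size_t pos = 0;
};

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator<<(ByteArchive& ar, T v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    ar.buf.insert(ar.buf.end(), p, p + sizeof v);
    return ar;
}

template <class T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
ByteArchive& operator>>(ByteArchive& ar, T& v) {
    if (ar.buf.size() - ar.pos < sizeof v) throw std::runtime_error("read past end of archive");
    std::memcpy(&v, ar.buf.data() + ar.pos, sizeof v);
    ar.pos += sizeof v;
    return ar;
}

// Hypothetical helpers: a pointer-sized value always goes through a fixed
// 64-bit type, whatever the size of std::uintptr_t in the current build.
inline ByteArchive& StorePtrSized(ByteArchive& ar, std::uintptr_t v) {
    return ar << static_cast<std::uint64_t>(v);
}

inline ByteArchive& LoadPtrSized(ByteArchive& ar, std::uintptr_t& v) {
    std::uint64_t wide = 0;
    ar >> wide;
    // Can only trigger in a 32-bit build reading a large value from a 64-bit one.
    if (wide > std::numeric_limits<std::uintptr_t>::max())
        throw std::runtime_error("value does not fit in this build's pointer size");
    v = static_cast<std::uintptr_t>(wide);
    return ar;
}
```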

Stick to either UNICODE or ANSI, period. If for whatever reason you must maintain both ANSI and UNICODE builds of your application, serialize exclusively CStringA or exclusively CStringW, so that either build can read the file. Suffice it to say that a string such as “hello” occupies 5 bytes of character data as an ANSI string but 10 bytes in the UNICODE version.
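The byte counts in the “hello” example can be checked with a few lines of standard C++. Here char16_t stands in for the 16-bit WCHAR used by CStringW on Windows (wchar_t is 4 bytes on some other platforms), and PayloadBytes is a hypothetical helper that counts only character data, not any length prefix an archive adds.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes of character data in a string.
template <class CharT>
std::size_t PayloadBytes(const std::basic_string<CharT>& s) {
    return s.size() * sizeof(CharT);
}

// The same text as a narrow (ANSI) string and as UTF-16.
const std::string kNarrow = "hello";
const std::u16string kWide = u"hello";
```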

Link to MFC statically to eliminate the runtime dependency on MFCXX.DLL, or on any other third-party library for that matter. Hypothetically, if sizeof(WhateverClass) changes in a newer version of a third-party DLL that your application links to dynamically and serializes, your application can fail to read its own files. Better safe than sorry: if you are not in control of the third-party library code, link to it statically. A little planning ahead goes a long way.

Using the code

I have supplied the SerializeDemo solution, which demonstrates all aspects described in this article. The solution contains four subprojects:

  • SerializeData – houses data structures and operators that are used by all projects
  • SerializeDemo – MFC Document / View application
  • SerializeTcpServer – a console server application listening on localhost (127.0.0.1), port 1011. You may need to change the port number if port 1011 is already in use on your machine. The SerializeDemo application can connect to this server to transmit serialized data
  • SerializationWithoutDocView – console application that demonstrates usage of CArchive without Document / View architecture

SerializeDemo application in action.

Try experimenting with this application using the following menu commands.

Try using Edit > Copy, then Paste into a new instance of the SerializeDemo application.

Serialize TCP Server in action.

The SerializeDemo application sends serialized data to the server, and the server prints the received binary data.

History

March 16th, 2017: Original article.
