Thursday, May 14, 2009

How to embed images into XML files



The title may sound confusing.We all know XML as plain text formats while images as binary files.So how can we embed images inside a XML file?Is it some kind of links to the external image files like the web pages?Not exactly.The image is completely embedded inside the nodes of XML in text format as base64 string.

The Base64 Format:


Base64 converts binary data to plain text using 64 case-sensitive, printable ASCII characters: A-Z, a-z, 0-9, plus sign (+) and forward slash (/), and may be terminated with 0-2 "padding" characters represented by the equal sign (=). For example, the eight-byte binary data in hex "35 71 4d 8e 4c 5f db 42″ converts to Base64 text as "NXFNjkxf20I=".

To accomplish this, we need some kind of encoder application that converts the image into the base64 format and embed the text into XML.To extract the image from the XML,we need another application that reads the base64 string,decodes it into the image again.In .NET Framework class library(FCL), we have convenient components in the "System.Convert" namespace to perform this encoding and decoding.Let's look at an example to understand the process.

The Base64 Encoder:

static void Main()
{
string base64FormattedImage= string.empty;
string imageFilePath = @"C:\Images\nature.jpg";
base64FormattedImage= EncodeToBase64FromImage(imageFilePath);

if (!string.IsNullOrEmpty(base64String))
{
string destinationXMLFile = @"C:\XML\NewsContent.xml";
WriteToXML(base64FormattedImage,destinationXMLFile);
}
}

private string EncodeToBase64FromImage(string imageFilePath)
{
System.IO.FileStream inFile;
byte[] binaryData = null;
string base64String=string.Empty;

try
{
inFile = new System.IO.FileStream(imageFilePath,
System.IO.FileMode.Open,
System.IO.FileAccess.Read);
binaryData = new Byte[inFile.Length];
long bytesRead = inFile.Read(binaryData, 0,
(int)inFile.Length);
inFile.Close();
}
catch (System.Exception exp)
{
// Error creating stream or reading from it.
System.Console.WriteLine("{0}", exp.Message);

}
// Convert the binary input into Base64 Encoded output.
try
{
base64String =
System.Convert.ToBase64String(binaryData,
0,
binaryData.Length);
}
catch (System.ArgumentNullException)
{
System.Console.WriteLine("Binary data array is null.");

}

return base64String;
}

private void WriteToXML(string base64FormattedImage,string destinationXMLFile)
{
XmlDocument doc = new XmlDocument();
doc.Load(destinationXMLFile);
XmlNode nd = doc.SelectSingleNode("/contents/content/Images").FirstChild;
nd.InnerText = base64FormattedImage;
doc.Save(destinationXMLFile);
doc = null;
}


The encoder reads an image file from the disk,converts it into Byte array and calls the "System.Convert.ToBase64String" method to get the base64 string from the Byte array.Then main method passes the base64 string to WriteToXML method to save into an XML file.

The XML file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<contents>
<content>
<ID>1</ID>
<Title>The title</Title>
<Body>
Content goes here..
</Body>
<Images>
<Image ID="1">/9j/4AAQSkZJRgABAgEAYABgAAD/4RBY ......</Image>
</Images>
</content>
</contents>


The Base64 Decoder:

static void Main()
{
string sourceXMLFilePath= @"C:\XML\NewsContent.xml";
string destinationImageFilePath = @"C:\Images\nature.jpg";
DecodeFromBase64ToImage(sourceXMLFilePath, destinationImageFilePath);
}

private void DecodeFromBase64ToImage(string sourceXMLFilePath, string destinationImageFilePath)
{
XmlDocument doc = new XmlDocument();
doc.Load(sourceXMLFilePath);
XmlNode nd = doc.SelectSingleNode("/contents/content/Images").FirstChild;
string base64String = nd.InnerText;

byte[] binaryData;
try
{
binaryData =
System.Convert.FromBase64String(base64String);
}
catch (System.ArgumentNullException)
{
System.Console.WriteLine("Base 64 string is null.");
return;
}
catch (System.FormatException)
{
System.Console.WriteLine("Base 64 string length is not " +
"4 or is not an even multiple of 4.");
return;
}

// Write out the decoded data.
System.IO.FileStream outFile;

try
{
outFile = new System.IO.FileStream(destinationImageFilePath,
System.IO.FileMode.Create,
System.IO.FileAccess.Write);
outFile.Write(binaryData, 0, binaryData.Length);
outFile.Close();
}
catch (System.Exception exp)
{
// Error creating stream or writing to it.
System.Console.WriteLine("{0}", exp.Message);
}
}


The decoder parses the XML document,retrieves the base64 string,converts it into Byte Array by calling the "System.Convert.FromBase64String" method.It then saves the Byte Array as an image file into the disk.

This approach is widely used in the online news media industry.The news media companies exchange articles and images among themselves in XML format.The schema of the XML usually follows the popular NITF(News Industry Text Format) schema or DTD.You can find more about NITF Here.

2 comments:

Sohan said...

I knew about base64 encoding but didn't have any idea about this usage. Thanks for sharing.
One thing to add to this. The same technique is used to transmit email attachments. So, when you attach a file to an email, it basically adds a new content, that is most likely base64 encoded, in to the mail. Your email client decodes it for you.

Arif said...

Thanks for your comment and the information about the usage of base64 in email applications.The base64 encoding seems to be really effective for data transmission.