Monday, January 11, 2010

My meeting with BLOB in Sharepoint

I have been working with Microsoft Sharepoint technologies since long and the question I always had somewhere in my mind was... how does the physical file gets stored into the Sharepoint database?Database itself runs on .MDf and .ldf files so saving a file into a database is pouring a file into file.

Last night I tried to scratch the question and was amazed to see the sky above me. BLUE..Deep Blue sky.

My first blog read was of Kyle:
http://www.kyletillman.net/blog/post/SharePoint_External_Binary_Store-PartI.aspx

She explained much and a line of it is "Out of the box WSS stores all binary content in the application’s Content database in the AllDocStreams.Content column (which is an image type". SO there is something that turns the file into a Binary and pushes it into the SQL database as Binary content.

Link from the same carried my mind packed to MSDN article:

http://msdn.microsoft.com/en-us/library/bb802976.aspx

This one is awseome and tells clearly "BLOB storage routed the binary data stream associated with a SharePoint file to the Microsoft SQL Server content database, which it shared with the site's structured data. Under that scenario, when you invoked a Save command on the SharePoint file, a parser in the Save path recognized the Save command and promoted a parcel of metadata out of the file stream. Then the metadata, along with the BLOB associated with the file, was stored in the SQL Server content database".

Diagram:


Image above is from Microsoft MSDN site: http://i.msdn.microsoft.com/Bb862195.be761d13-4bb4-44d2-aece-b8d99d9d7536(en-us,office.12).gif
This is great working model of Sharepoint storage and further I checked that developers may develop their own applications and use SQL BlOb storage capabilities.

Now, the question came in when I further dig into the rabbit hole and found that storing Large Blobs being unstructured are not best suited for SQL Server and may reduce the performace and cause fragmentation. There are many discussion articles and forums talking about the same and Microsoft's work on doing even better on BLOB Storage in SQL Server 2005 and much better in SQL Server 2008.

Finally the research article from MS put more focus into what the architecture asks for.
http://research.microsoft.com/pubs/64525/tr-2006-45.pdf

Most of such articles advocates storing data larger than 256KB - 1 MB on filesystem than SQL server. Default MAx upload size for a single file in MOSS is 50 MB which is Much Much larger than the 1 MB, so further, I found that it is the
chunk size whichdetermines the amount of data that the client retrieves in one go when opening a document. For example, if a client tries to open a document of 50MB and the chunk size is 10MB, the document is divided and retrieved in 5 chunks. Each chunk will be loaded into the memory of both the WFE handling the request and SQL Server. The default chunk size is 5MB and you can adjust the chunk size by issuing the following stsadm command:
Stsadm.exe -o setproperty -pn large-file-chunk-size -pv
(courtsey: http://www.lcbridge.nl/vision/2008/largefiles.htm)
but it seems that the WEBDAV doesnt use the chunk size and hence try to load the complete 50 MB file into the memory and hence causes failure with Large files sometimes.
WIth WSS SP1, Microsoft extended the Stroage and exposed the API for external Blob storage where the Large files may get stored on the filesystem and the matadata along with Binary ID would get into the Database.
THis would be just very Opaque to the Application as this would happen under the Storage Access Stack and THe sharepoint object model.
This being Grey area was not much worked uponin this Version of Sharepoint but with SHarepoint 2010, Microsoft has done great work with BLOB storage and the functionality is into the product.

I am going to try BLOB in Sharepoint 2010 and will share in my next post...

Thanks for reading

No comments:

Post a Comment