SQL Server 2008 hierarchyid Data Type Tutorial

3

One of the coolest features SQL Server 2008 has to offer is the new data type called hierarchyid (or SqlHierarchyId in the CLR). As you can begin to imagine this data type addresses a common issue when dealing with hierarchical data. You’ve probably been confronted with a scenario in witch you needed to data tree of some sort, and you’ve probably struggled with joins, recursive operations, and other strategies in the past. With this new type you can implement tree structures with ease, making use of the best indexing strategy. This article aims to explain what this data type is, what it can do for you, its main advantages and a sample.

Hierarchyid is a variable length data type, and it represents a node in your data tree. By itself, it knows nothing of its siblings, so it’s up to the developer to provide such connection in order to create the necessary relationship between rows.

Memory-wise, it encodes information about a single node in the hierarchy tree by encoding the path from the root of the tree to the node. This path is nothing but a sequence of node labels of all children from the root to the node. The root itself is represented by a slash, so a path that aims to the root will only have “\”. From there on, slashes are used to separate levels. Each node label is encoded as a sequence of integers separated by dots. Consider this ordered example of hierarchyid’s:

\  (root level)

\1  (depth 1 node, with label “1”)

\0.5  (depth 1 node, with label “0.5”)

\0.5.1  (depth 1 node, with label “0.5.1”)

\0.6  (depth 1 node, with label “0.6”)

\1\1  (depth 2 node, with label “1”)

So, you’re probably wondering how do we know how to order nodes in the same level. This is accomplished by comparing node labels, like versioning. 0.5.1 comes after 0.5 and before 0.6. If we wished to insert a new node between 0.5 and 0.5.1, we could use 0.5.0.1 or 0.5.0.2, etc. Pretty simple actually.

Now, performance-wise, here’s what you can expect:

Extremely compact:

The average number of bits that are required to represent a node in a tree with n nodes depends on the average fanout (the average number of children of a node). For small fanouts (0-7), the size is about 6*logAn bits, where A is the average fanout. A node in an organizational hierarchy of 100,000 people with an average fanout of 6 levels takes about 38 bits. This is rounded up to 40 bits, or 5 bytes, for storage.

Comparison is in depth-first order

Given two hierarchyid values a and b, a<b means a comes before b in a depth-first traversal of the tree. Indexes on hierarchyid data types are in depth-first order, and nodes close to each other in a depth-first traversal are stored near each other. For example, the children of a record are stored adjacent to that record. For more information, check out Using hierarchyid Data Types (Database Engine).

Support for arbitrary insertions and deletions

By using the GetDescendant method, it is always possible to generate a sibling to the right of any given node, to the left of any given node, or between any two siblings. The comparison property is maintained when an arbitrary number of nodes is inserted or deleted from the hierarchy. Most insertions and deletions preserve the compactness property. However, insertions between two nodes will produce hierarchyid values with a slightly less compact representation. The encoding used in the hierarchyid type is limited to 892 bytes. Consequently, nodes which have too many levels in their representation to fit into 892 bytes cannot be represented by the hierarchyid type.

SQL Server 2008 database engine has some new methods that go along with it. I’ll describe them a bit:

GetAncestor(n) : Returns a hierarchyid representing the nth ancestor of the affected node.

GetDescendant(child1, child2) : Return a child of the affected node, depending on child 1 and 2.

  • If affected node is NULL, returns NULL.
  • If affected node is not NULL, and both child1 and child2 are NULL, returns a child of the affected node.
  • If affected node and child1 are not NULL, and child2 is NULL, returns a child of the affected node greater than child1.
  • If affected node and child2 are not NULL and child1 is NULL, returns a child of the affected node less than child2.
  • If affected node , child1, and child2 are not NULL, returns a child of the affected node greater than child1 and less than child2.
  • If child1 is not NULL and not a child of the affected node, an exception is raised.
  • If child2 is not NULL and not a child of the affected node, an exception is raised.
  • If child1 >= child2, an exception is raised.

GetLevel : Returns an integer that represents the depth of the affected node in the tree.

GetRoot : Static method. Returns the root of the hierarchy tree.

IsDescendantOf(parent) : Returns true if the affected node is a descendant of parent.

Parse(input) : Static method. Converts the canonical string representation of a hierarchyid to a hierarchyid value. Parse is called implicitly when a conversion from a string type to hierarchyid occurs. Acts as the opposite of ToString.

Read : It’s a CLR method. It reads binary representation of SqlHierarchyId from the passed-in BinaryReader and sets the SqlHierarchyId object to that value. As it cannot be called by using Transact-SQL, use CAST or CONVERT instead.

GetReparentedValue(oldRoot, newRoot) : Returns a node whose path from the root is the path to newRoot, followed by the path from oldRoot to the affected node.

ToString : Returns a string with the logical representation of the affected node. ToString is called implicitly when a conversion from hierarchyid to a string type occurs. Acts as the opposite of Parse.

Write(BinaryWriter) : CLR method. Writes out a binary representation of SqlHierarchyId to the passed-in BinaryWriter. Write cannot be called by using Transact-SQL so use CAST or CONVERT instead.

One of the first challenges I faced on a personal project when working with hierarchyid is the actual lack of support in Entity Framework. As far as I know, Microsoft has no time frame for this yet (I’ll keep you posted though!).

The workaround I found to this was to have an extra column on my tree table to hold the node’s varbinary representation of it’s hierarchyid. This way I can work with it (almost) normally. In the example, I will ad two extra columns, one for the binary representation and another to show each node label in string mode.

Now let’s assume we’re creating a database for a computer hardware online store. We need a Products table, and a Categories table. In order to provide a decent search and navigation experience across the product catalog in the UI, the best thing to do is to create a product category tree and afterwards relate each category with a product in our list. This way we will have an organized and hierarchical representation of our product catalog.

Let’s assume we already have the following Categories in our Categories table:

  1. Products
  2. Storage
  3. Internal Storage
  4. External Storage
  5. IDE
  6. Sata II
  7. SSD
  8. Peripherals
  9. Mice
  10. Keyboards

The idea of this tutorial is to represent a product category tree like so:

  • Products (ID 1 – root)
    • Storage (ID 2)
      • Internal Storage (ID 3)
        • IDE (ID 4)
        • Sata II (ID 6)
        • SSD (ID 7)
      • External Storage (ID 4)
    • Peripherals (ID 8)
      • Mice (ID 9)
      • Keyboards (ID 10)

This could be a sample of our Products table, with associated CategoryID’s:

  1. Sunsung 1.5Tb Sata II 3.5” (CategoryID 5)
  2. Sunsung 500Gb ESata + USB + Ethernet (CategoryID 3)
  3. Maxedstor 1Gb IDE 3.5” (CategoryID 4)
  4. ODZ SSD 250Gb (CategoryID 6)
  5. Macrosoft Explorer Optical Mouse (CategoryID 8)
  6. Logitak Ultra Flat Keyboard  (CategoryID 9)

Sound like underground Chinese brands… :)

With all our base info set up, we need to map our category tree. But first, we need to create a table to store the hierarchy. So we must execute the following:

CREATE TABLE [CategoryTree](
    [NodeID] [hierarchyid] NOT NULL,
    [NodeString] [varchar](512) NOT NULL,
    [NodeBinary] [varbinary](512) NOT NULL,
    [CategoryID] [int] NOT NULL
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[CategoryTree] WITH CHECK ADD CONSTRAINT [FK_CategoryTree_Category] FOREIGN KEY([CategoryID])
REFERENCES [dbo].[Category] ([ID])
GO

ALTER TABLE [dbo].[CategoryTree] CHECK CONSTRAINT [FK_CategoryTree_Category]
GO

Notice how I added a NodeString and NodeBinary columns. This is for me to be able to have a string representation of each node label and also be allowed to know their positioning in the tree. Now for the index:

CREATE UNIQUE CLUSTERED INDEX [IX_CategoryTree] ON [dbo].[CategoryTree]
(
    [NodeID],
    [CategoryID]
)
GO

Table and index created, it’s time to make the tree grow! Here are the inserts! First, the root “Produts”:

/* Insert Root - Products */
insert into [CategoryTree]
    ([NodeID]
    ,[NodeString]
    ,[NodeBinary]
    ,[CategoryID])
values
    (hierarchyid::GetRoot()
    ,hierarchyid::GetRoot().ToString()
    ,convert(varbinary, hierarchyid::GetRoot())
    ,11)

Since I’m adding a root node, I must call GetRoot() to get the root node reference. Now, I’ll insert “Storage” and “Peripherals” sub-categories. They will both be children of the “Products” category, so I’ll have to make sure I generate the appropriate node reference:

/* Declarations */
declare @rootNodeId hierarchyid
declare @categoryNodeId hierarchyid
declare @nodeBinary varbinary
declare @firstChildNodeId hierarchyid

/* Variable assignments */
select @rootNodeId = hierarchyid::GetRoot() from [CategoryTree]
select @categoryNodeId = @rootNodeId.GetDescendant(null,null) from [CategoryTree]
set @nodeBinary = convert(varbinary, @categoryNodeId)

/* Insert first child - Storage */
insert into [CategoryTree]
    ([NodeID]
    ,[NodeString]
    ,[NodeBinary]
    ,[CategoryID])
values
    (@categoryNodeId
    ,@categoryNodeId.ToString()
    ,@nodeBinary
    ,2)

Notice the call to GetDescendant() from the root node. Having both parameters set to null, I’m getting a new node reference that is a child of root. I then use that reference in the insert. Next, we’ll be inserting the “Peripherals” category, in the same level, right below “Products” and next to “Storage”:

/* Insert second child - Peripherals */
select @firstChildNodeId = @rootNodeId.GetDescendant(NULL,NULL)
select @categoryNodeId = @rootNodeId.GetDescendant(@firstChildNodeId, null) from [FrontOffice].dbo.[CategoryTree]
set @nodeBinary = convert(varbinary, @categoryNodeId)

insert into [CategoryTree]
    ([NodeID]
    ,[NodeString]
    ,[NodeBinary]
    ,[CategoryID])
values
    (@categoryNodeId
    ,@categoryNodeId.ToString()
    ,@nodeBinary
    ,8)

Since we needed to add a new category, at the same level of the “Storage” category, we had to get a node reference next to it, hence the @rootNodeId.GetDescendant() call with @firstChildNode as the only not null parameter. It allowed us to get a node position after “Storage”. Now, both Storage sub-categories – Internal Storage and External Storage, are needed. Here’s what we must do (keep in mind that this SQL is continuous, so variables hold previous code snippet values):

/* Insert second level child of Storage - Internal Storage */
/* @categoryNodeId represents the "Peripherals" category node at this point */
select @categoryNodeId = @categoryNodeId.GetAncestor(1).GetDescendant(null,@categoryNodeId).GetDescendant(null,null)
set @nodeBinary = convert(varbinary, @categoryNodeId)

insert into [FrontOffice].dbo.[CategoryTree]
           ([NodeID]
           ,[NodeString]
           ,[NodeBinary]
           ,[CategoryID])
     values
           (@categoryNodeId
           ,@categoryNodeId.ToString()
           ,@nodeBinary
           ,3)

The reason we had to call GetAncestor(1).).GetDescendant(null,@categoryNodeId) was to get a reference to the previous node, the Storage category. It allowed us to go to the parent, and fetch the child right before the “Peripherals” category.

Now onto the “External Storage category:

/* Insert second level child of Storage - External Storage */
/* @categoryNodeId represents "Internal Storage" category node at this point */
select @categoryNodeId = @categoryNodeId.GetAncestor(1).GetDescendant(@categoryNodeId,null)
set @nodeBinary = convert(varbinary, @categoryNodeId)

insert into [FrontOffice].dbo.[CategoryTree]
    ([NodeID]
    ,[NodeString]
    ,[NodeBinary]
    ,[CategoryID])
values
    (@categoryNodeId
    ,@categoryNodeId.ToString()
    ,@nodeBinary
    ,5)

And so on… There’s no need to explain the rest of the tree creation since it’s all much the same. At this point in time, our category tree table would look like this:

NodeID NodeString NodeBinary CategoryID
0x / 0x 1
0×58 /1/ 0×58 2
0x5AC0 /1/1/ 0x5A 3
0x5B40 /1/2/ 0x5B 4
0×68 /2/ 0×68 8

Notice the path representation and how it all fits in. Pretty neat huh?

Hope this article helped. I’ll be posting some more content regarding hierarchyid very soon, so keep it synched.

Technorati Tags:

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

SQL Server 2008 SP1

0

Service Pack 1 for SQL Server 2008 was made available last week. But since I was in vacation the post only came out today :) The Service pack is available via download here and is primarily a roll-up of cumulative updates 1 to 3, quick fix engineering updates and minor fixes made in response to requests reported through the SQL Server community. While there are no new features in this service pack, customers running SQL Server 2008 should download and install SP1 to take advantage of the fixes which increase supportability and stability of SQL Server 2008.

Customers have no reason to wait to upgrade to SQL Server 2008 and many are already taking advantage of SQL Server 2008 as a smart IT investment. In fact, there have been over 3 million downloads of SQL Server 2008 since the RTM in August. With this Service pack, Microsoft is introducing 80% fewer changes to customer configurations compared to previous SQL Server Service Pack releases. This remarkable decrease is a testament to a revised product development process and updated servicing strategy that is focused on ease of deployment while keeping customer environments stable.

With this service pack, Microsoft has made investments to ease deployment and make management of service packs more efficient. Key improvements in SP1 include Slipstream, Service Pack Uninstall and Report Builder 2.0 click-once deployment.
- Slipstream allows administrators to install SQL Server 2008 and Service Pack 1 in a single instance. This decreases the total time for an installation, including a fewer number of reboots thereby increasing productivity and deployment availability.
- Service Pack Uninstall allows administrators to uninstall the service pack separately from the database release. This feature also improves DBA productivity, reduces the cost of deployment and improves overall supportability
- Report Builder 2.0 Click Once improves the existing SQL Server end-user report authoring application by easing deployment to business users.

Source: SQL Server Team Blog

Technorati Tags:

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

The new features in SQL Server 2008

0

Not that i’m giving the common blog reader any news, but this post aims to list all the new features of SQL Server 2008, and tries to explain each and everyone of them. It’s going to be a work in progress, with links and content being added as I post in the blog. Think of this as a little SQL Server 2008 “reference”. Here goes:

Security

  • Transparent Data Encryption
  • External Key Management
  • Data Auditing – This is one the core features of SQL Server 2008. With it, you’ll have:
    • The introduction of first class ‘AUDIT’ objects.
    • Auditing DDL (data definition language) commands.
    • Support for multiple logging targets.

    Availability

  • CPUs – Support for pluggable CPUs which means that a CPU can be added on the fly and recognized by SQL Server 2008 just like memory in SQL Server 2005
  • Database Mirroring – Enhanced Database Mirroring to include compression of mirror streams, enhanced performance and automatic page-level repair for the principal and mirror.

    Performance

  • Data Compression – This new feature provides the ability to easily enable or disable data compression as an online command as well as offer more efficient data storage above and beyond traditional data compression
  • Backup Stream Compression – The ability to configure compression with server level control or backup statement control of all backup types (full, differential, transaction log)
  • Performance Data Collection – When you are experiencing a performance issue the biggest problem is pinpointing the problem, so with SQL Server 2008 Microsoft is introducing a single common framework for performance related data collection, reporting, and warehousing.
  • Improved Plan Guide Support – With SQL Server 2008 plans can be frozen for permanent query usage as well as pull plans directly from plan cache with SQL Server Management Studio integration.
  • Resource Governor – If you have had the need to segment your SQL Server resource utilization then you should be looking forward to SQL Server 2008 because you will have the opportunity to create pools and groups to segment the resources and govern them independently.

    Management

  • Policy-Based Management Framework – The ability to manage objects via policies as opposed to traditional scripts with inherent monitoring and enforcement Microsoft System Center – Integration with Microsoft System Center which a product from Microsoft to improve operational costs.
  • Extended Events – Another new feature is Extended Events which is a high performance yet light weight tracing infrastructure with insight into the core engine independent of SQL Trace.

    New data types

  • Date Time Data Type – The datetime data type will now be able to support the following: Precision to the 100th nanosecond which is 7 digits past second; time-zone datetime offset to translate the datetimes across numerous time zones; rather than having to parse the datatime for just the date or just the time, now SQL Server 2008 will have date only support as well as time only support.
  • HierarchyID – With the introduction of the HierarchyID data type this data type will be hierarchical-aware and will be accompanied by built-in functions, methods, etc. to support complex hierarchies in your data with .NET

    Development

  • Entity Data Model – This will enable SQL Server to return objects (or Entities) instead of resultsets.
  • LINQ- LINQ is a new .NET Framework that encompass language-integrated query, set, and transform operations.
  • SQL Server Change Tracking – This feature provides the ability to have change data capture without a comparable value.
  • Table Valued Parameters – Ability to create variables of a table value data type and be able to pass this variable into a stored procedure.
  • MERGE statement – Another term for this new feature is ‘upsert’; this commands provides the ability to programmatically INSERT data if it does not exist or UPDATE the data if it does all in 1 set of logic.
  • Large UDT’s – The 8000 byte limit is no longer applicable for on CLR-based UDTs and UDAs.
  • Spatial data – SQL Server 2008 will support GEOMETRY and GEOGRAPHY data types with built-in spatial function support and spatial indexes to support more GIS needs.
  • XML enhancements (support for lax validation, office 12 support, xs:dateTime support, lists/union types, LET FLOWR support, etc.)

    Service Broker

  • Interface – A new user interface and tools will be released for working with Service Broker in order to add, drop or edit Service Broker objects directly in SQL Server Management Studio.
  • Conversation Priority – The ability to set message ordering with a send and receive impact with levels one to ten.

    Data Storage

  • Data Compression – Reference the Performance section above FILESTREAM Attribute – With this feature get the best of both worlds with functionality from BLOBs in the DB vs. BLOBs on filesystem.
  • Integrated Full Text Search – With SQL Server 2008 Full Text Search is fully integrated into the relational engine with no external storage, no external service as well as more efficient and reliable costing.
  • Sparse columns – SQL Server 2008 has more efficient storage for ‘wide’ tables with many columns that repeat and do not contain data.
  • New index types – New indexes include: Spatial indexes, Hierarchical indexes, FILTERED indexes (indexes on filtered values within columns).

    Data Warehousing/ETL

  • Partitioned Table Parallelism – This feature eliminates the one thread limit per partition.
  • Star Join support – SQL Server 2008 now supports a star join with no special syntax which is completely optimizer based with full backward syntax support.
  • Data compression – Reference the Performance section above.
  • Resource Governor – Reference the Performance section above.
  • Persistent Lookups in SSIS – There is no longer a need for re-querying for lookup operators and cache lookups in multiple ways with the ability to persist lookups to disk.
  • Improved thread scheduling in SSIS – This is accomplished by a shared thread pool and pipeline parallelism.
  • SQL Server Change Tracking – Reference the Development Enhancements section above.
  • MERGE statement – The MERGE statement will add a great deal of value with slowly changing dimensions.
  • Scale-out analysis services – With read-only storage multiple Analysis Services SQL Servers can be leveraged.
  • Subspace computations
  • New tools for cube design.
  • Best practice design alerting
  • Backup cubes with better scalability
  • Excel – Data-mining add-ins for Excel

    Reporting

  • Reporting Services Deployment – IIS is no longer required to run Reporting Services.
  • Rich-text support
  • Enhanced visualization graphing.

     

    Discontinued features

  • 60, 65 and/or 70 compatibility modes.
  • NOLOG and/or TRUNCATEONLY for the BACKUP command.
  • Word – Reports can be rendered to Microsoft Word.

    Sources: SQL Tips, MSDN

  • Technorati Tags:

    Digg This
    Reddit This
    Stumble Now!
    Buzz This
    Vote on DZone
    Share on Facebook
    Bookmark this on Delicious
    Kick It on DotNetKicks.com
    Shout it
    Share on LinkedIn
    Bookmark this on Technorati
    Post on Twitter
    Google Buzz (aka. Google Reader)

    SQL Server 2008 Released

    0

    SQL Server 2008 has been recently released. Whether you are an MSDN subscriber or not, you may or may not be able to download the full version. Either way, start getting those bytes now, and you can rest assured you’ll be in for a treat.

    Here’s the press release:

    “REDMOND, Wash. – Aug. 6, 2008 — Microsoft Corp. today announced the release to manufacturing of Microsoft SQL Server 2008, the new version of the company’s acclaimed data management and business intelligence platform. This version of SQL Server provides powerful new capabilities such as support for policy-based management, auditing, large-scale data warehousing, geospatial data, and advanced reporting and analysis services. SQL Server 2008 provides the trusted, productive and intelligent platform necessary for business-critical applications.”

    In this version of SQL Server, there will be several editions available:

    Enterprise – SQL Server 2008 Enterprise Edition is a comprehensive data management and business intelligence platform that provides enterprise-class scalability, data warehousing, advanced analytics, and security for running business-critical applications. With this edition, you can consolidate servers and perform large-scale online transactional processing and reporting.

    Standard – SQL Server Standard is a complete data management and business intelligence platform, providing best-in-class ease of use and manageability for running departmental applications.

    Workgroup – SQL Server 2008 Workgroup is a reliable data management and reporting platform that delivers secure, remote synchronization and management capabilities for running branch applications. It includes the core database features of the SQL Server product line and is easy to upgrade to Standard or Enterprise.

    Web – Microsoft SQL Server 2008 Web is specifically designed for highly available Internet facing web serving environments for the next generation Windows Server. With more robust capabilities than previous versions of SQL Server, SQL Server 2008 Web provides what you need to help support low cost, large scale highly available web applications or hosting solutions to your customers.

    Unlike the previous offering of SQL Server 2005, SQL Server 2008 Web has greater technical capabilities such as automating security policy enforcement and supporting new integration features in the .NET Framework.

    Compact – SQL Server Compact is a free SQL Server embedded database that is ideal for building stand-alone and occasionally connected applications for mobile devices, desktops, and Web clients. SQL Server Compact runs on all Microsoft Windows platforms, including Windows XP and Windows Vista operating systems, and on Pocket PC and smartphone devices.

    Express – SQL Server 2008 Express is a free edition of SQL Server that is ideal for learning and building desktop and small server applications, and for redistribution by ISVs. SQL Server Express includes powerful features, such as SQL Server 2005 Reporting Services, a server-based platform for creating and delivering traditional and interactive reports, and a graphical management tool, SQL Server 2008 Management Studio Express, for easily managing your database.

    I’ll be posting some content on this soon, so be tuned for some brain candy :)

    Technorati Tags:

    Digg This
    Reddit This
    Stumble Now!
    Buzz This
    Vote on DZone
    Share on Facebook
    Bookmark this on Delicious
    Kick It on DotNetKicks.com
    Shout it
    Share on LinkedIn
    Bookmark this on Technorati
    Post on Twitter
    Google Buzz (aka. Google Reader)

    Note: Silverlight, C#, in fact any .NET web development projects is best used with windows hosting than Linux based hosting.