Tuesday, March 31, 2009

Price of SQL Server 2008 and Project Estimation : Which license of SQL Server 2008 to use for your project


SQL Server 2008 is available under three licensing models:

Server plus device client access license (CAL): Requires a license for the computer running the Microsoft server product, as well as CALs for each client device.

Server plus user client access license (CAL): Requires a license for the computer running the Microsoft server product, as well as CALs for each user.

Processor license: Requires a single license for each CPU in the operating system environment running SQL Server. This license includes unlimited client device access.

When SQL Server 2008 runs inside a virtual operating environment, it requires at least one license per virtual operating environment, except for SQL Server Enterprise edition.

The Business Intelligence components for SQL Server 2008 include Analysis Services, Reporting Services, and Integration Services. To use any of these components, the server on which they are installed must have a valid SQL Server 2008 license. If these components run on a separate server from the main database server, each additional server on which they are installed requires its own SQL Server 2008 license.

For exhaustive information regarding license and price of SQL Server 2008, check out the SQL Server 2008 Pricing and Licensing document.

Monday, March 30, 2009

Impact analysis of change in metadata or OLTP Schema on SSIS packages

In a large enterprise environment, you may have dozens if not hundreds of SSIS packages that run at various times to load a myriad of tables. Let's say a new DBA decides it would be more optimal to change a column from a varchar field to a small integer. So, what will be the impact?

The change, no matter how small, may create the need to modify 15 packages. The metadata samples toolkit described below is one of the discovery tools you can use to explore the impact of a change like this. There are a few approaches you could take to analyzing the metadata of BI systems like SSIS; one is to look at the XML files that make up each package using XPath queries.
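For instance, if the raw .dtsx files are bulk-loaded into an XML column, a T-SQL XQuery pass can flag the packages that reference the changed column. The table name, column name, and element names below are illustrative and assume the SSIS 2005/2008 package layout, so treat this as a sketch rather than a ready-made solution.

-- Hypothetical staging table holding raw .dtsx XML, loaded for example with OPENROWSET(BULK ..., SINGLE_BLOB).
CREATE TABLE dbo.PackageXml
(
    PackageName nvarchar(260) NOT NULL,
    PackageBody xml NOT NULL
);

-- Find every package whose data flow references the changed column, so the varchar-to-smallint
-- change can be traced to the affected packages. The inputColumn/outputColumn element names
-- follow the SSIS 2005/2008 .dtsx layout; verify them against your own packages.
DECLARE @ChangedColumn nvarchar(128);
SET @ChangedColumn = N'CustomerCode';   -- illustrative column name

SELECT DISTINCT
    p.PackageName,
    c.value('@name', 'nvarchar(128)') AS ReferencedColumn
FROM dbo.PackageXml AS p
CROSS APPLY p.PackageBody.nodes('//*[local-name()="inputColumn" or local-name()="outputColumn"]') AS t(c)
WHERE c.value('@name', 'nvarchar(128)') = @ChangedColumn;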

In order to solve the metadata challenge, a team at Microsoft has developed a free set of utilities and samples called the SQL Server BI Metadata Samples Toolkit. The kit gives you a jump start in developing your own solution for impact analysis, or you can use the Microsoft solution right out of the box. The SQL Server BI Metadata Samples Toolkit includes the following items to help you create your own metadata solution:

· DependencyAnalyzer.exe – A tool that evaluates the lineage of SSIS packages, Analysis Services, and SQL Server objects and loads it into a database. All the source code for this program is provided.

· DependencyViewer.exe – A tool that lets you graphically see the dependencies and lineage of objects in the lineage repository. Source code is provided for this program.

· Data Source View – A DSV that connects to the lineage repository (the SSIS_META database) and can be used by Reporting Services.

· Lineage Repository – A database called SSIS_META that can be used to house metadata from nearly any system.

· Reports – Some standard reports for impact analysis studies. You will find two key reports out of the box with several sub-reports.

· Report Model – A report model that you can use with Report Builder to allow end users to create ad hoc reports.

· Integration Services Samples – A few sample packages to start auditing and viewing lineage on.

Reference: BI Metadata Whitepaper

Saturday, March 28, 2009

Free ASP.NET MVC eBook Tutorial

Several years ago, I was selected to present a technical paper at the Indian Institute of Information Technology - Allahabad. One of the presenters, the Head of Department from DAVV - Indore, gave a very long presentation. He was approximately 60 years of age, but his enthusiasm was worth watching. I don't remember a word of the content, but one line from his presentation still rings in my mind: "Information Technology cannot work in isolation". In our context, this means we can't just stick to one technology and expect to grow to a high level. My personal belief is that anyone who dreams of becoming an architect, a CFO, or a CIO needs knowledge across a breadth of technologies. Especially in these turbulent times, one needs to diversify. Breadth across technologies, along with expert-level skill in the technology of your choice and intermediate-level skill in the rest, keeps your job safe to a considerable degree.

Here's the download of the day: Free ASP.NET MVC eBook Tutorial


Download Free Chapter

Friday, March 27, 2009

Oracle to SQL Server 2008 Migration : SQL Server Migration Assistant 2008 for Oracle V4.0

SQL Server Migration Assistant (SSMA) 2008 for Oracle is a complete suite of tools that dramatically cuts the effort, cost, and risk of migrating from Oracle to SQL Server 2008. SSMA 2008 for Oracle automates almost every aspect of the migration process including assessment, schema and data migration, business logic conversion, and validation. Companies such as Motorola and Simon & Schuster have used SSMA to migrate their business-critical databases quickly, easily and accurately.

With SSMA, IT organizations now have the freedom to move to SQL Server and take advantage of SQL Server's exceptional security, scalability, developer productivity, and the industry’s lowest total cost of ownership.

Licensing

SSMA is a FREE download, and is available for use by any customer or partner with no charge or any obligations. However, the product requires a license key for activation. The key can be obtained after quick and simple registration. Microsoft reserves the right to use an e-mail address or phone number provided during registration in order to contact a customer regarding his/her SSMA experience. To help you with the registration process, a License Key Required dialog box opens the first time that you start the SSMA program.



Reference: MS SSMA Download Page Excerpt

Thursday, March 26, 2009

Data Source View Internals : SSIS Package behavior with Data Source View

After you add a connection manager that references a data source object to a package, any data flow component in the package can use the data source view to define sources and destinations. Transformations that use reference tables such as the Lookup transformation can also use data source views.

The following data flow components can use data source views:

OLE DB Source

Lookup Transformation

Fuzzy Lookup Transformation

Term Lookup Transformation

OLE DB Destination

SQL Server Destination

There is no dependency between a data source view and the packages that reference it. When you use a data source view in a package, the view definition is added to the package definition in the pertinent data flow component property. For example, in a package that includes a Lookup transformation, the view—a SQL SELECT statement—is stored in the SqlCommand property of the Lookup transformation. Even if a data source view is no longer part of the project, the package remains valid because the SQL representation of the data source view is saved in the package definition.


Reference: Excerpt from MSDN BOL

Tuesday, March 24, 2009

Data Source View ( DSV ) in SSIS and SSAS - Metadata management and Advantages

The definition of a DSV as per MSDN goes this way: "A DSV is a metadata layer between the object that consumes / processes data and the underlying data source that allows us to build our objects without being connected to the underlying source". For reference, a DSV is similar to a Business Objects (BO) universe. A data source view is a metadata definition stored in an XML format.

A Data Source View:
  • Contains the metadata that represents selected objects from one or more underlying data sources, or the metadata that will be used to generate an underlying relational data store.
  • Can be built over one or more data sources, letting you define objects that integrate data from multiple sources.
  • Can contain relationships, primary keys, object names, calculated columns, and queries that are not present in an underlying data source and which exist separate from the underlying data sources.
  • Is not visible to or available to be queried by client applications. Data source views are also not used when you build objects programmatically.

Analysis Services Perspective for DSV: A DSV is the metadata definition of the schema elements used by the Unified Dimensional Model (UDM) and by mining structures. A data source view contains the logical model of the schema used by Analysis Services database objects—namely cubes, dimensions, and mining structures. In Analysis Services, a developer embeds business metadata throughout the application when creating Data Source Views (DSVs).

Analysis Services database objects are bound to the logical objects contained in the data source view, instead of being directly bound to the underlying physical objects in the underlying data sources. Therefore, you can define logical objects, such as calculated columns or named queries, that do not exist in an underlying data source and which you might not have permissions to define in an underlying data source.

Integration Services Perspective for DSV: Advantages of using data source views include the following:

  • A DSV can be defined once and then used by multiple data flow components. Because a data source view is not created within the context of a package, it can be used by tasks and data flow components in multiple packages.
  • A DSV can be refreshed to reflect changes in its underlying data sources.
  • A DSV also allows developers to develop packages without having live connections to the source data systems by caching the metadata from the data source from which it’s built.
  • A DSV can focus the lists of objects that appear in the user interface of data flow components, speeding up the design interface.

Reference: MSDN BOL

Logging in SSIS : Usage of Log Reports and Log Provider selection

As per MSDN BOL, in SQL Server 2005, SSIS can log information about a package's runtime to any of five log providers: text file, SQL Profiler trace, SQL Server table, XML file, or the Windows Event log. You can also write to two different providers at once if you wish to audit in multiple ways. If none of these log providers fits your requirements, you always have the option of building a custom log provider.

The most common log provider to use is the SQL Server table provider, because you can then write Reporting Services reports that can be viewed or subscribed to in order to see a package's status. If you select this provider, you must select the Connection Manager you wish to write to and then the events you wish to log. You can log each time a warning or error occurs, or any informational message.

Inside the database of your choosing, a sysdtslog90 table will be created. The sysdtslog90 table can grow rather large so you will need to schedule something to eventually archive the table. The table is full of operational data like when each task started and stopped and the success of each one. The most important column in the table is the message column that stores the verbose result of that step.
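For example, a quick query against sysdtslog90 pulls the most recent errors and warnings; this is only a sketch, so adjust the database context and the package-name filter to your own logging setup.

-- Summarise the most recent errors and warnings that SSIS logged to sysdtslog90.
-- Adjust the database context and the package filter to match your own configuration.
SELECT TOP (50)
    [event],
    [source] AS TaskOrPackageName,
    starttime,
    endtime,
    [message]
FROM dbo.sysdtslog90
WHERE [event] IN ('OnError', 'OnWarning')
  AND [source] LIKE N'MyLoadPackage%'      -- illustrative package name
ORDER BY starttime DESC;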

In practice, however, I have seen that the most popular log provider remains the XML file. Firstly, few teams want to log to a database, as it requires connecting to a server and dealing with all of the related administration and security issues. Secondly, though surprising, in development projects I have seen almost none where logging is taken so seriously that SSRS reports are developed to view the performance data. In theory it should be done, but in practice it rarely is.

XML-based logging is not a bad option at all. A simple XSLT stylesheet can be developed for the XML document into which the execution data is logged, and the log information can then be viewed in the format of your choice, such as HTML. This saves hits on the server, requires very few resources and minimal effort to get a proper GUI for viewing the logged information in report format, and security is generally not a concern for log reports because they contain nothing sensitive. Log reports are typically used during debugging or performance tuning in development or production environments.

In theory, theory and practice are the same; but in practice they are not !

Monday, March 23, 2009

Managing Metadata in SQL Server 2005

Today I am opening one more category of posts on my blog, which I term "Metadata Intelligence". When working on a BI project, metadata management is one of the areas given the least attention, which in a majority of cases causes havoc in the later phases of the project. Project artifacts like the data dictionary, mapping documents, data lineage, audit trail, and logging are given attention only when chaos erupts around a delivery date. This has been my experience and need not hold true in general.

In the database engine, SQL 2005 expands on SQL 2000 by adding support for extended properties to nearly all database objects. Extended properties can be used for rich metadata management because each table and column (as well as most other objects in the relational engine) can have multiple extended properties that can each contain up to 7500 bytes of data. Since these properties are scriptable and are part of the structure of each object, developers can make use of these properties to store business-level definitions of each object.

Extended properties can be used for the following:
  • Specifying a caption for a table, view, or column. Applications can then use the same caption in a user interface that displays information from that table, view, or column.
  • Specifying an input mask for a column so that applications can validate data before running a Transact-SQL statement. For example, the required format for a postal code or telephone number column can be specified in the extended property.
  • Specifying formatting rules for displaying the data in a column.
  • Recording a description of specific database objects that applications can display to users. For example, the descriptions may be used in a data dictionary application or report.
  • Specifying the size and window location at which a column should be displayed.
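As a minimal T-SQL sketch (the object names are illustrative), a business-level description can be attached to a column and read back for a data dictionary like this:

-- Attach a business-level description to a column; object names are illustrative.
EXEC sys.sp_addextendedproperty
    @name       = N'Description',
    @value      = N'Customer postal code; must match the format NN-NNNN.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'Customer',
    @level2type = N'COLUMN', @level2name = N'PostalCode';

-- Read the property back, for example to feed a data dictionary report.
SELECT objname, name, value
FROM sys.fn_listextendedproperty
     (N'Description', N'SCHEMA', N'dbo', N'TABLE', N'Customer', N'COLUMN', N'PostalCode');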

Extended properties are one of the starting blocks for building metadata management within an enterprise or system, albeit for a relational OLTP system. There are many more resources and tools that cater to the same need in a Business Intelligence solution, which I will discuss in future posts.


Reference: MSDN BOL

Saturday, March 21, 2009

Wrapper or Framework to Create SSIS Packages Programmatically using Microsoft .NET

Creating SSIS packages programmatically is one of the emerging requirements from the .NET world. I put up a post earlier on this topic, where I also provided a link to MSDN samples on creating SSIS packages programmatically. This post is an extension of that one.

A new wrapper over the SSIS object model has been made available, which facilitates creating packages programmatically with fewer lines of code. Credit for this wrapper goes to Evgeny Koblov, one of the testers on the SSIS team. I have spent more than four years programming on Microsoft .NET, from version 1.0 through to the present days of WCF, WPF, and so on. Based on that experience, my feeling is that this wrapper would definitely add value, but it would come with a few trade-offs too.

The two factors I can think of that might be a concern are performance and extensibility. Manageability, on the other hand, would definitely be a great value addition.

Performance - As the wrapper introduces one more level of indirection, performance under pressure can degrade; but without a proper load test on this layer, it is premature to comment.

Extensibility - If the wrapper is used directly in compiled form and there are changes in the SSIS object model (for example, due to a service pack), those changes would not be available immediately in the wrapper. But since the source code is available, the wrapper can be included as part of the project or implemented as a library that is continuously enhanced.

Now let me introduce the star of this post: it's called "EzAPI – Alternative package creation API". As per the documentation of this framework, here is the explanation of the motivation behind its architecture:

"Having an easy way to create SSIS packages programmatically is vital. In some cases static packages do not provide enough flexibility – it is hard to change property task or component of a static package using SSIS object model while using configurations is not always possible. Current approach to create SSIS package programmatically is to use SSIS object model directly. However SSIS object SSIS object model was designed to be universal for all kinds of components (native and managed ones) and to separate runtime and design time phases of component usage. However both of the stages use the same metadata. This is the reason why SSIS distinguishes design-time, runtime functionality and metadata. The side effect is that more code is necessary to create a package and once the package is created – it is not very convenient to change properties of pipeline components and especially metadata of input, output, virtual and external columns.

However, in .NET managed world we can use all the advantages of OOP and have a corresponding class that will encapsulate both metadata and operations that can be made on them – so, there is no need to separate it as we do not deal with runtime internals during design phase.

The framework can significantly reduce the amount of code that need to be supported and allows more efficient development of SSIS packages programmatically."


Friday, March 20, 2009

SQL Server 2008 Upgrade Whitepaper or Ebook

The SQL Server engineering team, in association with Solid Quality Mentors, has published a 490-page whitepaper titled "SQL Server 2008 Upgrade Technical Reference Guide" that provides in-depth information on how to upgrade to SQL Server 2008.

As with all IT projects, planning for every contingency and then testing your plan gives you confidence that you will succeed. But if you ignore the planning process, you increase the chances of running into difficulties that can derail and delay your upgrade. This document covers the essential phases and steps involved in upgrading existing SQL Server 2000 and 2005 instances to SQL Server 2008 by using best practices. These include preparation tasks, upgrade tasks, and post-upgrade tasks.

  • Chapter 1 gives an overview of the technical issues and decisions that are involved in an upgrade to SQL Server 2008, as well as recommendations for planning and deploying an upgrade.

  • Chapter 2 addresses issues related to upgrading to SQL Server 2008 Management Tools.

  • Chapters 3 through 8 focus on upgrade issues for SQL Server relational databases.

  • Chapter 9 addresses upgrading to SQL Server 2008 Express.

  • Chapters 10 through 14 focus on upgrading to SQL Server 2008 Business Intelligence components: Analysis Services, Data Mining, Integration Services, and Reporting Services.

  • Chapter 15 addresses the implications of upgrading to SQL Server 2008 for other Microsoft applications and platforms.

  • Appendix 1 contains a table of allowed SQL Server 2008 version and edition upgrade paths.

  • Appendix 2 contains an upgrade planning checklist.

Download SQL Server 2008 Upgrade Technical Reference Guide

Thursday, March 19, 2009

User Interface for TableDiff command line utility


The tablediff utility is used to compare the data in two tables for non-convergence, and is particularly useful for troubleshooting non-convergence in a replication topology. This utility can be used from the command prompt or in a batch file to perform the following tasks:

  • Perform a row-by-row comparison between a source table in an instance of Microsoft SQL Server acting as a replication Publisher and the destination table at one or more instances of SQL Server acting as replication Subscribers.

  • Perform a fast comparison by only comparing row counts and schema.

  • Perform column-level comparisons.

  • Generate a Transact-SQL script to fix discrepancies at the destination server to bring the source and destination tables into convergence.

  • Log results to an output file or into a table in the destination database.

Many of us have used this tool. The only tedious and time-consuming part is that it is command-line based, and it is hard to type those long commands again and again. Here is a little utility that provides a very simple user interface acting as a driver for tablediff.



Download User Interface utility for TableDiff tool

Wednesday, March 18, 2009

Download Microsoft Visual Studio Team System 2008 , Team Foundation Server SP1 and Team Suite SP1 VPC Image

This virtual machine is running Windows Server 2003 Enterprise Edition, SP2. It contains a full installation of Visual Studio Team System 2008 Team Foundation Server SP1, Team Build 2008 SP1, Team Explorer 2008 SP1, Visual Studio Team System 2008 Team Suite SP1 and all necessary prerequisites. In addition, it has been updated with Office 2007 SP1, current Windows Updates as of December 1, 2008, the October 2008 Release of the TFS Power Tools, Team System Web Access 2008 SP1, and the Visual Studio Team System 2008 Team Foundation Server MSSCCI Provider. Team Foundation Server is installed in workgroup authentication mode and thus does not require a domain controller.

This release also contains 38 hands-on labs and sample data. Please examine C:\MSLabs inside the VPC image. If you desire, you can download the hands-on labs and their support files only.

Tuesday, March 17, 2009

MDX : MDX Tutorial , MDX Query , MDX Script , How to in MDX, MDX Select Query

I have seen developers searching for a last-minute MDX query or script to create a calculated member on a cube. Many start in an organized fashion with a simple MDX tutorial, learning tuples, named sets, axes, the MDX SELECT query syntax, and so on. But even after that, many questions remain unanswered, and the real answers come only with hands-on experience on a real project, when one starts applying functions and tweaking or nesting queries and parameters.

Any material on MDX should be treasured; so far I have seen only 3-4 books on the SSAS flavor of MDX. Below are links to code snippets for questions that are asked frequently while querying a cube or creating calculated members on it.

Hope this helps developers with last-minute MDX requirements who don't have time to learn all the syntax, structure, and theory behind MDX just to solve a small query or requirement on a cube in production or for a last-minute patch.

Download MDX Tutorial - Code Snippets of Various FAQ Queries (PDF)



Download MDX Tutorial - Code Snippets of Error Handling (PDF)

Download MDX Tutorial - Code Snippets of SCOPE Statement (PDF)

  1. MDX - How do you get Last month in the time dimension


  2. MDX - Query example that returns list of months from start of year up to specified month


  3. MDX - Query to show the first day of the last date month in the cube


  4. How in the report can I order date dimension members in descending order?


  5. MDX - query to get the last Month loaded into a cube


  6. MDX - Query to get the first month of the last year loaded into a cube


  7. MDX - How do you write query that uses execution date/time ( NOW() ) as a parameter?


  8. MDX - Query to get latest months and previous years same months data


  9. MDX - Query to show year level data for all years except last one, and month level data for last year


  10. MDX - How to create calculated member for AVG sales over last 3 years based on NOW()?


  11. MDX - How do I calculate sales for 12 Month to date?


  12. MDX - How can I get Last (Previous) Year to Date (YTD) values?


  13. MDX - Query to get count of months with sales amount > 0 in defined period


  14. MDX - How do you calculate monthly average of one year, optionally including empty months?



Download MDX Tutorial - Code Snippets for Time Dimension (PDF)



Content Host / Courtesy: SSAS-Info

Monday, March 16, 2009

Event , Event Handler and Auditing in SSIS

The Integration Services runtime includes the IDTSComponentEvents interface (Events collection) that is passed by the runtime to various components that execute within a package, such as tasks, enumerators, and so on. Events are raised by tasks using the IDTSComponentEvents interface that is supplied during execution and by the run-time execution engine. In the Script task, the Events property on the Dts object is really an instance of the IDTSComponentEvents interface.

Components raise events by using the Fire[x] methods. Components can also raise events at any point during execution. Events allow component developers to provide feedback to users of the component as it executes. Calling the FireError method during execution is likely to fail the package.

The Fire methods can be called at design time and during execution. Typically, during component design, the FireError, FireInformation, and FireWarning methods are called to provide user feedback when a component is incorrectly configured. Events that are raised during component design are displayed in the Task Pane.

In SSIS, any events you fire will be written to all enabled log providers that are set to log that event. Logging allows you to see what happened with your script when you're not there to watch it run, which is useful for troubleshooting and auditing purposes. The default event handler at design time is the Execution Results tab in the BIDS design environment. This is particularly useful in Script tasks when the script performs complex operations and an informational or warning entry needs to be logged for each such item in the data stream.

Within a Script Task, the Log method of the Dts object writes a message to all enabled log providers. The Log method is similar to the FireInformation method of the Events property, but it is easier to use and more efficient — you also do not need to create a specialized event handler to respond to the method call. All you need to do is set up a log provider within the package.


Reference: MSDN BOL

Friday, March 13, 2009

Download Free SQL Server Ebook : How to Become an Exceptional DBA

Download Ebook
Author Brad McGehee describes how using third-party tools can help you achieve more in your day:

"The Exceptional DBA fully understands how to use native SQL Server tools, but also keeps his or her eye open for other options. If a third-party tool is introduced that could greatly enhance their productivity, then they are willing to give it a try. Most DBAs are well-paid, and it is better to spend a few extra dollars on tools to reduce "drudgery" work, in order to free up time for the DBA to work on more important tasks."

Ebook Sponsor: RedGate

Tool to profile MDX query processing and MDX Performance Analysis

For most of my career to date, I have been working on assignments where my primary role has been improving performance, whether application performance, query performance, database performance, or system performance. A good debugging tool acts as a backbone when working on performance tuning and optimization. Industry-class tools are already available for applications in Visual Studio and for databases, like SQL Profiler. But I have not seen a tool along the lines of SQL Profiler for MDX, at least in the SSAS world.

I found an interesting tool on Codeplex that serves this purpose.

MDX Script Performance Analyser Overview (reference: Codeplex)

The MDX Script Performance Analyser allows the user to identify how much each calculated member, named set and scoped assignment in an Analysis Services cube's MDX script contributes to the overall performance of a given MDX query.

To use the tool, all you need to do is connect to the database and cube you'll be querying and then enter an MDX query in the main textbox and hit the 'Run Query' button. When you do this, the tool does the following:
  1. Connects to the cube using ADOMD.Net and AMO
  2. Reads the cube's MDX Script and splits it up into its constituent statements, storing them in an array
  3. Executes a Clear Cache command to ensure that all queries are run on a cold cache
  4. Executes a Clear Calculations command so that for the current session the cube appears as though its MDX Script contains no commands
  5. For each statement in the array of statements from the MDX Script, it then:
    1. Executes the first statement in the MDX Script within the session, so that the cube now acts as though its MDX Script contains only this statement and all previously executed statements
    2. Runs the query you entered in the textbox
    3. Stores how long the query took to run, plus other interesting metrics
  6. Once the query has run on the equivalent of the entire MDX Script in the cube, a report is generated which contains graphs and charts illustrating the data captured earlier

Download MDX Script Performance Analyser

Tool to assess Microsoft SQL Server 2005 configuration for Sarbanes-Oxley Act ( SOX ) compliance

I discovered this tool while browsing for an enterprise policy framework type of implementation for SQL Server 2005. Apart from the framework that I described in my earlier post, this is another tool that adds value in terms of compliance and can serve as a standard auditing mechanism for demonstrating SOX compliance to end clients.

Overview (reference: Microsoft Download Center)

This configuration pack contains configuration items intended to help you establish and validate a desired configuration for your SQL 2005 servers in order to support your Sarbanes-Oxley compliance efforts.

The Microsoft SQL Server 2005 Assessment Configuration Pack for Sarbanes-Oxley Act was developed in conjunction with Brabeion Software, and contains suggested system configurations that map best practices and standards to the requirements of the Sarbanes-Oxley Act. This configuration pack defines recommended configurations based on a number of settings-based configuration items, and assesses your system against the desired configuration.

Once imported into SCCM 2007, this configuration pack and its included configuration items can be targeted to collections of systems which are then audited for compliance with the recommended configurations. When settings are detected to be out of compliance, events will be generated and sent to the Configuration Manager server where the data is available for reporting.

Please note that the configuration items, settings, and values included in this configuration pack do not necessarily represent every configuration required for Sarbanes-Oxley compliance. The administrator should verify that the rules being evaluated meet the unique requirements for their specific environment. If desired, the administrator can also modify the configuration pack by adding, modifying, and deleting configuration items, settings, and values through the Desired Configuration Management user interface within the SCCM 2007 console.

Thursday, March 12, 2009

Programming Script Task in SSIS 2008 : Dts object / ScriptObjectModel class

The Dts object is actually a property on your package that is an instance of the Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptObjectModel. The Dts object provides a window into the package in which your script executes. The Dts object has properties and methods that allow you to interact with the package.

While programming the Script task, the collections below, exposed as properties, provide seamless access to package components. Using managed assemblies, the scope opens up even beyond the limits of SSIS.

Below are descriptions of the public properties of the Dts object:
  • Connections: Returns the Connections collection of existing connections defined in the package that contains the Script task.

  • Events: Returns the collection of existing events defined in the package that contains the Script task.

  • ExecutionValue: Gets or sets a user-defined object that provides additional information about the results of task execution.

  • TaskResult: Returns the result of the Script task, which is a DTSExecResult enumeration value.

  • Transaction: Returns the transaction associated with the container of the Script task.

  • VariableDispenser: Returns the VariableDispenser that can be used within the Script task to work with existing variables. The VariableDispenser object provides methods for locking variables for read-only or read-write access and then retrieving them. Initially this was the standard way of accessing variables in scripts. The reason for the explicit locking mechanism is to allow the Script task to keep two processes from competing to access and change a variable.

  • Variables: Returns the Variables collection of existing variables defined in the package that contains the Script task.

Reference: MSDN BOL

Wednesday, March 11, 2009

Unit Testing SSIS Packages : Code Review Checklist for SSIS Packages

Unit testing is one of the key development practices that should be religiously followed. A properly unit-tested package can reduce by half the defects that would otherwise surface during system testing or end-to-end testing.

The first part of unit testing is matching the developed package against the requirements. Below are some tips that can help make unit testing of SSIS packages thorough, broad, and efficient. I have followed this approach to testing SSIS packages while working with SSIS 2005 and SQL Server 2005.

1) Use a Script Task to emit your debugging messages, and log them to the configured log provider. This is particularly useful for checking variable values whenever they are updated by a task.

2) Use the ForceExecutionResult property with each task to check how the flow behaves in case of failure, success, and completion.

3) To emulate a long-running query, use the T-SQL WAITFOR statement in the query (see the sketch after this list). This helps to test scenarios like connection timeouts as well as functional dependencies, for example, whether a particular task completes faster than another.

4) Check that naming conventions are followed as per the corporate or project naming convention policy document. This applies to packages, variables, tasks, and components.

5) Test the same package on another box. This helps to test properties like DelayValidation and the package ProtectionLevel. Whenever SSIS packages are executed, the SSIS runtime first validates the package and, provided it passes validation, then executes it.

6) Execute the package using the DTExec utility instead of always executing it from BIDS. Also, if the package will be called from another package, for example a master package, execute it from the parent package. This helps to test package configurations, since in such scenarios the parent package configuration is the one used.

7) Check against each task whether errors are configured to be ignored. If so, cross-verify the reason with the developer and get it documented as an annotation in the package or in the Technical Specification Document (TSD) for the package. This can filter out issues with destination adapters and components like the Lookup transformation.
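For tip 3, the emulation can be as simple as prefixing the source query with a delay; a minimal sketch, with illustrative table and column names:

-- Delay the result set by two minutes to emulate a long-running source query, which helps
-- expose timeout settings, precedence-constraint behaviour, and parallel-task interactions.
WAITFOR DELAY '00:02:00';

SELECT CustomerID, CustomerName      -- illustrative columns
FROM dbo.Customer;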

Tuesday, March 10, 2009

SSIS Performance Framework Design Considerations : SSIS Best Practices , SSIS Performance / Package Review Checklist

Having implemented end-to-end MS BI projects and read tons of best-practices documents from technical forums and blogs, I would like to share my experience of how many of these I have actually seen prove useful in practice. Below is a list of some SSIS 2005 best practices and performance-related hints that I have seen pay off in projects I have worked on. I rate the usefulness of each on a scale of 1 to 10; this rating is solely my viewpoint and experience.

Best Practices:

  • To avoid most package deployment errors when moving from one system to another, set the package protection level to 'DontSaveSensitive'. [While working in a multi-user environment with a version-controlled system, this is very important. Rating: 6/10]
  • Once you copy-paste a script component and execute the package, it may fail. Just open the script editor of the pasted script component, save the script and execute the package – it will work. [This may or may not happen. I feel it's experience driven, but I have experienced this. Rating: 3/10]
  • In master packages, especially extract packages where a single package is extracting data from 20-30 odd tables, group tasks together with annotations. This may seem very trivial at the start of development, but it works like an air-conditioner when the development temperature is at its peak. [This keeps the package manageable, easier to debug, and more organized once annotations are added. Rating: 7/10]
  • Avoid unnecessary type casts. For example, the flat file connection manager, by default, uses the string [DT_STR] data type for all columns. Watch for precision issues when using the money, float, and decimal types. Also, be aware that money is faster than decimal, and money has fewer precision considerations than float. Check my post on SQL Server to SSIS data type mapping for more information on data types. [Understanding SSIS data types is very important from a design and performance perspective. Rating: 10/10]
  • For SQL jobs that call SSIS packages, create multiple steps, each doing a small task, rather than a single step doing all the tasks. In the latter case, the transaction log grows too big, and if a rollback happens, it may consume the full processing capacity of the server. Do check out my post about improving SSIS performance of load packages using the chunking method. [Generally in a data warehouse loading scenario, only the very first load that populates the data warehouse is heavy. After that only the delta is loaded, and that delta is often divided into separate dimension and fact loads. I do not rule out this scenario, but my belief is that its probability in a data warehouse load is 50/50. Rating: 5/10]
  • The control flow of an SSIS package threads together various control tasks. It is possible to have a transaction span multiple tasks using the same connection. To enable this, set the RetainSameConnection property of the connection manager to True. [This can save many connection hits to the database and reduce the connection-making load on the production database by at least 15-20%. Rating: 7/10]

Coming to SSIS performance, the three factors I have seen affect performance directly and dramatically are sorting, the batch size and commit size properties related to transactions, and selection of the appropriate components and tasks.

SSIS 2005 Performance Hints:

  • Sorting in SSIS is a time-consuming operation. Where we know the data is coming from database tables, it is better to perform the sorting in the database query itself: sort the data using the ORDER BY clause in the query, and set the IsSorted property on the source adapter's output to make the component and SSIS aware that the data is sorted.
  • When you execute SQL statements within SSIS, whether to read a source, to perform a lookup transformation, or to change tables, some standard optimizations significantly help performance:
  1. If Integration Services and SQL Server run on the same server, use the SQL Server destination instead of the OLE DB destination to improve performance.
  2. Commit size 0 is fastest on heap bulk targets, because only one transaction is committed. If you cannot use 0, use the highest possible value of commit size to reduce the overhead of multiple-batch writing. Commit size = 0 is a bad idea if inserting into a Btree – because all incoming rows must be sorted at once into the target Btree—and if your memory is limited, you are likely to spill. Batchsize = 0 is ideal for inserting into a heap. For an indexed destination, I recommend testing between 100,000 and 1,000,000 as batch size.
  3. Use a commit size of less than 5000 to avoid lock escalation when inserting; note that in SQL Server 2008 you can now enable/disable lock escalation at the object level, but use this wisely.
  4. Heap inserts are typically faster than using a clustered index. This means that you may want to drop indexes and rebuild if you are changing a large part of the destination table; you will want to test your inserts both by keeping indexes in place and by dropping/disabling all indexes and rebuilding to validate.
  5. Use partitions and partition SWITCH command; i.e., load a work table that contains a single partition and SWITCH it in to the main table after you build the indexes and put the constraints on.
  6. A key network property is the packet size of your connection. By default this value is set to 4,096 bytes. This means a new network packet must be assembled for every 4 KB of data. As noted in the SqlConnection.PacketSize property in the .NET Framework Class Library, increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set.
  7. Another network tuning technique is to use network affinity at the operating system level. At high throughputs, you can sometimes improve performance this way. For the network itself, you may want to work with your network specialists to enable jumbo frames to increase the default payload of 1,500 bytes to 9,000 bytes. By enabling jumbo frames, you will further decrease the amount of network operation required to move large data sets.
  • Right component/task selection for the right job: As per many discussions and forums, the SCD task performs poorly, and this becomes apparent as soon as the data flow grows beyond roughly 50k records. In presentation slides from a recent SQLBits conference, one speaker suggested using the SQL Server 2008 MERGE statement instead of the SCD task (a rough sketch follows this list). Another poorly performing component is the OLE DB Command, which performs a row-by-row operation; opt for a bulk INSERT instead. These tasks should either not be used or be used minimally and with caution.
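As an illustration of the MERGE alternative mentioned in the last point, here is a rough Type 1 (overwrite) sketch; the dimension and staging tables are hypothetical, and a Type 2 dimension would additionally expire old rows and insert new ones.

-- Type 1 (overwrite) dimension load using MERGE instead of the SCD component.
-- DimCustomer and StagingCustomer are hypothetical tables.
MERGE dbo.DimCustomer AS tgt
USING dbo.StagingCustomer AS src
    ON tgt.CustomerBusinessKey = src.CustomerBusinessKey
WHEN MATCHED AND (tgt.CustomerName <> src.CustomerName OR tgt.City <> src.City) THEN
    UPDATE SET tgt.CustomerName = src.CustomerName,
               tgt.City         = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerBusinessKey, CustomerName, City)
    VALUES (src.CustomerBusinessKey, src.CustomerName, src.City);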


Reference: MSDN SSIS Forum - SSIS 15 Best Practices, SSIS Best Practices By SQLCAT Team

Monday, March 09, 2009

Tool to create data for Asset Management Data Warehouse plus SQL Server 2008 migration planning

Asset management is one of the areas where data warehousing can play its role. No, you have not landed on an infrastructure management blog; this is for sure a business intelligence blog :)

A couple of months ago, I came across a requirement where exhaustive data was to be collected using some tool and then massaged into a format that could be loaded into a data warehouse. I first heard the tool referred to as the MATH tool, but when I googled it, I learned that it is actually called MAP.

This tool can create exhaustive data about an organization's IT environment, spanning many Windows platforms, and it now also includes support for SQL Server 2008 migration planning. The variety of data this tool can generate across the network, once loaded into a data warehouse and analyzed, should definitely lead to cost savings. Though at a higher level this looks like any asset management utility, I see a different angle to it.

For the convenience of readers, I have included the details of the MAP tool below. These details are published by Microsoft, and the original document can be accessed from here. Download the MAP tool and evaluate it on a sample PC to analyze the kind of data it can generate.

The Microsoft® Assessment and Planning Toolkit (MAP) is a powerful inventory, assessment, and reporting tool that can securely run in small or large IT environments without requiring the installation of agent software on any computers or devices. The data and analysis provided by this Solution Accelerator can significantly simplify the planning process for migrating to SQL Server 2008 along with other platforms which includes : Windows Vista®, Microsoft Office 2007, Windows Server® 2008, Windows Server 2008 Hyper-V, Virtual Server 2005 R2, and Microsoft Application Virtualization 4.5 (formerly SoftGrid), Microsoft SQL Server 2008, Microsoft Online Services, Forefront/NAP. Reports for these assessment scenarios are localized in French, German, Japanese, Korean, Brazilian Portuguese, and Spanish.

The Microsoft Assessment and Planning Toolkit (MAP) generates reports containing both summary and detailed assessment results for each migration scenario. The results are provided in both Microsoft Excel and Microsoft Word documents. Reports are generated for the following scenarios:
  • Identification of currently installed Windows Client operating systems, their hardware, and recommendations for migration to Windows Vista. This tool also reports if desktops have anti-virus and anti-malware programs installed or if the Windows Firewall is turned on.
  • Identification of currently installed Windows Server operating systems, their hardware, and recommendations for migration to Windows Server 2008.
  • Identification of currently installed Microsoft Office software and recommendations for migration to Microsoft Office 2007.
  • Detailed assessment and reporting of server utilization gathered using the Performance Metrics Wizard.
  • Recommendations for server consolidation and virtual machine placement using Hyper-V or Virtual Server 2005 R2.
  • Assessment of client machines, servers, and the technology environment for the implementation of Microsoft Application Virtualization (formerly SoftGrid).
  • Reporting of SNMP-enabled devices found in the environment during inventory.
  • Security Assessment: Create a report on existing client computers to identify machines where Security Center, antispyware, antivirus or firewall products are not found or are not running correctly.
  • Microsoft SQL Server Assessment: Create a report on existing client and server computers to identify machines where Microsoft SQL Server instances are found.
  • Virtual Machine Inventory: Create reports that identify virtual machines in your computer environment.
  • Power Savings Assessment: Create a proposal to identify server and client machines running in your environment and understand the power management capabilities available.
  • Optimized Desktop Scenario Selection Tool: This tool defines common-usage scenarios to help organizations choose the client computing scenarios that best meet the unique needs of their businesses. Use the Windows Optimized Desktop Scenario Selection Tool to identify one or more Optimized Desktop scenarios that suit your organization’s needs.

Sunday, March 08, 2009

Metadata information storage for Extraction or SSIS Extract Packages : SSIS Package Configuration

Ideally, the extraction process should be guided or driven by metadata. There always remains a margin of change in any OLTP source system, though the change may not be huge or frequent. If you are on a project that follows a RUP model, where OLTP development and ETL or BI development are carried out in parallel, a metadata inventory is the key to keeping BI development and delivery under control.

Some of the different kinds of metadata generally required, at least from the SSIS extraction perspective, are listed below (a hypothetical control table capturing some of this follows the list):

  • List of tables to be extracted
  • Columns to be extracted for each table
  • Delta detection for each table/sync
  • Extraction start and end date-times
  • Status of each table extract
  • System parameters such as folder locations and connection parameters
  • Schedule durations
  • Number of retries in case of failure
  • Logging parameters
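As a sketch only, with illustrative names, such a control table and the work-list query an extract package might run against it could look like this:

-- Hypothetical extraction control table; all names are illustrative.
CREATE TABLE dbo.ExtractControl
(
    TableName        sysname       NOT NULL PRIMARY KEY,
    ColumnList       nvarchar(max) NOT NULL,     -- columns to extract
    DeltaColumn      sysname       NULL,         -- column used for delta detection
    LastExtractStart datetime      NULL,
    LastExtractEnd   datetime      NULL,
    LastStatus       varchar(20)   NULL,         -- e.g. Succeeded / Failed
    RetryCount       int           NOT NULL DEFAULT 0,
    IsEnabled        bit           NOT NULL DEFAULT 1
);

-- Work list that an extract package (or its master package) might read at run time.
SELECT TableName, ColumnList, DeltaColumn, LastExtractEnd
FROM dbo.ExtractControl
WHERE IsEnabled = 1
ORDER BY TableName;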

The metadata information could be stored in configuration files, database tables, DSVs, etc. Based on the complexity of the configuration and the flexibility needed for making changes, you could choose different stores.

In my view, a convenient and advisable option for metadata storage is to start with an indirect XML configuration file whose location is taken from an environment variable. This approach has advantages: during the development phase, when debugging is frequent, editing an XML file is easier than updating a table, which requires database permissions and the complexities associated with them. Also, simply having the repository in a table won't suffice, since end users generally can't be expected to change data manually by opening and editing tables; a user interface would have to be developed to facilitate data manipulation within those configuration tables.

As the RUP iterations progress towards the final iterations of the construction phase, the storage location can be replaced by database tables; it should not take more than two man-days of effort to put this change in place.

Saturday, March 07, 2009

SPARSE column property in SQL Server 2008

Designed to optimize storage space for columns with a large percentage of NULLs, the option to designate a column as sparse is new in SQL Server 2008. To designate a column as SPARSE, the column must allow NULLs. When a NULL is stored in a column designated as SPARSE, no storage space is consumed. However, non-NULL values require 4 bytes of storage space in addition to the normal space consumed by the data type. Unless you have a high enough percentage of rows containing a NULL to offset the increased storage required for non-NULL values, you should not designate a column as SPARSE.
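A minimal sketch, with illustrative table and column names, of declaring a sparse column:

-- FaxNumber is expected to be NULL for most rows, so it is declared SPARSE: NULLs then take no
-- space, while non-NULL values pay an extra 4 bytes over the normal storage for the data type.
CREATE TABLE dbo.CustomerContact
(
    CustomerID   int           NOT NULL PRIMARY KEY,
    EmailAddress nvarchar(128) NULL,
    FaxNumber    nvarchar(32)  SPARSE NULL
);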

You cannot apply the SPARSE property to:
  • Columns with the ROWGUIDCOL or IDENTITY property
  • TEXT, NTEXT, IMAGE, TIMESTAMP, GEOMETRY, GEOGRAPHY, or user-defined data types
  • A VARBINARY(MAX) with the FILESTREAM property
  • A computed column of a column with a rule or default bound to it
  • Columns that are part of either a clustered index or a primary key
  • A column within an ALTER TABLE statement
More details on space saving by data-type and restrictions for using SPARSE columns can be found on BOL.

Reference: Microsoft Press Books

Friday, March 06, 2009

SQL Server 2008 data type to SSIS data type mapping

While working with connections or source adapters, SSIS does not always read the source and map each column to the precise SSIS data type. For example, especially when using a flat file connection, it often happens that the data types of a few columns are not as required. This happens frequently with sources like flat files and Excel files.

Below is a mapping chart of SQL Server 2008 data types to SSIS data types, which can come in really handy for developers while working with source adapters, Script tasks, Derived Column transformations, and anywhere a cast is applied to a data type.

SQL Server 2008 data type      SSIS data type
-----------------------------  -------------------------------------
bigint                         eight-byte signed integer [DT_I8]
binary                         byte stream [DT_BYTES]
bit                            Boolean [DT_BOOL]
tinyint                        single-byte unsigned integer [DT_UI1]
datetime                       database timestamp [DT_DBTIMESTAMP]
decimal                        numeric [DT_NUMERIC]
real                           float [DT_R4]
int                            four-byte signed integer [DT_I4]
image                          image [DT_IMAGE]
nvarchar or nchar              Unicode string [DT_WSTR]
ntext                          Unicode text stream [DT_NTEXT]
numeric                        numeric [DT_NUMERIC]
smallint                       two-byte signed integer [DT_I2]
text                           text stream [DT_TEXT]
timestamp                      byte stream [DT_BYTES]
uniqueidentifier               unique identifier [DT_GUID]
varbinary                      byte stream [DT_BYTES]
varchar or char                string [DT_STR]
xml                            Unicode string [DT_WSTR]
date                           date-only [DT_DBDATE]
time                           time-only [DT_DBTIME]

Thursday, March 05, 2009

Considerations for using Snowflake schema : Star schema vs Snowflake schema

In dimensional modeling parlance, the star schema and snowflake schema are the fundamental building blocks of modeling a relational data warehouse. Generally the modeling starts with a draft star schema, and in a majority of cases it ends up as a snowflake model. In my view, any medium to large sized data warehouse (with approximately 10+ fact tables and 20+ dimension tables) cannot be implemented purely as a star schema.

Star schema : The simplest relational schema for querying a data warehouse database is the star schema. The star schema has a center, represented by a fact table, and the points of the star, represented by the dimension tables. From a technical perspective, the advantages of a star schema are simple joins between the dimensions and the fact table, good performance, easy slicing, and an easy-to-understand data model.

Snowflake schema : A variation of a star schema is called a snowflake. In a snowflake, the dimension tables are normalized. From a reporting perspective, there is no logical difference because all the tables of a snowflake dimension can be logically joined into a single star dimension. From a performance perspective, the snowflake may result in slower queries because of the additional joins required.

There remains a trade-off between performance and business requirements and/or technical limitations when deciding between a star and a snowflake schema. A majority of the IT crowd working on BI projects are mere code-workers and do not understand why the schema is modeled in a particular way when theories like normalization already exist and have long been practiced in OLTP systems.

Two basic reasons why a snowflake schema may be preferred over a star schema, despite a slight trade-off in performance, are as follows:

1) Size and Storage : Consider a snowflake schema to reduce storage requirements for large dimension tables. For example, storing manufacturer information in a product dimension table would make the product dimension unnecessarily large, because one manufacturer may supply multiple products. The larger the dimension, the more time it takes to process when processing a cube.

2) Integrity : Consider snowflaking a dimension if one entity is referenced by multiple dimensions. For example, Customers, Resellers, and Manufacturers may all have a Geography component. Extracting geography information into a normalized table allows for consistency between geographical definitions for the different dimensions. A snowflaked dimension accessed from multiple dimensions is called a reference dimension.
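As a small illustration of the second reason, a Geography entity snowflaked out of a customer dimension might look like this (the table and column names are hypothetical):

-- Geography attributes are stored once and referenced from the customer dimension; reseller or
-- manufacturer dimensions could reference the same table instead of repeating the attributes.
CREATE TABLE dbo.DimGeography
(
    GeographyKey int          NOT NULL PRIMARY KEY,
    City         nvarchar(50) NOT NULL,
    StateRegion  nvarchar(50) NOT NULL,
    CountryName  nvarchar(50) NOT NULL
);

CREATE TABLE dbo.DimCustomer
(
    CustomerKey  int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    GeographyKey int           NOT NULL
        REFERENCES dbo.DimGeography (GeographyKey)   -- the snowflaked relationship
);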

This article is at a very fundamental level, but it should help those fresh, curious minds who keep wondering why the relational data warehouse is modeled against the normalization theory they have seen practiced in regular OLTP systems.

Wednesday, March 04, 2009

Improve Performance of ETL / SSIS Load Package using "Chunking" method


In a data warehouse environment, when changes or deltas are loaded into the warehouse, and especially during the very first load when almost all records from the OLTP system are brought over for the initial population, the load time is generally high. In retail and CRM applications that operate round the clock and across the globe, the deltas also remain quite large. If the OLTP system is old and the number of records to be loaded runs into millions, performance becomes an important consideration.

Instead of executing a very large operation in a single statement call, you can break the modification into smaller pieces, potentially increasing performance and improving database concurrency for larger, frequently accessed tables. This technique is often used for large data loads to reporting or data warehouse applications.

Large, single-set updates can cause the database transaction log to grow considerably. When processing in chunks, each chunk is committed after completion, allowing SQL Server to potentially reuse that transaction log space. In addition to transaction log space, on a very large data update, if the query must be cancelled, you may have to wait a long time while the transaction rolls back. With smaller chunks, you can continue with your update more quickly. Also, chunking allows more concurrency against the modified table, allowing user queries to jump in, instead of waiting several minutes for a large modification to complete.

This "chunking" method can be used with INSERTs, DELETEs and UPDATEs too, along with SELECT. The expanded functionality of TOP (beyond just SELECT) adds a new technique for managing large data modifications against a table. By reducing the size of large modifications, you can improve database concurrency by reducing the time that locks are held during the operation (leaving small windows for other sessions), and also help manage the size of the transaction log (more commits, instead of one single commit for a gigantic transaction).

The OLE DB destination's Maximum insert commit size and Rows per batch properties implement the above theory in SSIS. As per some of the SSIS best practices recommended by the SQL CAT team for SSIS performance tuning and optimization:

  • Commit size 0 is fastest on heap bulk targets, because only one transaction is committed. If you cannot use 0, use the highest possible value of commit size to reduce the overhead of multiple-batch writing. Commit size = 0 is a bad idea if inserting into a Btree – because all incoming rows must be sorted at once into the target Btree—and if your memory is limited, you are likely to spill. Batchsize = 0 is ideal for inserting into a heap. For an indexed destination, I recommend testing between 100,000 and 1,000,000 as batch size.
  • Use a commit size of less than 5000 to avoid lock escalation when inserting; note that in SQL Server 2008 you can now enable/disable lock escalation at the object level, but use this wisely.
  • Note that a commit size value of 0 might cause the running package to stop responding if the OLE DB destination and another data flow component are updating the same source table. To prevent the package from stopping, set the Maximum insert commit size option to 2147483647.