Wednesday, December 29, 2010

SSAS Engine, SSIS Engine and SSRS Engine

Speaking about the Microsoft Business Intelligence stack, i.e. SSIS / SSAS / SSRS, there are different engines associated with each of these services. If you are unaware of these engines, you are probably not fit to design the architecture of your solution using the respective services. The major engines that come into consideration when you are using the MS BI stack are as below:

1) SSIS Runtime Engine - This engine takes care of the administration and execution side of SSIS. In developer language, I would consider it the Control Flow + SSMS of SSIS.

2) SSIS Data Flow Engine - This engine can be considered the Buffer Manager of the SSIS in-memory architecture.

3) SSAS Formula Engine - This engine takes care of resolving the retrieval of members on any axis of an MDX query. Tuning the performance of this engine has much to do with MDX tuning.

4) SSAS Storage Engine - This engine can be considered the Data Manager of SSAS, which decides what data needs to be fetched and from where. If you stress the Formula Engine, there is a good possibility that this will cascade to the Storage Engine, which directly deals with aggregations (see the MDX sketch after this list).

5) SSRS Service Endpoint - This cannot technically be considered an engine; most people would argue that rendering / authentication / processing are engines, but I consider these extensions rather than engines. This endpoint takes care of the administration part of SSRS. Anything that you can do with Report Manager is a virtue of this endpoint.

6) SSRS Execution Endpoint - This is the endpoint that one would like to award the medal of being an engine. It takes care of executing the report, right from processing the RDL to rendering the report.
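To make the division of labour between the SSAS engines concrete, here is a minimal MDX sketch (the cube and member names are hypothetical, Adventure Works style): the Formula Engine resolves the sets placed on each axis and any calculations, while the Storage Engine fetches the measure values from partitions and aggregations.

```mdx
-- Hypothetical cube and member names, for illustration only.
-- Formula Engine: resolves the axis sets below.
-- Storage Engine: retrieves [Internet Sales Amount] from partitions / aggregations.
SELECT
    { [Measures].[Internet Sales Amount] }             ON COLUMNS,
    NON EMPTY [Product].[Category].[Category].Members  ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2010] );
```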

You can read more about each of these on MSDN as well as in different books and blogs. But I am of the opinion that until you thoroughly understand the function of these engines, you should not feel confident about the architecture you are designing.

Tuesday, December 28, 2010

SSIS Architecture Design Document

When I was quite junior in the IT industry, I used to wonder how one designs an architecture document, and I used to collect templates of these documents from different sources. Today, when I have matured enough to design one myself with my own sense and understanding of the system, I feel that a basic idea of the sections that an SSIS architecture design document should contain would be of interest to all those minds who are growing up as future architects.

Delta Detection - This section should contain details about incremental loads and Change Data Capture (a minimal sketch of enabling CDC follows).
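As a hedged illustration of what the delta-detection decision often boils down to, the sketch below enables CDC on a hypothetical source table; the database, schema and table names are assumptions for illustration only.

```sql
-- Hypothetical names; a minimal sketch of enabling CDC on one source table.
USE SourceDB;
GO
EXEC sys.sp_cdc_enable_db;          -- enable CDC at the database level
GO
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'Customer',
    @role_name            = NULL,   -- no gating role in this sketch
    @supports_net_changes = 1;      -- allows net-change queries for delta extraction
GO
```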

Extraction & Staging - This section should contain details regarding the treatment of extracted data and whether permanent / temporary staging is being used. If a staging area is used, it requires more elaboration here.

Facades - Interfaces and contracts are a vital part of any architecture design. This section should contain the views / stored procedures that you would create in your OLTP system; even your CDC SPs can act as your facade. This section should also describe whether you are using a push or a pull model.

Trigger mechanism - Many solutions contain an application which triggers the ETL cycle. Whether you would use a scheduler like Autosys / SQL Agent should be described here.

Process Control Structure - Each category of packages, classified by functionality, should log which package was doing what and when. This provides a monitoring layer and control over the entire ETL execution; a minimal sketch of such a logging table follows.
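A minimal sketch of such a process-control table, under assumed names, could look like the following; real solutions usually add batch identifiers, row counts and restartability metadata.

```sql
-- Hypothetical process-control table; the name and columns are illustrative only.
CREATE TABLE dbo.EtlProcessLog
(
    ProcessLogId   INT IDENTITY(1,1) PRIMARY KEY,
    PackageName    NVARCHAR(256) NOT NULL,                       -- which package ran
    TaskName       NVARCHAR(256) NULL,                           -- which unit of work inside it
    StartTime      DATETIME2(0)  NOT NULL DEFAULT SYSDATETIME(),
    EndTime        DATETIME2(0)  NULL,
    RowsProcessed  BIGINT        NULL,
    Status         VARCHAR(20)   NOT NULL DEFAULT 'Running'      -- Running / Succeeded / Failed
);
```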

Environment configuration - All details about environment variables, package configurations, and the storage locations of those configurations should be described in this section.

Traceability - Logging details should be mentioned in this section. Auditing can be of huge importance to support teams, so this section should be documented and designed with support teams in mind.

Transaction Handling - Details about transaction handling within and across packages belong in this section.

Error Handling - This section is very important to almost all teams screening your architecture document. Also, if your error handling is weak, your transaction handling might suffer, and this can be catastrophic for any kind of data loading.

Automation - Particularly in data warehousing projects, automation is one of the regular requirements. The trigger mechanism falls within the automation umbrella up to an extent.

Scalability - The more loosely you couple your package structure to take advantage of parallelism, and the more you design your logic with memory (synchronous and asynchronous transformations) in consideration, the more you gain in scalability. These details should be mentioned in this section.

Hardware Configuration - Server configuration details, right from the number of cores to the amount of RAM that you intend to use, belong in this section.

Change Control & Deployment - The deployment methodology and location should be elaborated in this section. Though change control is part of the configuration exercise, it is linked with how you manage deployment, so it makes more sense to elaborate the change control methodology here.

Feel free to add your comments to make this section more complete.

Monday, December 20, 2010

Tool to create / support BUS architecture based data warehouse design

However powerful SSAS may be, when it comes to starting a fresh dimensional modeling exercise, using SSAS is the last step in the process, i.e. data warehouse implementation. Dimensional modeling starts with understanding how the clients want to analyse their business, which implicitly involves identifying the ER of the targeted business models. From there, one needs to develop a BUS matrix (provided you are following the Kimball methodology and BUS architecture), followed by a Data Map / Data Dictionary.

Once you have the blueprint ready, the artifacts required to build the anatomy of the data warehouse need to be built, and two of the major ones are:
1) ETL routines to shape your data so that it is compliant with the data mart design.
2) The relational data warehouse / data mart, i.e. dimension and fact tables and other database objects that will hold the data transformed by ETL.

The process sounds crystal clear, but when you are developing from scratch and your data warehouse and dimensional model are still evolving, there is one tool which can be very instrumental in designing them. The wonderful part is that this tool / template comes for free, courtesy of the Kimball Group, and it's called the Dimensional Modeling Spreadsheet.

Dimensional Modeling Spreadsheet: This template spreadsheet can help you create the entire data dictionary / data map for your dimensional model, and it contains samples of some of the basic dimensions used in almost any dimensional model. The unique thing about this spreadsheet is that once you have keyed in your design, it has the option to generate SQL out of your model. You can run this SQL script against your database and create the dimension and fact tables right out of it, which means that your data mart / relational data warehouse is ready to store the data (a sketch of the kind of DDL it produces follows). This spreadsheet can also form the base for your ETL routines. The only other tool in my knowledge which comes near to this functionality is WhereScape RED, and of course it's not free, as it serves a lot more than just this.
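For a flavour of that output, a date-dimension DDL of the kind such a spreadsheet would generate might look like the sketch below; the table and column names are my own assumptions for illustration, not the spreadsheet's exact output.

```sql
-- Illustrative only; the actual column set comes from your own dimensional design.
CREATE TABLE dbo.DimDate
(
    DateKey          INT          NOT NULL PRIMARY KEY,  -- surrogate key, e.g. 20101229
    FullDate         DATE         NOT NULL,
    CalendarYear     SMALLINT     NOT NULL,
    CalendarQuarter  TINYINT      NOT NULL,
    MonthNumber      TINYINT      NOT NULL,
    MonthName        NVARCHAR(20) NOT NULL,
    DayOfMonth       TINYINT      NOT NULL,
    IsWeekend        BIT          NOT NULL
);
```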

You can read more about this spreadsheet in the book "The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset". For those who are new to dimensional modeling concepts, read this article to gain a basic idea of the life-cycle.

Sunday, December 19, 2010

Developing different types of reports in SSRS 2008 R2

SSRS 2008 R2 has advanced the reporting functionality a long way since the 2000 release. There are many different types of reports that can be created using SSRS, and each serves a different purpose, but they can still be categorized into a few common groups. The different types of reports that can be created using SSRS are as below:
  • Parameterized reports
  • Linked reports
  • Snapshot reports
  • Cached reports
  • Ad hoc reports
  • Clickthrough reports
  • Drilldown reports
  • Drillthrough reports
  • Subreports

The categories of reports that I see from a development / production aspect are as below:

  • Business Users authored reports: Ad hoc reports and Clickthrough reports can be classified in this category of reports.

  • Performance Boosters: Cached reports and Snapshot reports can be classified in this category of reports.

  • Data / Problem Decomposition: Drill-down and Drill-through reports can be classified in this category of reports. These are probably the reports of maximum value for business users.

  • Business As Usual: Parameterized reports are the most common kind of reports, and can be classified in BAU category.

  • Reusable / Versioned reports: Subreports and Linked reports can be classified in this category. These are probably the most focused category of reports from a development perspective.

A very detailed explanation of the types of reports in SSRS 2008 R2 can be read on MSDN from here.

Tuesday, December 14, 2010

Zip and Email SSRS reports requires use of SSIS - True or False ?

Recently one of my blog readers sent me a question where the requirement was to zip and email SSRS reports to users. The reader wanted to check with me whether this can be done using subscriptions and any kind of delivery method. The two main delivery methods available are file-share and email delivery, and to the best of my knowledge, neither of them includes zipping of files.

When report files are huge in size, to the magnitude of several MBs, emailing those files is quite a big challenge and not an advisable option. In my view, the SSRS team should make an option available to zip the file into at least one compressed file format when files are delivered using the file-share or email delivery methods. These are some of the very common reporting-related requirements that SSRS would have to incorporate to gain acceptance as a serious, enterprise-class reporting platform.

The order of this process can be as follows:

1) Create the report output stream using subscriptions; the output can be in the requested file format.
2) Zip the output stream into a compressed file format, at least the .cab format which Windows has been using since the time of its evolution.
3) Deliver the file using the file-share / email delivery methods.

As of now, there are no out-of-the-box options to deliver zipped reports to a file share or to email them. SSIS can be used for this purpose. In the whitepaper that I have authored, I have explained how to generate SSRS reports in bulk using SSIS and SSRS 2008 R2. With a little more coding in the Script Task, you can zip the created files and email them if required. Two articles that demonstrate how to implement the zipping functionality can be read from here and here.

Wednesday, December 08, 2010

Achieve high availability of cubes using SSAS and SSIS

Recently I was faced with two different questions at two different events, and both questions were directly or indirectly linked to high availability of cubes for querying. Those two questions were:

1) What is the difference between scale out and scale up?
2) How would you ensure 24 x 7 availability of a cube, considering that users across the globe are accessing the cube and that it should always remain available for querying?

The answer to the first question: scaling out means that, when you need to achieve parallelism for concurrency in querying or processing, you distribute / replicate processing operations and/or data across multiple nodes. Scaling up usually means that you increase the capacity of the host so that the server can cater for the incoming load. When the possibilities of scaling up end, scaling out steps in.

The next question was quite interesting, and the challenge was that I had to instantly think of a design and answer the query on the spot. I answered the question correctly, and to my delight, I later found this whitepaper, which describes exactly what I answered. Such moments bring a lot of happiness and confidence that my knowledge has not gone stale and that I can continue to provide consulting in the MS BI business.

The presentation layer is coupled with the SSAS query server. Data is read from the relational engine and the cube is processed on a separate server, which can be considered another layer altogether. After the cube is processed, the query server and the processing server are synchronized; for multi-server synchronization, SSIS is used. The diagrams in the whitepaper demonstrate this setup, and the entire whitepaper can be read from here.


Tuesday, December 07, 2010

MS BI RFP - Key differentiators to stay ahead in the league

Right from the role of an associate architect to the role of an account manager, right from pre-sales to account management, I have observed that at every stage one is faced with the challenge of providing a key differentiator. This challenge can be with your competitors or even between different teams competing for excellence. Talking about the Requests For Proposal that clients invite from multiple solution providers, there are many factors that act as differentiators and help the client in their decision making. In my view, an RFP is not just science, it's art + science, where you blend a solution such that the client is able to accept, digest and accommodate it. Please keep in view that the points below are from the eyes of an architect, not a typical sales person / project management professional.

1) Robustness, outsourcing, design patterns, certified professionals, company size and more. Sales folks would swear by these words, but in my view these are not the differentiators. They can bring a good level of confidence in the engagement economics of a project, but not in the design of the solution. All reputed solution providers have more or less the same traditional design, the same outsourcing pyramid, and more or less the same quality of staff. I have seen huge multinational clients give projects to small companies on the philosophy that "It's better to be the top valued client of the 10th best solution provider than the 50th valued client of the top-ranked solution provider."

2) Outsourcing might provide a cheaper solution development cost, but if the solution design has the effect of raising TCO (Total Cost of Ownership), then clients will definitely shy away from it. This works well for support projects, but not in the case of fresh solution design and development. One should always pay attention to the technology roadmaps shared by technology providers like Microsoft, Oracle and others, so that the designed solution is ensured to sustain for an acceptable duration of time with reasonable support.

3) The choice of technology platform and components is vital. Technology is highly driven by innovation, and if your solution completely lacks innovation, even a weak competitor may land a big blow with an innovative solution that appeals to the client in a brilliant manner. Cloud and SaaS are two such areas. When a solution can be provided using MS BI and Amazon EC2, a solution proposed using even SQL Server Standard Edition with the biggest range of outsourcing would fall flat. An innovative solution using a mix of cloud and on-premise technology components can defy even an extremely low-cost traditional on-premise solution, as license costs would be the killer.

4) Fabrication speed of solution design and development can help you quote a very aggressive "time to market". This can be powered by a very well planned Center of Excellence practice within an organization, which gives the dual advantage of readiness to deal with any technical challenge and speed of execution. Whenever a client visits a production plant, they are shown the machinery and the production units as part of a confidence building exercise. Listing out in-house developed components / frameworks ready to be plugged into the proposed solution can be a very big catalyst in convincing clients of an aggressive time-to-market proposal.

5) Pluggability. I know there is no such word, but what I mean is how flexible the solution is in terms of to-and-fro integration with external systems, along with its reusability. The architecture should be consciously designed such that each logical unit can be consumed by any external subscriber with the least possible complexity, which is generally buzzed about under the term Service Oriented Architecture.

Different stakeholders in project acquisition and management will have different views on this. But being an architect, one needs to contribute directly or indirectly to pre-sales activities, and the above views can be helpful to save your shoulders from burning if the architecture design gets the blame. The solution and architecture might be perfect from a technology perspective, but this is not the perspective from which clients will look at it. There are a lot of factors that contribute to a winning bid, but considering the scope and role of an architect in the RFP, I feel the above points sound fair. If you have more to add to this, please feel free to add your comments or email me to share your views.

Thursday, December 02, 2010

Limitations / Disadvantages of using Calculated Measures / Calculated Members in SSAS

In my view, designing is a process that is driven entirely by impact analysis. However trivial a product / tool / technology feature may be, if it's used without thoroughly analyzing the impact it can have on the overall system / solution that has to be developed, it means that you have just signed up for a guaranteed future roadblock.

Calculated measures / members seem like a very straightforward design feature, and whenever you feel the need for, or shortage of, some members / measures, calculated ones seem to be a very easily available option from a development perspective (a minimal example is sketched below). But there is another side to them, and the following points should be kept in consideration before you make the decision to go with calculated members / measures.
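For reference, here is a minimal calculated measure of the kind these limitations apply to; the cube, measure and member names are hypothetical.

```mdx
-- Hypothetical names; a calculated measure defined in the cube's MDX script.
CREATE MEMBER CURRENTCUBE.[Measures].[Gross Margin Pct]
 AS IIF([Measures].[Sales Amount] = 0,
        NULL,
        ([Measures].[Sales Amount] - [Measures].[Total Product Cost])
          / [Measures].[Sales Amount]),
 FORMAT_STRING = 'Percent',
 VISIBLE = 1;
```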

1) Drillthrough does not operate on calculated measures or on any other calculations that reference calculated measures / calculated members. This means, for example, that if you have created a calculated measure on the cube which the user might opt to use as part of a drillthrough dataset, you are now stuck and need to find a workaround.

2) Calculated measures cannot be secured using dimension security in a straightforward manner; in fact, they won't be listed at all in the Dimension Data tab of the role where we define dimension security. Also, when security is applied on regular members and, because of it, those members are not available to a calculated member, the calculation will fail, i.e. when such measures are browsed in client tools like Excel, the value displayed is an error value like #VALUE.

Based on the above two points, calculated measures / members should be checked against drillthrough and security requirements, so that a trivial oversight in design doesn't translate into a showstopper issue at a later stage.

Monday, November 29, 2010

SSAS Interview Questions / SSAS Training Curriculum

Whenever one wants to learn something, or to make sure one is competent enough to take the helm of any challenge in a particular technology, the first thing one needs to know is what one should know. In simple words, one should be aware of the topics one needs to cover; the next point is how much ground has already been covered and how much is yet to be covered. Below is a roughly drafted list of high-level SSAS areas, in no particular order, which can be considered decent coverage whether it is used for SSAS training or an SSAS interview (a short MDX sketch follows the list). Keep in view that though the list below covers major ground, it is not exhaustive; it can be used as a reference check to make sure you cover enough in your trainings, or that you have covered the major fundamental areas.

  • Types of Dimensions
  • Types of Measures
  • Types of relationships between dimensions and measuregroups: None (IgnoreUnrelatedDimensions), Fact, Regular, Reference, Many to Many, Data Mining
  • Star Vs Snowflake schema and Dimensional modeling
  • Data storage modes - MOLAP, ROLAP, HOLAP
  • MDX Query syntax
  • Functions used commonly in MDX like Filter, Descendants, BAsc and others
  • Difference between EXISTS AND EXISTING, NON EMPTY keyword and function, NON_EMPTY_BEHAVIOR, ParallelPeriod, AUTOEXISTS
  • Difference between static and dynamic set
  • Difference between natural and unnatural hierarchy, attribute relationships
  • Difference between rigid and flexible relationships
  • Difference between attribute hierarchy and user hierarchy
  • Dimension, Hierarchy, Level, and Members
  • Difference between database dimension and cube dimension
  • Importance of CALCULATE keyword in MDX script, data pass and limiting cube space
  • Effect of materialize
  • Partition processing and Aggregation Usage Wizard
  • Perspectives, Translations, Linked Object Wizard
  • Handling late arriving dimensions / early arriving facts
  • Proactive caching, Lazy aggregations
  • Partition processing options
  • Role playing Dimensions, Junk Dimensions, Conformed Dimensions, SCD and other types of dimensions
  • Parent Child Hierarchy, NamingTemplate property, MemberWithLeafLevelData property
  • Cube performance, MDX performance
  • How to pass parameter in MDX
  • SSAS 2005 vs SSAS 2008
  • Dimension security vs Cell security
  • SCOPE statement, THIS keyword, SUBCUBE
  • CASE (CASE, WHEN, THEN, ELSE, END) statement, IF THEN END IF, IS keyword, HAVING clause
  • CELL CALCULATION and CONDITION clause
  • RECURSION and FREEZE statement
  • Common types of errors encountered while processing a dimension / measure groups / cube
  • Logging and monitoring MDX scripts and cube performance
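As a quick self-check against some of the MDX items above, a small query like the following (hypothetical Adventure Works style names) exercises FILTER, NON EMPTY and PARALLELPERIOD in one go:

```mdx
-- Hypothetical cube and member names; touches several topics from the list above.
WITH MEMBER [Measures].[Sales Prior Year] AS
    ( [Measures].[Internet Sales Amount],
      ParallelPeriod([Date].[Calendar].[Calendar Year], 1) )
SELECT
    { [Measures].[Internet Sales Amount],
      [Measures].[Sales Prior Year] }   ON COLUMNS,
    NON EMPTY
    Filter(
        [Product].[Subcategory].[Subcategory].Members,
        [Measures].[Internet Sales Amount] > 100000
    )                                   ON ROWS
FROM [Adventure Works];
```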

Thursday, November 25, 2010

How SSIS Performance can be optimized using Change Data Capture and/or Change Tracking, MERGE, and Filtered Indexes

Delta detection, source data extraction and smooth sourcing of input data from the OLTP system and/or relational data warehouse are part of the overall ETL process. Though this performance optimization boundary lies outside the regime of SSIS, these are the fuelling stations of SSIS. The OLTP system and the relational data warehouse are the two main data stations from which SSIS fetches and/or loads data, and the relational data warehouse (i.e. data marts / ODS, whatever the case) is often used to read as well as write data, so optimizing these relational sources can contribute to the performance optimization of SSIS packages. If you are on the SQL Server 2008 platform, some of the features that can be used effectively to optimize data access, especially by SSIS packages, are as below, with a combined T-SQL sketch after the list:

1) Change Data Capture (CDC) and/or Change Tracking: I will not go into the details of the difference between Change Data Capture and Change Tracking, but this is one of the effective means of delta detection. Actually, more than performance, it provides a controlled and automated infrastructure for delta detection. Depending upon the kind of algorithm followed in the master ETL package that handles delta detection and extraction, CDC / Change Tracking can be used. A whitepaper that explains an in-depth methodology for integrating CDC with SSIS can be downloaded from here.

2) MERGE statement: MERGE is one of the smart means to offload the processing of data for Slowly Changing Dimensions. A nice, brief article from the Kimball Group explaining the same concept can be read from here.

3) Filtered Indexes: Almost everyone is aware of the two classic types of indexes in SQL Server - clustered and non-clustered. The filtered index is a newer addition and a very useful weapon for optimizing data access in the OLTP system as well as the relational data warehouse, without impacting the overall system, since it only touches the slice of data for which SSIS needs a helping hand from the relational sources for faster data access. Again, I will not go into the details of filtered indexes; a thorough section that explains them can be read on MSDN from here.
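As a hedged illustration of how these three features might line up in one load, the sketch below uses assumed table and column names: it reads CDC net changes for delta extraction, folds the delta into a dimension with MERGE, and adds a filtered index covering only the rows the packages actually read.

```sql
-- All object names are hypothetical; a minimal sketch, not a production pattern.

-- 1) CDC: read the net changes captured on the source table since the last extraction window.
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Customer'),
        @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT __$operation, CustomerID, CustomerName, City
FROM   cdc.fn_cdc_get_net_changes_dbo_Customer(@from_lsn, @to_lsn, N'all');

-- 2) MERGE: fold the extracted delta into a dimension (a simple Type 1 overwrite shown).
MERGE dbo.DimCustomer AS tgt
USING stg.CustomerDelta AS src
      ON tgt.CustomerID = src.CustomerID
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName,
               tgt.City         = src.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, CustomerName, City)
    VALUES (src.CustomerID, src.CustomerName, src.City);

-- 3) Filtered index: index only the slice of the source table the packages actually read,
--    so the OLTP system is not burdened with a full-table index.
CREATE NONCLUSTERED INDEX IX_Customer_Active
    ON dbo.Customer (CustomerID)
    INCLUDE (CustomerName, City)
    WHERE IsActive = 1;
```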

Mostly we talk about how to optimize SSIS package performance, but the bigger picture is how we optimize the ETL process in which SSIS is the driving ETL tool, and these pointers can provide some direction in that effort, even if they help SSIS package performance only indirectly.

Tuesday, November 23, 2010

Download SSIS Denali Video Tutorials and Master Data Services Video Tutorials

The SSIS team has posted a lot of videos and resources on the SQL Server Denali Resource Center. My blog readers will have noticed that I added a permanent link to it a week ago. Recently the SSIS team shared a couple of SSIS Denali videos on their blog, and without any big background story, keeping it plain and simple, find these videos listed below, available for download. Also, there is a how-to video available on TechNet which is related to Master Data Services and explains how to import data into Master Data Services in SQL Server 2008 R2.

Master Data Services Video Tutorial

SQL Server Integration Services Denali Video Tutorials

Migration and Deployment

Impact Analysis and Data Lineage

SSIS Designer Enhancements

Monday, November 22, 2010

Lineage Analysis in SSIS

In the recently released video / webcast "What's new in SSIS Denali", the SSIS team talks a little about lineage analysis and the efforts they are making in this direction. The purpose behind these efforts seems to be to provide transparent end-to-end tracking of the data flow, right from the source where the data comes into the scope of tracking until the data makes it to the end-user territory, i.e. dashboards or Excel. This can potentially open up the scope of a new business stream in itself, but that is still far away.

Lineage analysis is quite important even within the ETL development methodology. In SSIS, the lineage of a data stream, or rather of each field, is tracked using a LineageID property. An understanding of the SSIS architecture is quite important to understand where and why the LineageID changes. Wherever a transformation is asynchronous, or precisely speaking, whenever the input and output buffers of a transformation are different due to the processing the transformation applies to the buffer, the source of data generation can be considered a new lap in the race.

This can be seen like a relay race where one runner hands over the baton to the next runner, and here the baton is the data. But when the race ends, one needs to know how many runners exchanged the baton and which runner kept the baton for what amount of time. This is logically what we mean by lineage analysis. Whenever an asynchronous transformation is used, it creates a closing door for the logical flow before it and an opening door for the logical flow after it. Such logical flows are classified as execution trees in SSIS. Support for lineage analysis is quite limited out of the box in SSIS, and you can use the Metadata Toolkit 2008 up to an extent for the same.

Taking lineage analysis to a higher level, the same kind of analysis across the end-to-end BI lifecycle, right from OLTP to dashboards, is very difficult to envision, implement, track and analyze. This would require some service contracts between the MS BI services, i.e. SSIS - SSAS - SSRS at least. When the data makes it to the dashboarding platform, which as of now is PerformancePoint Services in SharePoint 2010, it would again have to have a new kind of integration with SharePoint, which is outside SQL Server territory. Still, there seems to be a way out, as SharePoint feeds on SQL Server. This is probably what we can expect to see in the future.

Thursday, November 18, 2010

SSIS, Azure, Sync Framework = Economic solution for synchronization business

SSIS is independently famous as an ETL tool, but often a tool is only as good as the purpose it can cater to. Often integration is just a piece of the entire process, which hands over the delivered output to the next runner in the relay race.

In today's competitive and budget-driven economy, two of the biggest challenges that architecture designs as well as project costs face are CAPEX (Capital Expenditure) and OPEX (Operational Expenditure). If you have underutilized resources and your architecture design is such that it can raise CAPEX and/or OPEX over a period of time, it is definitely going to fail when an RFP (Request for Proposal) is sent to the client. To gain a competitive edge, today's architecture design propositions necessarily need to have dual flavors: one in the traditional way and another which includes a cloud-based economy design. Cloud involvement, as a platform or as a component in the solution design, is the most promising way to provide a competitive edge to your propositions. In a typical Microsoft platform solution, in my view, the primary cloud choice is Amazon EC2, and the Azure platform is gaining adoption and popularity gradually.

Sync Framework is one of the robust mechanisms to synchronize data across boundaries spanning networks / systems / machines. This framework integrates very well with .NET applications, so synchronization of a variety of data can be catered for by it. A very common example is synchronizing a backup of a database across geographically separate systems, where each terminal needs to synchronize from a common master backup of the database.

Now if we join these separate pieces of the puzzle, they can form a solution to a very big business need that can be seen in almost every enterprise. Huge multinational corporations often have various companies operating under a single umbrella; for example, CitiGroup may have CitiBank, CitiCorp, CitiSecurities etc., each having their own business, and each company might be implementing different IT systems, from legacy to cloud-based. A few of these might be occasionally connected systems reporting to a central CitiGroup headquarters which might store centralized key data summarizing all businesses and geographies.

1) SSIS can be used to provide the first level of integration for each independent child company. The central controlling office might need to gather data, and even the child companies might need to receive different kinds of data from the central office. These child companies might be occasionally connected or permanently connected to the head office.

2) After SSIS has kept the parcel ready, the Azure platform, i.e. the cloud, can be used as a distribution network. The cloud is an economic choice, considering the time for which storage might be required for occasionally connected systems. Also, the Windows Azure Content Delivery Network can be utilized to replicate this data close to the geography of the child offices / businesses.

3) Sync Framework supports the cloud as a data source (to the best of my knowledge). Storage and retrieval on the cloud can be controlled by Sync Framework to customize the kind of replication and synchronization required for each particular business.

4) One can hook up a .NET application to keep a monitor on these processes. Also, if required, SharePoint can be used for collaborative control and monitoring of these processes.

Every enterprise needs one or another kind of replication and synchronization of data, and using SSIS, Azure and (optionally) Sync Framework, solution providers can exploit the hidden business that lies in every enterprise. I would be interested to hear views on this from sales folks, especially those who deal in the MS BI business.

Monday, November 15, 2010

Download SSIS Denali ( 2011 ) CTP1 video tutorial / tour

I can see the SQL Server blogosphere getting painted with the new discoveries that professionals make by browsing CTP1. They then make a lot of effort to draft step-by-step tutorials on how to use the new features and post them on their blogs. This is quite a lot of effort, especially for a lazy person like me. What am I doing to explore the new features and share the same? Well, my resource detection radar picks up any useful resource that can cut my efforts and help me explore things in an appealing and effective manner. This time my radar has dug out a video from TechEd Europe 2010, where SSIS Group Program Manager Steve Swartz demonstrates the upcoming enhancements to SSIS in the Denali / 2011 version, along with a detailed theoretical explanation. This 1-hour video can be watched as well as downloaded from here.

For those who do not have the time and patience to watch the 1-hour video, here are some bullet points in my version of the explanation.

1) From a usability perspective, the controls toolbox seems more organized. Also, some pampering controls like the "Source Assistant" are provided for creating connection managers. Fortunately, perfectly working Undo and Redo are in place.

2) Transforms were very brittle in earlier versions, and a change in metadata would very easily break the flow. Transforms were also left almost uneditable once the input source broke. This has been taken care of now, and a new mapping dialog is in place to deal with changed metadata from input sources. This is a very big, welcome change.

3) Project, Parameters and Environments are new concepts that have been introduced. Again, if I look at it from the application development territory, Projects and Parameters seem like globally and locally scoped variables. Now a package contains both variables and parameters, and Parameters and Environments take on the same local and global semantics. Packages are the execution units on which these scopes get applied. Check out the video for a better understanding with a demonstration.

4) Improved build and deployment model, tighter integration of SSIS with SSMS, as well as tighter control and configuration of SSIS from SQL Server. Earlier, SSIS sometimes seemed to me more of an independent application which uses SQL Server just as a storage box. Now it seems much more integrated than in earlier versions.

5) Data Quality Services and lineage analysis are two of the most exciting enhancements that we are looking forward to being introduced in upcoming CTPs. If these points are getting your nerves excited, watch the video.

Sunday, November 14, 2010

BISM model from the eyes of MVC / MVP design pattern

Having spent quite some time in the application development world, design patterns are one thing that I cannot easily take out of my head. I was going through a post by T.K. Anand on the Analysis Services roadmap for SQL Server Denali and beyond, and a particular section gave me a glimpse of what looks like the Model View Controller / Model View Presenter design pattern from the .NET / application development world. Below is an excerpt from the post, with my comments highlighted inline against each point. Before reading these points, take a look at the first BISM image shared by Microsoft.


The BI Semantic Model can be viewed as a 3-layer model:
  • The Data Model layer that is exposed to client applications. Even though the BISM model is fundamentally relational, it can expose itself using both relational as well as multidimensional interfaces. OLAP-aware client applications such as Excel can consume the multidimensional interface and send MDX queries to the model. On the other hand, a free-form reporting application such as Crescent can use the relational interface and send DAX queries.

[Siddharth]: This can be seen as the view part of the MVC pattern. If I were an application developer, I would call this layer the presentation interface / facade. Crescent provides support only for BISM models as of now. Crescent and SQL Azure Reporting are starting off on a conservative basis, providing support for BISM models and SQL Azure databases only, respectively.

  • The Business Logic layer that encapsulates the intelligence in the model. The business logic is created by the model author using DAX (Data Analysis Expressions) or MDX (Multidimensional Expressions). DAX is an expression language based on Excel formulas that was introduced in PowerPivot and built on relational concepts. It does not offer the power and flexibility that MDX does, but it is simpler to use and requires minimal tuning. There will always be sophisticated BI applications that need the power of MDX calculations and we envision that the BI Semantic Model will offer the choice of MDX as well, but this will likely come in a release after Denali.

[Siddharth]: This layer can be seen as the controller part of the MVC pattern. That the BI Semantic Model would offer the choice of MDX only in a release after Denali means it is definitely not a priority right now. In my view, not just sophisticated applications but almost all existing applications need MDX. It would have been nice if it had been the other way round, with MDX support provided now along with partial DAX, and the rest in service packs / future releases.

  • The Data Access layer that integrates data from various sources – relational databases, business applications, flat files, OData feeds, etc. There are two options for data access – cached and realtime. The cached mode pulls in data from all the sources and stores in the VertiPaq in-memory column store. VertiPaq is a breakthrough technology that encapsulates state-of-art data compression algorithms along with a sophisticated multi-threaded query processor that is optimized for the latest multi-core chipsets, thereby delivering blazing fast performance with no need for indexing, aggregates or tuning. The realtime mode, on the other hand, is a completely passthrough mode that pushes the query processing and business logic evaluation down to the data source, thereby exploiting the capabilities of the source system and avoiding the need to copy the data into the VertiPaq store. Obviously there is a tradeoff between these two modes (the high performance of VertiPaq versus the latency and overhead associated with copying the data into VertiPaq) and that choice is left to the model author.

[Siddharth]: This layer can be seen as the model part of the MVC pattern. If I am getting this right, it means MOLAP evaporates here. No doubt VertiPaq is the demand of the time, but legacy applications cannot make an entry into this layer, or source data from their data marts into this layer and adopt this model.

In my view, neither Microsoft nor the developer community can decide what the industry has to adopt. Only customer needs and the market standpoint of BISM in comparison to its competition can decide the success of BISM as the probable successor to its predecessor.

Friday, November 12, 2010

Future of MOLAP, Cube, MDX against BISM

Microsoft is of the opinion that MOLAP and BISM will survive together like C++ and C# in .NET. If I think as a customer aware of the proposed roadmap, I would neither go for BISM nor bring any more MOLAP into my territory, as the current roadmap would confuse me. BISM will not be completely mature in the Denali version, and MOLAP is under the threat of becoming obsolete sooner or later. When someone invests millions in a product / technology, one wants to make sure that the product will continue to live for the next 8-10 years, though with newer blood induced through new releases / versions and medication in the form of service packs. In the comments by Amir Netz on Chris's post, one statement says "Even with VertiPaq BISM introduced, MOLAP UDM is still used for all the high end heavy lifting" and another says "VertiPaq BISM is advancing rapidly and is able to take on broader and broader workloads", and I translate this as: until VertiPaq BISM reaches its target, MOLAP UDM will provide covering fire. Considering the performance of BISM and in-memory VertiPaq demonstrated at the PASS summit, I do not think it will take longer than a SQL Server 2011 R2 kind of version for BISM to replace MOLAP in terms of maturity. In terms of adoption, it really depends on how it propagates into client territory.

The developer community is quite concerned and showing mixed reactions to this evolution. But another community that would be concerned with this evolution is the partner product community: Panorama, WhereScape RED and others. In my view, one of the factors that keeps a technology alive is the ecosystem of partner products that are bound to it. Even if the concepts of cubes, MOLAP, MDX etc. are wiped out, up to an extent they would be kept alive by partner products building a layer over BISM.

a) If customers feel the need to have the development environment operate in the same way, ISVs will come out with products that act like a shell. DAX can be seen as a shell around MDX, like the Windows GUI is a shell around the core OS APIs. For example, tools / utilities such as an MDX-to-DAX converter, a MOLAP-to-BISM integrator, or a Panorama-like environment which acts as a managing shell over PowerPivot can come to the rescue if BISM does not become popular with developers or customers. Also, MS could provide additional features in MOLAP that compensate for / simulate what is available in the BISM environment, and even if MS won't, ISVs will pick up this business if there is a strong requirement from customers.

b) And if customers find the new evolution of BISM tempting enough to upgrade / abandon the old MOLAP, then we were always running the risk of this change, and it was just a matter of time before it occurred. We would be lucky even in this case, as Microsoft has catered for this required change, instead of a non-Microsoft BI platform vendor.

Complex event processing, columnar databases, cloud computing etc. are not new concepts; it's only that Microsoft is expanding its BI territory now and introducing these players one by one in the form of StreamInsight, Project Apollo, Windows Azure and SQL Azure etc. BISM might be a new mixture of various old and new concepts, but if it outperforms MOLAP, it deserves to be the new leader in the OLAP space, replacing MOLAP.

I am of the strong opinion that MOLAP, and probably MDX too, will become obsolete sooner or later, but to those who fear the extinction of their jobs due to this evolution, I would say there is no reason to. Even if Microsoft undergoes a drastic change, the partner product community would find a business opportunity in this event and would exploit the business of rescuing legacy customers, which effectively means jobs for SSAS developers. It is a similar kind of threat that DBAs might be feeling with the popularity of the cloud, but a new generation of computing requires a new generation of DBAs. If you feel that SSAS, MOLAP and MDX were too hard for relational brains, and that your job has been secure due to the scarce availability of developers who practice this skill rather than the quality of your consulting in providing data warehousing solutions, then I feel BISM can be the kick that wakes you up from this sedative.

Thursday, November 11, 2010

SQL Server Denali - Boost to SSIS career, Start of End to SSAS and MDX career?

I am not fortunate enough to be able to afford travelling all the way to the US to attend the PASS summit, but fortunately the SQL Server blogosphere is quite vibrant in broadcasting regular updates from the summit, so professionals like me can stay informed. SSIS is definitely being given more capabilities, and the enhancements related to data quality are definitely going to increase the applicability of SSIS to a broader range of solutions. On the other hand, this post from Chris, which shows the SSAS roadmap shared by Microsoft, makes all SSAS professionals quite concerned. I would say that, with Microsoft, mutation never stops, and the mutation from SSAS cubes and MDX to the BI Semantic Model (BISM) is quite huge, as it changes the way we as developers have looked at SSAS right from version 2005. And those who have expert-level knowledge of SSAS are going to feel more pain at this moment than the novices.

One of the investment principles of Warren Buffett says "Don't put all your eggs in one basket", and this investment theory applies not only to finance but also to the skills that define our careers. The emotion that Chris shares about the mutation he can envision in the careers of SSAS professionals as a direct effect of BISM is one I can relate to very well, as I went through the same sentiments in my career long back. I started my career as a Visual Basic 5/6 programmer and spent quite a lot of time hopping from one version of VB to another. Suddenly one day Microsoft .NET was announced, and I felt like all hell broke loose. I managed to recover and adapted myself to the mutation, rather than a smooth transition, as VB and .NET are two different worlds.

The lesson I took from this experience is never to depend upon one skill set, as it would neither help me become a successful architect nor keep my career safe and stable on the Microsoft platform. I executed a project as a .NET Project Lead long back, and I embraced MS BI with the advent of SQL Server 2005, right from its CTP. Being in MS BI, I have ensured that I gain experience on different projects, which brings me diversified experience in SSIS, SSAS and SSRS, with my expertise in SSIS. I have worked as an ETL Lead using SSIS, as a Senior Developer using SSAS and MDX, as a Senior Developer on PerformancePoint, and SSRS has been a garnishing ingredient in most of the projects I have worked on. And though deeply submerged in MS BI, I keep myself updated on .NET and cloud computing technologies like Windows Azure and SQL Azure, to the level that if I need to board that ship in a catastrophic situation, the option remains open for me. As a by-product of this exercise, I gain deep insight into the end-to-end implementation of a solution involving a .NET application, SQL Server based OLTP and MS BI components, SharePoint based dashboards and much more.

I feel that we should take this change constructively and start aligning our careers in the right direction. And thanks to Microsoft for sharing this roadmap quite early, so that we have enough time to position ourselves and our clients in the right direction.

Wednesday, November 10, 2010

Project Atlanta and Quest Software's Azure Services : Cloud based services related to SQL Server

The entire blogosphere related to SQL Server is cheering the release of SQL Server Denali CTP1, and everyone seems to be in a race to figure out the new features in this CTP. Some of the roadblocks for developers wanting to test this CTP are the limited set of supported operating systems and a number of issues during installation. SSIS seems to be loaded with a lot of enhancements, with some great-sounding support for data quality through new tools and controls.

Coming to the topic of this post: after the availability of Database Manager for SQL Azure, SQL Azure Reporting is one step forward in bringing the SaaS model into the SQL Server ecosystem. Project Atlanta and Quest Software's Azure services are two cloud-based, SQL Server related SaaS / PaaS offerings (I have not analyzed them thoroughly enough to say which).

Microsoft's official description of Project Atlanta is as follows: "Microsoft Codename Atlanta is a cloud-based service that analyzes installations of Microsoft SQL Server 2008 (and later) and provides proactive alerts to help you avoid downtime and adhere to Microsoft's best practices. Detection of potential issues is based on knowledge provided by experienced Microsoft Support engineers who keep it up to date to leverage the most recent experience in supporting SQL Server deployments around the world". Though it sounds very exciting, there is another dimension to this offering, which Brent has described in his post. Do not miss the conversation between Brent and the Program Manager of Project Atlanta. This is one case study from which one can take a few lessons that can be helpful in developing proprietary solution offerings along similar lines.

It seems Quest Software already had the same idea in mind. An excerpt about the video where executives from Quest Software share their views and perspectives about their cloud-based IT services on the Windows Azure platform is as follows: "Quest is an early adopter of the Windows Azure platform. They’ve built out an extensive services framework as well as a few initial service offerings that sits on top of Windows Azure Framework and Windows Identity Framework. One of the key elements in the design of Quest’s framework is secure communication and authentication between all the service components and layers, whereby encryption is based on certificates stored within the dedicated certificate store provided by Windows Azure". This is another case study from which lessons can be taken for your future solution offerings on the Windows Azure / SQL Azure platform.

If you analyze which way the wind is blowing, you can easily make out that SQL Server product vendors are gearing up to provide cloud-based solutions, some of which might be effective and some might not. From a solution provider's point of view, when your client requests a PaaS / SaaS solution on Windows Azure or SQL Azure, case studies like these can provide good insight from an architecture as well as a solution-offering design standpoint. I am wondering what Red Gate's answer to these services will be, as the SQL Toolbelt might not be sufficient when the cloud comes into the picture.

Tuesday, November 09, 2010

Download SQL Server Denali CTP1

The next version of SQL Server after 2008 R2 has started taking its baby steps in the Microsoft parlance. This version is called SQL Server Denali as of now, and the first CTP is available for download. You can download it from here.

From a business intelligence perspective, SSIS is expected to ship with a couple of improvements in the SSIS toolset as well as the SSIS designer in BIDS. A new reporting client for SQL Server is also expected. To unlock the suspense of what's new in SQL Server Denali, the only way is to download it, install it and check out what's new in this version.

In my view, some of the enhancements I am curious to check for are as follows:

1) Is there any improvement in SSMS from a spatial data support point of view?

2) Has the Report Viewer control in SSMS been fixed to support SSRS 2008 and higher report formats?

3) Has Silverlight support been added in SSRS?

4) What are the new enhancements in SSIS toolset as well as SSIS BIDS Designer?

5) Has better support now been provided in the SSRS report designer for report debugging, or do we still need to use Fiddler?

6) Have any enhancements been provided for SSRS report deployment, or do we still need to depend upon RSScripter?

7) Is there any improvement in SQL Server Agent to support better scheduling and triggering of services and instruction sets like SSIS packages, for example FileWatcher support in scheduler?

8) The SSAS reference relationship of a dimension with measure groups has the probability of inducing corruption in aggregations, due to processing being triggered through partition processing. Has this been improved?

9) BIDS Helper covers lots of gaps that BIDS leaves open. Has this support been incorporated as part of BIDS?

10) Freeware utilities like SSMS Tools Pack, the SQL Search add-in from Red Gate, and DB Pro in Visual Studio are some of the distinct utilities that DB professionals need a lot during development. Has SSMS managed to spare us from using any of these?

The above are the rapid-fire 10 questions that came to my mind while drafting this post. There are many more curious questions popping up in different corners of my mind, and I am going to figure them out myself by checking out Denali. If you need food for your curiosity, I suggest doing the same.

Saturday, November 06, 2010

Non MS BI Data Sources in a BI or MS BI solution

When someone tells me that integration is quite easy, that you just pick a source and a destination, perform an import-export, and you are done, I generally reply with a smile. Many products declare that they support reading data from almost any industry-standard database / data format, but at some point the thought must have struck all of us: is there any standard list of such databases or data file formats? Being an MS BI / Microsoft platform professional, I know and have worked with databases like SQL Server and data formats like MS Access, Excel, XML, RSS feeds etc. I have even worked with data sources like Oracle and SAP, when a solution involves tools and/or technologies from various providers like Microsoft, IBM and Oracle.

Keeping a tab on the competition is a characteristic of mine as well as a need of my profession, and any of us who have been working with solution provider organizations must have come across situations where we need to use MS BI with non-Microsoft databases or data sources. Here is a list compiled by me; though it's not exhaustive, it covers a large slice of the entire ecosystem. Please feel free to add to it to make the list complete. Whether small or big, each item on the list below has its own market share on its own platform with its target user base. This list is quite handy to have when some product vendor says that they support all industry-standard databases, or when someone says that integration / migration of data from source A to destination B is child's play.

1) SQL Server
2) Oracle
3) IBM DB2
4) MySQL
5) Sybase
6) Informix
7) PostgreSQL
8) IBM UniVerse
9) IBM Unidata
10) Progress
11) FireBird
12) Interbase
13) Paradox
14) Ingres
15) Mimer SQL
16) Pervasive
17) SQLBase
18) Intersystems Cache
19) Teradata
20) DBMaker
21) Valentina
22) Excel
23) Access
24) Visual FoxPro
25) SQLite
26) FileMaker
27) Sharepoint (Document lists and libraries)
28) SAP
29) IBM AS/400

Thursday, November 04, 2010

Data storage on Windows Azure and SQL Azure

Windows Azure, SQL Azure, blob storage, local storage, table storage, queue storage, web role, worker role, Content Delivery Network, Dev Sync etc. are some of the buzzwords which indicate that there are Azure bees around you. If you are not keeping pace with the advancements constantly being made available on this platform, it can feel like a very tough time when you are faced with a situation where you need to develop a POC or an architecture design that works on the Azure platform. Fortunately there are a lot of whitepapers and the MSDN library that can help you get started.

But the first things that click in my mind when I start an architecture / solution design are some basic questions: how will I capture data, and where will I store it? Maintaining session state, implementing security, auditing, logging, error control and reporting, the need for a collaboration platform, scalability and performance considerations, migration considerations etc. are all vertical placeholders in an architecture design that run across the solution. But when I need to figure out the horizontal pieces, my source and destination are the basic blocks. If you have settled on Windows Azure for data storage, whether as a destination, a staging location or a hosting platform, the answer is not as simple as "enterprise edition of 64-bit SQL Server", whether it's a BI solution where you might use Azure as a temporary / permanent staging location or an application solution where you might use Azure in place of an on-premise SQL Server. With the cloud this gets quite a bit more complex, and factors like pricing and storage capacity come into consideration.

Recently a very nice article was published on TechNet that is worth keeping in your arsenal as part of an intelligence-building exercise on the Windows Azure platform, to help you tackle your first encounter with an Azure environment. The article provides a nice overview of the data storage offerings on the Windows Azure platform, and it can be read from here.

Wednesday, November 03, 2010

SSIS Warning Framework versus SSAS Warning Framework

Most of us have seen tons of warnings in the execution results / progress tab of the SSIS package designer after adopting a lazy development approach. For example, select all the columns from a table, never use them, and execute the package: you will see a lot of warnings in the progress tab. Of course these warnings are a good sign, but the side effect is that they overpopulate the progress tab. It also often happens that one is fully aware of the design decision, yet there is no easy way to suppress the warnings. Undo in the SSIS package designer, an equivalent of the SSAS "Deployment Server Edition" property, and a warning infrastructure similar to that of SSAS are, in my view, some of the improvements required in the SSIS designer environment. Undo is heard to be coming in the next version, but the other two are doubtful.

The SSAS designer environment provides great control over its warning framework. This was not that great in the 2005 version, but with the enhancements available in the 2008 version the control is of a truly professional level. In fact, by placing control of the warning framework in the hands of the developer, the SSAS team has almost eliminated the need for a best-practices-analyzer kind of tool for SSAS. If you edit the SSAS database and check out the "Warnings" tab, you can see all the design-time best practices applied, understand them, and control which of them you want to enforce in your solution and be warned about when they are violated. This keeps a check on the number of warnings that pop up at design time, and instead of losing vital warnings in a heap of them, a developer can actually start treating warnings with the same importance as an error message. If this kind of support is available in SSAS, I do feel it should not be that hard for the SSIS development team to provide the same level of support in the SSIS development environment.

This kind of support would make the progress report much cleaner and more controlled. I hope someone from the Microsoft SSIS development team casts an eye on this post, though it clearly reflects my greedy demands :) BTW, I wish all my blog readers a very Happy & Prosperous Diwali ( Festival of Lights celebrated in India ).

Sunday, October 31, 2010

SQL Azure Reporting versus SSRS : Challenges in cloud based reporting

SQL Azure Reporting is expected to be available soon on the Azure platform, and as the name suggests, it is expected to cater to reporting out of SQL Azure. A nice comparison of cloud-based SQL Azure Reporting and on-premises SQL Server Reporting Services can be read from this page.

The nice part about this cloud-based reporting is that one can author a report on the cloud and host it on the cloud. But when authoring is done on the cloud, the only data source available is SQL Azure, which makes the scope of such reports quite limited. Earlier too, it was possible to create reports out of SQL Azure databases using SSRS, and one of my popular articles on how to implement that can be read from here. Rather than a PaaS / SaaS solution, I see it just as an extension to the SQL Azure feature set, something similar to an export-data facility. Support for reporting from on-premise data using WCF / RIA services would have made this reporting solution much more admirable.

It is also being said on a few blogs that one can now wash one's hands of maintaining reporting infrastructure since reporting is available on the cloud, but that statement raises a lot of questions. In my view, there would have to be exceptional reasons to build a dedicated reporting infrastructure for a cloud-based database; the better answer is a true PaaS / SaaS reporting offering.

Probably it is too early to make firm comments right now, and you can get more information from this PDC video. In summary, the good news is that it is the first ray of hope for BI-related features and services on the cloud. You can sign up for the SQL Azure Reporting CTP from here.

Saturday, October 30, 2010

Windows Azure Market Place ( Data Market and App Market ) - Potential jobs for BI Tools

Windows Azure Market Place is a cloud-based service offering on the Azure platform from Microsoft. For those who are completely unaware of Azure, I can say that I find it to be an Amazon Web Services in the making. Services available or to be made available on this platform can be divided into two parts: Data Market and App Market.

Data Market, which was formerly known as Microsoft Codename "Dallas", is now commercially available; this announcement was made very recently. Many other significant announcements for the Windows Azure Market Place were made as well, and they can be read from here. App Market, the other section of services related to application development and hosting, would be commercially released probably early next year.

My area of interest revolves around the Data Market. Being a platform of data / content providers, Data Market is a potential future candidate to be considered a vital data source on the cloud. Traditionally we were used to seeing data sources only as on-premise databases like SQL Server, Oracle, Teradata, DB2 and others. But with the advent of the cloud, we also see SQL Azure as a data source today. In my view, if this platform evolves considerably, there is a bright chance that it becomes the new data source on the cloud.

This also opens up jobs for MS BI tools, especially SSIS and SSRS. I definitely see a requirement to merge data from the different data providers one has subscribed to, and the report creation that is presently facilitated by Excel / the Excel Add-In could be taken over by SSRS. Data access from this platform is facilitated through REST APIs and OData, and MS BI tools lack this support today; but by the time DataMarket gains adoption as well as maturity from partners and consumers, we can expect MS BI tools to support it. PowerPivot is already positioned by Microsoft to serve the data analysis needs. Interestingly Tableau, one of the leaders in data analysis software with its huge range of data visualizations, is one of the technology partners and is providing support for data analysis from DataMarket. Check out the announcement of the integration between Tableau and DataMarket from here.

To have a first look at the DataMarket section of Windows Azure Market Place, check out this video.

Wednesday, October 27, 2010

Use of SQL Server 2008 R2 Express Edition with Microsoft and Non-Microsoft BI platforms

SQL Server Express Edition is one of the most admired databases for BI platforms. A couple of years ago I would not have been able to make this statement, but with the experience gathered over the course of my career, I can make it quite confidently today. Not only on the Microsoft platform, but also in non-Microsoft BI solutions, teams are quite happy to include Express Edition as one of the constituents of a BI solution, either for housekeeping tasks or to catalyze data and data processing. Some of the interesting uses of Express Edition are listed below.

1) Proof of concept development: During the POC phase, projects are least likely to procure licenses. One of the first benefits of Express Edition is that it is license-free for use in development as well as production, and it does not come with a time bomb in it. As far as programmability features are concerned, it comes with almost all the features that Enterprise Edition contains. So if the requirement is just a database container, Express Edition is one of the best starters on the menu.

2) Test data store: Instead of consuming resources on a server running a higher edition of SQL Server, Express Edition can be used as a warehouse to store test data. When the development team requires test data during the development phase, they can synchronize their local instances with test data from an Express instance treated as the test data server. This Express instance can also be handed over to a business analyst, who would generally prefer to own a database free of complexity.

3) Metadata provider: Legacy systems, SAP, and other CRM systems often emit data in text / CSV files. These files frequently contain non-printable or other junk characters, and the issues surface during the import process. As a part of structural testing, legacy data files can be imported into an Express instance installed at the source. Importing full / sample data into Express Edition gives the data a concrete metadata structure (which does not exist in text / CSV files) and tests the import process at the same time; a small BULK INSERT sketch follows this list. This can be done right at the source, free of cost, making ETL processes and data migration smoother. Many non-Microsoft platforms would also consider Express Edition a tempting choice for purposes like these.

4) Temporary staging: SQL Server 2008 R2 Express Edition has a database size limit of 10 GB. This is a reasonable size for using an Express instance as a temporary staging server in your ETL solution, to stage master and/or transactional data.

5) BI housekeeping: A production / UAT / QA server DBA would not allow external applications to connect to those servers. A project might still want to retain a repository of execution and performance history, and Express Edition is one of the best choices for such purposes; a sketch of such a log table is given at the end of this post. One can point any number of analysis / reporting tools at this edition without any security concerns.
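For the metadata-provider scenario in point 3 above, here is a minimal T-SQL sketch; the file path, staging table and columns are hypothetical and would of course follow the actual extract.

-- Minimal sketch (hypothetical file and table): load a flat extract into a
-- local Express instance so the data gets a concrete, typed schema and the
-- import itself is exercised right at the source.
CREATE TABLE dbo.StagingCustomer
(
    CustomerId   INT           NULL,
    CustomerName NVARCHAR(100) NULL,
    Country      NVARCHAR(50)  NULL
);

BULK INSERT dbo.StagingCustomer
FROM 'C:\Extracts\customer.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2      -- skip the header row of the CSV
);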

The above are some of the uses of Express Edition in a BI solution development environment. A 10 GB database size limit, SSMS, almost the full database programmability feature set, and license-free production use are its major features. Keeping these in view, you can consider SQL Server Express Edition to economically catalyze your BI solution development.
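And for the housekeeping repository mentioned in point 5, here is a minimal sketch of the kind of table one could keep on an Express instance and populate from Execute SQL Tasks at the start and end of each package; the table and column names are purely illustrative.

-- Minimal sketch (hypothetical names): a package execution log held on a
-- local Express instance, written to at the start and end of each package.
CREATE TABLE dbo.PackageExecutionLog
(
    LogId       INT IDENTITY(1, 1) PRIMARY KEY,
    PackageName NVARCHAR(200) NOT NULL,
    StartTime   DATETIME      NOT NULL DEFAULT (GETDATE()),
    EndTime     DATETIME      NULL,
    RowsLoaded  INT           NULL,
    Outcome     NVARCHAR(20)  NULL      -- e.g. 'Success' / 'Failure'
);

-- Any reporting or analysis tool can then be pointed at this instance
-- without touching the production, UAT or QA servers.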

Monday, October 25, 2010

Data Source View versus View in Data Source for SSIS, SSAS and SSRS

There is a vast difference between data source views and views in the data source. We all recognize data source views, i.e. DSVs, as one of the means to implement the UDM. But I am of the opinion that DSVs should be the fallback, used only when views in the data source are not feasible. By views in the data source, I mean that if you are using SQL Server as your OLTP data source, you should create views in SQL Server instead of creating a lot of named queries / calculated columns in the DSV. Some of the advantages of using views in SQL Server for ETL / cube design / reporting are listed below (a small sketch of such a view follows the list):

1) Control and filter over metadata: A lazy SSIS implementation typically reads all the columns of a table directly, producing tons of warnings in the progress tab because most of those columns are never used. With views you can restrict the output to the required columns, adjust data type casting as per requirement, and shape the metadata as needed.

2) SQL advantage: With views in your database you have the entire power of SQL at your disposal, giving you the ability to create anything from simple to highly complex calculations. Conditional logic, default values, null checks and many other validations and configurations can also be placed at this level.

3) Adaptable to change: If the structure of the fact / dimension tables in your relational data warehouse changes, or the structure of your OLTP source tables changes, views can shield your ETL / cube data source and save the solution from the kind of changes that would be required with a DSV. Views can even help you present a snowflake schema as a star schema.

4) Transparency and maintainability: If a new developer steps in to maintain and enhance your legacy solution, swimming through your DSV to figure out the calculations can be a big challenge in itself. Analyzing a SQL SELECT query, on the other hand, is much easier than analyzing each calculated column / named query in a DSV. A view can also take part in metadata / dependency analysis with SQL Server or other database analysis tools.
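As a minimal sketch of the idea, assuming hypothetical product tables in the source database, a view like the following exposes only the required columns, fixes data types, applies null handling and flattens a snowflaked dimension into a star-friendly shape, all of which would otherwise end up as named queries and calculated columns in the DSV.

-- Minimal sketch (hypothetical tables): a view in the source database doing
-- the work that would otherwise be scattered across the DSV.
CREATE VIEW dbo.vw_DimProduct
AS
SELECT  p.ProductKey,
        p.ProductName,
        CAST(p.ListPrice AS DECIMAL(10, 2))    AS ListPrice,
        ISNULL(sc.SubcategoryName, N'Unknown') AS SubcategoryName,
        ISNULL(c.CategoryName, N'Unknown')     AS CategoryName
FROM    dbo.Product p
LEFT JOIN dbo.ProductSubcategory sc ON sc.SubcategoryKey = p.SubcategoryKey
LEFT JOIN dbo.ProductCategory    c  ON c.CategoryKey     = sc.CategoryKey;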

I do not mean to say that DSVs are useless and views are the best. But a right balance between views in the data source and data source views (DSVs) can make your solution more robust and developer friendly than blindly relying on DSVs.