Thursday, 31 December 2009

Nearly the 2nd decade of the 21st century but…

So much for the technological age.  It’s nearly 2010 and I need to make a payment that will arrive in a bank account the next working day.  So I phone up the bank, who inform me that it has to be made by CHAPS at a cost of £25.  Which is a tad annoying given that light travels around the world about 7 times a second and a typical enterprise database can serve several thousand concurrent users on a single processor, but obviously my maths is flawed.

Anyway, all they need for this transaction to take place is a fax.  Excuse me?!?!?  It’s the 21st frickin’ century!!  Why do I need to send a fax?  Surely email is easier?  It’s not like it’s even more secure!  I don’t have a fax machine and none of my PCs have telephone modems in them because it’s like…er…the 21st century and all that - I have ADSL and very quick it is too.  So I look for a fax by email service that doesn’t require me to sign up to a contract or monthly fee but to no avail.  I only want to send a 1 page fax for goodness sake!

So next time my bank asks me for information perhaps I should say something like “yeah, that’s fine.  Just send it to me on chiselled stone or Papyrus scrolls in ink made from Dodo blood”.

Friday, 18 December 2009

Maintainable code and Boolean logic

After months of doing BI-related stuff, I’ve found myself doing ‘proper’ coding again – specifically Windows Forms based work in C# working against a WCF data source. Other than the realisation that Forms seems primitive compared to WPF, I’ve found myself referring back to Juval Lowy’s excellent tome “Programming .NET Components” which includes the IDesign C# coding standard. This is the definitive .NET coding standard and one I thoroughly recommend adopting (if you haven’t done so already), however it does contain some guidance whose value cannot be fully appreciated until one starts to follow it. In particular I’m thinking of rule 30 in the “Coding Practices” section:

“Avoid function calls in Boolean conditional statements. Assign into local variables and check on them”

Among other things this makes it much easier this makes to debug nested conditional statements, but it’s also helped me to realise the value of using Boolean functions. By factoring out as much code as possible to such functions, code becomes self-descriptive. To illustrate this point, look at the following code:

if (
      !(lastRowRequested <= (m_CentrePoint + (m_ToleranceWindow / 2))
      &&
      lastRowRequested >= (m_CentrePoint + (m_ToleranceWindow / 2)))
  )
{
  if (
          lastRowRequested > GetAbsoluteHalfwayPoint()
          &&
          (m_Cache.Count + m_FirstRowAbsoluteIndex)!=m_TotalCount
      )
  {
      // Do something
  }
  elseif
      (
          lastRowRequested < GetAbsoluteHalfwayPoint()
          &&
          m_FirstRowAbsoluteIndex != 0
      )
  {
      // Do something
  }
  else
  {
      // Do something else
  }
}

It’s part of a cache manager component that keeps a ‘window’ in memory on a much larger dataset. Knowing that, does the above code make sense? It certainly doesn’t to me and I wrote it! At this point I refer to rule 8 of “Coding Practices”:

“Avoid comments that explain the obvious. Code should be self-explanatory. Good code with readable variable and method names should not require comments.”

Now this is one of the many potentially controversial rules in the standard, but only because some people take it to mean “don’t write comments”. What it is actually saying is “don’t state the bleedin’ obvious”. How can we rewrite the above code to give it context and clarity? Something like this:

bool isInToleranceWindow = IsInToleranceWindow(lastRowRequested);

if (!isInToleranceWindow)
{
  bool isInBottomHalfOfCache = IsInBottomHalfOfCache(lastRowRequested);
  bool isInTopHalfOfCache = IsInTopHalfOfCache(lastRowRequested);
  bool isFirstRowOfDatasetInCache = IsFirstRowOfDatasetInCache;
  bool isLastRowOfDatasetInCache = IsLastRowOfDatasetInCache;

  if (isInBottomHalfOfCache && !isLastRowOfDatasetInCache)
  {
      // Do something
  }
  elseif (isInTopHalfOfCache  && !isFirstRowOfDatasetInCache )
  {
      // Do something
  }
  else
  {
      // Do something else
  }
}

Now the beauty of this will be evident to any BI practitioners out there, in that it gives a high-level overview of what’s going on but (thanks to the F12 key in Visual Studio) one can drill down to a lower level of detail if required (i.e. look at the function definitions). I realise that this is a trivial example, but it does demonstrate how refactoring code can make it easier to understand (and maintain).

Incidentally, as an antithesis to coding standards there is always the obfuscator’s bible, whose adherents I have to thank for providing me with a good living for several years.

Tuesday, 15 December 2009

Custom paging with MDX in SSRS

As I’ve mentioned before, one of the things I really like about QlikView is scrollbars on the bar charts. Too much data to display on the screen at once? Not a problem – you can scroll the axes! Now this can be sort-of-replicated in SSRS by using dynamically sized charts and hosting the reports in a dashboard-friendly environment like SharePoint, but can require a bit more custom work if you’re either using SSRS Report Manager or are hosting in a single ASP.NET Report Viewer.

I came across such a problem recently and decided to use custom paging. Given that I’ve quite successfully implemented before in reports based on a T-SQL source (but not an MDX one), the T-SQL solution seemed a fairly good place to start. Basically you need 2 data sources – 1 for the pager and another for the actual data being displayed. Returning data in pages is a common requirement for ASP.NET developers (although less so since the advent of LINQ) so I’ll point you to this article for more information. The SSRS T-SQL pager requires that you:

  1. Include a page parameter in your report
  2. Have a page size
  3. Calculate how many pages are in the data
  4. Return a table of sequential numbers from 1 up to the amount of pages

Calculating the number of pages is pretty trivial (assuming that decimals are rounded up):

Number of pages = Dataset size / page size
(e.g. for a dataset of 150 rows with a page size of 10, the number of pages would be 15)

So to populate the pager with page numbers we need some T-SQL that will return the numbers 1 to 15. We could use a loop but that would be very ugly – a better alternative is to use Common Table Expressions as you can use recursion to create an incremental list. To cut a long story short, here’s some T-SQL that shows it (without explaining why):

DECLARE @StartNumber int, @EndNumber int;

SET @StartNumber = 1;
SET @EndNumber = 15;

WITH PagerNumbers
(
 PageNumber
)
AS
(
 SELECT
     @StartNumber [PageNumber]
 UNION ALL
 SELECT
     X.PageNumber + 1
 FROM
     PagerNumbers X
 WHERE
     PageNumber < @EndNumber
)
SELECT
 *
FROM
 PagerNumbers
OPTION
(
   MAXRECURSION 0
);

Set that as the data source for a table control where each cell hyperlinks through to the respective page in the report and that’s about it (actually there’s a few less-than-minor implementation details to be taken care of but you get the idea). What we really want to know is how to do the same thing easily in MDX? The answer lies in the RANK function. The principle is simple:

  • Use RANK to give each row in the entire dataset a unique (sequential) number
  • Return only the first row for each page from the dataset
  • Er…that’s it

We also need to know the total number of records in the dataset, as it may not be divisible exactly by the page size – which would lead to the number of the last page not being included. To illustrate this point, consider a dataset with 153 items and a page size of 10 items. If we return all rows where:

row number MOD page size = 0

then we will get 15 pages back where there should be 16 (the last page will contain rows 151-153). To get round this we need to make sure that the last item is displayed even if it doesn’t divide exactly by the page size. In the MDX below (based on the AdventureWorks cube), this can be seen in the use of the PagePoint and PageNumber measures – which are then used in combination with the Round function to work out if the item is the last one in the dataset:

WITH
 SET OrderedActivity
     AS
     Order
     (
         NonEmpty
         (
             StrToSet('[Product].[Product].[Product].ALLMEMBERS'),
             [Measures].[Internet Sales Amount]
         ),
         [Measures].[Internet Sales Amount],
         BDESC
     )
 MEMBER [Measures].[Row rank]
     AS
     Rank
     (
         StrToMember('[Product].[Product]').HIERARCHY.CURRENTMEMBER,
         OrderedActivity
     )
 MEMBER [Measures].[MaxPage]
     AS
     Max
     (
         OrderedActivity,
         [Measures].[Row rank]
     )
 MEMBER [Measures].[PagePoint]
     AS
     [Measures].[Row rank] / StrToValue(10)     
 MEMBER [Measures].[PageNumber]
     AS
     IIf
     (
         [Measures].[Row rank]=[Measures].[MaxPage],
         Round
         (
             (
                 [Measures].[PagePoint] + 0.5
             )
         ),
         (
             [Measures].[PagePoint]
         )
     )
SELECT
 {
     [Measures].[Row rank],
     [Measures].[PageNumber],
     [Measures].[MaxPage]
 } ON COLUMNS,
 Filter
 (
     Order
     (
         NonEmpty
         (
             {
                 StrToSet('[Product].[Product].[Product].ALLMEMBERS')
             },
             [Measures].[Internet Sales Amount]
         ),
         [Row rank],
         ASC
     ),
     [PageNumber] = Round([PageNumber])
 ) ON ROWS
FROM
 [Adventure Works]

Incidentally, the Round function is actually a VBA function that is in the VBAMDX assembly that should be installed by default with SSAS (it definitely is with 2008 but I haven’t checked previous versions).

Also note that I’ve used StrToxxx functions – this is to highlight the places where variables can be used in lieu of the literal strings so that this works generically in SSRS (the StrToValue value in the PagePoint measure is the page size). Finally, note the use of the NonEmpty() function in the main SELECT statement – if the dataset that you’re paging through doesn’t remove empty values then don’t remove them here.

So that’s the pager, which in combination with the SubSet() function will facilitate custom paging. As a general purpose pager it will need some tweaking as it doesn’t scale very well (try running it against the Customer dimension in AdventureWorks and you’ll see what I mean), but for limiting the amount of data displayed on a chart it works.

I’ve created a sample report based on AdventureWorks which can be downloaded from my SkyDrive (note: you will need to change the connection to point at your Analysis Server if it isn’t localhost).