Wednesday, 15 August 2012

Programmatically executing packages in the SSIS 2012 Catalog

Update: Have posted a solution to the execution timeout problem.

After about a year of intending to get to grips with the pre-release versions of SQL Server 2012 (Denali) and failing to do so, I was fortunate enough to land a contract where I had the option to use it from my first day across most of the stack (SSAS, SSRS, SSIS, and the database engine).  Always happy to fill my CV, I jumped at the chance and have been having a jolly good time since.  As part of the project I’ve developed some SSIS packages which my customer wants included in their continuous integration pipeline.  I’ve never done CI or formal unit testing on ETL before because I develop faultless software solutions it’s never really felt as mature as it is in application frameworks (e.g. MVC, WPF). The CI pipeline is based on SpecFlow and nUnit, so I need to be able to run SSIS packages programmatically and get meaningful outputs that I can run assertions against. Specifically I want to know:

  • Did it run successfully?
  • If not, why not?
Traditionally if I wanted to run SSIS programmatically then I’d use the Microsoft.SqlServer.Dts.Runtime namespace.  However if you use the Project Deployment model then you have to use the Catalog (or else Project Parameters won’t work), which comes with a new object model API which it has taken me some time and much head scratching to fathom.  I’ve been using the Project Deployment model with the SSIS Catalog because, quite frankly, I’d be a fool not to. Its merits are expounded elsewhere, but suffice to say it offers much greater control and flexibility over configuration with considerably less effort.
The new API is in the Microsoft.SqlServer.Management.IntegrationServices.dll assembly - but the first problem is where to find it.  It is in the GAC, but it doesn’t come up as a global assembly in the VS2010 Add Reference dialog so you’ll need to browse to it:
addreference
From the Add Reference dialog, click ‘Browse’ and navigate to %WINDIR%\assembly\GAC_MSIL\Microsoft.SqlServer.Management.IntegrationServices:
selectfile
There should be just 1 folder (11.0.0.0__89845dcd8080cc91), but there may be more in future versions, so select the latest folder and then select the DLL that should be inside it (Microsoft.SqlServer.Management.IntegrationServices.dll).  Back in the Add Reference dialog, click ‘Add’.  Before closing the dialog, add references to the following assemblies (which do appear in the GAC list):
  • Microsoft.SqlServer.ConnectionInfo
  • Microsoft.SqlServer.Management.Sdk.Sfc
  • Microsoft.SqlServer.Smo
To start with you will need to establish a connection to the SQL Server then the SSIS server:
var connection = new SqlConnection(@"Data Source=(local);Initial Catalog=master;Integrated Security=SSPI;");
var integrationServices = new IntegrationServices(connection);
Then you need to get a handle to the package.  Packages are stored in projects, which must be located in a folder, which will be in a catalog.  So the package can be gotten thus:
var package = integrationServices
    .Catalogs["MyCatalog"]
    .Folders["MyFolder"]
    .Projects["MyProject"]
    .Packages["MyPackage"];
It’s worth pointing out that each collection has a Contains() method.  This allows you to verify whether or not the catalog, folder, project, or package exists.  So you could rewrite the above code more defensively:
if (integrationServices.Catalogs.Contains("MyCatalog"))
{
    var catalog = integrationServices.Catalogs["MyCatalog"];

    if (catalog.Folders.Contains("MyFolder"))
    {
        var folder = catalog.Folders["MyFolder"];

        if (folder.Projects.Contains("MyProject"))
        {
            var project = folder.Projects["MyProject"];

            if (project.Packages.Contains("MyPackage"))
            {
                var package = project.Packages["MyPackage"];
            }
        }
    }
}
Once you have a handle to the package you can just go ahead and execute:
long executionIdentifier = package.Execute(false, null);
The return value is an execution identifier which relates back to that particular request to run the package.  The SSIS server will quite happily let you run multiple instances of the same package concurrently – so to get feedback on an executing package you need to get a reference to the execution instance (rather than the package):
var execution = catalog.Executions[executionIdentifier];
An interesting point here (and a difference from the Execute method in Microsoft.SqlServer.ManagedDTS assembly) is that, by default, execution is asynchronous.  This is not especially desirable in a testing scenario where one would like control to be returned once the package has finished executing.  So with a little help from Davide Mauri’s blog post I was able to deduce that the Execute method could be made synchronous by using the overload that includes a collection of ExecutionValueParameterSet objects:
var setValueParameters = new Collection<PackageInfo.ExecutionValueParameterSet>();
setValueParameters.Add(new PackageInfo.ExecutionValueParameterSet
{
    ObjectType = 50,
    ParameterName = "SYNCHRONIZED",
    ParameterValue = 1
});

long executionIdentifier = package.Execute(false, null, setValueParameters);

So once execution is complete, the outcome data can be gleaned from the Execution object (which is of type ExecutionOperation).  All messages generated during execution will be put in the Messages property which is a collection of OperationMessage objects - likewise the status will be in the Status property.  As the status is an enumeration of obvious values I won’t go into it here.


The messages are slightly different.  Back in the ManagedDTS days there was an errors collection - not so here.  If you just want the errors then you’ll need to filter the list of messages by MessageType.  Unfortunately this isn’t an enumeration so by a process of deduction and assumption I’ve established that warnings and errors have message types of 110 and 120 (although not necessarily in that order).  So to get the error and warning messages, you can use LINQ:

var errors = execution.Messages.Where(
    m => m.MessageType == 120 || m.MessageType == 110)
    .Select(m => m.Message);
Once you get to grips with the SSIS 2012 catalog you’ll not look back.  There’s a whole host of stuff I haven’t covered here (e.g. environment configurations) but this should be enough to move over from the ‘old’ way of doing things with a reasonable degree of confidence.

25 comments:

  1. i call my child pkgs using a SCRIPT TASK and adding the EVENTS like OnError event to the child package on the fly while i was calling it, something like .....

    Dim pkg As Package = app.LoadPackage(ChildPkgPathAndFileName, Nothing)
    .
    .
    Dim exec1 As Executable = Pkg.EventHandlers.Item("OnError").Executables.Add("STOCK:SQLTask")
    .....
    can i do this while i have my child package in the CATALOG and call it on the fly and add the events?
    my email is SNikkhah at live dot ca

    ReplyDelete
    Replies
    1. There is an Alter() method on the PackageInfo object which might be of some use, but I've never had cause to change packages on the fly so can't help you with that one.

      Delete
  2. I am work with VB.Net and i can't set the section that you mentioned about the package
    var package = integrationServices .Catalogs["MyCatalog"] .Folders["MyFolder"] .Projects["MyProject"] .Packages["MyPackage"];

    I have the IF statment working but cant get the above and the ....
    long executionIdentifier = package.Execute(false, null);
    .... running , how can you set them in vb.Net or C#

    ReplyDelete
  3. HI i got the "Package" by using ......
    Package = integrationServices.Catalogs(strCatalogName).Folders(strFolderName).Projects(strProjectName).Packages(strPackageName)
    .... and the execution by ......
    executionIdentifier = Package.Execute(False, Nothing)
    ... now i want to set the "Project variable" , "Pkg variable" , and set the "SYNCHRONIZED" settings and i cant do that how can you setthe parameters?
    thanks

    ReplyDelete
  4. Here's the VB version:

    Dim connection As New SqlClient.SqlConnection("Data Source=(local);Initial Catalog=master;Integrated Security=SSPI;")
    Dim integrationServices As New Microsoft.SqlServer.Management.IntegrationServices.IntegrationServices(connection)
    Dim package As Microsoft.SqlServer.Management.IntegrationServices.PackageInfo
    Dim executionIdentifier As Long
    Dim setValueParameters As New System.Collections.ObjectModel.Collection(Of _
    Microsoft.SqlServer.Management.IntegrationServices.PackageInfo.ExecutionValueParameterSet)
    Dim setValueParameter As New Microsoft.SqlServer.Management.IntegrationServices.PackageInfo.ExecutionValueParameterSet
    Dim execution As Microsoft.SqlServer.Management.IntegrationServices.ExecutionOperation

    setValueParameter.ObjectType = 50
    setValueParameter.ParameterName = "SYNCHRONIZED"
    setValueParameter.ParameterValue = 1

    setValueParameters.Add(setValueParameter)

    package = integrationServices.Catalogs("MyCatalog").Folders("MyFolder").Projects("MyProject").Packages("MyPackage")
    executionIdentifier = package.Execute(False, Nothing, setValueParameters)

    execution = integrationServices.Catalogs("MyCatalog").Executions(executionIdentifier)

    Dim errors = From e In execution.Messages _
    Where e.MessageType = 120 Or e.MessageType = 110
    Select e.Message

    ReplyDelete
  5. Hi, are you aware of how to change the CommandTimeout? Packages timeout after 30 seconds. There doesn't appear to be a way to set the overall timeout when executing a package in this way.

    ReplyDelete
    Replies
    1. Unfortunately I've not worked that out yet. The main reason I was doing this was to run test packs in a CI environment. None of the test packs took more than a few seconds so I never came across the 30 second timeout problem. I've tried the obvious such as setting the command timeout on the SqlConnection and the IntegrationServices objects, but with no success. If I do work it out I'll post it on my blog.

      Delete
    2. It uses the default ADO timeout of 30 seconds. I did implement this polling (VB.NET):

      executionId = package.Execute(use32BitRuntime, Nothing, setValueParameters)
      execution = integrationServices.Catalogs(task.Catalog).Executions(executionId)

      ' poll for completion
      While Not execution.Completed
      execution.Refresh()
      System.Threading.Thread.Sleep(3000)
      End While

      Delete
    3. I couldn't contain my disdain for polling (it's like a child continually asking "are we nearly there yet" for the whole of a journey) so I finally put together a fix of sorts. See my latest post

      Delete
  6. Any chance you solved the timeout issue? I'm working at it as well, and it is frustrating.

    Thanks.

    ReplyDelete
    Replies
    1. Sorry, haven't looked into it lately. It's on my list of things to revisit when I have a spare hour or three.

      Delete
    2. I encountered that same problem. To fix it I ended up writing my own (and extremely limited) version of the IntegrationServices API, in which I execute SQL stored procedures provided by the SSISDB database to create, run and stop executions. This way I'm in control of the SqlConnection and SqlCommand objects and can set the timeout as needed. It matches my needs as of now, but I was really hoping I could use the already provided class library. I find it really stupid that I had to reengineer the whole thing just because of the timeout issue.

      I tried the polling solution, but I faced a differnt problem; when I had multiple packages running simultaneously on different threads, and I tried refreshing the ssis object (ssis.Refresh()) to get the status of each execution, I got a "There's already an open DataReader for this Command that must be closed before using it." error. I couldn't work it out so I wrote my own solution.

      If you manage to solve it (or find something close to a solution, other than the polling thing) please let me know. I still find it hard to believe that the guys down at Microsoft overlooked something like this. To me, with the 30 second timeout, the library is totally useless.

      P.S. What's with all the "Do not reference this object directly in your code. It supports the SQL infrastructure." properties and methods?

      Delete
    3. Writing your own wrapper sounds like a good solution. The official one is a bit of a Friday afternoon product! Could you make it publicly available?

      Delete
    4. See my latest post for a fix of sorts.

      Delete
  7. How can I pass additional parameters to the package.
    For example. Connections ["ConfigDB"]. ConnectionString =

    ReplyDelete
  8. Best Excample I found for the 2012 approach but I am facing some problems with the authentification:
    I used the same connectionstring you did in your code "Data Source=myServer\myIntance;Initial Catalog=master;Integrated Security=SSPI;" an expected the executing domain account to be passed through for authentification.
    The code runs and i get back a executionIdetifier but when I take a look at the execution report of the .dts package the execution failed.
    0x80040E4D Error logon user NT AUTHORITY\ANONYMOUS LOGON
    So why is the NT AUTHORITY\ANONYMOUS LOGON used for execution?

    Thanks in advance for your Ideas, and greeting from Germany, Ph!llip

    ReplyDelete
    Replies
    1. Sounds like a problem with Service Principal Names (SPN). If everything (including data sources and destinations in the SSIS package) runs on the same server then it should be ok, but if you're working across multiple servers then you need to create SPNs for all the SQL Servers. See the following article: http://technet.microsoft.com/en-us/library/bb735885.aspx

      Delete
    2. The dtsx package I try to execute uses datasources from differend sql servers.
      I made an SPN entry on the executing server that points to the server on which the datasource is located:

      entrys on the executing server: MSSQLSvc/myDataSourceServer:port and MSSQLSvc/myDataSourceServer.fully.qualified.domain:port

      unfortunately the error message is still the same

      Delete
  9. There's a known issue with running multiple instances of the same package at the same time SSIS Catalog has some dead locking issues. I have only seen this when call using the stored procs hopefully you don't get this with using the API also!

    ReplyDelete
    Replies
    1. I've not encountered the problem yet, although I am implementing a concurrent execution from WCF so I shall find out soon enough if this comes and bites me on the ass!

      Delete
  10. I am getting {"Object reference not set to an instance of an object."} error when defining the package variable


    var package = catalog
    .Folders[AppConfig.Instance.SSISConfig.FolderName]
    .Projects[AppConfig.Instance.SSISConfig.ProjectName]
    .Packages[packageName];

    ReplyDelete
  11. Try the slightly longer approach to writing the code (where I mention coding defensively) so you know where your null object reference is. Also, try hard-coding the folder, project, and package name - just in case it's your AppConfig.Instance references that are throwing the null reference exception. e.g.

    if (integrationServices.Catalogs.Contains("MyCatalog"))
    {
    var catalog = integrationServices.Catalogs["MyCatalog"];

    if (catalog.Folders.Contains("MyFolder"))
    {
    var folder = catalog.Folders["MyFolder"];

    if (folder.Projects.Contains("MyProject"))
    {
    var project = folder.Projects["MyProject"];

    if (project.Packages.Contains("MyPackage"))
    {
    var package = project.Packages["MyPackage"];
    }
    }
    }
    }

    Also, go into SQL Server Management Studio and check that your SSIS catalog actually exists on the SQL Server.

    Hope that helps.

    ReplyDelete
  12. Hi Ben,
    what is integrationServices object? is it a Database or Integration Service?

    ReplyDelete
  13. You'll need to read the whole article to understand the context - I declare it thus:

    var connection = new SqlConnection(@"Data Source=(local);Initial Catalog=master;Integrated Security=SSPI;");
    var integrationServices = new IntegrationServices(connection);

    Hope that helps

    Ben

    ReplyDelete