Sunday, October 28, 2007

Server Grid Option in Informatica 7.X

A server grid is a server object that distributes sessions in a workflow to servers based on server availability. A grid is used to balance the server workload, which increases workflow performance.

Multiple PowerCenter Servers can be added to a grid. The grid maintains connectivity between all the servers in it.

A server grid contains information about other servers in the grid. PowerCenter Server fetches the server grid object and creates a TCP/IP connection to the other servers in the grid.

Each server in the grid monitors the other servers to check connectivity status. As a result, the grid can notify each server when any server in the grid is added, edited, or deleted.

If a PowerCenter Server loses its connection to the grid, it tries to reestablish a connection. When a PowerCenter Server cannot reestablish a connection to the grid, session and workflow completion depends on factors such as shut down mode and which server loses connectivity.

These servers can be classified into two categories based on the tasks they perform.

1. Master Server: The PowerCenter Server to which a workflow is assigned, and on which it runs, is called the master server. The master server starts the workflow, runs all non-session tasks, and assigns sessions to run on other servers in the grid.
The master server distributes sessions that are ready to run to available worker servers in a round-robin fashion based on server availability.

2. Worker Server: A worker server is a server that runs sessions assigned to it by a master server.
If a worker server is running the maximum number of concurrent sessions, the master server assigns another worker server to run the session. If all worker servers are running the maximum number of concurrent sessions, the master server places the session in its own ready queue.
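
The distribution rules above can be sketched in a few lines of Python. This is only a simulation of the behaviour described here, not Informatica code: the class names, the `max_sessions` field and the `assign` method are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkerServer:
    """Hypothetical stand-in for a worker server in the grid."""
    name: str
    max_sessions: int                      # maximum number of concurrent sessions
    running: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.running) < self.max_sessions


class MasterServer:
    """Hypothetical master server that hands sessions to workers round-robin."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.next_index = 0                # round-robin cursor
        self.ready_queue = deque()         # sessions waiting for a free worker

    def assign(self, session):
        """Return the name of the worker that got the session, or None if queued."""
        count = len(self.workers)
        for offset in range(count):
            index = (self.next_index + offset) % count
            worker = self.workers[index]
            if worker.has_capacity():
                worker.running.append(session)
                self.next_index = (index + 1) % count
                return worker.name
        # Every worker is at its maximum: keep the session in the master's own queue.
        self.ready_queue.append(session)
        return None


master = MasterServer([WorkerServer("A", max_sessions=1), WorkerServer("B", max_sessions=2)])
assignments = [master.assign(s) for s in ["s1", "s2", "s3", "s4"]]
print(assignments)
print(list(master.ready_queue))
```

Here worker A is skipped once it is full, and the fourth session stays in the master's ready queue because both workers are at their limit.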

By default, each PowerCenter Server in a server grid is both a master server and a worker server. This means that a server in a grid can distribute sessions to and receive sessions from every server in the grid.

The server grid distribution options can be set at the server level, workflow level, and session level. PowerCenter Servers specified at the session level override both server level and workflow level properties.
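
As a rough illustration of this precedence, the sketch below picks the most specific setting that is present. The function name and arguments are hypothetical, and the assumption that a workflow-level setting in turn overrides the server-level one is mine; the text above only states that the session level wins over both.

```python
def resolve_servers(server_level, workflow_level=None, session_level=None):
    """Return the server list that applies, most specific level first.

    Assumption: workflow level overrides server level. Only the session-level
    override is stated in the text.
    """
    if session_level:
        return session_level
    if workflow_level:
        return workflow_level
    return server_level


only_server = resolve_servers(["A", "B", "C"])
with_workflow = resolve_servers(["A", "B", "C"], workflow_level=["A", "B"])
with_session = resolve_servers(["A", "B", "C"], workflow_level=["A", "B"], session_level=["C"])
print(only_server, with_workflow, with_session)
```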

In the diagram below, Master Server C distributes sessions to the worker servers (A and B). Any non-session tasks are executed on the master server only.


In Informatica 7.X, a single session can be assigned to only one Worker server to execute it. There is no provision to run a single session on multiple worker servers to improve performance.

Informatica 7.X Architecture

The Informatica 7.X architecture consists mainly of three components:
   1. PowerCenter Client Tools
   2. Repository Server and Database
   3. PowerCenter Servers (pmserver)


PowerCenter Client Tools

The PowerCenter Client Applications are used
   1. to manage the repository
   2. to design mappings, mapplets,
   3. to create sessions to load the data
   4. to run and monitor workflows running on PowerCenter Server

A. Repository Server Administration Console.
A Repository Server can manage multiple repositories. The Administration Console is used to administer Repository Servers and repositories.
The main tasks performed with the Administration Console are:
      a. Add, edit, and remove repository configurations
      b. Create/backup/copy/delete a repository
      c. Start, stop, enable, and disable repositories
      d. Upgrade Repository

B. Repository Manager.
Repository Manager is used to administer the metadata repository. With Repository Manager, a user can manage repository users and groups, assign privileges and permissions, and manage folders and locks.

C. Designer.
Designer is used to create mappings that contain transformation instructions for the PowerCenter Server. A user can import or create source and target definitions, and create mapplets and mappings with the appropriate business logic.

D. Workflow Manager.
Workflow Manager is used to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The PowerCenter Server runs workflow tasks according to the links connecting the tasks.
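
To make the idea of running tasks "according to the links" concrete, here is a minimal sketch that derives a run order from a list of links. The task names are made up, and real workflows also support link conditions and parallel branches, which this sketch does not model. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each link goes from one task to the task that follows it.
links = [
    ("Start", "s_extract"),
    ("s_extract", "s_transform"),
    ("s_transform", "s_load"),
]

# Build a map of task -> set of tasks that must finish before it.
predecessors = {}
for source_task, target_task in links:
    predecessors.setdefault(source_task, set())
    predecessors.setdefault(target_task, set()).add(source_task)

# A task becomes runnable only after every task linked into it has run.
run_order = list(TopologicalSorter(predecessors).static_order())
print(run_order)
```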

E. Workflow Monitor.
Workflow Monitor is used to monitor scheduled and running workflows for each PowerCenter Server.


PowerCenter Repository and Repository Server

A. Repository
The PowerCenter Repository is the heart of Informatica tools. The PowerCenter repository is a relational database managed by the Repository Server and used by the PowerCenter Server and Client tools.
The repository uses database tables to store metadata. Metadata describes different types of objects, such as mappings or transformations, that you can create or modify using the PowerCenter Client tools.
All repository client applications access the repository database tables through the Repository Server.

B. Repository Server
The Repository Server protects metadata in the repository by managing repository connections and using object locking to ensure object consistency. The Repository Server also notifies you when objects you are working with are modified or deleted by another user.
The Repository Server uses multiple Repository Agent processes to manage multiple repositories on different machines on the network.
The Repository Server uses native drivers to communicate with the repository database. PowerCenter Client tools and the PowerCenter Server communicate with the Repository Server over TCP/IP. When a client application connects to the repository, the Repository Server connects the client application directly to the Repository Agent process.
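
The object locking mentioned above can be pictured with a small lock table. This is only a sketch of the general idea; the class and method names are hypothetical, and it says nothing about how the Repository Server actually implements locks.

```python
class ObjectLockTable:
    """Hypothetical lock table: one owner per repository object at a time."""

    def __init__(self):
        self._owners = {}      # object name -> user holding the lock

    def acquire(self, obj, user):
        """Grant the lock if the object is free or already held by this user."""
        owner = self._owners.setdefault(obj, user)
        return owner == user

    def release(self, obj, user):
        """Release the lock only if this user holds it."""
        if self._owners.get(obj) == user:
            del self._owners[obj]
            return True
        return False


locks = ObjectLockTable()
first = locks.acquire("m_load_customers", "alice")    # alice gets the lock
second = locks.acquire("m_load_customers", "bob")     # bob is refused while alice holds it
locks.release("m_load_customers", "alice")
third = locks.acquire("m_load_customers", "bob")      # now bob can lock the object
print(first, second, third)
```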


PowerCenter Server

The PowerCenter Server is a repository client application. It connects to the Repository Server and Repository Agent to retrieve workflow and mapping metadata from the repository database. When the PowerCenter Server requests a repository connection from the Repository Server, the Repository Server starts and manages the Repository Agent. The Repository Server then re-directs the PowerCenter Server to connect directly to the Repository Agent.

The diagram below shows the communication channels between the client, the server and the repository.

Every time a user runs a workflow, Workflow Manager communicates with the PowerCenter Server over a TCP/IP connection.

The PowerCenter Server connects to the source or target database using ODBC or native drivers. It uses TCP/IP to connect to the Repository Server.

The PowerCenter Server maintains a database connection pool for stored procedures or lookup databases in a workflow. The PowerCenter Server allows an unlimited number of connections to lookup or stored procedure databases.

Friday, October 19, 2007

Informatica and Service Oriented Architecture (SOA)

What is Informatica?
Informatica Corp. was founded in 1993 in Silicon Valley by Indian entrepreneurs Gaurav Dhillon and Diaz Nesamoney. It was based on the idea that data warehouses should not be "handcoded", but can instead be built more efficiently with graphical tools.
PowerCenter 8.1 is the company's flagship product. The advanced edition includes Metadata Manager (formerly SuperGlue), Data Analyzer (formerly PowerAnalyzer) and other options.
New features in PowerCenter 8.1 include Grid Computing support for scalability, Java Custom Transformation Support, HTTP Transformation Support, High Availability, Push Down Optimization (ELT Architecture), enhanced web services to leverage a service oriented architecture, and Mapping Template Creation support through Microsoft Visio. It also comes with adapters to various data source systems, ranging from RDBMSs and message-oriented middleware to web services and applications.

What is SOA?
It is an architectural style in which all business functionality is grouped into services. Services communicate with each other by passing data from one service to another or by coordinating an activity between two or more services.
SOA is neither a product nor a packaged application. It is just an architectural concept.
SOA is not a platform like J2EE or .NET. It can be used as a bridge between two platforms.

Why SOA?
SOA promotes the goal of separating users (consumers) from the service implementations. Services can therefore be run on various distributed platforms and be accessed across networks. This can also maximize reuse of services. Further benefits include:


   1. Loose coupling, reducing the chance that changes in one service necessitate changes in another service
   2. Location transparency, giving an enterprise the flexibility to move to a different service without affecting the application
   3. Platform independence, through an abstraction layer that enables integration without having to understand individual components
   4. Reliability and robustness, through the inherent scalability of SOA and its lack of single points of failure
   5. Improved responsiveness through a common architecture

How does Informatica fit in SOA?
The Informatica platform is designed to work in a Service Oriented Architecture.

The full richness of Informatica's data integration functionality can be exposed as data services over standards-based web service protocols.

Through the Web Services Hub, first released with Informatica 7.X, the Web Services Provider exposes some of the PowerCenter Server functionality through a service-oriented architecture.

Web Services Hub
   1. A PowerCenter Service gateway for external clients.
   2. Processes SOAP requests from web service clients that want to access PowerCenter web services.
   3. Receives requests from web service clients and passes them to the PowerCenter Server or the Repository Server.

The PowerCenter Server or the Repository Server processes the request and sends a response to the web service client through the Web Services Hub. The content of the response depends on the type of service.

Web services
   1. Batch Web Services :
      a. Allow PowerCenter workflows to be run from an external application
      b. Allow workflow execution to be monitored
      c. Used for integration with an external scheduler
   2. Metadata Web Services :
      a. Allow logging in to the repository to browse metadata
      b. Used for authentication functions, to log in to and log out of the repository
   3. Real-time Web Services :
      a. Used to expose Transformation logic as a service.
      b. Create service workflows that allow you to read and write messages to a web service client through the Web Services Hub
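
The gateway role of the Web Services Hub can be sketched as a simple routing step. Note that the mapping of service types to back-end servers below is my assumption for illustration (metadata requests to the Repository Server, everything else to the PowerCenter Server), and the names `ROUTES` and `route_request` are hypothetical; none of this is an actual Informatica API.

```python
# Hypothetical routing table: which back end handles which type of web service.
# This mapping is an assumption for illustration, not documented behaviour.
ROUTES = {
    "batch": "PowerCenter Server",
    "realtime": "PowerCenter Server",
    "metadata": "Repository Server",
}


def route_request(service_type):
    """Return the back end a request of this type would be passed to."""
    try:
        return ROUTES[service_type]
    except KeyError:
        raise ValueError(f"unknown web service type: {service_type!r}") from None


targets = {kind: route_request(kind) for kind in ("batch", "metadata", "realtime")}
print(targets)
```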

When you put all the services together, the PowerCenter web service architecture looks as shown below.

From a business point of view, the above architecture delivers all the data services in a company’s SOA.


~Girish Chavan

Thursday, October 18, 2007

Welcome to my blog!!!

Hi Friends,
Welcome to INFORMATICA web blog!!!

The information published in this blog is a summary of the knowledge that I have gained from experience, articles published on the internet and Informatica Help (best of all).

I always wanted to keep all this information handy so that I can refer to it any time without going through all the articles again, just trying to avoid re-inventing the wheel.

This blog is the answer that I have found. It is an effort to add a couple of drops to the ocean of information related to the Informatica ETL tool.

I hope this blog helps you gain some extra knowledge.

Thanks,
Girish