Sunday, December 5, 2010

Import Relational Source & Target Definitions

To import a relational source definition:

  1. In the Source Analyzer, click Sources > Import from Database.
  2. Select the ODBC data source used to connect to the source database.
  3. If you need to create or modify an ODBC data source, click the Browse button to open the ODBC Administrator. Create the data source and click OK, then select the new ODBC data source.
  4. Enter a database user name and password to connect to the database.

     Note: The user name must have the appropriate database permissions to view the object.
     You may also need to specify the owner name for the database objects you want to use as sources.
  5. Click Connect.
  6. If no table names appear, or if the table you want to import does not appear, click All.
  7. Scroll through the list of sources to find the source you want to import, and select the relational object or objects you want to import.
  8. You can hold down the Shift key to select a block of sources within one folder, or hold down the Ctrl key to make non-consecutive selections within a folder. You can also select all tables within a folder by selecting the folder and clicking Select All. Use the Select None button to clear all highlighted selections.
  9. Click OK.
  10. The source definition appears in the Source Analyzer. In the Navigator, the new source definition appears in the Sources node of the active repository folder, under the source database name.
  11. Click Repository > Save.
    How to remove Duplicates of records using Informatica?

    Let's say you have 2 columns, A and B, in the source table. Do the following in an Expression transformation (ports are evaluated top to bottom, so the comparison must come before the variable ports are overwritten with the current row's values):

    A (in)
    B (in)
    Comp (V)  = IIF(A = A_Pre AND B = B_Pre, 'Y', 'N')
    A_Pre (V) = A
    B_Pre (V) = B
    O_Comp (out) = Comp

    Basically you need to store the previous row's values in variable ports and compare them with the incoming data. Assuming the data is sorted on A and B so that duplicates arrive on consecutive rows, a downstream Filter or Router can then drop the rows where O_Comp = 'Y'.

    Router Scenario in Informatica

    Router Scenario:

    Let's say I have a large number of records in the source table and three target tables A, B, and C. I have to insert records 1 to 10 into A, then 11 to 20 into B, and 21 to 30 into C.
    Then again records 31 to 40 into A, 41 to 50 into B, and 51 to 60 into C, and so on up to the last record.

    Solution:

    Generate a sequence number using a Sequence Generator transformation, add a Router (or Filter) transformation, and define the conditions accordingly.

    Define the group conditions as follows under the Router groups:

    Group1 = mod(seq_number,30) >= 1 and mod(seq_number,30) <= 10
    Group2 = mod(seq_number,30) >= 11 and mod(seq_number,30) <= 20
    Group3 = (mod(seq_number,30) >= 21 and mod(seq_number,30) <= 29) or mod(seq_number,30) = 0

    Connect Group1 to target A, Group2 to B, and Group3 to C.
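
    To sanity-check the conditions with a couple of values: sequence number 35 gives mod(35,30) = 5, which satisfies Group1 and is routed to A, while sequence number 60 gives mod(60,30) = 0, which is why Group3 must also cover the remainder 0 so that every 30th record still lands in C.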

    What is Domain Configuration in Informatica?

    Domain Configuration:

    Some of the configuration for a domain involves assigning host names and port numbers to the nodes, setting Resilience Timeout values, providing connection information for the metadata database, SMTP details, etc. All of the configuration information for a domain is stored in a set of relational database tables within the repository. Some of the global properties that apply to Application Services, such as 'Maximum Restart Attempts' and 'Dispatch Mode' ('Round Robin'/'Metric Based'/'Adaptive'), are also configured under Domain Configuration.

    Informatica PowerCenter 8.6 Architecture and Installation Steps


    Pre-Requisites: The installation needs at least 2 GB of temp space.

    A tablespace with a 16K default page size is needed for domain creation.

    1) First, install the domain and create the domain tables.

    2) Define the domain name and assign nodes.

    3) In the PowerCenter Administrator console, create the Repository Service.

    4) Create the repository (or upgrade an existing repository).

    5) In the PowerCenter Administrator console, create the Integration Service.

    6) In the PowerCenter Administrator console, create the Metadata Manager Service.

    7) Also create the Data Profiling warehouse tables in the Data Profiling schema.

    8) In the PowerCenter Administrator console, create the Reporting Service.

    9) In the Reporting Service, create the Data Analyzer repository in the Data Analyzer schema.

    To Start the Service

    ../server/tomcat/bin/infaservice.sh startup
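
    The same script can typically be used to stop the services as well (assuming the default install layout shown above):

    ../server/tomcat/bin/infaservice.sh shutdown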

    Installation Error-Informatica 8.6 in windows server 2008

    Hello All,
    Is Informatica 8.6 compatible with the Windows Server 2008 OS? I am getting the error message below when I try to install it. Can anyone help me resolve this? I appreciate your time.

    Kindly provide me the list of PowerCenter editions and their compatibility info against all Windows OS versions.

    Error Message:
    Cannot start Informatica Services. Use the error below, and catalina.out and node.log in the server/tomcat/logs directory on the current machine, to get more information. Select Retry to continue the installation.
    EXIT CODE: S

    Info:

    OS - Windows Server 2008

    DB - Oracle 10g XE

    INFA Edition - PowerCenter 8.6

    Netscape vs. Google

    If Netscape was the standard bearer for Web 1.0, Google is most certainly the standard bearer for Web 2.0, if only because their respective IPOs were defining events for each era. So let's start with a comparison of these two companies and their positioning.

    Netscape framed "the web as platform" in terms of the old software paradigm: their flagship product was the web browser, a desktop application, and their strategy was to use their dominance in the browser market to establish a market for high-priced server products. Control over standards for displaying content and applications in the browser would, in theory, give Netscape the kind of market power enjoyed by Microsoft in the PC market. Much like the "horseless carriage" framed the automobile as an extension of the familiar, Netscape promoted a "webtop" to replace the desktop, and planned to populate that webtop with information updates and applets pushed to the webtop by information providers who would purchase Netscape servers.

    In the end, both web browsers and web servers turned out to be commodities, and value moved "up the stack" to services delivered over the web platform.

    Google, by contrast, began its life as a native web application, never sold or packaged, but delivered as a service, with customers paying, directly or indirectly, for the use of that service. None of the trappings of the old software industry are present. No scheduled software releases, just continuous improvement. No licensing or sale, just usage. No porting to different platforms so that customers can run the software on their own equipment, just a massively scalable collection of commodity PCs running open source operating systems plus homegrown applications and utilities that no one outside the company ever gets to see.

    At bottom, Google requires a competency that Netscape never needed: database management. Google isn't just a collection of software tools, it's a specialized database. Without the data, the tools are useless; without the software, the data is unmanageable. Software licensing and control over APIs--the lever of power in the previous era--is irrelevant because the software never need be distributed but only performed, and also because without the ability to collect and manage the data, the software is of little use. In fact, the value of the software is proportional to the scale and dynamism of the data it helps to manage.

    Google's service is not a server--though it is delivered by a massive collection of internet servers--nor a browser--though it is experienced by the user within the browser. Nor does its flagship search service even host the content that it enables users to find. Much like a phone call, which happens not just on the phones at either end of the call, but on the network in between, Google happens in the space between browser and search engine and destination content server, as an enabler or middleman between the user and his or her online experience.

    While both Netscape and Google could be described as software companies, it's clear that Netscape belonged to the same software world as Lotus, Microsoft, Oracle, SAP, and other companies that got their start in the 1980's software revolution, while Google's fellows are other internet applications like eBay, Amazon, Napster, and yes, DoubleClick and Akamai.

    What Is Web 2.0

    Design Patterns and Business Models

    The bursting of the dot-com bubble in the fall of 2001 marked a turning point for the web. Many people concluded that the web was overhyped, when in fact bubbles and consequent shakeouts appear to be a common feature of all technological revolutions. Shakeouts typically mark the point at which an ascendant technology is ready to take its place at center stage. The pretenders are given the bum's rush, the real success stories show their strength, and there begins to be an understanding of what separates one from the other.

    The concept of "Web 2.0" began with a conference brainstorming session between O'Reilly and MediaLive International. Dale Dougherty, web pioneer and O'Reilly VP, noted that far from having "crashed", the web was more important than ever, with exciting new applications and sites popping up with surprising regularity. What's more, the companies that had survived the collapse seemed to have some things in common. Could it be that the dot-com collapse marked some kind of turning point for the web, such that a call to action such as "Web 2.0" might make sense? We agreed that it did, and so the Web 2.0 Conference was born.

    In the year and a half since, the term "Web 2.0" has clearly taken hold, with more than 9.5 million citations in Google. But there's still a huge amount of disagreement about just what Web 2.0 means, with some people decrying it as a meaningless marketing buzzword, and others accepting it as the new conventional wisdom.

    This article is an attempt to clarify just what we mean by Web 2.0.

    In our initial brainstorming, we formulated our sense of Web 2.0 by example:

    Web 1.0 --> Web 2.0
    DoubleClick --> Google AdSense
    Ofoto --> Flickr
    Akamai --> BitTorrent
    mp3.com --> Napster
    Britannica Online --> Wikipedia
    personal websites --> blogging
    evite --> upcoming.org and EVDB
    domain name speculation --> search engine optimization
    page views --> cost per click
    screen scraping --> web services
    publishing --> participation
    content management systems --> wikis
    directories (taxonomy) --> tagging ("folksonomy")
    stickiness --> syndication

    The list went on and on. But what was it that made us identify one application or approach as "Web 1.0" and another as "Web 2.0"? (The question is particularly urgent because the Web 2.0 meme has become so widespread that companies are now pasting it on as a marketing buzzword, with no real understanding of just what it means. The question is particularly difficult because many of those buzzword-addicted startups are definitely not Web 2.0, while some of the applications we identified as Web 2.0, like Napster and BitTorrent, are not even properly web applications!) We began trying to tease out the principles that are demonstrated in one way or another by the success stories of web 1.0 and by the most interesting of the new applications.

    Remove Virus From Virus Effected System

    Hi Friends,

    Today I am going to share some information on how to remove a virus from an infected system without losing your data or reinstalling the operating system.

    Please download the tool from the following link, then
    just restart your system in Safe Mode and run the tool.

    Hope you enjoy it.

    Sunday, November 7, 2010

    Get ready for virtual wireless LANs

    One of the primary components of IT that is seeing a lot of investment is virtualization. In most cases, the term virtualization refers to server virtualization. At the Blue Socket seminar Jim was not discussing server virtualization, but wireless LAN virtualization. At first blush, the thought of a virtual wireless LAN seems a bit strange. One obvious question is "how do you virtualize an access point?" The quick answer is that you don't.

    Part of the motivation for a virtual wireless LAN is the realization that wireless LANs are becoming an increasingly integral component of the network infrastructure. As such, wireless LANs need to be able to scale, both in terms of the amount of traffic that they can support and the services that they can enable. That is difficult, if not impossible, to do with a centralized architecture that requires all traffic to go to a centralized controller. The Blue Socket approach is to separate the data plane and the control plane in a fashion somewhat similar to the approach taken by the Cisco Nexus 1000V.

    The data plane is distributed to the access points and the control plane is located in a centralized computer. For this approach to be successful, you have to be able to minimize the amount of traffic that flows over the LAN or the WAN to the centralized controller. The really interesting thing that Blue Socket does is that they virtualize the controller software and hence sell a virtual wireless LAN. Virtualizing the controller has a number of benefits, including reducing the acquisition cost and making it easier to add capacity as needed.

    Serious Linux Kernel security hole

    Linux has security problems like any other operating system. Most of them aren't that big a deal though. Many of the more serious ones require local user access to cause any real trouble, and except for Linux desktop users that's not a real concern. The latest Linux security problem with Reliable Datagram Sockets (RDS) though is a real problem.

    RDS is an Oracle creation. It's used for sending multiple messages from a single network socket to multiple end-points. The point of RDS is that you can use it to keep inter-process communication (IPC) going without timeouts when a system is running under very heavy loads. Thus, you're most likely to be using RDS if you're running a mission-critical DBMS server or a Linux, Apache, MySQL, PHP/Python/Perl (LAMP) stack application.

    VSR Security, the company that found the security hole, reports that the Linux kernel, starting from 2.6.30 (the first release to include RDS), could be attacked by almost any user in a way that would let them become the super-user, aka root. In short, someone coming in over an Internet connection could, in theory, take over a Linux server. This is not good.

    The core problem was that the "kernel functions responsible for copying data between kernel and user space failed to verify that a user-provided address actually resided in the user segment, [thus] a local attacker could issue specially crafted socket function calls to write arbitrary values into kernel memory. By leveraging this capability, it is possible for unprivileged users to escalate privileges to root."

    I don't know if it will do that, but I was able to use the exploit code to knock out a SUSE Linux server in my lab remotely. Let me repeat myself: Not good. Others have reported that they've been able to use the exploit code to open up a root shell on Ubuntu 10.04.

    For the problem to hit your system you have to have RDS on. Specifically, you have to have the CONFIG_RDS kernel configuration option set. That's usually an option in most distributions rather than a default. Of course, if you really need RDS, you're probably running it on a mission critical DBMS or Web server. That's the last place you want an attack to land. The other necessary condition for an attacker to get at your server is for there to be no restrictions on unprivileged users loading packet family modules. That, I regret to say, is the default on many distributions.
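
    A quick way to check whether your kernel has RDS available and whether the module is currently loaded (file locations vary by distribution, so treat this as a sketch):

    # Is RDS compiled in or built as a module?
    grep CONFIG_RDS /boot/config-$(uname -r)
    # Is the rds module loaded right now?
    lsmod | grep rds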

    Fortunately, unlike some other operating systems I could name, security holes tend to get fixed really quickly in Linux. Linus Torvalds has already issued a fix. You can either compile a new kernel with this fix, which few people do these days, or wait for your upstream Linux distributor to issue the fix.

    In the meantime, I recommend that if you're running a Linux server and you're using RDS, you log in as root and run the following command:

    echo "alias net-pf-21 off" > /etc/modprobe.d/disable-rds

    Your server may run slower until the final fix is in, but in the meantime you'll be safe and that's the most important thing.

    Three ways of solving data migration problems

    Clustered storage represents one of the more significant trends in storage right now and for good reason. Clustered storage gives users a path to follow to avoid one of the more unpleasant realities that storage networks create: storage islands.

    In most networked storage configurations, storage islands result when companies purchase multiple storage arrays. Though these storage arrays may communicate and even replicate data between one another, the real problem surfaces when one needs to retire a storage array. At that point, users must almost always schedule an application outage for servers attached to that array since the storage array provides no native, nondisruptive way to migrate data to a new storage array.

    To date users have circumvented this in a couple of ways. One was to use volume managers on servers that have the capability to discover logical unit numbers or disks on a new storage array and then mirror the data in real time from the old to the new array. The other was to use a network-based virtualization software that resided between the server and storage array and could perform similar nondisruptive data migration functions.

    Clustered storage arrays now give companies a third option in which to perform this task nondisruptively. With clustered storage, all storage arrays in a cluster communicate with one another which allows users to copy or move data between them in real time without requiring server or application outages.

    Clustered storage arrays such as those from Isilon Systems Inc. still require companies to standardize on their products to deliver this functionality, and do not address concerns about how to manage storage arrays companies may already own. But for companies that want to simplify storage management and standardize on a specific vendor's product, clustered storage arrays provide a viable alternative that avoids a key shortcoming of storage networks.

    Data Migration in a SAP Implementation project

    Data Migration is one of the key and complex areas in an SAP implementation project. Some implementations fail simply due to the lack of a proper data migration strategy. If we are implementing SAP by replacing a source (e.g. legacy) system, the objective of data migration is to load the relevant business data currently residing in the source system into the appropriate SAP modules. The data loaded into the SAP system then needs to be validated against the source data.

    So the Data Migration process includes:

    • Extraction of data from the source system
    • Transformation of the data to SAP format
    • Loading of data into the corresponding SAP module

    Steps to be followed:

    We first need to identify which objects we are going to migrate (such as Contracts, Assets, Business Partners, etc.) and finalize the solution architecture: whether we will use an intermediate staging environment, and whether to use an ETL tool for it. We also need to finalize the strategy and approach/technique for loading each object (whether to use BAPI, IDoc, Batch Input, or Direct Input).

    The steps to be followed are:

    • Identification of Source data: Source Data Dictionary
    • Identification of Target data requirement: Target Data Dictionary
    • Finalizing the technique of data loading
    • Mapping of source to target: Mapping document with gaps
    • Resolution of data gaps: By applying business rules
    • Preparation of test plan and data: To be used for unit testing
    • Finalizing the validation and reconciliation strategy: before uploading, the data (record count, sum of specific fields) should be checked against a control file (reconciliation), and after uploading, the data loaded into the SAP tables should be validated against the source file
    • Development: Data conversion program
    • Developing the validation and reconciliation report
    • Integration Testing: Check for the completeness of the entire data flow
    • Volume testing: To check the performance
    • Transport into Production



    Get SAP add-in for QTP 8.2

    You can download the QTP 8.2 SAP add-in from the link below.

    Please specify the search keyword as SAP and choose Trial in the refine search options.

    Hope this was useful...!!

    https://h10078.www1.hp.com/cda/hpdc/display/main/index.jsp?zn=bto&cp=54_4012_100__




    Alternatively, you can go for QTP 10.0, which comes with all add-ins built in.

    You can also re-run the setup,
    click on "Modify", and then select the SAP add-in.

    PS: You need to have admin rights to perform this task.

    QTP Tips & Tricks

    Data Table

    Two Types of data tables
    Global data sheet: Accessible to all the actions
    Local data sheet: Accessible to the associated action only

    Usage:

    DataTable("Column Name",dtGlobalSheet) for Global data sheet
    DataTable("Column Name",dtLocalSheet) for Local data sheet

    If we change anything in the Data Table at run time, the data is changed only in the run-time data table. The run-time data table is accessible only through the test results. The run-time data table can also be exported using DataTable.Export or DataTable.ExportSheet, as shown below.
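
    For example (the file paths here are just placeholders):

    DataTable.Export "C:\RunTimeData.xls"
    DataTable.ExportSheet "C:\RunTimeGlobal.xls", "Global"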

    How can I save the changes to my DataTable in the test itself?

    Well, QTP does not provide anything for saving the run-time changes back to the actual data sheet. The only workaround is to share the spreadsheet and then access it using the Excel COM APIs.


    How can I check whether a parameter exists in the DataTable or not?

    The best way would be to use the below code:
    code:
    on error resume next
    val=DataTable("ParamName",dtGlobalSheet)
    if err.number<> 0 then
    'Parameter does not exist
    else
    'Parameter exists
    end if


    How can I make some rows colored in the data table?
    Well, you can't do it normally, but you can use the Excel COM APIs to do the same. The code below explains some aspects of the Excel COM APIs:
    code:
    Set xlApp=Createobject("Excel.Application")
    set xlWorkBook=xlApp.workbooks.add
    set xlWorkSheet=xlWorkBook.Worksheets.Add
    xlWorkSheet.Range("A1:B10").interior.colorindex = 34 'Change the color of the cells
    xlWorkSheet.Range("A1:A10").value="text" 'Will set values of all 10 rows to "text"
    xlWorkSheet.Cells(1,1).value="Text" 'Will set the value of first row and first col

    rowsCount=xlWorkSheet.Evaluate("COUNTA(A:A)") 'Will count the # of rows which have non blank value in the column A
    colsCount=xlWorkSheet.Evaluate("COUNTA(1:1)") 'Will count the # of non blank columns in 1st row

    xlWorkbook.SaveAs "C:\Test.xls"
    xlWorkBook.Close
    Set xlWorkSheet=Nothing
    Set xlWorkBook=Nothing
    set xlApp=Nothing


    SMART Identification

    Smart Identification is an algorithm used by QTP when it is not able to recognize one of the objects. A generic example, as per the QTP manual: take a photograph of an 8-year-old girl and boy, and let QTP record the identification properties of the girl when she was 8. When both are 10 years old, QTP would not be able to recognize the girl by those properties. But something is still the same: there is only one girl in the photograph. So it is a kind of PI (programmed intelligence), not AI.

    When should I use SMART Identification?

    This is something people don't think about too much, but you should disable SI while creating your test cases, so that you are able to recognize the objects that are dynamic or inconsistent in their properties. Once the script has been created, SI should be enabled so that the script does not fail in case of small changes. However, the developer of the script should always check the test results to verify whether the SI feature was used to identify an object or not. Sometimes SI needs to be disabled for particular objects in the OR; this is advisable when you use SetTOProperty to change any of the TO properties of an object, and especially ordinal identifiers like index, location and creationtime.


    Descriptive Programming

    Descriptive programming is a technique by which operations can be performed on AUT objects that are not present in the OR. For more details refer to http://bondofus.tripod.com/QTP/DP_in_QTP.doc (right click and use Save As...).
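
    A minimal sketch of the property:=value syntax (the object properties and values here are purely illustrative, not taken from any particular application):

    Browser("title:=.*Search.*").Page("title:=.*Search.*").WebEdit("name:=q").Set "descriptive programming"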


    Recovery Scenarios

    What is a Recovery Scenario?

    A recovery scenario gives you an option to take some action to recover from a fatal error in the test. The errors can range from occasional to typical. An occasional error would be something like an "Out of paper" popup while printing, while typical errors would be like "object is disabled" or "object not found". A test case can have more than one scenario associated with it, along with the priority or order in which they should be checked.


    What does a Recovery Scenario consist of?

    Trigger: the trigger is the cause for initiating the recovery scenario. It could be any popup window, any test error, a particular state of an object, or any application error.

    Action: the action defines what needs to be done if the scenario has been triggered. It can consist of a mouse/keyboard event, closing the application, calling a recovery function defined in a library file, or restarting Windows. You can have a series of all the specified actions.

    Post-recovery operation: this basically defines what needs to be done after the recovery action has been taken. It could be to repeat the step, move to the next step, etc.


    When to use a Recovery Scenario and when to use On Error Resume Next?

    Recovery scenarios are used when you cannot predict at what step the error can occur, or when you know that the error won't occur in your QTP script but could occur in the world outside QTP; again, the example would be "out of paper", as this error is caused by the printer device driver. "On Error Resume Next" should be used when you know an error is expected and don't want to raise it; you may want to take different actions depending on the error that occurred. Use Err.Number and Err.Description to get more details about the error.



    Library Files or VBScript Files
    How do we associate a library file with a test?

    Library files are files containing normal VBScript code. A file can contain functions, sub procedures, classes, etc. You can also use the ExecuteFile function to include a file at run time. To associate a library file with your script, go to Test > Settings... and add your library file on the Resources tab.


    When to associate a library file with a test and when to use ExecuteFile?

    When we associate a library file with the test, all the functions within that library are available to all the actions present in the test. But when we use the ExecuteFile function to load a library file, the functions are available only in the action that called ExecuteFile. By associating a library with a test we can share variables across actions (global variables, basically); association also makes it possible to execute code as soon as the script runs, because while loading the script on startup QTP executes all the code in the global scope. We can use ExecuteFile in a library file associated with the test to load dynamic files, and they will then be available to all the actions in the test, as in the example below.
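
    For example, a library associated on the Resources tab could itself pull in further code at run time (the path below is hypothetical):

    ExecuteFile "C:\QTP\Libs\CommonFunctions.vbs"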




    Add-ins

    Test and Run-time Object
    What is the difference between Test Objects and Run Time Objects?

    Test objects are the basic, generic objects that QTP recognizes. A run-time object is the actual object in the application to which a test object maps.

    Can I change the properties of a test object?
    Yes. You can use SetTOProperty to change the test object properties. It is recommended that you switch off Smart Identification for the object on which you use the SetTOProperty function.

    Can I change the properties of a run-time object?
    No (but also yes). You can use GetROProperty("outerText") to get the outerText of an object, but there is no SetROProperty function to change it. However, you can use WebElement(...).Object.outerText = "Something" to change the property through the native object, as sketched below.
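
    A small sketch of these calls (the object names in the repository path are illustrative):

    ' Read a property of the run-time object
    txt = Browser("MyApp").Page("Home").WebElement("Banner").GetROProperty("outerText")
    ' Change a test-object property (an ordinal identifier in this case)
    Browser("MyApp").Page("Home").WebElement("Banner").SetTOProperty "index", 0
    ' Change the run-time object directly through the native object
    Browser("MyApp").Page("Home").WebElement("Banner").Object.outerText = "Something"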


    Action & Functions
    What is the difference between an Action and a function?

    An Action is specific to QTP, while functions are a generic feature of VBScript. An Action can have an object repository associated with it, while a function can't. A function is just lines of code with some (or no) parameters and a single return value, while an Action can have more than one output parameter.


    Where to use a function or an action?

    Well, the answer depends on the scenario. If you want to use the OR feature then you have to go for an Action. If the functionality is not really an automation script, for example a routine to get the string between two specific characters, then it is not specific to QTP and can be done in pure VBScript, so it should be written as a function and not an action. Code specific to QTP can also be put into a function using DP. The decision between a function and an action ultimately depends on what one is comfortable using in a given situation.



    Checkpoint & Output value
    What is a checkpoint?

    A checkpoint is basically a point in the test which validates the truthfulness of a specific thing in the AUT. There are different types of checkpoints depending on the type of data that needs to be tested in the AUT: it can be text, image/bitmap, attributes, XML, etc.

    What's the difference between a checkpoint and an output value?
    A checkpoint only checks a specific attribute of an object in the AUT, while an output value can output that attribute's value to a column in the data table.

    How can I check if a checkpoint passes or not?
    code:
    chk_PassFail = Browser(...).Page(...).WebEdit(...).Check (Checkpoint("Check1"))
    if chk_PassFail then
    MsgBox "Check Point passed"
    else
    MsgBox "Check Point failed"
    end if

    My test fails due to a checkpoint failing. Can I validate a checkpoint without my test failing due to the checkpoint failure?
    code:
    Reporter.Filter = rfDisableAll 'Disables all the reporting stuff
    chk_PassFail = Browser(...).Page(...).WebEdit(...).Check (Checkpoint("Check1"))
    Reporter.Filter = rfEnableAll 'Enable all the reporting stuff
    if chk_PassFail then
    MsgBox "Check Point passed"
    else
    MsgBox "Check Point failed"
    end if

    Environment
    How can I import environment variables from a file on disk?

    Environment.LoadFromFile "C:\Env.xml"

    How can I check whether an environment variable exists or not?
    When we use Environment("Param1").Value, QTP expects the environment variable to be already defined. But when we use Environment.Value("Param1"), QTP will create a new internal environment variable if it does not exist already. So, to be sure that the variable exists in the environment, use Environment("Param1").Value, as in the sketch below.
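
    Following the same pattern as the DataTable check above, a defensive sketch (the variable name is just an example):

    On Error Resume Next
    chkVal = Environment("Param1").Value
    If Err.Number <> 0 Then
        'Environment variable does not exist
    Else
        'Environment variable exists
    End If
    On Error Goto 0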


    How to connect to a database?

    code:

    Const adOpenStatic = 3
    Const adLockOptimistic = 3
    Const adUseClient = 3
    Set objConnection = CreateObject("ADODB.Connection")
    Set objRecordset = CreateObject("ADODB.Recordset")
    'Fill in the UID and PWD (and any other attributes) for your own database
    objConnection.Open "DRIVER={Microsoft ODBC for Oracle};UID=;PWD="
    objRecordset.CursorLocation = adUseClient
    objRecordset.CursorType = adOpenStatic
    objRecordset.LockType = adLockOptimistic
    objRecordset.Source = "select field1,field2 from testTable"
    objRecordset.ActiveConnection = objConnection
    objRecordset.Open 'This will execute your query
    If objRecordset.RecordCount > 0 Then
        Field1 = objRecordset("field1").Value
        Field2 = objRecordset("field2").Value
    End If
    objRecordset.Close
    objConnection.Close


    Saturday, October 30, 2010

    BI - Business Intelligence in Data warehouse

    Business Intelligence is a term introduced by Howard Dresner of Gartner Group in 1989. He described Business Intelligence as a set of concepts and methodologies to improve decision making in business through the use of facts and fact-based systems. Over time, as the use of Business Intelligence has become mainstream, more definitions of Business Intelligence have emerged. Business intelligence (BI) is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. BI applications include the activities of decision support, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.


    Business intelligence applications can be:

    * Mission-critical and integral to an enterprise's operations or occasional to meet a special requirement
    * Enterprise-wide or local to one division, department, or project
    * Centrally initiated or driven by user demand

    ETL - Extraction, Transformation and Loading in Data Warehousing

    The extract-transform-load (ETL) system, or more informally, the "back room," is often estimated to consume 70 percent of the time and effort of building a data warehouse. But there hasn't been enough careful thinking about just why the ETL system is so complex and resource intensive. Everyone understands the three letters: You get the data out of its original source location (E), you do something to it (T), and then you load it (L) into a final set of tables for the users to query.

    When asked about breaking down the three big steps, many designers say, "Well, that depends." It depends on the source, it depends on funny data idiosyncrasies, it depends on the scripting languages and ETL tools available, it depends on the skills of the in-house staff, and it depends on the query and reporting tools the end users have.

    The "it depends" response is dangerous because it becomes an excuse to roll your own ETL system, which in the worst-case scenario results in an undifferentiated spaghetti-mess of tables, modules, processes, scripts, triggers, alerts, and job schedules. Maybe this kind of creative design approach was appropriate a few years ago when everyone was struggling to understand the ETL task, but with the benefit of thousands of successful data warehouses, a set of best practices is ready to emerge.

    I have spent the last 18 months intensively studying ETL practices and ETL products. I have identified a list of 38 subsystems that are needed in almost every data warehouse back room. That's the bad news. No wonder the ETL system takes such a large fraction of the data warehouse resources. But the good news is that if you study the list, you'll recognize almost all of them, and you'll be on the way to leveraging your experience in each of these subsystems as you build successive data warehouses.

    The 38 Subsystems

    1. Extract system. Source data adapters, push/pull/dribble job schedulers, filtering and sorting at the source, proprietary data format conversions, and data staging after transfer to ETL environment.

    2. Change data capture system. Source log file readers, source date and sequence number filters, and CRC-based record comparison in ETL system.

    3. Data profiling system. Column property analysis including discovery of inferred domains, and structure analysis including candidate foreign key to primary key relationships, data rule analysis, and value rule analysis.

    4. Data cleansing system. Typically a dictionary driven system for complete parsing of names and addresses of individuals and organizations, possibly also products or locations. "De-duplication" including identification and removal usually of individuals and organizations, possibly products or locations. Often uses fuzzy logic. "Surviving" using specialized data merge logic that preserves specified fields from certain sources to be the final saved versions. Maintains back references (such as natural keys) to all participating original sources.

    5. Data conformer. Identification and enforcement of special conformed dimension attributes and conformed fact table measures as the basis for data integration across multiple data sources.

    6. Audit dimension assembler. Assembly of metadata context surrounding each fact table load in such a way that the metadata context can be attached to the fact table as a normal dimension.

    7. Quality screen handler. In line ETL tests applied systematically to all data flows checking for data quality issues. One of the feeds to the error event handler (see subsystem 8).

    8. Error event handler. Comprehensive system for reporting and responding to all ETL error events. Includes branching logic to handle various classes of errors, and includes real-time monitoring of ETL data quality

    9. Surrogate key creation system. Robust mechanism for producing stream of surrogate keys, independently for every dimension. Independent of database instance, able to serve distributed clients.

    10. Slowly Changing Dimension (SCD) processor. Transformation logic for handling three types of time variance possible for a dimension attribute: Type 1 (overwrite), Type 2 (create new record), and Type 3 (create new field).

    11. Late arriving dimension handler. Insertion and update logic for dimension changes that have been delayed in arriving at the data warehouse.

    12. Fixed hierarchy dimension builder. Data validity checking and maintenance system for all forms of many-to-one hierarchies in a dimension.

    13. Variable hierarchy dimension builder. Data validity checking and maintenance system for all forms of ragged hierarchies of indeterminate depth, such as organization charts, and parts explosions.

    14. Multivalued dimension bridge table builder. Creation and maintenance of associative (bridge) table used to describe a many-to-many relationship between dimensions. May include weighting factors used for allocations and situational role descriptions.

    15. Junk dimension builder. Creation and maintenance of dimensions consisting of miscellaneous low cardinality flags and indicators found in most production data sources.

    16. Transaction grain fact table loader. System for updating transaction grain fact tables including manipulation of indexes and partitions. Normally append mode for most recent data. Uses surrogate key pipeline (see subsystem 19).

    17. Periodic snapshot grain fact table loader. System for updating periodic snapshot grain fact tables including manipulation of indexes and partitions. Includes frequent overwrite strategy for incremental update of current period facts. Uses surrogate key pipeline (see subsystem 19).

    18. Accumulating snapshot grain fact table loader. System for updating accumulating snapshot grain fact tables including manipulation of indexes and partitions, and updates to both dimension foreign keys and accumulating measures. Uses surrogate key pipeline (see subsystem 19).

    19. Surrogate key pipeline. Pipelined, multithreaded process for replacing natural keys of incoming data with data warehouse surrogate keys.

    20. Late arriving fact handler. Insertion and update logic for fact records that have been delayed in arriving at the data warehouse.

    21. Aggregate builder. Creation and maintenance of physical database structures, known as aggregates, that are used in conjunction with a query-rewrite facility, to improve query performance. Includes stand-alone aggregate tables and materialized views.

    22. Multidimensional cube builder. Creation and maintenance of star schema foundation for loading multidimensional (OLAP) cubes, including special preparation of dimension hierarchies as dictated by the specific cube technology.

    23. Real-time partition builder. Special logic for each of the three fact table types (see subsystems 16, 17, and 18) that maintains a "hot partition" in memory containing only the data that has arrived since the last update of the static data warehouse tables.

    24. Dimension manager system. Administration system for the "dimension manager" who replicates conformed dimensions from a centralized location to fact table providers. Paired with subsystem 25.

    25. Fact table provider system. Administration system for the "fact table provider" who receives conformed dimensions sent by the dimension manager. Includes local key substitution, dimension version checking, and aggregate table change management.

    26. Job scheduler. System for scheduling and launching all ETL jobs. Able to wait for a wide variety of system conditions including dependencies of prior jobs completing successfully. Able to post alerts.

    27. Workflow monitor. Dashboard and reporting system for all job runs initiated by the Job Scheduler. Includes number of records processed, summaries of errors, and actions taken.

    28. Recovery and restart system. Common system for resuming a job that has halted, or for backing out a whole job and restarting. Significant dependency on backup system (see subsystem 36).

    29. Parallelizing/pipelining system. Common system for taking advantage of multiple processors, or grid computing resources, and common system for implementing streaming data flows. Highly desirable (eventually necessary) that parallelizing and pipelining be invoked automatically for any ETL process that meets certain conditions, such as not writing to the disk or waiting on a condition in the middle of the process.

    30. Problem escalation system. Automatic plus manual system for raising an error condition to the appropriate level for resolution and tracking. Includes simple error log entries, operator notification, supervisor notification, and system developer notification.

    31. Version control system. Consistent "snapshotting" capability for archiving and recovering all the metadata in the ETL pipeline. Check-out and check-in of all ETL modules and jobs. Source comparison capability to reveal differences between different versions.

    32. Version migration system (development to test to production). Move a complete ETL pipeline implementation out of development, into test, and then into production. Interface to the version control system to back out a migration. Single interface for setting connection information for an entire version. Independence from database location for surrogate key generation.

    33. Lineage and dependency analyzer. Display the ultimate physical sources and all subsequent transformations of any selected data element, chosen either from the middle of the ETL pipeline, or chosen on a final delivered report (lineage). Display all affected downstream data elements and final report fields affected by a potential change in any selected data element, chosen either in the middle of the ETL pipeline, or in an original source (dependency).

    34. Compliance reporter. Comply with regulatory statutes to prove the lineage of key reported operating results. Prove that the data and the transformations haven't been changed. Show who has accessed or changed any such data.

    35. Security system. Administer role-based security on all data and metadata in the ETL pipeline. Prove that a version of a module hasn't been changed. Show who has made changes.

    36. Backup system. Backup data and metadata for recovery, restart, security, and compliance requirements.

    37. Metadata repository manager. Comprehensive system for capturing and maintaining all ETL metadata, including all transformation logic. Includes process metadata, technical metadata, and business metadata.

    38. Project management system. Comprehensive system for keeping track of all ETL development.

    Data Warehouse Architecture

    A great listing found on it.toolbox.com

    A data warehouse architecture is primarily based on the business processes of an enterprise, taking into consideration data consolidation across the enterprise with adequate security, data modeling and organization, the extent of query requirements, metadata management and application, and warehouse staging area planning for optimum bandwidth utilization and full technology implementation.

    The Data Warehouse Architecture includes many facets. Some of these are listed as follows:

    1. Process Architecture
    2. Data Model Architecture
    3. Technology Architecture
    4. Information Architecture
    5. Resource Architecture
    6. Various Architectures

    Process Architecture
    Describes the number of stages and how data is processed to convert raw / transactional data into information for end user usage.
    The data staging process includes three main areas of concern, or sub-processes, for planning the data warehouse architecture, namely “Extract”, “Transform” and “Load”.

    These interrelated sub-processes are sometimes referred to as an “ETL” process.

    1) Extract - Since data for the data warehouse can come from different sources and may be of different types, the plan to extract the data, along with appropriate compression and encryption techniques, is an important requirement for consideration.

    2) Transform - Transformation of data with appropriate conversion, aggregation and cleaning, besides de-normalization and surrogate key management, is also an important process to be planned for building a data warehouse.

    3) Load - The steps to load the data, optimized by considering the multiple areas where the data is targeted to be loaded and retrieved, are also an important part of the data warehouse architecture plan.

    Data Model Architecture
    In Data Model Architecture (also known as Dimensional Data Model), there are 3 main data modeling styles for enterprise warehouses:
    1. 3rd Normal Form - Top Down Architecture, Top Down Implementation
    2. Federated Star Schemas - Bottom Up Architecture, Bottom Up Implementation
    3. Data Vault - Top Down Architecture, Bottom Up Implementation


    Technology Architecture
    Scalability and flexibility are required in all facets. The extent of these features largely depends upon organizational size, business requirements, the nature of the business, etc.
    The technology (or technical) architecture is primarily derived from the process architecture, metadata management requirements based on business rules, security-level implementations, and technology-specific tool evaluation.
    Besides these, the Technology architecture also looks into the various technology implementation standards in database management, database connectivity protocols (ODBC, JDBC, OLE DB etc), Middleware (based on ORB, RMI, COM/DOM etc.), Network protocols (DNS, LDAP etc) and other related technologies.


    Information Architecture
    Information Architecture is the process of translating the information from one form to another in a step by step sequence so as to manage the storage, retrieval, modification and deletion of the data in the data warehouse.


    Resource Architecture
    Resource architecture is related to software architecture in that many resources come from software resources. Resources are important because they help determine performance. Workload is the other part of the equation. If you have enough resources to complete the workload in the right amount of time, then performance will be high. If there are not enough resources for the workload, then performance will be low.


    Various Architectures
    Please notice that among the different architectures there is one that stands out: Data Model Architecture. What is happening in the integration industry at large is that the ability to integrate information across the enterprise is becoming dependent on the quality of the data model architecture underneath.
    The ability to be compliant, consistent and repeatable depends on how the data model is built under the covers.
    There are 3 main data modeling styles for enterprise warehouses:
    3rd Normal Form - Top Down Architecture, Top Down Implementation
    Federated Star Schemas - Bottom Up Architecture, Bottom Up Implementation
    Data Vault - Top Down Architecture, Bottom Up Implementation
    You can read more about the Data Vault by searching for "Data Vault Data Model" on the web.
    The point of data warehousing architecture is that it is not JUST a data warehouse anymore. It is now a full-scale data integration platform, including right-time (real-time) data, and batch or strategic data sets, in a single, auditable (and integrated) data store.

    Requirements of a Data Warehouse

    The data warehouse must make an organization’s information easily accessible.

    The contents of the data warehouse must be understandable. The data must be intuitive and obvious to the business user, not merely the developer. Understandability implies legibility; the contents of the data warehouse need to be labeled meaningfully. Business users want to separate and combine the data in the warehouse in endless combinations, a process commonly referred to as slicing and dicing. The tools that access the data warehouse must be simple and easy to use. They also must return query results to the user with minimal wait times.


    The data warehouse must present the organization’s information consistently.

    The data in the warehouse must be credible. Data must be carefully assembled from a variety of sources around the organization, cleansed, quality assured, and released only when it is fit for user consumption. Information from one business process should match with information from another. If two performance measures have the same name, then they must mean the same thing. Conversely, if two measures don’t mean the
    same thing, then they should be labeled differently. Consistent information means high-quality information. It means that all the data is accounted for and complete. Consistency also implies that common definitions for the contents of the data warehouse are available for users.


    The data warehouse must be adaptive and resilient to change.

    We simply can’t avoid change. User needs, business conditions, data, and technology are all subject to the shifting sands of time. The data warehouse must be designed to handle this inevitable change. Changes to the data warehouse should be graceful, meaning that they don’t invalidate existing data or applications. The existing data and applications should not be changed or disrupted when the business community asks new questions or new data is added to the warehouse. If descriptive data in the warehouse is modified, we must account for the changes appropriately.


    The data warehouse must be a secure bastion that protects our information assets.

    An organization’s informational crown jewels are stored in the data warehouse. At a minimum, the warehouse likely contains information about what we’re selling to whom at what price—potentially harmful details in the hands of the wrong people. The data warehouse must effectively control access to the organization’s confidential information.


    The data warehouse must serve as the foundation for improved decision making.

    The data warehouse must have the right data in it to support decision making. There is only one true output from a data warehouse: the decisions that are made after the data warehouse has presented its evidence. These decisions deliver the business impact and value attributable to the warehouse. The original label that predates the data warehouse is still the best description of what we are designing: a decision support system.


    The business community must accept the data warehouse if it is to be deemed successful.

    It doesn’t matter that we’ve built an elegant solution using best-of-breed products and platforms. If the business community has not embraced the data warehouse and continued to use it actively six months after training, then we have failed the acceptance test. Unlike an operational system rewrite, where business users have no choice but to use the new system, data warehouse usage is sometimes optional. Business user acceptance has more to do with simplicity than anything else.


    As this list illustrates, successful data warehousing demands much more than being a stellar DBA or technician. With a data warehousing initiative, we have one foot in our information technology (IT) comfort zone, while our other foot is on the unfamiliar turf of business users. We must straddle the two, modifying some of our tried-and-true skills to adapt to the unique demands of data warehousing. Clearly, we need to bring a bevy of skills to the party to behave like we’re a hybrid DBA/MBA.

    SAS DATA Step

    A SAS program is composed of two parts: data steps that deal with data cleaning and data format, and procedures that perform required statistical analyses and/or graphically present the results. Data steps are important for several reasons. First, the dataset may not be in a SAS compatible format, although this is usually not the case for the datasets in class examples or exercises. Second, sometimes you need to extract some of the variables or some of the observations from the dataset to perform analysis. Third, different procedures may require the same dataset in different format. A data step is needed to transform the dataset into the appropriate format for a procedure.

    Mathematical operations are listed in the following table:
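
    +    addition
    -    subtraction
    *    multiplication
    /    division
    **   exponentiation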


    Manipulating variables in a data step (recoding, if/then statements)

    To illustrate the data manipulation, let’s take a sample data set:

    data a1;
    input gender $ age weight;
    cards;
    M 13 143
    M 16 132
    F 19 140
    M 20 120
    M 15 110
    F 18 95
    F 22 105
    ;

    Suppose you want a data set of females only. The following SAS code will create a new data set called aa and keep those observations whose value for the variable gender is not ‘M’. The set a1 statement after the data aa statement tells SAS to make a copy of the dataset a1 and save it as aa. The if/then statement deletes the observations in dataset aa whose gender variable has the value ‘M’. Quotation marks are used around M because gender is a categorical variable. The dollar sign ($) is used when you have a text variable rather than a numerical variable (i.e., gender coded as M, F rather than as 1 denoting male and 2 denoting female).

    data aa;
    set a1;
    if gender eq 'M' then delete;
    run;

    or equivalently:

    data aa;
    set a1;
    if gender eq 'F';
    run;

    If you want to include those who are 16 years or older, you can do:

    data ab;
    set a1;
    if age lt 16 then delete;
    run;

    You can also select variables from a dataset for analysis. The statement is keep or drop. For example, if you do not need the variable age in your analysis, you can do:

    data ac;
    set a1;
    drop age;
    run;

    or

    data ac;
    set a1;
    keep gender weight;
    run;

    This last statement will create a dataset that only contains the two variables specified, gender and weight.

    SAS Issues

    Save the file
    Now you are familiar with program editor window, log window, and output window. If you want to save the work you’ve done in a session, you’ll need to save the contents of each window separately. Usually, you only need to save the program; you can always run the program to get the log and output. To save a
    program file, you’ll need first to make sure the program editor is the active window, then go to file and select save command. Similarly, you can save a log file when a log window is active, or an output file when the output window is active. You do not have to run the entire program every time you make a correction to your SAS program. Each SAS procedure is relatively independent of other procedures. As long as you have the dataset you need in this procedure in SAS, you can run only part of the program by highlighting the part of the program you want to run and then clicking the run button in the tool bars.


    Missing values
    a. Dots for missing observations. If your data set has missing values, you'll need to specify them as a dot in the SAS dataset.
    b. What if the data set does not have dots? You can add a dot to the corresponding missing value locations using a data step. For example, suppose you have two variables, X and Y, in your data set, with 10 observations, and the ninth value of Y is missing. The following code with an if statement will do:

    data a2;
    set a1;
    if _n_ eq 9 then Y = .;
    run;

    c. Reading in data with @@. You've already learned that when you input your dataset after a CARDS or DATALINES statement, every observation needs to be on an individual line. If you want to make better use of the window and have more than one observation per line, add a trailing @@ at the end of the input statement; it tells SAS to hold the current line and keep reading observations from it. For example:

    data b1;
    input x y z @@;
    cards;
    1.1 2.2 3.3 4.4 5.5 6.5
    ;
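    Since the input statement lists three variables (x, y, z), SAS reads two observations from the single data line above. A quick proc print confirms this:

    proc print data=b1;
    run;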

    It may be that your variables are data strings instead of numbers, for example gender or disease type. We call these variables categorical. In this case, SAS wants you to specify which variables are categorical by adding a $ sign right after the name of the variable in the input statement. Sample code follows:

    data b1;
    input state $ county $ name $ gender $ weight;
    cards;
    indiana tipp brown female 125
    ;

    d. What if my Excel data file is not read properly into SAS, or not at all? If the Excel data file is not read into SAS at all, it is most likely because the Excel file is open; it must be closed before you import it into SAS. There are other reasons an Excel data file may not be read in properly: the data type of your Excel cells may not be correctly defined, or the first row may not contain a header, since the import procedure takes the first row as the header by default. However, this can be changed during the import procedure under Options.
    How do you know if SAS is reading your dataset correctly? Use the proc print procedure and see if the dataset in SAS is what you expected.
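    If you prefer code over the import wizard, the same import can be sketched with proc import. The file path, sheet name, and output dataset name below are hypothetical:

    proc import datafile="C:\mydata\students.xls"  /* hypothetical path to the Excel file */
        out=work.students                          /* hypothetical name for the SAS dataset */
        dbms=xls
        replace;
        sheet="Sheet1";                            /* hypothetical sheet name */
        getnames=yes;                              /* treat the first row as column headers */
    run;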

    Exporting to Excel, Access, or SPSS (.txt, .xls, .prn)
    Exporting a data set to Excel is the opposite procedure of the import process. If you go to FILE and then select EXPORT DATA, an export wizard window pops up. Then just follow the wizard through the following steps.
    Step 1: Choose a data set that you created in the WORK library (where the SAS datasets are stored automatically by SAS). Click next button when you are done.
    Step 2: Choose the file type you want to export to. Available types include Excel, Access, dBase, delimited file, and many others. Choose Excel 2000 and then click next.
    Step 3: Type in the directory path where you want to save your data file. If you are not sure of the path, click the Browse button and find the location. At this point, you may click the OK button to finish the export. To export the data to Access or SPSS, the procedure is basically the same except that you choose the Access or SPSS file type in step 2 above.
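    Similarly, the export wizard can be replaced by a proc export step; the output path below is hypothetical:

    proc export data=work.a1               /* the dataset created earlier in the WORK library */
        outfile="C:\mydata\a1.xls"         /* hypothetical location for the Excel file */
        dbms=xls
        replace;
    run;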

    How to use the help menu
    The SAS help menu is helpful if you want to improve your knowledge of SAS procedures on your own. There are two ways of getting SAS help. One is to go to the Help menu and then SAS System Help; go to the Index tab and type in the name of the procedure, and SAS will give you the syntax of the procedure as well as some examples. If you have a specific question, you can use the Search tab and type in the key word of your question. The other way of getting help is to go to Books and Training and then Online Doc. The online documentation is easier to browse.

    SAS Procedure

    A SAS program is composed of one or more (statistical) procedures. Each procedure is a unit, although some are needed to run others. Some often-used procedures for statistical analysis are explained in detail.

    Proc print
    The output of this procedure is the data set that you specified by writing data=dataname option after the print key word. This data= option is common for almost every SAS procedure. It is a good habit to use this option all the time so that you know with which dataset you are working. This is helpful especially when there are multiple datasets, which is usually the case when you are performing statistical analysis using SAS. Here’s an example of how proc print works. In the data step section, we created a data set called a1 with three variables (gender, age, weight), and seven observations. It’s a good idea to always check if SAS has read in your dataset correctly before performing any analyses on the data.

    proc print data=a1;
    run;

    If you highlight this section of code and click on the run button, you’ll see the dataset in the output window
    as follows:

    Obs gender age weight
    1 M 13 143
    2 M 16 132
    3 F 19 140
    4 M 20 120
    5 M 15 110
    6 F 18 95
    7 F 22 105

    If you want to see only some variables in the data set, you could add a statement after the proc print line in the format of var gender age;. This would generate output similar to that shown above except the weight variable would not be included.
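    For example, the following requests only gender and age from dataset a1:

    proc print data=a1;
    var gender age;
    run;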


    Proc univariate
    It is one of the most important procedures for elementary statistical analysis. It outputs the basic statistics of one or more variables, and has optional statements to generate qqplots and histograms. Sample code follows:

    proc univariate data=a1;
    var weight;
    qqplot;
    histogram;
    run;

    The var statement is optional. Without this statement, a univariate analysis is performed for all numeric variables in the order they appear in the dataset.


    Proc capability
    It has a variety of functions including creating a normal qq plot, histogram, and probability plots, although it is often used to create a normal qq plot in elementary statistical analysis. A normal qq plot and a histogram can be created using the code in the univariate example, just replacing univariate with
    capability.
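    For example, a normal qq plot and a histogram of weight can be requested with:

    proc capability data=a1;
    var weight;
    qqplot;
    histogram;
    run;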


    Proc sort
    Proc sort sorts the observations in a dataset by one or more variables in either ascending or descending order. For example:

    proc sort data=a1 out=a2;
    by gender;
    run;

    The observations of dataset a1 are sorted in ascending order (the default) of the variable gender, and the sorted data is saved in a dataset named a2. Without the out=a2 option, the original dataset a1 is replaced by the sorted dataset. You can also sort the observations in descending order of some variable by adding the descending keyword before that variable in the by statement, e.g. by descending gender. If you need to sort by more than one variable, list all the variables in the by statement. For example, by gender age will sort in ascending order by gender, and then the observations with the same gender value will be sorted in ascending order by the values of age.
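    For example, to sort a1 by gender in descending order and then by age in ascending order within each gender (the output dataset name a4 is only illustrative):

    proc sort data=a1 out=a4;
    by descending gender age;
    run;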


    Proc means
    This procedure produces simple univariate descriptive statistics for numeric variables. It also calculates confidence limits for the mean, and identifies extreme values and quartiles. Here’s an example for mean and its confidence limit calculation:

    proc means data=a2 alpha=0.05 clm mean median n min max;
    run;

    The mean, median, sample size, minimum value, maximum value, and 95% confidence limits of the mean will be computed for the variables age and weight. The alpha option specifies the confidence level for the confidence limits, and clm tells SAS to calculate the confidence interval of the mean. Since gender is a categorical variable, no mean will be computed for it.

    If you have a lot of variables and you only want to calculate the mean for some of them, use the var option and list the variables after the keyword var. If you want the means of the variables by group, use the by option. For example,

    proc means data=a2 alpha=0.05 clm mean;
    var weight;
    by gender;
    run;

    tells SAS to compute the mean and confidence interval of weight for each value of gender, i.e. male and female. If the by statement is used, the observations need to be sorted by the same variable before the proc means procedure. Note that a2, the sorted dataset, was used in our proc means example.


    Proc summary
    It computes descriptive statistics on numeric variables in a SAS dataset and outputs the results to a new SAS dataset. The syntax of proc summary is the same as that of proc means. An example follows:

    proc summary data=a2 print;
    var weight;
    by gender;
    output out=a3;
    run;

    Proc summary will not run without either the print option or the output statement.
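    As a sketch, the output statement can also name the statistics to keep; the dataset name a5 and the variable names mean_weight and n_obs are only illustrative:

    proc summary data=a2;
    var weight;
    by gender;
    output out=a5 mean=mean_weight n=n_obs;  /* one row per gender with the mean and count of weight */
    run;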


    Proc corr
    This procedure is used for calculating the correlation between numeric variables. For example, the Pearson correlation coefficient and its P-value can be computed.

    proc corr data=a1;
    var age weight;
    run;

    A correlation coefficient matrix is created:
    Pearson Correlation Coefficients, N = 7
    Prob > |r| under H0: Rho=0

                  age       weight
    age       1.00000     -0.43017
                            0.3354
    weight   -0.43017      1.00000
               0.3354

    The correlation coefficient between age and weight in this example is -0.43017, and 0.3354 is the P-value for testing the null hypothesis that the coefficient is zero. In this case, the P-value is greater than 0.05, and the null hypothesis of zero coefficient cannot be rejected.

    Proc glm
    It performs simple and multiple regression, analysis of variance (ANOVA), analysis of covariance, multivariate analysis of variance, and repeated measures analysis of variance.

    proc glm data=a1;
    model weight=age;
    output out=a3 p=pred r=resid;
    run;

    performs a simple linear regression with weight as the dependent variable and age the independent variable. The predicted values of weight (the dependent variable) and the residuals are saved in a new dataset called a3 using the output statement. For multiple regression where you have more than one independent
    variable, simply list in the model statement all the variables on the right hand side of the equal sign with one space in between, e.g.

    model weight=age height;

    In the case of ANOVA, a class statement is needed for categorical variables before the model statement. The following code is an ANOVA analyzing the effect of gender on weight. It tests whether the mean weight is the same for females and males.

    proc glm data=a1;
    class gender;
    model weight=gender;
    run;


    Proc reg

    Proc reg is a procedure for regression. It is capable of more regression tasks than proc glm. It allows multiple model statements in one procedure, can do model selection, and can even plot summary statistics and normal qq-plots. You can specify several PLOT statements for each MODEL statement, and you can specify more than one plot in each PLOT statement.

    proc reg data=a1;
    model weight=age;
    plot weight*age;
    plot predicted.*age;
    plot residual.*age;
    plot nqq.*residual.;
    run;

    In the above example, a simple regression is performed with weight as the response and age as the explanatory variable. The plot statements request four plots: weight versus age, predicted values of weight versus age, residuals versus age, and normal qq plot versus residuals. Predicted., residual., and nqq. are keywords that SAS recognizes. Make sure you keep a dot after the word.

    SAS Introduction

    What is SAS?
    SAS is a statistical software package that allows the user to manipulate and analyze data in many different ways. Because of its capabilities, this software package is used in many disciplines (not just statistics!), including the medical sciences, biological sciences, and social sciences.

    After you open SAS, you will see three windows,
    1.Program editor
    2.Explorer window
    3.Log Window

    Also, there is an output window that is hidden until you actually have output.

    Program Editor
    The program editor is where you will type the program that you will eventually run. It works almost exactly like Microsoft Word (you can cut, paste, move the cursor, etc.). The enhanced program editor gives you color-coded procedures, statements, and options (more on these later) that will help you find errors in your program before you even run it.

    Explorer Window
    With the explorer window, you can open/view data you have read into SAS. Click on Libraries, then the Work folder, and this will show you any datasets you have read into or created in SAS for that session.

    Log Window
    The log window will inform you of any errors in your program and the reason for the errors. This window is EXTREMELY IMPORTANT if you hope to figure out what is wrong with your program. Always check it first to see if your program ran properly! The output window is where, once you run your program from the program editor, your output appears. (Note: You can also cut, paste, etc. from the log and output windows.)

    Informatica PowerCenter 9.0 Mapping Analyst for Excel Guide

    Some of the topics covered in this tutorial are:

    1. Understanding Mapping Analyst for Excel
    2. Configuring One Mapping
    3. Configuring Multiple Mappings
    4. Importing and Exporting Mapping Specifications
    5. Customizing Mapping Specification Templates

    Download Informatica PowerCenter 9.0 Mapping Analyst for Excel Guide

    How to implement a nested query in INFORMATICA

    You can use it in a Source Qualifier or Lookup SQL override depending on your requirements. I slightly changed the SQL:

    select nvl(a.RATING,'NULL') vRating
    from W_MBS_AM_BLDG_INSP_HDR_F a
    where (a.W_UPDATE_DT, a.INSPECTION_DATE_WID) =
          (select max(b.W_UPDATE_DT), max(b.INSPECTION_DATE_WID)
           from W_MBS_AM_BLDG_INSP_HDR_F b
           where b.PROJECT_WID = a.PROJECT_WID)

    How to implement a nested query in INFORMATICA

    Based on the syntax, I am assuming that you picked this code up from a PL/SQL code block and are trying to reuse it in Informatica.

    While you can use the exact query in the source qualifier (after removing the "INTO" clause), I'd suggest you rewrite it as follows so that the query is more efficient, the code speaks for itself, and you avoid so many references to the same table.

    Select max(nvl(a.RATING,'NULL')) keep
    (DENSE_RANK FIRST ORDER BY a.W_UPDATE_DT desc,
    a.INSPECTION_DATE_WID desc) rating
    from W_MBS_AM_BLDG_INSP_HDR_F a
    where a.PROJECT_WID = vPROJECT_WID;

    @Nick,

    If they are interested in seeing the latest rating for a given project very frequently, then a view would make sense. Hopefully that is what you meant....
    Otherwise, creating a view just for this one case, and without project_wid in the view definition, would be far less useful.

    Thursday, September 16, 2010

    Cognos 8 Business Intelligence installation and set up guide

    This tutorial shows a few easy steps to get started with cognos 8 BI.
    The easiest and quickest installation process goes as follows:

    1. Install Cognos 8 BI Server and a service pack if available – this is the core of cognos BI system.
    2. Choose a database server which will be used as a content store. Cognos 8 supports a wide range of database engines.
    3. Start cognos configuration and set up a connection to the database which will serve as a content store. This is usually the most important option to set up.
      Configuration for Microsoft SQL Server as the Cognos content store:
    4. Adjust other options if necessary, save configuration and test the content store connection.
    5. Start the cognos 8 service using Cognos Configuration.
    6. Now it is required to configure the Web server. Microsoft IIS web server is used in the following tutorial.
      • Web server configuration steps
      • Create the following virtual directories:
        cognos8 - c8_location/webcontent with read permission
        cognos8/cgi-bin - c8_location/cgi-bin with execute permissions
      • It is recommended to use cognos8 as the alias name. Otherwise the Gateway URI option will need to be changed in cognos configuration.
      • A sample IIS configuration is depicted below:
        Configuration of Cognos 8 on Microsoft IIS:
    A correctly installed and configured Cognos 8 BI installation shows the following welcome page in a web browser on the http://localhost/cognos8/ web address.
    Properly configured Cognos 8 instance:
    Once the screen shown above appears, the IBM Cognos 8 BI is set up correctly and ready to use.
    We suggest you start with publishing some Cognos examples using Framework Manager (gosalesdw - great outdoors sales data warehouse) and then play around with Cognos Query Studio, Analysis Studio and then Report Studio.

    Siebel Configuration – An Overview

    If Siebel could be visualized as a human body, then configuration would be the heart of that body. Without configuration Siebel cannot work, and configuration makes up about 70% of Siebel development.

    Siebel configuration is done using an IDE provided by Siebel, i.e. Siebel Tools. If you want to be a successful Siebel professional, you need to have strong configuration skills. Configuration is mostly based on concepts and facts; once you understand the concepts and facts properly, you won't face any problems. I would like to give an example from computer networking to explain the above statement.

    “In networking, an RJ-45 jack is required for a UTP cable to connect to a system. Once we have this knowledge, it doesn't change; it doesn't matter how many times you perform that task, it is always going to remain the same. The case is similar with Siebel configuration: once you know how to create an applet, you are done. It doesn't matter how many times you create an applet, the procedure will remain the same. Though there are several ways, they all work in a predefined way, and once you know them they will always be the same.”

    Siebel configuration is a little different from Siebel scripting. Unlike scripting, in which everybody has his own unique way of approaching things, Siebel configuration works on a predefined set of rules that have been decided by Siebel. So you can be successful at Siebel configuration even if you don't have strong programming skills, but as a downside, configuration and scripting go hand in hand. So the chances that you will be working on configuration alone are pretty slim.

    Siebel configuration is one area which makes up about 50% of interview questions. So, when you want to appear for an interview, make sure you are up to date with your configuration concepts and fundamentals. The main responsibility of a Siebel configurator is to work on the business layer of Siebel and implement business logic and business rules.

    Hope this helps you to point into right direction.

    Steps for Installing Informatica 6.2

    Note: Default means Don't Change

    1. Log in as Administrator with a user ID and password that has administrator rights.
    2. Install all Informatica clients, repository servers and workflow managers.
    3. Install Oracle 8i/9i. While installing, give the global database a name and set the SID to ORCL.
    4. Create a user in Oracle.
    5. Select * from all_users;
    Create user Infodev identified by Infodev default tablespace users temporary tablespace temp;
    Grant connect, resource, select any table, insert any table, update any table, delete any table to Infodev;

    Configuring Repository Server:
    6. Note down your Computer name or IP address
    7. Open Repository management console. Then, right click on Informatica repository server, then add “New server registration” as your computer name.
    8. Right click repository => add repository (Infodev)
    9. Repository Configuration:
    Database type: Oracle
    Code page: default (Don't change)
    Repository Name: Infodev
    Connect String: ORCL
    DB User:Infodev (Created User at the Oracle Backend)
    DB password: Infodev

    The rest of the network & configuration Tabs maintain as default.

    10. Repository Server:
    Server Port number: 5001
    Administrator Password: Give Administrator Password
    Minimum Port: 5002
    11. Once you have given all this information, it will start creating Repository (OPB) tables at the Oracle user you have created.
    12. Once the repository is created, it will create a local repository and an option will be provided by Informatica to convert the local repository to a Global repository.
    13. Convert the local repository to Global repository.
    14. Now go to services. There you will find Informatica Repository server service. Start that service by giving the Administrator User ID & password.

    Informatica Server Configuration:
    15. Go to the Informatica Server setup. Install the Informatica Server and log in as Administrator, using a user ID and password with administrator rights and your computer name as the domain.
    16. In the server setup, enter all the license keys (platform key, Oracle key, ODBC key) in the Key tab.
    17. In the Server tab, you will need to enter the server name once the server is created in the Workflow Manager. However, leave that blank for the time being.
    18. Enter the TCP/IP address or computer name here, and leave the rest set to the defaults.
    19. In repository Tab
    Repository Name: Infodev
    Repository User: Infodev
    Repository password: Infodev
    Repository host name: Computer name
    Repository port number: 5001
    20. Compatibility & Database TAB: default values
    21. Configuration TAB: default values
    22. Open Server Manager, click the Add folders to Repository Server Manager.

    Steps for registering the server in the Workflow Manager:
    23. In the Workflow Manager, connect to the repository.
    24. Choose the server connection.
    25. Register a new server.
    26. Give the name as temp_server.
    27. Now server dialog box appears.
    28. Enter the following Info:
    Host Name: Your Computer name/IP address (IP address is better)
    Port Number: 4001(default)
    Code page: default
    Timeout: 60
    29. Server Variables:
    $PMRootDir: C:\Program Files\informatica\informatica PowerCenter 6.2-Server - all others should be set to the default values
    30. Now go back to the Informatica configuration. There, in the Server tab, enter the name of the server as temp_server. Click OK & Apply.
    31. Go to services and start the Informatica service. Log in as Administrator with the user ID & password.
    32. In the Workflow Manager, right-click the server and select Monitor.
    33. Now th

    HRP-2m Choromet


    The Choromet is expected to be available from General Robotics in September, at a price of less than five grand. The Choromet is about 13-3/4 inches tall and is capable of walking upright on two legs. Four companies in Japan have created this relatively low-cost, user-programmable humanoid robot targeting educational and research applications. The HRP-2m Choromet uses technology from Japan's National Institute of Advanced Industrial Science and Technology (AIST), and is user-programmable thanks to open software running on a user-space real-time Linux implementation. AIST hopes Choromet's ability to run software-based movement programs on a real-time Linux platform will enable researchers and schools to experiment with the effectiveness of humanoid robot motion pattern applications. The Choromet is based on several technologies developed by AIST, including a business-card sized SBC (single board computer) with a 240MHz SH-4 processor and 32MB of RAM, "ARTLinux," an operating system that provides a user-space real-time Linux environment, and humanoid motion application software based on OpenHRP (Humanoid Robotics Project). Some other Choromet features are: triaxial force sensors on the legs, an accelerometer and gyroscope in the trunk, and real-time sensor feedback. More info at http://linuxdevices.com/news/NS8377820601.html

    ASIMO Humanoid Robot




    ASIMO (Advanced Step in Innovative MObility) is a bipedal humanoid robot from Honda. This robot has been evolving since its inception in 1986. The current version of ASIMO is 1.2 meters tall and weighs 43 kg. This size enables ASIMO to actually perform tasks within the realm of a human living environment. It also walks in a smooth fashion which closely resembles that of a human being.
    Advanced walking technology: Predicted Movement Control (for predicting the next move and shifting the center of gravity accordingly) is combined with existing walking control know-how to create i-WALK (intelligent real-time flexible walking) technology, permitting smooth changes of direction. The latest updates on the ASIMO robot are available at Honda.
    The following image shows the different versions of this robot.

    Wednesday, July 21, 2010

    GET Yahoo ip while chatting

    Friends, we all know that it is normally impossible to get the IP of the person you are chatting with in Yahoo.

    But by using a couple of tools it is possible to get it.

    You need 1) Yahoo Messenger
    2) Sharp IP Getter
    That's it. With these you can get the other person's IP. I tried it and it is working for me....

    Using Dual Boot for Windows XP and Mac

    First you need to install Windows XP in one primary partition.
    Then use the diskpart command and change the boot loader.


    > in a cmd prompt, type diskpart
    DISKPART> select disk 0
    DISKPART> list partition
    DISKPART> create partition primary id=af

    This will create a partition in a format Windows does not recognize.
    Then install Mac OS in that partition and try it.


    Hope you get the rest.

    If you run into any problems, post a comment.

    Informatica Server Configuring

    Information that needs to be provided is as below:
    Server Tab:
    Server Name: The name of the PowerCenter Server to register with the repository. This name must be unique to the repository. This name must also match the name you specify when you use the Workflow Manager to register the PowerCenter Server.
    TCP/IP Host Address: The TCP/IP host address as an IP number (such as 123.456.789.1), or a local host name (such as RECDB), or a fully qualified name (such as RECDB.INVOICE.COM). If you leave this field blank, the PowerCenter Server uses the default local host address.
    Max No. of Concurrent Sessions: The maximum number of sessions the PowerCenter Server runs at a time. Increase this value only if you have sufficient shared memory. Default is 10.
    Shared Memory: The amount of shared memory available for use by the PowerCenter Server Load Manager process. For every 10 sessions in Max Sessions, you need at least 2,000,000 bytes reserved in Load Manager Shared Memory. Default is 2,000,000 bytes.
    Error Severity Level for Log Files: The level of error messages written to the PowerCenter Server log. Specify one of the following message levels:
    Error. Writes ERROR code messages to the log.
    Warning. Writes WARNING and ERROR code messages to the log.
    Information. Writes INFO, WARNING, and ERROR code messages to the log.
    Tracing. Writes TRACE, INFO, WARNING, and ERROR code messages to the log.
    Debug. Writes DEBUG, TRACE, INFO, WARNING, and ERROR code messages to the log.
    Fail Session if Maximum Number of Concurrent Sessions Reached: Enable this option if you want the PowerCenter Server to fail the session if the number of sessions already running is equal to the value configured for Maximum Number of Concurrent Sessions. If you disable this option, the PowerCenter Server places the session in a ready queue until a session slot becomes available. This option is disabled by default.
    Allow mapping/session debugging: If selected, you can run the Debugger. This option is enabled by default.
    Time Stamp Workflow Log: Enable this option if you want to append a time stamp to messages written to the workflow log. This option is disabled by default.
    Output to Event Log: Enable this option if you want to write PowerCenter Server messages to the Windows Event Log. This option is enabled by default.
    Output to File: Enable this option if you want to write PowerCenter Server log messages to a file. When you enable this option, enter a file name for the PowerCenter Server log.


    Repository Tab:
    Repository Name : The name of the repository to connect to. You create a repository in the Repository Server Administration Console.
    Repository User: The account used to access the repository. When you first create a repository, the Repository User is the database username. You create other Repository Users in the Repository Manager.
    Repository Password : The password for the Repository User. When you first create a repository, the password is the password for the database user.
    Repository Server Host Name: The name of the machine hosting the Repository Server.
    Repository Server Port Number: The port number the Repository Server uses to communicate with repository client applications.
    Repository Server Timeout: The maximum number of seconds that the PowerCenter Server tries to establish a connection to the Repository Server. If the PowerCenter Server is unable to connect to the Repository Server in the time specified, the PowerCenter Server shuts down. Default is 60 seconds.

    Licenses Tab:
    Enter the license keys and then click Update.

    Compatibility and Database Tab:
    PMServer 3.X aggregate compatibility: If selected, the PowerCenter Server handles Aggregator transformations as it did in PowerMart 3.x. This overrides both Aggregate treat nulls as zero and Aggregate treat rows as insert.
    If you select this option, the PowerCenter Server treats nulls as zeros in aggregate calculations and performs aggregate calculations before flagging records for insert, update, delete, or reject in Update Strategy expressions. If you do not select this option, the PowerCenter Server treats nulls as nulls and performs aggregate calculations based on the Update Strategy transformation.
    PMServer 6.X Joiner source order compatibility: If selected, the PowerCenter Server processes master and detail pipelines sequentially as it did in versions prior to 7.0. The PowerCenter Server processes all data from the master pipeline before starting to process the detail pipeline. Also, if you enable this option, you cannot specify the Transaction level transformation scope for Joiner transformations. If you do not select this option, the PowerCenter Server processes the master and detail pipelines concurrently.
    Aggregate Treat Nulls as Zero: If selected, the PowerCenter Server treats nulls as zero in Aggregator transformations. If you do not select this option, the PowerCenter Server treats nulls as nulls in aggregate calculations.
    Aggregate Treat Rows as Insert : If selected, the PowerCenter Server performs aggregate calculations before flagging records for insert, update, delete, or reject in Update Strategy expressions. If you do not select this option, the PowerCenter Server performs aggregate calculations based on the Update Strategy transformation.
    PMServer 4.0 date handling compatibility: If selected, the PowerCenter Server handles dates as in PowerCenter 1.0/PowerMart 4.0. Date handling significantly improved in PowerCenter 1.5 and PowerMart 4.5. If you need to revert to PowerCenter 1.0 or PowerMart 4.0 behavior, you can configure the PowerCenter Server to handle dates as in PowerCenter 1.0 and PowerMart 4.0.
    Treat CHAR as CHAR on Read: If you have PowerCenter Connect for PeopleSoft, you can use this option for PeopleSoft sources on Oracle. You cannot, however, use it for PeopleSoft lookup tables on Oracle or PeopleSoft sources on Microsoft SQL Server.
    Max LKP/SP DB Connections: Allows you to specify a maximum number of connections to a lookup or stored procedure database when you start a workflow. If the number of connections needed exceeds this value, session threads must share connections. This can result in a performance loss. If you do not specify a value, the PowerCenter Server allows an unlimited number of connections to the lookup or stored procedure database.
    If the PowerCenter Server allows an unlimited number of connections, but the database user does not have permission for the number of connections required by the session, the session fails. A default value is not specified.
    Max Sybase Connections: Allows you to specify a maximum number of connections to a Sybase database when you start a session. If the number of connections required by the session is greater than this value, the session fails. Default is 100.
    Max MSSQL Connections: Allows you to specify a maximum number of connections to a Microsoft SQL Server database when you start a workflow. If the number of connections required by the workflow is greater than this value, the workflow fails. Default is 100.
    Number of Deadlock Retries: Allows you to specify the number of times the PowerCenter Server retries a target write on a database deadlock. Default is 10.
    Deadlock Sleep Before Retry (seconds): Allows you to specify the number of seconds before the PowerCenter Server retries a target write on database deadlock. Default is 0 and the PowerCenter Server retries the target write immediately.

    Configuration Tab
    Data Movement Mode: Choose ASCII or Unicode. The default data movement mode is ASCII, which passes 7-bit ASCII character data. To pass 8-bit ASCII and multibyte character data from sources to targets, use Unicode mode.
    Validate Data Code Pages: If you enable this option, the PowerCenter Server enforces data code page compatibility. If you disable this option, the PowerCenter Server lifts restrictions for source and target data code page selection, stored procedure and lookup database code page selection, and session sort order selection. This option is only available when the PowerCenter Server runs in Unicode data movement mode. By default, this option is enabled.
    Output Session Log In UTF8: If you enable this option, the PowerCenter Server writes to the session log using the UTF-8 character set. If you disable this option, the PowerCenter Server writes to the session log using the PowerCenter Server code page. This option is available when the PowerCenter Server runs in Unicode data movement mode. By default, this option is disabled.
    Warn About Duplicate XML Rows: If you enable this option, the PowerCenter Server writes duplicate row warnings and duplicate rows for XML targets to the session log. By default, this option is enabled.
    Create Indicator Files for Target Flat File Output: If you enable this option, the PowerCenter Server creates indicator files when you run a session with a flat file target.
    Output Metadata for Flat File Target: If you enable this option, the PowerCenter Server writes column headers to flat file targets. It writes the target definition port names to the flat file target in the first line, starting with the # symbol. By default, this option is disabled.
    Treat Database Partitioning As Pass Through: If you enable this option, the PowerCenter Server uses pass-through partitioning for non-DB2 targets when the partition type is Database Partitioning. Enable this option if you specify Database Partitioning for a non-DB2 target. Otherwise, the PowerCenter Server fails the session.
    Export Session Log Lib Name: If you want the PowerCenter Server to write session log messages to an external library, enter the name of the library file.
    Treat Null In Comparison Operators As: Determines how the PowerCenter Server evaluates null values in comparison operations. Enable one of the following options:
    a)Null. The PowerCenter Server evaluates null values as null in comparison expressions. If either operand is null, the result is null. This is the default behavior.
    b)High. The PowerCenter Server evaluates null values as greater than non-null values in comparison expressions. If both operands are null, the PowerCenter Server evaluates them as equal. When you choose High, comparison expressions never result in null.
    c)Low. The PowerCenter Server evaluates null values as less than non-null values in comparison expressions. If both operands are null, the PowerCenter Server treats them as equal. When you choose Low, comparison expressions never result in null.
    WriterWaitTimeOut: In target-based commit mode, the amount of time in seconds the writer remains idle before it issues a commit when the following conditions are true:
    a)The PowerCenter Server has written data to the target.
    b)The PowerCenter Server has not issued a commit.
    The PowerCenter Server may commit to the target before or after the configured commit interval. Default is 60 seconds. If you configure the timeout to be 0 or a negative number, the PowerCenter Server defaults to 60 seconds.
    Microsoft Exchange Profile: Microsoft Exchange profile used by the Service Start Account to send post-session email. The Service Start Account must be set up as a Domain account to use this feature.
    Date Display Format: If specified, the PowerCenter Server validates the date display format and uses it in session log and server log entries. If the date display format is invalid, the PowerCenter Server uses the default date display format. The default date display format is DY MON DD HH24:MI:SS YYYY. When you specify a date display format, it displays in the test window. An invalid date display format is marked invalid.
    Test Formatted Date: Read-only field that displays the current date using the format selected in the Date Display Format field.

    JVM Options Tab:
    You can configure JVM options if you run Java-based programs with PowerCenter Connect products, such as PowerCenter Connect for JMS or PowerCenter Connect for webMethods.

    HTTP Proxy Tab:

    Server Name: Name of the HTTP proxy server.
    Server Port: Port number of the HTTP proxy server.
    Username: Authenticated user name for the HTTP proxy server. This is required if the proxy server requires authentication.
    Password: Password for the authenticated user. This is required if the proxy server requires authentication.
    Domain: Domain for authentication.

    Once you have completed configuring the Repository Server and PowerCenter Server, log in to the Repository Server Administration Console and create a new repository.



    Once the repository is created (to create the repository, a repository schema needs to be created in the database server), the repository owner information can be used to log in to the Designer, Workflow Manager, and Workflow Monitor.

    The next few posts will cover the following:
    a)How to create Repository User accounts and managing security in Informatica
    b)Create a mapping, session, and workflow, and successfully execute the workflow
    c)Version Control in Informatica

    d)How to debug mappings in Informatica.