Tuesday, June 11, 2013

Dataset Handling in Unix with Orchadmin


DataStage stores data in a persistent internal format (specific to DataStage) in the form of data sets. Orchestrate data sets aid the parallel processing of data and are much faster in terms of performance. They help achieve end-to-end parallelism by writing data in partitioned form and preserving the sort order. An Orchestrate data set consists of one or more data files stored on multiple processing nodes. A parallel data set is represented on disk by:

• A single descriptor file - defines the record schema of the data set and the location of all data files in the set. It does not contain the actual data.

• Data files (which contain the actual data) located on one or more processing nodes.

Orchadmin Utility 

This is the Orchestrate administrator utility. It performs operations on data sets that normal UNIX file commands cannot. The basic syntax is:

orchadmin [command] [options] [descriptor_files] 

Commands 

The commands available with orchadmin are dump, delete, truncate, copy and describe.

Dump Command 

This command writes the records of a given data set to standard output, which can in turn be redirected to a sequential file.

Syntax :

orchadmin dump [options] descriptor_file 

If no option is specified, all the records are written to standard output.

ex:
1) orchadmin dump test.ds
2) orchadmin dump test.ds > temp.txt

In the second example, temp.txt will contain the data present in test.ds.
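The redirect in the second example can be wrapped in a small shell function. This is only a sketch: dump_to_file is a hypothetical name, and it assumes orchadmin is on the PATH with the DataStage environment already set up.

```shell
# Hypothetical helper: dump a data set to a text file.
# Usage: dump_to_file test.ds temp.txt
dump_to_file() {
    ds=$1
    out=$2
    # The descriptor file must exist before orchadmin can read it.
    if [ ! -f "$ds" ]; then
        echo "descriptor file not found: $ds" >&2
        return 1
    fi
    # Same as: orchadmin dump test.ds > temp.txt
    orchadmin dump "$ds" > "$out"
}
```

Checking for the descriptor file first gives a clearer error message than letting the dump fail on a mistyped path.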

The dump command with the -field name option prints a single column of data from the data set.

Note: There are more options available with dump, such as -x, -n, -name and -skip.
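As a rough illustration, the -field and -n options might be combined as below. The function name dump_field and the column name CUSTOMER_ID are hypothetical, and the assumption that -n takes a record count should be confirmed against the help text printed by orchadmin on your installation.

```shell
# Hypothetical helper: print one column of a data set,
# optionally limited to a number of records.
# Usage: dump_field test.ds CUSTOMER_ID [max_records]
dump_field() {
    ds=$1
    field=$2
    limit=${3:-}
    if [ -n "$limit" ]; then
        # Assumption: -n caps the number of records that are dumped.
        orchadmin dump -field "$field" -n "$limit" "$ds"
    else
        orchadmin dump -field "$field" "$ds"
    fi
}
```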


Delete Command  

The UNIX rm command deletes only the descriptor file; the actual data is not deleted, because it lives in the data files on the processing nodes. The conventional way to remove the persistent data is Data Set Management in the DataStage Designer.

The orchadmin utility simplifies the whole process by providing the delete command.

Syntax :

orchadmin delete | del | rm [-option] ds_1 ... ds_N 

ex :
orchadmin delete test.ds
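When several data sets need to be removed, a loop can check each descriptor file before calling orchadmin delete. A minimal sketch, assuming orchadmin is on the PATH; delete_datasets is a hypothetical name.

```shell
# Hypothetical helper: delete several data sets, skipping missing descriptors.
# Usage: delete_datasets a.ds b.ds c.ds
delete_datasets() {
    status=0
    for ds in "$@"; do
        if [ -f "$ds" ]; then
            # orchadmin removes the data files as well as the descriptor.
            orchadmin delete "$ds" || status=1
        else
            echo "skipping, no descriptor file: $ds" >&2
            status=1
        fi
    done
    # Non-zero if any data set was skipped or failed to delete.
    return $status
}
```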

Describe Command  

This command outputs a report about the datasets specified. The syntax is: 

orchadmin describe [-options] descriptor_file 

ex:
orchadmin describe test.ds
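To get a report for every data set in a directory, the describe command can be run in a loop. This is a sketch only: describe_all is a hypothetical name, and it assumes descriptor files use the .ds extension as in the examples here.

```shell
# Hypothetical helper: describe every *.ds descriptor in a directory.
# Usage: describe_all /path/to/datasets   (defaults to the current directory)
describe_all() {
    dir=${1:-.}
    for ds in "$dir"/*.ds; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -f "$ds" ] || continue
        echo "== $ds =="
        orchadmin describe "$ds"
    done
}
```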

Copy Command

This command creates an identical data set with the same column definitions and number of records. The orchadmin copy command can be used to take backups of existing data sets.
  
Syntax :
orchadmin copy | cp  source-ds  target-ds 

ex:
orchadmin copy temp.ds temp_target.ds

Note:
1) If you use the UNIX cp command, only the descriptor file is copied, and the copied descriptor file points to the same data files on the processing nodes.

2) Type orchadmin at the command prompt to get help information about this command.
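For backups, the copy command can be given a date-stamped target name. The sketch below assumes orchadmin is on the PATH; backup_dataset and the _YYYYMMDD naming scheme are hypothetical choices, not part of the utility.

```shell
# Hypothetical helper: back up a data set under a date-stamped name.
# Usage: backup_dataset temp.ds   (creates temp_YYYYMMDD.ds)
backup_dataset() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "descriptor file not found: $src" >&2
        return 1
    fi
    stamp=$(date +%Y%m%d)
    # Strip a trailing .ds, then append the date stamp.
    target="${src%.ds}_${stamp}.ds"
    # Print the new descriptor path only if the copy succeeded.
    orchadmin copy "$src" "$target" && echo "$target"
}
```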
