From: walt@mathcs.emory.edu (Walt Hultgren {rmy})
To: stetzer@convex.csd.uwm.edu
Subject: dbftosas
Date: Thu, 6 May 93 15:23:07 -0400

Frank,

I saw a posting on Usenet that mentioned a perl program dbftosas that you
have that will convert dBase files to an SAS format.  Since the article
said that dbftosas converts the DBF files to intermediate ASCII files,
I thought that I might be able to convert your program to load Informix
tables directly from DBF files.

To that end, I'd like a copy of your program if you're letting it out.

Thanks in advance,

Walt.

-- 
Walt Hultgren              Internet: walt@rmy.emory.edu       (IP 128.140.8.1)
Emory University               UUCP: {...,gatech,rutgers,uunet}!emory!rmy!walt
954 Gatewood Road, NE        BITNET: walt@EMORY
Atlanta, GA  30329  USA       Voice: +1 404 727 0648


From: Frank Stetzer <stetzer@convex.csd.uwm.edu>
To: walt@mathcs.emory.edu
Subject: Re:  dbftosas
Date: Thu, 6 May 93 14:26:36 -0500

Here is the program.  I'm not a perl whiz (or a dbf whiz) but it
has worked for me with dbf files from several sources.
I have appended a "man page" to the end of the file, for reference.
--
Frank Stetzer

#!/usr/bin/perl 
#
#   This program reads a dbf file (somename.dbf) and outputs two ascii
#     files.  The first (somename.sas) contains the "infile", "format" ,
#     and "input" statements of a SAS data step.  The second (somename.dat)
#     contains the raw data to be read by the  SAS program.
#                                                --
#                                                Frank Stetzer, CSD
#                                                August 12, 1992
# ------------------------------------------------------------------------
#
#   First check whether we are big or little endian
#
$a=ord(pack("I",0xdeadbeef));
if ($a == 0xef) {$endian="LITTLE";}
elsif ($a == 0xde) {$endian="BIG";}
else {die "Can't determine the byte order of this computer\n";}
print "This is a $endian\-ENDIAN machine\n";
#
if ($#ARGV == -1) {die "Usage:  dbftosas file.dbf\n (outputs file.sas and file.dat)\n";}
#
$indbf=$ARGV[0];
if (substr($indbf,length($indbf)-4) ne '.dbf') {
  $froot=$indbf;
  $indbf=$froot.'.dbf';
}
else {$froot=substr($indbf,0,length($indbf)-4);}

open (IN,"$indbf")||die "file not found\n";
binmode(IN);
#
#   these are the output files 
#
$outvar=$froot.'.sas';
$outdat=$froot.'.dat';
open (VAR,">$outvar");
open (DAT,">$outdat");
#
print VAR "data one;\n";
#
#  read the initial dbf header line and interpret
#
read(IN,$buf,32);
($vn,$year,$month,$day,$nr,$hs,$lr,$junk)=
                    unpack("A1 c1 c1 c1 a4 a2 a2 A20",$buf);
if ($endian eq "BIG") {
  $rn=reverse($nr);
  $sh=reverse($hs);
  $rl=reverse($lr);
}
elsif ($endian eq "LITTLE") {
  $rn=$nr;
  $sh=$hs;
  $rl=$lr;
}
$nrec=unpack("i",$rn);
$heads=unpack("s",$sh);
$lenrec=unpack("s",$rl);
print VAR "   infile \'$outdat\' lrecl=",$lenrec-1," pad;\n";
print "Date of dbf file=$month/$day/$year\n";
print "Number of records=$nrec\n";
print "Length of header=$heads\n";
print "Length of records=$lenrec\n";
$nfields=int($heads/32)-1;
print "Number of fields=$nfields\n";
#
#   Read the header fields for each dbf variable
#     Store the info in arrays @name, @flen, @type, @fds, and @fdec
#
$nv=0;
$nlong=0;
for ($i=1;$i<=$nfields;$i++) {
   read (IN,$buf,32);
   ($name,$type,$fds,$flen,$fdec,$junk)=unpack("A11 A1 a4 c1 c1 A14",$buf);
   if ($endian eq "BIG") {$sdf=reverse($fds);}
   elsif ($endian eq "LITTLE") {$sdf=$fds;}
   $fds=unpack("i",$sdf);
   $name[$i]=$name;
   if (length($name) >8) {$nlong++;}
   $flen[$i]=$flen;
   $type[$i]=$type;
   $fds[$i]=$fds;
   $fdec[$i]=$fdec;
}
#
#  Check all variable names for length.  If longer than 8 characters,
#   reduce length by (first), eliminating any of -, _, *, A, E, I, O, or U
#   from the end of the variable forward down to 8 characters.  If the
#   resulting name is not unique, make a name "VDBn" for n=1,2,...
#
if ($nlong >0){
   print "$nlong field names are longer than 8 characters:\n";
   for ($i=1;$i<=$nfields;$i++) {
      if (length($name[$i]) > 8) {
         $name1=$name[$i];
         $l=length($name1);
         $eman=reverse($name1);
         foreach (9..$l) {$eman =~ s/[\-\_\*AEIOU]//;}
         $name=reverse($eman);
          if (length($name) >8) {$name=substr($name,0,8);}
          if ($i > 1) {
          for ($j=$i-1;$j>=1;$j--) {
             if ($name[$j] eq $name) {
               $nv++;
               $lnv=5-length("$nv");
               $name="VDB" . "0" x $lnv . $nv;
               $last;
             }
           }
         }
         print "Old name:  $name1     New name:    $name\n";
         $name[$i]=$name;
      }
   }
}
#
#   Put out the SAS format statement
#
print VAR "   format ";
for ($i=1;$i<=$nfields;$i++) {
    $j=int(($i-1)/3)*3 - ($i-1);
    if ($j == 0) {printf VAR "%4s","\n    ";}
    printf VAR "%10s",$name[$i];
    $f=" %10s ";
    if ($type[$i] eq "C") {printf VAR $f,' $'.$flen[$i].".";}
    if ($type[$i] eq "N") {printf VAR $f,' '.$flen[$i].'.'.$fdec[$i];}
    if ($type[$i] eq "D") {printf VAR $f," MMDDYY8.";}
}
print VAR ";\n";
# 
#  put out the SAS input statement
#
print VAR "   input";
for ($i=1;$i<=$nfields;$i++){
   $j=int(($i-1)/3)*3 - ($i-1);
   if ($j == 0) {printf VAR "%4s","\n    ";}
   if ($type[$i] eq "D") {
     $range='@'.$fds[$i]." ".$name[$i]. " YYMMDD".$flen[$i].". "; 
     printf VAR "%25s",$range;
     next;
   }
   printf VAR "%10s" ,$name[$i];
   printf  VAR " ";
   if ($type[$i] ne "N") {print VAR "\$ ";}
   else {print VAR "  ";}
   $lb=$fds[$i];
   $ub=$lb+$flen[$i]-1;
   if ($flen[$i] > 1) {$range=$lb.'-'.$ub;}
    else {$range=$lb;}
   printf VAR "%12s",$range;
}
print VAR ";\n";
close VAR;
print "\nFile $outvar written\n";
#  
#  Read the next character; should be a carriage return "\r"
#
read(IN,$del,1);
if ($del ne "\r") {die "Out of Sync\n";}
#
#   Put out the data file...a straight copy with line feeds added
#
for ($i=1;$i<=$nrec;$i++) {
   read(IN,$del,1);
   read(IN,$buf,$lenrec-1);
   if ($del ne '*') {print DAT "$buf\n";}
   else {print "Record $i deleted\n";}
}
close DAT;
print "File $outdat written\n";
##########################end of program dbftosas#############################


DBFTOSAS(1)          UWM Local Documentation          DBFTOSAS(1)



NAME
     dbftosas - Convert dbf files from a PC for input to SAS

SYNOPSIS
         dbftosas somefile.dbf

DESCRIPTION
     Dbftosas processes a "dbf" file (e.g. somefile.dbf), pro-
     duced by dBase or another PC-based database program, and
     produces two output files beginning with the same file name
     and ending in the extensions .dat and .sas (e.g.
     somefile.dat and somefile.sas).  The latter contains a
     skeletal SAS data step to read the former, defining variable
     names, data types, and formats. This file can be enhanced by
     the user to include variable labels, missing value declara-
     tions, permanent SAS libraries, etc.  Dbftosas properly han-
     dles numeric, string, and date data types and shortens long
     field names to eight unique characters. A summary of the
     number of fields and cases processed, field names changed,
     etc., is written to STDOUT.

     The SAS program file can be modified by the user into an
     SPSS program file since the data description and input
     operations have similar syntax.

DIAGNOSTICS
     Dbf files contain few distinguishing markers.  It is possi-
     ble to run dbftosas on a damaged or bad dbf file without
     detecting any problem.  Be sure to examine the output sum-
     mary for any anomolies.  If the program detects a problem,
     an error message "Out of Sync" will be printed and the pro-
     gram will stop.

     Dbf files contain binary fields; therefore they must be
     uploaded from the PC with the file transfer program (e.g.
     kermit, ftp) in binary mode.  This is one possible source of
     difficulties.

BUGS
     Some database programs allow strings to be entered into
     fields declared as numeric. Also, the numeric formats speci-
     fied in the data base may not be appropriate for the data.
     Dbftosas does not check for correspondence between data
     declaration and actual data; these inconsistencies will gen-
     erate errors when the SAS program is run.

     If problems are encountered, send mail to "help".

AUTHOR
     Frank Stetzer
     CSD




Printed 11/10/92         August 12, 1992                        1
