Newsgroups: comp.databases.informix
Subject: Splitting string on blanks in 4GL


From: shj@dknet.dk (Stig Jacobsen)
Date: 14 Aug 1995 17:12:48 GMT

Howdy,

I have a bunch of strings which look like "500 328 729", that
I'd like to separate by whitespace and store into integer
variables - basically like:

   define str char(100)
   define counts array[3] of integer

   select value from table into str
   let counts = split(str, ...)

.. after which counts[1] equals 500, counts[2] is 328, etc.

What is the best way to do this?

I thought of unloading str to an ascii file, using run to call
a shell script or somesuch which would massage the unload file
and produce a load file, which is the loaded into a suitable
table created on the fly. Pretty horrible thought, yes? I haven't
tried it yet, but I expect it to be a lot slower than my users
would appreciate.

Any input is welcomed.

I'm running INFORMIX-4GL Version 4.11.UC2 on Solaris 2.3.
--
Stig Jacobsen / shj@dknet.dk		           http://www.dknet.dk/~shj


From: Nils.Myklebust@CCMAIL.telemax.no
Date: 14 Aug 1995 14:28:03 -0400

I know of no way the above syntax can work in 4GL, but why not simply:
      let counts[1] = str[1,3]
      let counts[2] = str[5,7]
      let counts[3] = str[9,11]
if your input data is as well formed as you indicate.

If not, a for loop will do it, with variables for str[p_start,p_end],
possibly with a whenever any error continue and a test on status,
or perhaps better, tests inside the for loop on every character
in str.
If you don't have millions of rows with these data even the for
loop approach should be reasonably fast.

Nils.Myklebust@ccmail.telemax.no
NM-data, Dalsbergstien 7, N-0170 Oslo, Norway
My opinions are those of my company


From: perryd@fourgen.com (Perry Dillard)
Date: 14 Aug 1995 17:03:03 -0400

I must warn you that the Informix substring manipulation is
_extremely_ expensive in the compiled version.  I'd think
about writing a routine in C to split your data.

-perryd (Guru)

====================================================================
= Perry Dillard - CASE Tools Development Team Leader               =
= FourGen Software, Inc.                                           =
= FourGen Building                                                 =
= 115 NE 100th Street                                              =
= Seattle, W.A.  98125-8098                                        =
=                                                                  =
= e-mail: perryd@fourgen.com                                       =
====================================================================


From: steinar@balder.no (Steinar O. Cook)
Date: 14 Aug 1995 17:28:04 -0400

An even better idea would be to use strtok(3c):

1) Write a C function called c_strtok() which basically is only an
   interface to the strtok(3C) call.

2) Now you can do like this:
	
	LET i=1
	WHILE TRUE
		LET array[i] = c_strtok(str)
		IF (array[i] IS NULL) THEN
			EXIT WHILE
		END IF
		LET i = i + 1
		IF (i > max_no_of_elements) THEN
			MESSAGE "Too many numbers ...."
			EXIT WHILE
		END IF
	END WHILE
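
For reference, the C side of this idea can be sketched without the 4GL
stack glue. This is only an illustration under assumptions: split_ints
is a hypothetical name, not part of any Informix library, and the
argument passing a real c_strtok() would need is left out. The point it
shows is that strtok() takes the string on the first call and NULL on
every later call, so a wrapper called repeatedly from 4GL has to tell
the two cases apart.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a blank-separated string into at most
 * max integers and return how many were stored in out[].
 * strtok() writes NUL bytes into s, so s must be writable. */
int split_ints(char *s, int *out, int max)
{
    int n = 0;
    char *tok = strtok(s, " \t");      /* first call passes the string */

    while (tok != NULL && n < max) {
        out[n++] = atoi(tok);          /* atoi() yields 0 for non-numeric text */
        tok = strtok(NULL, " \t");     /* later calls pass NULL to continue */
    }
    return n;
}
```

Called on a writable copy of "500 328 729" with room for three values,
this stores 500, 328 and 729 and returns 3.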

--
Steinar Overbeck Cook
Balder Programvare AS


From: Nils.Myklebust@CCMAIL.telemax.no
Date: 15 Aug 1995 11:10:26 -0400

A good point, but we use it a lot without any performance problems.
Even things that are "expensive" are often masked completely by
database access.

Most of the time we find 4GL programs easier to write than
C code. Also we want as little C code as possible to avoid problems
should we decide to use RDS. For future compiler technologies from
Informix we do however see it as important that they optimise better
so these problems disappear. If you look at the C-code generated it
can't be fast doing anything.

An example of something that is too expensive is if you use a lot
of sum/group sum in a report. You may see runtimes where these
calculations take most of the time to finish the program.

Nils.Myklebust@ccmail.telemax.no
NM-data, Dalsbergstien 7, N-0170 Oslo, Norway
My opinions are those of my company


From: jparker@hpbs3645.boi.hp.com (Jack Parker)
Date: 15 Aug 1995 19:40:02 -0400

This is an obvious candidate for a 'c' function.  Does one such exist already?
Anyone want to write one?  I'm too busy and my 'c' code stinks, but will do it
if there are no other takers.  I can think of two approaches at the moment,

1 - chop up the entire string and then worry about popping off the stack
upon return to 4gl.

2 - split on the first delimiter and return the first and remainder field,
the caller then calls as many times as necessary.
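
A rough sketch of approach 2 in plain C, leaving out the 4GL argument
stack entirely. The name split_first and its signature are assumptions
made for illustration only. It cuts the first field off the string in
place and hands back a pointer to the remainder, so the caller can keep
calling until the remainder is empty.

```c
#include <string.h>

/* Hypothetical helper: cut the first field off s. The field is
 * NUL-terminated in place and returned; *rest is set to the start of
 * whatever follows it (an empty string when nothing is left). */
char *split_first(char *s, const char *delims, char **rest)
{
    char *end;

    s += strspn(s, delims);            /* skip leading delimiters */
    end = s + strcspn(s, delims);      /* one past the end of the field */
    if (*end != '\0') {
        *end++ = '\0';                 /* terminate the field */
        end += strspn(end, delims);    /* skip delimiters before the remainder */
    }
    *rest = end;
    return s;
}
```

Approach 1 would instead collect all fields in one pass and return them
together, which saves calls but fixes the number of return values.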

cheers
j.
_____________________________________________________________________________
Jack Parker - Hewlett Packard, BSMC Boise, Idaho, USA
jparker@hpbs3645.boi.hp.com 
_____________________________________________________________________________

        	Discover America, get lost on a rally.
_____________________________________________________________________________
   Any opinions expressed herein are my own and not those of my employers.
_____________________________________________________________________________


From: marco greco <mar.greco@agora.stm.it>
Date: 16 Aug 1995 09:54:09 GMT

My Lit.32 on the subject: one of the few c routines in 4glWorks

#include <string.h>

int get_token();

get_token()
{
    int i;
    char *p, *q;
    char bf[132];

    popquote(bf, 132);
    p=bf;
    if (!*p)
    {
        retquote("");
        retquote("");
        return(2);
    }
    i=strspn(p, " \t");
    if (!p[i])    /* nothing but blanks and tabs in the buffer */
    {
        retquote("");
        retquote("");
        return(2);
    }
    p=p+i;
    i=strcspn(p, " \t");
    q=p+i;
    if (*q)       /* step past the delimiter only if one was found */
    {
        *q=0;
        q++;
    }
    i=strspn(q, " \t");
    q=q+i;
    retquote(p);
    retquote(q);
    return(2);
}

the following 4gl code demonstrates its use:

define i integer,
       val array[20] of char(20),
       str char(100)

let i=1
while (str is not null) and (i<21)
    call get_token(str) returning val[i], str
    let i=i+1
end while

cheers, 
marco

........................................................................
                        tear along dotted line!

marco greco	mar.greco@agora.stm.it

Work		Achea Srl		tel	39 95 503117 / 447828 Pbx
		Catania, Italy		fax	      446558


From: Mark.Denham@bbc.co.uk (Mark Denham)
Date: 16 Aug 1995 08:28:03 -0400

I have a function, called as follows:

CALL split_string(str, delimstr, returns, strict)
    RETURNING stat, strpart1, strpart2....

WHERE:

str  - String you want to break up
delimstr  - Delimiter string
returns   - Integer indicating how many STRING variables you want returned.
strict    - Boolean TRUE/FALSE. When true, the function returns a stat of -1
            if the string to split does not contain EXACTLY returns
            sub-strings in it.

On the return side, stat is always returned along with the number of
strings requested.

I will dig it out if anyone is interested.

Mark Denham
BBC
London, UK
Mark.Denham@bbc.co.uk


From: Mark.Denham@bbc.co.uk (Mark Denham)
Date: 22 Aug 1995 18:13:03 -0400

By popular demand,

here is the string splitting function that I use.

Usage
call split_string(src, delim, return, strict) returning stat, str1..strn

src    - Is string to split, max 5k
delim  - Field separator
return - No. of strings to return (max).
strict - Set status to -ve value if the no. of strings in src is not the
         same as return.

Note: Get rid of any bugs yourself! Only kidding.

Mark Denham
BBC
London, UK
Mark.Denham@bbc.co.uk

--------------------------- Cut here --------------------------------------

/*  c source *}
{*************************************************************************
*
*  $Author$
*
*  $Date$
*
*  $Revision$
*
*  Doc Refs:
*
*  Purpose:  Allows an RDS program to split a string using a delimiter
*        given by the user.
*
*  Usage:    split_string  string    String to split.
*                delimiter  Delimiter string.
*                returns    Max no. of strings to return.
*                      When there are less strings in
*                      source than returns and strict is
*                      FALSE, null strings are returned
*                      for the remaining values.
*                strict    If set causes the routine to return
*                      a failure status if the number of
*                      elements in string does not match
*                      the number specified by returns.
*
*        where:
*
*        string    char[5120]    Null terminated string to split.
*        delimiter  char[10]    Null terminated delimiter string.
*        returns    smallint    Max. no. of strings to return.
*        strict    integer      TRUE/FALSE. Causes function to
*                      fail when no. of strings found
*                      does not match returns.
*
*  Returns:  stat    integer      0 for ok, -ve as below:
*                      -1 = Too few strings in source
*                      -2 = retcnt > MAX_STRINGS
*                      -3 = Source contains more strs
*        str[0..n]  string      As many strings as required.
*
*  Library
*  Functions:
*
*  Notes:    WHEN USING RDS.
*        This function MUST be linked with the RDS runner and debugger
*        using the cfglgo and cfgldb commands. Refer to the Interactive
*        debugger manual for details.
*        If you have a NULL value separated by 2 delimiters,
*        fred||john|...
*        This routine will not produce the result you expect!
*
*  Modification Log
*============================================================================
*
*  $Log$
*
****************************************************************************  
*}
/*

*/

int dummy_string_split() {

  static char *rcsid = "@(#)$Header: $";

}


#include <stdio.h>
#include <stdlib.h>

#define MAX_STRINGS      320

#define MAX_DELIM_LEN    10

#define MAX_STR_SIZE    5120

int split_string(numargs)
int numargs;
{
  char *strtok();

  char src_string[MAX_STR_SIZE+1];

  char delim[MAX_DELIM_LEN+1];

  char *strlist[MAX_STRINGS];

  char *nullstr = "",
      *str;

  int strict = 0,
    retcnt = 0,
    rval = 0,
    stat = 0,
    curridx = 0,
    idx = 0;

  if( numargs != 4 ) {      /* This is a problem and cannot easily */
    rval=-1;          /* 4GL will most likely produce an error*/
    retint(rval);        /* indicating that the no. of returned */
    return(1);          /* values is incorrect....*/
  }

  /* Read parameter list */

  popint(&strict);
  popint(&retcnt);
  popvchar(delim, MAX_DELIM_LEN);
  popvchar(src_string, MAX_STR_SIZE);

  /* Check that the no. strings required does not exceed max allowed */

  if( retcnt <= MAX_STRINGS ) {
    str = src_string;

    /* Split up string, fail if strict and end of source string reached */

    for(; curridx < retcnt; curridx++) {
      strlist[curridx] = strtok(str, delim);
      str = NULL;
      if( strict && strlist[curridx] == NULL ) {
        rval = -1;
        break;
      }
    }
  }
  else
    rval = -2;

  if( rval == 0 ) {
    if( strict ) {
      /* See if the source string is empty, if not fail */

      if(strtok(str, delim) != NULL )
        rval = -3;
    }
  }

  /* Return extraction status */

  retint(rval);

  /* Return all filled in values, may help debugging when an error occurs */

  for(idx=0; idx < curridx; idx++ ) {
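    /* strtok() leaves NULL in unused slots when not strict; return "" then */
    retvchar(strlist[idx] != NULL ? strlist[idx] : nullstr);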
  }

  /* Now do rest of strings, if any */

  while(idx < retcnt) {
    retvchar(nullstr);
    idx++;
  }

  return(idx+1);
}
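
On the fred||john|... note in the header: strtok() treats a run of
adjacent delimiters as a single separator, so the empty field between
the two bars is silently dropped. One possible way around that,
sketched here in plain C with a hypothetical name (split_keep_empty)
and without the 4GL stack calls, is to scan with strcspn() and step
over exactly one delimiter per field.

```c
#include <string.h>

/* Hypothetical helper: split s on any character in delims, keeping
 * empty fields. Stores up to max field pointers in out[] and returns
 * the number stored. s is modified in place. */
int split_keep_empty(char *s, const char *delims, char **out, int max)
{
    int n = 0;

    while (n < max) {
        size_t len = strcspn(s, delims);   /* length of this field, may be 0 */

        out[n++] = s;
        if (s[len] == '\0')
            break;                         /* that was the last field */
        s[len] = '\0';                     /* terminate the field */
        s += len + 1;                      /* step over exactly one delimiter */
    }
    return n;
}
```

With "fred||john|" and "|" as the delimiter this yields four fields:
"fred", "", "john" and a trailing "". Whether a trailing delimiter
should count as one more empty field is a design choice; this sketch
counts it.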
