I want to identify records in my transaction dataset which have
desired City Names .
My first dataset A1 has 1 column contains few thousand city names.
My second dataset A2 is transaction level data, there is 1 string
field X that may have the City Name. Some sample contents in field
X :
PAYPAL *THETOYCAVE 40293577 CREDIT
CDN TIRE STORE #000986 SMITH FALL CREDIT
WAL-MART SUPERCENTE SCARBOROUGH CREDIT
INEX VENTURES INC 80047031 CREDIT
I want to find all records in A2 which has a hit of any city name in
A1.
I want to know how do you usually solve this kind of problem?
|
|
0
|
|
|
|
Reply
|
wuhaihong (6)
|
1/11/2010 6:44:20 PM |
|
'Easiest' is a multifacted verb. Easy in terms of understanding,
processing, replication, and/or something else. One way of doing what you
want would be something like the following:
/*Build city test file*/
data cities;
informat city $50.;
input city &;
cards;
SMITH FALL
SCARBOROUGH
Toronto
Richmond Hill
;
/*Build transaction test file*/
data transactions;
informat transaction $100.;
input transaction &;
cards;
PAYPAL *THETOYCAVE 40293577 CREDIT
CDN TIRE STORE #000986 SMITH FALL CREDIT
CDN TIRE STORE #000986 Richmond Hill CREDIT
WAL-MART SUPERCENTE SCARBOROUGH CREDIT
INEX VENTURES INC 80047031 CREDIT
;
data want (drop=i j _: city);
array _cities (99999) $50.;
do until (eof1); /*Read cities into an array*/
set cities end=eof1;
i+1;
_cities(i)=city;
end;
do until (eof2); /*Read each transaction*/
set transactions end=eof2;
call missing(match);
do j=1 to i; /*Check if any city is in transaction file*/
if index(upcase(transaction),trim(upcase(_cities(j)))) ge 1 then do;
match=1;
j=i;
end;
end;
output;
end;
run;
HTH,
Art
--------
On Mon, 11 Jan 2010 10:44:20 -0800, aSaSnUt <wuhaihong@GMAIL.COM> wrote:
>I want to identify records in my transaction dataset which have
>desired City Names .
>My first dataset A1 has 1 column contains few thousand city names.
>My second dataset A2 is transaction level data, there is 1 string
>field X that may have the City Name. Some sample contents in field
>X :
>
>PAYPAL *THETOYCAVE 40293577 CREDIT
>CDN TIRE STORE #000986 SMITH FALL CREDIT
>WAL-MART SUPERCENTE SCARBOROUGH CREDIT
>INEX VENTURES INC 80047031 CREDIT
>
>I want to find all records in A2 which has a hit of any city name in
>A1.
>I want to know how do you usually solve this kind of problem?
|
|
0
|
|
|
|
Reply
|
art297 (4237)
|
1/11/2010 8:52:13 PM
|
|
Before the language police come after me I, of course, meant adjective not
verb.
Art
--------
On Mon, 11 Jan 2010 15:52:13 -0500, Arthur Tabachneck <art297@NETSCAPE.NET>
wrote:
>'Easiest' is a multifacted verb. Easy in terms of understanding,
>processing, replication, and/or something else. One way of doing what you
>want would be something like the following:
>
>/*Build city test file*/
>data cities;
> informat city $50.;
> input city &;
> cards;
>SMITH FALL
>SCARBOROUGH
>Toronto
>Richmond Hill
>;
>
>/*Build transaction test file*/
>data transactions;
> informat transaction $100.;
> input transaction &;
> cards;
>PAYPAL *THETOYCAVE 40293577 CREDIT
>CDN TIRE STORE #000986 SMITH FALL CREDIT
>CDN TIRE STORE #000986 Richmond Hill CREDIT
>WAL-MART SUPERCENTE SCARBOROUGH CREDIT
>INEX VENTURES INC 80047031 CREDIT
>;
>
>data want (drop=i j _: city);
> array _cities (99999) $50.;
> do until (eof1); /*Read cities into an array*/
> set cities end=eof1;
> i+1;
> _cities(i)=city;
> end;
> do until (eof2); /*Read each transaction*/
> set transactions end=eof2;
> call missing(match);
> do j=1 to i; /*Check if any city is in transaction file*/
> if index(upcase(transaction),trim(upcase(_cities(j)))) ge 1 then do;
> match=1;
> j=i;
> end;
> end;
> output;
> end;
>run;
>
>HTH,
>Art
>--------
>On Mon, 11 Jan 2010 10:44:20 -0800, aSaSnUt <wuhaihong@GMAIL.COM> wrote:
>
>>I want to identify records in my transaction dataset which have
>>desired City Names .
>>My first dataset A1 has 1 column contains few thousand city names.
>>My second dataset A2 is transaction level data, there is 1 string
>>field X that may have the City Name. Some sample contents in field
>>X :
>>
>>PAYPAL *THETOYCAVE 40293577 CREDIT
>>CDN TIRE STORE #000986 SMITH FALL CREDIT
>>WAL-MART SUPERCENTE SCARBOROUGH CREDIT
>>INEX VENTURES INC 80047031 CREDIT
>>
>>I want to find all records in A2 which has a hit of any city name in
>>A1.
>>I want to know how do you usually solve this kind of problem?
|
|
0
|
|
|
|
Reply
|
art297 (4237)
|
1/11/2010 11:01:21 PM
|
|
On Jan 11, 11:44=A0pm, aSaSnUt <wuhaih...@gmail.com> wrote:
> I want to identify records in my transaction dataset which have
> desired City Names .
> My first dataset A1 has 1 column contains few thousand city names.
> My second dataset A2 is transaction level data, there is 1 string
> field X that may have the City Name. Some sample contents in field
> X :
>
> PAYPAL *THETOYCAVE =A0 =A0 =A0 40293577 CREDIT
> CDN TIRE STORE #000986 SMITH FALL CREDIT
> WAL-MART SUPERCENTE SCARBOROUGH =A0 CREDIT
> INEX VENTURES INC =A0 =A0 =A0 =A080047031 CREDIT
>
> I want to find all records in A2 which has a hit of any city name in
> A1.
> I want to know how do you usually solve this kind of problem?
data a1;
input cityname $;
datalines;
delhi
mumbai
chennai
;
run;
data a2;
length x $20;
input x $;
datalines;
cbdsabndlkcdelhiflvnv
fvbdjkvbwdklv
svfcjehmumbaivndl
mumbai
;
run;
proc sql;
select a2.* from a1,a2
where index(a2.x,trim(a1.cityname)) >0;
quit;
|
|
0
|
|
|
|
Reply
|
Kulpreet
|
1/12/2010 6:41:24 AM
|
|
On Jan 11, 11:41=A0pm, Kulpreet Khanna <kulpreetkha...@gmail.com> wrote:
> On Jan 11, 11:44=A0pm, aSaSnUt <wuhaih...@gmail.com> wrote:
>
> > I want to identify records in my transaction dataset which have
> > desired City Names .
> > My first dataset A1 has 1 column contains few thousand city names.
> > My second dataset A2 is transaction level data, there is 1 string
> > field X that may have the City Name. Some sample contents in field
> > X :
>
> > PAYPAL *THETOYCAVE =A0 =A0 =A0 40293577 CREDIT
> > CDN TIRE STORE #000986 SMITH FALL CREDIT
> > WAL-MART SUPERCENTE SCARBOROUGH =A0 CREDIT
> > INEX VENTURES INC =A0 =A0 =A0 =A080047031 CREDIT
>
> > I want to find all records in A2 which has a hit of any city name in
> > A1.
> > I want to know how do you usually solve this kind of problem?
>
> data a1;
> input cityname $;
> datalines;
> delhi
> mumbai
> chennai
> ;
> run;
>
> data a2;
> length x $20;
> input x $;
> datalines;
> cbdsabndlkcdelhiflvnv
> fvbdjkvbwdklv
> svfcjehmumbaivndl
> mumbai
> ;
> run;
>
> proc sql;
> select a2.* from a1,a2
> where index(a2.x,trim(a1.cityname)) >0;
> quit;
This won't work, first,You'll get this:
NOTE: The execution of this query involves performing one or more
Cartesian product joins that
can not be optimized.
NOTE: No rows were selected.
Second, my dataset is huge.
|
|
0
|
|
|
|
Reply
|
aSaSnUt
|
1/12/2010 4:41:03 PM
|
|
On Jan 11, 1:44=A0pm, aSaSnUt <wuhaih...@gmail.com> wrote:
> I want to identify records in my transaction dataset which have
> desired City Names .
> My first dataset A1 has 1 column contains few thousand city names.
> My second dataset A2 is transaction level data, there is 1 string
> field X that may have the City Name. Some sample contents in field
> X :
>
> PAYPAL *THETOYCAVE =A0 =A0 =A0 40293577 CREDIT
> CDN TIRE STORE #000986 SMITH FALL CREDIT
> WAL-MART SUPERCENTE SCARBOROUGH =A0 CREDIT
> INEX VENTURES INC =A0 =A0 =A0 =A080047031 CREDIT
>
> I want to find all records in A2 which has a hit of any city name in
> A1.
> I want to know how do you usually solve this kind of problem?
Depending on the size of A2...
SQL is often one of the cleaner syntaxes for performing data
operations.
This sample code demonstrates the use of the INDEX function to perform
a match-criteria test.
----------------------------------------
data tokens;
input token $ @@;
cards;
aa ab ba bc bd ca cz de en fj
run;
data transactions;
do rowid =3D 1 to 10000;
length X $10;
do _n_ =3D 1 to 10;
substr(X,_n_,1) =3D byte(rank('a') + 26*ranuni(1234));
end;
output;
end;
run;
proc sql;
create table study as
select
*
, index (upcase(transactions.X), trim(upcase(tokens.token)))
as hitindex
from
transactions
, tokens
having
hitindex > 0
;
quit;
----------------------------------------
There are many other functions for testing two strings -- regular
expressions, FIND, SOUNDEX, SOUNDS LIKE, etc...
--
Richard A. DeVenezia
http://www.devenezia.com
|
|
0
|
|
|
|
Reply
|
Richard
|
1/12/2010 7:13:54 PM
|
|
Use the NOBS option of the SET statement to determine the number of city names in A1. Create an array with this many entries. Loop through A1 and populate the array.
Loop through A2. For each observation, loop through the array and use the INDEX function to determine if the desired city name is present. Stop this inner loop as soon as a match is found and mark the observation.
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of aSaSnUt
Sent: Tuesday, January 12, 2010 8:41 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Easiest way to search a list of String in a field?
On Jan 11, 11:41 pm, Kulpreet Khanna <kulpreetkha...@gmail.com> wrote:
> On Jan 11, 11:44 pm, aSaSnUt <wuhaih...@gmail.com> wrote:
>
> > I want to identify records in my transaction dataset which have
> > desired City Names .
> > My first dataset A1 has 1 column contains few thousand city names.
> > My second dataset A2 is transaction level data, there is 1 string
> > field X that may have the City Name. Some sample contents in field
> > X :
>
> > PAYPAL *THETOYCAVE 40293577 CREDIT
> > CDN TIRE STORE #000986 SMITH FALL CREDIT
> > WAL-MART SUPERCENTE SCARBOROUGH CREDIT
> > INEX VENTURES INC 80047031 CREDIT
>
> > I want to find all records in A2 which has a hit of any city name in
> > A1.
> > I want to know how do you usually solve this kind of problem?
>
|
|
0
|
|
|
|
Reply
|
barry.a.schwarz (608)
|
1/12/2010 7:48:52 PM
|
|
The only problem I see is if you have a large about of cities to look up
then a combination of techniques will be needed. Otherwise the basic
method I would use is as follows:
Data Need ;
Set A2 ;
Where ( PRXMatch( '/delhi|mumbai|chennai/io' , X ) > 0 ) ;
Run ;
Toby Dunn
On Tue, 12 Jan 2010 08:41:03 -0800, aSaSnUt <wuhaihong@GMAIL.COM> wrote:
>On Jan 11, 11:41 pm, Kulpreet Khanna <kulpreetkha...@gmail.com> wrote:
>> On Jan 11, 11:44 pm, aSaSnUt <wuhaih...@gmail.com> wrote:
>>
>> > I want to identify records in my transaction dataset which have
>> > desired City Names .
>> > My first dataset A1 has 1 column contains few thousand city names.
>> > My second dataset A2 is transaction level data, there is 1 string
>> > field X that may have the City Name. Some sample contents in field
>> > X :
>>
>> > PAYPAL *THETOYCAVE 40293577 CREDIT
>> > CDN TIRE STORE #000986 SMITH FALL CREDIT
>> > WAL-MART SUPERCENTE SCARBOROUGH CREDIT
>> > INEX VENTURES INC 80047031 CREDIT
>>
>> > I want to find all records in A2 which has a hit of any city name in
>> > A1.
>> > I want to know how do you usually solve this kind of problem?
>>
>> data a1;
>> input cityname $;
>> datalines;
>> delhi
>> mumbai
>> chennai
>> ;
>> run;
>>
>> data a2;
>> length x $20;
>> input x $;
>> datalines;
>> cbdsabndlkcdelhiflvnv
>> fvbdjkvbwdklv
>> svfcjehmumbaivndl
>> mumbai
>> ;
>> run;
>>
>> proc sql;
>> select a2.* from a1,a2
>> where index(a2.x,trim(a1.cityname)) >0;
>> quit;
>
>This won't work, first,You'll get this:
>NOTE: The execution of this query involves performing one or more
>Cartesian product joins that
> can not be optimized.
>NOTE: No rows were selected.
>Second, my dataset is huge.
|
|
0
|
|
|
|
Reply
|
tobydunn (6070)
|
1/12/2010 9:05:12 PM
|
|
Here is the "easiest" most understandable (IMHO!) solution:
* Create list of cities;
data cities;
input city$ 20.;
datalines;
Brisbane
Sydney
Melbourne
Perth
Adelaide
;
* Create list of addresses to search for cities;
data address;
input address $50.;
datalines;
boomerang street Brisbane
Cresendo crescend Sydney nsw 2342
Where is Melbourne?
You will not believe that Perth exists
Is Adelaide hot or cold
;
* Join datasets and use contains operator. ;
PROC SQL;
CREATE TABLE WORK.resolved AS
SELECT t1.City,
t2.Address,
(case when t2.Address contains trim(t1.City) then "Yes" else
"No" end) AS ContainInd
FROM WORK.cities AS t1 CROSS JOIN WORK.address AS t2
WHERE (CALCULATED ContainInd) = "Yes";
QUIT;
run;
* How easy was that;
Kind Regards
Francois van der Walt
Senior Business Analyst
G J I
'Stay in Front'
DATA - PRINT - MAIL
On Mon, 11 Jan 2010 10:44:20 -0800, aSaSnUt <wuhaihong@GMAIL.COM> wrote:
>I want to identify records in my transaction dataset which have
>desired City Names .
>My first dataset A1 has 1 column contains few thousand city names.
>My second dataset A2 is transaction level data, there is 1 string
>field X that may have the City Name. Some sample contents in field
>X :
>
>PAYPAL *THETOYCAVE 40293577 CREDIT
>CDN TIRE STORE #000986 SMITH FALL CREDIT
>WAL-MART SUPERCENTE SCARBOROUGH CREDIT
>INEX VENTURES INC 80047031 CREDIT
>
>I want to find all records in A2 which has a hit of any city name in
>A1.
>I want to know how do you usually solve this kind of problem?
|
|
0
|
|
|
|
Reply
|
francoisw (52)
|
1/12/2010 10:55:48 PM
|
|
There is one more sql solution. But SQL can duplicate transactions
observations. You should be careful with sql joins.
/*Build city test file*/
data cities; retain match 1;
informat city $50.;
input city &;
cards;
SMITH FALL
SCARBOROUGH
Toronto
Richmond Hill
;
/*Build transaction test file*/
data transactions;
informat transaction $100.;
input transaction &;
cards;
PAYPAL *THETOYCAVE 40293577 CREDIT
CDN TIRE STORE #000986 SMITH FALL CREDIT
CDN TIRE STORE #000986 Richmond Hill CREDIT
WAL-MART SUPERCENTE SCARBOROUGH CREDIT
INEX VENTURES INC 80047031 CREDIT
;
proc sql noprint;
create table want as select a.*, b.match from transactions as a left
join cities as b
on upcase(a.transaction) contains trim(upcase(b.city));
quit;
|
|
0
|
|
|
|
Reply
|
olopatkin (10)
|
1/13/2010 1:52:05 PM
|
|
|
9 Replies
308 Views
(page loaded in 0.113 seconds)
|