Parsing Related Data

  • Follow


Here's a good one for the group, cause it's kind of tough.  Let's say
I've got a database with two tables, one called Contacts and one
called Categories.  Categories is used to categorize contacts, so the
user can make as many categories as he wants and assign as many
contacts as he wants -- a many-to-many relation where the contact pk
(a field called IDC) is stored in a list in Categories (a field called
IDCs) through a relation we'll call CIDC_GIDCs (that is, the Contact
IDC field to the Category IDCs field).

My client wants to be able to send lists of contacts to people without
access to the database using Excel so recipients can manipulate the
data however they need.  There's no problem doing this.  However, she
wants to include the categories for each contact in columns so that
the rows can be sorted and compared by category.  In other words, she
wants an Excel list that might look something like this:


Name                 Tall          Freshman
Scholarship          Registered
Jack Sprat          Tall
Scholarship          Registered
Mary Franklin                     Freshman        Scholarship
Registered
Linda Chafey       Tall         Freshman
Steve Flynn                       Freshman
Hugh Prince
Scholarship         Registered


Since the lists of contacts she exports are determined by ad hoc
criteria, she has to be able to export random found sets from the
Contacts table.  If I include a field like List(CIDC_GIDCs::Category),
it exports the categories one under the other so that each contact
takes up more than one row and it's not sortable.  If I include a
field with this calculation substituting tabs for line breaks, it
exports without the tabs so that all the categories end up in one
field/column (and this wouldn't work in the end, because columns
wouldn't line up properly).

Instead, I created a bunch of generic category fields in Contacts and
wrote a script that parses the categories each time she wants to
export a list.  In other words, the Contacts table has a bunch of
fields named "Cat1," "Cat2," Cat3" etc. and I use the "Set Field by
Name" step to loop through and insert categories on an as-needed
basis, then wipe them clean.  The generic fields aren't used to store
data -- just to create a spreadsheet-like arrangement out of related
data for exporting.

It's a great script, but takes forever.  The client has a very fast
setup, but it takes about a minute per contact record.  That's a hell
of a long time!  Roughly speaking, it works like this:

1.  From the Contacts table, go to the Category records related to the
current found set of contacts
2.  Loop through the categories and make a list of them so we know
what categories to include on the spreadsheet
3.  Go back to Contacts and loop through each record to set the
generic category field as follows:
     -  start with the first category in the list we compiled in step
2
     -  go to the first record
     -  if the category is assigned to the contact, insert it in the
first category field, otherwise leave it empty
     -  go to the next record and repeat
     -  at the end of doing this for the first category in the list,
do it for the second and repeat
4.  When finished assigning categories, save the records as an Excel
spreadsheet
5.  Clean up the category fields and return to zero.

There's plenty of error-checking to make sure nothing funny happens
and a progress bar to keep the user apprised, but this is the basic
routine.  I'd be happy to post the entire thing if someone wanted to
look at it more closely.  The question is:  Can the spreadsheet be
created in a more straightforward way?  Is there some kind of export
I'm not aware of that wouldn't take so long?  One of the challenges is
that the columns need to line up in Excel, so the script needs to take
account of all the categories assigned to the current found set and
leave empty cells where categories aren't assigned.  Her lists often
involve thousands of contacts, so it needs to run faster.  Seems like
there must be a better way.  You say?

Thanks for any help.

Best,
-Jahn
0
Reply jahnbigbooty (106) 4/14/2010 10:54:45 PM

In article
<7fde5240-683a-437b-be4f-7579349700a2@g30g2000yqc.googlegroups.com>, jahn
<jahnbigbooty@yahoo.com> wrote:

> Here's a good one for the group, cause it's kind of tough.  Let's say
> I've got a database with two tables, one called Contacts and one
> called Categories.  Categories is used to categorize contacts, so the
> user can make as many categories as he wants and assign as many
> contacts as he wants -- a many-to-many relation where the contact pk
> (a field called IDC) is stored in a list in Categories (a field called
> IDCs) through a relation we'll call CIDC_GIDCs (that is, the Contact
> IDC field to the Category IDCs field).
> 
> My client wants to be able to send lists of contacts to people without
> access to the database using Excel so recipients can manipulate the
> data however they need.  There's no problem doing this.  However, she
> wants to include the categories for each contact in columns so that
> the rows can be sorted and compared by category.  In other words, she
> wants an Excel list that might look something like this:
> 
<snip>
> 
> It's a great script, but takes forever.  The client has a very fast
> setup, but it takes about a minute per contact record.  That's a hell
> of a long time!  Roughly speaking, it works like this:
> 
<snip>


The reason that is slow so that you're looping through all the Found Set
of Contacts records multiple times, as well as the Category records. If
there are 10 categories, then you're looping through the Contacts records
10 times ... 1000 records means 10,000 loops.

An improvement would be to loop through the Found Set once, setting all
the CatX fields for each record at the same time (see the last part
below), but you could skip the Script entirely and use Calculation fields.

In the Contacts Table you need a Calculation Field that collects all the
assigned Categories using the List function that you've already
discovered, but add an extra Return character to the end to make it easier
to extract the last Category.
e.g.
     AllCategories      Calculation, Text, Unstored
             = List(CIDC_GIDCs::Category) & "**"

Note: "**" is really the Return character / pilcrow symbol - the 'reverse
P' symbol on one of the buttons in the Define Calculation window.

Each individual assigned category in this field will be separated by a
Return character. 

At this point you could simply use the Substitute function to swap these
Return characters to something else unique (e.g. "#-#"), export the
records to a Text file and then use Microsoft Word or similar to
Find-Replace the "#-#" with a Tab character. Then re-save the file as Text
ready for importing into Excel ... but that's a bit of an unnecessary
hassle for the users.

Instead let FileMaker do the work. You can use the position of those
Return characters to allow a set of Calculation Cat Fields to extract each
individual Category using the text functions of Middle and Position.

To keep things simple, each Category can be extracted using the same basic
formula - the Xth category is all the characters starting after the
(X-1)th Return character up to just before the Xth Return character. This
is easy to do using the Middle function:

   Middle (AllCategories; 
           location of (X-1)th Return + 1 character; 
           location of Xth Return - location of (X-1)the Return - 1 character)

The second parameter of the Middle function is where in the original text
we want to start extracting characters - in this case it is the location
of the (X-1)th Return plus 1 character because we don't want the Return
itself.

The third parameter of the Middle function is how many characters we want
to extract - in this case it can be calculated as the location of the Xth
Return minus (X-1)th Return, minus 1 to again skip the Return character.

The location of the Nth Return character can be found using the Position
function:

   Position (AllCategories; "**"; 1; N)

This means the formula to extract the Xth category is:

   Middle(AllCategories; 
          Position(AllCategories; "**"; 1; X-1) + 1; 
          Position(AllCategories; "**"; 1; X) - Position(AllCategories; 
                                                     "**"; 1; X-1) - 1)

Using this you can create the set of Cat fields to extract all the
individual categories:

   Cat1            Calculation, Text, Unstored
      = Middle(AllCategories; 
               Position(AllCategories; "**"; 1; 0) + 1; 
               Position(AllCategories; "**"; 1; 1) - Position(AllCategories; 
                                                           "**"; 1; 0) - 1)

   Cat2            Calculation, Text, Unstored
      = Middle(AllCategories; 
               Position(AllCategories; "**"; 1; 1) + 1; 
               Position(AllCategories; "**"; 1; 2) - Position(AllCategories; 
                                                           "**"; 1; 1) - 1)

   Cat3            Calculation, Text, Unstored
      = Middle(AllCategories; 
               Position(AllCategories; "**"; 1; 2) + 1; 
               Position(AllCategories; "**"; 1; 3) - Position(AllCategories; 
                                                           "**"; 1; 2) - 1)

   etc.

Using a 0 in the Cat1 formula means FileMaker simply starts at the
beginning of the AllCategories field. The Calculation fields will stay
empty if there is no appropriate Category, i.e. if Contact record only has
three categories, then Cat4, Cat5, etc. will remain empty.

Export all these CatX fields, plus whatever else you need (name, etc.), to
whatever file format you want.



Using Calculation fields to do all the work might slow the database down
in general use if there are A LOT of Cat fields. You could instead use a
Script similar to your original which is run only when you need to produce
the export file ... all the forumlas will be the same, you just have them
as the second parameter of the Set Field command.
e.g.
     Set Field [AllCategories; List(List(CIDC_GIDCs::Category) & "**"]
     Set Field [Cat1; Middle(AllCategories; 
                 Position(AllCategories; "**"; 1; 0) + 1; 
                 Position(AllCategories; "**"; 1; 1) 
                                - Position(AllCategories; "**"; 1; 0) - 1)]
     Set Field [Cat2; Middle(AllCategories; 
                 Position(AllCategories; "**"; 1; 1) + 1; 
                 Position(AllCategories; "**"; 1; 2) 
                                - Position(AllCategories; "**"; 1; 1) - 1)]
     Set Field [Cat3; Middle(AllCategories; 
                 Position(AllCategories; "**"; 1; 2) + 1; 
                 Position(AllCategories; "**"; 1; 3) 
                                - Position(AllCategories; "**"; 1; 2) - 1)]

   etc.

This Script will need to loop through all Found Set records, but only once.


Helpful Harry  :o)
0
Reply your.name2 (344) 4/15/2010 4:12:01 AM


Hi, Harry -

A fantastically well thought-out answer.  I did think about the idea
of exporting to a text file and having the user do find/replace on a
string, but I try to provide clients with push-button ease and the
extra steps seemed like too many.  With respect to looping, I didn't
realize that it would take more time to loop through each record and
set each field rather than set each field and loop through the
records, though I can see how this could be the case.  But is looping
definitely more processor-intensive?  For instance, taking the Word
example, if you do a find/replace on a document with a hundred
carriage returns or one carriage return, it's all the same to the
application, all other things being equal.  Does FileMaker treat the
data differently on the back end?  I mean, is it anything more than
rows separated by carriage returns in a 1s and 0s kind of way, or does
a step from one record to the next really entail more processing?  I'd
love to know the answer if anyone's done some tests on this, because
it would certainly impact the way I write my scripts.

With respect to the idea of a calculation, I was kind of puzzling over
that when I was writing my post because it seemed like it would save
all that time the script runs, but I wonder if there aren't a couple
problems here.  If I understand correctly, I'd define each of the
generic category fields as a calculation versus, as I have it now, an
empty text field that's filled by the script.  The calculation would
essentially be:  for Cat1, the first category; for Cat2, the second,
etc., but only if those categories are assigned to the record;
otherwise leave it blank.

So my first concern is that the Contacts table has about 50,000
records and the Categories table has about 50.  These will continue to
grow, but that gives you a sense of the magnitude -- in other words,
the Contacts table isn't a hundred records, but it's not a million
either.  If a new category is added, it shifts the other categories,
because the relationship CIDC_GIDCs is alphabetical, since my client
wants the category columns in the spreadsheet to line up
alphabetically.  I didn't mention this in my post, and in fact my
example doesn't demonstrate it, but it's one of the factors.  So if a
new category is added, but none of the Cat fields are being displayed
and the calculation is set not to be stored, does the database take
any kind of performance hit at all?  Or will it only perform the
calculations at the moment it gets ready to display/export them?

The other challenge -- this is more fundamental -- is let's say the
client wants to export a list that in total uses about 10 categories,
but one of the categories is toward the end of the list.  In that case
there'll be 40 empty columns interspersed among the columns with
data.  And in order to clean up the document, they'd have to scroll
through all the columns horizontally _and_ vertically to see whether
any given column has data in it somewhere along the way.  Then they'd
have to delete each of the 40 columns that wasn't being used.  This is
different from my script, which always only uses as many columns as
are necessary and puts them right next to each other.

If this isn't correct -- if I've misunderstood your setup -- let me
know.  I'd love to find a way to do it with calculations, but as far
as I know, there's no way to define List() as meaning only the records
in the current found set, sort of like the way you can have a script
step that goes to all the records that match the current found set.
That would be a fantastic calculation for this purpose.

An intermediate path between your calculations and my script would
probably be to obtain the list of related categories the way I do at
the beginning of my script and then use Replace All to set the Cat
fields using something similar to your calculations.  The reason I
avoided Replace All initially was that the looping allows me to use
"Set Field by Name," which means I only have to write a few lines to
accommodate all iterations of the Cat field (which currently goes
through Cat50, but will have to go higher to allow for growth), rather
than having to write a separate line for each iteration.  Needless to
say, it violates relational design principles, but that's what happens
when you get mixed up with Excel.

Thanks a ton for your input on this, and for anything else you can
add.

Best,
-J.
0
Reply jahnbigbooty (106) 4/15/2010 8:44:34 AM

On 2010-04-15 01:44:34 -0700, jahn <jahnbigbooty@yahoo.com> said:

> With respect to looping, I didn't
> realize that it would take more time to loop through each record and
> set each field rather than set each field and loop through the
> records, though I can see how this could be the case.  But is looping
> definitely more processor-intensive?  For instance, taking the Word
> example, if you do a find/replace on a document with a hundred
> carriage returns or one carriage return, it's all the same to the
> application, all other things being equal.

When looping and setting fields, for each record, Filemaker

1. Opens the record
2. Sets the field(s) data
3. Closes the record

Note that closing in this case is not the same as committing the record 
data. It's a separate operation.

So multiple loops means multiple opening and closing, which does take 
processor time. Quite a bit of processor time.

To set multiple fields at one pass only takes one open and one close, 
and so is the more efficient method, always.

Filemaker is not Word. Never make the  mistake of equating their 
operation to each other.

Lynn Allen
FileMaker 10 Certified Developer

0
Reply lynn6956 (12) 4/15/2010 3:32:32 PM

Hi, Lynn --

Thanks for that info.  I wasn't suggesting that FileMaker and Word
operate equivalently; only that we shouldn't confuse what something
looks like on the frontend with what's happening on the backend.  What
does it mean to "open" a record?  Is the database doing real
processing when that happens?  I have a client that thinks clicking
"Omit" will delete a record just because it disappears from view, but
we all know that doesn't really happen.  I have another who doesn't
understand where all the records from List view go when he switches to
Form view.  On the frontend, these feel like monumental changes to
some users.  On the backend the computer is yawning.

Similarly, we might look at a table and see the way FM displays a
frame around separate records, but does the processor really care
about that frame, or is this just a way of translating the 1s and 0s
so we can understand (a translation that doesn't change, as you point
out, unless the records are committed or the screen refreshed)?  Does
going from record 1 to record 2 take more processing power than going
from field A to field B?  If so, why?  What's the difference in logic
and processing time?

The Word comparison was just a metaphor to get us past thinking that
what we see on the screen mirrors what's happening on the chip.
Thoughts?

Best,
-J.
0
Reply jahnbigbooty (106) 4/15/2010 6:56:16 PM

In article
<ef95f60f-268a-48a6-b657-90c999a13ec6@q15g2000yqj.googlegroups.com>, jahn
<jahnbigbooty@yahoo.com> wrote:

> Hi, Harry -
> 
<snip>
> 
> With respect to the idea of a calculation, I was kind of puzzling over
> that when I was writing my post because it seemed like it would save
> all that time the script runs, but I wonder if there aren't a couple
> problems here.  If I understand correctly, I'd define each of the
> generic category fields as a calculation versus, as I have it now, an
> empty text field that's filled by the script.  The calculation would
> essentially be:  for Cat1, the first category; for Cat2, the second,
> etc., but only if those categories are assigned to the record;
> otherwise leave it blank.
> 
> So my first concern is that the Contacts table has about 50,000
> records and the Categories table has about 50.  These will continue to
> grow, but that gives you a sense of the magnitude -- in other words,
> the Contacts table isn't a hundred records, but it's not a million
> either.  If a new category is added, it shifts the other categories,
> because the relationship CIDC_GIDCs is alphabetical, since my client
> wants the category columns in the spreadsheet to line up
> alphabetically.  I didn't mention this in my post, and in fact my
> example doesn't demonstrate it, but it's one of the factors.  So if a
> new category is added, but none of the Cat fields are being displayed
> and the calculation is set not to be stored, does the database take
> any kind of performance hit at all?  Or will it only perform the
> calculations at the moment it gets ready to display/export them?

With the CatX Calculation Fields set to be 'unstored' (I think newer
versions of FileMaker call it "calculate when needed"), the contents are
only calculated when you export the records (or when you look at a Layout
on screen that contains those Fields).

The List function will automatically alphabetise the related Category
values, so the CatX values will be alphabetical as well.

BUT,
this also brings up another point that I've since realised with my method
which means it doesn't quite do what you want it to, which you've reminded
me of here as well ...



> The other challenge -- this is more fundamental -- is let's say the
> client wants to export a list that in total uses about 10 categories,
> but one of the categories is toward the end of the list.  In that case
> there'll be 40 empty columns interspersed among the columns with
> data.  And in order to clean up the document, they'd have to scroll
> through all the columns horizontally _and_ vertically to see whether
> any given column has data in it somewhere along the way.  Then they'd
> have to delete each of the 40 columns that wasn't being used.  This is
> different from my script, which always only uses as many columns as
> are necessary and puts them right next to each other.

With my method, the Category of "Scholarship" (for example) won't be in
the same CatX field / Excel column for all the Records, so it's not doing
quite what you wanted ... but it does mean it doesn't have blank columns
between values.

Getting the same value in each CatX Field / Excel column will be more
difficult and I'll need to do a bit more thinking ... it may even be that
the moethod you created is the best.




Helpful Harry  :o)
0
Reply your.name2 (344) 4/16/2010 1:36:42 AM

Hi, Harry -

Grip came up with a streamlined method using a variable and unstored
calculation (see thread above).  He mentioned a join table, but it's
not necessary.  The calc works well in the Contacts table and requires
almost no scripting, so moves very fast.  I'm putting it through its
paces right now, but it seems to do the trick.

Thanks for your input -- it was helpful and educational, even if I
went with a different solution.

Best,
-J.
0
Reply jahnbigbooty (106) 4/16/2010 2:17:43 AM

On 2010-04-15 11:56:16 -0700, jahn <jahnbigbooty@yahoo.com> said:

> Thanks for that info.  I wasn't suggesting that FileMaker and Word
> operate equivalently; only that we shouldn't confuse what something
> looks like on the frontend with what's happening on the backend.  What
> does it mean to "open" a record?  Is the database doing real
> processing when that happens?

Yes, real processing is done at that point.

Think of a record as being in one of several states:

1. Being viewed but not being edited
2. Having a cursor in a field but without new data being entered (opening)
3. Being actively edited (or data being inserted by script)
4. Editing finished but not yet committed
5. Being committed and returning to state 1

In versions of FM earlier than 10, states 2 & 3 were essentially the 
same, with record locking taking place as the cursor enters the field 
and unlocking with the cursor leaving the field. In FM 10 & 11, the 
locking doesn't take place until State 3. I'm not entirely sure where 
locking currently releases, I suspect it's still on Record Commit. 
Though it is possible to determine if a field has been changed without 
the entire record being committed, as this is what several script 
triggers are based on.

The state change to open & closed causes a processing & performance hit 
when a loop sets on field and then repeats through the record set to 
set another. One loop, one open/close per record is much more 
efficient, as the state change only occurs once, no matter how many 
fields are edited while the record is in the open state.

Lynn Allen
FileMaker 10 Certified Developer

0
Reply lynn6956 (12) 4/16/2010 2:53:05 AM

Hi, Lynn -

Thanks.  That's good information to know.  I spend a lot of time
trying to streamline scripts and calculations -- putting conditional
statements in the right order to minimize the steps, etc. -- so
understanding record states is helpful.  Where do you get the info
from?  Sometimes I have specific questions about which processes are
going to tax the database the most and haven't had that much luck
finding technical info.  Do you get it from FileMaker's knowledge
base?  Other forums?

Thanks again for your help.

Best,
-J.
0
Reply jahnbigbooty (106) 4/16/2010 6:21:08 AM

On 2010-04-15 23:21:08 -0700, jahn <jahnbigbooty@yahoo.com> said:

> Where do you get the info
> from?  Sometimes I have specific questions about which processes are
> going to tax the database the most and haven't had that much luck
> finding technical info.  Do you get it from FileMaker's knowledge
> base?  Other forums?
> 
> Thanks again for your help.

Start with Filemaker's Training Series.  That, added to all my years of 
experience, was what I used to study for the FM10 Certification exam.

Join the FBA and Technet, and pay attention to the discussions on each 
group/mailing list.  Go to a few DevCons, and hang around in the 
hallways listening to really smart people talk about FM. Go to whatever 
Under the Hood sessions are offered.  Ask questions of the FM engineers.

Read as many forums as you have time for, but give preference to time 
actually working with the program.  Get lots of clients with lots of 
divergent needs, so you *have* to learn this stuff.

Work w/Filemaker for at least 8 versions. Oh wait, due to the 
renumbering, it's more like 10 versions. I started when it was 
Filemaker II, (not FM Pro 2.0) back in 1992.

Let percolate for a decade or two.

Basically it's all just Never Stop Learning.

Lynn Allen
FileMaker 10 Certified Developer

0
Reply lynn6956 (12) 4/16/2010 5:22:18 PM

9 Replies
38 Views

(page loaded in 0.12 seconds)

Similiar Articles:













7/9/2012 8:48:08 AM


Reply: