I have a relatively straightforward modeling question that I was
hoping to get some preliminary suggestions about (read: "I'm sort of a
n00b"). I have some high level time series data for accounts--both
dollar balance and a frequency count for each month going back about 5
years. Part of the project involves tracking the accounts from
inception and modeling the decline in balance or frequency of the
accounts in groups according to the month they were acquired and the
account type. This decline is expected to follow the form of
exponential decay, and I'd like to verify that with some kind of
modeling.
The data has the following columns:
procurement month -- the first month this account showed up
actual month -- the actual month for which the data applies
age in months -- the difference between the first two dates,
essentially, expressed in number of months
account type -- a grouping mechanism for describing what type of
accounts these are
balance -- the total balance in dollars
count -- the total frequency of accounts
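As a rough illustration of that layout, one aggregated row and the derived age column could be sketched as follows. This is Python rather than SAS, and the class name, field names and sample values are all hypothetical, chosen only to mirror the columns listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CohortRow:
    """One aggregated row; field names are illustrative, not from the real data."""
    procurement_month: date  # first month the accounts showed up
    actual_month: date       # month the figures apply to
    account_type: str        # grouping variable
    balance: float           # total balance in dollars
    count: int               # total number of accounts

    @property
    def age_in_months(self) -> int:
        # Whole months between the procurement month and the actual month.
        years = self.actual_month.year - self.procurement_month.year
        months = self.actual_month.month - self.procurement_month.month
        return years * 12 + months


# Made-up example row
row = CohortRow(date(2002, 6, 1), date(2007, 5, 1), "A", 125000.0, 310)
print(row.age_in_months)
```

Deriving the age from the two dates, rather than storing it separately, keeps the three columns from drifting out of agreement.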
Here are my concerns:
1) the data has group variables for account type, which I would assume
can fall into a BY statement in whatever PROC I decide to use, but I
need to make sure that such a statement exists (or a comparable way of
modeling the data exists) because the groupings are materially
different
2) the data can change over time, and hence, I must factor into the
modeling the time when the account was acquired; I'm unsure if this
would be appropriately modeled as a time series factor, a variable in
a BY statement, or some other way
3) the frequency of accounts for a given account type and procurement
month is going to be monotonically decreasing as you track them
forward in time; hence, the most plausible options for modeling the
decay of the frequency are probably linear decay and exponential
decay--although more complicated (polynomial?) models might fit
better, I'd like to limit the results to one of these two forms
There are so many different ways to piece together the data that I'm
not sure what the best structure is for modeling, and I'm not sure
which PROCs within SAS are the best for this type of modeling. In
general, I can find my way around the PROCs themselves, but there are
so many different modeling related PROCs, especially because you could
consider this time series data and you could model it otherwise, that
I'm really unsure of the best place to start looking.
So, short story long, what is the best way to model data like this?
Thanks in advance.
murphy.ben (51) | 6/4/2007 4:16:26 PM
It sounds to me as though you may not be doing "modeling" in the SAS
sense of the word. True modeling takes historical data and uses it to
make predictions about future behavior. Unless you intend to predict,
in your case, balance decay or some other performance measure for your
BY groups, you might not be doing true modeling and may not need a
SAS PROC at all.
If you do intend to predict, take a look at PROC REG (for a continuous
dependent variable) and PROC LOGISTIC (for a discrete outcome) as a
starting point. Those will give you coefficients to evaluate the
partial effect of the right-hand-side variables on your dependent
variable.
Good luck! Sounds like a fun challenge.
sonikson (29) | 6/4/2007 6:27:23 PM
Well, I realize that I didn't describe a typical modeling process. If
you think of each group of accounts as its own curve, where you graph
either dollars or count on the y-axis and the month of the data on the
x-axis, then we will have a separate line for each combination of
account type and starting month. I want to adequately fit each of
these curves with a simple exponential decay model and a simple linear
model, and then compare the two. That part is not very difficult, and
doesn't really require any modeling. However, after this, I want to be
able to see if these curves form a predictable pattern going forward.
So if the oldest data is 5 years old, then we have 60 data points on
the line for that data. Ultimately, after fitting curves to each of
the lines, I want to see if there is something in the data that shows
some kind of pattern in the curves we can fit to each line. That is,
since the line graphs represent data getting newer and newer, can we
adequately predict what the curve will look like for data that is
newer?
Maybe that helps illuminate the need for SAS PROCs. Thanks again for
any suggestions.
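The per-curve comparison described above can be sketched in a few lines. The Python below is only an illustration under assumptions: the tuple layout, the sample numbers and the helper names (ols, r_squared, fit_cohort) are made up for this example. The exponential curve is fitted by regressing log(value) on age, which minimizes error on the log scale rather than on the original scale. In SAS, the same log-linear regression could presumably be run in PROC REG with a BY statement on account type and procurement month.

```python
import math
from collections import defaultdict


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, fitted):
    """R^2 computed on the original (untransformed) scale."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1.0 - ss_res / ss_tot


def fit_cohort(ages, values):
    """Fit linear and exponential decay to one curve; values must be > 0."""
    a, b = ols(ages, values)
    lin_fit = [a + b * t for t in ages]
    # Exponential decay y = A * exp(k*t) becomes linear after taking logs.
    ln_a, k = ols(ages, [math.log(v) for v in values])
    exp_fit = [math.exp(ln_a + k * t) for t in ages]
    return {
        "linear": {"intercept": a, "slope": b, "r2": r_squared(values, lin_fit)},
        "exponential": {"start": math.exp(ln_a), "rate": k,
                        "r2": r_squared(values, exp_fit)},
    }


# Hypothetical rows: (account_type, procurement_month, age_in_months, count)
rows = [("A", "2002-06", t, 1000 * math.exp(-0.05 * t)) for t in range(12)]
rows += [("B", "2002-06", t, 1000 - 40 * t) for t in range(12)]

# Group the rows into one curve per (account type, procurement month).
curves = defaultdict(lambda: ([], []))
for acct_type, proc_month, age, value in rows:
    ages, values = curves[(acct_type, proc_month)]
    ages.append(age)
    values.append(value)

results = {key: fit_cohort(ages, values) for key, (ages, values) in curves.items()}
for key, res in results.items():
    best = max(res, key=lambda name: res[name]["r2"])
    print(key, best, round(res[best]["r2"], 4))
```

Computing R^2 on the original scale for both forms keeps the comparison on the same footing. A curve containing zeros would need separate handling, since log(0) is undefined.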
murphy.ben (51) | 6/4/2007 7:03:42 PM
Hi BJ,
It seems what you are trying to do could be as simple as a
distribution analysis by vintage, but it can be as complex as a
survival or logistic model can be. An event-related type of analysis
(you have to define your event: balance attrition or whatever is of
interest to you) could be another alternative. What you need to
do is realign your information around the time of the event for those
accounts that have already gone through that event, then try to
understand the trends in your realigned data over a timeframe that
goes back up to 6 periods (I guess you have monthly data).
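One possible reading of the realignment idea, sketched in Python under assumptions: each cohort is re-indexed by its age (months since origination) instead of the calendar month, so that old and new vintages overlay one another. The rows, the variable names and the use of count retention as the measure are all hypothetical here; a true event-based analysis would align on whatever event is chosen.

```python
from collections import defaultdict

# Hypothetical aggregated rows: (procurement_month, age_in_months, count)
rows = [
    ("2007-01", 0, 1000), ("2007-01", 1, 900), ("2007-01", 2, 810),
    ("2007-02", 0, 500), ("2007-02", 1, 440),
    ("2007-03", 0, 800),
]

# Realign: one curve per cohort, indexed by age rather than calendar month.
by_cohort = defaultdict(dict)
for cohort, age, count in rows:
    by_cohort[cohort][age] = count

# Retention at each age = count at that age / count at age 0, per cohort.
retention = {
    cohort: {age: c / curve[0] for age, c in curve.items()}
    for cohort, curve in by_cohort.items()
    if curve.get(0)
}

# Average retention at each age across the cohorts old enough to reach it.
by_age = defaultdict(list)
for curve in retention.values():
    for age, r in curve.items():
        by_age[age].append(r)
avg_retention = {age: sum(v) / len(v) for age, v in sorted(by_age.items())}
print(avg_retention)
```

Later ages are averaged over fewer cohorts, so those averages rest on less data and deserve more caution.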
Hope this will help,
Thanks,
Hansi
mytkolli (11) | 6/4/2007 8:49:47 PM
Hansi, what you suggested sounds intriguing, but I'm not sure I follow
the steps you are describing. A few others have asked for a more
detailed explanation of the data, so I'll provide that here in hopes
of clarifying for everyone.
I have monthly data for accounts that includes the month the account
originated, the current month, the current month's balance, and the
group to which the account belongs. I have aggregated this data so
that I have a row that shows the balance and number of accounts for
each unique combination of originating month, current month and
account group.
For reporting this historical data, I typically draw a graph where the
x-axis is the current month of data, the y-axis is the balance (or
frequency) and there is a separate line for each value of originating
month, and separate graphs for each account group. If I have 60
months of data for accounts that fall into 10 groups, this gives me 10
graphs and on each graph, there are 60 lines--there is a line with 60
data points (total balance or frequency for each month for all
accounts that originated in the first month), a line with 59 data
points (same total for accounts originating in the second month), and
so on, to a line with 2 data points (data for this month and last
month for accounts that are two months old) and a final data point
(data for this month for accounts that originated this month). If you
assume that these balances (or frequencies) are going to be following
some kind of decay function, then you can visually imagine a series of
stacked decay curves, one after the other, shrinking in length as you
view the graph from left to right.
What I want to determine is:
1) What is the best fit equation (both linear and exponential) for
each of these lines? (this part is pretty easy)
2) As I follow this series of stacked decay curves from the past, into
the present, what trend do they form, and what does that tell me about
the shape of curves that are not fully formed yet? In other words,
what information is shown by the curves with 50 data points that might
help me fill out the rest of a curve with only 10 data points? (this
is the part I cannot figure out how to do)
I can do (1) on a case by case basis, but I have yet to figure out how
to do it systematically in SAS. I have graphed the data in Excel for
the presentation to the end users, so I know that the data looks
predominantly either exponential or linear, so I'm not overly
concerned with my choice of those two graph types. I'm not sure if it
is easier to do (2) with the data points or the equations from (1),
but I can use either, theoretically. Either way, any help and
suggestions would be great. Thanks again.
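For (2), one simple way to use the equations from (1) is to treat each mature cohort's fitted decay rate as a data point, fit a trend across origination order, and use the predicted rate to extend a young curve. The Python sketch below assumes exactly that. The rates, the cohort indices, the straight-line trend and the function names are all hypothetical, and nothing in the data described here guarantees that the rates drift linearly.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical monthly decay rates fitted to mature cohorts,
# indexed by origination order (0 = oldest cohort).
cohort_index = [0, 1, 2, 3, 4, 5]
fitted_rate = [-0.050, -0.052, -0.054, -0.056, -0.058, -0.060]

# Trend in the decay rate across cohorts (assumed linear for this sketch).
a, b = ols(cohort_index, fitted_rate)


def predicted_rate(index):
    """Decay rate the trend implies for the cohort at this origination index."""
    return a + b * index


def project(last_age, last_value, rate, horizon):
    """Extend a young curve: value(t) = last_value * exp(rate * (t - last_age))."""
    return [(last_age + h, last_value * math.exp(rate * h))
            for h in range(1, horizon + 1)]


# A young cohort (index 10) observed through age 9 with 800 accounts remaining.
young_rate = predicted_rate(10)
projection = project(last_age=9, last_value=800.0, rate=young_rate, horizon=3)
for age, value in projection:
    print(age, round(value, 1))
```

Anchoring the projection at the last observed value keeps the extended curve continuous with the data the young cohort already has. Extrapolating the rate trend well past the observed cohorts is the weakest part of this approach.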
murphy.ben (51) | 6/5/2007 4:43:10 PM