stat question: comparing proportions in 2 dependent samples

I'm looking for an appropriate statistical test for the difference in
proportions. I want to compare the proportion of a factor (for example,
males) in population A versus the proportion in population B.
However, population A is a partial subset of the larger population B (i.e.,
most, but not all, of population A is contained in population B).
I can't use the chi-square test because the 2 samples are not independent (as
cases and controls would be). I'm reluctant to use McNemar's test for
dependent proportions because the 2 samples are not paired (as before and
after measurements on the same population would be). Can anyone recommend a suitable statistical
test for the difference in these proportions? I'd be delighted with any SAS
code as well!
Reply wcw2 (31) 7/17/2007 5:58:52 PM

Take a look at the PROC FREQ documentation.  One thing you will find is:

McNemar's test
Beginning in Release 6.10, use the AGREE option in Base SAS PROC FREQ.
Before Release 6.10, create a three-way table with a stratum variable
identifying each subject (or matched group), a variable indicating each
occasion (condition or individual within matched group), and a binary
response variable. Then use the CMH option. For example, if each subject
gives a binary response to each of two drugs, use the statement:
  tables subject*drug*response/cmh2 noprint;
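
For Release 6.10 and later, the AGREE route mentioned above might look roughly like the sketch below. The data set name `paired`, the variables `drug1`, `drug2` and `count`, and all of the counts are hypothetical and made up for illustration; none of them come from the original question. Note also that this assumes genuinely paired data (every subject responds to both drugs), which is not the partly overlapping situation described in the question.

```sas
/* Hypothetical paired data: one row per combination of responses,
   with COUNT = number of subjects showing that combination.
   All counts are invented for illustration only. */
data paired;
  input drug1 $ drug2 $ count;
  datalines;
yes yes 30
yes no  12
no  yes  5
no  no  53
;
run;

proc freq data=paired;
  weight count;                /* each row represents COUNT subjects */
  tables drug1*drug2 / agree;  /* AGREE requests McNemar's test for a 2x2 table */
run;
```

With one record per subject instead of cell counts, the WEIGHT statement would simply be dropped.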

HTH,
Art
Reply art297 (4212) 7/17/2007 11:10:00 PM

wcw2@CDC.GOV wrote:
>
>I'm looking for an appropriate statistical test for the difference in
>proportions. I want to compare the proportion of a factor (for example,
>males) in population A versus the proportion in population B.
>However, population A is a partial subset of the larger population B (i.e.,
>most, but not all, of population A is contained in population B).
>I can't use the ChiSquare test as the 2 samples are not independent (such
>as cases vs controls). I'm reluctant to use the McNemar's test for
>dependent proportions as the 2 samples are not paired (such as before vs
>after in the same population). Can anyone recommend a suitable statistical
>test for the difference in these proportions? I'd be delighted with any SAS
>code as well!

Why?

I'm not being frivolous.  The underlying reason for the comparison is likely
to determine whether it is statistically appropriate or not.  Either way, you
ought to be looking at sample B vs. (population A - sample B).

If you have two proportions which come from different samples, but
there is an issue with the populations that the samples come from, then
that is a completely different question.

So I'm not clear on exactly what you want.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330

Reply davidlcassell (5566) 7/26/2007 9:16:39 PM
comp.soft-sys.sas
