Statistical Design Plan


Health Center Patient Survey

Statistical Design Plan

OMB: 0915-0368

Attachment 10
Statistical Design Plan

September 2013

2014 Health Center Patient Survey
Deliverable 5: Statistical Design Plan

Prepared for
Charles Daly
Health Resources and Services Administration
Bureau of Primary Health Care
Parklawn Building, 5600 Fishers Lane
Rockville, MD 20857
Draft: July 26, 2013
Revision #1: September 13, 2013
Revision #2:
Revision #3:
Final:
Prepared by
Patrick Chen
Shampa Saha
Kathleen Considine
RTI International
3040 Cornwallis Road
Research Triangle Park, NC 27709

RTI Project Number 0213547.001.008

_________________________________
RTI International is a trade name of Research Triangle Institute.

TABLE OF CONTENTS
Section 1. Introduction
Section 2. Target Population
Section 3. Overview of Sample Design
Section 4. Grantee Sample Selection
    4.1 Sampling Frame Construction
    4.2 Stratification
    4.3 Grantee Sample Allocation
    4.4 Select Stratified PPS Sample of Grantees
    4.5 An Illustrative Grantee Sample
    4.6 Grantee Selection Probability
Section 5. Site Sample Selection
    5.1 Determine Eligible Sites within Participating Grantees
    5.2 Evaluate Distances between Eligible Sites
    5.3 Oversampling Sites with Concentrated Patients in Three Race/Ethnicity Categories
    5.4 Site Selection and Selection Probability
Section 6. Patient Sample Selection
    6.1 Patient Interview Allocation to Grantee
    6.2 Patient Interview Allocation to Sites within Grantee
    6.3 Patient Screening and Selection
    6.4 Patient Selection Probability
    6.5 Patient’s Probability of Inclusion in the Study
Section 7. Sample Sizes and Statistical Power
Section 8. Sample Weights
    8.1 Grantee Sample Selection Weights
    8.2 Site Sample Selection Weights
    8.3 Patient Sample Selection Weights
    8.4 Nonresponse and Poststratification Weight Adjustments
Section 9. Data Collection
    9.1 Schedule
    9.2 Costs
Section 10. Strengths and Limitations of Study Design
    10.1 Strengths
    10.2 Limitations
Section 11. References

EXHIBITS
1. Target Sample Sizes for the 2014 Health Center Patient Survey
2. Summary of Features and Benefits of Sample Design
3. Grantee Characteristics in the Sampling Frame (2012 UDS)
4. Distribution of Patients Served in 2012
5. Race/ethnicity and Age Group Distribution of Patients Served in 2012
6. Expected Grantee and Patient Yields from Unstratified Random Sampling
7. Definition of First-Level Stratification
8. Grantee Sample Final Stratification
9. Optimum Grantee Sample Allocation
10. Grantee Sample Allocation and Sampling Rates in Final Grantee Strata
11. Expected Yield of the Grantee Funding Type and Patients of a Stratified Disproportionate Sampling
12. Expected Grantee and Patient Sample Distribution by Region, Urban/Rural Area and Grantee Size
13. Patient Coverage Rates of 166 Grantees in Race/Ethnicity
14. Oversampling and Nonoversampling Patient Group
15. Detecting Differences in Percentage Estimates between the Patient Survey and the NHIS
16. Description and Data Source of Terms in Formulas Calculating Sample Weights

SECTION 1.
INTRODUCTION
The 2014 Health Center Patient Survey, sponsored by the Health Resources and Services
Administration (HRSA), aims to collect data on patients who use health centers funded under
Section 330 of the Public Health Service Act. Results from the study will guide and support the
Bureau of Primary Health Care (BPHC) in its mission to improve the health of the nation’s
underserved communities and vulnerable populations by assuring access to comprehensive,
culturally competent, quality primary health care service. The 2014 Health Center Patient Survey
will collect data from the patients of health centers funded through four BPHC grant programs:
the Community Health Center program (CHC), the Migrant Health Center program (MHC), the
Health Care for the Homeless program (HCH), and the Public Housing Primary Care program
(PHPC).
Our goal is to recruit 165 grantees and complete 6,600 interviews, among them 3,630 for
the CHC funding program, 1,210 for the MHC funding program, 1,210 for the HCH funding
program, and 550 for the PHPC funding program. In addition, to meet BPHC’s research
interests in race/ethnicity groups, American Indian/Alaska Native (AIAN), Native
Hawaiian/Pacific Islander (NHPI), and Asian patients will be oversampled. Patients aged 65
or older will also be oversampled. The target sample sizes in three design domains, namely
funding program, race/ethnicity, and age group, are shown in Exhibit 1.
Exhibit 1. Target Sample Sizes for the 2014 Health Center Patient Survey

Funding    Target                                Target                Target
Program    Sample Size    Race/Ethnicity         Sample Size    Age Group    Sample Size
CHC        3,630          Hispanic               2,044          0–17         2,200
MHC        1,210          Non-Hispanic White     1,558          18–64        3,200
HCH        1,210          Non-Hispanic Black     1,618          65+          1,200
PHPC       550            Non-Hispanic AIAN      409
                          Non-Hispanic Asian     647
                          Non-Hispanic NHPI      251
                          Non-Hispanic Others    73

In this report, we define the target population of the 2014 Health Center Patient Survey in
Section 2. An overview of sample design is presented in Section 3, and a detailed discussion of
the proposed three-stage sample design is presented in Sections 4 through 6. An illustrative


example of a grantee sample using BPHC’s 2012 Uniform Data System (UDS) data is also
presented. In Section 7, we discuss sample sizes and power calculations in the context of the
illustrative example. Section 8 details the procedure for calculating sample weights. Data
collection schedules and costs are presented in Section 9. In Section 10, we list some strengths
and limitations of the study design.


SECTION 2.
TARGET POPULATION
The target population for the 2014 Health Center Patient Survey (HCPS) comprises
persons who meet the definition of a health center patient used in the BPHC’s Uniform Data
System (UDS). These persons receive face-to-face services from a CHC, MHC, HCH, or PHPC
grantee clinical staff member who exercises independent judgment in the provision of services. 1
Patients from grantees located within the 50 United States and the District of Columbia are
included, while patients from grantees within U.S. territories and possessions are excluded.
Only persons who received services through one of these grantees at least once in the
year prior to the current visit are considered eligible for the survey. This eligibility criterion will
be used because many of the questions in the survey ask about services received in the past year;
individuals without previous visits will not be able to answer these questions and, therefore, are
not considered eligible. This eligibility criterion was also implemented in the BPHC’s 2009
Primary Health Care Patient Surveys (PHCPS), the 2002 Community Health Center Survey, and
the 2003 Healthcare for Homeless Survey.

1 To meet the criterion for “independent judgment,” the provider must be acting on his/her own when serving the
patient and not assisting another provider.

SECTION 3.
OVERVIEW OF SAMPLE DESIGN
In the 2014 Health Center Patient Survey, the primary analytic units are patients who
receive services from health sites in funded grantees. The patients are clustered within health
sites and the sites are clustered within the grantees. RTI International 2 will use a stratified three-stage sample design. The grantees are the first stage of selection units, also known as the primary
sampling units (PSUs). Sites within selected grantees are the second stage of selection units, and
patients within selected sites comprise the third stage of selection units. We expect to achieve the
design goals and target sample sizes for funding programs by oversampling grantees
participating in PHPC, MHC, and/or HCH funding programs at the first stage. We expect to
achieve the target sample sizes for race/ethnicity by oversampling grantees and site(s) with
concentrated patients in one of the three race categories (AIAN, Asian, NHPI) at the first and
second stages and by oversampling patients in these three race/ethnicity categories at the third
stage as well. To achieve the target sample size for patients aged 65 or older, we will oversample
the older patients at the third stage of selection.
At the first stage, grantees will be selected using the stratified probability proportional to
size (PPS) sampling method (Kish, 1995). Grantees participating in PHPC, MHC, and HCH
funding programs and grantees with concentrated AIAN, Asian, or NHPI patients will be
oversampled. The oversampling is achieved by stratifying grantees and applying different
selection probabilities across strata. The explicit stratification is based on the type of funding a
grantee receives; the stratum of grantees receiving CHC funding only is further stratified
according to the proportions of patients in one of the three oversampling race/ethnicity
categories. Additionally, sorting the grantee frame by region, urbanicity, and grantee size (large,
medium, or small 3) before selecting the grantee sample serves as implicit stratification and
ensures that the grantee sample has good coverage of regions, urban and rural areas, and grantee
sizes. Because of the high costs associated with recruiting a grantee and hiring a field interviewer
(FI) to perform the data collection, we will select an independent site and patient sample from
each funding program for grantees receiving multiple funding programs.
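The first-stage selection described above can be illustrated with a minimal sketch. It assumes a systematic PPS draw from the sorted frame within one stratum, which is one common way to combine PPS with implicit stratification; the plan itself only names stratified PPS sampling, so the function name `pps_systematic`, its arguments, and the toy sizes in the usage note are hypothetical.

```python
from itertools import accumulate


def pps_systematic(sizes, n, start):
    """Return frame indices selected by systematic PPS sampling.

    sizes -- measure of size (e.g., patients served) for each frame unit,
             listed in the sorted frame order (the implicit stratification)
    n     -- number of units to select from this stratum
    start -- random start, drawn uniformly from [0, sum(sizes) / n)
    """
    interval = sum(sizes) / n
    if not 0 <= start < interval:
        raise ValueError("start must lie in [0, sum(sizes) / n)")
    cumulative = list(accumulate(sizes))
    selected, idx = [], 0
    for k in range(n):
        # Each selection point falls inside exactly one unit's size range.
        point = start + k * interval
        while cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected


def pps_selection_probability(size, sizes, n):
    """Selection probability of a unit under PPS (capped at 1 for very large units)."""
    return min(1.0, n * size / sum(sizes))
```

For a toy stratum with sizes `[10, 20, 30, 40]`, `n = 2`, and a start of 5, the selection points are 5 and 55, so the first and third units are selected. A unit whose size exceeds the sampling interval can be hit more than once in this sketch; in practice such units are usually treated as certainty selections.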
At the second stage, sites will be selected within participating grantees, and a maximum
of three sites per funding program is allowed in each grantee. If a grantee has three or fewer sites
in a funding program, all eligible sites will be selected, assuming they are in reasonable
proximity for an FI. A grantee with more than three sites in a funding program will have three
sites selected using PPS sampling, based on the number of patients served. Again, to ensure
successful oversampling of AIAN, Asian, and NHPI patients, sites with concentrated patients in
those three race/ethnicity categories will be oversampled.

2 RTI International is a trade name of Research Triangle Institute.
3 Eligible grantees are sorted by patient volume; the top third of grantees are classified as
large, the middle third as medium, and the bottom third as small.
At the third stage, patients will be selected as they enter the site and register with the
receptionist. Patients in three oversampling race/ethnicity categories and patients aged 65 or
older will be identified and oversampled; that is, they will have a higher probability of selection
than patients who are not in the oversampling groups. The receptionist will refer the first eligible
patients who are not in the oversampling groups to the FI when the FI indicates he/she is ready
for the next interview. The receptionist will refer patients in oversampling groups to the FI more
frequently. For each funding program, the same number of patient interviews will be completed
from each grantee to reduce unequal weighting effects (UWE) and maintain a balanced workload
across grantees. The total number of patient interviews within a grantee will be divided among
multiple sites if more than one site is selected for a funding program.
In our design, we take every measure to meet the design goals and reduce the design
effect (deff 4) due to clustering and oversampling. In summary, we present key elements of the
sample design and the associated benefits in Exhibit 2.

4 The design effect (deff) is a measure of the precision gained or lost by using a more complex design instead
of a simple random sample. For a multistage cluster sample like the 2014 Health Center Patient Survey, deff is a
function of the clustering effect and the unequal weighting effect (UWE) and can be defined as
deff = UWE × (1 + (m − 1) × ICC), where m is the number of patient interviews within a grantee, ICC is the
intracluster correlation coefficient that measures the degree of similarity among elements within a cluster, and
UWE measures variation in the sample weight. Deff can be reduced by reducing UWE, the clustering effect, or both.
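The design effect formula in the footnote above can be sketched as follows. The UWE helper uses Kish's common approximation, n·Σw² / (Σw)², which is an assumption here because the plan does not state how UWE is computed; the ICC and UWE values in the usage note are purely illustrative.

```python
def unequal_weighting_effect(weights):
    """Kish's approximation: UWE = n * sum(w^2) / (sum(w))^2 = 1 + CV(w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def design_effect(uwe, m, icc):
    """deff = UWE * (1 + (m - 1) * ICC), as defined in the footnote.

    uwe -- unequal weighting effect
    m   -- number of patient interviews within a grantee (cluster size)
    icc -- intracluster correlation coefficient
    """
    return uwe * (1 + (m - 1) * icc)
```

With equal weights the UWE is 1, so deff reduces to the clustering term alone. For a hypothetical UWE of 1.5, 22 interviews per grantee, and an ICC of 0.05, `design_effect(1.5, 22, 0.05)` gives about 3.1, which shows how quickly clustering and unequal weights compound.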


Exhibit 2. Summary of Features and Benefits of Sample Design

Each key design feature is followed by its PROS, CONS, and comments.

First Stage: Grantee Sample Selection (165 grantees will be recruited)

Feature: Stratification
    PROS: Ensures a representative grantee sample and enough grantees are selected for each funding program;
    ensures the selected grantees have good coverage of patients in three oversampling race/ethnicity categories.

Feature: Oversample PHPC, MHC, and HCH grantees and grantees with high proportion of patients in three
oversampling race/ethnicity categories
    PROS: Achieves oversampling goals in funding type and race/ethnicity categories.
    CONS: Disproportionate sampling increases UWE.
    COMMENTS: Select PPS grantee sample from each stratum; it can reduce UWE.

Feature: Select independent sample for each funding program if grantee received grants from multiple programs
    PROS: Reduces data collection costs and helps reduce clustering effect.

Second Stage: Site Sample Selection (at most three sites per funding program)

Feature: Select multiple sites if a grantee has more than one site
    PROS: Reduces clustering effect. For the funding program with more than three sites, PPS selection of
    sites reduces UWE too.
    CONS: Site selection process is tedious. Managing data collection from multiple sites is more costly.
    COMMENTS: Select sites within reasonable proximity for an FI.

Feature: Oversample sites with concentrated patients in three oversampling race/ethnicity categories
    PROS: Achieves oversampling goals.
    CONS: Disproportionate sampling increases UWE.

Third Stage: Patient Sample Selection (3,630 for CHC, 1,210 for MHC, 1,210 for HCH, and 550 for PHPC)

Feature: Within each funding program, allocate same number of interviews to each grantee
    PROS: Creates even workload for FIs and reduces clustering effect.

Feature: Select random sample as patients enter site and are registered
    PROS: Is suitable for mobile nature of some of the target population.

Feature: Allocate interviews evenly to sites that are selected through PPS
    PROS: Maintains roughly equal weights within a stratum, thus reducing UWE; creates even workload for FIs.

Feature: Allocate interviews to sites proportional to patient size of sites (for grantees with two or three sites)
    PROS: Reduces UWE.

Feature: Oversample patients in three oversampling race/ethnicity categories and patients aged 65 or older
    PROS: Achieves oversampling goals.
    CONS: Disproportionate sampling increases UWE.


SECTION 4.
GRANTEE SAMPLE SELECTION
This section discusses the first stage of sample selection: the selection of grantees. It
covers sample frame construction, stratification, sample allocation, and selection of stratified
PPS grantee samples. An illustrative grantee sample is also presented and calculation of grantee
selection probability is discussed.
4.1 Sampling Frame Construction

BPHC UDS grantee-level data from the most recent available year will be used to
construct a sampling frame for the first stage of selection. The UDS is compiled each year from
annual data submissions by each Section 330-funded grantee. The UDS contains data on the
number of patients served and on grantee characteristics such as the type(s) of grant funding received,
state, urbanicity, and number of sites. The grantee characteristics will be used in stratification. In
this report, we use data from the 2012 UDS to illustrate the statistical design plan. Once the
Office of Management and Budget (OMB) approval has been received, the final sample will be
drawn using the most current UDS data.
The 2012 UDS data were collected from 1,198 grantees. Of these, 49 grantees will be
excluded from the sampling frame, including
■ twenty-nine grantees located in U.S. territories or possessions (i.e., those in Puerto
  Rico, the Virgin Islands, and the Pacific Basin);
■ five grantees funded through the CHC program that only operated school-based sites
  (see Section 5.1 for more detail on this decision);
■ four grantees with fewer than 300 patients;
■ eleven grantees that received MHC funding only and that served clients through a
  voucher program; and
■ any grantee that has exited or will soon be exiting the Section 330 Program.

The grantee sampling frame includes 1,149 eligible grantees that reported in 2012. We
show the distribution of key grantee characteristics in Exhibits 3, 4, and 5. Exhibit 3 breaks
down the grantees by funding program, region, urban/rural status, and number of sites within a
grantee. In the grantee sampling frame, 823 grantees had a single funding program, while 326
grantees received funding from multiple programs. A total of 1,079 grantees (93.9%) received
CHC funding, either solely or in combination with other funding programs; 241 grantees (21%)
received HCH funding, either solely or in combination with other funding programs; 149
grantees (13%) received MHC funding, either solely or in combination with other funding
programs; and only 71 grantees (6.2%) received PHPC funding, either solely or in combination
with other funding programs. Roughly 66.2% of grantees received CHC funding solely.

Exhibit 3. Grantee Characteristics in the Sampling Frame (2012 UDS)

Domain Category             Number of Grantees    Percent Distribution

Funding Program Received
  C                         761                   66.23%
  H                         57                    4.96%
  M                         2                     0.17%
  P                         3                     0.26%
  CH                        122                   10.62%
  CM                        110                   9.57%
  CP                        28                    2.44%
  MH                        1                     0.09%
  PH                        7                     0.61%
  CMH                       25                    2.18%
  CMP                       4                     0.35%
  CPH                       22                    1.91%
  CMPH                      7                     0.61%
  Total                     1,149                 100%

Region (a)
  Northeast                 207                   18.02%
  Midwest                   225                   19.58%
  South                     405                   35.25%
  West                      312                   27.15%
  Total                     1,149                 100%

Urban/Rural Location
  Urban                     615                   53.52%
  Rural                     534                   46.48%
  Total                     1,149                 100%

Number of Sites
  1                         143                   12.45%
  2                         143                   12.45%
  3                         151                   13.14%
  4–9                       437                   38.03%
  10–14                     134                   11.66%
  15–19                     56                    4.87%
  ≥ 20                      85                    7.40%
  Total                     1,149                 100%

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant Health Center
program; P = Public Housing Primary Care program; multiple acronyms used together indicate that funding
was received from multiple programs, e.g., CMH = a grantee received CHC, MHC, and HCH funding;
CMP = a grantee received CHC, MHC, and PHPC funding.
(a) “Region” refers to the census region.

Exhibit 4. Distribution of Patients Served in 2012

Patient Distribution                             Number of Patients

Range of Number of Patients
  Minimum                                        327
  25th percentile (Q1)                           5,422
  Median                                         11,533
  75th percentile (Q3)                           22,536
  Maximum                                        183,327
Mean Number of Patients per Grantee              17,930
Total Number of Patients Across All Grantees     20,601,579

Exhibit 5. Race/ethnicity and Age Group Distribution of Patients Served in 2012

Domain Category             Number of Patients    Percent Distribution

Race/Ethnicity
  Hispanic                  6,642,837             32.24%
  Non-Hispanic White        7,607,947             36.93%
  Non-Hispanic Black        4,149,038             20.14%
  Non-Hispanic AIAN         207,863               1.01%
  Non-Hispanic Asian        599,712               2.91%
  Non-Hispanic NHPI         120,379               0.58%
  Non-Hispanic Others       1,273,803             6.18%
  Total                     20,601,579            100%

Age Group
  0–17                      6,495,038             31.53%
  18–64                     12,640,287            61.36%
  65+                       1,466,254             7.12%
  Total                     20,601,579            100%

The number of sites within a grantee ranged from 1 to 116, and 863 grantees had at least
3 sites, with an average of about 7.6 sites per grantee. The South had 405 grantees, while the
West had 312 grantees. The Northeast and Midwest had roughly the same number of grantees
each: 207 and 225, respectively. Slightly more grantees were in urban areas than were in rural
areas.


Another important grantee characteristic is the number of patients served in 2012
(Exhibit 4). Among the 1,149 eligible grantees in the grantee sampling frame, the number of
patients receiving at least one face-to-face encounter for services during 2012 varied among the
grantees, ranging from 327 to 183,327 and averaging 17,930. The total number of patients was
approximately 20.6 million. Exhibit 5 displays the patient distributions of race/ethnicity and age
group, clearly showing that patients in AIAN, Asian, and NHPI race/ethnicity categories, and
patients aged 65 or older need to be oversampled to achieve the target sample sizes.
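To make the need for oversampling concrete, the sketch below compares each group's target sample size (Exhibit 1) with the number of interviews it would receive if the 6,600 interviews were spread in proportion to the 2012 patient counts (Exhibit 5). Treating the sample as a simple proportional draw of patients is a simplifying assumption for illustration only; it ignores clustering and the funding-program targets.

```python
TOTAL_PATIENTS = 20_601_579   # Exhibit 5 total
TOTAL_INTERVIEWS = 6_600      # overall interview target

# group: (patients served in 2012, target sample size)
GROUPS = {
    "AIAN": (207_863, 409),
    "Asian": (599_712, 647),
    "NHPI": (120_379, 251),
    "65+": (1_466_254, 1_200),
}


def proportional_expectation(patient_count):
    """Interviews a group would receive under purely proportional sampling."""
    return TOTAL_INTERVIEWS * patient_count / TOTAL_PATIENTS


def oversampling_ratio(patient_count, target):
    """How many times larger the target is than the proportional expectation."""
    return target / proportional_expectation(patient_count)


for name, (count, target) in GROUPS.items():
    expected = proportional_expectation(count)
    ratio = oversampling_ratio(count, target)
    print(f"{name}: proportional {expected:.0f}, target {target}, ratio {ratio:.1f}")
```

Under this simplification every one of the four groups falls well short of its target, with the AIAN and NHPI targets several times larger than a proportional draw would deliver.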
4.2 Stratification

As shown in Section 4.1, the majority of grantees receive grants from CHC funding,
while relatively few grantees receive PHPC, MHC, or HCH funding. A random selection of
grantees without any stratification would result in very small grantee sample sizes for PHPC,
MHC, and HCH funding programs. Exhibit 6 displays the expected number of grantees 5 yielded
for each funding program from an unstratified random grantee sample based on an experimental
selection of 100 independent grantee samples.
Exhibit 6. Expected Grantee and Patient Yields from Unstratified Random Sampling

Grantee Funding    Number of Grantees    Target Number of Completed    Number of Patients
Type               Selected              Patient Interviews            Required per Grantee
C                  155                   3,630                         23.4
H                  34                    1,210                         35.6
M                  22                    1,210                         55.0
P                  10                    550                           55.0
Total              221                   6,600                         40

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant Health Center
program; P = Public Housing Primary Care program.

The unstratified random samples have 155 CHC grantees, 34 HCH grantees, 22 MHC
grantees, and only 10 PHPC grantees. To meet the target of completed interviews for each
funding program, we would have to complete a large number of interviews per grantee for the PHPC and MHC
funding programs, which has two implications: (1) recruiting many patients from
PHPC and MHC grantees within a short period of data collection is difficult because of the low number of
patients in PHPC or MHC grantees; and (2) the clustering effect is inflated as the number of
completed interviews per grantee increases, and consequently the estimates will have low
precision and the statistical power of comparison is reduced.

5 For a selected grantee participating in multiple funding programs, we take an independent sample for each funding
program. For example, if a grantee receiving both CHC and MHC funding is recruited, this grantee would be
counted as a CHC grantee and also as an MHC grantee.
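The arithmetic behind Exhibit 6 can be sketched as follows. The per-grantee interview counts divide each program's interview target by the number of grantees in Exhibit 6. The analytic expectation, which assumes a simple random sample of 165 grantees from the 1,149-grantee frame, is an approximation added here for illustration; the plan's own figures come from 100 simulated samples, so the two agree only to within about one grantee.

```python
FRAME_SIZE = 1_149
SAMPLE_SIZE = 165  # assumed size of the unstratified grantee sample

# Grantees receiving each program, solely or in combination (Section 4.1)
GRANTEES_WITH_PROGRAM = {"C": 1_079, "H": 241, "M": 149, "P": 71}
TARGET_INTERVIEWS = {"C": 3_630, "H": 1_210, "M": 1_210, "P": 550}
EXHIBIT_6_GRANTEES = {"C": 155, "H": 34, "M": 22, "P": 10}


def expected_grantees(program):
    """Analytic expectation under simple random sampling of grantees."""
    return SAMPLE_SIZE * GRANTEES_WITH_PROGRAM[program] / FRAME_SIZE


def interviews_per_grantee(program):
    """Interviews each selected grantee must yield to reach the program target."""
    return TARGET_INTERVIEWS[program] / EXHIBIT_6_GRANTEES[program]


for program in "CHMP":
    print(program,
          round(expected_grantees(program), 1),
          round(interviews_per_grantee(program), 1))
```

The per-grantee requirement for MHC and PHPC (55 interviews) is more than double the CHC requirement, which is the imbalance that motivates stratification.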
Stratification is needed to achieve target sample sizes for the four funding programs with
relatively small cluster sizes. 6 We will group grantees into four mutually exclusive strata according to the
types of funding they receive. These four groups will serve as the first-level strata and are
defined in Exhibit 7.
Exhibit 7. Definition of First-Level Stratification

First-Stage Strata                                                                         Grantee Funding Type            Number of Grantees in Sampling Frame
Stratum 1: Grantees received PHPC funding solely or in combination with other programs.    P; CP; PH; CMP; CPH; CMPH       71
Stratum 2: Grantees received MHC funding solely or in combination with other programs.     M; CM; MH; CMH                  138
Stratum 3: Grantees received HCH funding solely or in combination with other programs.     H; CH                           179
Stratum 4: Grantees received CHC funding solely.                                           C                               761
Total                                                                                                                      1,149

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant Health Center
program; P = Public Housing Primary Care program.

AIAN, Asian, and NHPI patients are not evenly distributed among all grantees. They tend
to be clustered in a few grantees: 889 grantees had fewer than 100 AIAN patients, 1,000 grantees
had fewer than 100 NHPI patients, and 650 grantees had fewer than 100 Asian patients. The 20
grantees with the highest proportion of AIAN patients account for 37.1% of total AIAN patients in
all 1,149 grantees; the 20 grantees with the highest proportion of NHPI patients account for 51.4% of
total NHPI patients; and the 20 grantees with the highest proportion of Asian patients account for 36.2%
of total Asian patients. Thus, to achieve target sample sizes in three race/ethnicity categories,
grantees with concentrated patients in those three race/ethnicity categories must be obtained and
selected at the first-stage selection. Grantees with more than 20% of patients in one of the three
race/ethnicity categories are considered patient-concentrated grantees. Stratum 4 (CHC funding
solely) has over 89% of such grantees, and very few such grantees are from Strata 1, 2, and 3.
Therefore, to effectively select grantees with concentrated patients in three race/ethnicity
categories, Stratum 4 is further divided into four second-level strata according to whether a
grantee has concentrated patients (over 20%) in one of the three race/ethnicity categories. The
result is a total of seven final grantee strata, shown in Exhibit 8.

6 Cluster size is measured as the number of completed interviews within a grantee for a funding program.
Exhibit 8. Grantee Sample Final Stratification

First-Stage and Second-Stage Strata                                                        Grantee Funding Type            Final Stratum    Number of Grantees in Sampling Frame
Stratum 1: Grantees received PHPC funding solely or in combination with other programs.    P; CP; PH; CMP; CPH; CMPH       1                71
Stratum 2: Grantees received MHC funding solely or in combination with other programs.     M; CM; MH; CMH                  2                138
Stratum 3: Grantees received HCH funding solely or in combination with other programs.     H; CH                           3                179
Stratum 4: Grantees received CHC funding solely.                                           C
  Stratum 4.1: Grantees with more than 20% of AIAN patients                                C                               4                31
  Stratum 4.2: Grantees with more than 20% of Asian patients                               C                               5                16
  Stratum 4.3: Grantees with more than 20% of NHPI patients                                C                               6                10
  Stratum 4.4: All remaining grantees in Stratum 4                                         C                               7                704
Total                                                                                                                                       1,149

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant Health Center
program; P = Public Housing Primary Care program.
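The stratum definitions in Exhibits 7 and 8 amount to a simple priority rule, sketched below. The function names are hypothetical, and the order in which the AIAN, Asian, and NHPI thresholds are checked for a CHC-only grantee that exceeds 20% in more than one category is an assumption, since the plan does not address that case.

```python
from collections import Counter


def first_level_stratum(programs):
    """Assign a first-level stratum from a grantee's funding letters (e.g., 'CMH')."""
    if "P" in programs:
        return 1  # PHPC solely or in combination
    if "M" in programs:
        return 2  # MHC solely or in combination (no PHPC)
    if "H" in programs:
        return 3  # HCH solely or with CHC
    return 4      # CHC funding solely


def final_stratum(programs, pct_aian=0.0, pct_asian=0.0, pct_nhpi=0.0, threshold=20.0):
    """Assign one of the seven final strata; percentages are shares of a grantee's patients."""
    level1 = first_level_stratum(programs)
    if level1 < 4:
        return level1
    if pct_aian > threshold:   # assumed tie-break order: AIAN, then Asian, then NHPI
        return 4
    if pct_asian > threshold:
        return 5
    if pct_nhpi > threshold:
        return 6
    return 7


# Frame counts by funding type from Exhibit 3
FRAME_BY_TYPE = {
    "C": 761, "H": 57, "M": 2, "P": 3, "CH": 122, "CM": 110, "CP": 28,
    "MH": 1, "PH": 7, "CMH": 25, "CMP": 4, "CPH": 22, "CMPH": 7,
}

first_level_counts = Counter()
for funding_type, count in FRAME_BY_TYPE.items():
    first_level_counts[first_level_stratum(funding_type)] += count
```

Applying the rule to the Exhibit 3 counts reproduces the first-level stratum sizes in Exhibit 7, which is a useful consistency check on the funding-type groupings.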

Although some grantees have a high proportion of patients aged 65 or older, these older
patients are distributed more evenly than the patients in three race/ethnicity categories. The 20
grantees with highest proportion of patients aged 65 or older only account for 2.04% of total
patients aged 65 or older. As a result, oversampling grantees with concentrated patients aged 65
or older at the first stage of selection will not be as effective as oversampling grantees with
concentrated patients in the three race/ethnicity categories. Thus, we decided not to oversample
grantees with concentrated patients aged 65 or older.
4.3 Grantee Sample Allocation

Before selecting a grantee sample from each final stratum, we need to determine the
grantee sample allocation for each final stratum. To minimize the variation in sample weights
introduced by oversampling grantees that received funding from PHPC, MHC, or HCH
programs, and grantees with concentrated patients in three oversampling race/ethnicity
categories, we allocate the grantee sample such that a minimum UWE is achieved. We employed
a nonlinear optimization procedure, OPTMODEL in SAS 7, which minimizes the UWE subject to the
following constraints:
■ select 165 grantees;
■ complete 6,600 interviews;
■ complete 3,630 CHC interviews, 1,210 MHC interviews, 1,210 HCH interviews, and
  550 PHPC interviews;
■ complete interviews per grantee: 22 for CHC, 25 for MHC, 25 for HCH, and 15 for
  PHPC; and
■ select at least one grantee from each grantee type. 8

The optimum sample allocation to each grantee type is presented in Exhibit 9. After
aggregating grantee allocations to the seven final strata, the grantee sample allocation to the
seven strata along with the sampling rates in each stratum are shown in Exhibit 10. The
sampling rates for Strata 1, 2, 4, 5, and 6 are much higher than the overall sampling rate (14.5%),
indicating that we oversample grantees in these strata.
Exhibit 9. Optimum Grantee Sample Allocation

Funding Program Received    Number of Grantees    Grantee Sample Allocation
C                                          761                           76
H                                           57                            1
M                                            2                            1
P                                            3                            1
CH                                         122                           16
CM                                         110                           25
CP                                          28                           11
MH                                           1                            1
PH                                           7                            1
CMH                                         25                           10
CMP                                          4                            4
CPH                                         22                           12
CMPH                                         7                            7
Total                                    1,149                         166*

* The optimum grantee sample allocation results in 166 grantees instead of 165 due to rounding.
NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant Health Center
program; P = Public Housing Primary Care program; multiple acronyms used together indicate that funding
was received from multiple programs, e.g., CMH = a grantee received CHC, MHC, and HCH funding; CMP
= a grantee received CHC, MHC, and PHPC funding.

⁷ http://support.sas.com/documentation/cdl/en/ormpug/59679/HTML/default/viewer.htm#optmodel.htm
⁸ Grantee type is defined according to the funding program(s) a grantee participated in or received funding from.

Exhibit 10. Grantee Sample Allocation and Sampling Rates in Final Grantee Strata

                                                            Number of
                                                            Grantees in   Grantee
                                                   Final    Sampling      Sample       Sampling
First-Stage and Second-Stage Strata                Stratum  Frame         Allocation   Rate
Stratum 1: Grantees received PHPC funding solely
  or in combination with other programs.              1          71           36        50.7%
Stratum 2: Grantees received MHC funding solely
  or in combination with other programs.              2         138           37        26.8%
Stratum 3: Grantees received HCH funding solely
  or in combination with other programs.              3         179           17         9.5%
Stratum 4: Grantees received CHC funding solely.
  Stratum 4.1: Grantees with more than 20% of
    AIAN patients                                     4          31           25        80.6%
  Stratum 4.2: Grantees with more than 20% of
    Asian patients                                    5          16           13        81.3%
  Stratum 4.3: Grantees with more than 20% of
    NHPI patients                                     6          10            8        80.0%
  Stratum 4.4: All remaining grantees in Stratum 4    7         704           30         4.3%
Total                                                         1,149          166        14.5%
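The roll-up from the grantee types in Exhibit 9 to the first-stage strata in Exhibit 10 can be checked with a short Python sketch. The priority rule coded below (any PHPC funding goes to Stratum 1, then MHC, then HCH, with CHC-only grantees left in Stratum 4) is inferred from the stratum definitions and the exhibit totals; it is an assumption, not a rule taken from the SAS programs, and the names `EXHIBIT_9` and `first_stage_stratum` are hypothetical.

```python
# Frame counts and optimum allocations by grantee type, copied from Exhibit 9.
EXHIBIT_9 = {
    "C": (761, 76), "H": (57, 1), "M": (2, 1), "P": (3, 1),
    "CH": (122, 16), "CM": (110, 25), "CP": (28, 11), "MH": (1, 1),
    "PH": (7, 1), "CMH": (25, 10), "CMP": (4, 4), "CPH": (22, 12),
    "CMPH": (7, 7),
}


def first_stage_stratum(grantee_type: str) -> int:
    """Assumed priority rule: PHPC first, then MHC, then HCH, else CHC only."""
    if "P" in grantee_type:
        return 1
    if "M" in grantee_type:
        return 2
    if "H" in grantee_type:
        return 3
    return 4


def aggregate(exhibit):
    """Sum frame counts and allocations of all grantee types in each stratum."""
    totals = {}
    for grantee_type, (frame, alloc) in exhibit.items():
        stratum = first_stage_stratum(grantee_type)
        frame_sum, alloc_sum = totals.get(stratum, (0, 0))
        totals[stratum] = (frame_sum + frame, alloc_sum + alloc)
    return totals


STRATA = aggregate(EXHIBIT_9)
for stratum in sorted(STRATA):
    frame, alloc = STRATA[stratum]
    print(f"Stratum {stratum}: frame={frame}, sample={alloc}, rate={alloc / frame:.1%}")
```

Splitting Stratum 4 into Strata 4.1 through 4.4 requires each grantee's race/ethnicity shares from the UDS, which are not reproduced here, so the sketch stops at the first-stage strata.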

4.4 Select Stratified PPS Sample of Grantees

As mentioned in Section 4.1, the grantees differ widely in the number of patients served.
PPS sampling is a commonly used method of unequal probability sampling to handle the large
variation in patients served among grantees. In this method, the probability of a cluster being
sampled is proportional to a size measure. The size measure will be the number of patients who
visited the grantee for services from the 2012 UDS file. We will use PPS sampling to select the
grantee sample from each final stratum.
A PPS grantee sample will be selected using the SAS SURVEYSELECT⁹ procedure with the
predetermined sample allocation in Exhibit 10 for each final stratum. During the selection, in
addition to the seven strata for grantee sample selection discussed above, we will sort the
sampling frame by region (Northeast, Midwest, South, and West), urban/rural location, and
grantee size (large, medium, small) when applying Chromy’s (1981) probability minimal
replacement sequential PPS selection procedure. Sorting the sampling frame by these key
grantee characteristics and then applying the PPS sequential procedure induces implicit
stratification according to the order of the units in a stratum. Therefore, the selected grantee
samples will be distributed among various regions, urban/rural locations, and various grantee
sizes to ensure a representative grantee sample is selected.

⁹ http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#surveyselect_toc.htm
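To illustrate how sorting the frame produces implicit stratification, the Python sketch below draws a systematic PPS sample from an ordered list of size measures. Systematic PPS is a simpler relative of Chromy's sequential procedure and is used here only for illustration; it is not the algorithm implemented by PROC SURVEYSELECT. The function name `systematic_pps` and the patient counts are hypothetical.

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Select n units with probability proportional to size from an ordered frame.

    Returns the indices of the selected units. Assumes no unit is large enough
    to be a certainty selection (every size <= total / n).
    """
    total = sum(sizes)
    interval = total / n
    if max(sizes) > interval:
        raise ValueError("a unit exceeds the sampling interval; treat it as a certainty unit")
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected, idx = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative-size range contains the point.
        while idx < len(sizes) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical stratum: patient counts for ten grantees, already sorted by
# region, urban/rural location, and size class.
patients = [120, 300, 80, 500, 260, 40, 700, 150, 90, 310]
sample = systematic_pps(patients, n=3, rng=random.Random(2014))
```

Because the selection points are evenly spaced along the sorted frame, the selected units are spread across the sort order, which is the implicit stratification described above.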
4.5 An Illustrative Grantee Sample

In this section we present an illustrative example of a grantee sample based on a
simulation study where 100 independent grantee samples are selected, and the results are
averaged over the 100 samples.
In this example, 166 grantees were selected with the sample allocation for the final seven
strata specified in Exhibit 10. The PPS sequential method was used to select the grantees from
each of the seven strata, and this process was repeated 100 times. As stated in Section 4.2, an
independent sample was selected for each funding program, if a selected grantee participated in
multiple funding programs. This process yielded 292 grantees for four funding programs: 163
CHC grantees, 46 HCH grantees, 47 MHC grantees, and 36 PHPC grantees, as shown in
Exhibit 11. To achieve the interview targets for each funding program, the expected number of
complete interviews per grantee for each funding type was calculated, as displayed in
Exhibit 11.¹⁰
Exhibit 11. Expected Yield of the Grantee Funding Type and Patients of a Stratified Disproportionate Sampling

                    Number of Grantees    Average Number of    Number of Completed
                    for Each Funding      Patients per         Interviews for Each
Funding Program     Program               Grantee              Funding Program
C                          163                  22.3                  3,630
H                           46                  26.3                  1,210
M                           47                  25.7                  1,210
P                           36                  15.3                    550
Total                      292                                        6,600

NOTE: C = Community Health Center program; H = Healthcare for Homeless program; M = Migrant Health Center
program; P = Public Housing Primary Care program.

¹⁰ Note that during the sampling plan implementation, the sample realization may yield a slightly different
distribution of grantees for each funding type.

Exhibit 12 displays the grantee sampling frame and expected sample distribution by
region, urban/rural area, and grantee size from the illustrative example. Among the regions,
the West has a higher proportion in the grantee sample than in the grantee sampling frame, while
the proportions of the other three regions are lower in the sample than in the frame. This
difference is mainly due to oversampling grantees with concentrated AIAN and NHPI patients;
the majority of these grantees are in the West region (Alaska and Hawaii). The grantee sample
has a higher proportion of urban grantees than the grantee sampling frame because we
oversample PHPC grantees, which are mainly in urban areas. The grantee
sample has lower proportions of small and medium-size grantees compared to the grantee
sampling frame. This disparity occurs because of the PPS sampling method employed in grantee
sample selection, which gives grantees with large patient volume a better chance of being
selected than grantees with small patient volume. A best practice is to select more large grantees
so as to lower data collection costs: a large patient volume ensures that the quota per grantee (as
shown in Exhibit 11) can be easily met within the data collection time period.
In general, our proposed grantee sample selection algorithm generates grantee samples
that represent different regions, urban/rural areas, and grantee sizes very well.
Exhibit 12. Expected Grantee and Patient Sample Distribution by Region, Urban/Rural Area and Grantee Size

                    Grantee Frame         Expected Grantee Sample
Domains                 N         %            n         %
Region              1,149    100.00          166    100.00
  Northeast           207     18.02           26     15.36
  Midwest             225     19.58           28     16.87
  South               405     35.25           39     23.53
  West                312     27.15           73     44.23
Urban/Rural         1,149    100.00          166    100.00
  Urban               615     53.52          103     62.16
  Rural               534     46.48           63     37.84
Grantee Size        1,149    100.00          166    100.00
  Large               391     34.03          115     69.52
  Medium              379     32.99           28     16.81
  Small               379     32.99           23     13.66

To evaluate the effectiveness of oversampling grantees with concentrated patients in the
three oversampling race/ethnicity categories (AIAN, NHPI, and Asians), we calculated the
coverage rates¹¹ of the three race/ethnicity categories from the sampled 166 grantees (see
Exhibit 13). The 166 selected grantees cover 26.4% of the patient population from all 1,149
grantees. The coverage rate is 47.0% for AIAN patients, 46.6% for NHPI patients, and over 50%
for Asian patients, while the coverage rate for other races is 25.3%. With the high coverage rates
from the selected grantees, additional oversampling of sites with concentrated patients at the
second selection stage, and oversampling of patients in the three race/ethnicity categories at the
third selection stage, we are very confident that we can achieve the oversampling goals in the
three race/ethnicity categories. The oversampling procedure at the second and third stages of
selection is discussed in Sections 5 and 6.

¹¹ Coverage rate is the ratio of (number of patients in the selected grantees / number of patients in all 1,149
grantees).
Exhibit 13. Patient Coverage Rates of 166 Grantees in Race/Ethnicity

Race/Ethnicity                        Patient Coverage Rate
American Indian/Alaska Native                 47.0%
Asian                                         51.2%
Native Hawaiian/Pacific Islander              46.6%
Other Races                                   25.3%
Overall                                       26.4%

4.6 Grantee Selection Probability

The selection probability for the ith grantee within the hth stratum can be calculated as

$$G_{hi} = n_h \frac{S_{hi}}{\sum_{i} S_{hi}}, \tag{1}$$

where h stands for the strata (h = 1, 2, …, 7, corresponding to the 7 final strata); i is the index for
grantees on the frame within each stratum; $n_h$ is the number of grantees to select in the hth
stratum; and $S_{hi}$ is the size measure, which is the number of patients served by each grantee. Note
that we assume an 80 percent participation rate among grantees based on the results of the 2009
PHCPS. As a result, $n_h$ will be inflated to account for nonresponse among sampled grantees.
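As a minimal Python sketch of Formula (1), the function below computes grantee selection probabilities from a list of size measures. The second function shows one plausible way to inflate a stratum allocation for the assumed 80 percent participation rate; dividing by the rate and rounding up is our assumption, since the exact inflation rule is not stated here. All function names and numbers are hypothetical.

```python
import math


def grantee_selection_probabilities(sizes, n_h):
    """Formula (1): G_hi = n_h * S_hi / sum_i S_hi for every grantee in a stratum."""
    total = sum(sizes)
    return [n_h * s / total for s in sizes]


def inflate_for_participation(n_target, participation_rate=0.80):
    """Assumed inflation rule: number to select so that n_target are expected to participate."""
    return math.ceil(n_target / participation_rate)


# Hypothetical stratum with three grantees and two selections.
probs = grantee_selection_probabilities([200, 300, 500], n_h=2)
to_select = inflate_for_participation(30)
```

Within a stratum the probabilities sum to $n_h$; a value above 1 would indicate a grantee large enough to be selected with certainty, which needs separate handling.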
We are aware that applying different sampling rates for each stratum and oversampling at
the second stage and the third stage will cause deviations from a self-weighting design. As a
result, the variations in sample weights will be increased and variances in survey estimates will
be inflated, thereby reducing precision or statistical power in data analysis. To maintain a near
self-weighting design within each stratum, we will select sites within grantees using PPS
sampling in the second stage of selection and select the same number of patients per grantee in
the third stage.

SECTION 5.
SITE SAMPLE SELECTION
As discussed previously, more than two thirds of grantees have three or more sites. In
general, grantees with more sites tend to have more patients. At the first-stage selection, grantees
are selected with the PPS method, which means that grantees with large numbers of patients have
a higher probability of being selected in the sample. As a result, we expect a fair number of the
grantees recruited to have more than three sites. We will spread the sample of patients across
multiple sites to reduce the within-grantee clustering effect and increase the precision of the
analysis. We will select, at most, three sites for each funding program within a grantee for the
2014 Health Center Patient Survey. This section discusses the second stage of selection: the
selection of sites from participating grantees that have multiple sites.
5.1 Determine Eligible Sites within Participating Grantees

Once a grantee is recruited and agrees to conduct the study in its sites, our recruiters will
work with the grantee’s administration to identify eligible sites. The following eligibility criteria
will be used, and we will consult with the BPHC Contracting Officer Representative (COR) to
determine the site eligibility on a case-by-case basis whenever it is necessary.
■ The site should participate in at least one of the four specific funding programs and
  must have been operating under the grantee for at least 1 year.
■ The site is not a school-based health center.
■ The site is not a specialized clinic, except clinics providing OB/GYN services.
■ The site does not provide services only through the migrant and seasonal farmworker
  voucher screening program.
■ The site serves at least 100 patients.

After eligible sites are identified, we will collect from or verify with each participating
grantee the following information:
■ number of eligible sites serving each patient type (i.e., migrant and seasonal
  farmworkers, homeless, public housing, and general patients);
■ address and contact information for each eligible site;
■ number of patients served in each eligible site, overall and by type of patient (CHC,
  MHC, HCH, and PHPC); and
■ sites with concentrated patients in one of the three race/ethnicity categories (AIAN,
  Asian, or NHPI).
5.2 Evaluate Distances between Eligible Sites

In most cases, one FI will be hired to collect data for each participating grantee.
Therefore, selected sites must be within manageable distances for the FI(s). The grantees tend to
operate sites in relatively localized areas. Our sampling staff will evaluate distances between the
administrative office/central site and the associated sites. For a specific funding program, the site
with the largest patient volume could be used as the central site. Typically sites will be excluded
if they are located more than 100 miles from the central site. However, we will consult with the
BPHC COR to determine whether special data collection arrangements should be made for
remote sites.
5.3 Oversampling Sites with Concentrated Patients in Three Race/Ethnicity Categories

To achieve our target sample sizes of AIAN, Asian, and NHPI patients, we will not only
oversample grantees with concentrated patients in these three race groups at the first stage of
selection, but we will also identify sites with concentrated patients in at least one of the three
targeted race/ethnicity categories. These sites will be selected with higher probabilities than sites
without concentrated patients.
5.4 Site Selection and Selection Probability

If there are three or fewer sites for a patient type (i.e., migrant and seasonal farmworkers,
homeless, public housing, and general patients) and they are within a manageable distance for
one FI, all of the sites will be included in the study. If one site is far from the other sites and the
other sites are close to one another, the two sites that are close to each other will be selected.
However, if all three sites are far from one another, we will select the site with the largest patient
volume. Similarly, when two sites for a specific funding program are far from each other, the one
with the largest number of patients will be selected. Again, these special cases will be reviewed
with the COR.
For grantees with more than three sites for a patient type, we will use a PPS sampling
method similar to the one for grantees discussed in Section 4.4 to select three sites from the sites
within a manageable distance. The number of patients served by each site under a specific
funding program will serve as the size measure in the PPS sampling. For the grantees that
participate in multiple funding programs, an independent PPS selection of sites will be conducted
for each funding program, if needed.

The selection probability for the jth site within the ith grantee for funding program f is
given by

$$C_{fij} = \begin{cases} 1, & \text{if 3 or fewer sites are all selected, or} \\ \dfrac{3\, s_{fij}}{\sum_{j} s_{fij}}, & \text{if 3 sites are selected through PPS sampling,} \end{cases} \tag{2}$$

where $s_{fij}$ is the number of patients in site j within grantee i for funding program f. Based on our
experience with the 2009 PHCPS, we expect nearly all selected sites within participating
grantees to participate in the 2014 HCPS.
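Formula (2) can be sketched in Python as follows. The sketch assumes that all eligible sites are taken when a grantee has three or fewer for a funding program (the distance-based exclusions in Section 5.4 are ignored), and the function name and patient counts are hypothetical.

```python
def site_selection_probabilities(site_patients, max_sites=3):
    """Formula (2): 1 for every site if all are taken, else 3 * s_fij / sum_j s_fij."""
    if len(site_patients) <= max_sites:
        return [1.0] * len(site_patients)
    total = sum(site_patients)
    # A value above 1 would flag a site large enough to be a certainty
    # selection; this sketch does not handle that case.
    return [max_sites * s / total for s in site_patients]


# Hypothetical grantees: one with two eligible sites, one with five.
small_grantee = site_selection_probabilities([900, 400])
large_grantee = site_selection_probabilities([400, 300, 200, 100, 500])
```

For the five-site grantee the probabilities sum to 3, the number of sites selected.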


SECTION 6.
PATIENT SAMPLE SELECTION
Because some of the target populations of this study are quite mobile, a random sample
of patients will be selected for interview as they enter the site and register with the receptionist
for services. An FI will visit a selected site for a predetermined number of days and time slots in
the sampling period to conduct interviews. This section of the report presents the methodology
and specifications for selecting patients from participating sites.
6.1 Patient Interview Allocation to Grantee

To achieve the near self-weighting sample of patient interviews within each grantee
stratum, the same number of patients will be interviewed from the grantees in each funding
program. As shown in Exhibit 11 in Section 4.5 from the illustrative grantee sample example,
163 CHC grantees, 47 MHC grantees, 46 HCH grantees, and 36 PHPC grantees are to be
recruited. To achieve 3,630 completed interviews for CHC, we will need to complete 22–23
patient interviews per CHC grantee. We will need 25–26 completed interviews per MHC grantee
to achieve 1,210 interviews for MHC; 26–27 completed patient interviews per HCH grantee to
yield a total of 1,210 interviews for HCH; and 15–16 completed interviews per PHPC grantee to
yield a total of 550 interviews for PHPC.
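The per-grantee ranges above follow from dividing each interview target by the number of grantees for that funding program in Exhibit 11. A small Python sketch of that arithmetic (the names `TARGETS`, `GRANTEES`, and `per_grantee_range` are illustrative):

```python
import math

# Interview targets from Section 4.3 and grantee counts from Exhibit 11.
TARGETS = {"CHC": 3630, "MHC": 1210, "HCH": 1210, "PHPC": 550}
GRANTEES = {"CHC": 163, "MHC": 47, "HCH": 46, "PHPC": 36}


def per_grantee_range(target, grantees):
    """Return the whole-number interview counts that bracket target / grantees."""
    average = target / grantees
    return math.floor(average), math.ceil(average)


RANGES = {p: per_grantee_range(TARGETS[p], GRANTEES[p]) for p in TARGETS}
```

Each range brackets the corresponding average number of patients per grantee shown in Exhibit 11.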
6.2 Patient Interview Allocation to Sites within Grantee

Within each grantee, we will use different methods to allocate patient interviews to
multiple sites for grantees with three or fewer sites in a funding program and grantees with more
than three sites in a funding program. For grantees with three or fewer sites, the number of
patient interviews within that grantee will be allocated proportionally to the patient size of the
sites. That is,

$$n_{fij} = n_{fi} \frac{s_{fij}}{\sum_{j} s_{fij}},$$

where $n_{fi}$ is the number of patients selected from a grantee for funding program f. For grantees
with more than three sites that are selected through PPS, the number of selected patients will be
divided equally among three selected sites. Doing so will help to reduce the UWE.
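The two allocation rules can be sketched in Python as below. Because interview counts must be whole numbers, the sketch uses largest-remainder rounding for the proportional case and gives any leftover interviews to the first sites in the equal case; both rounding rules are assumptions for illustration, as are the function name and the example patient counts.

```python
def allocate_interviews(n_fi, site_patients, selected_by_pps):
    """Split n_fi interviews across a grantee's selected sites.

    selected_by_pps=True  -> equal split (grantees with more than three sites)
    selected_by_pps=False -> proportional to site patient counts
    """
    k = len(site_patients)
    if selected_by_pps:
        base, extra = divmod(n_fi, k)
        return [base + (1 if j < extra else 0) for j in range(k)]
    total = sum(site_patients)
    exact = [n_fi * s / total for s in site_patients]
    alloc = [int(x) for x in exact]                      # round down first
    leftovers = n_fi - sum(alloc)
    # Hand the remaining interviews to the sites with the largest remainders.
    by_remainder = sorted(range(k), key=lambda j: exact[j] - alloc[j], reverse=True)
    for j in by_remainder[:leftovers]:
        alloc[j] += 1
    return alloc


proportional = allocate_interviews(22, [600, 300, 100], selected_by_pps=False)
equal = allocate_interviews(25, [800, 650, 500], selected_by_pps=True)
```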
6.3 Patient Screening and Selection

RTI will design a screening sheet that the receptionist can use to screen and select
patients when a patient enters the site and registers for service. A patient will be considered
eligible if the patient received service through one of the grantees supported by BPHC funding
programs at least once in the past 12 months prior to the current visit. The receptionist will ask
eligible patients questions about their race/ethnicity and age to determine whether they belong to
the oversampling groups. If a patient belongs to a group that will not be oversampled, the
receptionist will select the first eligible patient registered after the FI has informed the
receptionist that he/she is ready for the next interview. The receptionist will read a brief script
about the study to the selected patient and direct the patient to the FI for questions or
participation. If a patient belongs to one of the oversampling groups, the receptionist will select
the patient and send the patient to the FI if he/she is available; when the FI is working on an
interview or unavailable, the receptionist will give the selected patient a yellow laminated card
and instruct him/her to wait in a designated area. When the FI is available and ready, the FI will
look for a person holding a yellow laminated card. Exhibit 14 shows the oversampling and nonoversampling groups based on patients’ age and race/ethnicity.
Exhibit 14. Oversampling and Nonoversampling Patient Group

                                Oversampling                           Referred/
Patient Group                   Group          Visited    Eligible    Selected
65+, All Race/Ethnicity         Yes
0–64, AIAN                      Yes
0–64, Asian                     Yes
0–64, NHPI                      Yes
0–64, Other Race/Ethnicity      No

The receptionist will be asked to keep track of the number of patients who enter the site,
the number of patients who are eligible, and number of patients selected while the FI is at the site
to conduct data collection for each patient group, as shown in Exhibit 14. The receptionist will
either use tally marks to count patients as they enter or complete a table based on the sign-in
sheet or appointment list before the FI leaves the site. The patient count sheets for each FI data
collection visit will be sent to RTI for data entry, and counts will be used to calculate the analysis
weights for the study. For sites that have more than one receptionist, all receptionists must track
the number of visited, eligible, and selected patients even though we may recruit patients using
only one receptionist.
If a site is chosen for data collection in multiple funding programs, the FI will screen
participating patients to determine patient population type (i.e., homeless, migrant and seasonal
farmworkers, public housing, or low income) and will use the appropriate questionnaire to
conduct the patient interview.
We will closely monitor the data collection and adjust the sampling rate if necessary to
ensure that target sample sizes in the three race/ethnicity categories and for patients aged 65 or
older are met.
6.4 Patient Selection Probability

The selection probability of patient k from grantee i, site j for funding program f is given
by

$$P_{fijk} = \frac{m_{fij}}{M_{fij}} \cdot \frac{\text{weeks}}{52}, \tag{3}$$

where $M_{fij}$ is the number of eligible patients in the site during the sampling window (number of
weeks) and $m_{fij}$ is the target number of selected patients inflated for nonresponse. We may
have to estimate the proportion of patients from different funding programs if the site is selected
in data collection for more than one funding program. The proportion of patients from different
funding programs for the grantee or other sites within the grantee can be used as an
approximation. Note: the patient selection probability will be calculated separately for each
patient group as shown in Exhibit 14.
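6.5 Patient’s Probability of Inclusion in the Study

The probability of a patient being included in the study is the product of $G_{hi}$, $C_{fij}$, and $P_{fijk}$
in Formulas (1), (2), and (3), respectively. That is, for sites selected through PPS sampling,

$$\pi_{hfijk} = \frac{n_h S_{hi}}{\sum_{i} S_{hi}} \cdot \frac{3\, s_{fij}}{\sum_{j} s_{fij}} \cdot \frac{m_{fij}}{M_{fij}} \cdot \frac{\text{weeks}}{52}. \tag{4}$$

When all of a grantee's sites are selected, the site factor in Formula (4) equals 1.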

The design is supposed to achieve near self-weighting within each grantee stratum if no
oversampling is conducted when selecting sites at the second-stage selection, and no
oversampling of patients is conducted at the third-stage selection. The oversampling at the
second and third stages causes the deviation from a near self-weighting design, meaning
probabilities in Formula (4) will not be equal within the same grantee stratum. As a result, the
UWE will be inflated.
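A minimal Python sketch of Formula (4) multiplies the three stage probabilities and inverts the result to obtain a design-based weight. The argument names mirror the symbols in Formulas (1) through (3); the function name and all input values are hypothetical.

```python
def inclusion_probability(n_h, S_hi, S_h_total, s_fij, s_fi_total,
                          m_fij, M_fij, weeks, sites_selected_by_pps=True):
    """Formula (4): product of grantee, site, and patient selection probabilities."""
    g_hi = n_h * S_hi / S_h_total                                      # Formula (1)
    c_fij = 3 * s_fij / s_fi_total if sites_selected_by_pps else 1.0   # Formula (2)
    p_fijk = (m_fij / M_fij) * (weeks / 52)                            # Formula (3)
    return g_hi * c_fij * p_fijk


# Hypothetical patient: stratum with 30 selections, grantee serving 20,000 of
# the stratum's 3,000,000 patients, site serving 2,000 of the grantee's 10,000,
# and 30 of 600 eligible patients selected during a 4-week sampling window.
pi = inclusion_probability(30, 20_000, 3_000_000, 2_000, 10_000, 30, 600, 4)
design_weight = 1 / pi
```

Patients with different values of $\pi_{hfijk}$ within the same stratum receive different weights, which is the source of the inflated UWE noted above.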


SECTION 7.
SAMPLE SIZES AND STATISTICAL POWER
Statistical tests use data from samples to determine whether a difference exists in a
population or between two populations. An example of a statistical test is testing the null
hypothesis that the number of uninsured children aged 12 or younger does not differ between the
population of the 2014 Health Center Patient Survey and general population for the National
Health Interview Survey (NHIS). The power of the test is the probability that the test will find a
statistically significant difference between two populations given that there is a true difference
between those two populations. There is always a chance that the samples will appear to support
or to refute a tested hypothesis when the reality is the opposite. That risk is quantified as the
statistical significance level. We use a significance level of 0.05 to calculate statistical power in
this document.
To reduce data collection costs and meet the target sample sizes for four funding
programs and for race/ethnicity and age groups, we propose a stratified three-stage clustering
design and oversampling of certain subgroups. Large variations in sample weights due to
oversampling and the intra-class correlation among patients from the same grantee due to
clustering can increase sampling error, thereby reducing statistical power and precision of survey
estimates. The design effect (Deff) can be used to measure the loss of precision and statistical
power due to oversampling and clustering. Deff is a function of the clustering effect and the
unequal weighting effect (UWE) and can be defined as Deff = UWE*(1 + (m−1)*ICC), where m
is the number of patient interviews within a grantee, ICC is the intracluster correlation
coefficient that measures the degree of similarity among elements within a cluster, and UWE
measures variation in the sample weight. Deff can be reduced by reducing either UWE or the
clustering effect or both. The effective sample size is the target sample size divided by Deff.
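The Deff formula and the effective sample size can be sketched in Python as follows. The text does not give a formula for UWE; the sketch assumes the common Kish approximation, $n \sum w^2 / (\sum w)^2$, and the function names are illustrative.

```python
def unequal_weighting_effect(weights):
    """Assumed Kish approximation: UWE = n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def design_effect(uwe, m, icc):
    """Deff = UWE * (1 + (m - 1) * ICC), as defined in the text."""
    return uwe * (1 + (m - 1) * icc)


def effective_sample_size(n, deff):
    """Target sample size divided by the design effect."""
    return n / deff


# Equal weights give UWE = 1, so only clustering contributes to Deff.
uwe_equal = unequal_weighting_effect([1.0, 1.0, 1.0, 1.0])
# Hypothetical inputs: UWE of 2.0, 21 interviews per grantee, ICC of 0.05.
deff_example = design_effect(2.0, 21, 0.05)
# Total row of Exhibit 15: 6,600 interviews with an estimated Deff of 5.0.
n_eff_total = effective_sample_size(6600, 5.0)
```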
Exhibit 15 displays the power calculation for proportion estimates between the 2014
Health Center Patient Survey and the 2011 NHIS, showing the minimum differences that can be
detected with 80% statistical power at the 0.05 level for various domains. In the calculation,
we used a proportion of p = 0.5; the statistical power is the smallest for proportion estimates when
the proportion is in the middle range (0.4–0.6) because the variance is the largest. The detectable
differences will be smaller if the proportion estimate is outside the middle range.
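The exact power formula is not stated in this plan. As a hedged illustration, the Python sketch below uses the standard two-sample normal approximation for a difference in proportions, applied to the effective sample sizes; this is our assumption, although it gives values close to those in Exhibit 15. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_eff_1, n_eff_2, p=0.5, alpha=0.05, power=0.80):
    """Minimum detectable difference between two proportions (two-sided test).

    Assumes a normal approximation with a common proportion p in both
    populations and independent samples of effective sizes n_eff_1, n_eff_2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    standard_error = sqrt(p * (1 - p) * (1 / n_eff_1 + 1 / n_eff_2))
    return (z_alpha + z_power) * standard_error


# Effective sample sizes from the Total and Hispanic rows of Exhibit 15.
mdd_total = detectable_difference(1320, 50937)
mdd_hispanic = detectable_difference(511, 12269)
```

Under these assumptions the Total row works out to roughly 3.9 percentage points and the Hispanic row to roughly 6.3.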


Exhibit 15. Detecting Differences in Percentage Estimates between the Patient Survey and the NHIS

                                        Patient Survey                        2011 NHIS
                               Expected    Estimated  Effective    Sample     Estimated  Effective    Detectable
Domain                         Sample Size Deff       Sample Size  Size       Deff       Sample Size  Difference %
Race/Ethnicity
  Hispanic                        2,044      4.0         511        24,539      2.0        12,269         6.3
  NH-White                        1,558      4.0         390        53,192      2.0        26,596         7.2
  NH-Black                        1,618      4.0         405        14,629      2.0         7,315         7.2
  NH-Asian                          647      4.0         162         6,795      2.0         3,398        11.2
  NH-American Indian/
    Alaska Native                   409      4.0         102           600      2.0           300        15.7
  NH-Native Hawaiian/
    Pacific Islander (a)            251      4.0          63           204      2.0           102        21.8
  NH-Others (b)                      73      4.0          18         1,916      2.0           958          —
Insurance Status
  Medicaid only                   1,937      4.0         484        13,783      2.0         6,891         6.6
  Medicare only                     339      4.0          85         5,212      2.0         2,606        15.1
  Medicaid and Medicare             334      4.0          84         1,520      2.0           760        15.8
  Other                           1,360      4.0         340        64,453      2.0        32,226         7.6
  Uninsured                       2,526      4.0         632        16,907      2.0         8,453         5.8
Age Group
  0 to 17                         2,200      4.0         550        26,802      2.0        13,401         6.1
  18 to 64                        3,200      4.0         800        62,556      2.0        31,278         5.0
  65+                             1,200      4.0         300        12,517      2.0         6,258         8.3
Total                             6,600      5.0       1,320       101,875      2.0        50,937         3.9

(a) Due to data confidentiality, NHIS did not release rare race categories in the 2011 public use file, such as Native
Hawaiian and Pacific Islander. These rare race categories were combined in the ‘Other’ category. We used
the proportion of Native Hawaiian/Pacific Islander in the 2010 Census
(http://www.census.gov/prod/cen2010/briefs/c2010br-02.pdf) to estimate the sample size of this race
category for the 2011 NHIS.
(b) Projected sample size too small for detecting differences with acceptable power.

The power analysis estimates in Exhibit 15 show that the detectable differences are well
below 8% between the 2014 Health Center Patient Survey and the 2011 NHIS for race/ethnicity,
insurance status, and age group domains except for Non-Hispanic Asian, Non-Hispanic American
Indian/Alaska Native, Non-Hispanic Native Hawaiian/Pacific Islander, Medicare Only, and
Medicaid & Medicare due to small sample sizes.

SECTION 8.
SAMPLE WEIGHTS
Patients, the primary analytic units for the 2014 Health Center Patient Survey, are
selected through a three-staged sample design, as discussed in Sections 4–6. Disproportionate
sample selection is used at all three stages; therefore, the patient samples are not self-weighting.
To make inferences about the target population or any subdomains of the target population,
sample weights are needed. We will calculate base weights for each respondent reflecting each
respondent’s probability of inclusion in the study. To account for nonresponse, a nonresponse
adjustment on the base weight will be calculated. Poststratification adjustment will also be
conducted to adjust for coverage bias and reduce variance.
8.1 Grantee Sample Selection Weights

The first-stage sampling weight for each grantee will be the inverse of the probability of
selection as calculated in Formula (1) in Section 4.6. Therefore, the grantee sample selection
weight for grantee i within the hth stratum is given by

$$w^{(1)}_{hi} = 1 / G_{hi}. \tag{6}$$

8.2 Site Sample Selection Weights

For the grantees that have more than three sites for a specific funding program, a
subsample of three sites was selected as discussed in Section 5.4. Thus, the site sample selection
weight for the jth site within the ith grantee for funding program f is given by

$$w^{(2)}_{fij} = 1 / C_{fij}, \tag{7}$$

where $C_{fij}$ is calculated in Formula (2).
8.3 Patient Sample Selection Weights

From the patient recruitment logs, the number of eligible patients, the number of patients
who were selected by a receptionist and sent to an FI, and the number of patients who agreed to
participate during the patient recruitment time periods will be determined. The number of
patients selected at each site for a specific funding program within a participating grantee,
summed across the days in which the sampling for that site took place, will be divided by the
total number of patients the site served in the year prior to the survey year, to obtain the
probability of selection for each patient as discussed in Section 6.4. Thus, the patient sample
selection weight for the kth patient at the jth site within the ith grantee for funding program f is
given by

$$w^{(3)}_{fijk} = 1 / P_{fijk}, \tag{8}$$

where $P_{fijk}$ is calculated in Formula (3).
The product of the three weight components discussed above forms the design-based weight
for each patient. That is,

$$w_{fijk} = w^{(1)}_{hi} \cdot w^{(2)}_{fij} \cdot w^{(3)}_{fijk}. \tag{9}$$

8.4 Nonresponse and Poststratification Weight Adjustments
To reduce the nonresponse bias on the estimates, the design-based weight $w_{fijk}$ will be
adjusted for nonresponse. A nonresponse adjustment will be calculated separately for each
funding program. Since we have age and race information for both respondents and
nonrespondents collected by receptionists, weighting classes will be formed by age group and
race/ethnicity, and a ratio adjustment will be calculated within each class. The adjustment within
each class is calculated as

$$Adj_{nr} = \sum_{s} w_{fijk} \Big/ \sum_{r} w_{fijk}, \tag{10}$$

where s denotes all selected patients and r denotes respondents.
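Formula (10) can be sketched in Python as a ratio of summed design weights within each weighting class. The record layout (class label, design weight, response flag), the class labels, and the weights below are hypothetical; in practice the calculation would be run separately for each funding program.

```python
from collections import defaultdict


def nonresponse_adjustments(records):
    """Formula (10) by weighting class.

    records: iterable of (weighting_class, design_weight, responded) tuples
    covering all selected patients in one funding program.
    """
    selected = defaultdict(float)
    responded = defaultdict(float)
    for weighting_class, weight, is_respondent in records:
        selected[weighting_class] += weight
        if is_respondent:
            responded[weighting_class] += weight
    # Classes with no respondents are skipped here; they would need to be
    # collapsed with a neighboring class before adjustment.
    return {c: selected[c] / responded[c] for c in selected if responded[c] > 0}


records = [
    ("65+", 100.0, True),
    ("65+", 50.0, False),
    ("0-64 Asian", 80.0, True),
    ("0-64 Asian", 80.0, True),
]
adjustments = nonresponse_adjustments(records)
```

Each respondent's design-based weight is then multiplied by the adjustment for its class, so respondents carry the weight of the nonrespondents in that class.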
The poststratification is anticipated to reduce the coverage bias and variance of survey
outcomes, and it will be implemented using RTI’s generalized exponential model (GEM; Folsom
and Singh, 2000). Coverage bias can occur when a set of individuals in a sample does not match
the target population. For example, if there are more young patients in the study, then estimates
based on the sample may be biased if young patients respond to survey questions differently
from patients in other age groups. Poststratification adjusts the weights in such a way
that weights for young patients will be adjusted downward, which corrects the
over-representation of young patients in the sample. GEM can use more predictors in the model than the
conventional weighting class methods. The predictors will be limited by available data from the
UDS, including age, race/ethnicity, gender, and poverty level. A separate poststratification will
be conducted for each funding program so that the sum of final analysis weights from all
respondents in a funding program will match the total number of patients served by the
corresponding funding program. The poststratification adjustment factor is denoted $Adj_{ps}$.

The final analysis weights for the 2014 Health Center Patient Survey are the product of the
design-based weights and two adjustment factors. That is,

$$ANALWT_{fijk} = w_{fijk} \cdot Adj_{nr} \cdot Adj_{ps}. \tag{11}$$

Exhibit 16 displays and explains the terms in the formulas from this section and from
Sections 4 through 6 and provides the data source for each term.

Exhibit 16. Description and Data Source of Terms in Formulas Calculating Sample Weights

| Formula | Term | Description | Data Source |
|---|---|---|---|
| $G_{hi} = n_h \dfrac{S_{hi}}{\sum_i S_{hi}}$ | $G_{hi}$ | Selection probability for the $i$th grantee within the $h$th stratum | Output from PROC SURVEYSELECT in SAS |
| | $n_h$ | Prespecified number of grantees selected for the study in the $h$th stratum | RTI calculates the sampling rates and allocates grantee samples into each stratum (see example in Exhibit 10) |
| | $S_{hi}$ | Number of patients served in the year prior to the survey year in the $i$th grantee within the $h$th stratum | BPHC's UDS |
| | $\sum_i S_{hi}$ | Total number of patients the grantees served in the year prior to the survey year in the $h$th stratum | BPHC's UDS |
| $C_{fij} = 1$, or $C_{fij} = \dfrac{3 S_{fij}}{\sum_j S_{fij}}$ | $C_{fij}$ | Selection probability for the $j$th site within the $i$th grantee for funding program $f$; equals 1 if 3 or fewer sites are selected, or is calculated if 3 sites are selected using PPS | Output from PROC SURVEYSELECT in SAS, or equals 1 |
| | $S_{fij}$ | Number of patients served in the year prior to the survey year from the $j$th site within the $i$th grantee for funding program $f$ | RTI recruiters collect this information from the grantee or site in the recruiting process |
| | $\sum_j S_{fij}$ | Total number of patients served in the year prior to the survey year from all sites within the $i$th grantee for funding program $f$ | Sum of $S_{fij}$ within the grantee for a specific funding program |
| $P_{fijk} = \dfrac{m_{fij}}{M_{fij}} \cdot \dfrac{\text{weeks}}{52}$ | $P_{fijk}$ | Selection probability of patient $k$ from grantee $i$, site $j$ for funding program $f$ | Calculated from the formula |
| | $m_{fij}$ | Number of selected patients to yield $n_{fij}$ completed interviews from grantee $i$, site $j$ for funding program $f$ | The FI keeps track of the number of selected patients sent by a receptionist for each funding program |
| | $M_{fij}$ | Number of patients who entered the site during the sampling window (number of weeks) | RTI collects data from receptionists' tally sheets |
| $w^{(1)}_{hi} = 1 / G_{hi}$ | $w^{(1)}_{hi}$ | Design weight corresponding to grantee selection | Inverse of $G_{hi}$ |
| $w^{(2)}_{fij} = 1 / C_{fij}$ | $w^{(2)}_{fij}$ | Design weight corresponding to site selection | Inverse of $C_{fij}$ |
| $w^{(3)}_{fijk} = 1 / P_{fijk}$ | $w^{(3)}_{fijk}$ | Design weight corresponding to patient selection | Inverse of $P_{fijk}$ |
| $w_{fijk} = w^{(1)}_{hi} \cdot w^{(2)}_{fij} \cdot w^{(3)}_{fijk}$ | $w_{fijk}$ | Design weight for each selected patient | Product of three design-based weight components corresponding to three selection stages |
| $\mathrm{Adj}_{nr} = \sum_s w_{fijk} / \sum_r w_{fijk}$ | $\mathrm{Adj}_{nr}$ | A weighting class nonresponse adjustment | Calculate the nonresponse adjustment within each weighting class separately for each funding program |
| | $\sum_s w_{fijk}$ | Sum of the design weights of all selected patients for a specific funding program | Sum of $w_{fijk}$ of all selected patients within a weighting class |
| | $\sum_r w_{fijk}$ | Sum of the design weights of completed interviews for a specific funding program | Sum of $w_{fijk}$ of completed interviews within a weighting class |
| $\mathrm{Adj}_{ps}$ | $\mathrm{Adj}_{ps}$ | Poststratification adjustment done by each funding program; adjusts weights to BPHC's UDS total number of patients for various demographic domains | Generalized Exponential Model developed at RTI; control totals are from BPHC's UDS |
| $\mathrm{ANALWT}_{fijk} = w_{fijk} \cdot \mathrm{Adj}_{nr} \cdot \mathrm{Adj}_{ps}$ | $\mathrm{ANALWT}_{fijk}$ | Final analysis weight | Product of design weight, nonresponse, and poststratification adjustments |
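To make the chain of formulas in Exhibit 16 concrete, the following sketch computes the three stage-specific selection probabilities, the design weight, the nonresponse adjustment, and the final analysis weight for one patient. All input numbers are hypothetical, the function names are illustrative, and the rule used to decide when the site probability equals 1 (grantee has three or fewer sites) is an assumption about how the exhibit should be read. The sketch does not handle PPS probabilities that would exceed 1.

```python
def grantee_prob(n_h, s_hi, s_h_total):
    """G_hi = n_h * S_hi / sum_i S_hi."""
    return n_h * s_hi / s_h_total


def site_prob(s_fij, s_fi_total, n_sites, n_select=3):
    """C_fij = 1 if all sites are taken (assumed: n_sites <= 3),
    otherwise 3 * S_fij / sum_j S_fij."""
    if n_sites <= n_select:
        return 1.0
    return n_select * s_fij / s_fi_total


def patient_prob(m_fij, big_m_fij, weeks):
    """P_fijk = (m_fij / M_fij) * (weeks / 52)."""
    return (m_fij / big_m_fij) * (weeks / 52)


def design_weight(g_hi, c_fij, p_fijk):
    """w_fijk = (1 / G_hi) * (1 / C_fij) * (1 / P_fijk)."""
    return (1.0 / g_hi) * (1.0 / c_fij) * (1.0 / p_fijk)


def nonresponse_adjustment(selected_weights, respondent_weights):
    """Adj_nr = sum of w over selected patients / sum of w over respondents,
    computed within one weighting class."""
    return sum(selected_weights) / sum(respondent_weights)


# Hypothetical inputs for a single patient.
g = grantee_prob(n_h=10, s_hi=20_000, s_h_total=1_000_000)
c = site_prob(s_fij=5_000, s_fi_total=30_000, n_sites=5)
p = patient_prob(m_fij=40, big_m_fij=800, weeks=13)
w = design_weight(g, c, p)

# Hypothetical weighting class: four selected patients, three respondents.
adj_nr = nonresponse_adjustment([w] * 4, [w] * 3)
adj_ps = 0.9  # made-up poststratification factor; GEM is not implemented here

analwt = w * adj_nr * adj_ps
print(g, c, p, w, adj_nr, analwt)
```

Keeping each stage in its own function mirrors the rows of the exhibit, so a change in one selection stage only affects one function.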

SECTION 9.
DATA COLLECTION
9.1 Schedule

The 2014 Health Center Patient Survey data will be collected over 4 months, from
August to December 2014. Typically, a workday will be divided into morning and afternoon time
slots. We will send an FI to a site on predetermined days and in predetermined time slots. An FI will normally
work in multiple sites from one grantee or multiple grantees. We will determine the FI’s time
slots for each site by considering the production goal of a site, estimated patient volume in a site,
the FI’s working schedule, and the site’s operating schedule. The production goal, which is the
number of completed interviews, varies for each site; it can be as low as 5 or 6 interviews when 3
sites are selected for a PHPC grantee (15–16 interviews for PHPC per grantee) or it can be as
high as 90–92 when a site is the only one selected for data collection for all four funding
programs (although that scenario rarely happens). Achieving the production goal at each site
should not be difficult in a 4-month data collection window. However, for some sites, because of
unexpected low patient volume or an unusual operating schedule, the production goal could
potentially be missed. We will closely monitor the data collection process, and if a delay occurs,
we will send an FI to the site more often. We may have to reduce the production goal for a site
and allocate more interviews to other sites if meeting the production goal proves to be extremely
difficult.
9.2 Costs

The three primary field costs are FI labor, mileage incurred by FIs, and incentives paid to
respondents. We estimate that we need 4.7 hours on average to obtain one interview for the CHC
patients, 6.7 hours for interviews done in an Asian language, and 7 hours per interview for MHC,
PHPC, and HCH patients. These hours include time for driving to and from a facility, waiting to
be approached by eligible patients, screening potential participants, administering informed
consent, administering an interview, updating field status codes, completing other
administrative paperwork, shipping materials back to RTI, and participating in regular conference
calls with the field supervisor. We also assume that FIs will require reimbursement for an
average of 60 miles per completed interview. Finally, we have budgeted for $25 in incentives for
each survey respondent.
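The per-interview assumptions above can be rolled up into totals with simple arithmetic. The sketch below is only an illustration: the hours, mileage, and incentive figures come from this section, while the interview counts passed in and the function name `field_effort` are hypothetical. Labor and mileage rates are not stated here, so the sketch reports hours and miles rather than dollars for those items.

```python
# Per-interview planning assumptions taken from the Costs section.
HOURS_PER_INTERVIEW = {
    "CHC": 4.7,
    "Asian language": 6.7,
    "MHC": 7.0,
    "PHPC": 7.0,
    "HCH": 7.0,
}
MILES_PER_INTERVIEW = 60
INCENTIVE_PER_RESPONDENT_USD = 25


def field_effort(interviews):
    """Total FI hours, reimbursable miles, and incentive dollars.

    interviews: dict mapping an interview type (a key of HOURS_PER_INTERVIEW)
        to a number of completed interviews.
    """
    total_interviews = sum(interviews.values())
    fi_hours = sum(HOURS_PER_INTERVIEW[kind] * n for kind, n in interviews.items())
    return {
        "fi_hours": fi_hours,
        "miles": total_interviews * MILES_PER_INTERVIEW,
        "incentives_usd": total_interviews * INCENTIVE_PER_RESPONDENT_USD,
    }


# Hypothetical counts, for illustration only.
effort = field_effort({"CHC": 100, "MHC": 10})
print(effort)
```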

SECTION 10.
STRENGTHS AND LIMITATIONS OF STUDY DESIGN
10.1 Strengths

The three-stage PPS sample design will produce a nationally representative sample of
grantees, health sites, and patients across the United States, across urban/rural locations, and
across various grantee sizes.
We will create seven grantee strata according to the funding program(s) in which a grantee
participated and whether a grantee has a concentration of patients in one of the three race/ethnicity
categories (AIAN, Asian, and NHPI). We will oversample grantees receiving PHPC, MHC, and/or
HCH funding, and grantees with a concentration of patients in one of the three race/ethnicity
categories. The stratified disproportionate sample at the grantee selection stage will yield a grantee
sample with more grantees participating in PHPC, MHC, and/or HCH funding programs and
grantees with large numbers of patients in the three race/ethnicity categories. These aspects of the
design are key so that the target sample sizes for funding programs and race/ethnicity groups can
be met. The optimum grantee sample allocation procedure reduces UWE. Independent site and
patient samples will be selected for each funding program if a grantee participated in multiple
funding programs. Because recruiting a grantee is costly, this step reduces data collection cost
and increases sampling efficiency.
Oversampling sites with a concentration of patients in one of the three race/ethnicity categories
will further help ensure that target sample sizes in the minority race/ethnicity
categories are achieved. Allocating interviews per funding program in a grantee to up to three sites when
possible will help to reduce the clustering effect, thus reducing sampling error and improving
the precision of survey estimates.
We will oversample patients at the third selection stage for ages 65 or older and in the three
race/ethnicity categories (AIAN, Asian, and NHPI). We will closely monitor the data collection
on a weekly basis, and we will adjust the sampling rates and the frequency of FI visits to a site to
ensure target sample sizes in each group will be met within the 4-month sampling window.
When the target sample for each funding program is met, BPHC can compare survey
estimates among funding programs. The combined sample of patients from the four funding
programs will be sufficient for comparative analyses with national estimates of U.S. residents
from the NHIS on various survey outcomes at the national level and some subgroups, such as
race/ethnicity, age group, health insurance status, etc.

10.2 Limitations

The sample size has increased from 4,500 in the 2009 study to 6,600 for the 2014 study,
so the precision of survey estimates should improve in the 2014 study. However, oversampling
grantees, sites, and patients at all three stages can cause large variation in sample weights,
thereby increasing the variances associated with survey estimates and reducing statistical power in
data analysis. This loss of design efficiency due to oversampling could partially offset the gain from
the increased sample size.
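One common way to quantify this trade-off is Kish's approximation for the design effect due to unequal weighting, $n \sum w^2 / (\sum w)^2$, together with the corresponding effective sample size. The sketch below assumes that this approximation is an acceptable stand-in for the UWE discussed in this plan; the weights are made-up values, and the function names are illustrative.

```python
def unequal_weighting_effect(weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2.

    Equals 1.0 when all weights are equal and grows as weights vary more.
    """
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Nominal sample size divided by the unequal weighting effect."""
    return len(weights) / unequal_weighting_effect(weights)


# Hypothetical weights: three oversampled patients and one with a large weight.
equal = [2.0, 2.0, 2.0, 2.0]
unequal = [1.0, 1.0, 1.0, 5.0]

print(unequal_weighting_effect(equal), effective_sample_size(equal))
print(unequal_weighting_effect(unequal), effective_sample_size(unequal))
```

With the made-up unequal weights, four interviews carry roughly the information of a little more than two equally weighted interviews, which is the kind of offset to the larger sample that this section describes.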
An additional limitation is that the study cannot capture seasonal variation in health care needs and
service utilization. The time constraints for completing the study within the contract time period
limit the data collection period to 4 months, not a full year; thus, the study will not be able to
address seasonal fluctuations in the types of services provided to health center patients.
The short data collection period may also miss
groups of seasonal farmworkers who move from one part of the country to another during the
year.

SECTION 11.
REFERENCES
Chromy, J. R. (1981). Variance estimators for a sequential sample selection procedure. In
D. Krewski, R. Platek, and J. N. K. Rao (Eds.), Current Topics in Survey Sampling. New
York: Academic Press, Inc.
Folsom, R. E., and Singh, A. C. (2000). The generalized exponential model for sampling weight
calibration for extreme values, nonresponse, and poststratification. Proceedings of the
American Statistical Association Section on Survey Research Methods, 598–603.
Kish, L. (1995). Survey Sampling, pp. 217–246. Wiley Classics Library Edition.
