Survey methodology

Survey reliability guidelines

  • For survey data, statistical reliability of the estimates is assessed using relative standard error, confidence interval width, and sample size. Statistical reliability guidelines can be found here.
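
As a rough illustration of the three reliability measures named above, the Python sketch below computes a relative standard error (RSE) and a confidence interval width and reports them alongside the sample size. The function name, field names, and example values are hypothetical, and no reliability cutoffs are applied because none are stated here; the published guidelines remain the authority.

```python
def reliability_summary(estimate, standard_error, ci_lower, ci_upper, sample_size):
    """Summarize the three reliability measures for one survey estimate.

    RSE is the standard error expressed as a percentage of the estimate.
    No pass/fail thresholds are applied; those come from the guidelines.
    """
    if estimate == 0:
        raise ValueError("RSE is undefined when the estimate is zero")
    return {
        "rse_percent": 100.0 * standard_error / abs(estimate),
        "ci_width": ci_upper - ci_lower,
        "sample_size": sample_size,
    }


# Hypothetical estimate: 8.9% with a standard error of 0.6 percentage points.
summary = reliability_summary(
    estimate=8.9, standard_error=0.6, ci_lower=7.7, ci_upper=10.1, sample_size=1200
)
```

A larger RSE, a wider interval, or a smaller sample each points toward a less reliable estimate.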

Statistical significance

  • A t-test is used to assess whether two prevalence estimates differ statistically. An estimate is considered statistically different from the reference group when p < 0.05.


Household and neighborhood definitions

Community districts

  • New York City’s 59 community boards were created by local law in 1975, and each represents a community district (CD). For a complete listing of all CDs and their boundaries, visit the NYC Department of City Planning Community Portal.

Neighborhood poverty

  • Neighborhood poverty (by ZIP code) is defined as the percentage of the population living below the Federal Poverty Line (FPL), based on the American Community Survey. Neighborhoods are categorized into four groups: “Low poverty” neighborhoods have <10% of the population living below the FPL; “Medium poverty” neighborhoods have 10% to <20%; “High poverty” neighborhoods have 20% to <30%; and “Very high poverty” neighborhoods have ≥30%.
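
The four poverty groups can be expressed as a simple lookup. The Python sketch below is illustrative only; the function name is hypothetical, and the cut points are taken directly from the definition above.

```python
def poverty_category(percent_below_fpl):
    """Map the percentage of a ZIP code's population below the FPL to a group."""
    if not 0 <= percent_below_fpl <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_below_fpl < 10:
        return "Low poverty"
    if percent_below_fpl < 20:
        return "Medium poverty"
    if percent_below_fpl < 30:
        return "High poverty"
    return "Very high poverty"


# A neighborhood with exactly 20% below the FPL falls in the 20% to <30% group.
category = poverty_category(20.0)  # "High poverty"
```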

Neighborhood Health Action Center areas

  • To promote health equity and reduce health disparities at the neighborhood level, the Health Department established Neighborhood Health Action Centers (formerly District Public Health Offices) in the South Bronx, East and Central Harlem, and North and Central Brooklyn, neighborhoods with high rates of chronic disease and premature death.

UHF neighborhoods


Statistical definitions


Age adjustment

  • Within EpiQuery, all age-adjusted estimates have been standardized to the Year 2000 U.S. Standard Population. Most health outcomes and behaviors are related to age, so epidemiologists use age adjustment to compare groups whose age distributions may differ.
  • Example: The prevalence of high blood pressure increases with age, and the age distribution differs between men and women because women, on average, live longer than men. If non-age-adjusted estimates show a higher prevalence of high blood pressure among women than among men, this may simply reflect there being more older women than older men. Age adjustment removes the effect of the differing age distributions, so any remaining difference in high blood pressure between the sexes is not explained by age.
  • When not to age-adjust:
    • If a researcher is interested only in outcome differences between populations, and not the reasons for those differences (for example, when performing a needs assessment for service delivery), comparing the unadjusted rates may be more appropriate.
    • To examine the effect of age on a health-related outcome, it may be more important to adjust for factors other than age.
  • For a more detailed discussion of prevalence estimates and age adjustment, take the CDC training module on age standardization.
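
The effect of age adjustment can be seen in a small worked example. The Python sketch below uses direct standardization, which is an assumption about the method (the text above names only the standard population). The age groups, prevalence values, and population counts are all hypothetical; in particular, the standard counts are NOT the actual Year 2000 U.S. Standard Population.

```python
def crude_rate(age_specific_rates, population_counts):
    """Overall rate in a group, weighted by that group's own age distribution."""
    total = sum(population_counts.values())
    return sum(age_specific_rates[g] * population_counts[g] for g in age_specific_rates) / total


def age_adjusted_rate(age_specific_rates, standard_counts):
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    if set(age_specific_rates) != set(standard_counts):
        raise ValueError("age groups must match the standard population")
    total = sum(standard_counts.values())
    return sum(age_specific_rates[g] * standard_counts[g] for g in age_specific_rates) / total


# Hypothetical prevalence (%) by age group, identical in both groups.
rates = {"18-44": 10.0, "45-64": 30.0, "65+": 60.0}
# Hypothetical populations: group B is older than group A.
group_a = {"18-44": 500, "45-64": 300, "65+": 200}
group_b = {"18-44": 300, "45-64": 300, "65+": 400}
# Hypothetical standard population shared by both groups.
standard = {"18-44": 500, "45-64": 300, "65+": 200}

crude_a = crude_rate(rates, group_a)
crude_b = crude_rate(rates, group_b)
adjusted_a = age_adjusted_rate(rates, standard)
adjusted_b = age_adjusted_rate(rates, standard)
```

In this example the crude prevalence is about 26% in group A and 36% in group B, even though prevalence within each age group is identical; after adjustment both groups come out at about 26%, showing that the crude gap was due to age structure alone.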

95% Confidence Interval

  • Survey data are based on a sample, so estimates are subject to sampling error. The confidence interval is the range of values around an estimate that is likely to contain the true value, at a stated level of confidence (most commonly 95%). The wider the confidence interval, the less precise the estimate.
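
To make the idea concrete, the Python sketch below builds a 95% confidence interval using the normal approximation (estimate plus or minus 1.96 standard errors). This is one common method and is an assumption here; the interval method used for the published estimates is not stated in this section. The function name and example values are hypothetical.

```python
def confidence_interval_95(estimate, standard_error):
    """Normal-approximation 95% confidence interval for an estimate."""
    z = 1.96  # critical value for 95% confidence under the normal approximation
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical estimate of 8.9% with two different standard errors.
narrow = confidence_interval_95(8.9, 0.6)
wide = confidence_interval_95(8.9, 1.5)
```

The second interval is wider because its standard error is larger, which signals a less precise estimate.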


Prevalence

  • The proportion of people in a given population who have a condition or attribute during a specified time period.
  • Example: According to the Community Health Survey, the prevalence of current depression among adults ages 18 and older was 8.9% in 2016.
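
For survey data, a prevalence is typically computed from weighted responses rather than raw counts. The Python sketch below shows that calculation under the assumption that each respondent carries a survey weight; the function name, data layout, and weights are hypothetical and do not reproduce any published estimate.

```python
def weighted_prevalence(responses):
    """Percentage of the weighted population with the condition.

    `responses` is a sequence of (has_condition, weight) pairs.
    """
    responses = list(responses)
    total_weight = sum(weight for _, weight in responses)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    condition_weight = sum(weight for has_condition, weight in responses if has_condition)
    return 100.0 * condition_weight / total_weight


# Four hypothetical respondents; two report the condition.
sample = [(True, 2.0), (False, 3.0), (False, 4.0), (True, 1.0)]
prevalence = weighted_prevalence(sample)
```

Here the respondents with the condition carry 3 of the 10 total weight units, so the weighted prevalence is 30% even though half of the respondents report the condition.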


Rate

  • A measure of how often an event occurs relative to the number of people at risk for the event. The number of events (e.g., deaths) within a certain time period is divided by the size of the population at risk during that period.
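
A rate is usually scaled to a round population size so that it is easier to read. The Python sketch below divides events by population and scales the result per 100,000 people; the multiplier is an assumption (it is a common convention, but it is not specified above), and the function name and counts are hypothetical.

```python
def rate_per(events, population, per=100_000):
    """Events per `per` people at risk during the time period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * per / population


# Hypothetical: 50 deaths in one year among 200,000 people at risk.
death_rate = rate_per(events=50, population=200_000)  # 25.0 per 100,000
```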


p-value

  • A p-value is the probability of observing a difference at least as large as the one seen if the two groups were not actually different, that is, if the difference were due to chance alone.
    • If the p-value is less than or equal to 0.05 (a 5% or less chance that a difference this large would occur at random), then the difference between values is considered statistically significant.
    • By contrast, p-values that are greater than 0.05 indicate weak evidence of an actual difference between groups. In other words, the observed difference could plausibly be due to random chance.


t-test

  • t-tests are used to generate the p-values that indicate whether estimates for two groups, or for two years, differ from each other by more than would be expected by chance.
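
The comparison of two estimates can be sketched as follows. This Python example is illustrative only (published results are calculated in R) and rests on two assumptions: the two estimates come from independent samples, and the sample is large enough that the t distribution is well approximated by the normal distribution, which avoids having to specify survey degrees of freedom. The function name and values are hypothetical; the p < 0.05 rule follows the "Statistical significance" section.

```python
import math


def compare_estimates(est1, se1, est2, se2, alpha=0.05):
    """Return (test statistic, two-sided p-value, significant?) for two estimates."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # standard error of the difference
    if se_diff == 0:
        raise ValueError("standard errors cannot both be zero")
    statistic = (est1 - est2) / se_diff
    # Two-sided p-value under the large-sample normal approximation.
    p_value = math.erfc(abs(statistic) / math.sqrt(2))
    return statistic, p_value, p_value < alpha


# Hypothetical prevalence estimates (%) and standard errors for two groups.
statistic, p_value, significant = compare_estimates(12.0, 1.0, 8.9, 0.6)
```

With these inputs the difference of 3.1 percentage points is about 2.7 standard errors from zero, so it would be flagged as statistically significant.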

Statistical calculations

  • All results are calculated using R software.


Communicable disease-specific details

For links to more information about specific reportable communicable diseases, please refer to the Disease Reporting section. In addition, please note the following disease-specific details for data presented on EpiQuery:

  • Hepatitis A: Please refer to the Centers for Disease Control and Prevention and Council of State and Territorial Epidemiologists (CDC/CSTE) case definition.
  • Hepatitis B (chronic) and hepatitis C (chronic):
    • The Health Department often receives more than one report for each person with chronic hepatitis B or chronic hepatitis C and uses automatic de-duplication methods to identify repeat reports based on name, date of birth, and other information. Only the first report is counted in the data presented here.
    • The Health Department does not routinely investigate hepatitis B or hepatitis C reports because of the large volume; therefore: a) it is difficult to determine when people with these infections were first infected, although most were probably infected a while ago, and b) address information is missing for some patients, but most probably reside in New York City.
    • For chronic hepatitis C, the CDC/CSTE case definition for confirmed or probable cases changed in 2016. CDC’s 2016 case definition is used for people first reported with chronic hepatitis C in 2016 and onward. Earlier years use the previous case definition for past or present hepatitis C infection, available from the CDC website.
    • For chronic hepatitis B, the data represent patients who meet the CDC/CSTE case definition for confirmed or probable cases.
    • View the most recent New York City Hepatitis B and C Annual Report here.
  • Legionellosis: Please refer to the CDC/CSTE case definition.
  • Encephalitis and viral (aseptic) meningitis: Cases have been identified primarily through two methods: (1) directly, via routine provider reporting, and (2) indirectly, using laboratory submission forms for cerebrospinal fluid (CSF) specimens submitted for West Nile virus and encephalitis testing at the New York State public health laboratory. Since 1999, when West Nile virus was introduced to New York City, encephalitis and viral meningitis data have been heavily influenced by changes in West Nile virus surveillance efforts, and communities with higher levels of historical or recent West Nile virus activity have traditionally reported higher rates of encephalitis and aseptic meningitis. Because laboratory submission forms serve as a proxy and do not accurately reflect clinical presentation, their use for case identification was discontinued in 2012. Encephalitis and viral meningitis data are not considered representative of true rates of these diseases.
  • Anaplasmosis: Please refer to the CDC/CSTE case definition and the CDC website.
  • Babesiosis: Please refer to the CDC/CSTE case definition and the CDC website.
  • Ehrlichiosis: Please refer to the CDC/CSTE case definition and the CDC website.
  • Lyme disease: Please refer to the CDC/CSTE case definition and the CDC website.
  • West Nile disease: More specific West Nile disease data are available on the Health Department website.
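
The "count only the first report" rule described above for chronic hepatitis B and chronic hepatitis C can be sketched in Python as follows. This is a deliberately simplified illustration, not the Health Department's de-duplication method: it matches people on a normalized name and date of birth only, whereas the actual process also uses other information that is not described here. The function and field names are hypothetical.

```python
from datetime import date


def first_reports(reports):
    """Keep only the earliest report for each person.

    Each report is a dict with "name", "date_of_birth", and "report_date".
    People are matched on lowercased, trimmed name plus date of birth.
    """
    earliest = {}
    for report in reports:
        key = (report["name"].strip().lower(), report["date_of_birth"])
        kept = earliest.get(key)
        if kept is None or report["report_date"] < kept["report_date"]:
            earliest[key] = report
    return sorted(earliest.values(), key=lambda r: r["report_date"])


# Three hypothetical reports; the first two refer to the same person.
reports = [
    {"name": "Pat Doe", "date_of_birth": date(1970, 5, 1), "report_date": date(2017, 3, 9)},
    {"name": " pat doe", "date_of_birth": date(1970, 5, 1), "report_date": date(2016, 8, 2)},
    {"name": "Sam Roe", "date_of_birth": date(1982, 1, 15), "report_date": date(2018, 6, 30)},
]
counted = first_reports(reports)
```

Only two reports remain, and the one kept for the repeated person is the earlier 2016 report, so that person is counted in 2016 rather than 2017.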