Impose Data Reliability Threshold for Sales Data #13

Open
@tiernanmartin

Description

After reading through PolicyMap's documentation on their home sales dataset, I think we should impose a minimum threshold that excludes census tracts with very few recorded sales when calculating summary statistics like median sale price.

I think that census tracts with fewer than ten (10) sales in a year should be suppressed from the dataset.

Clarification: tracts with fewer than ten sales per year will be excluded from the calculation of summary metrics like median sale price; however, they will be included when calculating metrics like sale rate.
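A minimal sketch of how that rule could be applied, assuming sales arrive as `(tract_id, year, price)` records and that sale rate means sales divided by housing units in the tract. The function name `summarize_sales`, the `housing_units` lookup, the sale-rate definition, and the sample numbers are all hypothetical and only there for illustration; they are not taken from this repo or from PolicyMap.

```python
from collections import defaultdict
from statistics import median

MIN_SALES = 10  # threshold proposed in this issue


def summarize_sales(sales, housing_units, min_sales=MIN_SALES):
    """Summarize sales per (tract, year).

    sales: iterable of (tract_id, year, price) tuples (assumed layout).
    housing_units: dict of tract_id -> housing unit count (assumed denominator).
    """
    prices = defaultdict(list)
    for tract_id, year, price in sales:
        prices[(tract_id, year)].append(price)

    summary = {}
    for (tract_id, year), tract_prices in prices.items():
        n_sales = len(tract_prices)
        summary[(tract_id, year)] = {
            "n_sales": n_sales,
            # Suppress the median when the tract is below the threshold.
            "median_sale_price": median(tract_prices) if n_sales >= min_sales else None,
            # Sale rate is still computed for every tract.
            "sale_rate": n_sales / housing_units[tract_id],
        }
    return summary


# Made-up example: tract "A" has ten sales, tract "B" has three.
sales = [("A", 2016, 300_000 + 10_000 * i) for i in range(10)]
sales += [("B", 2016, 250_000), ("B", 2016, 400_000), ("B", 2016, 1_200_000)]
housing_units = {"A": 1000, "B": 500}

result = summarize_sales(sales, housing_units)
```

Here tract "A" keeps its median sale price, while tract "B" gets `None` for the median but still reports a sale rate. Returning `None` instead of dropping the row keeps low-volume tracts available for the metrics that should still include them.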

Text from PolicyMap's Zillow data documentation

To ensure that only market based residential transactions were included, PolicyMap used the subset of residential sales that were at-arms-length transactions, over $5,000 in value, and did not involve vacant or unimproved land. Transactions that involved auctions, foreclosures, Real Estate Owned (REO) property sales, sheriff sales, or Planned Unit Developments (PUDs) were also excluded. Partial sales, property improvements, and transfers across multiple properties (also called bulk sales) were likewise removed. PolicyMap suppresses indicators with fewer than five sales in a given time period and geography as “insufficient data.” Counties with “limited data availability” have greater than 75% of sales with no reported sales price or a zero dollar sales price. Indicators may be unreliable in these areas.
