
moj-analytical-services / uk_address_matcher

51 stars
4 forks
68 issues
Python · Shell

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing moj-analytical-services/uk_address_matcher in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/moj-analytical-services/uk_address_matcher)

Repository Overview (README excerpt)


# High performance UK address matcher (geocoder)

Fast, simple address matching (geocoding) in Python. For full documentation, see our main documentation site.

## Why use this library

- **Simple.** Set up in seconds, runs on a laptop. No separate infrastructure or services needed.
- **Fast.** Match 100,000 addresses in ~30 seconds.
- **Proven accuracy.** We use public, labelled datasets to measure and document accuracy.
- **Support for Ordnance Survey data.** We provide an automated build pipeline for users wishing to match to Ordnance Survey data. Matching to any other canonical dataset is also supported.

The end-to-end process of matching 100,000 addresses to Ordnance Survey data, including all software downloads and data processing, takes:

- Less than a minute if you are matching to a small area such as a local council region.
- If matching to the whole UK, a one-time preprocessing step of around 10 minutes; subsequent matching of 100k records takes less than a minute.

## Installation

## What does it do?

Given the following data:

- a "messy" dataset of addresses that you want to match
- a "canonical" dataset of known addresses, often an Ordnance Survey dataset such as AddressBase or NGD,

this package will find the best matching canonical address for each messy address.

Your address files need, at minimum, two columns: `unique_id` and `address_concat`. `postcode` is optional but recommended; if not provided, an attempt is made to parse postcodes out of the address text.

Messy data:

| unique_id | address_concat | postcode |
|-----------|----------------|----------|
| m_1 | Flat A Example Court, 10 Demo Road, Townton | AB1 2BC |
| ...more rows | | |

Canonical data:

| unique_id | address_concat | postcode |
|-----------|----------------|----------|
| c_1 | Flat A, 10 Demo Road, Townton | AB1 2BC |
| c_2 | Flat B, 10 Demo Road, Townton | AB1 2BC |
| c_3 | Basement Flat, 10 Demo Road, Townton | AB1 2BC |
| ...more rows | | |

You can then match the messy data against the canonical data. Example output:

| unique_id | resolved_canonical_id | original_address_concat | original_address_concat_canonical | match_reason | match_weight | distinguishability |
|-----------|-----------------------|-------------------------|-----------------------------------|--------------|--------------|--------------------|
| m_1 | c_1 | Flat A Example Court, 10 Demo Road, Townton | Flat A, 10 Demo Road, Townton | splink: probabilistic match | 13.5885 | 11.5033 |

## Development

The scripts and tests will run better if you create `.vscode/settings.json` with the following:
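The matching code itself is omitted from this excerpt. As a rough illustration of the task only (a naive token-overlap matcher, not the package's API or its splink-based probabilistic algorithm), here is how the example rows above could be matched:

```python
def best_match(messy_address, canonical_rows):
    """Pick the canonical row whose tokens overlap most with the messy address.

    Illustration only: Jaccard similarity over whitespace tokens, NOT the
    library's probabilistic (splink-based) matching.
    """
    def tokens(s):
        return set(s.lower().replace(",", " ").split())

    m = tokens(messy_address)
    # Score each canonical address by Jaccard similarity and keep the best.
    return max(
        canonical_rows,
        key=lambda row: len(m & tokens(row[1])) / len(m | tokens(row[1])),
    )

canonical = [
    ("c_1", "Flat A, 10 Demo Road, Townton"),
    ("c_2", "Flat B, 10 Demo Road, Townton"),
    ("c_3", "Basement Flat, 10 Demo Road, Townton"),
]

result = best_match("Flat A Example Court, 10 Demo Road, Townton", canonical)
print(result)  # → ('c_1', 'Flat A, 10 Demo Road, Townton')
```

Even this naive scorer resolves the messy address to c_1, matching the example output; the real library additionally handles term frequencies, partial matches, and scale.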