Financial Economic Crime/Client Lifecycle Management: Optimally Aggregating Publicly Sourced Data for Client Due Diligence (CDD)

Delta Capita has extensive experience with CDD data within Client Lifecycle Management (CLM). Building on that experience and on our understanding of how our clients are tackling the topic, this blog outlines some of the key challenges and best practices.


Sam is a senior Data Business Analyst and Project Manager with experience in aggregating disparate data sources within global investment banks across CLM, Reg and Investment Bank Operations.

Sam Gaunt
Managing Consultant

Drawing on our extensive knowledge of CDD data, we group the core challenges as below:

Inconsistent identifiers across public data sources create challenges in linking and integrating data. Common identifiers such as the Bank Identification Code (BIC), Legal Entity Identifier (LEI), International Securities Identification Number (ISIN) and Companies House Company Registration Number are available, but they are not aligned with one another. Different matching keys are therefore required to connect data sources, which makes data integration and analysis complex and time-consuming. Several studies emphasise the significance of consistent identifiers across public data sources. A report by the World Economic Forum's Global Future Council on Data Policy highlights that inconsistent identifiers hinder data interoperability and limit the ability to trace and understand connections between entities.
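One common way to work around misaligned identifiers is to maintain a crosswalk that maps each external identifier to a single internal key. The sketch below is illustrative only: the `CrosswalkEntry` class, the `entity_id` key and all identifier values are hypothetical and are not taken from any real register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrosswalkEntry:
    """One row of a hypothetical identifier crosswalk for a single legal entity."""
    entity_id: str                        # assumed internal golden-record key
    lei: Optional[str] = None
    bic: Optional[str] = None
    company_number: Optional[str] = None


ID_TYPES = ("lei", "bic", "company_number")


def build_index(entries):
    """Map every (identifier type, normalised value) pair to the internal entity id."""
    index = {}
    for entry in entries:
        for id_type in ID_TYPES:
            value = getattr(entry, id_type)
            if value:
                index[(id_type, value.strip().upper())] = entry.entity_id
    return index


def resolve(index, id_type, value):
    """Return the internal entity id for an external identifier, or None if unknown."""
    return index.get((id_type, value.strip().upper()))


# Made-up identifiers, for illustration only.
crosswalk = [
    CrosswalkEntry("ENT-001", lei="EXAMPLELEI0000000001", company_number="01234567"),
    CrosswalkEntry("ENT-002", bic="EXAMGB2L", company_number="07654321"),
]
index = build_index(crosswalk)

print(resolve(index, "lei", "examplelei0000000001"))
print(resolve(index, "company_number", "07654321"))
print(resolve(index, "bic", "UNKNOWN1"))
```

With an index like this, a record keyed by LEI in one source and by company number in another resolves to the same internal entity, so downstream joins need only one key.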

Conflicting data formats pose additional challenges when joining different data sources. While a one-to-one match is possible in some cases, there are instances where a one-to-many relationship is the only alternative. This necessitates making judgment calls and employing data analysis techniques to aggregate data effectively from both public and internal sources. Research conducted by the Data Warehousing Institute (TDWI) reveals that dealing with disparate data formats is one of the top challenges faced by organisations when merging multiple data sources. A thoughtful choice and ongoing review of source/attribute combinations makes this easier, and technology such as AI has the potential to learn from the data and reduce cost over time.
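To make the one-to-one versus one-to-many distinction concrete, a minimal sketch is shown here. It assumes both sources have already been reduced to a shared join key; the function name `classify_matches`, the field names and the sample rows are all hypothetical.

```python
from collections import defaultdict


def classify_matches(left_records, right_records, key):
    """Label each left-hand record by how many right-hand records share its key."""
    by_key = defaultdict(list)
    for rec in right_records:
        by_key[rec[key]].append(rec)

    results = []
    for rec in left_records:
        candidates = by_key.get(rec[key], [])
        if not candidates:
            label = "no_match"
        elif len(candidates) == 1:
            label = "one_to_one"
        else:
            label = "one_to_many"  # needs a tie-break rule or an analyst's judgment
        results.append((rec, label, candidates))
    return results


# Made-up sample data, for illustration only.
internal_clients = [
    {"client_id": "C1", "company_number": "01234567"},
    {"client_id": "C2", "company_number": "07654321"},
    {"client_id": "C3", "company_number": "09999999"},
]
public_rows = [
    {"company_number": "01234567", "address": "1 Example Street"},
    {"company_number": "07654321", "address": "2 Sample Road"},
    {"company_number": "07654321", "address": "3 Sample Road"},
]

for client, label, candidates in classify_matches(
    internal_clients, public_rows, "company_number"
):
    print(client["client_id"], label, len(candidates))
```

Separating the three outcomes up front lets the unambiguous matches flow straight through, while only the one-to-many cases are routed to the judgment calls described above.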

Duplicate records are prevalent in public data sources, which further complicates data matching. Implementing data analysis layers before the integration of systems becomes necessary to eliminate duplicate entries and consolidate data elements, ensuring the quality of the data within internal systems. According to a survey conducted by Experian, 92% of organisations reported experiencing duplicate data-related challenges. Duplicates can result from errors during data collection, system migrations, or merging datasets from different sources.
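A deduplication layer of this kind can be sketched as follows. This is a simplified illustration rather than a production rule set: the suffix list, the choice of name plus country as the duplicate key, and the "first non-empty value wins" consolidation rule are all assumptions, and the sample records are made up.

```python
import re

# Assumed list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp"}


def normalise_name(name):
    """Lower-case, strip punctuation and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def deduplicate(records):
    """Merge records sharing a normalised name and country; first non-empty value wins."""
    merged = {}
    for rec in records:
        key = (normalise_name(rec["name"]), (rec.get("country") or "").upper())
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Made-up sample data, for illustration only.
records = [
    {"name": "Acme Widgets Ltd.", "country": "gb", "lei": ""},
    {"name": "ACME WIDGETS LIMITED", "country": "GB", "lei": "EXAMPLELEI0000000001"},
    {"name": "Other Co", "country": "GB", "lei": ""},
]

for rec in deduplicate(records):
    print(rec)
```

Here the two Acme rows collapse into one consolidated record that keeps the LEI from the second row, which is the "eliminate and consolidate" behaviour described above.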

Data quality issues arise from the absence of controls on many public websites and data sources. A McKinsey KYC Survey found that data quality issues within the top global banks contribute to up to 26 percent of their operational costs. Efforts must be made to identify the most suitable data sources for specific purposes and perform supplementary analysis to fill in any gaps using available data. The European Data Governance Act and similar regulatory initiatives aim to improve data quality, establish common standards, and facilitate data sharing across sectors. Advancements in data integration technologies, such as data virtualisation and automated data cleansing, are aiding organisations in overcoming these hurdles. Maintaining a well-documented view of which sources contain which trusted attributes (often split by client jurisdiction, and client entity type or segment) is key to reducing the deduplication and quality improvement workload.
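The documented view of trusted sources can be held as a simple lookup, as in the sketch below. The source names, the ranking, and the `TRUSTED_SOURCES` and `pick_value` names are hypothetical; in practice the ranking would come from the firm's own data governance decisions.

```python
# Assumed ranking of sources per (jurisdiction, entity type, attribute), most trusted first.
TRUSTED_SOURCES = {
    ("GB", "corporate", "registered_address"): ["companies_house", "lei_register"],
    ("GB", "corporate", "legal_name"): ["lei_register", "companies_house"],
}


def pick_value(jurisdiction, entity_type, attribute, values_by_source):
    """Return (value, source) from the most trusted source that has a value."""
    ranking = TRUSTED_SOURCES.get((jurisdiction, entity_type, attribute), [])
    for source in ranking:
        value = values_by_source.get(source)
        if value:
            return value, source
    return None, None  # no trusted source documented, or none populated


# Made-up sample values, for illustration only.
addresses = {
    "companies_house": "1 Example Street",
    "lei_register": "1 Example St",
}
print(pick_value("GB", "corporate", "registered_address", addresses))
print(pick_value("GB", "corporate", "registered_address", {"lei_register": "1 Example St"}))
print(pick_value("FR", "corporate", "registered_address", addresses))
```

Keeping the ranking in data rather than in code means a reviewed change to a source/attribute combination does not require touching the matching logic.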

Client Engagement Use Case: FinCrime Automation

Last year, Delta Capita were engaged to project manage and automate sanction screening for new commercial clients. The objective was to automate the validation of new clients for commercial car loans, reducing cost and onboarding time whilst increasing transparency and accuracy. The optimised process is fully automated and runs in minutes, compared with the previous manual process, which took up to half a day.

A Delta Capita team, comprising individuals with expertise in data analytics, process optimisation, and project management, was engaged to manage and deliver against the audit findings over a 3-month timeline. The approach was as follows:

  • Documented total scope of data for remediation
  • Established API connectivity using Python to automatically extract data from multiple internal and external sources
  • Data combined from multiple sources using fuzzy matching on shareholder and director names, location, and date of birth to identify the required persons for sanction screening
  • Filters and calculations applied in accordance with business and regulatory requirements, then individuals matched to external sanction lists for screening
  • Automated MI generated accurately and with greater insights than before
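The fuzzy matching and screening steps can be illustrated with a minimal sketch. This is not the implementation delivered on the engagement: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever matching technique was actually used, and the `screen` function, the 0.85 threshold, the field names and the sample people are all assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def screen(persons, sanctions_list, threshold=0.85):
    """Return (person, sanctions entry, score) where dates of birth agree and names are close."""
    hits = []
    for person in persons:
        for entry in sanctions_list:
            # Exact date-of-birth agreement is required before names are compared.
            if person["date_of_birth"] != entry["date_of_birth"]:
                continue
            score = name_similarity(person["name"], entry["name"])
            if score >= threshold:
                hits.append((person, entry, round(score, 2)))
    return hits


# Made-up sample data, for illustration only.
directors = [
    {"name": "Jon A. Smith", "date_of_birth": "1970-01-01"},
    {"name": "Maria Lopez", "date_of_birth": "1985-05-05"},
]
sanctions_list = [
    {"name": "John A Smith", "date_of_birth": "1970-01-01"},
    {"name": "Mario Lopes", "date_of_birth": "1990-09-09"},
]

for person, entry, score in screen(directors, sanctions_list):
    print(person["name"], "->", entry["name"], score)
```

Requiring the date of birth to agree before comparing names keeps the number of fuzzy comparisons down and reduces false positives. A real screening process would also need to handle missing or partial dates of birth, and the threshold would need tuning against known cases.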

This risk reduction opened the opportunity for Delta Capita to expand automation further downstream and across other product types.

Following the successful go-live of the new commercial client sanction screening process, the client realised these benefits:

  • Created transparency of controls in place to meet regulatory requirements
  • New client validation performed in minutes with improved accuracy, rather than up to half a day of FinCrime analyst time with risk of manual errors
  • 0.5 FTE reduction by converting manual data validation into Python
  • Automated MI built using Python, reducing reporting time and improving insights

Learn more about how Delta Capita can help you.
If you are interested in learning more about how Delta Capita can support your FEC or CLM delivery and transformation, including assessment of Target Operating Models, please get in touch today.