Tuesday, August 15, 2023

Historical Data Migration

Migrating historical data during the acquisition of an investment management business involves careful planning, coordination, and compliance with regulatory requirements. Below is a list of items, documents, and migration steps that the seller needs to provide to the buyer to ensure a smooth historical data migration process while maintaining regulatory compliance:

1. Data Inventory and Documentation:

Detailed inventory of all historical data, including transaction records, client information, investment portfolios, performance metrics, risk assessments, compliance records, and more.

Data lineage documentation showing the flow of data across systems and processes.

Data dictionaries explaining data fields, definitions, and formats.

2. Data Sources and Systems:

Identification of all data sources, including databases, spreadsheets, applications, and third-party data providers.

Information about data storage, formats, and data retention policies.

Documentation of data integration processes and data transformation procedures.

3. Data Mapping and Transformation:

Detailed data mapping documents illustrating how data from various sources will be transformed and integrated into the buyer's systems.

Transformation rules and logic used to convert data formats and values, ensuring consistency and accuracy.
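As an illustration only, a mapping document of this kind can be expressed as a small table of source field, target field, and transformation rule. The sketch below is a minimal example under stated assumptions: the field names (acct_no, trade_dt, mkt_val), the date format, and the helper transform_record are all hypothetical and do not come from any actual seller or buyer system.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical mapping: source field -> (target field, transformation rule).
# Field names and formats are assumptions made for illustration.
FIELD_MAP = {
    "acct_no": ("account_id", lambda v: v.strip().upper()),
    "trade_dt": ("trade_date",
                 lambda v: datetime.strptime(v, "%m/%d/%Y").date().isoformat()),
    "mkt_val": ("market_value", lambda v: Decimal(v.replace(",", ""))),
}


def transform_record(source: dict) -> dict:
    """Apply every mapping rule to one source record and return the target record."""
    target = {}
    for src_field, (tgt_field, rule) in FIELD_MAP.items():
        if src_field not in source:
            # Fail loudly so unmapped or missing data is never migrated silently.
            raise KeyError(f"missing source field: {src_field}")
        target[tgt_field] = rule(source[src_field])
    return target


legacy_row = {"acct_no": " ab-1001 ", "trade_dt": "08/15/2023", "mkt_val": "1,250,000.50"}
migrated_row = transform_record(legacy_row)
```

Keeping the rules in one table means the same structure can serve as both the mapping document and the executable transformation, which makes it easier to review with compliance teams.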

4. Regulatory Compliance:

Documentation of compliance with relevant regulatory requirements, such as GDPR, SEC regulations, FINRA rules, and any other applicable industry standards.

Records of past regulatory audits, findings, and corrective actions.

5. Contracts and Agreements:

Copies of contracts, agreements, and legal documents related to clients, vendors, partners, and service providers.

Documentation of any outstanding legal disputes, lawsuits, or regulatory investigations.

6. Client Information:

Comprehensive client profiles, including personal details, investment preferences, risk tolerance, and transaction histories.

Consent forms and agreements related to data sharing and usage.

7. Investment Portfolios:

Detailed records of investment holdings, positions, trades, and historical performance data.

Documentation of investment strategies, asset allocations, and risk assessments.

8. Performance Reports:

Historical performance reports for individual clients, investment funds, and portfolios.

Calculation methodologies for performance metrics such as returns, volatility, and risk-adjusted measures.

9. IT Infrastructure:

Information about the technology stack, hardware, software, and networking components used to manage and store data.

Details about data security measures, access controls, encryption, and backups.

10. Data Quality and Accuracy:

Processes and procedures for data validation, cleansing, and quality assurance.

Documentation of any data anomalies, inconsistencies, or data integrity issues.
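A validation procedure of the kind described above might start with simple completeness and uniqueness checks. The following is a rough sketch, not a complete quality framework; the function validate_records and the sample field names (trade_id, account_id, quantity) are assumptions made for illustration.

```python
def validate_records(records, required_fields, key_field):
    """Return (record_index, issue) tuples for missing values and duplicate keys."""
    issues = []
    seen_keys = set()
    for i, rec in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        for field in required_fields:
            if rec.get(field) in (None, ""):
                issues.append((i, f"missing {field}"))
        # Uniqueness: the key field must not repeat across records.
        key = rec.get(key_field)
        if key is not None:
            if key in seen_keys:
                issues.append((i, f"duplicate {key_field} {key}"))
            seen_keys.add(key)
    return issues


# Hypothetical sample data.
sample = [
    {"trade_id": "T1", "account_id": "A1", "quantity": 100},
    {"trade_id": "T2", "account_id": "", "quantity": 50},
    {"trade_id": "T1", "account_id": "A2", "quantity": 10},
]
found = validate_records(sample, ["trade_id", "account_id", "quantity"], "trade_id")
```

The returned list can double as the documentation of anomalies that the seller hands over with the data.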

11. Data Migration Plan:

Comprehensive data migration plan outlining the sequence of migration tasks, timelines, responsibilities, and dependencies.

Contingency plans to address potential migration challenges or disruptions.

12. Training and Knowledge Transfer:

Training materials and documentation for the buyer's team to understand the acquired systems, data, and processes.

Transition plans to ensure a smooth handover of knowledge from the seller's team to the buyer's team.

13. Data Retention and Destruction:

Policies and procedures for retaining and eventually destroying historical data in compliance with regulatory guidelines and industry standards.

14. Legal and Regulatory Approvals:

Documentation of any necessary approvals from regulatory bodies or legal authorities for the data migration and acquisition process.

15. Post-Migration Support:

Agreement on post-migration support from the seller's team to address any issues or questions that arise after the data migration.

It's important to note that this list is not exhaustive and may need to be customized based on the specific circumstances and regulatory landscape of the acquisition. Both the seller and the buyer should work closely with legal, compliance, IT, and data management teams to ensure a successful and compliant historical data migration process.

Wednesday, May 24, 2023

Business Requirements for Reconciliation of Data Between Three Different Systems

Purpose:

The purpose of this business requirement is to establish a reliable and efficient data reconciliation process between three different systems within the organization. The reconciliation process aims to identify and resolve any discrepancies, inconsistencies, or errors in the data exchanged between these systems, ensuring data integrity and accuracy.

Scope:

The reconciliation process will involve three systems: System A, System B, and System C. These systems may have overlapping functionalities or handle different aspects of the organization's operations. The reconciliation will focus on key data elements that are shared or transferred between these systems.

Data Elements:

Identify the specific data elements that need to be reconciled between the three systems. This may include customer information, financial transactions, inventory records, employee data, or any other relevant data that flows across these systems. The reconciliation process should ensure that these data elements remain consistent and synchronized across all systems.

Reconciliation Rules:

Define the rules and criteria for reconciling the data. This includes specifying the conditions under which the reconciliation should occur and the tolerance levels for discrepancies. The rules should consider factors such as data formats, data types, unique identifiers, time stamps, and any specific business logic that determines data consistency.
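To make the idea of a tolerance level concrete, here is a minimal sketch that compares one numeric value held in all three systems and reports the pairs that differ by more than an agreed absolute tolerance. The function name breaks, the tolerance of 0.01, and the sample balances are assumptions for illustration.

```python
from decimal import Decimal
from itertools import combinations


def breaks(values: dict, abs_tol: Decimal) -> list:
    """Return the system pairs whose values differ by more than abs_tol."""
    return [
        (s1, s2)
        for (s1, v1), (s2, v2) in combinations(sorted(values.items()), 2)
        if abs(v1 - v2) > abs_tol
    ]


# Hypothetical balances for the same account in Systems A, B, and C.
balances = {"A": Decimal("100.00"), "B": Decimal("100.004"), "C": Decimal("100.50")}
result = breaks(balances, Decimal("0.01"))
```

Here A and B agree within tolerance, while C breaks against both. Decimal is used rather than float so that small monetary differences are not distorted by binary rounding.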

Frequency:

Determine the frequency at which data reconciliation should take place. This may vary depending on the criticality and volatility of the data being reconciled. Consider factors such as transaction volume, data update frequency, and business requirements for timely decision-making.

Error Handling and Exception Management:

Define the process for handling reconciliation errors and exceptions. Establish protocols for identifying, logging, and resolving reconciliation discrepancies. Specify the roles and responsibilities of individuals or teams responsible for investigating and rectifying data inconsistencies. Additionally, outline escalation procedures for unresolved discrepancies and timeframes for resolution.

Reporting and Metrics:

Specify the reporting requirements for the reconciliation process. Determine the key performance indicators (KPIs) that will be used to measure the effectiveness and efficiency of the reconciliation efforts. This may include metrics such as reconciliation accuracy rate, error resolution time, exception frequency, and overall data quality improvements.
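The KPIs mentioned above can be computed from a simple exception log. This sketch assumes, for simplicity, at most one exception per record and an exception structure with opened and resolved timestamps; the function reconciliation_kpis and the sample figures are hypothetical.

```python
from datetime import datetime
from statistics import mean


def reconciliation_kpis(total_records: int, exceptions: list) -> dict:
    """Compute accuracy rate, open exception count, and mean resolution time.

    Each exception is a dict with an 'opened' datetime and an optional
    'resolved' datetime. Assumes at most one exception per record.
    """
    resolved = [e for e in exceptions if e.get("resolved") is not None]
    hours = [(e["resolved"] - e["opened"]).total_seconds() / 3600 for e in resolved]
    return {
        "accuracy_rate": (
            (total_records - len(exceptions)) / total_records if total_records else None
        ),
        "open_exceptions": len(exceptions) - len(resolved),
        "mean_resolution_hours": mean(hours) if hours else None,
    }


# Hypothetical exception log for one reconciliation run of 1,000 records.
exceptions = [
    {"opened": datetime(2023, 5, 24, 9, 0), "resolved": datetime(2023, 5, 24, 13, 0)},
    {"opened": datetime(2023, 5, 24, 10, 0), "resolved": datetime(2023, 5, 24, 12, 0)},
    {"opened": datetime(2023, 5, 24, 11, 0)},
]
kpis = reconciliation_kpis(1000, exceptions)
```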

Security and Privacy:

Consider security and privacy requirements while reconciling data across systems. Ensure compliance with relevant data protection regulations and implement measures to safeguard sensitive or confidential information during the reconciliation process. Define access controls, encryption protocols, and audit trails to maintain data integrity and protect against unauthorized access or data breaches.

Integration and Data Exchange:

Outline the integration mechanisms or data exchange protocols between the three systems. Identify any existing APIs, data interfaces, or middleware that facilitate data transfer between the systems. Specify the data formats, protocols, and data transformation requirements needed to ensure seamless reconciliation and data synchronization.

Documentation and Training:

Develop comprehensive documentation that captures the reconciliation process, including reconciliation rules, error handling procedures, and reporting mechanisms. Provide training and awareness programs for personnel involved in the reconciliation process to ensure they understand the requirements, procedures, and tools necessary to carry out their responsibilities effectively.

Future Scalability:

Consider future scalability requirements when designing the reconciliation process. Anticipate potential changes in data volumes, system upgrades, or the addition of new systems in the future. Ensure that the reconciliation process can accommodate these changes without significant modifications, minimizing disruption to operations.

Compliance and Audit:

Ensure that the data reconciliation process complies with regulatory and audit requirements relevant to the organization's industry. Implement audit trails and logging mechanisms to track and monitor the reconciliation activities, facilitating compliance audits and internal control assessments.

Stakeholder Engagement:

Engage relevant stakeholders from each system to gather input, validate requirements, and ensure alignment with their needs. Seek feedback from system users, data owners, IT teams, and management to refine the reconciliation process and address any concerns or requirements specific to each system.

Change Management:

Develop a change management plan to effectively communicate and implement the reconciliation process across the organization. Provide training and support to users affected by the changes, and establish a feedback mechanism to capture suggestions and address any implementation challenges.

By addressing these business requirements, the organization can establish a robust and reliable data reconciliation process, ensuring accurate and consistent data across its systems and enabling informed decision-making based on trustworthy information.


Title: Business Requirements for Reconciliation of Data Between Three Different Systems

Introduction:

The purpose of this document is to outline the business requirements for reconciling data between three distinct systems within the organization. The systems involved are System A, System B, and System C. The reconciliation process aims to ensure data accuracy, consistency, and integrity across all systems, enabling efficient decision-making and reliable reporting. The primary objectives are to minimize data discrepancies, streamline operations, and enhance overall data quality.

Scope:

The reconciliation process will cover the following aspects:

a. Data Elements: All relevant data elements present in the three systems, including customer information, financial transactions, inventory data, and other critical business data.

b. Frequency: Define the frequency at which data reconciliation will occur (e.g., daily, weekly, monthly, etc.).

c. Validation Rules: Establish specific rules and criteria for data validation and reconciliation to identify discrepancies or inconsistencies.

d. Reporting: Generate reconciliation reports that highlight variances and provide an overview of data accuracy across systems.


Business Requirements:

3.1. Data Mapping and Comparison:

a. Identify common data elements among the three systems and establish a mapping process to ensure consistent interpretation and comparison.

b. Determine the reconciliation key fields that will be used to match records across systems accurately.

c. Define the comparison rules and algorithms to identify discrepancies between the systems, including rules for handling missing or incomplete data.
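A minimal sketch of these three requirements, assuming each system can export its records as a list of dictionaries: records are matched on a key field, missing records are flagged per system, and the mapped fields are compared for equality. The function reconcile, the key field id, and the sample data are hypothetical.

```python
def reconcile(systems: dict, key_field: str, compare_fields: list) -> dict:
    """Match records across systems on key_field and report discrepancies.

    systems maps a system name to a list of record dicts. The result maps
    each problematic key to a list of issue descriptions.
    """
    indexed = {name: {r[key_field]: r for r in recs} for name, recs in systems.items()}
    all_keys = set().union(*(idx.keys() for idx in indexed.values()))
    report = {}
    for key in sorted(all_keys):
        issues = []
        present = {name: idx[key] for name, idx in indexed.items() if key in idx}
        # Rule for missing data: flag every system that lacks the record.
        for name in indexed:
            if name not in present:
                issues.append(f"missing in {name}")
        # Comparison rule: the field must be identical wherever the record exists.
        for field in compare_fields:
            if len({rec.get(field) for rec in present.values()}) > 1:
                issues.append(f"{field} mismatch")
        if issues:
            report[key] = issues
    return report


systems = {
    "A": [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}],
    "B": [{"id": 1, "amount": 100}, {"id": 2, "amount": 205}],
    "C": [{"id": 1, "amount": 100}],
}
report = reconcile(systems, "id", ["amount"])
```

Exact equality is the simplest comparison rule; in practice it would likely be replaced per field by whatever tolerance or normalization the business agrees on.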

3.2. Data Reconciliation Process:

a. Define the sequence and steps involved in the data reconciliation process, considering dependencies, system availability, and resource constraints.

b. Ensure that the reconciliation process is automated to the greatest extent possible, minimizing manual interventions and potential errors.

c. Specify the order of system reconciliation, ensuring logical flow and consistency between the systems.

3.3. Exception Handling and Error Resolution:

a. Establish procedures to handle data discrepancies and exceptions identified during the reconciliation process.

b. Define responsibility and accountability for resolving reconciliation errors, including escalation processes when necessary.

c. Implement mechanisms to track and document reconciliation errors, their resolutions, and any associated impact on business operations.
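One possible shape for such a tracking mechanism is sketched below: each exception records its owner, timestamps, and resolution note, and an open exception is escalated once it exceeds an agreed timeframe. The class ReconException, the function needs_escalation, and the 24-hour timeframe are assumptions for illustration, not prescribed values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReconException:
    """One logged reconciliation discrepancy and its resolution history."""
    record_key: str
    description: str
    owner: str
    opened: datetime
    resolved: Optional[datetime] = None
    resolution_note: str = ""


def needs_escalation(exc: ReconException, now: datetime, sla: timedelta) -> bool:
    """An unresolved exception is escalated once it is older than the agreed timeframe."""
    return exc.resolved is None and now - exc.opened > sla


now = datetime(2023, 5, 24, 17, 0)
log = [
    ReconException("T-1", "amount mismatch A vs B", "ops", datetime(2023, 5, 22, 9, 0)),
    ReconException("T-2", "missing in C", "ops", datetime(2023, 5, 24, 9, 0)),
]
# Hypothetical 24-hour resolution timeframe.
to_escalate = [e.record_key for e in log if needs_escalation(e, now, timedelta(hours=24))]
```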

3.4. Reconciliation Reporting:

a. Design reconciliation reports that provide a clear and concise overview of data accuracy and discrepancies between systems.

b. Include summary metrics, such as the number of reconciled records, discrepancies found, and reconciliation success rate.

c. Specify the frequency, distribution, and recipients of reconciliation reports to ensure timely availability of information for decision-making.


3.5. Data Security and Confidentiality:

a. Ensure that data confidentiality and integrity are maintained throughout the reconciliation process, adhering to relevant organizational policies and regulatory requirements.

b. Implement appropriate access controls and encryption mechanisms to protect sensitive data during the reconciliation activities.


Constraints and Dependencies:

a. Identify any constraints, limitations, or dependencies that may impact the data reconciliation process, such as system downtime, data availability, or external factors.

b. Consider the impact of system upgrades, patches, or migrations on the reconciliation process and plan accordingly.


Governance and Compliance:

a. Ensure compliance with applicable regulatory and legal requirements related to data reconciliation, data privacy, and information security.

b. Establish a governance framework with clearly defined roles, responsibilities, and accountability for data reconciliation activities.

c. Conduct periodic audits and reviews to assess the effectiveness and compliance of the data reconciliation process.


Conclusion:

The business requirements for the reconciliation of data between System A, System B, and System C provide a comprehensive framework for ensuring data accuracy, consistency, and integrity across the organization. By implementing these requirements, the organization aims to minimize discrepancies, streamline operations, and enhance overall data quality, leading to improved decision-making and reliable reporting.

Sunday, May 21, 2023

 Data Lineage

  • https://a-teaminsight.com/wp-content/uploads/2019/03/A-Team-Group_Data-Lineage-Handbook-2019-1.pdf
  • https://neo4j.com/blog/internal-risk-models-frtb-data-lineage/ 
  • https://www.marklogic.com
  • http://www.datamanagementinsight.com
  • https://www.mckinsey.com/~/media/mckinsey/business%20functions/risk/our%20insights/frtb%20reloaded%20the%20need%20for%20a%20fundamental%20revamp%20of%20trading%20risk%20infrastructure/frtb-the-need-for-a-fundamental-revamp-of-trading-risk-infrastructure-web-final.ashx
  • https://getmanta.com
  • https://www.ticksmith.com/use-case-ticksmith-data-pooling-platform-empowers-banks-to-pool-data-for-frtb
  • https://datacrossroads.nl/2019/03/10/data-lineage-101/
  • https://datacrossroads.nl/2019/03/13/data-lineage-102/
  • https://datacrossroads.nl/2019/03/17/data-lineage-103/
  • https://datacrossroads.nl/2019/03/20/data-lineage-104/

  • https://productresources.collibra.com/docs/collibra/latest/Content/CollibraDataLineage/TechnicalLineage/ref_technical-lineage-viewer.htm


Data lineage traces data from source to destination, noting every move the data makes and taking into account any changes to the data during its journey. 

Data lineage is the process of understanding, recording, and visualizing data as it flows from data sources to consumption. This includes all transformations the data underwent along the way—how the data was transformed, what changed, and why.

Initially implemented without specific regulatory requirements to track data across individual data management projects, data lineage rose to prominence following the implementation of BCBS 239 in January 2016, a Basel Committee on Banking Supervision (BCBS) rule designed to improve data aggregation and reporting across financial markets, as well as accountability for data. This required improvements in data governance and data lineage that have since been reinforced by other regulations and financial institutions’ recognition of the importance of accurate, complete and sustainable data lineage.


What is data lineage?

Data lineage covers the lifecycle of data, from its origins, through what happens to the data when it is processed by different systems, and where it moves from and to over time. It can be applied to most types of data and systems, and is particularly valuable in complex, high volume data environments. It is also a key element of data governance, providing an understanding of where data comes from, how systems process the data, how it is used and by whom. It also plays well into improving data quality.

Scope of data lineage implementation is often determined by regulatory requirements, enterprise data management strategy, data impact and critical data elements.

The use of data lineage for regulatory compliance is slightly different depending on the specific regulatory data requirement, but the overall theme is the same.

Few firms can claim complete and entirely successful data lineage, but most have developed a regulatory response that is beginning to yield operational and business benefits.

Data lineage is sometimes referred to as technical lineage, which represents the flow of physical data through underlying applications, services and data stores, or business lineage, which requires the same underlying technicalities but is perceived as a driver of business intelligence and better business decisions.

By building a picture of how data flows through an organization and is transformed from source to destination, it is possible to create complete audit trails of data points, an aspect of lineage that has become increasingly necessary to meeting regulatory requirements and ensuring data integrity for the business.

While data lineage helps to track data and identify different processes involved in the data flow and their dependencies, metadata management – the management of data that describes data – is key to capturing enterprise data flow and presenting data lineage. Data lineage solutions based on metadata collect and integrate consistent end-to-end metadata throughout an organization, and create a metadata repository that is accessible and makes complete data lineage information available to different user groups.
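As a rough illustration of what a metadata repository makes possible, lineage can be stored as a graph in which each dataset points to the datasets it was derived from, and an audit trail is produced by walking that graph back to its origins. The dataset names, the LINEAGE dictionary, and the function trace_to_sources below are hypothetical; real lineage tools store far richer metadata.

```python
from collections import deque

# Hypothetical lineage metadata: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "risk_report": ["risk_aggregates"],
    "risk_aggregates": ["positions_clean", "market_prices"],
    "positions_clean": ["positions_raw"],
    "positions_raw": [],
    "market_prices": [],
}


def trace_to_sources(node: str, lineage: dict) -> list:
    """Return every upstream dataset of node, nearest first (breadth-first walk)."""
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


upstream = trace_to_sources("risk_report", LINEAGE)
# Datasets with no parents are the original sources.
origins = [n for n in upstream if not LINEAGE.get(n)]
```

Reversing the edges of the same graph would give the downstream view used for impact assessment.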

Data lineage is usually represented visually to show the movement of data from source to destination, changes to the data and how it is transformed by processes or users as it moves from one system to another across an enterprise, and how it splits or converges after each move. Visualization can demonstrate data lineage at different levels of granularity, perhaps at a high level providing data lineage that shows what systems data interacts with before it reaches destination. As the granularity increases, it is possible to provide detail around the particular data such as its attributes and the quality of the data at specific points in the data lineage.

The scope of lineage implementation is often determined by regulatory requirements, enterprise data management strategy, data impact and critical data elements of an organization. It is not necessary to boil the ocean, but instead identify regulatory requirements for data lineage and business areas to which its application is beneficial.

In many financial firms, users of data lineage include business managers and analysts, compliance professionals, strategy developers, data governance teams, data modelers, and IT management, development and support.

Importance of data lineage

Data lineage is critical to both regulatory compliance and business opportunity. From a regulatory perspective, compliance has been tightened up considerably since the 2008 financial crisis, with subsequent regulations designed to avoid a repeat of similar circumstances. Rather than merely producing reports for compliance, these regulations – including BCBS 239, Markets in Financial Instruments Directive II (MiFID II), General Data Protection Regulation (GDPR), Fundamental Review of the Trading Book (FRTB) and the Comprehensive Capital Analysis and Review (CCAR) – now require firms to implement data lineage to demonstrate exactly how they came to the results published in reports. Using data lineage, firms can not only prove the accuracy of results, but also take a proactive approach to identifying and fixing any gaps in required data.

Complete data lineage can also reduce the burden of regulation by providing operational transparency and reducing risk and costs. Its metadata can help firms consolidate regulatory reporting by identifying data that is used across numerous regulations and move towards processing the data once for multiple purposes. Similarly, metadata for data lineage can ease the burden and cost of implementing new regulations.
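The idea of identifying data that feeds several regulatory reports can be sketched with a simple inversion of lineage metadata. The assignment of data elements to regulations below is entirely hypothetical and chosen only to show the mechanics; it is not a statement of what each regulation actually requires. The names REPORT_INPUTS and shared_elements are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical metadata: data elements consumed by each regulatory report.
# These element lists are illustrative assumptions, not regulatory requirements.
REPORT_INPUTS = {
    "BCBS 239": {"counterparty_id", "exposure", "trade_id"},
    "MiFID II": {"trade_id", "instrument_id", "counterparty_id"},
    "FRTB": {"trade_id", "instrument_id", "price_observation"},
}


def shared_elements(report_inputs: dict, min_reports: int = 2) -> dict:
    """Map each data element used by at least min_reports reports to those reports."""
    usage = defaultdict(set)
    for report, elements in report_inputs.items():
        for element in elements:
            usage[element].add(report)
    return {e: sorted(r) for e, r in usage.items() if len(r) >= min_reports}


shared = shared_elements(REPORT_INPUTS)
```

Elements that appear in the result are candidates for being sourced and processed once, then reused across reports.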

From a business perspective, and at a base level, data lineage helps firms stay on the right side of regulators and avoid the penalties of non-compliance. Equally important, it helps firms gain an understanding of their data and the impact on data of any changes to strategy, systems and processes. With an understanding of data, firms can gain the benefits of data lineage beyond compliance, including the ability to spot new business opportunities, make better decisions, increase efficiency and reduce costs.

Regulatory drivers

Regulations driving financial institutions to implement data lineage include those noted above and detailed here. The use of data lineage in each case is slightly different depending on the specific regulatory data requirement, but the overall theme is the same: to be able to demonstrate where data originated, trace its journey through an organization, and prove how it has been changed along the way.

BCBS 239

Basel Committee on Banking Supervision Rule 239 (BCBS 239) came into force on January 1, 2016 and is designed to improve risk data aggregation and reporting. It is based on 14 principles that underpin accurate risk aggregation and reporting in normal times and times of crisis. To achieve compliance, banks must capture risk data across the organization, establish consistent data taxonomies, and store data in a way that makes it easily accessible and straightforward to understand.

Data lineage requirement: Data lineage must be implemented to support risk aggregation, data accuracy and reporting. Conversely, it must also ensure that risk data can be traced back to its origin and that risk reports can be defended.

MiFID II

Markets in Financial Instruments Directive II (MiFID II) is a principles-based directive issued by the EU. It took effect on January 3, 2018, and aims to increase transparency across Europe’s financial markets and ensure investor protection. The demand for reference and market data for both pre- and post-trade transparency, including trade reporting and transaction reporting, is unprecedented, leading to data management challenges including sourcing required data, reporting in near real-time, and uploading reference and market data to MiFID II mechanisms including Approved Publication Arrangements (APAs) and Approved Reporting Mechanisms (ARMs).

Data lineage requirement: MiFID II operations can benefit from data lineage in a number of ways. Lineage can be used to identify any gaps in trade reporting data, and any similarities across numerous regulatory reporting obligations. It can also be used to map MiFID II reporting data from source systems to APAs and ARMs and vice versa.


GDPR

General Data Protection Regulation (GDPR) is an EU data privacy regulation that came into force on May 25, 2018. It is designed to harmonize data privacy laws across Europe and protect EU citizens’ data privacy. The requirements of GDPR include gaining explicit consent to process personal data, giving data subjects access to their personal data, ensuring data portability, notifying authorities and individuals of data breaches, and giving individuals the right to be forgotten.


Data lineage requirement: Firms subject to GDPR are dependent on data lineage to track data and provide transparency about where it is and how it is used. Data lineage provides the ability to demonstrate compliance with the regulation and, from a data subject’s perspective, supports access to personal data and the execution of other rights such as the right to be forgotten.


FRTB

Fundamental Review of the Trading Book (FRTB) regulation was originally due to take effect in 2022; the Basel Committee later deferred the implementation date to January 1, 2023. It is a response to the 2008 financial crisis, which exposed fundamental weaknesses in the design of the trading book regime, and focuses on a revised internal model approach to market risk and capital requirements, a revised standardized approach, a shift from value at risk to an expected shortfall measure of risk, incorporation of the risk of market illiquidity, and reduced scope for arbitrage between banking and trading books. Its data management challenges include data sourcing, facilitating capital calculations, and gathering historical data as well as real-price observations for executed trades, or committed quotes, to meet requirements around non-modellable risk factors (NMRFs) and the linked risk factor eligibility test.

Data lineage requirement: To satisfy the demands of FRTB, data lineage may be needed to track historical data and trade data aggregation required for the risk factor eligibility test of NMRFs, essentially the provision of at least 24 real price observations of the value of the risk factor over the previous 12 months.
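The observation count at the heart of the eligibility test can be sketched as follows. This is a deliberately simplified check that covers only the criterion stated above (at least 24 real price observations in the previous 12 months); the full regulatory test has further conditions that are not modelled here. The function passes_simplified_rfet, the use of a 365-day window, and the counting of distinct observation dates are assumptions for illustration.

```python
from datetime import date, timedelta


def passes_simplified_rfet(observation_dates, as_of: date, min_obs: int = 24) -> bool:
    """Check for at least min_obs distinct observation dates in the 365 days up to as_of.

    Simplified: the 12-month window is approximated as 365 days and no
    other eligibility conditions are applied.
    """
    window_start = as_of - timedelta(days=365)
    in_window = {d for d in observation_dates if window_start < d <= as_of}
    return len(in_window) >= min_obs


as_of = date(2023, 5, 21)
# Hypothetical risk factors: one observed every two weeks, one observed monthly.
fortnightly = [as_of - timedelta(days=14 * i) for i in range(26)]
monthly = [as_of - timedelta(days=30 * i) for i in range(12)]

fortnightly_ok = passes_simplified_rfet(fortnightly, as_of)
monthly_ok = passes_simplified_rfet(monthly, as_of)
```

Lineage matters here because each counted observation must be traceable to an executed trade or committed quote; the count alone does not demonstrate that.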

CCAR

The Comprehensive Capital Analysis and Review (CCAR) is an annual exercise carried out by the Federal Reserve to assess whether the largest bank holding companies (BHCs) operating in the US have sufficient capital to continue operations throughout times of economic and financial stress, and have robust, forward-looking capital planning processes that account for their unique risks. From a data management perspective, CCAR requires data sourcing, analytics and risk data aggregation for stress tests designed to assess the capital adequacy of BHCs and for regulatory reporting purposes.

Data lineage requirement: CCAR requires attribute level data lineage to track data from source to destination and ensure the validity and veracity of capital plans. Data lineage can also be used to identify any data gaps in reporting and highlight any data quality issues.

Supply and demand

Over the past few years, a number of established data management vendors have brought data lineage solutions to market, as have start-ups and young companies dedicated to lineage. Some take a technical approach, others a business approach, but their common challenge is to meet growing market demand for automated data lineage that can cross complex data environments, ensure regulatory compliance and deliver business benefit. On the demand side, recognition and adoption of data lineage has tracked increasing regulation since the financial crisis. Few firms can claim complete and entirely successful systems, but most have developed a regulatory response that is beginning to yield operational and business benefits.

Unsurprisingly, most progress has been made at Tier 1 banks and other large organizations subject to extensive regulation and with the resources to implement data lineage, although all firms that want to stay in the game are likely to need data lineage across some aspects of their business going forward.

Challenges and opportunities

Overview - Like most data management programs, data lineage includes inherent challenges and potential opportunities. The challenges range from winning management buy-in for initial projects to understanding and tracking huge volumes of data with complex links across a big data environment. The opportunities range from improved data quality to better decision making and identifying business opportunities.

Challenges - The challenges of data lineage tend to fall into three buckets – operations, technology and data management – and while many are ongoing pain points for data managers across all sorts of programs, some are specific to data lineage.

Operational challenges - The operational challenges of data lineage start with winning management buy-in and funding for a solution that can be expensive, requires significant human input, and offers only a modicum of advantage in early implementation. Poor understanding of data lineage and its potential benefits by senior executives can stymie approval, while the prospect of lengthy and complex projects could be enough to bring the shutters down.

Questions to consider at the outset of a data lineage project include:

  • Where are we now, why do we need data lineage?
  • What extent of lineage would be optimal?
  • How can we win management buy-in?
  • Do we need a champion for data lineage?
  • How much will it cost now and going forward?
  • How much can we do with allocated budget?
  • Do we have required skills internally?
  • What are the internal cultural issues of data lineage?

You can begin to answer these questions by ensuring senior management understands the importance of data and benefits of data lineage, and starting small. Decide whether a pilot project is going to provide insight into business processes or achieve an element of regulatory compliance, prioritize the most important and relevant data, scope the project carefully, and identify stakeholders that should be involved.

In the first instance, it may be useful to assess where required data comes from manually and create baseline data lineage before considering automation. It is also important to make sure a pilot project is scalable and could include additional data or other areas of the organization before making a business case.

Proving the concept of data lineage and demonstrating quick wins to the business should, at least in some cases, be enough to start the journey towards a larger data lineage program spanning part or all of the organization.

While a good start to any data management project means it should gain momentum, the success of data lineage is particularly dependent on people and their approaches. It takes a range of data and metadata management skills to develop and maintain data lineage, but if data producers and consumers don’t see its value, they are unlikely to fall in with the cause and follow carefully created data lineage processes. These producers and consumers need to look beyond their own environment and understand how the organization can benefit from data lineage.

That is not to say any data lineage will do. As data lineage can be expensive to build and manage, it is important to understand what level of data lineage users require. Depending on resources, it may or may not be possible to match extensive requirements, so the initial aim must be to build a data lineage solution that delivers value and is right-sized for consumers, with later iterations providing more detail around data and data flows.

Data ownership and accountability is an ongoing challenge that many organizations with huge amounts of data, myriad systems and applications, and little appetite among employees to take responsibility for data have failed to resolve. Data lineage isn’t a silver bullet, but by tracking data and showing how it is used and by whom, it does add some clarity to data and allows responsibility for specific areas of data to be allocated to their rightful owners.


Technology challenges

The technology challenges of data lineage reflect growing numbers of regulations with overlapping lineage requirements and smarter auditors and regulators asking for responses to questions on demand. Advances in technology add to the challenge, with cloud-based applications and services, and big data systems – not to mention emerging machine learning, artificial intelligence and natural language processing technologies – creating a complex data infrastructure. Data can be managed in new and interesting ways, but keeping track of it and ensuring it can be trusted is increasingly difficult.

At the heart of addressing these challenges, and a challenge in itself, is the selection of a solution, or solutions, to support an organization's data lineage. Early implementations of data lineage were often built in-house as few vendor solutions were available. More recently, many firms have moved to hybrid in-house and vendor solutions, or migrated entirely to vendor solutions as data lineage has advanced towards becoming a commodity.

Whether you plan to build or buy, these questions are worth considering before final decisions are made:

  • How much lineage is already in place?
  • To what extent will manual lineage continue to be necessary?
  • How will lineage be documented?
  • How will it need to be scaled? 
  • How will impact assessment be managed?
  • What is the long-term aim for automation?
  • Which areas of the organization will be covered and at what level in terms of technical and business lineage?
  • How will data lineage be sustained?
  • What skills will be required?
  • How much will it cost?


There are no catch-all answers to these questions and few organizations that will find answers to all the questions in one solution, leading many to implement a combination of in-house developed and vendor deployed solutions.

Whatever the selected solution, however, it will not provide value in isolation. It is important to consider how data lineage and its metadata will integrate with the rest of an organization's business metadata as this will provide rich data and the ability to slice and dice the data. Lineage also needs to run alongside an organization's systems development lifecycle plan to ensure it is maintained as technologies are changed. And, of course, scalable and flexible technology is essential, not only to master growing volumes of existing data types, but also to embrace additional datasets, alternative data, data resulting from mergers and acquisitions, and data that we have yet to discover.

Data management challenges

Implementing data lineage is a complex data management task that could include huge volumes of data, the creation of metadata, multiple legacy systems, mountains of spreadsheets, disparate systems, siloed data, uncharted data flows and mixed data formats.

The potential impact of regulatory change must also be assessed, data quality considered, and manual processes brought into the lineage framework.

Big data, data lakes and repositories raise issues around how data is stored, tagged and linked to other data and systems, while outsourced data and automated data feeds need to be mined and brought into the data lineage scheme.

Data management questions that need to be considered before data lineage is implemented include:

  • Is all the data valuable?
  • Is the data duplicated?
  • Is some of the data redundant?
  • Is the data internal?
  • Is the data external and correctly licensed?
  • What tools are required to find answers to these questions?


Reflecting these questions, an early inventory of an organization's data can start the process of identifying which data is important to the business and should be part of a data lineage program, which data can be left as is, and which data can be scrapped. Data in legacy systems and black boxes will be difficult, if not impossible, to capture, as will data that changes continually but not consistently.

Considering the scope and scale of these data management challenges, particularly in large organizations, data lineage utopia is not in sight, but there are tools and solutions that can break the backbone of implementation and provide a sturdy platform on which to build and maintain data lineage that can provide useful and timely information to the business.

Data Lineage related regulations:

  • BCBS 239
  • Fundamental Review of the Trading Book (FRTB)
  • Markets in Financial Instruments Directive II (MiFID II)
  • General Data Protection Regulation (GDPR)
  • Comprehensive Capital Analysis and Review (CCAR)


Saturday, May 20, 2023

Data Quality 

  • Data Quality Criteria - Accessibility, Accuracy, Completeness, Comprehensiveness, Consistency, Currency, Integrity, Provenance, Representation, Uniqueness, Validity
  • Availability (Accessibility, Authorization, Exclusiveness, Timeliness, Performance, Efficiency)
  • Usability (Definition, Recognition, Orientation, Structure, Effectiveness)
  • Reliability (Consistency, Integrity, Accuracy, Completeness, Actuality, Audit-ability, Responsibility)

Six Data Quality Dimensions

Data quality can be determined based on the following six attributes:

  1. Completeness: The degree to which expected data attributes are provided. Completeness is expressed as a percentage of data that meets the users' expectations and data availability. For example, 95% of surname record fields that need to be known are complete.
  2. Coverage: The degree to which a dataset is complete for all required values. For example, if a dataset of US zip codes covers only 20 states, the dataset does not have complete coverage if the requirement is for all contiguous states.
  3. Accuracy: The data reflects the real-world state. For example, the company name is the real company name, and the company identifier is verified against the official database of companies being used (Dun & Bradstreet, SEC, and so forth). Note: Data can be complete but not accurate.
  4. Consistency: Whether the facts across multiple datasets match and represent the same objects. Consistency also takes into account whether data is at the same level of aggregation (e.g., sales transaction data may show individual order line items for each customer while monthly sales reporting simply shows total order value by geography).
  5. Validity: The extent to which the data conforms to defined business rules. A value can be valid but not accurate. For example, the customers' birthdate may be a valid date, but incorrect.
  6. Timeliness: The degree to which data is adequately up to date for a given task. For example, the tax information provided on the application is for the most recent tax year.
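The first two dimensions above are expressed as degrees, so they lend themselves to a simple percentage calculation. The following is a minimal sketch, assuming records are held as dictionaries; the field names, the sample records, and the functions `completeness` and `coverage` are illustrative assumptions rather than part of any standard.

```python
def completeness(records, field):
    """Percentage of records in which `field` is populated (not None or blank)."""
    if not records:
        return 0.0
    filled = 0
    for record in records:
        value = record.get(field)
        if value is not None and str(value).strip() != "":
            filled += 1
    return 100.0 * filled / len(records)


def coverage(records, field, required_values):
    """Percentage of the required values that appear at least once in `field`."""
    required = set(required_values)
    if not required:
        return 100.0
    present = {record.get(field) for record in records} & required
    return 100.0 * len(present) / len(required)


# Hypothetical sample data for illustration only.
customers = [
    {"surname": "Lee", "state": "NY"},
    {"surname": "", "state": "CA"},
    {"surname": "Diaz", "state": "NY"},
    {"surname": None, "state": "TX"},
]

print(completeness(customers, "surname"))                  # share of surnames filled in
print(coverage(customers, "state", ["NY", "CA", "TX", "WA"]))  # share of required states seen
```

Neither figure says anything about accuracy or validity: a surname can be present and still wrong, which is why the dimensions are measured separately.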

Rather than fixing data quality by finding and correcting errors, managers and teams must adopt a new mentality — one that focuses on creating data correctly the first time to ensure quality throughout the process. It is essential that people recognize themselves as customers, clarify their needs, and communicate those needs to creators. People must also recognize themselves as creators, and make improvements to their processes, so they provide data in accordance with their customers’ needs. So why aren’t these practices the norm? It turns out that a variety of organizational and cultural issues get in the way.

https://hbr.org/2020/02/to-improve-data-quality-start-at-the-source

The Data Governance organization needs to establish policies to identify high-value data attributes, and the mechanism to measure the improvement of data quality over time. Here are the sub-steps associated with this “Manage Data Quality” step:

  • 10.2.1 Establish data quality policies, including the identification of high value data attributes.
  • 10.2.2 Baseline data quality.
  • 10.2.3 Build the business case.
  • 10.2.4 Cleanse the data.
  • 10.2.5 Monitor the data quality over time.


Baseline Data Quality

Data has to be of the appropriate quality to address the needs of the business. There are several ways to assess the quality of a dataset:

  • Validity—The data values are in an acceptable format. For example, employee numbers have six alphanumeric characters.
  • Uniqueness—There are no duplicate values in a data field.
  • Completeness—There are no null values in a data field. For example, the zip or postal code should always be populated in an address table.
  • Consistency—The data attribute is consistent with a business rule that may be based on that attribute itself, or on multiple attributes. For example, a business rule might check whether a birth year is prior to 1/1/1900 or whether the effective date of an insurance policy is prior to the birth date on the policy.
  • Timeliness—The data attribute represents information that is not out-of-date. For example, no customer contracts have expiry dates that have passed.
  • Accuracy—The data attribute is accurate. For example, employee job codes are accurate to ensure that an employee does not receive the wrong type of training.
  • Adherence to business rules—The data attribute or a combination of data attributes adheres to specified business rules.
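Several of the checks listed above can be counted mechanically to produce a first baseline. The sketch below is one possible way to do so, assuming rows are dictionaries; the column names (`employee_number`, `postal_code`, `birth_date`, `policy_effective`), the sample rows, and the `baseline` function are hypothetical, and the rules simply mirror the examples given in the list.

```python
import re
from collections import Counter
from datetime import date

# Validity rule from the example above: six alphanumeric characters.
EMPLOYEE_NUMBER = re.compile(r"[A-Za-z0-9]{6}")


def baseline(rows):
    """Count rule violations per check so the result can be tracked over time."""
    ids = [row.get("employee_number") for row in rows]
    id_counts = Counter(ids)
    return {
        # Validity: value is a string in the accepted format.
        "invalid_employee_number": sum(
            1 for value in ids
            if not (isinstance(value, str) and EMPLOYEE_NUMBER.fullmatch(value))
        ),
        # Uniqueness: every occurrence beyond the first counts as a duplicate.
        "duplicate_employee_number": sum(n - 1 for n in id_counts.values() if n > 1),
        # Completeness: postal code must be populated.
        "missing_postal_code": sum(1 for row in rows if not row.get("postal_code")),
        # Consistency: a policy cannot take effect before the holder's birth date.
        "policy_before_birth": sum(
            1 for row in rows
            if row.get("policy_effective") and row.get("birth_date")
            and row["policy_effective"] < row["birth_date"]
        ),
    }


# Hypothetical sample rows for illustration only.
rows = [
    {"employee_number": "A12345", "postal_code": "10001",
     "birth_date": date(1980, 5, 1), "policy_effective": date(2010, 1, 1)},
    {"employee_number": "A12345", "postal_code": None,
     "birth_date": date(1975, 3, 2), "policy_effective": date(1970, 1, 1)},
    {"employee_number": "B-77", "postal_code": "94105",
     "birth_date": date(1990, 7, 9), "policy_effective": date(2020, 6, 1)},
]

print(baseline(rows))
```

Accuracy is deliberately absent from the sketch: it requires comparison against a trusted reference source and cannot be inferred from the dataset alone.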

https://www.gartner.com/doc/reprints?id=1-1ZK7KV2W&ct=200728&st=sb?mkt_tok=eyJpIjoiWVRBM1pUVXhOVFEwWVRrNCIsInQiOiJiWXhcL2lEZUlYOVI2YWg2T0RQdEh5eGo0cGUzeDcxZXBWOVRUQms3dVdyOHBLbWFTSUhrdFJlcTJpMmFlYUlaV1wvZVRiMEgxQnJhNHEycDI0NU5iaEwySURCRVVVSEFvRDJEV2NpY1FzS0YwZGdLVnpkZWZsMWVIM09PZWFVcVR3V0Jhb0VBMjhzM1g3S0pGN0NabitTdz09In0%3D

Magic Quadrant for Data Quality Solutions

Published 27 July 2020 - ID G00389794 - 60 min read

The data quality solutions market continues to evolve and grow, fueled by desire for cost and operational efficiency. The solutions leverage augmented capabilities to deliver automation and insights. Data and analytics leaders should use this research to make the best choice for their organizations.

Strategic Planning Assumptions

By 2022, 60% of organizations will leverage machine-learning-enabled data quality technology for suggestions to reduce manual tasks for data quality improvement.

Market Definition/Description

As organizations accelerate the speed of digital transformation and innovation, there is a greater market demand for data quality solutions. This stems from a need to overcome the challenges from complex and distributed data landscapes, and new and urgent business requirements. Data and analytics leaders are facing intensive pressure to provide “trusted” data that can help business operations to run more efficiently and make business decisions faster and with greater confidence.

Data quality initiatives have traditionally been mandated to fulfill compliance requirements and to reduce operational risks and costs. Increasingly, data quality is also a necessity when amplifying analytics for better insights and making trusted, data-driven decisions.

As artificial intelligence (AI) technologies mature and become more widely adopted, many data quality vendors have started incorporating them into their solutions. In building augmented capabilities, they are driving better automation in areas that have traditionally relied on intensive manual tasks such as data matching, cleansing and transformation. Augmented data quality extends conventional data quality features to reduce manual tasks with automatic recommendations on “next best actions.”

The term “data quality” relates to the processes and technologies for identifying, understanding and correcting flaws in data that support effective data and analytics governance across operational business processes and decision making. The packaged solutions available include a range of critical functions, such as profiling, parsing, standardization, cleansing, matching, enrichment, monitoring and collaborating.

Considering the expansion in market demand, and the evolution and innovation of technologies, Gartner changed the name of this Magic Quadrant from “Magic Quadrant for Data Quality Tools” to “Magic Quadrant for Data Quality Solutions.” Effective data quality practices require more than a tool. A complete data quality solution includes built-in workflow, knowledge bases, collaboration, interactive analytics, and automation to support various use cases across different industries and disciplines.

Gartner sees end-user demand expanding toward broader capabilities spanning data management, and data and analytics governance. As a result, the market for data quality solutions continues to integrate closely with the markets for data integration tools, metadata management solutions and for master data management (MDM) products, all of which are used to build solid foundations with trusted data assets. Users expect effective integration of, and interoperability between, these products. Evaluating and selecting data quality solutions is much less of a specialized IT task than it was formerly. It now requires greater collaboration with business leaders and users.

Gartner’s market perspective focuses on transformational technologies or approaches delivering on the current needs of end users. The following are the key capabilities that organizations need in their data management solutions portfolio, if they are to address the increasing importance and urgency of data quality:

Connectivity: The ability to access and apply data quality rules to a wide range of data sources, including internal and external, on-premises and cloud, relational and non-relational data sources.

Data profiling, measurement, and visualization: Data analysis capabilities that give business and IT functions (especially those supporting business users) insight into the quality of data and help them identify and understand data quality issues.

Monitoring: Capabilities that assist with the ongoing understanding and assurance of data quality through monitoring of, and alerting to, possible data quality issues.

Parsing: Built-in capabilities that decompose data into its component parts.

Standardization and cleaning: Built-in capabilities that apply government, industry or local standards, business rules or knowledge bases to modify data for specific formats, values and layouts.

Matching, linking and merging: Built-in capabilities that match, link and merge related data entries within or across datasets, using a variety of techniques such as rules, algorithms, metadata and machine learning.

Multidomain support: Packaged capabilities aimed at specific data subject areas, such as customer, product, asset or location.

Address validation/geocoding: Capabilities that support location-related data standardization and cleansing, and completion for partial data in real-time or batch process.

Data curation and enrichment: Capabilities that integrate externally sourced data to improve completeness and add value.

Business rule development and implementation: Capabilities that create, deploy and manage business rules that can then be called within the solution or by third-party applications for data validation purposes.

Issue resolution and workflow: Process workflows and user interfaces that enable nontechnical business users to identify, quarantine, assign, escalate, resolve and monitor data quality issues.

Metadata management: Capabilities that capture, reconcile and interoperate metadata relating to the data quality process.

DataOps environment: Collaboration of data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.

Deployment environment: Styles of deployment, hardware, operating system and maintenance options for deploying data quality operations.

Architecture and integration: Commonality, consistency and interoperability among various components of the data quality toolset (including third-party tools).

Usability: Suitability of the solution to engage and support the various roles (especially nontechnical business roles) required in a data quality initiative.
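To make the standardization and matching capabilities described above more concrete, here is a minimal sketch using only the Python standard library. It is not how any vendor in this report implements matching; the suffix list, the 0.9 threshold, the sample names, and the functions `standardize`, `similarity` and `match_pairs` are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to drop during standardization.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "ltd", "limited", "llc"}


def standardize(name):
    """Lower-case, strip punctuation, and remove common company suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(token for token in tokens if token not in SUFFIXES)


def similarity(a, b):
    """Similarity ratio between 0 and 1 on the standardized forms."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def match_pairs(names, threshold=0.9):
    """Return pairs of names whose standardized forms are similar enough."""
    pairs = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            if similarity(left, right) >= threshold:
                pairs.append((left, right))
    return pairs


# Hypothetical sample data for illustration only.
companies = ["Acme Corp.", "ACME Corporation", "Globex Ltd"]
print(match_pairs(companies))
```

Comparing every pair is quadratic in the number of records, so this approach only suits small datasets; packaged solutions typically add blocking or indexing before matching.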

Magic Quadrant

Figure 1. Magic Quadrant for Data Quality Solutions

Source: Gartner (July 2020)



Vendor Strengths and Cautions

Ataccama

Ataccama is a Visionary in this Magic Quadrant; in the previous edition, it was also a Visionary. Ataccama has headquarters in Toronto, Canada. Its data quality product is Ataccama ONE DQ (version 12.5, which became generally available in December 2019). Ataccama has an estimated 370 customers for this product line. Its operations are mostly in EMEA and North America, and its clients are primarily in the banking and securities, healthcare, and public sectors.

Strengths

Growth in revenue, customers, and company size: Ataccama has demonstrated solid growth in revenue (20% year over year [YoY]) and customer base for the last three years for its data quality products and services alone. Two freemium products (Ataccama Data Quality Analyzer and Ataccama ONE Profiler) had 55,000 downloads, representing a 57% increase YoY. The number of employees grew by 50%, with 43% of this increase attributed to product development staff.

Technology innovation: Ataccama has invested in adding key emerging technologies to its core capabilities including automatic discovery and suggestions, AI-enabled data matching, and anomaly detection. Gartner’s Peer Insights reviews also show customer satisfaction with AI functionality to assist users.

Integrated data management solution: Ataccama extends its partnership with Manta for a data lineage solution that is fully integrated into Ataccama ONE to enhance data quality evaluation, data cataloging and maintenance of a metadata glossary. This outreach to technology partners complements Ataccama ONE’s capabilities beyond its core data quality, master data management, and metadata management features, and delivers an integrated data management solution.

Cautions

Upgrade and migration: Surveyed reference customers’ scores for Ataccama were below average for ease of upgrade and migration between versions of the vendor’s data quality solutions. Peer Insights reviewers also noted that the software releases and update cycles could be improved.

Interactive visualization: Reference customer scores put Ataccama’s data visualization capabilities behind those of most of the vendors in this Magic Quadrant. Specifically, the tools were seen as insufficiently user-friendly for developers and nontechnical business users.

Lack of local resources with skills and experience: Ataccama’s reference customers commented that the vendor lacks good local assistance and programmers familiar with the tool, and that the learning curve is steep if Ataccama consultants are not engaged. In addition, the vendor’s product documentation (in terms of its completeness, clarity and usefulness) scored below average. This may slow adoption of Ataccama’s solution.

Experian

Experian is a Challenger in this Magic Quadrant; in the previous edition, it was also a Challenger. Experian has its corporate headquarters in Dublin, Ireland. Its data quality products include Experian Aperture Data Studio (version 2), which became generally available in January 2020, Experian Pandora (a legacy data quality product) and QAS Pro (a web-based address validation product). Experian has an estimated 6,700 customers for these product lines. Its operations are geographically diversified, and its clients are primarily in the financial services, retail and public sectors.

Strengths

Market presence: Experian continues to maintain a good market share — fourth highest for data quality products and services. It has a large ecosystem with approximately 20,000 B2B customers and individual credit information in 44 countries across the world. This represents a significant market potential for its data quality offerings, especially in banking and financial industries.

Customer data focus: Experian specializes in “customer” data and customer enrichment insight. 80% of its customers use the party (customer) data domain. Experian offers a range of SaaS and on-premises contact data validation capabilities covering address, email, phone and geolocation. Experian positions this as its market differentiator.

Ease of use and implementation: Reference customers for Experian expressed strong satisfaction with the ease of installation, deployment and use of their products. The score is among the highest of all vendors included in this Magic Quadrant. This feedback is consistent with previous Magic Quadrant reference customer surveys. Specifically, the role-based usability of Experian products enables easy adoption by nontechnical business users.

Cautions

Product innovation: Experian lags behind its competitors in introducing new innovations and practices with emerging technologies. For example, Experian does not use machine-learning-driven issue resolution that learns and applies rules to similar scenarios. Neither does it provide prebuilt integration to share and consume data quality rules or processes to and from business analytics solutions.

Data quality features: Experian reference customers scored the product below average for several core data quality features such as multidomain support, visualization, parsing, standardization and cleansing. Visualization scores are below the average, with limited utilization across the company’s user base (80% of surveyed respondents don’t use it or have no plans to use it).

Pricing and value relative to cost: Reference customers for Experian score its pricing and licensing approach, and the value of its data quality tool relative to its cost, below the survey average. Lack of contract negotiation and support has also resulted in negative feedback from its customers.

IBM

IBM is a Leader in this Magic Quadrant; in the previous edition, it was also a Leader. IBM has headquarters in Armonk, New York, U.S. Its data quality products are IBM InfoSphere Information Server for Data Quality (version 11.7.1 FP1), which became generally available in June 2020, and Watson Knowledge Catalog, powered by Cloud Pak for Data (version 3), which became generally available in April 2020. IBM has an estimated 2,500 customers for these product lines. Its operations are geographically diversified, and it has clients in various sectors.

Strengths

Market understanding and product strategy: IBM demonstrates deep understanding of this market and has increased focus on driving the DataOps to deliver business-ready data using its solution capabilities. Reference customers praised IBM for having the vision to modernize the platform for the future.

Technology innovation: The latest IBM data quality product — Watson Knowledge Catalog — reflects the vendor’s investments in key emerging technologies to build an end-to-end data platform that integrates data quality, data governance and data consumption into one experience. The innovation in AI/ML-driven automation, built-in workflow for governance objects, redesign of persona-based user experience, and container-based and microservices deployment options give IBM a competitive advantage.

Pricing and value: Surveyed reference customers scored IBM among the highest for the value of its data quality tools relative to their cost. Scores for its pricing and licensing approach are also well above the average. IBM has shown consistent improvement in these areas for the past few years.

Cautions

Product migration path: IBM’s product strategy is to migrate its existing InfoSphere Information Server customers toward the latest Watson Knowledge Catalog. The Information Server product lines will likely move into “maintenance mode” in the future. The ML-based data rule definition generation is currently only available in WKC. Also, all newly created ML features moving forward will only be available in WKC. Customers currently tied to this suite need to consider a migration path to WKC to take advantage of IBM’s new and innovative technologies.

Ease of implementation, upgrade and migration: Reference customers indicated dissatisfaction with IBM in terms of product installation, upgrade and migration between versions of the vendor’s data quality solutions. There has been consistent feedback about IBM InfoSphere Information Server on this subject for several years. The survey scores are below average for these areas. The container-based and microservices-based WKC architecture is expected to alleviate this concern.

Data quality features: IBM scored lower than average for several core data quality features such as visualization, parsing, matching, linking and merging, and monitoring. These are important data quality features that IBM needs to improve. IBM is actively addressing this issue in its current and future releases.

Infogix

Infogix is a Visionary in this Magic Quadrant; this is its first appearance in the Magic Quadrant. Infogix has headquarters in Naperville, Illinois, U.S. Its data quality products are Data360 (version 4.3 of which became generally available in March 2020), and Infogix ACR, a legacy product for mainframe systems. Infogix has an estimated 230 customers for these product lines. Its operations are mostly in Asia and North America, and its clients are primarily in the financial services, healthcare and insurance sectors.

Strengths

Metadata solution background: Founded 40 years ago, Infogix focuses on industrial practices in data management and has a history in transactional data monitoring. As well as its data quality features, Data360 has extensive metadata management capabilities that support compliance, risk management and governance initiatives.

Customer satisfaction: Infogix retains a loyal customer base with a reported 98% customer retention rate. It is among the highest-rated vendors in this Magic Quadrant for customer satisfaction. Reference customers especially praised its product technical support, professional services, and support for product evaluation and contract negotiation.

Ease of use and implementation: Reference customers identified ease of use, installation, upgrade and integration as strengths of Infogix. They also praised its offer of user-friendly ETL and data validation, and flexibility to read any data. Infogix scored above average in these areas.

Cautions

Market presence: Despite its long history in the data management market, Infogix has limited market visibility, as indicated by its comparatively rare presence in competitive situations known to Gartner, and infrequent mentions by users of Gartner’s client inquiry service.

Data quality features: Infogix scored lower than average for several core data quality features such as multidomain support, standardization and cleansing, unstructured data support, and real-time processing.

End-user training: Infogix scored lower than average for the quality and availability of end-user training. Infogix’s relatively small customer base has resulted in limited availability of relevant product expertise externally. Reference customers indicated that insufficient user training is a barrier to adoption, given the limited external support available.

Informatica

Informatica is a Leader in this Magic Quadrant; in the previous edition, it was also a Leader. Informatica has headquarters in Redwood City, California, U.S. Its data quality products are Informatica Data Quality (IDQ) which became generally available in December 2019 (version 10.4), Informatica Axon Data Governance (version 7.0), which became generally available in May 2020, Informatica Data Engineering Quality (version 10.4), which became generally available in December 2019 and Informatica Data as a Service. Informatica has an estimated 5,500 customers for these product lines. Its operations are geographically diversified, and it has clients in various sectors.

Strengths

Market understanding and presence: Informatica continues to grow strongly with 5% revenue growth YoY and was the second highest vendor for market share by revenue in 2019. It has a deep understanding of the data quality market and a proven track record of adapting quickly to market changes. Its market understanding is highly correlated with its sales and marketing strategy, and with closed-loop market execution. It is frequently mentioned by users of Gartner’s inquiry service, and 42% of survey participants shortlisted Informatica in competitive evaluations for data quality solutions.

Product Innovation, vision and strategy: Informatica offers integrated data management solutions, underpinned by metadata-driven artificial intelligence. Its broad product portfolios provide comprehensive data management capabilities, including data quality, metadata management, data governance and master data management. These all come together as one end-to-end hybrid platform at enterprise scale. Reference customers praised Informatica’s consistent innovation to improve its data quality solutions.

Data profiling and multidomain support: Informatica’s data profiling capability is the key to enabling integrated, end-to-end data quality across all of its data management solutions. Informatica scored among the highest in the data profiling and multidomain support areas. Reference customers commented that data profiling was quick and easy, requires no or little configuration, and allows connections to multiple data domains.

Cautions

Price-value ratio questioned: Informatica has a wide range of products that operate as an integrated data platform to support the various use cases. This makes it difficult for customers to derive the value of their platform from a single product. In addition, Informatica’s pricing models are relatively higher than the market. As a result, Informatica reference customers scored the data quality solution for value received in relation to the investment cost lower than the survey average. To address this, Informatica introduced more flexible pricing structures, such as consumption-based pricing with Cloud Data Quality (CDQ). Customers may revise their opinion of the value over time.

Ease of installation, upgrade, and integration: Reference customers identified these as challenging areas for Informatica products, and said they often required strong technical resources to perform these tasks. The ability to integrate with other technologies outside the Informatica ecosystem can also be challenging. Informatica scored below average for these areas. Informatica is addressing these issues with its latest cloud native versions and quick installation templates in cloud ecosystems.

Hadoop integration with older versions: Execution from Informatica Data Quality to a Hadoop environment is delivered through Informatica Data Engineering Integration (DEI) and Informatica Data Engineering Quality (DEQ), which enable users to execute data quality processes directly in Hadoop ecosystems. Although addressed with version 10.4, older versions of DEI and DEQ are not available with some older versions of Informatica Data Quality. Some reference customers have commented on this scenario.

Information Builders

Information Builders is a Visionary in this Magic Quadrant; in the previous edition, it was also a Visionary. Information Builders has headquarters in New York City, New York, U.S. It offers Omni-Gen Data Quality Edition (version 3.1.5), which became generally available in June 2020. Information Builders has an estimated 320 customers for this product. Its operations are mostly focused in North America and EMEA, and its clients are primarily in the financial services, healthcare, and public sectors.

Strengths

Marketing and sales strategies: Information Builders has transformed its marketing and sales strategies by focusing on a “data first” strategy, which used to be the Phase 2 goal in its sales cycle. This shift has allowed Information Builders to gain market traction for its Omni-Gen product line. Its dedicated industry focus is on financial services, healthcare and public sectors. Reference customers highly praised the company for its industry specialization. The company’s Academic Alliance Program provides free educational licenses for its software and training curriculum to academic institutions, and this new program provides good opportunities for brand loyalty and awareness.

Cloud and hybrid enablement: Information Builders’ strong initiatives in cloud and hybrid architecture support streamlined deployments either on-premises, in major public cloud environments, or through a fully hosted and managed service with Omni-Gen Total Access Cloud — Data Quality Edition. It scored above average for off-premises implementations via SaaS or cloud-based deployment models.

Integration with MDM and data integration tools: Information Builders scored among the highest for integration with its MDM solution and data integration tools as well as integrating with other MDM solutions. Reference customers indicated an easy process to connect multiple data sources. Ease of integration is especially critical when Information Builders provides the option for clients to leverage partners to provide additional data management solutions such as MDM and metadata solutions.

Cautions

Market share and visibility: Gartner’s market share data shows that Information Builders saw a 4.7% drop in revenue for its data quality product and occupied only 0.2% of overall market share in 2019. Roughly 4% of all survey participants considered the company during the vendor selection. It is also rarely mentioned by users of Gartner’s client inquiry service.

Technology innovation: Information Builders does not appear to have compelling technology innovation compared with its competitors. It lacks out-of-the-box machine learning, automated issue resolution, metadata-driven governance, and AI matching. Information Builders stated that AI/ML capabilities are a major part of its 2020 roadmap.

Scalability, performance and product documentation: Information Builders scored below average for scalability and performance with diverse data, and for product documentation. Reference customers reported product performance challenges in certain large-scale deployment use cases. The documentation is also incomplete.

Innovative Systems

Innovative Systems is a Challenger in this Magic Quadrant; in the previous edition, it was a Niche Player. Innovative Systems has headquarters in Pittsburgh, Pennsylvania, U.S. Its data quality products include the Enlighten Data Quality Suite and FinScan, both of which reside on the Synchronos Enterprise Customer Platform (version 5.1.1, which became generally available in March 2020). Innovative Systems has an estimated 1,030 customers for these products. Its operations are mostly in the Americas and EMEA, and its clients are primarily in the banking and securities, insurance and media sectors.

Strengths

Core functionalities: Innovative Systems offers solid and stable core data quality functionalities, driven by a crowdsourced AI-based approach. Its reference customers score it above average in the areas of parsing, standardization, cleansing, matching, linking, merging, business-driven workflow, and real-time processing. Over 33% of deployments apply data quality functions to real-time data streams, such as address validation at check-out, GPS tracking data, real-time credit checks and anti-money-laundering screening. Innovative Systems received very high appreciation from its customers.

Customer service and support: Reference customers expressed a high degree of satisfaction with its service and support, giving it one of the highest overall scores in the areas of product technical support, professional services, and end-user training. Innovative Systems retains a loyal customer base with some using its products for over 30 years.

Pricing and value: Reference customers praise Innovative Systems’ flexible approaches to pricing and licensing. Comments indicate that products are reasonably priced and the company exceeds customer expectations while offering good-value products.

Cautions

Mind share and market visibility: Despite a long history in the data quality tool market, and steady growth in revenue and customer base, Innovative Systems still has a relatively limited market presence. Additionally, it is rarely seen by Gartner in competitive situations and is rarely mentioned by users of Gartner’s client inquiry service.

Industry focus and domain usage: Innovative Systems is most active in the financial services sector, where its focus includes data quality for compliance and risk mitigation. More than 75% of its revenue (customer base) is from the banking, securities and insurance sectors. Although financial services is one of the most demanding industries for data quality initiatives, prospective customers in other sectors should check that this vendor’s technology and services will fully meet their business requirements. In addition, Gartner sees relatively limited usage of Innovative Systems products outside the customer/party data domain.

Performance and scalability: A few reference customers commented on their concerns about performance, and the scalability of the vendor’s data quality solutions. Specifically, in some use cases, batch processes use a lot of memory and can be slow with larger files. Innovative Systems scored below average in this area.

Melissa Data

Melissa Data is a Niche Player in this Magic Quadrant; this is its first appearance in the Magic Quadrant. Melissa Data has headquarters in Rancho Santa Margarita, California, U.S. Its data quality products include Contact Zone (version 8.1.0.4, which became generally available in April 2019), Data Quality Suite (version 3333, which became generally available in February 2020), Data Quality Components for SQL Server (version 9.4, which became generally available in November 2019), and Unison (version 1.2.10, which became generally available in April 2020). Melissa Data has an estimated 1,000 customers for these products. Its operations are mostly in North America, and its clients are primarily in the communications, financial services and healthcare sectors.

Strengths

Data validation services: Data Quality Suite is a data verification solution for validating and standardizing various data objects such as business entities, names, addresses, geocodes, emails, and phone numbers via SaaS or API calls based on multiple authoritative reference data sources and domain-specific rules. Melissa offers more comprehensive data validation services than its competitors in the market.

Pricing and value: Melissa Data received high scores for its pricing and contract flexibility. Reference customers report that the pricing model for its products and services is very favorable. The cost of solutions is low in relation to expectations, budget and the value received by its customers.

Customer experience: Reference customers reported a positive experience with Melissa Data’s technical support, professional services, product documentation and overall capabilities with respect to their requirements. They rate Melissa Data highly for dependability and responsiveness.

Cautions

Innovation and functionality: Melissa Data’s innovation focus is primarily on data validation and enrichment technologies. It has yet to demonstrate innovation in areas such as data preparation, machine-learning-driven automation, or predictive analytics specifically for data quality processes. Innovation in these areas is increasingly required for digital business.

Multidomain support: Gartner sees limited adoption of Melissa Data’s products by reference customers outside the party data domain, even though it supports healthcare, geographic and financial data. Its multidomain support functionality scored below the survey average.

Geographic focus and international support: Melissa Data generates an estimated 95% of its revenue from North America, and it has minimal presence in the rest of the world. Melissa Data is currently expanding into international markets with offices in Great Britain, Germany, Singapore and India. Currently, Melissa does not support a multilingual user interface, and English is the only supported language out of the box.

MIOsoft

MIOsoft is a Visionary in this Magic Quadrant; in the previous edition, it was also a Visionary. MIOsoft has headquarters in Madison, Wisconsin, U.S. Its data quality products are MIOvantage Platform (version 14.0.170, which became generally available in February 2020) and MIOvantage DQ Explorer (version 1.4, which became generally available in February 2020). MIOsoft has an estimated 500 customers for these product lines. Its operations are mostly in Asia and North America, and its clients are primarily in the telecommunications, insurance, and public sectors.

Strengths

Customer satisfaction: MIOsoft retains a loyal customer base and is among the highest-rated vendors in this Magic Quadrant for customer satisfaction. Reference customers especially praise its product technical support, professional services, and its overall service and support. Gartner has seen similar feedback from its customers for several years.

Ease of installation, use and upgrade: According to our survey, the average time taken to deploy a MIOsoft solution was one of the shortest in this Magic Quadrant. Its ease of use, upgrade and migration between versions also ranked among the highest in the survey. Its customers especially praise its user friendliness for nontechnical business users.

Scalability and performance: Gartner has seen consistent feedback from its reference customers that MIOsoft’s products are powerful when doing real-time data processing of stream data or IoT data. Its performance, scalability and reliability are among the highest in this Magic Quadrant.

Cautions

Marketing and sales strategies: Although MIOsoft has headquarters in the U.S., 74% of its revenue in 2019 was from Asia, specifically from the telecommunications industry in that region. In addition, an estimated 73% of its sales are from resellers or OEM partners. This reflects a narrow focus in the company’s marketing effort and a high dependency on an external sales force. Although it offers a high-performance data quality product, MIOsoft’s sales and marketing execution and presence are very limited. Only 1% of survey participants considered it during the vendor selection process, and Gartner does not often see it in competitive situations.

Availability of product expertise: The relatively small size of MIOsoft and its customer base limits the wider availability of relevant product expertise, which may act as a barrier to adoption. MIOsoft currently does not have formal user groups, where customers can learn from their peers. However, MIOsoft is expanding its partnerships in an attempt to address this issue.

Pricing and cost: MIOsoft scored below average in the areas of vendor’s pricing and contract flexibility and cost of vendor’s data quality solutions relative to customers’ expectations and budget.

Oracle

Oracle is a Challenger in this Magic Quadrant; in the previous edition, it was a Leader. Oracle has headquarters in Redwood Shores, California, U.S. Its data quality product is Oracle Enterprise Data Quality (version 12.2.1.4, which became generally available in November 2019). Oracle has an estimated 550 customers for this product. Its operations are geographically diversified, and it has clients in various sectors.

Strengths

Market growth: Oracle has recovered its revenue by growing Oracle Enterprise Data Quality (EDQ) by 6.2% YoY based on Gartner’s Market Share report. This growth is largely from cloud-based deployments. Half of Oracle’s reference customers have EDQ deployed in the cloud or as hybrid models.

Product strategy: In addition to the stand-alone EDQ product, Oracle data quality capabilities are also built into the Autonomous Database and embedded in Oracle Cloud Applications. Oracle also makes the data profiling function directly available for all users where data is used, managed, and organized in Oracle applications. Three-quarters of reference customers claim that they are using Oracle EDQ in ongoing operation of business applications. This is above the survey average of 60% from all vendors.

Core data quality features: Oracle scored above average in the areas of standardization, cleansing, matching, linking, merging, and business rule creation and deployment. Reference customers especially praised its powerful and flexible matching and transformation capabilities using out-of-the-box tools.

Cautions

Technology innovation: Oracle’s innovation primarily focuses on its database and data platform technologies. Gartner has yet to see Oracle introduce innovations and practices directly to EDQ, such as machine-learning-based and metadata-driven automation in data quality processes. Gartner considers the innovative technology roadmap for Oracle’s data quality products less competitive than those of its rivals. Oracle is investing in innovative features specifically for data quality processes (including those powered by AI and ML technologies), which will be available to EDQ customers in future releases.

Performance and scalability: Half of Oracle’s reference customers indicated that they face inadequate performance and scalability when handling very high volumes of data. They also highlighted that tuning options for scalability are not sufficient and expressed a desire for improvements in this area. Oracle EDQ scored below the survey average in this area.

Pricing and licensing: Oracle reference customers continue to identify high prices and a complex licensing model as areas of concern. Oracle also scored below average when it comes to customer satisfaction with the value the product provides for the money spent.

Precisely

Precisely is a Leader in this Magic Quadrant; in the previous edition, it was also a Leader, as Syncsort. Precisely has headquarters in Pearl River, New York, U.S. Its data quality products include the Precisely Trillium product family (version 16, which became generally available in February 2020) and Precisely Spectrum Quality family (version 2019, which became generally available in December 2019). Precisely Trillium includes Trillium Discovery, Trillium Quality and Trillium Geolocation. Precisely Spectrum Quality includes Spectrum Quality, Spectrum Discovery and Spectrum Global Addressing. Precisely has an estimated 4,000 customers for these products. Its operations are geographically diversified, and it has clients in various sectors.

Strengths

Business momentum and market growth: The acquisition of the Pitney Bowes software and data business in December 2019 has extended Precisely’s customer bases and technologies. Precisely now has the third-highest market share (2019) by revenue, right after SAP and Informatica. By adding location intelligence, metadata and data stewardship capabilities, Precisely is now in a better competitive position with more comprehensive product portfolios and cross-sell opportunities.

Core data quality functionality: Precisely reference customers score its core data quality functionality — including parsing, standardization and cleansing, matching, linking and merging — above the survey average, for Trillium product lines. They also praise the maturity of the products, which has been a consistent strength for many years.

Address validation and geocoding capabilities: Precisely was scored highly by its reference customers for both Trillium and Spectrum Quality product lines for address validation and geocoding/spatial data enrichment capabilities. This is a long-standing strength for both product lines.

Cautions

Product strategy and roadmap: Precisely has gone through a rebranding process since the acquisition. It has also disclosed plans for the Trillium and Spectrum lines to converge through a cloud-based architecture, supported by its core technologies. How well it is able to execute on this roadmap remains to be seen and should be closely monitored. In addition, any consolidation of pricing and licensing models or product support should be considered by existing customers of both products.

Ease of installation, upgrade and integration: Reference customers identified ease of installation, deployment, upgrade and migration between versions as challenging areas for the Spectrum Quality product line. In addition, some implementation partners’ staff (those from Pitney Bowes) seem to lack the product knowledge and skills needed. Precisely scored below average for these areas of concern.

Product documentation and end-user training: The completeness, clarity and usefulness of Precisely’s product documentation were rated below average by our reference customers for both the Trillium and Spectrum Quality product lines. Customers commented that documentation needs to be more user friendly. End-user training was also a concern, with initial learning considered to be cumbersome for both products.

Redpoint

Redpoint is a Niche Player in this Magic Quadrant; in the previous edition, it was also a Niche Player. Redpoint has headquarters in Wellesley Hills, Massachusetts, U.S. Its data quality product is Redpoint Data Management (version 9.0, which became generally available in January 2020). The vendor has an estimated 280 customers for this product. Its operations are mostly in North America and Asia, and its clients are primarily in the retail, financial services and healthcare sectors.

Strengths

Customer satisfaction: Redpoint retains a loyal customer base and is among the highest-rated vendors in this Magic Quadrant for customer satisfaction. Some of the company’s reference customers comment that Redpoint has been a great partner for more than 10 years, and that they have seen consistent improvement in Redpoint’s product features and collaborative efforts to meet their business requirements.

Ease of installation, upgrade and use: Reference customers for Redpoint expressed strong satisfaction with the ease of installation, deployment and use of its products and scored it among the highest here. Customers especially highlighted user friendliness, telling us that automations were easy and intuitive. This feedback is consistent with previous Magic Quadrant surveys.

Performance and scalability: Redpoint scored above the average for the performance and scalability of its data quality products, specifically in supporting real-time data processing in Hadoop environments with billions of transaction records.

Cautions

Market growth and visibility: Redpoint has a small customer base in the data quality segment, leading to its low market share. It had a relatively small increase in data-quality-only customer numbers in 2019, and remains relatively little known in the overall data quality market. Only 4% of survey participants considered Redpoint during the vendor selection process. It is also rarely mentioned by users of Gartner’s client inquiry service.

Pricing and licensing model: Reference customers for Redpoint indicated that its solution is expensive compared with those of competitors considered, and that it has a complicated licensing model. They also continued to express a desire for a more-flexible licensing structure. Redpoint’s overall survey scores for pricing and licensing approach and contract flexibility are below the averages for vendors in this Magic Quadrant.

User community: The quality of Redpoint’s peer user community scored below the average in the survey. Reference customers remarked that lack of activity in the company’s user forums meant they were of little benefit. Such resources are important for enabling customers to quickly benefit from the skills and experience of their peers.

SAP

SAP is a Leader in this Magic Quadrant; in the previous edition, it was also a Leader. SAP has headquarters in Walldorf, Germany. Its data quality products include SAP Information Steward (version 4.2.13, which became generally available in December 2019), SAP Data Services (version 4.2.13, which became generally available in December 2019), and SAP Data Intelligence (version 3.0, which became generally available in March 2020), which replaces SAP Data Hub. SAP has an estimated 24,200 customers for these product lines. Its operations are geographically diversified, and it has clients in various sectors.

Strengths

Market growth and presence: SAP continues to have strong growth in market share, ranked as the highest (20%) for data quality products and services. It has 15% YoY combined data quality customer growth. SAP has a large customer base and strong brand recognition. It is often listed in competitive situations.

Comprehensive data management portfolio: SAP data quality products are undergoing a strategic transformation centered on SAP Data Intelligence. This transformation brings tighter integration with existing SAP data management capabilities such as data integration, data preparation, and metadata cataloging and governance. This comprehensive portfolio provides added value to SAP customers with an integrated data management solution that also integrates with SAP’s business applications and business processes.

Product strategy and innovation: SAP Data Intelligence is the next evolution of the SAP Data Hub on-premises solution. All current SAP Data Hub customers will be moved to SAP Data Intelligence at no additional charge and with no loss of functionality. In addition to existing strong metadata capabilities from Data Hub, SAP Data Intelligence adds machine learning content for AI/ML operations. SAP consistently innovates in many areas and is driving the automation of data quality processes.

Cautions

UI interface and dashboard: Reference customers highlighted the need for improvements in user interface for better user experience, especially the management console in Data Services, and rules editor function in Information Steward. Some customers also commented about a lack of dashboard capabilities in some components in SAP Data Services. SAP has just rewritten its UI (in SAP UI5) to refresh and modernize the user experience in the latest version of Information Steward.

Pricing and licensing, value: The overall rating for evaluation and contract negotiation is below average, with SAP’s pricing and licensing approach causing concern among some customers. They have cited concerns related to the cost of the product and the value the product provides for the money spent. This feedback has been consistent for the past several years.

Technical support and product documentation: SAP reference customers continue to cite poor technical support and product documentation as concerns. As such, scores for these areas are below the survey average. SAP has increased its efforts to improve customer support and satisfaction by introducing next-generation support to include channels for real-time support and collaboration.

SAS

SAS is a Leader in this Magic Quadrant; in the previous edition, it was also a Leader. SAS has headquarters in Cary, North Carolina, U.S. Its data quality products are SAS Data Management (version 9.4M6, which became generally available in November 2018), SAS Data Quality Desktop (version 9.4M6, which became generally available in November 2018), SAS Data Quality (version 3.5, which became generally available in April 2020), SAS Data Preparation (version 2.5, which became generally available in April 2020), and SAS Data Governance (version 9.4M6, which became generally available in November 2018). SAS has an estimated 2,700 customers for these product lines. Its operations are geographically diversified, and it has clients in various sectors.

Strengths

Market understanding and growth: SAS has demonstrated strong market understanding and has aligned its sales strategy and marketing accordingly. SAS continues to have solid growth in revenue (6% increase in data quality products in 2019 based on Market Share report), and customer base.

Product strategy and innovation: SAS is strategically transforming its data quality products by bringing them into SAS Viya, a cloud-native platform with improved open-source support. SAS Viya enables tighter integration of data quality functions with SAS analytics, data integration, data preparation and data governance. The vendor has also invested heavily in active metadata and ML-based self-learning capabilities to provide recommendations to users and suggest next best actions during the data preparation process.

Product technical support: SAS technical support has the highest score across all vendors. Its customers appreciated the true partnership through excellent support and customer service from the vendor.

Cautions

Lack of some data quality features: SAS lacks native support in several key emerging technologies, such as packaged machine learning algorithms and techniques to automate parsing of data in its data quality product lines. SAS plans to provide innovative features in 2021 as part of SAS Data Quality on SAS Viya at no additional cost. SAS also does not have a packaged solution for transactional data and machine data. Instead, supporting these data types requires custom development to extend the existing quality knowledge base. SAS’s professional services team typically works with customers on this enhancement.

Ease of use, implementation and upgrade: Some reference customers highlighted the need for improvements in the ease of installation, upgrade and migration for SAS Data Management and SAS Data Quality; specifically, patch management is one area with difficulties. SAS scored below average in these areas of concern.

Pricing and contracts: Even though SAS has simplified its pricing and licensing structure, some reference customers still mentioned that it is difficult to understand the licensing models and the limitations of the licensing agreements. Some also commented that the contract renewal process can be difficult. They highlighted contracts and pricing as areas that require improvement.

Syniti

Syniti is a Niche Player in this Magic Quadrant; in the previous edition, it was also a Niche Player, as BackOffice Associates. Syniti has headquarters in Hyannis, Massachusetts, U.S. Its data quality products are Syniti Data Quality (version 7.1.2, which became generally available in February 2020), Data Stewardship Platform (version 7.1.2, which became generally available in February 2020), and SAP Advanced Data Migration by Syniti (version 7.0.6, which became generally available in September 2019). Syniti has an estimated 370 customers for these product lines. Its operations are mostly in North America and Asia, and its clients are primarily in the pharmaceuticals and life sciences, manufacturing and food sectors.

Strengths

Partnership with SAP and beyond: Syniti has a long history of partnership with SAP. It has a global reseller agreement and elite partnership status in many SAP programs. Syniti provides data quality and data migration solutions and implementation services designed specifically for SAP. Syniti has recently been seen to extend its partnership with multiple other big vendors, leveraging its strength in system integration and implementation experiences. With this extended partner ecosystem, Syniti has continued to broaden its sales opportunities.

Multidomain and business-driven workflow: Syniti’s solution provides good support for all data domains and offers capabilities suitable for a wide variety of customer use cases and data journeys. The score is among the highest in this Magic Quadrant. Reference customers also highly praised the business-driven workflow of its data quality products.

Pricing, licensing and value: Refinements to the pricing and licensing structure of Syniti’s data quality solution have been well received by our reference customers. They scored Syniti among the highest in pricing and contract flexibility. The value of the vendor’s data quality solution in relation to its cost was also rated well above the average.

Cautions

Market growth and visibility: There were some increases in Syniti’s customer base last year, but a very limited increase in revenue. Eighty percent of its revenue came from consulting and professional services. Syniti’s data quality solution is mentioned very infrequently in Gartner client inquiries, and we see it mentioned in few competitive situations. In the survey for this Magic Quadrant, few respondents listed Syniti in their top three candidates for data quality solution providers.

Data profiling, address validation and location spatial data enrichment: Reference customers for Syniti scored its support for data profiling, address standardization and validation, and location enrichment capabilities lower than the average for vendors in this Magic Quadrant.

Professional services scarcity: Reference customers specifically indicated that externally sourced, experienced resources are difficult to find. However, good ETL/SQL resources can be easily trained in the methodology and can come up to speed quite quickly. Nevertheless, reference customers would like to see more system integrators and outsourcers with skills in Syniti’s products.

Talend

Talend is a Leader in this Magic Quadrant; in the previous edition, it was also a Leader. Talend has headquarters in Redwood City, California, U.S. Its data quality product is Talend Data Management Platform (version Winter ’20, which became generally available in January 2020). Talend has an estimated 1,800 licensed customers for this product line. It also has two open-source data quality products: Talend Open Studio for Data Quality (version 7.2 from June 2019) and Talend Data Preparation Free Desktop (version 2.5 from May 2018). Its operations are geographically diversified and its clients are primarily in the media and services, financial services, and manufacturing sectors.

Strengths

Business momentum and market strategy: Talend executed strongly in the data quality solution market in 2019, growing license revenue by approximately 47% and data quality customers by 21%. Talend demonstrated good market understanding, healthy sales and a marketing strategy aligned with emerging trends. Talend increasingly appears in competitive situations seen by Gartner.

Core data quality capabilities: Talend reference customer ratings for overall product capabilities were above the average. The areas of data profiling, standardization and cleansing, matching, linking and merging, and multidomain support met customer expectations well.

Ease of use, implementation, and integration: Some reference customers praised Talend for its user friendliness for business and nonbusiness users, and its simple and robust SaaS solution. Talend’s customers also said that integration with other applications was easy and flexible, and that the product offers a large set of connectors to other technical tools.

Cautions

Technical support: Talend’s reference customers raised concerns about the vendor’s technical support. It rated below the average across all vendors for not meeting customer expectations due to slow responses or poor reactivity. Specifically, some customers commented that technical support and follow-up of open cases required improvement. Talend has recently reorganized the Customer Success function with new leadership and processes.

Monitoring and reporting: Talend’s reference customers indicate that monitoring capability is not sufficient to support various use cases, and that improvement is required in reporting and dashboarding for monitoring of flows.

Communication of product roadmap: Some reference customers indicated a lack of communication regarding Talend’s product roadmap and said that its vision for the future was not very clear and not well shared.

Vendors Added and Dropped

We review and adjust our inclusion criteria for Magic Quadrants as markets change. As a result of these adjustments, the mix of vendors in any Magic Quadrant may change over time. A vendor’s appearance in a Magic Quadrant one year and not the next does not necessarily indicate that we have changed our opinion of that vendor. It may be a reflection of a change in the market and, therefore, changed evaluation criteria, or of a change of focus by that vendor.

Added

Infogix

Melissa Data

Precisely (formerly included as Syncsort)

Syniti (formerly included as BackOffice Associates)

Dropped

Pitney Bowes (acquired by Syncsort in December 2019)

Inclusion and Exclusion Criteria

The inclusion criteria represent the specific attributes that vendors had to have in order to be included in this Magic Quadrant.

To be included, vendors had to fulfill all the following criteria. They must:

Offer stand-alone on-premises software solutions and cloud-based services that are positioned, marketed and sold specifically for general-purpose data quality applications. Vendors that provide several data quality product components must demonstrate that these are integrated, and collectively meet the full inclusion criteria for this Magic Quadrant.

Deliver core data quality functions for the following, at minimum: profiling, parsing, standardization, cleansing, interactive visualization, matching, multidata domain support, business-driven workflow, business rule creation and rule-based data validation.

Support multiple data domains and diverse use cases across different industries.

Support data quality functionality with packaged capabilities in more than one language and for more than one region.

Support the above functions in both scheduled (batch) and interactive (real-time) modes.

Enable large-scale deployment via server-based and cloud-based runtime architectures that can support concurrent users and applications.

Maintain an installed base of at least 100 current production (maintenance or subscription fee-paying) customers for their data quality product(s).

The customer base for production deployment must include customers in more than one region (North America, Latin America, EMEA, Africa and Asia/Pacific).

The following types of vendor were excluded from this Magic Quadrant, even if their products met the above criteria:

Vendors that are limited to deployments in a single specific application environment, industry, language, or data domain are excluded, because they do not provide complete market coverage.

Vendors that support limited data quality functionalities or address very specific data quality problems (for example, address cleansing and validation) are excluded, because they do not provide the complete suites of data quality functionality expected of today’s data quality solutions.

Vendors that operate only in a single country and support only one language.

Vendors that lack integrability or interoperability with other data management solutions, such as metadata management, MDM or data integration solutions.

Vendors that support only on-premises deployment and have no option for cloud-based deployment on any public cloud environment (for example, AWS, Azure, or Google Cloud).

Evaluation Criteria

Ability to Execute

Gartner analysts evaluate technology vendors on the quality and efficacy of the processes, systems, methods and procedures that enable their performance to be competitive, efficient and effective, and to positively affect their revenue, retention and reputation.

Gartner evaluates vendors’ Ability to Execute in the data quality solutions market by using the following criteria:

Product or service: The vendor’s core goods and services that compete in and/or serve the defined market. Included are current product and service capabilities, quality, feature sets, skills and so on. Products and services can be offered natively or through OEM agreements/partnerships, as defined in the Market Definition and detailed in the subcriteria.

Overall viability: Viability includes an assessment of the overall organization’s financial health, the financial and practical success of the business unit, and the likelihood that the individual business unit will continue offering and investing in the product(s). The vendor’s financial strength (as assessed by revenue growth, profitability and cash flow) and the strength and stability of its people and organizational structure are considered. This criterion reflects buyers’ increased openness to considering newer, less-established and smaller providers with differentiated offerings.

Sales execution/pricing: The organization’s capabilities in all presales activities and the structure that supports them. Included are deal management, pricing and negotiation, presales support and the overall effectiveness of the sales channel. We evaluate the effectiveness of the vendor’s pricing model in light of current and future customer demand trends and spending patterns (for example, operating expenditure and flexible pricing), as well as the effectiveness of its direct and indirect sales channels.

Market responsiveness/record: The vendor’s ability to respond, change direction, be flexible and achieve competitive success as opportunities develop, competitors act, customer needs evolve and market dynamics change. This criterion also considers the vendor’s history of responsiveness to changing market demands. We evaluate the degree to which the vendor has demonstrated the ability to respond successfully to market demand for data quality capabilities over an extended period.

Marketing execution: The clarity, quality, creativity and efficacy of programs designed to deliver the organization’s message. This messaging is intended to influence the market, promote the brand and the business, increase brand awareness, and establish a positive identification with the product/brand and organization in the minds of buyers. This “mind share” can be driven by a combination of publicity, partnerships, promotional initiatives, thought leadership, social media, referrals and sales activities. We evaluate the overall effectiveness of a vendor’s marketing efforts, the degree to which it has generated mind share, and the magnitude of the market share achieved as a result.

Customer experience: Relationships, products and services/programs that enable clients to be successful with the products evaluated. Specifically, we include the quality of technical and account support that customers receive. We may also include ancillary tools, customer support programs, availability of user groups, SLAs and so on. We evaluate the level of satisfaction expressed by customers with a vendor’s product support and professional services. We also assess their overall relationship with the vendor, as well as customer perceptions of the value of the vendor’s data quality solution relative to costs and expectations.

Operations: The vendor’s ability to consistently meet its goals and commitments. Factors considered include the quality of the organizational structure, skills, experiences, programs, the stability of key staff and other means that enable the vendor to operate effectively and efficiently.

Table 1: Ability to Execute Evaluation Criteria

Product or Service: High
Overall Viability: Medium
Sales Execution/Pricing: High
Market Responsiveness/Record: Medium
Marketing Execution: Medium
Customer Experience: High
Operations: Low

Source: Gartner
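As an illustration of how qualitative weightings like those in Table 1 could feed a single score, the following is a minimal sketch. The numeric values assigned to High, Medium and Low, the 0-to-5 rating scale and the function name are all assumptions made for this example; the source does not state how Gartner converts weightings into positions.

```python
# Hypothetical numeric values for the qualitative weightings.
# These are an assumption for illustration; the source gives no numbers.
WEIGHT_VALUES = {"High": 3, "Medium": 2, "Low": 1}

# Weightings exactly as listed in Table 1.
ABILITY_TO_EXECUTE = {
    "Product or Service": "High",
    "Overall Viability": "Medium",
    "Sales Execution/Pricing": "High",
    "Market Responsiveness/Record": "Medium",
    "Marketing Execution": "Medium",
    "Customer Experience": "High",
    "Operations": "Low",
}


def weighted_score(ratings, weightings=ABILITY_TO_EXECUTE, values=WEIGHT_VALUES):
    """Return the weighted average of per-criterion ratings."""
    missing = set(weightings) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(values[w] for w in weightings.values())
    weighted_sum = sum(ratings[c] * values[w] for c, w in weightings.items())
    return weighted_sum / total_weight


# Hypothetical ratings for one vendor on an assumed 0-to-5 scale.
example_ratings = {
    "Product or Service": 4.5,
    "Overall Viability": 3.0,
    "Sales Execution/Pricing": 4.0,
    "Market Responsiveness/Record": 3.5,
    "Marketing Execution": 3.0,
    "Customer Experience": 2.5,
    "Operations": 4.0,
}

print(weighted_score(example_ratings))  # 3.5 under the assumed weights
```

Under these assumed weights, the three High criteria account for 9 of the 16 weight points, so a weak Customer Experience rating pulls the score down far more than a weak Operations rating would.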

Completeness of Vision

Gartner analysts evaluate vendors on their ability to convincingly articulate logical statements. The evaluation covers current and future market direction, innovation, customer needs and competitive forces, and how well they correspond to Gartner’s view of the market. Gartner assesses vendors’ Completeness of Vision in the data quality solutions market by using the following criteria:

Market understanding: The degree to which the vendor leads the market in new directions (in terms of technologies, products, services or otherwise). The vendor’s ability to adapt to significant market changes and disruptions, such as by supporting business-centric roles and providing advanced data quality functionality for the IoT (connectivity and deployment), data lakes, streaming data and external data. Also considered is the degree to which vendors are aligned with the significant trend of convergence with other data-management-related markets — specifically, the markets for data integration tools and MDM solutions.

Marketing strategy: We look for clear, differentiated messages, consistently communicated internally and externally through channels, social media, advertising, customer programs and positioning statements. Also considered is the degree to which the vendor’s marketing approach aligns with and/or exploits emerging trends (such as bimodal data governance and business-centric data quality programs) and the overall direction of the market.

Sales strategy: We look for a sound strategy for selling products that uses an appropriate network of direct and indirect sales resources, partnerships, and marketing, service and communication affiliates. The goal is to extend the scope and depth of the vendor’s market reach, skills, expertise, technologies, services and customer base. We particularly assess the use of partnerships. A sound sales strategy also aligns sales models with customers’ preferred buying approaches, such as freemium programs and subscription-based pricing.

Offering (product) strategy: This criterion concerns the vendor’s product development and delivery approach, emphasizing differentiation, functionality, product portfolio, methodology and features as these map to current and future requirements. It also covers the degree to which the vendor’s product roadmap reflects demand trends, fills current gaps or weaknesses, and emphasizes competitive differentiation. Also considered is the breadth of the vendor’s strategy regarding a range of product and service delivery models, from traditional on-premises deployment to SaaS and cloud-based models.

Business model: This criterion concerns the design, logic and execution of the organization’s business proposition for revenue growth and sustained success. We consider the vendor’s overall approach to executing its strategy for the data quality solutions market. This approach includes delivery models, funding models (public or private), development strategies, packaging and pricing options, and partnership types (such as joint marketing, reselling, OEM and system integration/implementation).

Vertical/industry strategy: We assess the vendor’s strategy to direct resources, skills and offerings to meet the specific needs of individual market segments, including vertical markets. The degree of emphasis that the vendor places on vertical-market solutions is considered, as is the depth of its vertical-market expertise, including certifications.

Innovation: We assess the extent to which the vendor demonstrates creative energy in thought leadership and in differentiating ideas and product roadmaps that could significantly extend or even reshape the market in a way that adds value for customers. In particular, we examine how well vendors support, or plan to support, key trends with regard to personas, data diversity, latency, data quality analytics, intelligent capabilities and deployment.

Geographic strategy: We evaluate the vendor’s strategy to direct resources, skills and offerings to meet the specific needs of geographies outside its “home” geography, either directly or through partners, channels and subsidiaries, as appropriate. We do so in light of global demand for data quality capabilities and expertise.

Table 2: Completeness of Vision Evaluation Criteria

Market Understanding: High
Marketing Strategy: Medium
Sales Strategy: Medium
Offering (Product) Strategy: High
Business Model: Low
Vertical/Industry Strategy: Low
Innovation: High
Geographic Strategy: Medium

Source: Gartner

Quadrant Descriptions

Leaders

Leaders demonstrate strength in depth across the full range of data quality functions, including core functions (parsing, standardization and cleansing), profiling, interactive visualization, matching, multidomain support, business-driven workflow, business rule development and data validation.

Leaders exhibit a clear understanding of dynamic trends in the data quality market; they explore and execute thought-leading and differentiating ideas; and they deliver product innovations based on the market’s demands.

Leaders align their product strategies with the latest market trends. These trends include focusing on a nontechnical audience, trust-based governance, growth in data diversity, low data latency, data quality analytics (not just reporting) and intelligent capabilities (such as machine learning and artificial intelligence). Other trends are new delivery options (such as cloud, hybrid cloud and IoT edge deployment), and alternative pricing and licensing models (such as open source and subscriptions).

Leaders address all industries, geographies, data domains and use cases. Their products support multidomain and alternative deployment options such as SaaS or microservices. They offer excellent support for business roles and easy-to-use visualization, and include out-of-the-box machine learning and predictive analytics.

Leaders offer extensive support for a variety of traditional and new data sources (including cloud platforms, IoT platforms, Hadoop and mobile devices), a trust-based governance model, and delivery of enterprise-level data quality implementations.

Leaders have significant size, an established market presence, and a multinational presence (either directly or through a parent company).

Leaders undertake clear, creative and effective marketing, which influences the market, promotes their brand, and increases their mind share.

Challengers

Challengers have established presence, credibility and viability, along with robust product capabilities and solid sales and marketing execution.

Challengers may not have the same breadth of offering as Leaders, and/or in some areas may not demonstrate as much thought leadership and innovation. For example, they may focus on a limited number of data domains (customer, product and location data, for example).

Challengers may lack capabilities in areas such as streaming data, machine learning, predictive analytics and support for new data sources.

Compared with Leaders, Challengers often exhibit less understanding of some areas of the market, and their product strategies may suffer from a lack of differentiation.

Visionaries

Visionaries are innovators.

Visionaries demonstrate a strong understanding of trends in the market. These include a focus on a nontechnical audience, trust-based governance, growth in data diversity, low data latency, data quality analytics and intelligent capabilities (such as machine learning). Also included are new delivery options (such as cloud and IoT edge deployment), and alternative pricing models (such as open source and subscriptions). Visionaries’ product capabilities are mostly aligned with these trends, but not as completely as those of Leaders.

Although Visionaries can deliver good customer experiences, they may lack the scale, market presence, brand recognition, customer base and resources of Leaders.

Niche Players

Niche Players often specialize in a limited number of industries, geographic areas, market segments (such as small and midsize businesses) or data domains (such as customer data or product data). They often have strong offerings for their chosen areas of focus and deliver substantial value for customers in those areas.

However, Niche Players typically have limited market share and presence, limited functionalities, or lack financial strength. Niche Players often have to catch up with the latest innovations, such as the IoT (connectivity and deployment), machine learning and interactive visualization.

Context

Every organization — no matter how big or small — needs data quality. Every business process needs data quality whether it is as simple as processing a new order, or as complicated as approving a loan application. Every decision needs data quality whether it’s decided offline or in real time. Data quality is vital for every aspect of business operation. However, Gartner reference survey data indicates that managing data quality issues across the organizational landscape is increasingly cited as a top challenge (by 60% of respondents) to data management practice (see “Survey Analysis: Data Management Struggles to Balance Innovation and Control”).

Organizations with multiple business units operating in several geographic regions with many customers, employees, suppliers and products will inevitably face more severe data quality issues. Low levels of data literacy and silo-oriented attitudes, prevalent among senior business leaders, often result in a lack of investment in systemic and sustainable data quality improvement. Consequently, key business goals, such as financial performance and customer experience, are adversely impacted.

Gartner’s Magic Quadrant customer reference survey shows that organizations estimate the average cost of poor data quality at $12.9 million every year. This number is likely to rise as business environments become increasingly digitalized and complex. As data quality is seen as a mandatory aspect of every business, data quality solutions are in greater demand, and are often embedded into critical business applications such as CRM and ERP.

Use this Magic Quadrant to help you find the right vendor and product for your organization’s needs. Gartner strongly advises against selecting a vendor simply because it is in the Leaders quadrant. A Challenger, Niche Player or Visionary could be the best match for your requirements. Use this Magic Quadrant in combination with the companion “Critical Capabilities for Data Quality Solutions” and “Toolkit: RFP Template for Data Quality Tools,” as well as Gartner’s client inquiry service.

Given the current economic and market conditions caused by the COVID-19 pandemic, it is also important to analyze cost-saving opportunities by looking at the nontechnological characteristics of vendors, such as acquisitions processes, pricing models, speed of deployment, total cost of ownership, availability of skills, and support and service capabilities.

In addition, application leaders should take the following actions to improve data quality best practices and optimize the use of modern data quality solutions:

Embrace adaptive data and analytics governance by certifying the trust levels of data sources and data itself, and identifying the data quality requirement for each level, because it may not be practical or possible to achieve 100% perfect data. This will enable a more focused approach to prioritizing data quality improvement efforts, and create agility and flexibility for collaboration among stakeholders.

Gauge the level of technical innovation exhibited by data quality vendors by evaluating their data quality intelligence capabilities, such as machine learning, predictive analytics and knowledge bases, to meet the challenges posed by data of increasing diversity.

Train nontechnical business users to become more data literate so that they can meaningfully use the new automation capabilities offered to them. Ensure that they can properly interpret and manage data in its business context, and decide on its proper use while adhering to data governance and data security requirements.

Market Overview

Trusted, high-quality data is a vital component for the success of digital initiatives. Organizations are accelerating their digital transformations by introducing digital products, adopting cloud computing, modernizing their business processes and embracing distributed infrastructure that leaves data at the edge. At the same time, they face greater data quality challenges from a mix of diversified and distributed datasets. In addition, growing regulatory requirements from governments and industries, such as GDPR and CCPA, put more pressure on organizations to manage personal data properly. One of the most important tasks of any data quality initiative is incorporating regulatory requirements into the architecture of products and services.

Data and analytics leaders need to understand the importance of empowering nontechnical business users as the primary audience for data quality solutions, and adopting more flexible, trust-based data governance models. They are applying these tools to a growing array of use cases, including enterprise operations, data integration, data migration, MDM and — in two of the hottest trends — data and analytics governance, and AI development.

During the COVID-19 pandemic, we have also seen public health agencies and healthcare organizations heavily using data quality solutions to prepare data from a wide range of sources for virus detection and outbreak prediction.

Data quality vendors are competing to address these requirements by introducing an array of innovations and technologies. These include machine learning, active metadata repositories, and knowledge graphs for impact analysis, all of which they are embedding in data quality solutions. Consequently, the data quality solutions market has changed dramatically over the past few years. It has evolved from providing simple applications used mainly by IT for a single purpose (such as data cleansing), to complete solutions for addressing a range of data quality problems, with built-in workflows, knowledge bases, collaboration, and automation.

Gartner is seeing the following shifts in the focus of the data quality solutions market:

Automation: Embedded AI is being used to reduce manual tasks and drive better automation, especially in areas that traditionally require intensive manual tasks.

Integration: Integration with metadata management, data integration, data preparation and data governance solutions, as end-to-end data management solutions.

Extensibility: Extended data quality functions across heterogeneous data sources and landscapes from a single data quality platform.

Scalability: Scaled over a wide range of data sources, data volumes, use cases and latencies.

Simplicity: Persona-based UIs and workflows, “design centrally, deploy anywhere” methodologies, and simplified licensing models.

The data quality solutions market has continued to grow strongly, reaching $1.77 billion in 2019, an increase of 6.2% over 2018 (see “Market Share: Data Quality Tools, Worldwide, 2019”). Gartner’s interactions with clients on data quality topics have also shown high demand, increasing 45% between 2018 and 2019.
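The two figures above imply a 2018 market size that the text does not state directly. The short sketch below derives it; the variable and function names are illustrative, and the result is back-of-the-envelope arithmetic from the rounded figures quoted here, not a number published in the source.

```python
market_2019_busd = 1.77  # 2019 market size, billions of US dollars (from the text)
growth_rate = 0.062      # 6.2% increase over 2018 (from the text)


def implied_prior_value(current, growth):
    """Back out the prior-period value from the current value and growth rate."""
    return current / (1.0 + growth)


market_2018_busd = implied_prior_value(market_2019_busd, growth_rate)
absolute_increase_busd = market_2019_busd - market_2018_busd

print(f"Implied 2018 market size: ${market_2018_busd:.2f} billion")
print(f"Implied absolute increase: ${absolute_increase_busd:.2f} billion")
```

On these inputs, the implied 2018 market is roughly $1.67 billion, so the 6.2% growth corresponds to about $0.10 billion of additional revenue.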

In 2019, approximately 50% of the market was represented by four vendors: SAP, Informatica, Precisely and Experian. The remaining half was divided among a large number of providers, including some megavendors (such as IBM, Oracle and SAS) and smaller players (such as Ataccama, Talend and Information Builders).

As the data quality solutions market matures, and data quality solutions are increasingly viewed as must-have applications, choosing a vendor based on product capabilities alone will become increasingly difficult. For example, many core data quality features (such as profiling, matching, linking and cleansing) are very similar across the mainstream data quality vendors.

As data management vendors across all markets use AI to simplify and accelerate the productivity of their data management activities, the data quality market is pursuing the same trend. Vendors are heavily investing in emerging technologies like augmented data quality to differentiate themselves, as seen in “Hype Cycle for Data Management, 2020.”

Newer technologies typically offer more out-of-the-box functionality and proven methodologies. Consequently, the market’s smaller vendors are increasingly challenged to maintain high levels of investment and are often forced to become, or stay as, Niche vendors.

At the same time, however, customer dissatisfaction (with the high prices typically charged by larger vendors, their relatively inflexible pricing models, less-attentive customer support and service, and lengthy deployment times) creates opportunities for innovative, smaller vendors and startups. The larger vendors recognize this threat and are responding, albeit slowly, by offering additional implementation resources and alternative pricing options.

Considering the impact of the COVID-19 pandemic, enterprises are looking for competitive pricing models to reduce costs. This is another important factor for all vendors striving for competitive advantage.

Evidence

The analysis in this document is based on information from a number of sources, including:

An RFI process that engaged vendors in this market. It elicited extensive data on functional capabilities, customer base demographics, financial status, pricing and other quantitative attributes.

Interactive briefings in which each vendor provided Gartner with updates on its strategy, market positioning, recent key developments and product roadmap.

A web-based survey of reference customers identified by each vendor. This captured data on usage patterns, levels of satisfaction with major product functionality categories, various non-technology-related vendor attributes (such as pricing, product support and overall service delivery), and more. In total, 154 organizations associated with 16 vendors across all major regions provided input on their experiences with vendors and their solutions.

Feedback about solutions and vendors captured during conversations with users of Gartner’s client inquiry service.

Market share and revenue growth estimates developed by Gartner’s technology and service provider research unit.

Peer feedback from Gartner Peer Insights, which is a peer-driven ratings and reviews platform for enterprise IT solutions and services covering more than 300 technology markets and 3,000 vendors.

Evaluation Criteria Definitions

Ability to Execute

Product/Service: Core goods and services offered by the vendor for the defined market. This includes current product/service capabilities, quality, feature sets, skills and so on, whether offered natively or through OEM agreements/partnerships as defined in the market definition and detailed in the subcriteria.

Overall Viability: Viability includes an assessment of the overall organization's financial health, the financial and practical success of the business unit, and the likelihood that the individual business unit will continue investing in the product, will continue offering the product and will advance the state of the art within the organization's portfolio of products.

Sales Execution/Pricing: The vendor's capabilities in all presales activities and the structure that supports them. This includes deal management, pricing and negotiation, presales support, and the overall effectiveness of the sales channel.

Market Responsiveness/Record: Ability to respond, change direction, be flexible and achieve competitive success as opportunities develop, competitors act, customer needs evolve and market dynamics change. This criterion also considers the vendor's history of responsiveness.

Marketing Execution: The clarity, quality, creativity and efficacy of programs designed to deliver the organization's message to influence the market, promote the brand and business, increase awareness of the products, and establish a positive identification with the product/brand and organization in the minds of buyers. This "mind share" can be driven by a combination of publicity, promotional initiatives, thought leadership, word of mouth and sales activities.

Customer Experience: Relationships, products and services/programs that enable clients to be successful with the products evaluated. Specifically, this includes the ways customers receive technical support or account support. This can also include ancillary tools, customer support programs (and the quality thereof), availability of user groups, service-level agreements and so on.

Operations: The ability of the organization to meet its goals and commitments. Factors include the quality of the organizational structure, including skills, experiences, programs, systems and other vehicles that enable the organization to operate effectively and efficiently on an ongoing basis.

Completeness of Vision

Market Understanding: Ability of the vendor to understand buyers' wants and needs and to translate those into products and services. Vendors that show the highest degree of vision listen to and understand buyers' wants and needs, and can shape or enhance those with their added vision.

Marketing Strategy: A clear, differentiated set of messages consistently communicated throughout the organization and externalized through the website, advertising, customer programs and positioning statements.

Sales Strategy: The strategy for selling products that uses the appropriate network of direct and indirect sales, marketing, service, and communication affiliates that extend the scope and depth of market reach, skills, expertise, technologies, services and the customer base.

Offering (Product) Strategy: The vendor's approach to product development and delivery that emphasizes differentiation, functionality, methodology and feature sets as they map to current and future requirements.

Business Model: The soundness and logic of the vendor's underlying business proposition.

Vertical/Industry Strategy: The vendor's strategy to direct resources, skills and offerings to meet the specific needs of individual market segments, including vertical markets.

Innovation: Direct, related, complementary and synergistic layouts of resources, expertise or capital for investment, consolidation, defensive or pre-emptive purposes.

Geographic Strategy: The vendor's strategy to direct resources, skills and offerings to meet the specific needs of geographies outside the "home" or native geography, either directly or through partners, channels and subsidiaries as appropriate for that geography and market.

Melody Chien

Sr Director Analyst

Ankush Jain

Sr Principal Analyst


© 2020 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. It consists of the opinions of Gartner's research organization, which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or influence from any third party. For further information, see "Guiding Principles on Independence and Objectivity."