Petrolink, a leading provider of data management solutions for the oil and gas industry, has open-sourced its Data Quality Algorithm on GitHub. The algorithm is a Python library that calculates six data quality dimensions for any data set and can be easily integrated into other projects. Petrolink also provides sample code showing how to use the library to process data stored in a comma-separated values (CSV) file.
"Releasing the algorithm as open-source code provides transparency to data providers, operators, and data-consuming applications. We are passionate about continuous improvement in data monitoring and data quality, not just for our customers but for everyone. By sharing the code, we want to promote an open dialogue between data providers and consumers.
There should be no secret as to how data quality is measured, calculated, and scored. The importance of good-quality data cannot be overemphasized when you consider the safety risks and costly errors that bad data can cause. We hope that the broader community will collaborate on which dimensions should be measured and agree on the algorithms used to calculate each dimension."
Peter Gonzalez, Chief Strategy Officer of Petrolink International
About Petrolink: Petrolink is a global leader in data management solutions for the oil and gas industry. Petrolink’s mission is to empower its clients with the best data and insights to optimize their operations and performance. Petrolink offers a range of services and products, such as data acquisition, data integration, data analytics, data visualization, and data quality. Petrolink is committed to supporting the open standards community and contributing to the advancement of the industry. For more information, visit www.petrolink.com, email info@petrolink.com, or follow Petrolink on LinkedIn.
To access Petrolink's open-source Data Quality Algorithm, visit https://github.com/Petrolink/Data-Quality