Data Management
Importance of data management
Data management is the process of capturing, storing, and collating the data created by different applications in your company so that it is accurate, consistent, and available when needed. It includes developing policies and procedures for managing your end-to-end data life cycle. The following elements of the data life cycle are particularly relevant to HPC applications and show why data management policies need to be in place:
- Cleaning and transforming raw data so that detailed analysis is not skewed by faulty or inconsistent records.
- Designing and building data pipelines to automatically transfer data from one system to another.
- Extracting, transforming, and loading (ETL) data from disparate sources into appropriate storage systems such as databases, data warehouses, object storage, or filesystems.
- Building data catalogs that store metadata, making data easier to find and its lineage easier to track.
- Following the policies and procedures outlined by your data governance model. This also involves meeting the compliance requirements of the federal and regional authorities of the country where data is being captured and stored. For example, if you are a healthcare organization in California, United States, you need to follow both federal and state data privacy laws, including the Health Insurance Portability and Accountability Act (HIPAA) and California's health data privacy law, the Confidentiality of Medical Information Act (CMIA). You also need to follow the California Consumer Privacy Act (CCPA), which took effect on January 1, 2020, as it relates to healthcare data. If you are in the European Union, you have to comply with the General Data Protection Regulation (GDPR).
- Protecting your data from unauthorized access, whether it is at rest or in transit.
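To make the cleaning, ETL, pipeline, and catalog elements above more concrete, the following is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a production design: the sample CSV, the `runs` and `data_catalog` table names, the `job_output.csv` source name, and every function name are hypothetical, and an in-memory SQLite database stands in for whatever database or data warehouse you actually use.

```python
import csv
import hashlib
import io
import sqlite3
from datetime import datetime, timezone

# Hypothetical raw output from an HPC job: note the stray whitespace,
# the missing runtime value, and the duplicate row.
RAW_CSV = """run_id,node,runtime_s
r1, node-a ,12.5
r2,node-b,
r1, node-a ,12.5
r3,NODE-C,7.25
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_and_transform(rows):
    """Clean: drop incomplete and duplicate rows. Transform: normalize types."""
    seen, result = set(), []
    for row in rows:
        run_id = row["run_id"].strip()
        runtime = row["runtime_s"].strip()
        if not runtime or run_id in seen:
            continue  # skip rows with a missing runtime or a repeated run_id
        seen.add(run_id)
        result.append((run_id, row["node"].strip().lower(), float(runtime)))
    return result


def load(conn, records):
    """Load: write the cleaned records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(run_id TEXT PRIMARY KEY, node TEXT, runtime_s REAL)"
    )
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", records)


def catalog(conn, source_name, raw_text, table, row_count):
    """Catalog: record where the table came from so lineage can be traced."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_catalog "
        "(table_name TEXT, source TEXT, source_sha256 TEXT, "
        "row_count INTEGER, loaded_at TEXT)"
    )
    digest = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO data_catalog VALUES (?, ?, ?, ?, ?)",
        (table, source_name, digest, row_count, loaded_at),
    )


def run_pipeline(conn, source_name, raw_text):
    """Chain the steps so data moves from source to store without manual work."""
    records = clean_and_transform(extract(raw_text))
    load(conn, records)
    catalog(conn, source_name, raw_text, "runs", len(records))
    conn.commit()
    return records


conn = sqlite3.connect(":memory:")  # assumption: stand-in for a real data store
loaded = run_pipeline(conn, "job_output.csv", RAW_CSV)
print(loaded)
```

In this sketch the row with the missing runtime and the duplicate `r1` row are dropped, so two records reach the `runs` table, and the catalog entry stores a SHA-256 digest of the source so that a later audit can confirm which input produced the table. Governance and access protection are deliberately out of scope here; they depend on the policies and tooling of your organization.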