Re: How to create large data sets for performance test

  • From: Lothar Flatz <l.flatz@xxxxxxxxxx>
  • To: loknath.73@xxxxxxxxx, Oracle L <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 30 May 2023 11:30:04 +0200

Hi Lok,

Actually that is not the kind of job that should be done in the cloud, as it is resource heavy and needs a lot of control.
Be especially careful when you replace Exadata with different hardware in a DWH context. Exadata has capabilities that no other hardware provides.
There is one thing that puzzles me: if you are considering moving the data to the cloud, wouldn't the same security concerns apply as you have now?
If you decide to move the data to the cloud once your test is successful, why would it be OK then, but not now?

Other than that: you could consider changing just the sensitive customer data and leaving the keys (foreign or primary) as they are (a rough sketch of the idea is below). I hope your keys are just meaningless numbers, aren't they?
I know that there are tools on the market to anonymize data. Even Oracle sells them. I've never used any, though.
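To make the idea concrete (just a rough sketch, not any of the commercial tools; the key name and sample value below are made up): sensitive columns can be replaced with keyed hashes, so the same input always produces the same masked output and joins across tables stay consistent, e.g. in Python:

import hashlib
import hmac

# Keep this key only in the source environment; the test copy only
# ever sees the HMAC output, never the original values.
MASKING_KEY = b"keep-this-secret-outside-the-test-env"

def mask_value(value: str, length: int = 16) -> str:
    """Deterministically pseudonymize a sensitive value.

    The same input always yields the same output, so masked columns
    still join correctly across tables; the mapping is one-way as
    long as MASKING_KEY stays secret.
    """
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]

# The same customer name appears in several tables; masking it with
# the same function keeps those rows joinable in the test system.
print(mask_value("ACME Corp"))   # some hex string
print(mask_value("ACME Corp"))   # identical output, so joins still work

As said above, meaningless numeric keys can stay as they are; only natural identifiers (names, account numbers and the like) need this treatment.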

Regards

Lothar

On 29.05.2023 at 22:55, Lok P wrote:

Hello Listers,
We have an existing production system (Oracle Database 19c on Exadata) which is live and running on premise (it is a financial system). This data is replicated to the cloud (AWS S3/data lake), where multiple transformations happen, and it is finally moved to multiple downstream systems/databases such as Redshift, Snowflake, etc., on which reporting and analytics APIs/applications run.

To do performance tests on these reporting/analytics applications and on the data pipeline, we need a similar volume of data generated with the same data pattern/skewness and the same level of integrity constraints as exists in the current Oracle production database. The performance of databases like Snowflake depends heavily on how incoming data is clustered, so it is important that we have a similar data pattern/skewness/order as in the production environment, or else the test won't give accurate results.

We are getting ~500 million rows loaded into our key transaction table on a daily basis in the current production system (at least 5-6 tables are ~10+ TB in size in the production Oracle database). The current production system holds ~6 months of data, and we want to run the performance test on at least ~3 months' worth of data.
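Just to illustrate the kind of skew I mean (the numbers and column names below are made up, not our actual schema): something like sampling customer ids from a heavy-tailed distribution and loading rows in time order, e.g. in Python with NumPy/pandas:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

N_ROWS = 1_000_000        # scale up towards the real daily volume (~500M)
N_CUSTOMERS = 50_000      # made-up cardinality; take it from production stats

# Heavy-tailed (Zipf-like) customer ids: a handful of customers account
# for most of the transactions, which is the kind of skew that matters
# for clustering in the downstream systems.
customer_id = (rng.zipf(a=1.3, size=N_ROWS) % N_CUSTOMERS) + 1

# Rows arrive roughly in time order, so the load order in the test
# resembles the production pipeline.
seconds = np.sort(rng.integers(0, 86_400, size=N_ROWS))
txn_ts = pd.to_datetime("2023-05-29") + pd.to_timedelta(seconds, unit="s")

df = pd.DataFrame({
    "customer_id": customer_id,
    "txn_ts": txn_ts,
    "amount": rng.gamma(shape=2.0, scale=50.0, size=N_ROWS).round(2),
})
print(df["customer_id"].value_counts().head())   # check the skew

Getting the distributions, cardinalities and correlations to actually match production is the hard part, though.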

We thought of copying the current Oracle production data to the performance environment; however, the production data has many sensitive customer data columns which can't be moved to other environments because of compliance restrictions. Also, joining on masked columns can be challenging if the masked values are not the same across tables. So I wanted to understand from the experts whether there is any easy way (or any tool, etc.) to generate similar performance data at such a high volume in a short time for this performance testing need?

Regards
Lok

--
//www.freelists.org/webpage/oracle-l

