TRENDING
The Saudi Central Bank, SAMA, has also released a risk management framework for banks practising Islamic banking. In addition, financial institutions must tackle the risks associated with global warming, since the impact of climate change could endanger assets worth trillions of dollars globally.
Plus, there are also rare scenarios to consider, where organisations do not have enough data points for modelling due to a scarcity of events. A newly set up digital bank, or a consumer-heavy bank looking to build up a strong corporate book, might lack data for the specific portfolios and events that are key inputs to a risk model.
Amidst these conditions, organisations increasingly use risk models to feed automated decisioning processes, while also striving to quickly deploy new, innovative models to meet changing business needs. In recent years, the integration of Artificial Intelligence, AI, and Machine Learning, ML, in risk models has resulted in significant accuracy and efficiency improvements. That is because AI and ML systems excel at recognising patterns in data to make predictions.
How could it be possible then to extrapolate and model the unknown, while ensuring there is no overfitting? That is exactly where synthetic data comes into play.
Synthetic data generation
Limitations of data
Building and training these models with faulty base data can lead to adverse consequences from decisions made using incorrect assumptions and the resulting estimations. Consequently, having a strong data foundation, with easy and fast access to large, diverse and high-quality data, is of paramount importance when it comes to developing risk assessment models.
The emergence of ChatGPT in 2023 has opened new avenues of AI innovation and sparked a generative AI, GenAI, evolution in several industries. According to a recent global survey conducted by SAS and Coleman Parkes, 2% of data decision makers in the UAE and Saudi Arabia say their companies have fully integrated GenAI into regular processes, while 48% are running initial tests for implementation, which exceeds the global benchmark of 43%. Another 34% intend to use GenAI within 1–2 years.
In the case of synthetic data generation, SDG, GenAI goes beyond prediction and conversation; it generates new data as its primary output. And while most people are familiar with large language models like ChatGPT that generate text, GenAI can also generate synthetic data. Synthetic data is on-demand, self-service or automated data generated by algorithms or rules rather than collected from the real world.
Going beyond generating data randomly, SDG tries to multiply the real-world data while ensuring that its correlations, distributions and patterns are neither underrepresented nor overfitted.
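To make the difference between random generation and correlation-preserving generation concrete, here is a minimal sketch in plain Python. It is not a GAN and not any vendor's method: it simply fits means, standard deviations and one correlation on a made-up "real" sample, then draws synthetic rows from a bivariate normal with those parameters. The income and loan figures, the function names and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit(xs, ys):
    """Estimate means, standard deviations and correlation from real data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return mx, my, sx, sy, pearson(xs, ys)


def sample_correlated(params, n):
    """Draw rows from a bivariate normal that keeps the fitted correlation."""
    mx, my, sx, sy, rho = params
    rows = []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(1 - rho * rho) * z2)
        rows.append((x, y))
    return rows


def sample_independent(params, n):
    """Naive random generation: each column is drawn on its own."""
    mx, my, sx, sy, _ = params
    return [(random.gauss(mx, sx), random.gauss(my, sy)) for _ in range(n)]


# Hypothetical "real" portfolio: income and loan amount (in thousands).
real_income = [random.gauss(50.0, 10.0) for _ in range(500)]
real_loan = [0.4 * inc + random.gauss(0.0, 3.0) for inc in real_income]

params = fit(real_income, real_loan)
real_corr = params[4]

synthetic = sample_correlated(params, 5000)
naive = sample_independent(params, 5000)

synthetic_corr = pearson(*zip(*synthetic))
naive_corr = pearson(*zip(*naive))

print(f"real correlation:      {real_corr:.2f}")
print(f"synthetic correlation: {synthetic_corr:.2f}")
print(f"naive correlation:     {naive_corr:.2f}")
```

The naive sample reproduces each column's mean and spread but loses the link between income and loan size, which is exactly the kind of pattern a risk model depends on. A Gaussian fit like this only captures linear relationships, so it is a simplification of what SDG tools aim for.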
Generative Adversarial Network
That is certainly not an easy feat. Getting that kind of real-world data to begin with usually translates to high data acquisition and annotation costs, not to mention the effort required to analyse and profile the data. But even when businesses have plenty of real-world data available, additional challenges tend to arise. Data quality or the historical depth of the data may not always meet expectations.
What is more, although an ocean of data is generated daily in the financial sector, organisations are required to handle sensitive personally identifiable information safely and within the permitted regulatory compliance parameters, or risk substantial fines as well as incidents that damage their reputation. Data anonymisation, meanwhile, has proven inadequate and is overlooked from time to time.
A Generative Adversarial Network, GAN, is the most popular technique for SDG, mimicking specific distributions. Two neural networks are involved in training. One network, the generator, generates the data, while the other network, the discriminator, tries to determine whether that data is real or fake.
If it is deemed to be fake, the generator is notified and tries to improve on the next batch of generated data. The two networks therefore train against each other, hence the adversarial part.
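The adversarial loop described above can be sketched in a few lines of plain Python. This is a toy stand-in under loud assumptions, not a production GAN: the two neural networks are replaced by a one-parameter generator, G(z) = z + b, and a logistic-regression discriminator, the "real" data is a made-up standardised risk factor drawn from a normal distribution, and the learning rate, batch size and step count are arbitrary. Real GANs use deep networks and a deep learning framework, and their training is not guaranteed to converge.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator(x, w, c):
    """Discriminator's estimated probability that x is real."""
    return sigmoid(w * x + c)


def train_toy_gan(steps=2000, batch=64, lr=0.05):
    b = 3.0          # generator parameter, deliberately started far from the truth
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        # Hypothetical real data: a standardised risk factor, N(0, 1).
        real = [random.gauss(0.0, 1.0) for _ in range(batch)]
        # Generator output: noise shifted by the learned parameter b.
        fake = [random.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: gradient ascent on
        # log D(real) + log(1 - D(fake)).
        gw = gc = 0.0
        for x in real:
            miss = 1.0 - discriminator(x, w, c)
            gw += miss * x
            gc += miss
        for x in fake:
            fooled = discriminator(x, w, c)
            gw -= fooled * x
            gc -= fooled
        w += lr * gw / batch
        c += lr * gc / batch

        # Generator step: the discriminator's verdict is the feedback.
        # Gradient ascent on log D(fake) with respect to b.
        gb = sum((1.0 - discriminator(x, w, c)) * w for x in fake)
        b += lr * gb / batch
    return b, w, c


b, w, c = train_toy_gan()
synthetic = [random.gauss(0.0, 1.0) + b for _ in range(1000)]
synthetic_mean = sum(synthetic) / len(synthetic)
print(f"learned shift b: {b:.3f}")
print(f"synthetic mean:  {synthetic_mean:.3f} (real mean is 0.0)")
```

Each pass through the loop is one round of the contest: the discriminator is nudged to separate real from generated values, then the generator is nudged in whatever direction makes its output look more real to the discriminator. Because this discriminator is linear, it can only detect a difference in location, which is why the toy generator learns a shift and nothing else.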
A GAN's training process can be effort-heavy and typically requires graphics processing units, GPUs, but it can capture highly nonlinear, complex relationships among variables and produce highly accurate and realistic synthetic data.
26 INTELLIGENTCIO MIDDLE EAST www.intelligentcio.com