Bank’s Data Analysis

A SQL approach to data analysis
code
SQL
database
Author

Simone Brazzi

Published

February 10, 2025

1 Introduction

Please check this Shiny Dashboard.

1.1 Project Description

The company Banking Intelligence wants to develop a supervised machine learning model to predict the future behaviour of its customers, based on transactional data and product ownership characteristics. The aim of the project is to create a denormalised table with a series of indicators (features) derived from the tables available in the database, representing the customers’ financial behaviours and activities.

1.2 Objective

Our objective is to create a feature table for training machine learning models, enriching customer data with various indicators calculated from their transactions and accounts held. The final table will be related to the customer ID and will contain both quantitative and qualitative information.

1.3 Value Added

The denormalised table will allow the extraction of advanced behavioural features for training supervised machine learning models, providing numerous benefits for the company:

  • Prediction of customer behaviour: By analysing transactions and product ownership, useful behaviour patterns can be identified to predict future actions such as purchasing new products or closing accounts.

  • Reducing churn rates: Using behavioural indicators, a model can be built to identify customers at risk of abandonment, allowing timely interventions by the marketing team.

  • Improved risk management: Segmentation based on financial behaviour allows high-risk customers to be identified and credit and risk strategies to be optimised.

  • Customisation of offers: Extracted features can be used to customise product and service offers according to the habits and preferences of individual customers, thereby increasing customer satisfaction.

  • Fraud prevention: By analysing transactions by type and amount, the model can detect behavioural anomalies indicative of fraud, improving security and prevention strategies.

These benefits will lead to an overall improvement in business operations, enabling more efficient customer management and sustainable business growth.

1.4 Behavioural Indicators to be Calculated

Indicators will be calculated for each individual customer (referring to id_customer) and include:

1.5 Basic indicators

  • Age of customer (from customer table)

1.6 Transaction indicators

  • Number of outgoing transactions on all accounts.
  • Number of incoming transactions on all accounts.
  • Total amount transacted out on all accounts.
  • Total amount transacted incoming on all accounts.

1.7 Account Indicators

  • Total number of accounts held.
  • Number of accounts held by type (one indicator for each type of account).

1.8 Transaction indicators by account type

  • Number of outgoing transactions by account type (one indicator per account type).
  • Number of incoming transactions by account type (one indicator per account type).
  • Outgoing transacted amount by account type (one indicator per account type).
  • Amount transacted inbound by account type (one indicator per account type).

2 Plan for Creating Denormalised Table

2.1 Table Join

To construct the final table, it will be necessary to perform a series of joins between the tables available in the database.

2.2 Calculation of Indicators

Behavioural indicators will be calculated using aggregation operations (SUM, COUNT) to obtain the required totals.

3 Code

3.1 Import

Code
library(tidyverse)
library(ggplot2)
library(plotly)
library(duckdb)
library(RMariaDB)

3.2 Create db connection

Code
con <- dbConnect(RMariaDB::MariaDB(),
                 user = "username",
                 password = "password",
                 host = "localhost",
                 dbname = "dbname")
dbListTables(con)

3.3 Query

Code
dbGetQuery(
  con,
  "
  CREATE TEMPORARY TABLE IF NOT EXISTS features (
  SELECT 
    cliente.id_cliente,
    TIMESTAMPDIFF(YEAR, cliente.data_nascita, CURRENT_DATE()) AS eta,
    COUNT(CASE WHEN tipo_transazione.segno = '-' THEN 1 ELSE 0 END) AS n_transazioni_uscita,
    COUNT(CASE WHEN tipo_transazione.segno = '+' THEN 1 ELSE 0 END) AS n_transazioni_entrata,
    SUM(CASE WHEN tipo_transazione.segno = '-' THEN transazioni.importo ELSE 0 END) AS tot_uscita,
    SUM(CASE WHEN tipo_transazione.segno = '+' THEN transazioni.importo ELSE 0 END) AS tot_entrata,
    COUNT(DISTINCT conto.id_conto) AS n_conti,
    COUNT(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Privati' THEN 1 ELSE 0 END) AS n_conto_privati,
    COUNT(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Base' THEN 1 ELSE 0 END) AS n_conto_base,
    COUNT(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Business' THEN 1 ELSE 0 END) AS n_conto_business,
    COUNT(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Famiglie' THEN 1 ELSE 0 END) AS n_conto_famiglie,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Privati' AND tipo_transazione.segno = '-' THEN transazioni.importo ELSE 0 END) AS tot_uscita_conto_privati,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Base' AND tipo_transazione.segno = '-' THEN transazioni.importo ELSE 0 END) AS tot_uscita_conto_base,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Business' AND tipo_transazione.segno = '-' THEN transazioni.importo ELSE 0 END) AS tot_uscita_conto_business,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Famiglie' AND tipo_transazione.segno = '-' THEN transazioni.importo ELSE 0 END) AS tot_uscita_conto_famiglie,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Privati' AND tipo_transazione.segno = '+' THEN transazioni.importo ELSE 0 END) AS tot_entrata_conto_privati,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Base' AND tipo_transazione.segno = '+' THEN transazioni.importo ELSE 0 END) AS tot_entrata_conto_base,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Business' AND tipo_transazione.segno = '+' THEN transazioni.importo ELSE 0 END) AS tot_entrata_conto_business,
    SUM(CASE WHEN tipo_conto.desc_tipo_conto = 'Conto Famiglie' AND tipo_transazione.segno = '+' THEN transazioni.importo ELSE 0 END) AS tot_entrata_conto_famiglie
  FROM cliente
  LEFT JOIN conto ON cliente.id_cliente = conto.id_cliente
  LEFT JOIN tipo_conto ON conto.id_tipo_conto = tipo_conto.id_tipo_conto
  LEFT JOIN transazioni ON conto.id_conto = transazioni.id_conto
  LEFT JOIN tipo_transazione ON transazioni.id_tipo_trans = tipo_transazione.id_tipo_transazione
  GROUP BY cliente.id_cliente
  ORDER BY cliente.id_cliente);
  "
)

3.4 Consideration

To better interact with the database, go to the Shiny dashboard or click on the link at the top of this page or in the table of content.