Key Takeaways
- Graph‑based models naturally represent the interconnected activity and sensor data generated by cyber systems, making them ideal for detecting coordinated fraud.
- Capitec Bank deployed an in‑memory graph database on AWS, integrating transaction, customer, device, and watch‑list data into a production fraud‑scoring pipeline.
- By converting relationship patterns into graph features (centrality, community signals, multi‑hop connections), the bank reduced false positives to 2.1 % while processing over 3.5 million records per day in roughly two hours.
- Relationship‑centric analysis uncovered hidden fraud networks that rule‑based systems missed, such as up to nine linked accounts sharing devices. Analysts report detecting up to 50 % more suspicious activity while cutting investigation time by 30‑40 %.
- Continuous collaboration between data scientists, engineers, and domain experts keeps the graph model tuned to emerging threats, turning it into a living, proactive defence tool.
Why Graphs Fit Cybersecurity in Finance
The data produced by modern banking systems (transactions, login events, device fingerprints, IP addresses, and watch‑list hits) is inherently relational. Each event links to one or more actors (customers, accounts, devices) and to other events through shared attributes. Modeling these links as edges in a graph captures the structure of interactions far more faithfully than flat tables, where relationships must be reconstructed via costly joins. In cybersecurity, adversaries often operate in networks, using multiple accounts or devices to obscure their trail; a graph representation makes such coordinated behaviour visible as clusters, high‑degree nodes, or anomalous paths. Consequently, graph‑based analytics excel at uncovering the hidden, coordinated fraud patterns that rule‑based or purely statistical approaches tend to miss.
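To make the idea of "links as edges" concrete, here is a minimal sketch in plain Python. It is not Capitec's implementation: the event data, the `acc_`/`dev_` naming scheme, and the `build_graph` helper are all assumptions made for illustration. It builds an undirected account-device graph from login events and reports devices with more than one distinct account attached, which is the "high‑degree node" signal described above.

```python
from collections import defaultdict

# Hypothetical login events as (account_id, device_id) pairs. Illustrative data only.
events = [
    ("acc_1", "dev_A"),
    ("acc_2", "dev_A"),
    ("acc_3", "dev_A"),
    ("acc_4", "dev_B"),
    ("acc_1", "dev_A"),  # repeated event; maps onto an existing edge
]

def build_graph(pairs):
    """Return an undirected adjacency map; repeated events collapse into one edge."""
    adjacency = defaultdict(set)
    for account, device in pairs:
        adjacency[account].add(device)
        adjacency[device].add(account)
    return adjacency

graph = build_graph(events)

# Degree = number of distinct neighbours. Keep only devices shared by several accounts.
shared_devices = {
    node: len(neighbours)
    for node, neighbours in graph.items()
    if node.startswith("dev_") and len(neighbours) > 1
}
print(shared_devices)  # {'dev_A': 3}
```

In a flat table the same question needs a self-join on the device column; in the adjacency map it is a lookup on one node.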
Capitec’s Adoption of Graph Technology
Capitec, South Africa’s fastest‑growing retail bank and largest digital bank, serves over 25 million clients and faces a rising tide of card‑not‑present fraud—an 86 % increase reported for 2024. To combat this, the bank embarked on a two‑year experiment with an in‑memory graph database hosted on AWS. The goal was to transform raw transaction streams into a connected fraud graph, derive predictive features from that graph, and feed them into a specialised internal scoring pipeline. The initiative moved from proof‑of‑concept to production, demonstrating that graph technology could scale to meet the bank’s massive data volumes while delivering actionable insights for fraud analysts.
Turning Data into Fraud Graphs
Capitec’s pipeline ingests transaction histories, customer profiles, device identifiers, and public watch‑lists, then maps each entity as a node and each relationship (e.g., “same device used by two accounts,” “beneficiary appears in multiple transactions”) as an edge. From this enriched graph, the bank extracts a suite of graph‑based features—centrality measures (degree, betweenness), community detection scores, path‑length statistics, and multi‑hop neighbourhood patterns—alongside traditional tabular attributes. Initially, 195 graph features were generated; through iterative feature importance analysis, the set was trimmed to 27 graph and 23 tabular features that retain predictive power while reducing noise. These features are scored in real time, producing a fraud risk rating for each transaction as it flows through the system.
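The feature families named above can be illustrated with a small, self-contained sketch. The graph, the node names, and the three features computed here (`degree`, `two_hop_size`, `accounts_within_two_hops`) are hypothetical examples of degree and multi‑hop neighbourhood features, not the bank's actual feature definitions. Betweenness and community scores are omitted to keep the example short.

```python
from collections import deque

# Hypothetical undirected graph of accounts and devices, as an adjacency map.
graph = {
    "acc_1": {"dev_A"},
    "acc_2": {"dev_A", "dev_B"},
    "acc_3": {"dev_B"},
    "acc_4": {"dev_C"},
    "dev_A": {"acc_1", "acc_2"},
    "dev_B": {"acc_2", "acc_3"},
    "dev_C": {"acc_4"},
}

def k_hop_neighbours(adjacency, start, max_hops):
    """Breadth-first search: all nodes within max_hops of start, excluding start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if distance == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    seen.discard(start)
    return seen

def graph_features(adjacency, node):
    """Return a small feature dict for one node (illustrative feature names)."""
    two_hop = k_hop_neighbours(adjacency, node, 2)
    return {
        "degree": len(adjacency.get(node, ())),
        "two_hop_size": len(two_hop),
        "accounts_within_two_hops": sum(1 for n in two_hop if n.startswith("acc_")),
    }

for account in ("acc_1", "acc_2", "acc_4"):
    print(account, graph_features(graph, account))
```

Each dict can be joined to a transaction's tabular attributes to form one row for a downstream scoring model. An account such as `acc_4`, which shares nothing with anyone, ends up with zero other accounts within two hops.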
Performance and Scale of Capitec’s System
Running the graph database in memory yields sub‑second latency for neighbourhood queries, enabling the bank to process roughly 3.5 million records daily within a two‑hour window. The production system achieves an impressively low false positive rate of 2.1 %, a testament to the discriminative strength of relationship‑based signals. Seven live anti‑fraud graphs now operate in parallel, each tuned to different fraud typologies (e.g., account takeover, synthetic identity, coordinated withdrawal). The capability to scale horizontally on AWS ensures that as transaction volumes grow, the graph layer can expand without sacrificing speed, a critical requirement for real‑time fraud interception.
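As a back-of-the-envelope check on the figures above, the quoted volume and window imply a sustained average throughput that is easy to compute. The calculation below uses only the two numbers from the text. The per-record time budget assumes strictly serial processing, which is a simplifying assumption and not a description of how the production system is arranged.

```python
RECORDS_PER_DAY = 3_500_000     # daily volume quoted in the text
WINDOW_SECONDS = 2 * 60 * 60    # two-hour processing window

def required_throughput(records, seconds):
    """Average records per second needed to finish within the window."""
    return records / seconds

rate = required_throughput(RECORDS_PER_DAY, WINDOW_SECONDS)
budget_ms = 1000 / rate  # time available per record if processed one at a time

print(f"{rate:.0f} records per second")   # 486 records per second
print(f"{budget_ms:.2f} ms per record")   # 2.06 ms per record
```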
The Value of Relationships in Fraud Detection
Capitec’s data science team stresses that modern fraud is rarely isolated; it typically involves networks of accounts, devices, and identities working in concert. Rule‑based systems, which evaluate each transaction against static thresholds, struggle to detect such coordinated schemes because they lack context about the surrounding network. By contrast, a knowledge graph makes explicit the connections between entities, allowing analysts to see patterns like a cluster of accounts sharing a single phone number or a chain of devices funneling money to a common beneficiary. As Head of Product Derick Schmidt notes, uncovering these hidden links lets the bank “weed out quite a few of the scammers at once,” turning a dispersed set of low‑value alerts into a concentrated, actionable threat signal.
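The "cluster of accounts sharing a single phone number" pattern is, in graph terms, a connected component. One common way to find such components is a union-find (disjoint-set) structure, sketched below. The records, identifiers, and helper names are hypothetical, and nothing here is taken from the bank's tooling.

```python
from collections import defaultdict

# Hypothetical (account, phone_number) records. Illustrative data only.
records = [
    ("acc_1", "phone_X"),
    ("acc_2", "phone_X"),
    ("acc_3", "phone_Y"),
    ("acc_2", "phone_Y"),
    ("acc_4", "phone_Z"),
]

parent = {}  # union-find forest: node -> parent node

def find(node):
    """Return the representative of node's set, shortening paths on the way."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving
        node = parent[node]
    return node

def union(a, b):
    """Merge the sets containing a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a

for account, phone in records:
    union(account, phone)

# Group accounts by their set representative.
clusters = defaultdict(set)
for account, _ in records:
    clusters[find(account)].add(account)

linked_groups = sorted(sorted(group) for group in clusters.values() if len(group) > 1)
print(linked_groups)  # [['acc_1', 'acc_2', 'acc_3']]
```

Note that `acc_1` and `acc_3` never share a phone number directly. They are linked only through `acc_2`, which is exactly the kind of indirect connection a per-transaction rule cannot see.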
Operational Benefits and Analyst Workflow Improvements
The graph‑based internal anti‑fraud tool refines how investigators work. Instead of presenting analysts with hundreds of unrelated transactions, it surfaces only the most relevant nodes—those embedded in suspicious sub‑graphs—thereby reducing alert fatigue and focusing effort on high‑risk areas. Analysts report detecting up to 50 % more suspicious activity than with the legacy system while cutting investigation time by 30‑40 %. The team also learned that overloading the model with excessive detection parameters or automated scripts can introduce confusion and misfires. By limiting automated triggers to the minimal set needed for each scenario and breaking complex cases into smaller, manageable sub‑tasks, Capitec improved both precision and efficiency, echoing the least‑privilege principle familiar from IT security.
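One way to picture the workflow change is to group alerts by the sub‑graph they belong to, so that an analyst opens one case per suspected ring instead of one per transaction. The sketch below is an assumption about how such triage could look. The alert tuples, the `cluster_id` field, and the ordering rule are invented for illustration.

```python
from collections import defaultdict

# Hypothetical alerts: (transaction_id, cluster_id, risk_score).
# cluster_id is None when the transaction is not part of any linked sub-graph.
alerts = [
    ("tx_1", "ring_A", 0.4),
    ("tx_2", "ring_A", 0.5),
    ("tx_3", None, 0.3),
    ("tx_4", "ring_B", 0.9),
    ("tx_5", "ring_A", 0.6),
]

def group_into_cases(alert_rows):
    """Collapse alerts into cases, one per cluster; isolated alerts stay separate."""
    grouped = defaultdict(list)
    for tx_id, cluster_id, score in alert_rows:
        case_key = cluster_id if cluster_id is not None else tx_id
        grouped[case_key].append((tx_id, score))
    cases = [
        {
            "case": key,
            "transactions": [tx for tx, _ in rows],
            "max_score": max(score for _, score in rows),
        }
        for key, rows in grouped.items()
    ]
    # Largest linked groups first, then highest individual score.
    return sorted(
        cases,
        key=lambda c: (len(c["transactions"]), c["max_score"]),
        reverse=True,
    )

cases = group_into_cases(alerts)
for case in cases:
    print(case["case"], case["transactions"], case["max_score"])
```

Five alerts become three cases here, and the three moderate-score alerts in `ring_A` rise to the top because they are linked. Whether group size should outrank individual score is a design choice that would need tuning with investigators.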
Proactive Threat Detection and Continuous Improvement
Because the graph resides in memory and updates continuously, Capitec can monitor emerging fraud rings in near real time. Analysts observe patterns such as a device repeatedly linking to new accounts, or a series of IP addresses hopping between known fraud clusters, allowing them to intervene before losses materialise. The graph can also encode operational policies—for example, limiting the number of high‑value transactions per account—thereby automating low‑risk monitoring while preserving human oversight for edge cases. Crucially, the model is not static; data scientists, engineers, and fraud investigators constantly refine node and edge definitions, adjust feature weights, and incorporate new threat intelligence. This collaborative feedback loop ensures the graph evolves alongside fraud tactics, maintaining its effectiveness as a living defence mechanism.
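The "device repeatedly linking to new accounts" pattern can be expressed as a sliding-window count over first-time device-account links. The following sketch assumes events arrive in time order and uses an invented threshold and window; neither value comes from the source.

```python
from collections import defaultdict

# Hypothetical link events in time order: (timestamp_seconds, device_id, account_id).
link_events = [
    (0, "dev_A", "acc_1"),
    (100, "dev_B", "acc_4"),
    (600, "dev_A", "acc_2"),
    (900, "dev_A", "acc_1"),   # already-known link, not a new account
    (1200, "dev_A", "acc_3"),
    (5000, "dev_B", "acc_5"),
    (9000, "dev_B", "acc_6"),
]

WINDOW_SECONDS = 3600   # assumed look-back window
MAX_NEW_ACCOUNTS = 2    # assumed threshold, for illustration only

def devices_over_limit(events, window, limit):
    """Return devices that linked to more than `limit` new accounts within `window`."""
    first_seen = {}  # (device, account) -> timestamp of the first link
    for timestamp, device, account in events:
        first_seen.setdefault((device, account), timestamp)

    per_device = defaultdict(list)
    for (device, _account), timestamp in first_seen.items():
        per_device[device].append(timestamp)

    flagged = set()
    for device, stamps in per_device.items():
        stamps.sort()
        start = 0
        for end, timestamp in enumerate(stamps):
            # Shrink the window from the left until it spans at most `window` seconds.
            while timestamp - stamps[start] > window:
                start += 1
            if end - start + 1 > limit:
                flagged.add(device)
                break
    return flagged

flagged_devices = devices_over_limit(link_events, WINDOW_SECONDS, MAX_NEW_ACCOUNTS)
print(flagged_devices)  # {'dev_A'}
```

`dev_B` also touches three accounts, but the links are spread over more than an hour each, so it stays below the threshold. A per-account cap on high‑value transactions could be checked with the same windowing logic.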
Implications for Other Financial Institutions
Capitec’s experience illustrates that graph technology is more than an academic curiosity for banks confronting sophisticated, networked fraud. By linking disparate data sources, visualising relational networks, and curating both tools and context, any financial services firm can build a proactive defence that detects hidden fraud rings faster and with fewer false positives. The key ingredients are a performant in‑memory graph store, a pipeline that translates graph structures into machine‑learning‑ready features, and a disciplined process for continual model tuning grounded in domain expertise. For institutions facing rising fraud volumes and increasingly clever adversaries, investing in a knowledge‑graph‑based cybersecurity approach offers a clear path to protect revenue, preserve customer trust, and stay ahead of the curve.

