Detailed Description
1. Background
E-payment systems provide services for users making payments through cell phones or other online devices. Although this technology brings convenience to people's daily lives, it also brings financial risks and challenges to service providers and regulators. Therefore, detecting anomalies in payment systems is a very important task nowadays.
Payment activities can be viewed as a temporal interaction graph (TIG), with nodes representing users and edges representing multi-dimensional transaction sequences between them. To fully capture and model the dynamics of the graph, we propose Graph Temporal Edge Aggregation (GTEA), a representation-learning framework for temporal interaction (time-evolving) graphs.
Figure 1: We can view payment transactions as a time-varying graph with a rich set of node and edge attributes, and then use a graph neural network to make predictions for downstream tasks, e.g., node classification.
2. Graph Temporal Edge Aggregation (GTEA)
In GTEA, we present a new perspective for dealing with temporal interaction graphs (TIGs). Instead of partitioning a temporal graph into multiple snapshots or grouping all related interactions to form a single time series for a target node, we propose to mine the pairwise interaction patterns from the graph. For example, in Figure 2(b), abnormal behaviors can be readily captured by explicitly modeling the pairwise temporal interactions (edges) between nodes A and C, which in turn helps identify the roles or illicit activities of these nodes.
Figure 2: Different kinds of interaction events occur between nodes in a TIG; e.g., node A behaves normally with nodes B and D, while conducting regular gambling activities with node C.
GTEA adopts a sequence model such as an LSTM or a Transformer to generate an edge embedding by capturing the temporal patterns and relationships of the interactions between a pair of nodes. The learned edge embeddings are recursively aggregated together with node attributes to produce a discriminative representation for each node, which generalizes to downstream tasks, e.g., node classification.
Figure 3: The framework of GTEA, where a sequence model enhanced by a time encoder is proposed to learn embeddings for edges. The learned edge embeddings are aggregated together with node attributes by the GNN backbone, which is augmented by a sparsity-inducing attention mechanism and finally yields discriminative node embeddings.
The general architecture of GTEA is shown in Figure 3. In a TIG, interaction events occur between nodes from time to time, and interaction behaviors are therefore assumed to be specific to each pair of interacting nodes. To distinguish the different behaviors in the graph, GTEA utilizes a sequence model (e.g., an LSTM or a Transformer) to learn the dynamics of each pairwise interaction sequence and represent the corresponding edge. This helps capture fine-grained interaction patterns across all events between a pair of nodes of interest. To capture continuous and irregular temporal patterns of interactions, we further enhance GTEA by integrating the sequence model with Time2Vec (T2V), a time encoder that represents time in a continuous space.
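To make this step concrete, below is a minimal PyTorch sketch of an edge encoder that combines Time2Vec with an LSTM. The module names, feature dimensions, and the single-layer LSTM are illustrative assumptions, not GTEA's exact configuration.

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Time2Vec (Kazemi et al., 2019): maps a scalar timestamp to one
    linear component plus (dim - 1) periodic sine components."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim))
        self.b = nn.Parameter(torch.randn(dim))

    def forward(self, t):               # t: (batch, seq_len, 1)
        v = t * self.w + self.b         # broadcast to (batch, seq_len, dim)
        # The first component stays linear; the rest are periodic.
        return torch.cat([v[..., :1], torch.sin(v[..., 1:])], dim=-1)

class EdgeEncoder(nn.Module):
    """Encodes the transaction sequence on one edge into an embedding by
    concatenating event features with Time2Vec time encodings and feeding
    the result to an LSTM."""
    def __init__(self, feat_dim, time_dim=8, hidden_dim=64):
        super().__init__()
        self.t2v = Time2Vec(time_dim)
        self.lstm = nn.LSTM(feat_dim + time_dim, hidden_dim, batch_first=True)

    def forward(self, events, timestamps):
        # events: (batch, seq_len, feat_dim); timestamps: (batch, seq_len, 1)
        x = torch.cat([events, self.t2v(timestamps)], dim=-1)
        _, (h, _) = self.lstm(x)
        return h[-1]                    # (batch, hidden_dim) edge embeddings
```

The last hidden state of the LSTM serves as the edge embedding that is later consumed by the GNN backbone.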
To jointly correlate the topological dependencies and temporal dynamics, we utilize a GNN backbone to capture the relational dependencies of the TIG, where the embeddings output by the sequence model are taken as edge features and incorporated into the neighborhood aggregation process. Furthermore, a sparsity-inducing attention mechanism is introduced to augment the aggregation operation, which refines neighborhood information by filtering out the redundant long-tail noise raised by unimportant neighbors.
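A simplified sketch of such a sparsity-inducing aggregation step is shown below, using sparsemax (Martins & Astudillo, 2016) as the sparsity-inducing transform. The scoring and message functions here are illustrative assumptions, not GTEA's exact design.

```python
import torch
import torch.nn as nn

def sparsemax(scores):
    """Sparsemax over the last dimension (Martins & Astudillo, 2016): a
    softmax alternative that assigns exact zeros to weak entries."""
    z, _ = torch.sort(scores, dim=-1, descending=True)
    k = torch.arange(1, scores.size(-1) + 1,
                     device=scores.device, dtype=scores.dtype)
    z_cumsum = z.cumsum(dim=-1)
    support = 1 + k * z > z_cumsum                  # entries kept in the support
    k_max = support.sum(dim=-1, keepdim=True)
    tau = (z_cumsum.gather(-1, k_max - 1) - 1) / k_max
    return torch.clamp(scores - tau, min=0.0)

class SparseAttnAggregator(nn.Module):
    """One aggregation step: score each neighbor from (node, neighbor,
    edge-embedding) features, turn the scores into sparse weights with
    sparsemax, and sum the weighted neighbor messages."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.score = nn.Linear(2 * node_dim + edge_dim, 1)
        self.msg = nn.Linear(node_dim + edge_dim, node_dim)

    def forward(self, h_self, h_nbrs, e_nbrs):
        # h_self: (node_dim,); h_nbrs: (n, node_dim); e_nbrs: (n, edge_dim)
        ctx = h_self.expand(h_nbrs.size(0), -1)
        a = sparsemax(self.score(
            torch.cat([ctx, h_nbrs, e_nbrs], dim=-1)).squeeze(-1))
        m = self.msg(torch.cat([h_nbrs, e_nbrs], dim=-1))
        return (a.unsqueeze(-1) * m).sum(dim=0)     # aggregated message
```

Because sparsemax produces exact zeros, unimportant neighbors contribute nothing to the aggregated message, rather than receiving small but nonzero softmax weights.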
All these modules inductively capture both topological and time-related dependencies among different nodes and are jointly integrated. By jointly optimizing the sequence model and the GNN backbone, GTEA learns, in an inductive manner, a comprehensive and discriminative representation for each node that captures both temporal and structural characteristics, and that can serve different graph-related tasks.
We have demonstrated the effectiveness of GTEA over state-of-the-art models by conducting extensive experiments on three large-scale real-world datasets, identifying phishing scams on the Ethereum smart-contract blockchain as well as detecting other illicit activities among the users of a top-tier mobile payment service. In particular, GTEA improves the accuracy of detecting illicit mobile payment activities by 4.41% (to 79.9%) compared to other state-of-the-art approaches.
3. Active Learning
Labelled data are essential for training deep learning models. However, high-quality labelled data are scarce in most real-world graph datasets, and hiring subject-matter experts to manually annotate unlabelled data is costly.
Active learning algorithms can select specific data points and interactively request human domain experts to label the most informative nodes, so as to maximize the benefit to model training. We therefore implemented an active learning algorithm that reduces manual data-annotation cost and leverages additional domain knowledge from subject-matter experts by enabling a human-in-the-loop workflow.
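As an illustration, the sketch below implements one round of entropy-based uncertainty sampling, a common acquisition criterion; the criterion actually deployed in our system may differ, and `model(graph)` returning per-node logits is an assumption.

```python
import torch

def select_nodes_for_labelling(model, graph, unlabelled_idx, budget=100):
    """One active-learning round: rank unlabelled nodes by predictive
    entropy and return the `budget` most uncertain ones for expert
    annotation."""
    model.eval()
    with torch.no_grad():
        logits = model(graph)                       # per-node class logits
        probs = torch.softmax(logits[unlabelled_idx], dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    top = entropy.topk(min(budget, len(unlabelled_idx))).indices
    return unlabelled_idx[top]                      # node IDs to send to experts
```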
Figure 4: A massive graph dataset with a huge number of nodes and edges. It is impractical to annotate all of the unlabelled data; therefore, we should select the most appropriate nodes for labelling.
We have demonstrated the potential of active learning by comparing these methods on real-world mobile payment transaction datasets, e.g., data from public blockchains such as Ethereum and e-payment data provided by our industrial collaborators.
4. Transfer Learning
Scarcity of labelled data (ground truth) is a common problem for large-scale graph datasets in practice. To obtain satisfactory results with limited labelled training data, we developed transfer learning algorithms that enable knowledge transfer between similar tasks.
Transfer learning refers to the process of leveraging information from a source domain to train a better classifier for a target domain. For example, in Figure 5, there are plenty of labelled examples in the source domain, whereas the target domain has very few or no labelled examples. If the source and target domains share some similarity, we can reuse a model trained on the source domain in the target domain, and thereby obtain better results even when the target domain has very little labelled data.
Figure 5: There are plenty of labelled examples in the source domain, whereas the target domain has very few or no labelled examples. By reusing a model trained on the source domain, we may obtain better results than by training a model on the target domain alone.
We have leveraged pretrained models derived from self-supervised learning to perform transfer learning, reusing knowledge learned from different online/mobile payment networks. The pretrained models can also speed up the training process.
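Below is a minimal sketch of how such pretrained weights might be reused for fine-tuning; the checkpoint layout and the `encoder`/`classifier` attribute names are hypothetical.

```python
import torch
import torch.nn as nn

def fine_tune_from_pretrained(model, ckpt_path, num_target_classes,
                              freeze_encoder=True):
    """Initialize a GNN from self-supervised pretrained weights and
    prepare it for fine-tuning on the target payment network."""
    state = torch.load(ckpt_path, map_location="cpu")
    # Reuse encoder weights only; the classifier head is task-specific.
    model.encoder.load_state_dict(state["encoder"])
    if freeze_encoder:
        for p in model.encoder.parameters():
            p.requires_grad = False     # keep source-domain knowledge fixed
    # Re-initialize the head for the target task's label set.
    model.classifier = nn.Linear(model.classifier.in_features,
                                 num_target_classes)
    return model
```

Freezing the encoder and training only the new head is also what makes fine-tuning faster than training from scratch.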
Our experiments on real-world datasets show that transfer learning can reduce the training time of large-graph representation learning by 27% to 41% with no or negligible loss in detection accuracy.
5. Distributed Training
We developed a distributed data-parallel version of GTEA by leveraging the PyTorch Distributed Data Parallel (DDP) library to perform model training across multiple machines, each with multiple GPUs. This capability enables us to substantially speed up model training and to scale up our graph learning algorithms and their implementations to tackle even bigger real-world datasets.
Figure 6: With PyTorch Distributed Data Parallel (DDP), GTEA can perform distributed training across multiple machines, each with multiple GPU devices. After the training data are loaded, a DistributedSampler splits them into subsets, minibatches from each subset are loaded onto the respective GPUs/devices, and the model is trained by synchronizing gradients (and hence model parameters) across the different GPUs/devices.
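The skeleton below illustrates the training flow of Figure 6 using the standard PyTorch DDP APIs; GTEA-specific graph batching and loss computation are omitted for brevity.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def train_ddp(model, dataset, epochs=10):
    """Minimal DDP training skeleton; launch with `torchrun`, one process
    per GPU."""
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])      # set by torchrun
    torch.cuda.set_device(local_rank)
    model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    sampler = DistributedSampler(dataset)           # disjoint shard per process
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        sampler.set_epoch(epoch)                    # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            out = model(x.cuda(local_rank))
            loss = torch.nn.functional.cross_entropy(out, y.cuda(local_rank))
            loss.backward()                         # gradients all-reduced here
            opt.step()
    dist.destroy_process_group()
```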
Preliminary experimental results show that when performing distributed training for different variants of GTEA using multiple GPUs on the same server, speedup can be achieved, but the parallel efficiency (defined as speedup divided by the degree of parallelism) drops from over 80% to around 60% as the number of GPUs increases from 2 to 8.
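In other words, with efficiency E defined as speedup S over the number of GPUs P, the measured 60% efficiency at 8 GPUs implies roughly:

\[
E = \frac{S}{P} \quad\Rightarrow\quad S = E \times P \approx 0.6 \times 8 = 4.8\times \text{ at 8 GPUs.}
\]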
6. Web-based Service
Via the following website, we provide registered users with access to our GNN-based data analytics engine in the form of a web service.
Users can use our web-based service to submit application datasets, together with app-specific customizations/configurations, for graph neural network training. After training and testing complete, the website generates a report to help users understand the performance of the model.
For details on how to use the website system, please click here for the Website User Guide.
Click here for the GTEA Webdemo (IE VPN required).
Figure 7: Dataset selection for training on the front page.
Figure 8: Part of the generated training report.