Predictive Analytics for Online Social Networks




Graph Representation Learning in Large-Scale E-Payment Networks

The rapid and massive adoption of mobile and online payment services has brought new challenges to service providers and regulators in safeguarding the proper use of such services and systems. In this paper, we leverage recent advances in deep-neural-network-based graph representation learning to detect abnormal or suspicious financial transactions in real-world e-payment networks. In particular, we propose an end-to-end Graph Convolution Network (GCN)-based algorithm to learn the embeddings of the nodes and edges of a large-scale time-evolving graph. In the context of e-payment transaction graphs, the resultant node and edge embeddings can effectively characterize the user background as well as the financial transaction patterns of individual account holders. As such, we can use the graph embedding results to drive downstream graph mining tasks such as node classification to identify illicit accounts within the payment networks. Our algorithm outperforms state-of-the-art schemes including GraphSAGE, Gradient Boosting Decision Tree and Random Forest, delivering considerably higher accuracy (94.62% and 86.98% respectively) in classifying user accounts within two practical e-payment transaction datasets. It also achieves outstanding accuracy (97.43%) on another biomedical entity identification task while using only edge-related information.
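
As a rough illustration of how a graph convolution turns account features into node embeddings, the sketch below applies one propagation step of the widely used symmetric-normalized GCN rule, ReLU(D^-1/2 (A + I) D^-1/2 X W). It is not the algorithm proposed in the paper, which also learns edge embeddings on a time-evolving graph; the toy graph, feature sizes, and the function name `gcn_layer` are assumptions made for this example.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One generic GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops so a node keeps its own features
    deg = a_hat.sum(axis=1)                   # node degrees, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: entry (i, j) is scaled by 1 / sqrt(deg_i * deg_j)
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weights, 0.0)

# Hypothetical toy graph: 4 accounts linked in a chain 0-1-2-3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # 3 made-up input features per account
weights = rng.normal(size=(3, 2))    # untrained weights, for shape illustration only

embeddings = gcn_layer(adj, features, weights)   # one 2-dimensional embedding per account
```

In a trained model the weights would be learned from labeled accounts, and several such layers would typically be stacked so that each embedding reflects a multi-hop neighborhood.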


Information Diffusion in Online Social Networks


Predicting the popularity of a discussion topic in an online social network (OSN) or the response to an online fund-raising campaign is a practical challenge of immense value. Previous work tries to predict the popularity of an online campaign by modeling information diffusion as a homogeneous temporal point process within a network of a single type of actor. However, real-world information propagation often involves multiple types of actors. In particular, there are the so-called opinion leaders, e.g. online celebrities or influential OSN users with a huge number of followers, who can greatly affect the visibility and thus the final popularity of an event simply by mentioning it in their tweets or postings. In this paper, we propose MASEP, a Multi-actor Self-exciting Process, to model and predict the popularity of different online campaigns involving multiple types of actors. MASEP combines a self-exciting branching process with a periodical decay process to capture the dynamics and interdependent relationship between opinion leaders and ordinary users during an online campaign. A closed-form expression is derived for the temporal campaign popularity under the MASEP model. Based on this closed-form expression, we can efficiently perform regression against the empirical activity measurements of an online campaign during its early stage to estimate the parameters of the corresponding MASEP model. The final popularity of the campaign can then be predicted. To demonstrate the efficacy of the MASEP-based approach, we apply it to predict the popularity of three types of online campaigns from different large-scale real-world datasets, namely, the total number of posts in retweeting cascades, the overall count of individual hashtags in posting streams, and the final number of sponsors for crowd-funding campaigns.
In particular, using the initial 30% of each campaign data trace for training, our approach achieves an absolute prediction error (APE) of 13.25%, 15.7%, and 36.9% respectively for the datasets of the three types of campaigns. This corresponds to a 26.1% to 63.2% reduction in prediction error compared with state-of-the-art approaches including SEISMIC, SpikeM, and STRM.
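
To make the idea of a self-exciting process concrete, the sketch below evaluates the conditional intensity of a generic single-actor Hawkes process with an exponential kernel, where each past event temporarily raises the rate of future events. This is not the MASEP model itself, which distinguishes opinion leaders from ordinary users and adds a periodical decay component. The parameter names (`mu`, `alpha`, `beta`) and the error definition |predicted - actual| / actual are assumptions made for this example.

```python
import math

def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Generic Hawkes intensity: mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)
    return mu + excitation

def absolute_prediction_error(predicted, actual):
    """Relative error |predicted - actual| / actual (assumed definition, not taken from the paper)."""
    return abs(predicted - actual) / actual

# Hypothetical post times (in hours) observed early in a campaign
early_posts = [0.0, 0.4, 0.5, 1.2]

# Rate of new posts at t = 2.0 under made-up parameter values
rate_now = hawkes_intensity(2.0, early_posts, mu=0.1, alpha=0.8, beta=1.5)

# A predicted final count of 90 against an observed 100 gives a 10% error
error = absolute_prediction_error(90, 100)
```

With an exponential kernel, the ratio alpha / beta is the expected number of follow-up events triggered by one event, so values below 1 keep the cascade finite.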

Behavior prediction in online social networks (OSNs) has attracted considerable attention due to its wide range of applications. However, most previous work needs global network information to train classifiers. Due to the large data volume and privacy concerns, it is infeasible to obtain global network information for every OSN. We propose a decentralized framework, named REPULSE, to predict whether a target user will retweet a message relayed by their friends. We also identify a new set of community-related features that improve retweet prediction accuracy considerably. To demonstrate the value of community-related features, we propose another framework, named HOTPIE, to predict tweet popularity. Utilizing community-related features can boost the F1 score of popularity prediction from 0.43 to 0.55. To the best of our knowledge, this is the first work that systematically studies the impact of global vs. locally observable information on the prediction of retweet behavior in OSNs.


Sampling in Online Social Networks


With the explosive growth in the scale of social network graphs, it becomes increasingly impractical to study the original large graph directly. By deriving a representative sample of the original graph, graph sampling provides an efficient solution for social network analysis. We expect the sample to preserve some important graph properties and represent the original graph well. If an algorithm relies on the preserved properties, we can expect it to give similar output on the original graph and the sampled graph. This leads to a systematic way to accelerate a class of graph algorithms. Our work is based on the idea of stratified sampling [14], a widely used technique in statistics. We propose a heuristic approach to achieve efficient graph sampling based on the community structure of social networks. With the aid of the ground-truth communities available in social networks, we find that sampling from communities preserves community-related graph properties very well. The experimental results show that our framework improves the performance of traditional graph sampling algorithms and is therefore an effective method of graph sampling.
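
The stratified-sampling idea can be sketched as follows: treat each community as a stratum and draw the same fraction of nodes from each, so that small communities are not lost from the sample. This is a minimal illustration of stratification over communities, not the framework evaluated in the paper; the function name, the toy community labels, and the rule of keeping at least one node per community are assumptions made for this example.

```python
import random
from collections import defaultdict

def community_stratified_sample(node_to_community, fraction, seed=0):
    """Sample roughly `fraction` of the nodes from every community (stratum)."""
    rng = random.Random(seed)                 # seeded for reproducibility
    members_by_community = defaultdict(list)
    for node, community in node_to_community.items():
        members_by_community[community].append(node)

    sample = []
    for community in sorted(members_by_community):
        members = sorted(members_by_community[community])
        # Keep at least one node so that no community vanishes from the sample
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical ground-truth communities: 10 nodes in "a", 4 nodes in "b"
node_to_community = {n: "a" for n in range(10)}
node_to_community.update({n: "b" for n in range(10, 14)})

sampled_nodes = community_stratified_sample(node_to_community, fraction=0.5)
```

The subgraph induced by `sampled_nodes` could then be passed to a graph algorithm in place of the full graph, on the expectation that community-related properties are approximately preserved.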



Publications:

  • Da Sun Handason Tam*, Wing Cheong Lau, Bin Hu, Qiu Fang Ying, Dah Ming Chiu and Hong Liu, "Identifying Illicit Accounts in Large-Scale E-Payment Networks -- A Graph Representation Learning approach," Artificial Intelligence for Business Security Workshop (AIBS), IJCAI-19, Aug 2019.

  • Bowen Zhang*, Wing Cheong Lau, "Temporal Modeling of Information Diffusion using MASEP: Multi-Actor Self-Exciting Processes," the 8th Temporal Web Analytics Workshop (TempWeb), WWW’18 Companion, April 2018.

  • Gao, Ruohan*, Huanle Xu*, Pili Hu, and Wing Cheong Lau. "Accelerating graph mining algorithms via uniform random edge sampling." In 2016 IEEE International Conference on Communications (ICC), pp. 1-6. IEEE, 2016.

  • Li, Guanchen*, and Wing Cheong Lau. "Predicting Retweet Behavior in Online Social Networks Based on Locally Available Information." International Conference on Social Informatics. Springer, Cham, 2016.

  • Ruohan Gao*, Pili Hu*, Wing Cheong Lau, "Property Preservation under Community-based Sampling," IEEE Globecom, San Diego, CA, Dec. 2015.

  • Ruohan Gao*, Huanle Xu*, Pili Hu* and Wing Cheong Lau, "Accelerating Graph Mining Algorithms via Uniform Random Edge Sampling (Poster)," ACM Conference on Online Social Networks (COSN), Stanford, CA, Nov. 2015.

  • Huanle Xu*, Pili Hu*, Wing Cheong Lau, Qiming Zhang* and Yang Wu*, "DPCP: A Protocol for Optimal Pull Coordination in Decentralized Social Networks," IEEE Infocom, Apr. 2015.

Last Updated on Jun 6 2019.
Copyright © 2022. All Rights Reserved. MobiTeC, The Chinese University of Hong Kong.