Roohollah Etemadi

PhD, School of Computer Science

University of Windsor


Email: etemadir at uwindsor dot ca

Phone: Fax:

Address: 401 Sunset Ave, Windsor, ON N9B 3P4

Research projects:

Large-scale network embedding Year:  2018-2019 Summary:  Massive networks are ubiquitous in the big and hidden data era. Facebook, for example, is an online social network with more than 2.2 billion users. Analyzing such networks using traditional network representations is computationally expensive. Thus, efficient representations with low dimensions and at the same time preserving more information of the network are desirable. In this project, deep neural networks are designed to learn a low-dimensional embed space for large-scale networks. Project link:  
PES: Priority Edge Sampling in Streaming Triangle Estimation Year:  2018 Summary:  The number of triangles is an important metric to analyze massive graphs. It is also used to compute clustering coefficient in networks. In this project,  a new algorithm called PES (Priority Edge Sampling) was proposed to estimate triangles in the streaming model where we need to minimize the memory window. PES combines edge sampling and reservoir sampling. Compared with the state-of-the-art streaming algorithms, PES outperforms consistently. The results are verified extensively in 48 large real-world networks in different domains and structures. The performance ratio can be as large as 11. More importantly, the ratio grows with data size almost exponentially. This is especially important in the era of big data--while we can tolerate existing algorithms for smaller datasets, our method is indispensable in very large data sampling. In addition to empirical comparisons, we also proved that the estimator is unbiased, and derived the variance.Project link:  http://etemadir.myweb.cs.uwindsor.ca/PES/
The Bias and Variance in Clustering Coefficient Estimation Year:  2018 Summary:  Clustering coefficient (hereafter CC) is an important metric to understand the complex structure of networks. Due to the high complexity of direct computation algorithms, sampling based algorithms are indispensable. Despite numerous algorithms proposed in this area, the bias and variance of the estimators remain an open problem. 
   We quantify the bias using Taylor expansion and find that the bias can be determined by the number of shared wedges and triangles in the sample. Based on the understanding of the bias, we give a new estimator that corrects the bias.  The results are derived analytically and verified extensively in 56 networks ranging in different size and structure. The experiments reveal that the bias ranges widely from data to data. The relative bias can be as high as 4\% in non-streaming model and 2\% in streaming model or can be negative. For most of the graphs, the bias is small, although every graph does have a bias as quantified by our analytical results. 

    We derived the variances of the estimators, and the estimators for the variances. More importantly, we simplify the estimators so that it can be used in practice.

Project link:  
Bias Correction in Clustering Coefficient Estimation Year:  2017 Summary:  Clustering coefficient (C) is an important structural property to understand the complex structure of a graph. Calculating C is computationally an expensive task. Thereby, sampling based methods have attracted substantial research for estimating C, and the closely related metric, the number of triangles. A widely used estimator for C is biased. We quantify the bias using Taylor expansion and find that the bias can be determined by the number of shared wedges and triangles in the sample. Based on the understanding of the bias, we give a new estimator that corrects the bias. The results are derived analytically and verified extensively in 56 networks ranging in different size and structure. The experiments reveal that the bias ranges widely from data to data. The relative bias can be as high as 4 percent or can be negative. For most of the graphs, the bias is small, although every graph does have a bias as quantified by our analytical results. Negative or small biases occur in online social networks where clustering coefficient is typically high. Positive and large biases typically occur in Web graphs, where there are nodes with high degrees but few neighboring nodes connecting with each other.Project link:  http://etemadir.myweb.cs.uwindsor.ca/cbias/
Efficient Estimation of Triangles in Very Large Graphs Year:  2015-2016 Summary:   The number of triangles in a graph is an important metric for understanding the graph. It is also directly related to the clustering coefficient of a graph, which is one of the most important indicator for social networks. Counting the number of triangles is computationally expensive for very large graphs. Hence, estimation is necessary for large graphs, particularly for graphs that are hidden behind searchable interfaces where the graphs in their entirety are not available. For instance, user networks in Twitter and Facebook are not available for third parties to explore their properties directly. 
    In this project, we proposed a new method to estimate the number of triangles based on random edge sampling. It improves the traditional random edge sampling by probing the edges that have a higher probability of forming triangles. The method outperforms the traditional method consistently, and can be better by orders of magnitude when the graph is very large. The result is demonstrated on 20 graphs, including the largest graphs we can find. More importantly, we proved the improvement ratio, and verified our result on all the datasets. The analytical results are achieved by simplifying the variances of the estimators based on the assumption that the graph is very large. We believe that such big data assumption can lead to interesting results not only in triangle estimation, but also in other sampling problems.
Project link:  http://etemadir.myweb.cs.uwindsor.ca/cikm2016/triangles.php