metapath2vec: Scalable Representation Learning for Heterogeneous Networks
Yuxiao Dong, Nitesh V. Chawla, Ananthram Swami
University of Notre Dame
Army Research Laboratory
ydong1@nd.edu, *nchawla@nd.edu, ananthram.swami.civ@mail.mil


0. This page is under construction and will be completed before the KDD'17 conference in August.


A. Raw Network Data

1. AMiner Computer Science (CS) Data

The CS dataset consists of 9,323,739 computer scientists and 3,194,405 papers from 3,883 computer science venues (both conferences and journals) held until 2016. We construct a heterogeneous collaboration network with three types of nodes: authors, papers, and venues.

Citation: Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD'08. 990–998.

2. Database and Information System (DBIS) Data

The DBIS dataset was constructed and used by Sun et al. It covers 464 venues, their top 5,000 authors, and the corresponding 72,902 publications. We also construct a heterogeneous collaboration network from DBIS, in which a link may connect two authors, an author and a paper, or a paper and a venue.

Citation: Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011. PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. In VLDB'11. 992–1003.
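
The link types described above (author-author, author-paper, paper-venue) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy nodes, the helper names build_adjacency and metapath_walk, and the uniform choice among neighbors of the required type. It is not the released code, and the cleaned data format may differ.

```python
import random
from collections import defaultdict

# Toy, made-up network. Each node is a (node_type, node_id) pair.
edges = [
    (("author", "a1"), ("paper", "p1")),
    (("author", "a2"), ("paper", "p1")),
    (("author", "a2"), ("paper", "p2")),
    (("author", "a3"), ("paper", "p2")),
    (("paper", "p1"), ("venue", "v1")),
    (("paper", "p2"), ("venue", "v1")),
    (("author", "a1"), ("author", "a2")),  # co-author link
]


def build_adjacency(edge_list):
    """Index the neighbors of every node by neighbor type (undirected)."""
    adj = defaultdict(lambda: defaultdict(list))
    for u, v in edge_list:
        adj[u][v[0]].append(v)
        adj[v][u[0]].append(u)
    return adj


def metapath_walk(adj, start, metapath, length, rng):
    """Return a walk of up to `length` nodes whose types follow `metapath`.

    `metapath` must start and end with the same type so it can be repeated,
    e.g. author-paper-venue-paper-author.
    """
    if metapath[0] != metapath[-1] or start[0] != metapath[0]:
        raise ValueError("metapath must be cyclic and match the start node type")
    pattern = metapath[:-1]  # drop the repeated last type
    walk = [start]
    while len(walk) < length:
        next_type = pattern[len(walk) % len(pattern)]
        candidates = adj[walk[-1]][next_type]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))  # uniform choice (assumption)
    return walk


adj = build_adjacency(edges)
APVPA = ["author", "paper", "venue", "paper", "author"]
walk = metapath_walk(adj, ("author", "a1"), APVPA, 9, random.Random(0))
print(" ".join(node_id for _, node_id in walk))
```

Indexing neighbors by type keeps each step a single lookup, which matters when the walk is repeated many times per node.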


B. Cleaned Network Data for Generating Paths

1. AMiner CS Data (TBA)

2. DBIS Data (TBA)


C. Generated Paths by Meta-Path Based Random Walkers

1. AMiner CS Data (TBA)

2. DBIS Data (TBA)


D. Code: metapath2vec & metapath2vec++ (TBA)


E. Latent Vector Representations Learned by metapath2vec & metapath2vec++

1. AMiner CS Node Representations
metapath2vec++: m2vpp.aminer2017.w1000.l100.txt.size128.window7.negative5.txt (2GB)
metapath2vec: m2v.aminer2017.w1000.l100.txt.size128.window7.negative5.txt (2GB)

2. DBIS Node Representations
metapath2vec++: m2vpp.dbis.w1000.l100.txt.size128.window7.negative5.txt (72MB)
metapath2vec: m2v.dbis.w1000.l100.txt.size128.window7.negative5.txt (72MB)
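
The file names above appear to encode the training settings. The sketch below pulls the numeric fields out of such a name. The helper name parse_embedding_filename is hypothetical, and the reading of each field (w as walks per node, l as walk length, size as embedding dimension, window as context window, negative as number of negative samples) is an assumption inferred from the names, not something this page states.

```python
import re

# Field prefixes seen in the file names listed above.
FIELD_PATTERN = re.compile(r"(w|l|size|window|negative)(\d+)")


def parse_embedding_filename(name):
    """Return the numeric settings embedded in a dot-separated file name."""
    fields = {}
    for token in name.split("."):
        match = FIELD_PATTERN.fullmatch(token)
        if match:  # tokens such as "txt" or "aminer2017" are skipped
            fields[match.group(1)] = int(match.group(2))
    return fields


settings = parse_embedding_filename(
    "m2vpp.aminer2017.w1000.l100.txt.size128.window7.negative5.txt"
)
print(settings)
```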

F. Ground Truth Labeled by Google Scholar Metrics 2016 for Multi-Label Node Classification and Clustering

C1: Label file for 133 venues in eight areas: googlescholar.8area.venue.label.txt

C2: Label file for 246,678 authors in eight areas: googlescholar.8area.author.label.txt

1. Computing Systems:

1.1 IEEE Trans. Parallel Distrib. Syst.; 1.2 NSDI; 1.3 Future Generation Comp. Syst.; 1.4 ISCA; 1.5 ASPLOS; 1.6 SC; 1.7 CLOUD; 1.8 HPCA; 1.9 FAST; 1.10 MICRO; 1.11 IPDPS; 1.12 SIGMETRICS Performance Evaluation Review; 1.13 EuroSys; 1.14 SoCC; 1.15 IEEE Trans. Services Computing; 1.16 ICDCS; 1.17 USENIX Annual Technical Conference; 1.18 J. Parallel Distrib. Comput.; 1.19 CCGRID

2. Theoretical Computer Science:

2.1 STOC; 2.2 FOCS; 2.3 SODA; 2.4 SIAM J. Comput.; 2.5 J. Comput. Syst. Sci.; 2.6 Theor. Comput. Sci.; 2.7 ICALP; 2.8 Algorithmica; 2.9 Logical Methods in Computer Science; 2.10 J. Autom. Reasoning; 2.11 SPAA; 2.12 Random Struct. Algorithms; 2.13 ACM Trans. Algorithms; 2.14 Theory of Computing; 2.15 STACS

3. Computer Networks & Wireless Communication:

3.1 IEEE Communications Magazine; 3.2 IEEE Communications Surveys and Tutorials; 3.3 IEEE Trans. Wireless Communications; 3.4 INFOCOM; 3.5 IEEE Journal on Selected Areas in Communications; 3.6 IEEE Trans. Vehicular Technology; 3.7 SIGCOMM; 3.8 IEEE Trans. Mob. Comput.; 3.9 IEEE Trans. Communications; 3.10 IEEE/ACM Trans. Netw.; 3.11 IEEE Wireless Commun.; 3.12 J. Network and Computer Applications; 3.13 Computer Networks; 3.14 IEEE Communications Letters; 3.15 Computer Communications; 3.16 ICC; 3.17 Internet Measurement Conference; 3.18 GLOBECOM; 3.19 MobiCom

4. Computer Graphics:

4.1 ACM Trans. Graph.; 4.2 IEEE Trans. Vis. Comput. Graph.; 4.3 Comput. Graph. Forum; 4.4 The Visual Computer; 4.5 VAST; 4.6 PacificVis; 4.7 IEEE Computer Graphics and Applications; 4.8 SIGGRAPH; 4.9 SI3D; 4.10 Computer Aided Geometric Design; 4.11 Web3D; 4.12 Graphical Models; 4.13 Eurographics; 4.14 Graphics Interface; 4.15 LDAV; 4.16 GRAPP/IVAPP; 4.17 Journal of Visualization and Computer Animation; 4.18 VRST

5. Human Computer Interaction:

5.1 CHI; 5.2 CSCW; 5.3 UIST; 5.4 UbiComp; 5.5 IEEE Trans. Affective Computing; 5.6 HRI; 5.7 Int. J. Hum.-Comput. Stud.; 5.8 MobileHCI; 5.9 ACM Trans. Comput.-Hum. Interact.; 5.10 Interacting with Computers; 5.11 ICMI; 5.12 ISMAR; 5.13 Int. J. Hum. Comput. Interaction; 5.14 IUI; 5.15 INTERACT; 5.16 Tangible and Embedded Interaction; 5.17 IEEE Trans. Haptics

6. Computational Linguistics:

6.1 ACL; 6.2 EMNLP; 6.3 HLT-NAACL; 6.4 LREC; 6.5 Computational Linguistics; 6.6 EACL; 6.7 COLING; 6.8 Language Resources and Evaluation; 6.9 IJCNLP; 6.10 CoNLL; 6.11 TACL; 6.12 WMT; 6.13 SLT; 6.14 CICLing; 6.15 ICSC; 6.16 RANLP; 6.17 TAC; 6.18 Natural Language Engineering

7. Computer Vision & Pattern Recognition:

7.1 CVPR; 7.2 IEEE Trans. Pattern Anal. Mach. Intell.; 7.3 ICCV; 7.4 IEEE Trans. Image Processing; 7.5 ECCV; 7.6 Pattern Recognition; 7.7 International Journal of Computer Vision; 7.8 Pattern Recognition Letters; 7.9 Computer Vision and Image Understanding; 7.10 Image Vision Comput.; 7.11 ICIP; 7.12 CVPRWorkshops; 7.13 ICCVWorkshops

8. Databases & Information Systems:

8.1 WWW; 8.2 VLDB; 8.3 IEEE Trans. Knowl. Data Eng.; 8.4 SIGMOD Conference; 8.5 ICWSM; 8.6 WSDM; 8.7 ICDE; 8.8 SIGIR; 8.9 CIKM; 8.10 Knowl. Inf. Syst.; 8.11 ACM TIST; 8.12 RecSys; 8.13 VLDB J.; 8.14 PVLDB