Workshop on Big Graph Analysis Systems – University of Copenhagen


Department of Computer Science (DIKU), Event Calendar 2017

A growing amount of data takes the form of complex graphs in applications such as social networks, linked data, telecommunication networks, chemistry, and the life sciences. Analysis of such data should focus not only on the attributes attached to nodes or edges, but also on how the nodes are interconnected.

Addressing the challenges of big graph data analysis and developing supporting system technologies for data scientists requires joint efforts from various research areas, including but not limited to graph data management, linked data management, Datalog, high-performance computing, and distributed/parallel systems.

The purpose of this workshop is to bring together experts from several relevant communities who are actively addressing problems related to big graph data analysis, in order to identify commonalities and synergies and to stimulate cross-collaboration that is anticipated but has yet to be explored.

The workshop will host invited talks and will feature plenty of free time for discussions.

Organizers

  • Yongluan Zhou, University of Copenhagen
  • Amol Deshpande, University of Maryland at College Park
  • Marcos António Vaz Salles, University of Copenhagen

Invited Speakers

Yannis Papakonstantinou, University of California San Diego

Yannis Papakonstantinou is a Professor of Computer Science and Engineering at the University of California, San Diego. His research lies at the intersection of data management technologies and the web, where he has published over one hundred research articles that have received more than 14,000 citations, according to Google Scholar. A common theme of his research is the extension of database platforms and query processors beyond centralized relational databases into semistructured and graph databases, integrated views of distributed databases and web services, textual data and queries involving keyword search, and, most recently, spatiotemporal sensor data. He has given multiple tutorials and invited talks, has served on journal editorial boards, and has chaired and participated in program committees for many international conferences and workshops. He is a co-director of, and teaches in, UCSD's Master of Advanced Studies in Data Science.

Yannis enjoys commercializing his research and letting that experience inform his research. He was the CEO and Chief Scientist of Enosys Software, which built and commercialized an early Enterprise Information Integration platform for structured and semistructured data. The Enosys Software platform was OEM'd and sold under the BEA Liquid Data and BEA AquaLogic brand names, and the company was eventually acquired by BEA Systems in 2003. He has also consulted for Amazon Web Services and multiple startups.

Yannis holds a Diploma of Electrical Engineering from the National Technical University of Athens and an MS and a Ph.D. in Computer Science from Stanford University (1997). He received an NSF CAREER award for his work on data integration.

Hai Jin, Huazhong University of Science and Technology

Hai Jin is a Cheung Kung Scholars Chair Professor of computer science and engineering at Huazhong University of Science and Technology (HUST) in China. Jin received his PhD in computer engineering from HUST in 1994. In 1996, he was awarded a German Academic Exchange Service fellowship to visit the Technical University of Chemnitz in Germany. Jin worked at the University of Hong Kong between 1998 and 2000, and was a visiting scholar at the University of Southern California between 1999 and 2000. He received the Excellent Youth Award from the National Science Foundation of China in 2001. Jin is the chief scientist of ChinaGrid, the largest grid computing project in China, and the chief scientist of the National 973 Basic Research Program projects on Virtualization Technology of Computing Systems and on Cloud Security.

Jin is a Fellow of the CCF, a senior member of the IEEE, and a member of the ACM. He has co-authored 22 books and published over 800 research papers. His research interests include computer architecture, virtualization technology, cluster computing and cloud computing, peer-to-peer computing, network storage, and network security.

Dan Olteanu, University of Oxford

Dan Olteanu is a Computer Science professor at Oxford and a computer scientist at LogicBlox. He has also taught at the University of California, Berkeley, and at the universities of Munich, Saarland, and Heidelberg. He received his PhD in Computer Science from the University of Munich in 2005. His research interests are in databases and adjacent areas. Dan has contributed to XML query processing, incomplete information and probabilistic databases, and, more recently, factorized databases, in-database analytics, and the LogicBlox commercial system. He co-authored the book "Probabilistic Databases" (2011). He has served as associate editor for PVLDB'13 and IEEE TKDE, track chair for ICDE'15, group leader for SIGMOD'15, and vice chair for SIGMOD'17. His current research is supported by an ERC consolidator grant and awards from Google, LogicBlox, and Ordnance Survey.

Milos Nikolic, University of Oxford

Milos Nikolic is a departmental lecturer in the Department of Computer Science at the University of Oxford. His research focuses on the design and implementation of data-intensive systems. His work studies the incremental computation of complex analytical queries, such as database queries and machine learning models, in local and distributed streaming environments using novel approaches to query optimization and compilation. He received a Ph.D. in Computer Science from EPFL.

George Fletcher, Eindhoven University of Technology

George Fletcher (PhD, Indiana University Bloomington) is an associate professor of computer science at Eindhoven University of Technology. His research interests span query language design and engineering, foundations of databases, and data integration. His current focus is on management of massive graphs such as social networks and linked open data. He is a member of the LDBC Graph Query Language Standardization Task Force.

Schema-driven generation of synthetic graphs and queries with gMark

Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads having predictable and controllable properties. In this talk, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high-quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.

This is joint work with colleagues at CNRS Lyon (France), INRIA Lille (France), Université Clermont Auvergne (France), and TU Eindhoven (Netherlands).
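To illustrate the two ideas in this abstract, schema-driven graph generation and regular path queries (RPQs), here is a minimal sketch in Python. It is not gMark's implementation or configuration format: the `SCHEMA` dictionary, the fixed out-degree per predicate, and the functions `generate` and `eval_rpq` are hypothetical simplifications, and the sketch does no selectivity estimation. The RPQ `knows* · livesIn` ("cities where someone reachable through zero or more `knows` edges lives") is hand-compiled into a small NFA and evaluated by breadth-first search over (node, automaton state) pairs, a standard technique for RPQ evaluation.

```python
import random
from collections import deque

# Hypothetical schema format (NOT gMark's syntax): node types with counts,
# and edge predicates (source type, label, target type, out-degree per source).
SCHEMA = {
    "types": {"person": 50, "city": 5},
    "predicates": [
        ("person", "knows", "person", 3),
        ("person", "livesIn", "city", 1),
    ],
}

# NFA for the RPQ knows* . livesIn: state 0 loops on "knows",
# and "livesIn" moves to the accepting state 1.
KNOWS_STAR_LIVES_IN = {(0, "knows"): {0}, (0, "livesIn"): {1}}


def generate(schema, seed=0):
    """Generate a labeled edge list whose shape is dictated by the schema."""
    rng = random.Random(seed)  # seeded, so instances are reproducible
    nodes = {t: [f"{t}{i}" for i in range(n)] for t, n in schema["types"].items()}
    edges = []
    for src_t, label, dst_t, degree in schema["predicates"]:
        for src in nodes[src_t]:
            candidates = [d for d in nodes[dst_t] if d != src]  # no self-loops
            for dst in rng.sample(candidates, min(degree, len(candidates))):
                edges.append((src, label, dst))
    return nodes, edges


def eval_rpq(edges, nfa, start_state, accept_states, source):
    """Return nodes reachable from `source` via a path whose labels the NFA accepts."""
    adj = {}
    for s, label, d in edges:
        adj.setdefault(s, []).append((label, d))
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accept_states:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            # Follow an edge only if the automaton has a matching transition.
            for nstate in nfa.get((state, label), ()):
                if (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return answers


if __name__ == "__main__":
    nodes, edges = generate(SCHEMA, seed=42)
    print(sorted(eval_rpq(edges, KNOWS_STAR_LIVES_IN, 0, {1}, "person0")))
```

Tracking (node, state) pairs rather than nodes alone is what lets a single traversal handle the Kleene star without looping forever: each pair is visited at most once, so evaluation is bounded by the size of the graph times the number of automaton states.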

Ioana Manolescu, INRIA Saclay


Hassan Chafi, Oracle Labs


Jun Huan, University of Kansas

Dr. Jun (Luke) Huan is a professor in the Department of Electrical Engineering and Computer Science at the University of Kansas. He directs the Data Science and Computational Life Sciences Laboratory at KU. Dr. Huan holds courtesy appointments at the KU Bioinformatics Center and the KU Bioengineering Program.

Dr. Huan is an internationally recognized investigator in data science. He has published more than 120 peer-reviewed papers in leading conferences and journals and has graduated more than 10 graduate students, including seven Ph.D.s. He received the National Science Foundation Faculty Early Career Development Award in 2009. His group won the Best Student Paper Award at the IEEE International Conference on Data Mining in 2011 and the Best Paper Award (runner-up) at the ACM International Conference on Information and Knowledge Management in 2009. His work has been featured in mass media outlets including Science Daily, R&D Magazine, and EurekAlert. Dr. Huan’s editorial memberships have included the Springer Journal of Big Data, the Elsevier journal Big Data Research, and the International Journal of Data Mining and Bioinformatics, among others. He regularly serves on the program committees of top-tier international conferences on Machine Learning, Data Mining, Big Data, and Bioinformatics.

Since 2015 Dr. Huan has served as a Program Director in the Information and Intelligent Systems division at the US National Science Foundation. At NSF he manages programs such as IIS core, Big Data, and Partnerships for International Research and Education.

Semih Salihoglu, University of Waterloo

Semih Salihoglu is an assistant professor at the University of Waterloo's Cheriton School of Computer Science. He is a member of the Data Systems Research Group. Previously, he was a PhD student at Stanford University, advised by Jennifer Widom. His current research focuses on graph databases and distributed graph processing engines.

Amol Deshpande, University of Maryland

Amol Deshpande is a Professor in the Department of Computer Science at the University of Maryland with a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). He received his Ph.D. from University of California at Berkeley in 2004. His research interests include uncertain data management, adaptive query processing, data streams, graph analytics, and sensor networks. He is a recipient of an NSF Career award, and has received best paper awards at the VLDB 2004, EWSN 2008, and VLDB 2009 conferences.

GraphGen: Adaptive Graph Extraction and Analytics over Relational Databases

Graph querying and analytics are becoming an increasingly important component of the arsenal of tools for extracting different kinds of insights from data. However, graphs are not the primary representation choice for most data today, and users who want to employ graph analytics are forced to extract data from their data stores, construct the requisite graphs, and then use a specialized engine to write and execute their graph analysis tasks. This cumbersome and costly process not only raises barriers in using graph analytics, but also makes it hard to explore and identify hidden or implicit graphs in the data.

In this talk, I will present our ongoing work on an end-to-end graph analysis framework, called GraphGen, that sits atop an RDBMS and enables users to declaratively specify graph extraction tasks, visually explore the extracted graphs, and write and execute graph algorithms over them, either directly or using existing graph libraries like the widely used NetworkX Python library. GraphGen has a fundamentally different goal from recent work on using relational databases to store graph data through "shredding". Instead, GraphGen is intended to analyze graphs that are present in existing relational databases. GraphGen attempts to utilize the underlying relational database to the full extent possible by pushing down computation, uses a novel condensed representation to handle graphs that may be too large to extract in their entirety, allows writing programs using a general subgraph-centric API, and features several optimizations for efficient extraction and querying of large graphs.
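As a rough illustration of extracting a graph that is implicit in relational data, here is a minimal sketch using Python's built-in `sqlite3` module. It is not GraphGen's extraction language or API: the `authorship(author_id, paper_id)` table and the functions `extract_expanded`, `extract_condensed`, and `neighbors` are hypothetical. The expanded form pushes a self-join down to the database to produce one edge per co-author pair; the condensed form keeps papers as intermediate "virtual" nodes and expands an author's neighbors only on demand, in the spirit of the condensed representation the abstract mentions. A paper with k authors costs k rows in the condensed form but k(k-1)/2 edges when fully expanded.

```python
import sqlite3
from collections import defaultdict

# Hypothetical relational data: which author wrote which paper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authorship(author_id INTEGER, paper_id INTEGER);
INSERT INTO authorship VALUES (1, 10), (2, 10), (3, 10), (3, 11), (4, 11);
""")


def extract_expanded(conn):
    """Materialize the co-author graph; the join runs inside the database."""
    return conn.execute("""
        SELECT DISTINCT a1.author_id, a2.author_id
        FROM authorship AS a1
        JOIN authorship AS a2
          ON a1.paper_id = a2.paper_id AND a1.author_id < a2.author_id
    """).fetchall()


def extract_condensed(conn):
    """Keep papers as virtual nodes instead of emitting every author pair."""
    author_papers, paper_authors = defaultdict(set), defaultdict(set)
    for author, paper in conn.execute("SELECT author_id, paper_id FROM authorship"):
        author_papers[author].add(paper)
        paper_authors[paper].add(author)
    return author_papers, paper_authors


def neighbors(author, author_papers, paper_authors):
    """Expand an author's co-authors on the fly through the virtual paper nodes."""
    result = set()
    for paper in author_papers[author]:
        result |= paper_authors[paper]
    result.discard(author)  # an author is not their own co-author
    return result


if __name__ == "__main__":
    print(sorted(extract_expanded(conn)))
    ap, pa = extract_condensed(conn)
    print(sorted(neighbors(3, ap, pa)))
```

Both forms describe the same graph; the trade-off is between storing every edge up front and paying a small cost per neighbor lookup, which matters most when a few large groups (here, papers with many authors) would otherwise produce a quadratic number of edges.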

Tentative program

21 August

09.00 - 09.15 Welcome with a short introduction to the university and the department
09.15 - 10.00 Invited talk
10.00 - 10.45 Invited talk
10.45 - 11.15 Coffee break
11.15 - 12.00 Invited talk
12.00 - 12.15 Invited industrial talk
12.15 - 13.45 Lunch
13.45 - 14.30 Invited talk
14.30 - 15.15 Invited talk
15.15 - 15.45 Coffee break
15.45 - 16.30 Invited talk
16.30 - 17.15 Invited talk
18.00 Conference dinner

22 August

09.00 - 09.45 Invited talk
09.45 - 10.30 Invited talk
10.30 - 10.45 Invited industrial talk
10.45 - 11.15 Coffee break
11.15 - 12.00 Invited talk
12.00 - 12.45 Invited talk
12.45 - 14.15 Lunch
14.15 - 17.00 Discussions and walking tour of the city