The projects below are available for student projects, Bachelor and/or Master thesis. Please get in contact for further details.

**1. Development of a Plugin for Wikidata**

**Description.** Wikidata
[1] is a Wikipedia-style community project for structured knowledge.
Completeness of data collected on Wikidata is highly varying and
subjective. For instance, for US presidents, children are usually
recorded, whereas for Physics Nobel Prize winners usually not.

The
goal of this project is to extend an existing prototype for analysis of relative completeness of entities in Wikidata, Recoin [2]. A prototype of Recoin was published to the Wikidata community in November 2016, receiving good feedback and many suggestions for extensions. Major extension directions are

(1) an expansion to arbitrary classes in Wikidata

(2) the development of an crowdsourced evaluation framework for relative completeness

(3) the comparison of various metrics for entity similarity.

**Prerequisites**. Basic knowledge of web development.

**References**:

**2. Task Recommender System for Gliding**

**Description.** Gliding is both a recreational and competitive sport. On
good days, glider pilots can be in the air up to 8 hours or more, during
which they can cover significant distances (800km and more). Most glider
pilots upload their competitive flights to an online platform Onlinecontest.org, where flights are daily listed and ranked using
points that are based on the covered distance and the performance of the
aircraft.

To achieve high points on a given day, glider pilots
have to carefully choose a task (flight route) that, given their plane,
skills and the weather conditions, allows them to cover the maximal
distance. All of weather conditions, skills and plane can make a huge
difference, as overestimating the weather conditions may lead to not
completing the task, and as it is not uncommon that experienced pilots travel twice or more the distance that beginners travel.

The goal of this project is to develop a prototype of a gliding task recommendation system, which takes into account the factors mentioned above. A collaboration with the Aeroclub Bolzano allows to get feedback from domain experts.

**Prerequisites.** Opportunity for a student with some knowledge of recommender systems basics, or interest in spatial data analysis.

**3. Heuristics for an NP-hard Problem**

**Description.** Finding the largest subset of a partition of a bipartite graph that has at most n neighbours is an NP-complete problem [1] with applications in data management. And in contrast to the related problem of set-cover [2], a greedy approach can be arbitrarily worse than the optimal solution. The goal of this thesis is threefold:

- To construct problem instances where the optimal solution is known,
- To evaluate how well a greedy approach performs on synthetic problems,
- To devise heuristical methods that perform better.

**Prerequisites.** Opportunity for tackling a novel algorithmic problem for a student with basic programming skills (any language) and a good knowledge of data structures and algorithms.

**References**:

**4. Implementation of a Relational Algebra**

**Description.** In [1], the pattern algebra was presented as a way to compute the completeness assertions that hold for the answers of select-project-join-queries, given completeness patterns for database tables. The pattern algebra was defined for the operations selection by constant, selection by variable, projection and equijoin, and each pattern algebra operation is very similar to the corresponding relational algebra operation. In particular, it is possible to translate pattern algebra queries to relational algebra queries.

The goal of this project is to implement a compiler that takes as input a select-project-join-query, expressed as single-block SQL query, and outputs the corresponding pattern algebra query, again in SQL. Java code for translating between single-block SQL queries and algebra queries is already available, thus, the main task would be to implement the translation from relational algebra to the pattern algebra.

**Directions.** For students with interest in theory, the project may include a theoretical analysis of the translation, in particular, under which conditions the translation yields a query of polynomial size. For students with interest in quantitative methods, the project may include an experimental evaluation of the performance of pattern algebra queries, for instance using queries from the TPC-H benchmark.

**References**:

- [1] Identifying the Extent of Completeness of Query Answers over Partially Complete Databases, Simon Razniewski, Flip Korn, Werner Nutt, Divesh Srivastava, SIGMOD 2015