Daniel Zeman (Charles University, Prague)
From the Jungle to a Park: Harmonizing Annotations Across Languages
In this talk I will describe my work towards universal representation of
morphology and dependency syntax in treebanks of various languages.
Not only is such harmonization advantageous for linguists who use
corpora, it is also a prerequisite for cross-language parser adaptation
techniques such as delexicalized parsing. I will present Interset, an
interlingua-like tool to translate morphosyntactic representations between
tagsets; I will also show how the features from Interset are used in a
recent framework called Universal Dependencies. Some experiments with
delexicalized parsing on harmonized data will be presented. Finally,
I will discuss the extent to which various morphological features are
important in the context of statistical dependency parsing.
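As a rough illustration of the interlingua idea mentioned in the abstract, the sketch below decodes a tag from one tagset into a set of features and re-encodes those features in another tagset, then uses the result to delexicalize a sentence. It does not reproduce Interset's actual API or tables: the two tag inventories, the feature names, and the functions `decode`, `encode`, `translate` and `delexicalize` are all assumptions made up for this example.

```python
# Toy tables: the tags and feature names below are illustrative
# assumptions, not Interset's real decoders or encoders.
DECODE_A = {
    "NN": {"pos": "noun", "number": "sing"},
    "NNS": {"pos": "noun", "number": "plur"},
    "VBD": {"pos": "verb", "tense": "past"},
}

# Hypothetical target tagset B, which does not mark tense on verbs.
ENCODE_B = [
    ({"pos": "noun", "number": "sing"}, "N-sg"),
    ({"pos": "noun", "number": "plur"}, "N-pl"),
    ({"pos": "verb"}, "V"),
]


def decode(tag, table):
    """Map a source tag to a copy of its feature dictionary."""
    return dict(table[tag])


def encode(features, rules):
    """Return the target tag whose pattern matches the most features."""
    best = None
    for pattern, tag in rules:
        if all(features.get(k) == v for k, v in pattern.items()):
            if best is None or len(pattern) > len(best[0]):
                best = (pattern, tag)
    if best is None:
        raise ValueError(f"no target tag for features {features}")
    return best[1]


def translate(tag):
    """Translate a tag from tagset A to tagset B via shared features."""
    return encode(decode(tag, DECODE_A), ENCODE_B)


def delexicalize(sentence):
    """Drop word forms, keeping only the translated tags."""
    return [("_", translate(tag)) for _form, tag in sentence]


print(translate("NNS"))
print(delexicalize([("dogs", "NNS"), ("barked", "VBD")]))
```

In this sketch, features the target tagset cannot express (here, verb tense) are simply lost in translation; a parser trained on the delexicalized output sees only tags that are comparable across languages.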
Djamé Seddah (Paris-Sorbonne)
Overview of the SPMRL Shared Tasks: 2 years later, where are we now?
In this presentation, we will report the outcomes of
the two shared tasks on statistical parsing of morphologically rich
languages (MRLs) held in 2013 and 2014. The tasks featured data sets from
nine languages (Arabic, Basque, French, German, Hebrew, Hungarian,
Korean, Polish and Swedish), each available both in constituency and
dependency annotation. Large unlabeled data sets were also made available
in different forms (tagged, parsed, with morph analysis), in the hope of
boosting semi-supervised methods for MRL parsing.
We report on the preparation of the data sets, on the proposed parsing
scenarios, and on the evaluation metrics for parsing MRLs given different
representation types.
We present and analyze parsing results obtained by the task participants,
and then provide an analysis and comparison of the parsers across languages
and frameworks, reported for gold input as well as more realistic parsing
scenarios. Both shared tasks saw submissions from 20 teams. The parsing
results were obtained in different input scenarios (gold, predicted,
and raw) and evaluated using different protocols (cross-framework,
cross-scenario, and cross-language). In particular, this was the first
time a multilingual evaluation campaign reported on the execution of
parsers in realistic, morphologically ambiguous settings.
Interestingly, the SPMRL data set has spread beyond its initial circle
of interest and is now used as a common benchmark for constituent parsing
as well as realistic dependency parsing evaluation.
(joint work with Reut Tsarfaty, Sandra Kübler and many contributors)