The goal of this software project is to build a generation system in which the linguistic complexity of the generated language can be controlled. This matters for dialogue systems that may be used in safety-critical environments such as driving a car.
In a first step, we will re-implement an existing data-driven dialogue system for German (in the application domain chosen by the course), paying particular attention to generating both more complex and less complex alternative formulations. In a second step, the generation system can be extended according to the participants' interests, e.g. by adding a grammaticality filter, a classifier that distinguishes complex from less complex formulations, or new dialogue strategies in generation.
Good programming skills are required. See also the somewhat more detailed course description below.
Spoken dialogue systems are increasingly deployed in real-time and mission-critical environments, including common but safety-sensitive tasks such as driving a car. In this software project course, we will together construct a language generation system that can generate utterances at different levels of linguistic complexity. Such an adaptive natural language generation system is a missing but necessary component of a future spoken dialogue system that manages its users' cognitive workload. This project is motivated by ongoing research here at Saarland University, which is finding that linguistic complexity has a relevant effect on driving performance.
The software project is already well-defined and will be based on a state-of-the-art data-driven language generation system (see the paper by Mairesse et al., 2010). We will first collect a small corpus of utterances from a target domain (chosen according to the interests of course participants). Based on this corpus of target texts, the data-driven generation approach will allow us to generate a large number of alternative formulations that convey the same message. The software project will build heavily on existing tools such as the Graphical Models Toolkit (GMTK; Bilmes and Zweig, 2002) and SRILM (Stolcke et al., 2002). A particular focus of the software project will be to build a system that generates alternative formulations that differ in language complexity and in degree of redundancy (this part of the project is entirely new).
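To make the idea of alternative formulations that differ in complexity concrete, here is a minimal sketch in Python. It is not the method of Mairesse et al. (2010) and does not use GMTK or SRILM. The scoring function, the list of subordinating conjunctions, the weight of 5, and the example utterances are all assumptions made for illustration. It only shows how a set of candidate formulations for the same message could be ordered by a crude complexity proxy.

```python
import re

# Hypothetical, incomplete list of German subordinating conjunctions,
# used here as a crude cue for embedded clauses.
SUBORDINATORS = {"dass", "weil", "obwohl", "wenn", "nachdem", "während", "damit", "ob"}


def tokenize(utterance):
    """Lowercase the utterance and split it into word tokens."""
    return re.findall(r"\w+", utterance.lower())


def complexity_score(utterance):
    """Crude proxy: token count plus a penalty of 5 per subordinating conjunction."""
    tokens = tokenize(utterance)
    n_subordinators = sum(1 for token in tokens if token in SUBORDINATORS)
    return len(tokens) + 5 * n_subordinators


def rank_by_complexity(candidates):
    """Return the candidates sorted from least to most complex."""
    return sorted(candidates, key=complexity_score)


# Invented example formulations of the same message.
candidates = [
    "Das Restaurant, das in der Innenstadt liegt, ist teuer, weil es sehr beliebt ist.",
    "Das Restaurant ist teuer.",
    "Das Restaurant in der Innenstadt ist teuer.",
]
ranked = rank_by_complexity(candidates)
for utterance in ranked:
    print(complexity_score(utterance), utterance)
```

A system of the kind described here would presumably replace this proxy with measures grounded in syntactic structure or language-model probabilities. The sketch only illustrates the selection step that sits on top of whatever measure is chosen.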
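Once we have re-implemented the generation system of Mairesse et al. (2010) for our domain in German, there are several options for extending it; we will choose among them together with the course participants, depending on interests and time. Possible options include adding a grammaticality filter, adding a classifier that distinguishes complex from less complex formulations, or adding new dialogue strategies to the generation.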
The generation system will be in German. The language of instruction will be either English or German, depending on the course participants. Students will come away with an understanding of how to compute the structural complexity of sentences and how to build and evaluate systems that are sensitive to it. Basic programming skills are required. Students will also acquire practical skills, such as coordinating a research project and integrating disparate text-processing systems into a larger processing pipeline.
If you want to participate in the course, you have to sign up for the mailing list.
NEW! We have also established a discussion forum for the course where we can all discuss whatever technical questions you have, coordinate the team(s), etc. Please create an account, and we'll activate it for you.
Under construction. This is a software project course, so it will involve collectively building a system as described above.
Everyone will be given an account on the COLI systems if they don't have one already. We will prepare an SVN repository and set up the relevant tools as the course progresses and provide instructions here and in class.
There will be some lectures, depending on the students' overall background. Otherwise, we will proceed with regular meetings as the project progresses.
Date | Topic |
21.10.2013 | Introduction to the project; organizational details (Vera Demberg) |
31.10.2013 | Introduction to Mairesse et al. (2010) (Asad Sayeed) |
7.11.2013 | Cognitive workload and dialogue systems (Vera Demberg) |
This will be updated as the course proceeds. Initially, the most important reference from the literature is Mairesse et al. (2010).
We will explain it in the initial lectures for the course.