Introduction to Computational Linguistics
Linguistics 346
Winter 2001, Tu/Th 2.30-3.50
Northwestern University
Instructor: Prof. Chris Kennedy
Office: Linguistics Department,
Room 12 (2016 Sheridan Rd.)
Phone: 491-8054
Email: kennedy@northwestern.edu
Office hours: Wednesday 10-12 (or by appointment)
Course description
This course is an introduction to computational linguistics, designed
to familiarize students with the methods and goals of language
processing technologies at both an applied and a theoretical level.
The central goal of the course is to provide students with a solid
understanding of the core computational questions that arise in the
context of linguistic analysis, and to develop a set of language
processing skills and algorithms that will provide useful tools both
for the development of applied linguistic technologies, and for more
theoretical linguistic research. Specific computational topics to be
covered include regular expressions and finite state
automata/transducers, part of speech tagging, context free grammars,
parsing and complexity, unification grammars and feature structures;
we will examine these topics as they apply to the analysis of natural
language at different levels: morphology, syntax, semantics and
discourse.
Requirements
6 assignments (60%)
Take-home midterm (20%)
Final project (20%)
Students are encouraged to work together and collaborate in learning
the concepts and skills required to complete the various assignments
in this course; however, the assignments themselves must be done
individually.
Text
Jurafsky, Daniel and James Martin, 2000, Speech and Language
Processing, Prentice Hall, Upper Saddle River, N.J.
(Available at Norris Bookstore.)
Online resources
Mini-corpus of examples of Verb Phrase Ellipsis (VPE) for evaluating
the ellipsis resolution algorithm you will construct for the final.
- The test corpus. Use this to test your ellipsis resolution algorithm.
  Occurrences of ellipsis are marked with a VPE symbol.
- The "gold standard". This is the same as the test corpus, with occurrences
  of VPE resolved. Ellipsis resolutions are marked in red and have the
  following format: VPE=[LEARNED SOMETHING].
As noted on the final, you will need to decide how to determine when your
algorithm gets a "correct" result, and to justify your decision.
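As a starting point for thinking about evaluation, here is a minimal Python
sketch of one possible scoring scheme. It assumes that both the gold standard
and your system's output mark resolutions inline as VPE=[...], and it counts a
resolution as correct only if it matches the gold resolution exactly, ignoring
case and extra whitespace. The function names and the example sentence are
invented for illustration and are not part of the course materials. Exact
match is only one candidate criterion; you still need to choose and justify
your own.

```python
import re

# Assumed annotation pattern, modeled on the format VPE=[LEARNED SOMETHING].
VPE_PATTERN = re.compile(r"VPE=\[([^\]]*)\]")


def extract_resolutions(text):
    """Return the bracketed resolutions in order, normalized for comparison."""
    # Collapse runs of whitespace and upper-case so trivial differences
    # in spacing or capitalization do not count as errors.
    return [" ".join(match.split()).upper() for match in VPE_PATTERN.findall(text)]


def exact_match_accuracy(gold_text, system_text):
    """Fraction of gold resolutions that the system reproduces exactly.

    Resolutions are paired by position. A gold resolution with no
    corresponding system resolution counts as incorrect.
    """
    gold = extract_resolutions(gold_text)
    system = extract_resolutions(system_text)
    if not gold:
        return 0.0
    correct = sum(1 for g, s in zip(gold, system) if g == s)
    return correct / len(gold)


# Invented example sentence, not taken from the course corpus.
gold = "Kim learned something, and Lee did VPE=[LEARNED SOMETHING] too."
system = "Kim learned something, and Lee did VPE=[learned  something] too."
print(exact_match_accuracy(gold, system))
```

A stricter or looser criterion (for example, crediting partial overlap between
the proposed and gold verb phrases) would change only the comparison inside
exact_match_accuracy.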
PDF versions of overheads from class
Links to useful and informative websites related to computational
linguistics and natural language processing (to be updated throughout
the course).
Syllabus
Week 1: NO CLASS ON THURSDAY, JANUARY 4
CK will be at the annual meeting of the Linguistic Society of
America on the first day of class, but students should start doing
the reading for week two.
Week 2: Introduction
Introduction, regular expressions, finite state automata
Reading: Chapters 1-2
Week 3: Morphology
Finite state transducers, morphological parsing
Reading: Chapter 3
Week 4: Syntax
Part of speech tagging, context free grammars
Reading: Chapters 8-9
Week 5: Syntax
Syntactic parsing with context free grammars
Reading: Chapter 10
Week 6: Syntax
Feature structures and unification grammars
Reading: Chapter 11
Week 7: Semantics
Representing (various aspects of) meaning, mapping
syntax to semantics
Reading: Chapter 14
Week 8: Semantics
Semantic analysis, information extraction
Reading: Chapter 15
Week 9: Pragmatics
Reference resolution, coherence, rhetorical structure
Reading: Chapter 18
Week 10: Conclusion
Loose ends, looking ahead