Anthony McEnery, Richard Xiao, Yukio Tono

Table of Contents

Preface
Contents

SECTION A INTRODUCTION

Section A of this book sets the scene for corpus-based language studies by focusing on the theoretical aspects of corpus linguistics and introducing key concepts in the field. This section is broken into ten units, each focusing on either a key concept in corpus linguistics or on a practical issue that may face the corpus builder or user.

Unit 1 introduces corpus linguistics and answers questions such as ‘What is a corpus?’ and ‘Why is a corpus-based approach important?’. Unit 2 is concerned with such key concepts as representativeness, balance and sampling, while units 3 and 4 discuss corpus markup and annotation respectively. In unit 5 we introduce the multilingual dimension of corpus linguistics. Unit 6 seeks to raise readers’ statistical awareness, an awareness which is essential in corpus-based language studies. Unit 7 introduces publicly available, well-known and influential corpora while unit 8 considers the important decisions and practical issues one may face when constructing a corpus. Unit 9 deals with copyright issues in corpus building. Finally, unit 10 explores the use of corpora in language studies.

Unit 1 Corpus linguistics: the basics
Unit 2 Representativeness, balance and sampling
Unit 3 Corpus markup
Unit 4 Corpus annotation
Unit 5 Multilingual corpora
Unit 6 Making statistical claims
Unit 7 Using available corpora
Unit 8 Going solo: DIY corpora
Unit 9 Copyright
Unit 10 Corpora and language studies

SECTION B EXTENSION

Section A introduced some important concepts in corpus linguistics. We also briefly considered the use of corpora in a range of areas of language studies. In this section, readers will get an opportunity to read excerpts from published material which will go into a number of research areas in more depth. The excerpts presented in this section have been selected carefully using a number of criteria. The primary criterion is the originality, importance and influence of the paper in the area of study. The second criterion is its current relevance. Given the second criterion, it is unsurprising that, with a few exceptions, the majority of the papers in this section were published in or after 1998. The final criterion is a pragmatic one – some papers, while interesting, simply did not fit well with the overall design of the book. We are fully aware that a book of this size cannot possibly include all of the publications which meet the above criteria. Also, the recency of the material included here can be viewed as an advantage or a disadvantage, depending upon one’s viewpoint. Those who view it as a disadvantage might argue that the book is wanting in historical background. Nevertheless, it can reasonably be argued that a focus on current research is as important as historical depth. We would like to see this book as an extension to books such as Biber, Conrad and Reppen (1998), Kennedy (1998), and McEnery and Wilson (2001), which have already covered much of the history of corpus analysis. Furthermore, readers can refer to Sampson and McCarthy (2004) for an anthology of important publications on corpus linguistics from its early years.

The excerpts selected using the above criteria are designed to help readers understand a number of key concepts in corpus linguistics and bring them up to date with the latest developments in corpus-based language studies. They are also selected to get readers familiarized with a particular area of study so that they will be ready to explore the case studies in Section C. Note that in order to save space in this book, the excerpts are presented without notes or references. Readers are advised to refer to the original publications for these. We would also like to remind readers that the terminology used in each excerpt may differ slightly from that adopted in this book. At no point, however, does this slight imprecision interfere with the general argument presented.

This section consists of two parts. Part 1 ‘Important and controversial issues’ (units 11 – 12) discusses further some important or controversial issues in corpus linguistics introduced in Section A, namely corpus representativeness and balance, the related debate over the sample corpus vs. monitor corpus model (unit 11), and the pros and cons of the corpus-based approach (unit 12). Part 2 ‘Corpus linguistics in action’ (units 13 – 16) presents corpus-based studies in some of the areas we considered in Section A including, for example, lexical and grammatical studies (unit 13), language variation (unit 14), contrastive and diachronic studies (unit 15), and finally language teaching and learning (unit 16).

Unit 11 Corpus representativeness and balance
Unit 12 Objections to corpora: an ongoing debate
Unit 13 Lexical and grammatical studies
Unit 14 Language variation studies
Unit 15 Contrastive and diachronic studies
Unit 16 Language teaching and learning

SECTION C EXPLORATION

Having introduced the key concepts in corpus linguistics and presented excerpts from published material, we now want to engage readers in a series of case studies. These case studies investigate research questions in some of the areas of linguistic analysis introduced in Section A and further discussed in Section B. Each case study starts with an overview of the background knowledge needed for each study and a brief description of the corpus data used. Then it explores, together with the reader, a particular research question using specific tools (a corpus exploration tool and/or a statistics package). This is where the reader learns how to do corpus linguistics, as the process of investigating the data using the package(s) concerned will be spelt out step by step, using text and screenshots. Thus by the end of each case study, a corpus has been introduced, the reader has learnt how to use a retrieval package and some research questions have been explored. Readers are then encouraged to explore a related research question using the same corpus data, tools and techniques. Readers can visit the authors’ companion website given in the Appendix for details of the availability of corpora and tools used in these case studies.

This section consists of six case studies. Case study 1 explores the area of pedagogical lexicography on the basis of the BNC (World Edition), using BNCWeb. The focus of this study is on collocation analysis: it seeks to describe collocation patterns from the BNC and integrate that information into a description of a dictionary entry. Case study 2 uses four corpora of the Brown family to explore the potential factors that may influence a language user’s choice of a full or bare infinitive after help, which include language variety (British English vs. American English), language change (English in the early 1960s and the early 1990s) and a range of syntactic conditions (e.g. an intervening nominal phrase, a preceding infinitive marker and the passive). This case study also introduces MonoConc Pro and SPSS. Case study 3 uses WordSmith version 4 and the Japanese component of the Longman Learners’ Corpus to study the second language acquisition of English grammatical morphemes. Case study 4 uses the metadata encoded in the BNC (version 2) pertaining to demographic features such as user age, gender and social class, and textual features such as register, publication medium and domain, to explore these dimensions of variation and discover a general pattern of swearing (more specifically the use of fuck) in modern British English. This case study demonstrates how to use BNCWeb to make complex queries and provides readers with an opportunity to practice using SPSS. Case study 5 compares two approaches to genre analysis – Biber’s (1988) multi-feature/multi-dimensional analysis and Tribble’s (1999) use of the keyword function of WordSmith – through a comparison of speech and conversation in American English. This study introduces some advanced functions of WordSmith version 3. The final case study uses parallel and comparable corpora of English and Chinese to examine the effect of domain, text type and translation upon aspect marking in Chinese. This study also introduces parallel concordancing.

We would remind readers that alternative versions of each case study, covering most concordance packages, are available on our companion website. Note also that readers whose results do not match those given here should check the website for an update.

Most of the case studies in this Section are based upon articles published elsewhere by the authors, as indicated in individual units. Readers interested in particular research questions can refer to our full papers for further discussion.

Unit 17 Collocation and pedagogical lexicography
Unit 18 help or help to: what do corpora have to say?
Unit 19 L2 acquisition of grammatical morphemes
Unit 20 Swearing in modern British English
Unit 21 Conversation and speech in American English
Unit 22 Domains, text types, aspect marking and English-Chinese translation

Bibliography
Appendix Useful Internet links
Index

Copyright © 2006 Taylor & Francis Group plc