Sketch Engine tasks

A. I want to know how to see a word in its context
This is the most basic corpus function and can be very useful for looking for patterns, for example if
you want to show how a particular word tends to collocate with a particular preposition.
1. As an example, open ukWaC and type higgledy in the box, as in the screenshot
2. You should now see something similar to this:
3. If you look to the right, you can see what it tends to describe
4. You could now try this for any other word you want to look at
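If it helps to see what a concordance is doing behind the scenes, here is a minimal Python sketch of a keyword-in-context (KWIC) display. It is only an illustration: the function name kwic, the width parameter and the sample sentence are all invented for this handout and are not part of Sketch Engine.

```python
def kwic(tokens, node, width=4):
    """Return (left, node, right) context triples for each hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample sentence, not taken from ukWaC.
text = "the books were piled higgledy piggledy on the floor"
for left, node, right in kwic(text.split(), "higgledy"):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>30}  {node}  {right}")
```

Lining the node up in a central column is what makes patterns to its left and right easy to spot.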
SketchEngine Workshop 2011
charlotte.taylor@port.ac.uk
B. I want to know how to see a list of word forms for a particular lemma.
1. Select the British National Corpus.
2. Click on ‘Word List’ from the top left-hand bar and you will see something similar
to this:
3. In order to see all the word forms for the lemma EMPLOY you should
a. Select ‘lemma’ from the drop-down box next to ‘search attribute’
b. write employ in the white box
c. tick ‘Use multilevel wordlist’
d. look at the drop-down boxes next to ‘use multilevel wordlist’
e. from the first drop-down box select ‘word’
f. from the second drop-down box select ‘tag’
g. click on ‘make word list’
This function can also be useful for investigating whether a word is used most
frequently in its verb or noun form, for instance. It can also be useful for comparing the
frequency of two near-synonyms or two spelling variants.
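The multilevel word list can be thought of as a frequency count over pairs of word form and tag for one lemma. The following Python sketch shows that idea under stated assumptions: the (word, lemma, tag) triples are invented, the tags are simple placeholders rather than a real tagset, and forms_for_lemma is a hypothetical helper.

```python
from collections import Counter

# Hypothetical (word, lemma, tag) triples; the tags are placeholders,
# not the tagset of any real corpus.
tokens = [
    ("employs", "employ", "VERB"),
    ("employed", "employ", "VERB"),
    ("employed", "employ", "VERB"),
    ("employing", "employ", "VERB"),
    ("staff", "staff", "NOUN"),
]


def forms_for_lemma(tokens, lemma):
    """Count each (word form, tag) pair that belongs to `lemma`."""
    return Counter((word, tag) for word, lem, tag in tokens if lem == lemma)


for (word, tag), freq in forms_for_lemma(tokens, "employ").most_common():
    print(word, tag, freq)
```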
C. I want to know how to find the differences between two near synonyms
1. Select SketchDiff from the left-hand menu and add your two search items like this:
You should then see something like this:
2. If you focus on the ‘modifies’ column on your screen, what differences do you notice
between the ‘large only’ patterns and the ‘big only’ patterns?
3. Using the Sketch Diff function in Sketch Engine, compare the following:
little / small
boy / girl (use ukWaC). Do girl and boy seem identical except for the gender
difference?
4. Think of your own pair to investigate and write your findings below:
This function is also very useful for comparing/deciding between two possible translations of
an item.
D. I want to know how to find out how a particular word behaves, for
example, whether it usually collocates with positive or negative things
Collocation refers to the tendency for words to go together and is an extremely important
concept in corpus linguistics and the study of lexis. Words which are habitually found together
are referred to as collocates.
1. Investigating bent on
The modern conference resembles the pilgrimage of medieval Christendom in that it allows the
participants to indulge themselves in all the pleasures and diversions of travel while appearing
to be austerely bent on self-improvement. (David Lodge, Small World)1
In this case, we can identify something unusual in the combination of bent on + self-improvement. But why?
a) Select the BNC
b) Type bent on in the search box
c) Look at the words to the right of bent on – what do they have in common?
d) To see a summary list of all the terms that come immediately after bent on:
a. Click on ‘collocations’ from the left-hand menu
b. Set the range at 0 to 1
e) What can we say about the semantic prosody of bent on? Does it tend to collocate with
good or bad things?
2. What lexical items do you think are frequently found in the company/co-text of COMMIT?
3. To check this, you can also use the Word Sketch function. Select ‘Word Sketch’ from the top
left-hand bar and you should see something like this:
1 This is a well-known example from Hoey, M. 2005. Lexical Priming. London & New York: Routledge.
4. Type commit into the search box and select the correct part of speech from the drop
down box. You should then see something similar to this:
5. Did you find the items that you wrote down in the box above?
6. The Word Sketch function doesn’t just tell you what words are commonly found in the
company of your search word, but also tells you what their grammatical relationship is
to the search word.
What subjects most commonly collocate with COMMIT?
What are the most common objects of COMMIT?
7. Extension: Use Word Sketch to look at the collocates of KISS. Do you notice anything about
the gender distribution of the collocates?
E. I want to know how to find out which verb is usually used with a
particular noun
Once again, we are investigating collocation, the tendency for certain words to go together.
There are various ways of investigating this in SketchEngine, but one of the easiest is using
Word Sketch. This is the same procedure as in section D.
1. Select ‘Word Sketch’ from the left-hand menu.
2. Type meeting into the box and select the correct part of speech
3. You should then see something similar to this:
If you focus on the first column, you can see the verbs for which meeting is the object.
4. Now try another example, for instance a word that learners often struggle with in terms
of collocation.
F. I want to know how to find out which adverbs are usually used with a
particular adjective
Once again, we are investigating collocation, the tendency for certain words to go together.
There are various ways of investigating this in SketchEngine, but one of the easiest is using
Word Sketch. This is the same procedure as in section D.
1. Select ‘Word Sketch’ from the left-hand menu.
2. Type enormous into the box and select the correct part of speech
3. You should then see something similar to this:
If you look at the second column, you can see the words which are used to modify
enormous. If working with students, for example, this would be one way of drawing their
attention to the relatively infrequent use of very with this type of adjective.
4. Try it again with another term. You could use another corpus.
G. I want to know how to compare two (potential) translation equivalents
For instance, you have identified a possible translation for a particular lexical item, but you
want to check whether the SL and TL items really function in similar ways or have similar
evaluative meanings (in the corpora we have available). There are many ways you could
compare the items, but we will start with a simple one…
1. For this exercise I recommend that you use the WaC corpora because they have been
compiled in similar ways and are available for nearly all the languages. The WaC corpora
contain texts which were collected from the internet. The first two letters refer to the
domain where they were collected e.g. ukWaC is the British English corpus, itWaC is the
Italian corpus etc.
2. So, from the homepage select the WaC for your source language (if you are currently in
another corpus, first click ‘home’ from the top-right corner).
3. Now select ‘Word Sketch’ from the left-hand column and type in the word that you are
interested in. You will need to specify the part of speech (noun, verb etc).
4. You can save the resulting Word Sketch by selecting ‘save’ from the left-hand menu.
5. Now return to the Sketch homepage and choose the WaC corpus for your target
language and repeat the process, inserting your potential translation into the Word
Sketch.
6. Then you can (manually) compare your Word Sketches to see if they seem similar. (The
similarity/difference may of course be affected by a range of factors)
7. Did your items occur with similar collocates in the two languages?
This manual comparison is also useful for any kind of cross-cultural analysis.
8. Extension task: Create a Word Sketch for university in the ukWaC corpus. Now create a
Word Sketch for the equivalent word in another language. What similarities/differences
do you notice?
H. I want to know how to find out how other people have translated a
particular word
This is currently only possible for the following language pairs: English-German, English-Spanish, English-Finnish, English-French, English-Italian, English-Dutch.
To look at translation, you need to select one of the parallel corpora included in Sketch. These
are all EUROPARL corpora and contain data from the European Parliament. They appear
something like this on the home page (these are the corpora for which English is the source
language; you can also look at corpora for which English is the target language):
1. Open your chosen corpus (start with one for which English is the source language)
2. Type straightforward into the search box. This will give you a list of concordances
containing straightforward.
3. In order to see how it was translated you need to change the view by clicking on the name of
the corpus to the right of the hits summary
4. You may then need to change the view to make it easier to read. You can do this simply by
clicking on ‘KWIC/sentence’ from the left-hand menu. You should then see something that
looks like this:
5. Now try with another term (and of course you could use another corpus).
I. I want to know how to find out which adjectives are used most
frequently in a particular discourse type, e.g. Academic Spoken English
In this case, you need to use part-of-speech tags. All the corpora on Sketch have been tagged for
information about their part of speech, for instance whether a word is a singular noun etc but
not all corpora use the same system, so you will need to check.
When you use the concordance tool, it is very easy to find your tagset because there is a link
next to the search box, as shown below:
You should also note that the ‘Query type’ has been changed from ‘simple’ to ‘CQL’, which means
‘Corpus Query Language’. It’s a good idea to open the tagset in another window (or print it out
if you will be using it frequently).
1. For this search you should select the ‘British Academic Spoken English Corpus’ (click on
‘home’ from the top right to return to Sketch homepage with the list of corpora).
2. First, open the tagset as described above.
3. Then click on ‘wordlist’ from the left-hand menu.
4. Change the search attribute to ‘tag’ and insert your tag into the box marked ‘pattern’,
then click on ‘multilevel’ and ask for ‘word’ as an output, as in the screenshot:
5. This should generate a list of the most frequent adjectives in that corpus. You could then
save the list and repeat the process for the ‘British Academic Written English Corpus’
for instance.
J. I want to know how to look for patterns (e.g. what else can be used in
the form a couple of sandwiches short of a picnic)
You can also create more complex searches for the concordance and collocate functions. For
instance, we could look at how creative the lexical template an x short of a y is.
1. Click on ‘home’ (top right) and switch to the ukWaC corpus (because it is larger and
more recent)
2. Click on ‘query type’ and select ‘CQL’ (Corpus Query Language)
3. Type the following into the query box:
"a|an" []{0,3} "short" "of" "a|an"
4. You should see something a bit like this:
5. Now sort your concordances by ‘node’ (left-hand menu). What examples did you find?
What is the function of this template?
6. Think of another pattern that you would like to investigate and see if you can create the
query. If you want to get some ideas, you could look up ‘snowclones’ on the Language
Log http://languagelog.ldc.upenn.edu/nll/
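If you want to try the same kind of pattern on your own plain text, the template can be approximated with an ordinary regular expression. The Python sketch below is a rough analogue only and rests on some assumptions: it works on a raw string rather than on corpus tokens, so ‘word’ simply means a run of non-space characters, and the names PATTERN and find_template are invented for this illustration.

```python
import re

# Rough plain-text analogue of the query:
# a/an, up to three words, "short of", a/an, then one more word.
PATTERN = re.compile(
    r"\b(?:a|an)(?:\s+\S+){0,3}?\s+short\s+of\s+(?:a|an)\s+[\w-]+",
    re.IGNORECASE,
)


def find_template(text):
    """Return every stretch of text that fits the 'an x short of a y' template."""
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = "He is a couple of sandwiches short of a picnic, they say."
print(find_template(sample))
```

Unlike the query above, this version also captures the word after the final a/an, which is usually the part you want to read.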
K. I want to know how to find out what words occur in similar lexical
patterns to my search word
The ‘Thesaurus’ function in Sketch Engine is not like a manually created thesaurus. It simply
provides a list of words which occur in similar grammatical and lexical contexts to your search
word.
1. Click on ‘Thesaurus’ from the top left-hand bar and you should see something similar to
this:
2. Type similar into the search box and select the correct part of speech from the drop
down box.
3. You will notice that different appears in the thesaurus list. This is because the thesaurus
function just looks at the contexts of use, not the meaning, and antonyms are usually
used in extremely similar contexts.
4. You could also look for words where you are more interested in the cultural patterns,
for instance you could try university or baby. For this, it is probably better to use the
ukWaC corpus because it is more recent.
L. How can I find which words are typical of one corpus compared to
another? (e.g. spoken language ‘v’ written language)
You can also calculate keywords in Sketch Engine. Keywords here refers to items which occur
statistically significantly more frequently in one corpus than in another.
For instance, if we wanted to compare spoken and written academic language:
1. Open the British Academic Spoken English Corpus and click on ‘Wordlist’ from the left-hand menu.
2. At the top where it says ‘subcorpus’ we don’t change anything as we want to use all of
the corpus. At the bottom we need to specify that we are interested in ‘keywords’ and
then select the corpus that we want to compare it to i.e. the ‘British Academic Written
English Corpus’.
3. Click on ‘Make Word List’ and you should see something rather like this:
To compare sub-corpora, you have to specify which parts of the corpus you are interested in. For
instance, to compare spoken and written language in the BNC, you need to ‘create’ the sub-corpora.
1. To do this, click on ‘Word list’ then ‘create new’. This will open a page with the various
sections of the BNC. You should give your sub-corpus a name, e.g. BNC spoken and then
select the two spoken divisions, as shown below, and click on ‘create subcorpus’ at the
bottom of the page:
2. Now we need to repeat the process for the written components, so click on ‘create
subcorpus’ and this time select the three written text types from the top-left box and give it
an appropriate name, e.g. BNC written
3. To compare the two sub-corpora, go to ‘Word list’ and select the new spoken subcorpus at
the top and the written subcorpus at the bottom, as shown below:
It will take a bit longer to process this type of query… but in the end you should see
something a bit like this:
This type of search is particularly useful when you are trying to find out which words characterise a
particular corpus. For example, if you were preparing materials for teaching vocabulary relating to a
specialised field, or preparing to translate a text from a specialised field, this would be one way of
getting to know the key items that differentiate that field from general language. It is also interesting
as a starting point for discourse studies.
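To make the idea of a keyword more concrete, here is a minimal Python sketch that ranks words by how much more frequent they are in one word list than in another. It is an assumption-laden illustration, not the calculation Sketch Engine performs: it uses a simple ratio of per-million frequencies with a smoothing constant instead of a significance test, and the function name keywords, the smoothing parameter and the two tiny ‘corpora’ are all invented.

```python
from collections import Counter


def keywords(focus_tokens, reference_tokens, smoothing=1.0, top=5):
    """Rank words by the ratio of their per-million frequencies in two token lists."""
    focus, ref = Counter(focus_tokens), Counter(reference_tokens)
    n_focus, n_ref = len(focus_tokens), len(reference_tokens)
    scores = {}
    for word, freq in focus.items():
        fpm_focus = freq * 1_000_000 / n_focus
        fpm_ref = ref[word] * 1_000_000 / n_ref
        # Smoothing avoids division by zero for words absent from the reference list.
        scores[word] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Invented miniature 'corpora', far too small for real conclusions.
spoken = "okay so um we look at this okay".split()
written = "we look at this in the following section".split()
print(keywords(spoken, written))
```

Words that are equally frequent in both lists score close to 1, while words that are much more typical of the first list rise to the top.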
1. How can I re-order the concordance lines?
When you have the concordances on the screen you can click on ‘left’ or ‘right’ from the left-hand menu. If you click on ‘right’ it will sort the concordance lines in alphabetical order
according to the first word to the right of your ‘node’ (your search word).
For instance, these concordance lines have been sorted to the right:
Hint: It will always list punctuation first, so you often have to click on ‘next’ at the bottom to
get past the punctuation.
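The effect of sorting to the right, including the way punctuation floats to the top, can be imitated in a few lines of Python. This is only a sketch: the concordance lines are invented, sort_right is a hypothetical helper, and the ordering shown is Python's default character ordering, which may differ in detail from the one Sketch Engine uses.

```python
# Invented concordance lines as (left context, node, right context).
lines = [
    ("it was", "bent", "on revenge"),
    ("he seemed", "bent", ", it seems"),
    ("they were", "bent", "on destruction"),
]


def sort_right(lines):
    """Sort lines by the words to the right of the node, first word first."""
    return sorted(lines, key=lambda line: [tok.lower() for tok in line[2].split()])


for left, node, right in sort_right(lines):
    print(f"{left:>12}  {node}  {right}")
```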
2. How can I see more context for my concordance lines?
When you are looking at your concordance lines, you can click on the line that you want
to see in more detail and it will appear at the bottom of your screen, like this:
Alternatively, you can click on ‘view options’ from the left-hand menu and increase the
number next to ‘KWIC Context size (number of characters)’.
3. How can I search for more than one thing at once?
Use the pipe | (usually on the bottom-left on a UK keyboard) e.g.
"emphasise|emphasize" will search for both spelling variants
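The pipe works the same way in ordinary regular expressions, so you can try it outside Sketch Engine too. In this small Python sketch the token list is invented, and fullmatch is used on the assumption that a query item has to match a whole word rather than part of one.

```python
import re

# The pipe means 'either the pattern on its left or the one on its right'.
variant = re.compile(r"emphasise|emphasize")

# Invented token list for illustration.
tokens = ["We", "emphasise", "this", "and", "they", "emphasize", "that"]

# fullmatch keeps only tokens that match the pattern from start to end.
hits = [tok for tok in tokens if variant.fullmatch(tok)]
print(hits)
```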
4. How can I specify words that I don’t want in the co-text?
There are two ways that you could do this.
1. From your concordance view, you could select ‘filter’ from the left-hand menu. This
will show you something similar to this:
Now select ‘negative’ to say that you want to exclude certain items and type the
item that you wish to exclude in the box.
2. Alternatively, you could exclude the unwanted items when you enter your search
term at the beginning of the process. For example, let’s say I did a search for corpus
and there were lots of irrelevant results because several hits referred to the Corpus
Vasorum Antiquorum database. I could then exclude one or more words like this:
"corpus" [word!="Vasorum"]{3}
This says that I want corpus only where none of the next 3 words is Vasorum.
This function can also be useful for excluding punctuation to ensure that your
complex searches are not split across sentences. For instance, to exclude a full
stop, you would insert [word!="\."] - in this case we have added the \ to indicate
that the dot is actually in the text and not part of a wildcard.
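The logic of excluding an unwanted neighbour can be sketched in Python as a filter over the hits. This is an illustration under assumptions, not a reimplementation of the filter tool: it only looks to the right of the node, the helper name hits_without and its span parameter are invented, and the sentence is made up.

```python
def hits_without(tokens, node, unwanted, span=3):
    """Return positions of `node` where `unwanted` is not among the next `span` tokens."""
    positions = []
    for i, tok in enumerate(tokens):
        if tok == node and unwanted not in tokens[i + 1:i + 1 + span]:
            positions.append(i)
    return positions


# Invented example: only the second 'corpus' should survive the filter.
tokens = "the corpus Vasorum Antiquorum and a learner corpus of essays".split()
print(hits_without(tokens, "corpus", "Vasorum"))
```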
5. How can I use wildcards in searches?
1. Any sequence of characters .*
"system.*" will retrieve:
system
systems
systematic etc.
2. Any word []
"confus.*" [] "by" will retrieve:
confusion generated by
confusion heightened by
confusion introduced by etc.
3. Any group of words []{}
"a|an" []{0,3} "short" "of" "a|an" will retrieve:
this man who is obviously a few co-ordinates short of a bearing
Jason, thou truly art a few Brontoburgers short of a picnic!
this guy's a couple of sandwiches short of a picnic etc.
NB remember to put quotation marks around your items and to set your ‘query type’ to ‘CQL’
(Corpus Query Language).
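One way to picture how these queries work is as a list of patterns, one per word, that must match a run of consecutive words. The Python sketch below imitates that idea for fixed-length queries only. It is a simplification with invented names (match_sequence, the sample sentence): ‘any word’ is written as the regular expression .* and variable-length gaps such as []{0,3} are not handled.

```python
import re


def match_sequence(tokens, patterns):
    """Return each run of tokens whose items fully match the regexes in `patterns`, in order."""
    n = len(patterns)
    compiled = [re.compile(p) for p in patterns]
    return [
        tokens[i:i + n]
        for i in range(len(tokens) - n + 1)
        if all(rx.fullmatch(tok) for rx, tok in zip(compiled, tokens[i:i + n]))
    ]


# Invented sentence for illustration.
tokens = "the confusion generated by the new system and its systems".split()
print(match_sequence(tokens, ["confus.*", ".*", "by"]))  # like "confus.*" [] "by"
print(match_sequence(tokens, ["system.*"]))              # like "system.*"
```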
Wildcard tasks (use the BNC)
a) Find examples of words ending in ious
__________________________________________________________________
b) How many different words can you form with bloody? (Use ‘wordlist’ – you don’t
need to use quotation marks around your item in ‘wordlist’. Put the wildcard at
the beginning and the end)
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
c) Which are the most common words beginning with the prefix anti-? (hint: use
wordlist)
__________________________________________________________________
d) Create a query to identify occurrences of better followed by worse with no full
stop in between
__________________________________________________________________
6. How can I use part-of-speech tags in searches?
All the corpora in Sketch have been tagged for part of speech, but not all the corpora
use the same tags. To find the tagset (the list of codes) for your corpus you can just
click on ‘tagset summary’ from the concordance page. Open it in another window so
that you can keep referring back to it.
1. As an example, in a text about the new funding regime the expression that
‘Bursaries are non-refundable’ struck me as being rather unusual. To find out
why that might be, we could concordance the occurrences of the verb BE
followed by non….able. The query syntax is:
[tag="VB.*"]+ "non.*able"
2. Copy this query into the concordance tool in the BNC. You need to set the
query type to ‘CQL’ (Corpus Query Language).
You should get something similar to this:
The concordances show that the items in the node (the central part) of this
pattern are frequently desirable; we would like to get a refund, return items etc.
Therefore, we can hypothesise that the ‘unusuality’ I perceived in the original
phrase comes from a point of view difference – the author (representing the
university) presumably would like bursaries to be reimbursed, whereas I, the
reader, didn’t see that as being desirable.
Hint: When using CQL, you can’t search for phrases, so, for instance to find all examples
of nouns followed by set in, you would insert (the items set and in are separated):
[tag="N.*"]+ "set" "in"
More information on using the POS tags is also available here:
http://trac.sketchengine.co.uk/wiki/SkE/CorpusQuerying#1.
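To see what a query that mixes tags and words is checking, here is a minimal Python sketch of the BE + non…able search run over a hand-tagged sentence. Everything in it is an assumption made for illustration: the function be_plus_non_able is invented, and the tags in the example (NN2, VBB, AJ0) are typed in by hand and should be checked against the tagset summary for your corpus.

```python
import re


def be_plus_non_able(tagged):
    """Find one or more VB* tokens directly followed by a word matching non.*able."""
    hits = []
    for i, (word, tag) in enumerate(tagged):
        if not re.fullmatch(r"non.*able", word):
            continue
        if i == 0 or not re.fullmatch(r"VB.*", tagged[i - 1][1]):
            continue
        # Extend leftwards over any further VB* tokens (the + in the query).
        start = i - 1
        while start > 0 and re.fullmatch(r"VB.*", tagged[start - 1][1]):
            start -= 1
        hits.append([w for w, _ in tagged[start:i + 1]])
    return hits


# Hand-tagged example; the tags are assumptions for illustration.
tagged = [("Bursaries", "NN2"), ("are", "VBB"), ("non-refundable", "AJ0")]
print(be_plus_non_able(tagged))
```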
7. How many different ways are there for investigating
collocation in Sketch Engine?
1. Using the concordance tool.
This is useful if you have a small amount of data or there is a very clear pattern you
want to show
2. Using collocates.
a. To calculate collocates you first create the concordance.
b. Then click on ‘collocations’ from the bottom of the left-hand menu and you
will see something similar to this:
c. To specify which collocates you are interested in, you can alter the ‘range’. So,
for instance, if I only wanted to know which words come immediately to the
right of happily, I would set the ‘range’ to 0 in the left-hand box, because this
refers to the items that come before the node (hence the minus sign), and 1 in
the right-hand box, which means 1 place to the right.
d. And this would give something like this:
3. Using ‘Word Sketch’
4. Using ‘Sketch-Diff’ (if you are comparing collocational patterns for 2 items)
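The ‘range’ setting for collocates can be pictured as a window around each hit. The Python sketch below counts the words that fall inside such a window. It is an illustration with invented names (collocates, left, right) and an invented sentence, and it reports raw counts only, whereas a collocation tool would normally also rank candidates with a statistical measure.

```python
from collections import Counter


def collocates(tokens, node, left=0, right=1):
    """Count words found from `left` positions before to `right` positions after each hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i + left)            # left is 0 or negative, e.g. -3
        end = min(len(tokens), i + right + 1)
        for j in range(start, end):
            if j != i:                      # skip the node itself
                counts[tokens[j]] += 1
    return counts


# Invented sentence for illustration.
tokens = "they lived happily ever after and laughed happily together".split()
print(collocates(tokens, "happily", left=0, right=1).most_common())
```

Setting left to -1 in this sketch would also count the word immediately before each hit.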
