A. I want to know how to see a word in its context
This is the most basic corpus function and can be very useful for looking for patterns, for example if you want to show how a particular word tends to collocate with a particular preposition.
1. As an example, open ukWaC and type higgledy in the search box, as in the screenshot.
2. You should now see something similar to this:
3. If you look to the right of higgledy, you can see what it tends to describe.
4. You could now try this for any other word you want to look at.
SketchEngine Workshop 2011 charlotte.taylor@port.ac.uk

B. I want to know how to see a list of word forms for a particular lemma
1. Select the British National Corpus.
2. Click on ‘Word List’ from the top left-hand bar and you will see something similar to this:
3. To see all the word forms for the lemma EMPLOY, you should:
   a. select ‘lemma’ from the drop-down box next to ‘search attribute’;
   b. write employ in the white box;
   c. tick ‘Use multilevel wordlist’;
   d. look at the drop-down boxes next to ‘Use multilevel wordlist’;
   e. from the first drop-down box, select ‘word’;
   f. from the second drop-down box, select ‘tag’;
   g. click on ‘Make word list’.
This function can also be useful for investigating whether a word is used most frequently in its verb form or its noun form, for instance. It can also be useful for comparing the frequency of two near-synonyms or two spelling variants.

C. I want to know how to find the differences between two near-synonyms
1. Select Sketch Diff from the left-hand menu and add your two search items like this:
   You should then see something like this:
2. If you focus on the ‘modifies’ column on your screen, what differences do you notice between the ‘large only’ patterns and the ‘big only’ patterns?
3. Using the Sketch Diff function, compare the following pairs in ukWaC: little / small and boy / girl. Do girl and boy seem identical except for the gender difference?
4. Think of your own pair to investigate and write your findings below:
This function is also very useful for comparing, or deciding between, two possible translations of an item.

D. I want to know how to find out how a particular word behaves, for example whether it usually collocates with positive or negative things
Collocation refers to the tendency for words to go together and is an extremely important concept in corpus linguistics and the study of lexis. Words which are habitually found together are referred to as collocates.
1. Investigating bent on
   ‘The modern conference resembles the pilgrimage of medieval Christendom in that it allows the participants to indulge themselves in all the pleasures and diversions of travel while appearing to be austerely bent on self-improvement.’ (David Lodge, Small World)[1]
   In this case, we can identify something unusual in the combination of bent on + self-improvement. But why?
   a) Select the BNC.
   b) Type bent on in the search box.
   c) Look at the words to the right of bent on. What do they have in common?
   d) To see a summary list of all the terms that come immediately after bent on:
      a. click on ‘collocations’ from the left-hand menu;
      b. set the range at 0 to 1.
   e) What can we say about the semantic prosody of bent on? Does it tend to collocate with good or bad things?
2. What lexical items do you think are frequently found in the company (co-text) of COMMIT?
3. To check this, you can also use the Word Sketch function. Select ‘Word Sketch’ from the top left-hand bar and you should see something like this:
4. Type commit into the search box and select the correct part of speech from the drop-down box. You should then see something similar to this:
5. Did you find the items that you wrote down in the box above?
6. The Word Sketch function doesn’t just tell you which words are commonly found in the company of your search word; it also tells you what their grammatical relationship to the search word is. What subjects most commonly collocate with COMMIT? What are the most common objects of COMMIT?
7. Extension: Use Word Sketch to look at the collocates of KISS. Do you notice anything about the gender distribution of the collocates?
[1] This is a well-known example from Hoey, M. (2005). Lexical Priming. London & New York: Routledge.

E. I want to know how to find out which verb is usually used with a particular noun
Once again, we are investigating collocation, the tendency for certain words to go together. There are various ways of investigating this in Sketch Engine, but one of the easiest is using Word Sketch. This is the same Word Sketch procedure as in section D.
1. Select ‘Word Sketch’ from the left-hand menu.
2. Type meeting into the box and select the correct part of speech.
3. You should then see something similar to this:
   If you focus on the first column, you can see the verbs that take meeting as their object.
4. Now try another example, for instance a word that learners often struggle with in terms of collocation.

F. I want to know how to find out which adverbs are usually used with a particular adjective
Once again, we are investigating collocation, the tendency for certain words to go together. There are various ways of investigating this in Sketch Engine, but one of the easiest is using Word Sketch. This is the same Word Sketch procedure as in section D.
1. Select ‘Word Sketch’ from the left-hand menu.
2. Type enormous into the box and select the correct part of speech.
3. You should then see something similar to this:
   If you look at the second column, you can see the words which are used to modify enormous. If you are working with students, for example, this would be one way of drawing their attention to the relatively infrequent use of very with this type of adjective.
4. Try it again with another term.
   You could use another corpus.

G. I want to know how to compare two (potential) translation equivalents
For instance, you have identified a possible translation for a particular lexical item, but you want to check whether the source-language (SL) and target-language (TL) items really function in similar ways or have similar evaluative meanings (in the corpora we have available). There are many ways you could compare the items, but we will start with a simple one.
1. For this exercise I recommend that you use the WaC corpora, because they have been compiled in similar ways and are available for nearly all the languages. The WaC corpora contain texts which were collected from the internet. The first two letters refer to the domain where they were collected, e.g. ukWaC is the British English corpus, itWaC is the Italian corpus, etc.
2. From the homepage, select the WaC corpus for your source language (if you are currently in another corpus, first click ‘home’ in the top-right corner).
3. Now select ‘Word Sketch’ from the left-hand column and type in the word that you are interested in. You will need to specify the part of speech (noun, verb, etc.).
4. You can save the resulting Word Sketch by selecting ‘save’ from the left-hand menu.
5. Now return to the Sketch Engine homepage, choose the WaC corpus for your target language and repeat the process, entering your potential translation into the Word Sketch.
6. Then you can (manually) compare your Word Sketches to see if they seem similar. (The similarity or difference may, of course, be affected by a range of factors.)
7. Did your items occur with similar collocates in the two languages?
This manual comparison is also useful for any kind of cross-cultural analysis.
8. Extension task: Create a Word Sketch for university in ukWaC. Now create a Word Sketch for the equivalent word in another language. What similarities or differences do you notice?

H. I want to know how to find out how other people have translated a particular word
This is currently only possible for the following language pairs: English-German, English-Spanish, English-Finnish, English-French, English-Italian and English-Dutch.
To look at translation, you need to select one of the parallel corpora included in Sketch Engine. These are all EUROPARL corpora and contain data from the European Parliament. They appear something like this on the homepage (these are the corpora for which English is the source language; you can also look at corpora for which English is the target language):
1. Open your chosen corpus (start with one of the ‘from English’ corpora).
2. Type straightforward into the search box. This will give you a list of concordances containing straightforward.
3. To see how it was translated, you need to change the view by clicking on the name of the corpus to the right of the hits summary.
4. You may then need to change the view to make it easier to read. You can do this simply by clicking on ‘KWIC/sentence’ from the left-hand menu. You should then see something that looks like this:
5. Now try with another term (and of course you could use another corpus).

I. I want to know how to find out which adjectives are used most frequently in a particular discourse type, e.g. academic spoken English
In this case, you need to use part-of-speech tags. All the corpora in Sketch Engine have been tagged for part of speech (for instance, whether a word is a singular noun), but not all corpora use the same tagging system, so you will need to check. When you use the concordance tool, it is very easy to find your tagset because there is a link next to the search box, as shown below:
You should also note that the ‘Query type’ has been changed from ‘simple’ to ‘CQL’, which stands for ‘Corpus Query Language’. It’s a good idea to open the tagset in another window (or print it out if you will be using it frequently).
1. For this search, select the ‘British Academic Spoken English Corpus’ (click on ‘home’ in the top right to return to the Sketch Engine homepage with the list of corpora).
2. First, open the tagset as described above.
3. Then click on ‘Word List’ from the left-hand menu.
4. Change the search attribute to ‘tag’ and insert your tag into the box marked ‘pattern’, then click on ‘multilevel’ and ask for ‘word’ as an output, as in the screenshot:
5. This should generate a list of the most frequent adjectives in that corpus. You could then save the list and repeat the process for the ‘British Academic Written English Corpus’, for instance.

J. I want to know how to look for patterns (e.g. what else can be used in the form a couple of sandwiches short of a picnic)
You can also create more complex searches for the concordance and collocation functions. For instance, we could look at how creative the lexical template an x short of a y is.
1. Click on ‘home’ (top right) and switch to the ukWaC corpus (because it is larger and more recent).
2. Click on ‘query type’ and select ‘CQL’ (Corpus Query Language).
3. Type the following into the query box:
   "a|an" []{0,3} "short" "of" "a|an"
4. You should see something a bit like this:
5. Now sort your concordances by ‘node’ (left-hand menu). What examples did you find? What is the function of this template?
6. Think of another pattern that you would like to investigate and see if you can create the query. If you want some ideas, you could look up ‘snowclones’ on Language Log: http://languagelog.ldc.upenn.edu/nll/

K. I want to know how to find out which words occur in similar lexical patterns to my search word
The ‘Thesaurus’ function in Sketch Engine is not like a manually created thesaurus. It simply provides a list of words which occur in similar grammatical and lexical contexts to your search word.
1. Click on ‘Thesaurus’ from the top left-hand bar and you should see something similar to this:
2. Type similar into the search box and select the correct part of speech from the drop-down box.
3. You will notice that different appears in the thesaurus list. This is because the thesaurus function looks only at the contexts of use, not the meaning, and antonyms are usually used in extremely similar contexts.
4. You could also look for words where you are more interested in the cultural patterns; for instance, you could try university or baby. For this, it is probably better to use the ukWaC corpus because it is more recent.

L. How can I find which words are typical of one corpus compared to another? (e.g. spoken language vs written language)
You can also calculate keywords in Sketch Engine. Keywords here refers to items which occur statistically significantly more frequently in one corpus than in another. For instance, if we wanted to compare spoken and written academic language:
1. Open the British Academic Spoken English Corpus and click on ‘Word List’ from the left-hand menu.
2. At the top, where it says ‘subcorpus’, we don’t change anything, as we want to use all of the corpus. At the bottom, we need to specify that we are interested in ‘keywords’ and then select the corpus that we want to compare it to, i.e. the ‘British Academic Written English Corpus’.
3. Click on ‘Make Word List’ and you should see something rather like this:
To compare sub-corpora, you have to specify which parts of the corpus you are interested in. For instance, to compare spoken and written language in the BNC, you need to ‘create’ the sub-corpora.
1. To do this, click on ‘Word List’ then ‘create new’. This will open a page with the various sections of the BNC. Give your sub-corpus a name, e.g. BNC spoken, then select the two spoken divisions, as shown below, and click on ‘create subcorpus’ at the bottom of the page:
2. Now repeat the process for the written components: click on ‘create subcorpus’, and this time select the three written text types from the top-left box and give the sub-corpus an appropriate name, e.g. BNC written.
3. To compare the two sub-corpora, go to ‘Word List’ and select the new spoken subcorpus at the top and the written subcorpus at the bottom, as shown below:
It will take a bit longer to process this type of query, but in the end you should see something a bit like this:
This type of search is particularly useful when you are trying to find out which words characterise a particular corpus. For example, if you were preparing materials for teaching vocabulary relating to a specialised field, or preparing to translate a text from a specialised field, this would be one way of getting to know the key items that differentiate that field from general language. It is also interesting as a starting point for discourse studies.

1. How can I re-order the concordance lines?
When you have the concordances on the screen, you can click on ‘left’ or ‘right’ from the left-hand menu. If you click on ‘right’, it will sort the concordance lines in alphabetical order according to the first word to the right of your ‘node’ (your search word). For instance, these concordance lines have been sorted to the right:
Hint: It will always list punctuation first, so you often have to click on ‘next’ at the bottom to get past the punctuation.

2. How can I see more context for my concordance lines?
When you are looking at your concordance lines, you can click on the line that you want to see in more detail and it will appear at the bottom of your screen, like this:
Alternatively, you can click on ‘view options’ from the left-hand menu and increase the number next to ‘KWIC Context size (number of characters)’.

3. How can I search for more than one thing at once?
Use the pipe | (usually at the bottom left of a UK keyboard), e.g. "emphasise|emphasize" will search for both spelling variants.

4. How can I specify words that I don’t want in the co-text?
There are two ways that you could do this.
1. From your concordance view, you could select ‘filter’ from the left-hand menu. This will show you something similar to this:
   Now select ‘negative’ to say that you want to exclude certain items and type the item that you wish to exclude in the box.
2. Alternatively, you could exclude the unwanted items when you enter your search term at the beginning of the process. For example, let’s say I did a search for corpus and there were lots of irrelevant results because several hits referred to the Corpus Vasorum Antiquorum database. I could then exclude one or more words like this:
   "corpus" [word!="Vasorum"]{0,3}
   This says that I want corpus where it does not occur within three words of Vasorum.
This function can also be useful for excluding punctuation, to ensure that your complex searches are not split across sentences. For instance, to exclude a full stop, you would insert [word!="\."]. Here we have added the \ to indicate that the dot is a literal full stop in the text and not part of a wildcard.

5. How can I use wildcards in searches?
1. Any sequence of characters: .*
   "system.*" will retrieve: system, systems, systematic, etc.
2. Any word: []
   "confus.*" [] "by" will retrieve: confusion generated by, confusion heightened by, confusion introduced by, etc.
3. Any group of words: []{min,max}, e.g. []{0,3} for zero to three words
   "a|an" []{0,3} "short" "of" "a|an" will retrieve:
   this man who is obviously a few co-ordinates short of a bearing
   Jason, thou truly art a few Brontoburgers short of a picnic!
   this guy's a couple of sandwiches short of a picnic
   etc.
NB: Remember to put quotation marks around your items and to set your ‘query type’ to ‘CQL’ (Corpus Query Language).

Wildcard tasks (use the BNC)
a) Find examples of words ending in ious.
______________________________________________________________
b) How many different words can you form with bloody? (Use ‘Word List’; you don’t need to put quotation marks around your item in ‘Word List’. Put the wildcard at the beginning and at the end.)
______________________________________________________________
______________________________________________________________
______________________________________________________________
c) Which are the most common words beginning with the prefix anti-? (Hint: use ‘Word List’.)
______________________________________________________________
d) Create a query to identify occurrences of better followed by worse with no full stop in between.
______________________________________________________________

6. How can I use part-of-speech tags in searches?
All the corpora in Sketch Engine have been tagged for part of speech, but not all the corpora use the same tags. To find the tagset (the list of codes) for your corpus, you can just click on ‘tagset summary’ from the concordance page. Open it in another window so that you can keep referring back to it.
1. As an example, in a text about the new funding regime, the statement that ‘Bursaries are non-refundable’ struck me as being rather unusual. To find out why that might be, we could concordance the occurrences of the verb BE followed by non…able.
   The query syntax is:
   [tag="VB.*"]+ "non.*able"
2. Copy this query into the concordance tool in the BNC. You need to set the query type to ‘CQL’ (Corpus Query Language). You should get something similar to this:
The concordances show that the items in the node (the central part) of this pattern are frequently desirable: we would like to get a refund, return items, etc. Therefore, we can hypothesise that the unusualness I perceived in the original phrase comes from a difference in point of view. The author (representing the university) presumably would like bursaries to be reimbursed, whereas I, the reader, didn’t see that as being desirable.
Hint: When using CQL, you can’t search for a phrase as a single string. For instance, to find all examples of nouns followed by set in, you would enter set and in as separate items:
   [tag="N.*"]+ "set" "in"
More information on using the POS tags is also available here: http://trac.sketchengine.co.uk/wiki/SkE/CorpusQuerying#1.

7. How many different ways are there for investigating collocation in Sketch Engine?
1. Using the concordance tool. This is useful if you have a small amount of data or there is a very clear pattern you want to show.
2. Using collocations.
   a. To calculate collocates, you first create the concordance.
   b. Then click on ‘collocations’ at the bottom of the left-hand menu and you will see something similar to this:
   c. To specify which collocates you are interested in, you can alter the ‘range’. So, for instance, if I only wanted to know which words come immediately to the right of happily, I would set the ‘range’ to 0 in the left-hand box, which refers to the items that come before the node (hence the minus sign), and to 1 in the right-hand box, which means one place to the right.
   d. This would give something like this:
3. Using ‘Word Sketch’.
4. Using ‘Sketch Diff’ (if you are comparing the collocational patterns of two items).
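To make the idea of a collocate ‘range’ more concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions, not how Sketch Engine works internally. It assumes a short invented sample text and a crude tokenizer that keeps only lower-case letters and apostrophes. The helper functions tokenize, concordance, kwic and collocates are hypothetical names made up for this example. The sketch first finds the hits for the node, as in step 2a, then counts every word from left places before each hit to right places after it, as in step 2c. Setting left=0 and right=1 reproduces the range of 0 to 1 used for happily above.

```python
import re
from collections import Counter


def tokenize(text):
    """Toy tokenizer (an assumption): lower-case word tokens, punctuation dropped."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node):
    """Return the positions of every occurrence of the node word (the 'hits')."""
    return [i for i, tok in enumerate(tokens) if tok == node]


def kwic(tokens, node, width=3):
    """Build simple KWIC lines with `width` words either side of each hit."""
    lines = []
    for i in concordance(tokens, node):
        left_ctx = " ".join(tokens[max(0, i - width):i])
        right_ctx = " ".join(tokens[i + 1:i + 1 + width])
        lines.append(f"{left_ctx} [{tokens[i]}] {right_ctx}")
    return lines


def collocates(tokens, node, left=0, right=1):
    """Count words from `left` places before to `right` places after each hit.

    left=0, right=1 corresponds to a range of 0 to 1: only the word
    immediately to the right of the node is counted.
    """
    counts = Counter()
    for i in concordance(tokens, node):
        start = max(0, i - left)
        end = min(len(tokens), i + right + 1)
        for j in range(start, end):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    # Invented sample text, used only for illustration.
    sample = ("They lived happily ever after. She smiled happily at him. "
              "They lived happily ever after in Brighton.")
    tokens = tokenize(sample)
    for line in kwic(tokens, "happily"):
        print(line)
    print(collocates(tokens, "happily", left=0, right=1).most_common())
    # -> [('ever', 2), ('at', 1)]
```

Sketch Engine goes further than these raw counts. Its collocation tool ranks candidates with association measures such as MI or logDice, so the order on screen will not simply follow frequency as it does in this sketch.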