Corpus and Corpi

Posted by emile · August 10, 2006 · 6 replies

Does anyone here regularly use 'corpuses', and if so what for?

6 Replies

mesmark

August 10, 2006

Corpora are databases of language, and you can use them to look up the frequency of collocations to get a sense of 'native' grammaticality.

There are some ready-made databases and websites you can hop onto that will do this. For example, if you want to see which word most commonly fills the slot in 'such a OOO' or 'too OOO to OOO', you can, using their database and search functions. You can also compare two similar phrases, such as 'since a year ago' and 'since last year', and get a sense of which is more frequent. They're good for curriculum design.
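To make the idea concrete, here is a rough sketch of both kinds of lookup in Python. It is not how any particular corpus site works: the SAMPLE text is a made-up stand-in for a real corpus, and the helper names (tokenize, next_words, phrase_count) are just illustrative.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a real corpus.
SAMPLE = (
    "It was such a good day. She is such a good friend. "
    "That was such a shame. I have lived here since last year. "
    "He has been away since last year. We met since a year ago."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def next_words(tokens, phrase):
    """Count the words that immediately follow `phrase` in `tokens`."""
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    for i in range(len(tokens) - n):
        if tokens[i:i + n] == target:
            counts[tokens[i + n]] += 1
    return counts

def phrase_count(tokens, phrase):
    """Count how many times `phrase` occurs as a run of tokens."""
    target = phrase.lower().split()
    n = len(target)
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

tokens = tokenize(SAMPLE)

# Which words most often fill the slot in 'such a OOO'?
print(next_words(tokens, "such a").most_common(2))

# Which of two similar phrases is more frequent?
print(phrase_count(tokens, "since last year"), phrase_count(tokens, "since a year ago"))
```

With a sample this small the counts mean nothing, of course. The point of a real corpus is that the same counting is done over millions of words of edited text.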

I don't use them because I can check frequency using Google's search function, which works in a similar way. The upside and the downside of that is that it is mostly web language. While it contains a lot of natural speech and writing, it also contains a lot of poor grammar and errors.

Was that the question?

simplyesl

August 10, 2006

That's interesting. I've never heard of it before!

mesmark

August 10, 2006

http://www.natcorp.ox.ac.uk/
http://www.collins.co.uk/Corpus/CorpusSearch.aspx (sampler of the CoBuild Corpus with some info)

I've used a really nice free one and I'll try to get back with the link if I can find it again.

emile OP

August 11, 2006

'corpora'! That's the word I was trying to think of!

Anyway, Mark, you hit the nail on the head. I was wondering what the benefits were of using corpuses (corpora) in the age of Google. It would actually be quite easy to turn Google into a proper corpus tool, wouldn't it? Do you think they might hire me?

mesmark

August 11, 2006

emile wrote: Do you think they might hire me?

Give it a shot! 🙂

Google is good for the web, but again it's just indexed web text. Corpora usually collect literary works or 'respected' works for their database. They both have their advantages and disadvantages.

Corpora are pretty cool. There are corpora for different languages, for Old English, a corpus for coins, ...

mesmark

August 11, 2006

Sorry, by 'cool' I meant in a really geeky way that I would never tell my non-virtual friends about.