Research on existing work
OpenCalais does approximately what we're looking for.
Proxem is a really good starting point for links and pointers to interesting topics, resources and documentation.
Common vocabulary in Natural Language Processing
Information Retrieval (IR) => http://en.wikipedia.org/wiki/Information_retrieval
Information Extraction (IE) => http://en.wikipedia.org/wiki/Information_extraction
Sentence Boundary Disambiguation (SBD) => http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation
Word Sense Disambiguation (WSD) => http://en.wikipedia.org/wiki/Word_sense_disambiguation
Named Entity Recognition (NER) => http://en.wikipedia.org/wiki/Named_entity_recognition
Message Understanding Conference (MUC) => http://en.wikipedia.org/wiki/Message_Understanding_Conference
Anaphora => http://en.wikipedia.org/wiki/Anaphora_(linguistics)
Resource Description Framework => http://www.w3.org/RDF/
Linked Data => http://en.wikipedia.org/wiki/Linked_data
Common Sense Knowledge Bases => http://en.wikipedia.org/wiki/Commonsense_knowledge_bases
Documents & Publications
Introductory course on semantic analysis (in French): http://www.limsi.fr/Individu/habert/Cours/PX/IntroductionSemantiqueArticle/IntroductionSemantiqueArticle.html
Web as a Corpus [Kilgarriff & Grefenstette (2003)] => http://www.kilgarriff.co.uk/Publications/2003-KilgGrefenstette-WACIntro.pdf
Existing resources
Classifications of words:
http://en.wikipedia.org/wiki/Roget's_Thesaurus