Senso Comune



In the twentieth century, the mass media enabled the few who mastered them to spread information among those who had access to them. Today, the Web can be seen as an extension of that ability: it lets people produce, find, and elaborate knowledge and information, generally in a free and fair way.

Every day, in a more or less random fashion, the Web receives an astonishing amount of textual, visual, and audio data. The tool we have for accessing this constantly growing mass of information is language. English, though dominant, is closely followed by other languages such as Spanish, French, Chinese, Japanese, German, and Italian. What we use are very common, well-known words. They appear in texts or are attached as tags to images and sounds, and they generally serve to describe data and provide references.

For the Web's content to be fully exploited in its scale and its dynamics, automatic systems must be able to interpret words as carriers of meaning rather than as mere strings. This is not only a matter of relations between words across languages. Words always meet things tangentially: different words can refer to the same object, and different objects can be referred to by the same word. Even scientific and domain-specific terminology, while trying to limit ambiguity, fixes on certain senses of words that may clash with the senses more common in everyday language, thus multiplying ambiguities.
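The many-to-many relation between words and things described above can be sketched as a pair of indexes. This is only an illustrative toy in Python: the word and referent labels are invented for the example and do not come from any actual Senso Comune resource.

```python
from collections import defaultdict

# Hypothetical toy data: (word, referent) pairs invented for illustration.
PAIRS = [
    ("bank", "financial_institution"),
    ("bank", "river_side"),
    ("couch", "seat_for_several_people"),
    ("sofa", "seat_for_several_people"),
]


def build_indexes(pairs):
    """Index the pairs in both directions: word -> referents, referent -> words."""
    senses_of = defaultdict(set)
    words_for = defaultdict(set)
    for word, referent in pairs:
        senses_of[word].add(referent)
        words_for[referent].add(word)
    return senses_of, words_for


senses_of, words_for = build_indexes(PAIRS)


def is_ambiguous(word):
    """A word is ambiguous here if it is linked to more than one referent."""
    return len(senses_of.get(word, ())) > 1


def synonyms(word):
    """Other words that share at least one referent with the given word."""
    shared = set()
    for referent in senses_of.get(word, ()):
        shared |= words_for[referent]
    shared.discard(word)
    return shared
```

A system that treats "bank" as a mere string sees one item; a system that has access to such an index sees two candidate referents and must choose between them from context.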

The Web absorbs all the complexity of natural language. Furthermore, one could claim that the Web, the most representative means of communication of our time, depends for its development on the answers we are able to give to the challenges of semantics.

Natural languages exist in use, and they belong to their users. They are a social product, and their main strength is the consensus of the speakers. However, language has specific aims: it is used to refer to entities that belong to the physical world or are constructed in social environments, where expressions are produced and understood through the creativity of those who produce words and those who try to understand them. This is why ontology, i.e. the conceptualisation of the physical and social world, comes to be the container within which the speakers' consensus exists. Semantics and ontologies, though different, are strongly related. We now have the theoretical tools and background knowledge needed to analyse such a complex relation.

Knowledge of meaning is first of all knowledge of the rules of usage that are appropriate in a given community of speakers. It is also awareness of a large variety of "games" in which rules are created while playing. Understanding a discourse, for a human or a machine, can be thought of as interpreting the moves in a linguistic game that is played in a given context. Such moves obviously include not only "descriptive" and honest uses, but also lies, ambiguity, vagueness, manipulative techniques, and creativity. This treasure of knowledge can be exploited in information technology only if there is a common background, which must be the outcome of a cross-disciplinary effort.

Linguistic knowledge, like the Web, is an extended net of points and relations between them. Therefore, a system that aims to deal with linguistic meaning will have to be equipped with, and be able to reason on, a formal and thus machine-readable representation of the complexity of common sense. Collecting and organising such knowledge, and developing tools that can reason on it, is crucial for the evolution of the Web and of information technology, for the development of intelligent systems, and for cultural progress in general.

How can we acquire and organise that huge amount of complex, though natural, knowledge that is linguistic meaning? On matters of language, it is the speakers who should speak. Knowledge of meaning should therefore arise from sharing information about language and about reality. More specifically, formalising linguistic meaning means, first of all, building repositories of structured data in which lexical and ontological knowledge are appropriately integrated. Such repositories must also be built and constantly validated by the speakers themselves: by all those who now use the Web as a means of creating shared knowledge, and who have managed to create the largest encyclopaedia of all time through simple methods of interaction.
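As a rough sketch of what one entry in such a repository might look like, the Python fragment below links a lexical sense to an ontological category and records validation by speakers. The class name, field names, and category label are all hypothetical, chosen for illustration only; they do not describe an actual Senso Comune data model.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a lemma, linked to an ontological category (hypothetical schema)."""
    lemma: str
    gloss: str
    category: str  # assumed ontological category label, e.g. "SocialObject"
    votes: dict = field(default_factory=dict)  # speaker id -> True (agrees) / False

    def validate(self, speaker, agrees):
        """Record a speaker's judgement; a later vote by the same speaker replaces the earlier one."""
        self.votes[speaker] = agrees

    def consensus(self):
        """Share of speakers who agree with this sense, or None if nobody has voted."""
        if not self.votes:
            return None
        return sum(self.votes.values()) / len(self.votes)


sense = Sense("bank", "an institution that holds and lends money", "SocialObject")
sense.validate("speaker_a", True)
sense.validate("speaker_b", False)
```

The point of the sketch is only that the lexical side (lemma, gloss), the ontological side (category), and the speakers' consensus (votes) live in the same record, so that each can be checked against the others.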

Finally, we believe that repositories of linguistic knowledge which are cooperatively built by the speakers over the Web are an invaluable treasure, and should thus be owned by the entire community, like language itself.