St. Petersburg, Russia
Skype: Abacus-Rus
+7 (812) 935 8020
+7 (812) 300 1077

Artificial Intellect Information Retrieval Engine

We offer AIRE (Artificial Intellect Information Retrieval Engine), a natural language processor that can be integrated into your document management system.

The Engine, together with customized add-ons, can

  • categorize documents that arrive in an input flow or are created by users;
  • assign tags and prepare the documents for use in the Customer's document workflow.

Input data:

  • unstructured text documents;
  • categorization parameters, e.g. a tag structure.

Output data:

  • a set of tagged documents;
  • statistical data.

Completed project:

  • A universal search system across the corporate documents of the Constitutional Court of the Russian Federation (in Russian); a working model based on a subset of real documents is available on request.

How it works:

The AIRE natural language processor was developed recently on the basis of many years of work in the field of Artificial Intelligence [1].

It performs grammatical and semantic analysis of input texts. Processing starts from each byte of the input stream placed on the agenda: hypotheses of binding and decoding are built for each byte pair, then for the results of the preceding binding cycle, and so on. Step by step, a hierarchy of conceptual bindings is built: from bytes to graphemes, then to phonemes, syllabic chains, morphemes, morphemic complexes, sentences, paragraphs, and whole documents.

These bindings are made using so-called mindmaps of the corresponding levels: mindmaps of character encoding, graphematics, phonology, morphology, and syntax, and finally the rules and concepts of formal semantics and pragmatics, which are embedded in the UCO (Universal Conceptual Ontology) mindmap.
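
The repeated binding cycles can be illustrated with a minimal Python sketch. It is not the AIRE implementation: the MINDMAP table, the unit names, and the greedy left-to-right strategy are all assumptions made for illustration, and a real engine would keep several competing hypotheses rather than a single one.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "mindmap": maps a pair of adjacent lower-level units
# to the higher-level unit they bind into. Not taken from AIRE.
Mindmap = Dict[Tuple[str, ...], str]

MINDMAP: Mindmap = {
    ("c", "a"): "ca",        # bytes/graphemes -> syllabic chain
    ("ca", "t"): "cat",      # syllabic chain -> morpheme
    ("cat", "s"): "cat+s",   # morphemes -> morphemic complex
}


def bind_cycle(units: List[str], mindmap: Mindmap) -> List[str]:
    """One binding cycle: scan left to right and replace every adjacent
    pair known to the mindmap with the unit it binds into."""
    out: List[str] = []
    i = 0
    while i < len(units):
        pair = tuple(units[i:i + 2])
        if len(pair) == 2 and pair in mindmap:
            out.append(mindmap[pair])
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out


def bind_all(units: List[str], mindmap: Mindmap) -> List[str]:
    """Repeat binding cycles on the previous cycle's results until
    a cycle changes nothing."""
    while True:
        bound = bind_cycle(units, mindmap)
        if bound == units:
            return units
        units = bound


# Start from the individual bytes of the input.
start = [chr(b) for b in "cats".encode("ascii")]
result = bind_all(start, MINDMAP)
```

Each cycle works on the output of the previous one, so the units grow from single bytes toward larger linguistic units, which mirrors the hierarchy described above in a very reduced form.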

At the same time, the working binding hypotheses are rated by a coverage criterion: the number of bytes bound/decoded by each hypothesis.
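
A minimal sketch of such a rating in Python is given below. The Hypothesis class, its fields, and the sample hypotheses are hypothetical; the only thing taken from the description is that a hypothesis is scored by the number of bytes it covers.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical binding hypothesis: a label plus the offsets of the
    input bytes it binds/decodes."""
    label: str
    covered: FrozenSet[int]


def coverage(h: Hypothesis) -> int:
    """Coverage criterion: the number of input bytes covered."""
    return len(h.covered)


def rank(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Order hypotheses from highest to lowest coverage."""
    return sorted(hypotheses, key=coverage, reverse=True)


# Two competing readings of the 7-byte input b"unbound".
text = b"unbound"
hypotheses = [
    Hypothesis("unbo", frozenset(range(0, 4))),               # covers 4 bytes
    Hypothesis("un+bound", frozenset(range(0, len(text)))),   # covers all 7 bytes
]
ranked = rank(hypotheses)
```

Here the reading that accounts for the whole input is ranked first, because it leaves no bytes unexplained.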

This process results in the construction of so-called Conceptual Graphs, which are sets of identified concepts and the relations between them. They are formed by combining the routes that connect the bytes covered by a hypothesis and pass through concepts at higher levels of the hierarchy.

For the problem of intelligent search, such conceptual graphs are built both for a collection of raw plain-text documents and for a search query.

Information retrieval can then be done by matching the graph of the query against the graphs of the candidate results; in this case the so-called informational noise that is typical of other search engines is completely eliminated.
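
As an illustration of why graph matching filters out noise, here is a minimal Python sketch. It assumes a strongly simplified representation in which a conceptual graph is a set of (concept, relation, concept) triples and a document matches when it contains every triple of the query; the triples, document ids, and function names are hypothetical and are not AIRE's actual data model.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed simplification: a conceptual graph is a set of
# (concept, relation, concept) triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def matches(query: Graph, document: Graph) -> bool:
    """A document matches if it contains every triple of the query."""
    return query <= document


def search(query: Graph, index: Dict[str, Graph]) -> List[str]:
    """Return the ids of all indexed documents whose graph matches."""
    return [doc_id for doc_id, graph in index.items() if matches(query, graph)]


# Hypothetical index: both documents mention "court" and "ruling",
# but only doc1 states the relation the query asks about.
INDEX: Dict[str, Graph] = {
    "doc1": frozenset({("court", "issues", "ruling"),
                       ("ruling", "concerns", "tax")}),
    "doc2": frozenset({("ruling", "concerns", "court")}),
}
QUERY: Graph = frozenset({("court", "issues", "ruling")})

hits = search(QUERY, INDEX)
```

A keyword search for "court" and "ruling" would return both documents; matching on concepts and relations returns only the one that expresses the queried meaning.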

The search is also performed over sets of equivalent graphs, or over graphs with superclasses of concepts; these sets are built during indexing. The measure of relevance between the query and a search result depends inversely on the product of the lengths of the routes from the superclasses in the query graph to the subclasses in the result graph. In this way, search by synonyms or by closeness of meaning becomes possible.
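
The relevance measure can be sketched in Python as follows. Several details are assumptions, since the text does not fix them: the IS_A table is a hypothetical acyclic ontology fragment, query and result concepts are assumed to be already aligned pairwise, and a route length is counted as the number of concepts on the route (so an identical concept has length 1 and an exact match has relevance 1.0).

```python
from typing import Dict, List, Optional

# Hypothetical ontology fragment: subclass -> direct superclass.
# Assumed to contain no cycles.
IS_A: Dict[str, str] = {
    "supreme_court": "court",
    "court": "institution",
    "verdict": "ruling",
    "ruling": "decision",
}


def route_length(superclass: str, subclass: str,
                 is_a: Dict[str, str]) -> Optional[int]:
    """Number of concepts on the route from subclass up to superclass
    (1 if they are the same concept); None if there is no such route."""
    length = 1
    node = subclass
    while node != superclass:
        if node not in is_a:
            return None
        node = is_a[node]
        length += 1
    return length


def relevance(query: List[str], result: List[str],
              is_a: Dict[str, str]) -> float:
    """Inverse of the product of route lengths over aligned concept pairs;
    0.0 if some query concept is not a superclass of its counterpart."""
    product = 1
    for q, r in zip(query, result):
        n = route_length(q, r, is_a)
        if n is None:
            return 0.0
        product *= n
    return 1.0 / product


exact = relevance(["court", "ruling"], ["court", "ruling"], IS_A)
close = relevance(["court", "ruling"], ["supreme_court", "verdict"], IS_A)
far = relevance(["institution", "decision"], ["supreme_court", "verdict"], IS_A)
```

With these assumptions, a result one step more specific than the query in both concepts scores 1/(2*2), and a result two steps more specific scores 1/(3*3), so more distant meanings rank lower.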

The processing described above can also be used for automatic subject rubrication: assigning an input text to one of a set of predetermined categories. It is even possible to create subject heading lists automatically, based on the content and structure of input texts that are not known in advance.
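
One simple way to picture rubrication against predetermined categories is sketched below in Python. The text does not describe the actual assignment rule, so the overlap count used here is an assumption chosen for brevity, and the category names and concept sets are hypothetical.

```python
from typing import Dict, Optional, Set


def rubricate(doc_concepts: Set[str],
              categories: Dict[str, Set[str]]) -> Optional[str]:
    """Assign the document to the category that shares the most concepts
    with it; return None if no category shares any concept."""
    best: Optional[str] = None
    best_score = 0
    for name, concepts in categories.items():
        score = len(doc_concepts & concepts)
        if score > best_score:
            best, best_score = name, score
    return best


# Hypothetical predetermined categories, each described by a set of concepts.
CATEGORIES: Dict[str, Set[str]] = {
    "tax law": {"tax", "levy", "budget"},
    "elections": {"vote", "ballot", "candidate"},
}

# Concepts assumed to have been identified in a document's conceptual graph.
doc = {"court", "ruling", "tax", "budget"}
category = rubricate(doc, CATEGORIES)
```

Because the comparison is made on identified concepts rather than on surface words, the same rule could be combined with superclass or synonym expansion of the category descriptions.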

[1] A. V. Dobrov, Technologies of Intellectual Information Retrieval and Techniques for Evaluating Their Effectiveness (in Russian: Технологии интеллектуального поиска и способы оценки их эффективности), Structural and Applied Linguistics (Структурная и прикладная лингвистика), issue 8, St. Petersburg State University Press (Издательство СПбГУ), 2010.