Professionals engaged in translation, editing,
terminology and lexicography have to look up vast numbers of terms online
quickly and effectively. Traditionally,
the standard procedure involved copying a term to the clipboard, opening
a browser and the appropriate online resource, pasting the term into the search box,
setting the appropriate search parameters, clicking the search button, analyzing
the results, copying the relevant result back to the clipboard and returning to the working environment to
paste it into the text. This sequence of actions is inherently time
consuming and puts the accuracy of the results at risk.
The accuracy of the results is the most important part
of the terminology extraction procedure; inaccurate results therefore cannot be
afforded. By term extraction we mean the identification of equivalents for
special terms. Subject fields and sectors such as law, economy, science and industry
are characterized by field-specific terminology. Moreover, those
responsible for drafting documents might use their own terminology.
Term extraction can be monolingual, in that it analyzes a text corpus
for candidate terms, or bilingual, i.e. it analyzes existing source texts along
with their translations to identify potential terms and their equivalents.
With that in mind, the use of advanced computer-aided extraction tools
seems imperative. Term extraction tools assist in populating
a term base and setting up the terminology that is required for
particular tasks and projects. However, despite their potential in the
extraction procedure, the resulting list of candidate terms must always be
verified by translators or terminologists.
The main term extraction methods used in terminology management fall
into three categories:
Linguistic approach: extraction tools identify word combinations matching
specific morphological or syntactic patterns. In detail, parsers, part-of-speech
taggers and morphological analyzers are used to annotate the content of a
corpus. Various matching techniques are then used to filter the
candidate terms. The linguistic approach is heavily language
dependent, given the different term formation patterns among languages. Term
extraction tools with a linguistic approach work effectively in a single language.
Statistical approach: term extraction tools look for repeated sequences
of lexical items. The user specifies the number of times a word or sequence of
words must be repeated in order to be considered a candidate term (frequency
threshold). The statistical approach is language independent.
Hybrid approach: combines linguistic and statistical elements. It
incorporates syntactic rules or filters so that terms with
certain syntactic structures can be singled out.
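
To make the frequency threshold concrete, the following is a minimal Python sketch of a statistical extractor with an optional crude filter, loosely in the spirit of the hybrid approach. It is an illustration only and not the method of any particular tool: the function names, the sample text and the tiny stopword list are all assumptions, and the stopword check merely stands in for the syntactic filters that a real tool would apply.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools rely on
# part-of-speech patterns rather than a fixed word list).
STOPWORDS = {"the", "of", "a", "an", "in", "and", "is", "to", "for", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(text, max_len=3, min_freq=2, filter_edges=True):
    """Return {sequence: count} for word sequences seen at least min_freq times."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Crude hybrid-style filter: skip sequences that start or end
            # with a stopword.
            if filter_edges and (gram[0] in STOPWORDS or gram[-1] in STOPWORDS):
                continue
            counts[" ".join(gram)] += 1
    # Apply the user-defined frequency threshold.
    return {term: c for term, c in counts.items() if c >= min_freq}


sample = (
    "The term base stores every approved term. "
    "A term base is populated by term extraction. "
    "Term extraction produces candidate terms for the term base."
)
result = candidate_terms(sample, max_len=2, min_freq=2)
unfiltered = candidate_terms(sample, max_len=2, min_freq=2, filter_edges=False)
```

With the filter switched off, a sequence such as "the term" passes the threshold purely because it is repeated, which is why the resulting candidate list still needs to be verified by a translator or terminologist.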
However, apart from accuracy in the selection of term candidates,
the languages and formats supported by an extraction tool are also highly important.
Since different extraction tools cater for different extraction needs
and expectations, it is important for the user to choose the right tool. For
instance, terminologists and translators expect tools that recognize specific terms and
term variants and that deliver clearly delimited term candidates.
IntelliWebSearch is a tool that semi-automates the terminology search
process so that the task can be completed
more rapidly and with less effort.
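
As a rough idea of what semi-automating the search step can look like, the Python sketch below turns a copied term into ready-made search URLs for several resources. It does not reproduce how IntelliWebSearch works internally: the resource names, URL templates and the function build_search_urls are hypothetical placeholders chosen for illustration.

```python
from urllib.parse import quote_plus

# Hypothetical resources; the URL templates are placeholders, not real endpoints.
RESOURCES = {
    "glossary": "https://glossary.example.org/search?q={term}&lang={lang}",
    "corpus": "https://corpus.example.org/find?query={term}&l={lang}",
}


def build_search_urls(term, lang="en", resources=RESOURCES):
    """Return {resource name: search URL} for a single term."""
    # Trim stray whitespace from the copied text and URL-encode it.
    encoded = quote_plus(term.strip())
    return {
        name: template.format(term=encoded, lang=lang)
        for name, template in resources.items()
    }


urls = build_search_urls("  force majeure ", lang="fr")
```

Passing each URL to the standard library's webbrowser.open would then replace the manual steps of opening the browser, pasting the term and setting the search parameters.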
MagicSearch is also a multilingual tool, which allows visitors
to look up multiple sources with a single search. Visitors can also customize
which sources are included in the search and the order in which they appear.
The tool uses dictionaries, corpora, machine translation engines and search