Discipline-Conditioned Choice And Use Of General Scientific (Academic) Vocabulary

Abstract

The article presents preliminary results of a systematic study of the functioning and semantics of general scientific (academic) vocabulary in biomedical scientific discourse, compared with the discourse of the humanities and social sciences. For this study, a corpus of scientific texts on medicine and biology was specially compiled; it consists of 5 484 665 word usages. The article provides a comparative analysis of the frequency of the academic vocabulary units (the 10 most common verbs, adjectives and nouns) most often used in this type of scientific discourse and compares it with the frequency of the same units in texts of the humanities and social sciences (according to the well-known corpus "Academic Vocabulary List" by D. Gardner and M. Davies). The statistical analysis of frequency is supplemented by a study of the frequency and distribution of collocations characteristic of individual units, and by a qualitative analysis of changes in their semantics due to the discourse type. The example of the general scientific noun response and its most common collocations with verbs, nouns and adjectives in biomedical discourse demonstrates these differences and shows disciplinary preferences in the combinatorics of the same unit of academic vocabulary. The results suggest that general scientific vocabulary is not common to the discourses of all fields of knowledge and can serve as a marker of the discipline of a discourse.

Keywords: Academic vocabulary, corpus linguistics, corpus, biomedical discourse

Introduction

Before the advent of corpus linguistics, a significant part of academic vocabulary was considered common to all fields of scientific knowledge. This view was reflected in the term "general scientific", which became established for it in the national tradition (the terms "general scientific" and "academic" are used in this article as synonyms). The methods of corpus linguistics make it clear that disciplinary differences in the functioning of general scientific vocabulary not only exist but are manifested both in the frequency of use of units in different fields of knowledge and in changes in their semantics and syntactics. As Hyland and Tse (2007) rightly point out, although the same words are used in texts of completely different sciences, “all disciplines adapt words to their own ends, displaying considerable creativity in both shaping words and combining them with others to convey specific, theory-laden meanings associated with disciplinary models and concepts” (p. 240). In this regard, the interest of corpus linguistics has shifted from studying the functioning of individual vocabulary units in academic discourse to studying the frequency and distribution of the collocations inherent in general scientific vocabulary (Ackermann & Chen, 2013; Biber et al., 2004; Hyland, 2008; Hyland, 2012).

Problem Statement

The vocabulary of academic discourse is divided into a) terms, b) words and collocations that are thematically unscientific and present in any speech style (function words and everyday vocabulary) and c) general scientific vocabulary. General scientific vocabulary is the most difficult for students of non-linguistic faculties of higher educational institutions to master because of its functional and semantic features (Polubichenko, 2019). This accounts for the increased attention that linguists have paid in recent years to the vocabulary of the language of science, above all from the point of view of teaching the foreign language of a specialty in non-linguistic faculties, translating narrowly disciplinary academic literature, and bilingual lexicography.

Research Questions

1. Are there differences in the frequency of use and distribution of general scientific vocabulary across types of scientific discourse (taking texts of the biomedical, humanities and social sciences as an example)? How significant are they?

2. Are there qualitative differences in the combinability and semantics of general scientific vocabulary in the varieties of scientific discourse considered?

Purpose of the Study

The purpose of the study is to test the hypothesis that the vocabulary called general scientific in the national linguistic tradition is not common to discourses of different disciplinary orientation but, on the contrary, can show disciplinary specificity in both quantitative (frequency and distribution) and qualitative (collocations and semantics) terms. The article presents the progress and preliminary results of the study.

Research Methods

The study was conducted using corpus linguistics methods. In 2004, the British lexicographer and corpus linguist Kilgarriff and the Czech specialist in natural language processing Rychlý created the corpus query system Sketch Engine (as cited in Kilgarriff et al., 2004). Because “for language learning and teaching, smaller corpora can be more useful as they are designed to represent the specific part of the language under investigation” (Mudraya, 2006, p. 236), a corpus of scientific texts on biomedical subjects (hereinafter BIOMED), consisting of 5 484 665 word usages, was specially compiled on the basis of Sketch Engine. The material for the corpus was 872 scientific articles of different types (research article, review article, clinical investigation article) from journals of narrow professional orientation. The material was selected for the authenticity of the text: 50 % of the material included in the corpus was written by scientists from the UK, 30 % from the USA and 20 % from Australia, Canada and New Zealand. The corpus includes 21 subcorpuses, each representing one of the main biomedical specializations (biochemistry, biophysics, biotechnology / bioengineering, botany, cardiology, cell biology, zoology, etc.). The corpus is well balanced: each subcorpus occupies approximately 4.8 % of the total corpus volume and contains from 250 to 270 thousand word usages.

The second stage of the study was keyword selection using the Keywords and Terms function incorporated in Sketch Engine, which compares the frequency of units in the corpus with their frequency in a reference corpus. For this study, the reference corpus was English Web 2013 (EnTenTen13) (Jakubíček et al., 2013). One of the defining characteristics of academic vocabulary is its high frequency in scientific discourse, which makes the Sketch Engine Keywords and Terms function appropriate. After highly specialized and terminological vocabulary had been excluded and the results checked against the academic vocabulary lists of Coxhead (2000) and Gardner and Davies (2014), a list of 258 units of general scientific vocabulary was compiled. The vocabulary is represented in the BIOMED corpus and belongs to three parts of speech (94 verbs, 121 nouns and 43 adjectives). During lemmatisation, the part-of-speech ambiguity inherent in English was resolved. The list included only academic vocabulary units that occur at least 5 times in each BIOMED subcorpus.
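The inclusion criterion stated in the last sentence (at least 5 occurrences in every BIOMED subcorpus) can be sketched as a simple filter. This is a minimal illustration and not the authors' actual procedure: the function names, the three subcorpus labels used as keys, and all counts are hypothetical and were invented for the example.

```python
from typing import Dict, List

MIN_PER_SUBCORPUS = 5  # threshold stated in the text


def passes_dispersion_filter(counts_by_subcorpus: Dict[str, int],
                             minimum: int = MIN_PER_SUBCORPUS) -> bool:
    """Return True if the unit reaches the minimum count in every subcorpus."""
    return all(count >= minimum for count in counts_by_subcorpus.values())


def select_units(candidates: Dict[str, Dict[str, int]]) -> List[str]:
    """Keep only the candidate units that pass the filter, in alphabetical order."""
    return sorted(unit for unit, counts in candidates.items()
                  if passes_dispersion_filter(counts))


# Hypothetical counts for three of the 21 subcorpora (invented for illustration).
candidates = {
    "assess": {"cardiology": 41, "botany": 12, "zoology": 9},
    "mediate": {"cardiology": 30, "botany": 3, "zoology": 7},
}
selected = select_units(candidates)
print(selected)
```

A filter of this kind favours units that are evenly dispersed across specializations over units that are frequent in only a few of them.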

The next stage of the study was to obtain information on the frequency and semantics of these general scientific vocabulary units in scientific discourse of different disciplinary orientation. For this purpose, the Academic Vocabulary List (AVL) corpus of 120 032 441 word usages was used. It was created by the American researchers Dee Gardner and Mark Davies in 2013 and includes nine groups of texts by scientific discipline. The frequency of the most common general scientific vocabulary units in the BIOMED corpus was compared with their frequency in three AVL subcorpuses that do not overlap thematically: Social Science, Humanities, and History. Inexplicably, the creators of the AVL separated history from the Humanities into an independent subcorpus. Because the compared corpuses differ in size (BIOMED: 5 484 665 word usages, Social Science: 16 720 729, Humanities: 11 111 225, History: 14 289 007), relative rather than absolute frequency was used for comparison. This quantity is statistically stable and makes the comparison independent of the actual corpus size. Absolute frequency was converted into relative frequency using the following formula:

$$NF = \frac{AF}{CS} \times 1\,000\,000,$$

where NF (normalized frequency) is the relative frequency, measured in instances per million (hereinafter ipm); AF (absolute frequency) is the total number of occurrences in the corpus under study; and CS is the corpus size, measured in word usages. NF indicates how many times a token would appear in a corpus of one million word usages, which allows frequency data for each token to be compared across corpuses of different sizes. In this study, tokens are either individual units of academic vocabulary or collocations containing them.
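The conversion defined by the formula above can be written as a short function. The sketch below uses the corpus sizes reported in the text; the function name and the absolute frequency of 1 000 are hypothetical and serve only to show how the same raw count yields different ipm values in corpuses of different sizes.

```python
def normalized_frequency(absolute_frequency: int, corpus_size: int) -> float:
    """NF = (AF / CS) * 1,000,000, expressed in instances per million (ipm)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return absolute_frequency / corpus_size * 1_000_000


# Corpus sizes as reported in the text (in word usages).
CORPUS_SIZES = {
    "BIOMED": 5_484_665,
    "Social Science": 16_720_729,
    "Humanities": 11_111_225,
    "History": 14_289_007,
}

# Hypothetical absolute frequency, chosen only to show the effect of corpus size.
HYPOTHETICAL_AF = 1_000

ipm = {name: normalized_frequency(HYPOTHETICAL_AF, size)
       for name, size in CORPUS_SIZES.items()}
for name, value in ipm.items():
    print(f"{name}: {value:.1f} ipm")
```

The same raw count corresponds to the highest ipm in the smallest corpus (BIOMED) and the lowest in the largest (Social Science), which is why absolute counts cannot be compared directly.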

For a quantitative analysis of the frequency of general scientific vocabulary in texts of different fields of scientific knowledge, the 10 most common academic verbs, adjectives and nouns were selected on the basis of the BIOMED corpus. The absolute frequencies of these units in the History, Social Science and Humanities corpuses were then established and their relative frequencies (ipm) calculated. The results are shown in Table 1. Next, for each unit of academic vocabulary presented in the table, the most frequent collocations, occurring 5 or more times in the BIOMED corpus, were identified. They were compared with similar data from the reference corpuses of History, Social Science and Humanities in order to identify collocations and colligations common to different disciplines. At the final stage of the study, a qualitative structural-semantic contextual analysis of each of the selected units is carried out in its common and discipline-specific colligations and collocations.
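The frequency threshold for collocations (5 or more occurrences) can be illustrated with a deliberately simplified count of the words that immediately precede a node word. This is only a sketch under strong assumptions: it works on raw adjacent tokens, whereas a corpus query system operates on lemmatized, part-of-speech-tagged text; the function names and the toy token stream are invented for the example.

```python
from collections import Counter
from typing import List, Tuple

MIN_COLLOCATION_COUNT = 5  # threshold stated in the text


def left_collocates(tokens: List[str], node: str) -> Counter:
    """Count the word immediately preceding each occurrence of `node`."""
    counts: Counter = Counter()
    for previous, current in zip(tokens, tokens[1:]):
        if current == node:
            counts[previous] += 1
    return counts


def frequent_collocations(tokens: List[str], node: str,
                          minimum: int = MIN_COLLOCATION_COUNT
                          ) -> List[Tuple[str, int]]:
    """Return (collocate, count) pairs that reach the minimum, most frequent first."""
    counts = left_collocates(tokens, node)
    return [(word, n) for word, n in counts.most_common() if n >= minimum]


# Invented toy token stream: 6 x "immune response", 5 x "stress response",
# 2 x "reader response".
tokens = ("immune response " * 6
          + "stress response " * 5
          + "reader response " * 2).split()
result = frequent_collocations(tokens, "response")
print(result)
```

In this toy data, reader response falls below the threshold and is excluded, while the two more frequent combinations are kept.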

Findings

Table 1 -

Table 1 shows striking imbalances in the frequency of use of academic vocabulary units across fields of scientific knowledge, in some cases exceeding a factor of 30. The general scientific verb use demonstrates the most uneven frequency distribution across disciplinary discourses. In biomedical texts (BIOMED) it is 37 times more common than in historical texts (History), 18 times more common than in Social Science and 3 times more common than in Humanities. The verbs increase and reduce are 11 and 8 times more common in the BIOMED corpus than in Humanities. The verbs associate and assess appear in BIOMED 8 and 9 times more often than in the History corpus. The noun outcome is used in biomedical discourse 6 times more often than in historical discourse and 9 times more often than in humanities discourse. The nouns factor and function are 8 times more common in BIOMED than in Humanities and History respectively. Of all the general scientific nouns studied, only effect is more frequent (about three times) in historical than in biomedical texts. Among adjectives, functional shows the most uneven distribution. In the BIOMED corpus it is 22 times more common than in historical texts, 4 times more common than in the Social Science corpus and 10 times more common than in Humanities. The adjective multiple is 7 times more common in BIOMED than in History, and the adjectives total and relative are used, respectively, 8 and 9 times less often in Humanities than in BIOMED.

From these data we can conclude that the discourses of the biomedical and social disciplines show the greatest similarity in the frequency of the general scientific words studied. Such general scientific vocabulary units as suggest, indicate, identify, assess, study, result, model, factor, difference, significant and similar are found in texts of these fields of scientific knowledge with approximately the same frequency.

We illustrate the progress of the analysis of how academic vocabulary functions in scientific discourse of different disciplinary orientation using the noun response as an example.

The selected common collocations with the word response for different disciplines are presented in Tables 2, 3, and 4.

Table 2 -

A comparison of the frequencies of common noun collocates of the general scientific noun response across disciplines shows, for example, that the collocation stress response is about 9 times more common in the BIOMED corpus than in the Social Science corpus and is not represented at all in the History and Humanities corpuses (Table 2). On the other hand, the collocation reader response is used in the History and Humanities corpuses (9 times more often in Humanities than in History). This collocation is completely uncharacteristic of biomedical discourse, as are combinations of response with the nouns student, audience and policy. The collocation treatment response shows the greatest variation in frequency: it is 23 times more common in biomedical discourse than in social science texts and is not represented at all in the History and Humanities corpuses.

The frequencies of combinations of the word response with different adjectives also show uneven disciplinary use (Table 3). For example, positive is found in Social Science about 1.5 times more often than in BIOMED, 2 times more often than in History and 5 times more often than in Humanities. The adjective initial in combination with response appears in BIOMED 3-4 times more often than in History, Humanities and Social Science, where this collocation occurs with the same frequency. The collocation emotional response is most frequent in Social Science. It is about 2 times less common in Humanities and 7 and 13 times less common in the biomedical and historical sciences, respectively. The adjective direct shows a relatively even distribution of frequencies: the maximum (Humanities) is twice the minimum (Social Science).

Table 3 -

The data on the frequency of common verb collocates of the general scientific noun response for all the scientific discourses studied are given in Table 4. The verb compare shows the greatest differences in frequency. It is found in BIOMED about 20 times more often than in Humanities and 6 times more often than in Social Science. The verb show is used in biomedical texts 11 times more often than in historical texts, 7 times more often than in Social Science, and 15 times more often than in Humanities. The verb mediate is 7 times less common in BIOMED than in Social Science. The verb elicit has a relatively even frequency distribution in all fields except the biomedical disciplines, where its frequency is 3 times higher.

Table 4 -

Functional-semantic analysis of the disciplinary use of academic vocabulary

The qualitative analysis, namely the semantic and functional analysis of general scientific words according to the disciplinary field of their use, reveals significant differences, as does the statistical analysis. A study of the most frequent collocations of the noun response in different scientific discourses makes it apparent that most of the nouns and adjectives that form the collocations are predictably different (Table 5), because they are determined by the subject matter and take part in the formation of terminological vocabulary. As Hyland (2008) notes, different disciplines give preference to certain word semantics and form their own phraseological patterns. Both quantitative and qualitative analyses reveal the greatest similarity in the use of the same collocations between biomedical discourse and social science texts.

Thus, almost all noun collocates in BIOMED (except questionnaire, shared with Social Science) and most adjectives combined with the noun response form terms used to describe the reactions “of a muscle, nerve, gland, or other excitable tissue to a stimulus”. This demonstrates one of the functions of general scientific vocabulary: serving as a term-forming resource. The identified nouns and adjectives characteristically take part in forming terms by juxtaposing more than two bases (so-called string compounds), a pattern typical of the modern language of science (competitive stress response, auditory brainstem response, acute phase response, etc.).

Table 5 -

To determine which meanings of the noun response are realized in different disciplinary discourses, we consider the most interesting cases, in which a collocation occurs in all discipline groups.

positive

  • So although the positive response to the antiviral does point to potential correct clinical diagnosis it is not possible to confirm this. (BIOMED)

  • In a November 2009 interview, Bayanouni stressed that the group's suspension of opposition activity was conditional upon a positive response from the regime. (History)

  • When my graduate students reported on their completed projects, the parent/student/teacher surveys indicated an overall positive response, and some participants even recommended that the CPLP be expanded to cover other subjects. (Humanities)

  • It is only a positive response to these principles by those with extra resources that will ultimately bring life to Africa, as well as communicating the warm message to the world's poor that the world is not such a cruel place after all. (Social Science)

In the first example, response realizes the meaning “a bodily process occurring due to the effect of some antecedent stimulus or agent”. In the second example, it realizes the meaning “an action of agreement, approval, encouragement”. In the third, the meaning is “an answer to the survey, a feedback”, and in the fourth, it is a non-verbal reaction of “acceptance”.

initial

  • These data suggest that while initial responses occur quickly, deep responses are associated with longer time on treatment and continue to develop over time. (BIOMED)

  • Did al-Qaeda expect such an overwhelming initial response from the United States? What, after all, did Bin Laden think he was going to accomplish strategically by killing thousands of innocent Americans? (History)

  • When he asked me why I liked music, my initial response was, "Because it makes me feel...". My friend interrupted me <…> (Humanities)

  • Respondents were encouraged to take their time. Once they had made their initial response to a question, a general probe was used to ensure that respondents tried hard to list everything they knew relevant to that question. (Social Science)

In the first context, initial response is used in the meaning of “the reaction at the beginning of the cure, treatment, etc.”. On the one hand, this demonstrates how important the temporal aspect is in biomedical discourse when the word response is used. On the other hand, initial responses is contrasted with deep responses. The collocation deep responses is not characteristic of other disciplines, as it is a term meaning “a sign of disease remission” (in BIOMED it occurs 7 times). It denotes not a temporal (quantitative) but a qualitative characteristic, as in the following passage: Cobimetinib plus vemurafenib improved outcomes across quartiles of response regardless of prognostic factors or gene signatures and provided durable survival benefits in patients with deep responses (BIOMED).

The second example realizes the meaning “The initial decisions and actions taken in reaction to a reported incident”, and the third and fourth realize the meaning “a verbal, written, or electronic answer”. The use of response in the meaning of “answer” is most characteristic of humanities, social science and historical discourses. We note that the noun answer is found in these fields of scientific knowledge much more often than in the biomedical sciences: 15.3 ipm in BIOMED, 83.8 in History, 88.5 in Social Science, 115.8 in Humanities.
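The size of the gap for the noun answer can be made explicit by dividing each reference value by the BIOMED value. The sketch below uses the four ipm figures quoted in the paragraph above; the function and variable names are hypothetical.

```python
# Relative frequencies (ipm) of the noun "answer" as quoted in the text.
ANSWER_IPM = {
    "BIOMED": 15.3,
    "History": 83.8,
    "Social Science": 88.5,
    "Humanities": 115.8,
}


def frequency_ratio(ipm_a: float, ipm_b: float) -> float:
    """Return how many times more frequent A is than B (both in ipm)."""
    if ipm_b <= 0:
        raise ValueError("the baseline frequency must be positive")
    return ipm_a / ipm_b


baseline = ANSWER_IPM["BIOMED"]
ratios = {name: frequency_ratio(value, baseline)
          for name, value in ANSWER_IPM.items() if name != "BIOMED"}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} times the BIOMED frequency")
```

On these figures, answer is roughly 5.5 to 7.6 times more frequent in the three reference subcorpuses than in BIOMED.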

Most notable is the presence in biomedical discourse of adjectives (temporal response, sustained response, long-term response, etc.) which, in combination with response, add temporal characteristics to its meaning and are not frequent in texts of other disciplines. On the other hand, the temporal adjectives characteristic of historical and social science discourse (rapid: 9 in History and 31 in Social Science; quick: 7 in History) are found only once in BIOMED.

Conclusion

The systematic study of the functioning of academic vocabulary in biomedical discourse is carried out in comparison with the discourses of the humanities and social sciences. It is based on corpus linguistics methods, combining statistical methods with qualitative analysis of language units. The preliminary results support the hypothesis and demonstrate that general scientific vocabulary, like special (terminological) vocabulary, can be a marker of the discipline of a text in terms of the frequency and distribution of units as well as their semantics and specific collocations.

References

Copyright information

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

About this article

Publication Date: 31 October 2020

eBook ISBN: 978-1-80296-091-4

Publisher: European Publisher

Volume: 92

Print ISBN (optional): -

Edition Number: 1st Edition

Pages: 1-3929

Subjects: Sociolinguistics, linguistics, semantics, discourse analysis, translation, interpretation

Cite this article as:

Polubichenko, L. V., & Beliaeva, T. R. (2020). Discipline-Conditioned Choice And Use Of General Scientific (Academic) Vocabulary. In D. K. Bataev (Ed.), Social and Cultural Transformations in the Context of Modern Globalism» Dedicated to the 80th Anniversary of Turkayev Hassan Vakhitovich, vol 92. European Proceedings of Social and Behavioural Sciences (pp. 898-907). European Publisher. https://doi.org/10.15405/epsbs.2020.10.05.120