What about testing?

Professional pilots and air traffic controllers (ATCOs) are used to taking exams. From the moment they embark on their chosen career path, the assessments begin: aptitude tests, training checks, instrument ratings, licensing checks, professional knowledge exams. And they continue. So what is one more test? Well, for many, mainly speakers of English as a second language, aviation English tests are actually a big deal.

Tests of aviation English conducted in accordance with the ICAO Language Proficiency Requirements (LPRs) are, without a shadow of a doubt, an example of high-stakes testing: their results may lead to the loss of a professional licence, with the associated organisational and economic repercussions for airlines and air navigation service providers (ANSPs), not to mention the implications for operational safety and efficiency.

Since coming into force in 2011, the LPRs have obliged all professional pilots and ATCOs to validate their operational licences with a language proficiency certificate at Operational level 4 or above. The impact of this policy has undoubtedly been huge, compelling individuals and organisations alike to invest in English language training and, ultimately, testing. Whilst the large majority of personnel affected by the introduction of the LPRs have reconciled themselves to the new system, there is a certain amount of underlying resentment.

Why resentment? One explanation could be that the current language testing system is considered unsatisfactory by many stakeholders. A lack of consistency among the available tests, and questions as to whether the majority of so-called ICAO level 4 or 5 tests are fit for purpose, appear to have resulted in a sense of injustice in workplaces across the aviation industry. Professor Charles Alderson highlighted these issues in a survey of a number of aviation English tests conducted in 2008. Despite his empirical data, conclusions and recommendations, it would seem that the situation remains largely unchanged today.1

Fairness is the overriding factor to be taken into account in the construction of any test, and even more so for one that may have far-reaching implications for an individual or a specific community. If the credibility of the test or tests is doubted, the ICAO Standards and Recommended Practices (SARPs) themselves may be questioned or labelled unfair. Kim and Elder made this observation in their research with Korean test-takers in the aviation context:

“Questions of justice may arise when the construct espoused by a particular policy, and reflected in tests used to implement this policy, fails to reflect the real-life situation or to accord with the views of relevant stakeholders” (2014, p.2).2 

We can hear echoes of inequity, even anger, in the voice of this pilot who was interviewed about his LPR test-taking experience for academic research purposes3:

“I was really disappointed when I got this level 4 with the (examining authority) and it didn’t seem um really fair you know about the way they assessed our English level, it was not really fair. And I had the explanation of my level 4 and not a level 5, that I would need for entering a company for example, because everybody is asking a level 5 today, and the only explanation they gave me was that I was not experienced enough in the field of aviation to have this level 5. But it didn’t have anything to do with my English level, in fact just my aviation experience and it didn’t sound really fair to me and I know that other colleagues said exactly the same thing… it’s a pity.”

So are aviation English tests fair?  Do they address the issue of improving radio communications by ensuring that aircrew and air traffic control personnel are capable of operating in a common language safely and effectively?  

First and foremost, these tests are proficiency tests; that is to say, they should not be linked to a set training course. ICAO Document 9835 is very clear on this point:

Proficiency testing is different from progress or achievement testing in that proficiency tests do not correspond directly to a training curriculum. That is, it should not be possible for test-takers to directly prepare or study (by memorizing information, for example) for a proficiency test. Proficiency tests require test-takers to demonstrate their ability to do something representative of the full spectrum of required knowledge and skills, rather than to simply demonstrate how much of a quantifiable set of curriculum learning objectives they have learned. (6.2.5.4)

Some language schools or training programmes offer a training course with the LPR “exam” at the end. The shorter the course, the more it could be said to deviate from the guidelines laid down by ICAO. Having said that, there is nothing to suggest that these establishments are deliberately disregarding ICAO’s direction. It may be that the criteria for aviation language testing have not been universally understood and their implementation is not being regulated closely enough.  

Another feature of proficiency tests is that they should be kept within the bounds of what language test developers call “constructs”. A construct is, in essence, a theory of the language ability we are looking to assess. Let’s take the example of testing an individual’s oral fluency. Items in such a test need to scrutinise the different elements that correspond to a preconceived idea of what fluency consists of: in practical terms this might include appropriate tempo, rhythm, use of discourse markers and so on. If the construct of the test has not been clearly defined at the outset, there is a real danger that the candidate may face items or tasks that target irrelevant skills rather than the ones we are looking for.
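
To make this concrete, here is a minimal, hypothetical sketch in Python; the construct features and draft items are invented for illustration and not taken from any real test. The construct is defined first, and each draft item is then checked against it so that anything targeting an unrelated skill gets flagged.

# A hypothetical, simplified construct definition for "oral fluency":
# the features we have decided, in advance, that fluency consists of.
FLUENCY_CONSTRUCT = {"tempo", "rhythm", "discourse_markers", "hesitation"}

# Invented draft test items, each tagged with the skill it actually targets.
draft_items = {
    "retell a routine departure clearance": "tempo",
    "describe an unexpected go-around": "hesitation",
    "summarise a written incident report": "reading_speed",
}

# Flag any item that targets a skill outside the defined construct.
for task, skill in draft_items.items():
    if skill not in FLUENCY_CONSTRUCT:
        print(f"Construct-irrelevant item: '{task}' targets '{skill}'")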

An illustrative, and sadly quite common, case might be where an aviation English test of listening comprehension requires a test taker to 1) watch a video or listen to a pre-recorded text, taking notes if he or she wishes, and then 2) answer questions presented only after the video or text has finished. This sort of task relies heavily on almost super-human short-term memory and note-taking skills: expert listeners may understand the video or text perfectly in real time but could fail unfairly simply because they do not recall, or have not noted down, the specific information required to answer the questions.

There should also be a relationship between the test tasks and the language use in a given domain. This is a difficult requirement to satisfy in the context of aviation English language testing because it often conflicts with another principle of language assessment: practicality. Hopefully, the inappropriateness of asking an ATCO to report a hydraulic failure or explain the aircraft electrical system will be obvious to those of you involved in aviation language testing and training. Whilst some ATCOs may have a keen interest in aircraft systems, the pressing operational question for them is ‘how will this problem with the aircraft affect its climb / approach / landing…?’ They do not usually use the type of language that such tasks necessitate. Notwithstanding the additional cost, tests for ATCOs and pilots should be different, each developed to correspond to their individual work contexts. However, as Latitude’s Managing Director, Henry Emery, has argued, the list of variables could be long4. There are many different circumstances that may differentiate one pilot’s professional experience from another’s, including aircraft type, aircraft role, and geographical and climatic conditions. Accordingly, test developers should be wary of items referring specifically to fixed-wing aircraft being presented to helicopter pilots, or of scenarios featuring sandstorms that may completely confound a B737 Captain from Iceland.

Raters measure a test taker’s performance in an aviation English test using a predetermined set of criteria, or descriptors, that form the ICAO rating scale. The descriptors help the rater make inferences about the speaker’s English language abilities and quantify his or her performance in order to decide which overall level to award. To have faith in a particular test we should be able to see that those levels correspond to real-life aviation language proficiency in the cockpit or the control room. When this isn’t the case it can stir up ill feeling, as this dialogue, taken from an interview5 with an airline pilot, shows:

Pilot:  And in fact I think there are some difficulties even if we are supposed all to have ELP (English Language Proficiency) 5. I am sure that some guys are ELP 6 but in the real life it’s very difficult, do you understand that? From my point of view what (nationality) examiners in X (name of company) or civil aviation authority, they call a level 6, it’s a level 8 or 7 in (country) (laughs) or something like this… I don’t know what is the expectation really.

Interviewer:  So you're saying that some of the (nationality) pilots had a level 6 but you didn’t find their level any different to yours really?

Pilot:  Yes much more between 4 or 5 like me, yes something like this.
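
Part of what feeds this kind of frustration is how the six descriptor ratings are turned into the single level a candidate takes away. ICAO guidance is that the overall level is the lowest of the six criterion ratings, not the average, so one weaker area caps the result. Here is a minimal sketch, using invented scores for a hypothetical candidate:

# Invented descriptor scores for a hypothetical candidate, one per
# criterion of the ICAO rating scale.
scores = {
    "pronunciation": 4,
    "structure": 5,
    "vocabulary": 5,
    "fluency": 4,
    "comprehension": 5,
    "interactions": 5,
}

# The overall level is the lowest criterion rating, not the average.
overall_level = min(scores.values())
print(f"Overall level awarded: {overall_level}")  # 4, despite four ratings of 5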

A well-designed aviation English test should not only look like a good test; it should also respect the conditions detailed above. Only then can we say that a test is valid: put in its simplest terms, that it tests what it is supposed to test.

Having meaningful results is not the only consideration, though. Test developers and administrators should also ensure that candidates are presented with a homogeneous set of tasks and have a similar experience to their peers. Reliability and validity are the two fundamental principles of any test. Whilst we need to accept that procedures involving subjective judgement can never be a hundred percent perfect, it is important to strive for the most consistent test instrument possible so that we can be confident that its results are accurate.

In reality, one aviation English test should not be markedly easier than another; if it is, perhaps there is something wrong somewhere. It may be, for example, that the majority of test items have been written with a low ICAO level 4 in mind, which means that more proficient candidates will pass with flying colours. Yet how do we know whether they are performing well enough to meet a level 5 standard? Without more challenging tasks the test cannot discriminate effectively between levels of proficiency. Alternatively, the Language Proficiency Organisation (LPO) raters might lack initial and regular harmonisation training, leading to inconsistent scoring. The testing environment, delivery method and conditions can also have a significant impact on the reliability of aviation English tests.
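
Rater harmonisation, at least, can be monitored. One routine check, standard assessment practice rather than anything ICAO prescribes, is to have two raters score the same candidates independently and calculate a chance-corrected agreement statistic such as Cohen's kappa. A minimal sketch in Python, using invented level awards:

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' level awards."""
    n = len(rater_a)
    # Observed agreement: proportion of candidates given the same level.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((counts_a[lvl] / n) * (counts_b[lvl] / n)
              for lvl in set(counts_a) | set(counts_b))
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Invented ICAO levels awarded by two raters to the same ten candidates.
rater_1 = [4, 4, 5, 3, 4, 5, 4, 3, 4, 5]
rater_2 = [4, 5, 5, 3, 4, 4, 4, 3, 4, 5]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # about 0.68

A consistently low figure after a harmonisation session would suggest the raters are not applying the descriptors in the same way.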

Although steps have been taken to improve the quality of language proficiency testing for aviation professionals, there is arguably some way to go. ICAO has warned that “(I)nadequate aviation language testing can result in either serious safety gaps or have highly negative social and economic consequences” (6.2.2.1). Inadequate can mean too few, yet a quick browse on the Internet produces any number of providers offering the “ICAO English exam”, so scarcity is hardly the problem. Inadequate can also mean not good enough, or unfit for purpose, and that is the sense we should be wary of. A test that is unfit for purpose is probably worse than no test at all.


1 Alderson, J. C. (2010). A survey of aviation tests. Language Testing, 27(1), 51–72.

2 Kim, H., & Elder, C. (2014). Interrogating the construct of aviation English: Feedback from test takers in Korea. Language Testing, 1–21.

3 Howard, A. (2016). Investigating evidence of change in English language proficiency in L2 pilots and its significance in the context of aviation English testing in Europe. Unpublished dissertation, MA in Language Testing, Lancaster University.

4 Emery, H. J. (2014). Developments in LSP testing 30 years on? The case of aviation English. Language Assessment Quarterly, 11(2), 198–215.

5 Howard, A. (2016). Investigating evidence of change in English language proficiency in L2 pilots and its significance in the context of aviation English testing in Europe. Unpublished dissertation, MA in Language Testing, Lancaster University.