19

A spaceship is headed for Earth. We detected it with a few months' warning. Messages have been sent to and received from the ship by radio, but we haven't deciphered anything they've said, and don't believe they've deciphered anything we've said. We know, from the radio messages, that their language is human-pronounceable.

The government assembles a team of linguists to prepare for when they land. What type of plan are the linguists likely to come up with for establishing initial communication?

John99
  • 9
    The type of plan that the linguists in your story are likely to come up with is the type of plan which makes sense for your story. (And if we established communications and yet after a few months we still haven't understood a single word it means that they are either incredibly dumb or positively hostile. Let's see if you understand some Ancient Greek words. Beep, enah. Beep-beep, dyo. Beep-beep, beep, triah. Beep-beep, beep-beep, tesserah. Beep-beep, beep-beep, beep, penteh. What does the word tesserah mean in Ancient Greek?) – AlexP Aug 14 '23 at 00:06
  • Mutually intelligible communication is eventually going to be established (after they land), without anything seriously bad happening (misunderstandings leading to hostilities etc.). So the details of how it's established are a matter of indifference to the broader story arc. But I'd like the details to be plausible, given the assumptions linguists would have about how to overcome a barrier between totally unrelated languages. – John99 Aug 14 '23 at 00:21
  • It means four. The people on the ship come from a planet where everyone speaks the same language; they have no concept of different languages existing and therefore aren't anticipating a language barrier. Their assumption is that something must be wrong with their receivers causing our messages to be garbled. Of course we don't know any of that in advance. – John99 Aug 14 '23 at 00:21
  • 1
    https://en.wikipedia.org/wiki/Voyager_Golden_Record – KEY_ABRADE Aug 14 '23 at 00:49
  • 5
    @John99 do the people on the ship have to learn how to speak their language, or are they born knowing how to speak it? – Monty Wild Aug 14 '23 at 01:44
  • 3
    Moreover, does the alien language not drift at all, over their recorded history? They should have at least some concept that language isn't innate and universal, esp. among aliens. – BMF Aug 14 '23 at 02:24
  • https://en.wikipedia.org/wiki/Story_of_Your_Life – Mark Aug 14 '23 at 15:07
  • 2
    In support of @AlexP's comment, a fairly good example of how a story can be crafted around the idea of first contact is the movie Arrival. But the real problem you're going to run into is that we humans have no honking idea what the alien language will be like. We don't speak porpoise or whale yet, after all. Therefore, there won't be a universal plan. Of course, if there was, NASA would have it. Out of curiosity, did you contact NASA and ask them if they had one? Stranger things have come out of those hallowed grounds. – JBH Aug 14 '23 at 17:57
  • 2
    I would suggest you ask this instead on Linguistics Stack Exchange. – nearsighted Aug 15 '23 at 01:04
  • 1
    Record everything they say and feed it into ChatGPT. It probably won't understand it, but it will learn to bullshit convincingly. Of course, it may start an interstellar war... – Simon Crase Aug 15 '23 at 01:30
  • @BMF It's almost plausible that the aliens have little concept of the history of their language - if they do not have FTL, and if because of that they are basically test-tube babies grown in artificial wombs, machine-educated and meant to colonize an uninhabited system. The history of language then might not be on the short list of things to get up to speed on before initial arrival. Except, why would they then have radios? – Jedediah Aug 15 '23 at 03:11
  • A lot depends on how the aliens' intelligence compares to ours and what their purpose is. Is it just a case of translation or are there deeper issues? Do they want to make contact? Or are they on some other mission and simply show no interest? – Slarty Aug 15 '23 at 07:16
  • I believe every issue in this question has been discussed and addressed in this other question. See if those answers help. – Vogon Poet Aug 15 '23 at 12:29
  • @VogonPoet Your belief is wrong. That other question addresses a completely different scenario (deciphering a single static message vs. deciphering a language interactively with the speakers), which both permits and requires completely different approaches. – Logan R. Kearsley Aug 15 '23 at 14:47
  • 6
    "We know, from the radio messages, that their language is human-pronounceable." -- how? Radio, both analog and digital, doesn't follow some kind of natural law that is trivial to deduce. Here is a quick glimpse into how tricky it gets - and that's from a known-human source: https://witestlab.poly.edu/blog/capture-and-decode-fm-radio/ – Tom Aug 15 '23 at 15:55
  • 7
    Yeah, if we know that their language is human-pronounceable from their radio transmissions, it's almost certainly because they've worked out our transmission protocols and are deliberately transmitting in the clear to let us know that. – notovny Aug 15 '23 at 22:01
  • 2
    Yeah, even being able to confidently say that their radio transmission is an encoding of audio is a huge step towards mutual understanding. As opposed to a (relatively) direct radio encoding of written language, or some more complex digital protocol that might contain language somewhere (i.e. when you send an email over radio your email content might be encoded into IMAP messages which are encoded in TCP/IP packets which are finally encoded in actual physical radio waves...), or digital messages that have nothing to do with language. And that's not considering any actual encryption. – Ben Aug 16 '23 at 02:55
  • I’m sorry but I agree this question is asking to fill a blank spot in a story rather than define a world aspect. Vote to close. Better to describe your arrival preparations and aliens, and ask how to reach a desired outcome (they declare war, they eat us, we share “music” with them, we eat them, etc.) – Vogon Poet Aug 17 '23 at 14:43
  • While it could use more specific information, I interpret this question as asking how, given an alien visit in current times, humankind would deal with the language barrier. This doesn't require to be interpreted as asking for a story-based answer: it's similar to questions asking for example how, given the current situation, celestial bodies react or adapt to a change. VTR. – Joachim Aug 31 '23 at 13:35

4 Answers

28

Sadly, you're just a couple of months too early to read an upcoming volume of scientific papers specifically addressing the issue of xenolinguistics and theoretical issues with fieldwork on alien languages....

In the meantime, a lot depends on what the aliens look like. The fact that their language is pronounceable by humans is an amazingly unlikely and super convenient coincidence which means that all of the tools of monolingual fieldwork can be brought into play. The movie Arrival does a good job of illustrating the basics, in a much more challenging situation than what you have described. However, the more human-like the aliens are, the smoother it will go, as the linguists involved will be able to rely more heavily on things like assuming that the aliens will have analogous words for analogous body parts and understand pantomime in similar ways as us. If their bodies are radically different, there will have to be a longer stage of figuring out how to understand their body language, and teaching them to understand ours. Note that this is still an issue for human-targeted fieldwork anyway, as gestures are not universal, but it will be a bigger issue when you're not at least starting with bodies of the same shape.

It can be assumed that the aliens can hear at least the band of frequencies in which their voices exist, but nothing can be assumed about their hearing outside that band. Similarly, no assumptions can be made about their other sensory capacities. Maybe they see a different spectral range from us. Maybe they don't see color. Maybe they don't see at all. Maybe they pay more attention to scent than we do. To cover all of those options in preparation for elicitation work, teams of researchers would be compiling a wide variety of different types of stimulus materials to determine what the aliens can distinguish and what they care about as groundwork for more targeted elicitation later.

Absolutely everything that the aliens say should be recorded, in the highest possible fidelity, so as to capture any distinctions that may not be obvious to not-yet-trained ears. This will be accompanied by video recordings to provide context for each utterance. For everyone interacting directly with the aliens, there will be five or ten who just do data analysis on the recordings that come out of elicitation sessions. While none of these are done in isolation, and elicitation experts will figure stuff out about multiple organizational levels at once, the first issue for analysts will be identifying contrasting phonemes--which might be possible ahead of time based solely on the corpus of transmissions--then establishing a transcription convention, identifying "words" (roots, collocations, idioms, etc.), and then finally building up successively more complex levels of grammar.

And while nobody will strongly expect anything to come of it, someone is going to try zero-shot learning by producing an embedding vector space on the tokenized corpus of recorded alien speech and trying to correlate it with equal-dimensional word-vector spaces for major human languages. It's relatively cheap, and hey, you might get lucky.

Edit, to explain the last paragraph:

Zero-shot learning: learning to classify inputs that belong to categories the learning system has never seen examples of before, based on correlating knowledge from multiple other sources. E.g., a zero-shot image classifier might be able to correctly identify pictures of zebras without ever having been trained on zebras because it knows what stripes are, and knows what horses are, and has been told that a zebra looks like a striped horse.

Embedding vector space on a tokenized corpus: this is how LLMs, like ChatGPT, encode their inputs. It's a way of being able to do math on words. Basically, you come up with a method of splitting a collection of texts (a corpus) into discrete tokens (letters, words, or whatever happens to work), and then you compute a list of numbers--a vector--that represents each of those tokens based on the other tokens that it occurs in context with. The position of the resulting vectors in higher-dimensional space often correlates with useful semantic features of the tokens.
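As a rough illustration of "a vector that represents each token based on the other tokens it occurs with", here is a minimal sketch that uses plain co-occurrence counts as a stand-in. Real LLM embeddings are learned during training rather than counted, and the toy corpus, the whitespace tokenizer, and the `cooccurrence_vectors` name are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(corpus, window=1):
    """Map each token to a vector of counts of the tokens seen near it.

    `corpus` is a list of sentences; tokens are split on whitespace.
    `window` is how many positions to either side count as "context".
    """
    vocab = sorted({tok for sentence in corpus for tok in sentence.split()})
    counts = defaultdict(Counter)
    for sentence in corpus:
        toks = sentence.split()
        for i, tok in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tok][toks[j]] += 1
    # One dimension per vocabulary entry, in sorted order.
    vectors = {tok: [counts[tok][other] for other in vocab] for tok in vocab}
    return vocab, vectors


# Hypothetical two-sentence corpus, just to show the effect.
corpus = ["the cat eats fish", "the dog eats meat"]
vocab, vectors = cooccurrence_vectors(corpus)
print(vocab)
print(vectors["cat"], vectors["dog"])
```

In this toy corpus "cat" and "dog" end up with identical vectors because they appear in the same contexts, which is the sense in which position in the vector space can track meaning.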

Correlating word vector spaces: zero-shot learning for machine translation is done by producing embedding vectors for multiple languages, and then looking for clusters of points that have the same shapes in each model. If you assume that the matching points are translation-equivalents, then that gives you a way to convert a semantic vector from one model into a semantic vector from the other model, and start translating languages without ever having seen a parallel text.
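To make "looking for clusters of points that have the same shapes" a little more concrete, here is a toy sketch. It assumes, very optimistically, that the "alien" space is an exact rotated and shuffled copy of the "human" one, and it matches points by comparing each point's sorted distances to all the others, which rotation does not change. This is a simplified stand-in, not the method used by real unsupervised translation systems, and every name in it is hypothetical.

```python
import math
import random


def distance_signature(points):
    """For each point, the sorted distances to every point in its own space."""
    return [sorted(math.dist(p, q) for q in points) for p in points]


def match_by_shape(space_a, space_b):
    """For each point in space_a, the index of the best-matching point in space_b."""
    sig_a = distance_signature(space_a)
    sig_b = distance_signature(space_b)
    return [
        min(range(len(sig_b)), key=lambda j: math.dist(sa, sig_b[j]))
        for sa in sig_a
    ]


def rotate(point, angle):
    """Rotate a 2-D point about the origin by `angle` radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


rng = random.Random(0)
# Toy "human" embedding: 8 tokens in 2 dimensions.
human = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]

# Toy "alien" embedding: the same shape, rotated and listed in a different order.
order = list(range(8))
rng.shuffle(order)
alien = [rotate(human[i], 1.0) for i in order]

guess = match_by_shape(alien, human)
print(guess == order)
```

With real corpora the two spaces would only be approximately the same shape, if at all, so the matching would be noisy rather than exact.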

This technology is only proven to work at all when starting with extremely large data sets of relatively closely related languages, but it is being seriously researched to see if it can be extended to provide cheap machine translation for less well-documented languages and even to decipher animal communication, like whale songs.

Logan R. Kearsley
  • 5
    could you repeat the last paragraph in english please? – ths Aug 14 '23 at 14:02
  • 1
    @ths: The last paragraph of this answer is obvious technobabble. It is sort of the point: we are speaking of a science-fiction story, aren't we? – AlexP Aug 14 '23 at 18:29
  • 18
    @AlexP No, it isn't. It is a real technology that is being developed for machine translation of human languages, and is being investigated for deciphering cetacean communication. – Logan R. Kearsley Aug 14 '23 at 20:52
  • 8
    The gist is: give a machine a bunch of data, it makes up its *own* categories for it and then starts sorting everything into them. At the end you get a string of numbers (called a "vector") that represent how closely something matches each category. To use the zebra example, say there were four categories, black, white, horse and beans. From that, you might get [0.3, 0.3, 1.0, 0.0] for "a bit of black, a bit of white, all horse, and no beans". Then run it in reverse and out pops something human-readable. This is basically ChatGPT in a nutshell (just with *waaay* less categories). – Samwise Aug 14 '23 at 22:51
  • 5
    @AlexP that is not technobabble, but you need to know linguistic theory to make sense of it. In plain English: someone will try to correlate recorded alien speech to recorded human speech in a very mathematical way to see if there is a close match to any human language. – The Square-Cube Law Aug 14 '23 at 22:59
  • 2
    I so hope that I'll live to see the day when we crack communication of (or with!) sperm whales... – fgysin Aug 15 '23 at 08:40
  • What exactly does “fieldwork on alien languages” mean?? No, don’t answer that. – Vogon Poet Aug 15 '23 at 12:17
  • 3
    It means exactly what it says on the tin. But if you want the long answer, wait a couple months and you can read Claire Bowern's chapter in https://www.routledge.com/Xenolinguistics-Towards-a-Science-of-Extraterrestrial-Language/Vakoch-Punske/p/book/9781032399591 – Logan R. Kearsley Aug 15 '23 at 14:45
  • 2
    Just because that paragraph bears no obvious relation to any known language, it doesn't make it meaningless technobabble - we just need the right techniques to decipher it. – James Bradbury Aug 15 '23 at 17:11
  • 1
    I think it says a lot of our possible strategies when we have yet to decipher cetacean communication, a species which we will be far more familiar with than aliens. – Passer By Aug 16 '23 at 04:16
  • Word embeddings (and for that matter, phonetic embeddings) are created not with zero-shot learning, but by modeling the correlations of words from a large corpus. For example, take a sentence from the corpus, remove a word ("I have the high X"), and ask the model to predict the missing word ("ground"). – Passer By Aug 16 '23 at 04:20
  • @PasserBy Yes, but creating word embeddings is only half the problem. The zero-shot learning component is correlating the embeddings between two independently-trained models, allowing the system to predict translations without ever having been trained on any translation pairs. – Logan R. Kearsley Aug 16 '23 at 04:57
  • 3
    That approach has so far been demonstrated to "work" (depending on what level of accuracy you decide is the boundary between "working" and "not working") for English-French and English-ASL language pairs. – Logan R. Kearsley Aug 16 '23 at 04:59
  • Ah, I see what you mean now. I think your answer reads as if suggesting the embeddings are generated by zero-shot learning. – Passer By Aug 16 '23 at 05:19
  • 3
    I may have missed that point, but what about inverting the POV? Considering the POV of the incoming aliens, if they are capable of intergalactic voyages, maybe Earth isn't the first civilization they have met, and maybe they have developed tools and methodologies to communicate in those cases? – Mouke Aug 16 '23 at 07:59
  • 2
    @Mouke Per clarifying comments on the initial question, the aliens have no experience with other languages. But, if they did, the best we could predict is that they would probably use similar fieldwork & decipherment methods as we do. – Logan R. Kearsley Aug 16 '23 at 14:15
  • Has anybody tried to apply this technique to Etruscan? We do have a reasonably sized corpus, and we can read the letters themselves (because they are mostly the same as the good old Roman letters). The Etruscans even had the foresight to do word segmentation for us (with interpuncts). Have they met with any success in understanding Etruscan? Has the success been confirmed by linguists? If not, why not? – AlexP Aug 17 '23 at 10:29
  • @AlexP Not that I know of, but I wouldn't really expect it to work in that case. The surviving Etruscan corpus is only a few hundred thousand words, which isn't large at all for these purposes. – Logan R. Kearsley Aug 17 '23 at 15:18
15

Start with METI

Take a look at the Arecibo Message, the Voyager Golden Record, and other METI (Messaging to Extra-Terrestrial Intelligences) efforts that humanity has already made. These projects have already established a baseline on how to attempt communication with an unknown intelligence that shares no common heritage with humanity. There's too much for me to type out here, but try to understand why each of these messages (and each part of each message) was built the way it was.

Understand the basics of Cryptology

Linguistics is important for you the author to understand, but you should do at least a brief study on cryptology as well. This is essentially cryptography of the highest order, messages encoded in a medium that you have no context for. Not a cipher of a shared alphabet, not an ASCII message encrypted in binary data, or a secret message hidden in a plain-text letter. Understand the basics of where to start decrypting a message when the encryption method is unknown, and that will help you understand how we would approach this problem.

Establish a method of communication

As other answers have said, you will have to figure out a shared channel to communicate over. Maybe they don't hear in a similar way to us, or use audio vibrations through a gaseous medium to 'talk'. Maybe they don't see similar wavelengths or use written symbols to 'write'. Do they feel vibrations like we do? Could we tap on them to create a 'morse code'? Some answers will be evident from the fact that they have a machine (the spaceship), what it looks like, and how they interact with it. At a minimum, they can perceive the material it's made out of, so we could find some similar material, arrange pieces of it in a weird order, and assume that they'll be aware of how many pieces there are. Consider what other mediums we could assume that they do perceive, or might perceive. How can we confirm that they understand that medium? (Pro tip: prime numbers aren't accidents.)
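The prime-number tip can be sketched in a few lines. This is only an illustration of the idea: a sequence of pulse counts that follows the primes is unlikely to arise by accident, so it marks a signal as deliberate. The `primes` and `pulse_train` helpers, and the choice of `*` for a pulse and a space for a gap, are assumptions invented for this example; the actual "pulse" would be whatever the shared medium turns out to be (taps, flashes, arranged objects).

```python
def primes(n):
    """Return the first n prime numbers by trial division."""
    found = []
    candidate = 2
    while len(found) < n:
        # A candidate is prime if no smaller prime divides it evenly.
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def pulse_train(counts, pulse="*", gap=" "):
    """Render each count as a burst of pulses, with a gap between bursts."""
    return gap.join(pulse * count for count in counts)


signal = pulse_train(primes(5))
print(signal)
```

If the other side answers with the next bursts in the sequence (13 pulses, then 17), that is reasonable evidence that they perceive the medium and recognized the pattern.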

With a Cooperative Partner

Assuming that the aliens desire to communicate as well, have established a medium of communication, and are actively engaged in a back-and-forth effort, the best way to start is to establish a shared ontology for the fundamental properties of our universe. Maths for starters; establish communal symbology for numbers, counting, algebra, and primes. With numbers, you can communicate about physics and the elements, and establish a shared symbology for chemistry.

From there you can further abstract. Represent the Earth's position in the Solar System and our planets, map Earth, and discuss geology; planetary science should hold constant across space to where they came from. Discuss the Galaxy, and try to figure out which Solar System they came from. Represent and discuss DNA (do their lifeforms have a similar structure for genetic information?)

At this point, there are enough common references that you're close to learning a second language without knowing a shared language. How would you help an exchange student from Japan to learn English when you don't know any Japanese?

AeroSigma
  • Once you have math you can extend that to physics and chemistry as the aliens will be operating with the same reality that we do. As for that exchange student--it wasn't exchange, student, or Japan but I've expressed concepts in chemistry many times to overcome a language barrier as it was faster than the dead-tree dictionary. – Loren Pechtel Aug 17 '23 at 02:10
11

For starters, if they were smart they would talk to people who have deciphered languages of other humans. People from civilization regularly run into primitive tribes in Africa or South America, etc, who have not previously been in contact with the outside world. They usually manage to establish communication pretty quickly.

I once talked to a missionary who was the first to contact such a tribe. He said that when he walked into their village, he was carrying various equipment they weren't familiar with, canteen and tent and radio and whatever, and so they crowded around and said "what's this? what's that?" So then he went around the village pointing at various things and asking "what's that?" and was able to learn words for many common items.

Of course a lot depends on how similar the aliens are to us. If they eat food and wear shoes and so on, building up a list of words for common objects might be fairly easy. But if their physiology, or even more important, their way of thinking, is very different from ours, the challenges quickly mount. Does their language have nouns and verbs and adjectives? Or do they just not think in those terms at all?

I'm reminded of a science fiction story I read once where a human meets aliens. Somehow they are able to talk -- that wasn't explained. But the aliens ask him, "How do you hear?" He replies, "With my ears," and points to his ears, and tries to explain how they work. The aliens are baffled by this response. So he asks them how they hear. And one of them says, "I hear of my home world, of ground and sky."

I thought it was a good scene. Perhaps aliens would be so alien that they would just not think like us, and such basic questions would get totally different answers from what we would expect.

Or not. One could also speculate that logic is inherent in the nature of the universe and if aliens are capable of building technological devices like spaceships, they just MUST think about science and technology in essentially the same way we do.

As we have no examples of intelligent aliens to work with, we just have no empirical knowledge and can only speculate.

Jay
  • I was about to make a comment along the lines of the second to last paragraph, until I got to the second to last paragraph... Good job. – Jedediah Aug 15 '23 at 03:04
  • 1/10001001 is a surprisingly good entry for math assuming the aliens have starships. – Joshua Aug 15 '23 at 19:16
  • Is the missionary you talked to Daniel Everett? His book, Don't Sleep, there are Snakes, could be of great interest to OP. – Omar and Lorraine Aug 16 '23 at 09:28
  • @Lorraine It was decades ago, I'm afraid I don't remember his name or even where I met him. But I'd guess there are dozens, maybe hundreds, of missionaries and anthropologists who could share similar stories. – Jay Aug 18 '23 at 04:44
4

Start with maths and work from there. 1+1=2 will be the same to an alien, just different symbols. So it gives us a point of reference and a bunch of concepts known to both sides to begin with.

Then look at getting hold of their version of a children's primer and move forwards.

Unlike earth languages, I don't think the audio methods would be better than the visual. It's unlikely we'd produce similar sounds, and those are just a beginning anyway. Whereas we can communicate in mutually defined symbols.

Chinese script is an example. In Old Chinese, each character was a word. It didn't matter what language you applied it to; it had the same meaning in any language. So widely different languages can use the same symbols for the same concept.

Kilisi
  • Chinese script is useful for writing Chinese. It cannot be used to write a different language unless specifically adapted. For example, Japanese script uses many of the same characters as Chinese, but when writing Japanese those characters (1) are supplemented with home-grown kana, and (2) they have different phonetic and semantic meanings. Only some very few ideographic characters can be used trans-lingually; for example, 93 means the same thing regardless if it is read ninety-three, dreiundneunzig, or quatre-vingt-treize. But the vast majority of words cannot be neatly mapped like this. – AlexP Aug 14 '23 at 15:52
  • @AlexP China has hundreds of languages not just one. From at least 9 distinct language groups. – Kilisi Aug 14 '23 at 19:23
  • Yes, it does. And each of them has its own writing system. You cannot write a sentence in Mandarin and read it in Cantonese. Yes, both Mandarin and Cantonese use Sinitic characters. Yes, many of those characters are used by both languages. No, they are not read the same, nor do they mean the same, nor are they combined in the same ways. – AlexP Aug 14 '23 at 19:49
  • 1
    @AlexP from Wikipedia on Chinese languages "share the same writing system (Hanzi) and are mutually intelligible in written form." This isn't all, your example is right, but enough to prove my premise is sound. – Kilisi Aug 14 '23 at 19:52
  • Both English and Polish share the same writing system (Latin letters). I have no idea why Wikipedia would say that they are mutually intelligible in their written form. There is a degree of mutual intelligibility of written texts, but it is far from complete. Maybe they are thinking of Classical Chinese, which was used as the imperial written language? (And, in addition, Cantonese is usually written with traditional characters whereas in the People's Republic Modern Standard Mandarin is written with simplified characters, which makes it really awkward for the average person to read...) – AlexP Aug 14 '23 at 19:56
  • @AlexP not something I'm going to argue about, feel free to look it up. I think where we differ is you're looking at the mainstream Chinese languages, which long had their own script rather than the more marginalised ones of which China has hundreds. But it's just an example, symbols can be used to communicate quite complex ideas regardless of language, especially mathematical ones. – Kilisi Aug 14 '23 at 21:23
  • 4
    @AlexP Mainstream Chinese languages are mutually intelligible too. I've been learning Mandarin for a few years and at some point I was looking for correspondents; one of my correspondents turned out to speak Cantonese, not Mandarin. So we had text-message conversations where she wrote Cantonese and I wrote Mandarin, and we understood each other. Of course I could never read a newspaper article meant to be read by Cantonese speakers, but a text-message conversation is fine. – Stef Aug 14 '23 at 21:50
  • Old Chinese each character was a word. There are still many symbols like that. It doesn't matter what language you applied it to it had the same meaning in any language. – Kilisi Aug 15 '23 at 05:24
  • 1
    Beijing standardized the writing system long ago. When two Chinese speakers run into a dialect issue (most commonly Mandarin/Cantonese) they resort to writing because it will be the same even if the sounds are different. Finger on palm, it doesn't even need a piece of paper. Chinese movies are frequently subtitled in Chinese for this reason, also--it works in both the mainland and Hong Kong. Furthermore, my wife has taught herself to get by in Cantonese from such subtitles--she's a native Mandarin speaker. – Loren Pechtel Aug 17 '23 at 02:18