10^12: there are two compound words each having three parts....

anyname (New member, joined Jun 2, 2023, 5 messages)
I'm an historian, and there are two compound words that each have three parts. Each part seems to be a short word from another language, and all the words relate to 'military building'. In that first language, about 1% of the short words have that meaning, so the chance of each compound word would be 0.000001 in the second language. Would the two compounds multiply the odds, or is the second one at the same odds? Or do the two compounds sharing one meaning have a chance of 0.0001?
 
I am sorry, but your question is incomprehensible. Please clarify.

There are two compound words, x and y. (Are both in language Alpha or one in language Alpha and the other in language Beta? Obviously, it would be best if we knew what specific languages are being discussed.)

The words x and y each consist of three parts: a, b, and c comprise x, and p, q, and r comprise y. Are a, b, c the same as p, q, r? If so, is there one match, two matches, or three matches? Again, specifying the words would make everything less vague.

All the words (x, y, a, b, c, p, q, and r) mean “type of military building.” So, for example, x might be “keep-castello-burg.” The words (a, b, c, p, q, and r) come from “another language.” How many languages are we talking about? Do all six words come from the same language?

At this point, it is unclear if we are talking about 3, 4, 5, 6, 7, or 8 languages. So when you say “in that first language,” which one are you talking about?

So in some language, approximately 1% of the words refer to a type of military building. Are you trying to calculate probabilities or odds? Sometimes you say “chances,” and sometimes you say “odds,” but the math you are doing seems to be based on independent probabilities. But what mathematics makes 0.000001 become 0.0001?

Furthermore, you seem to be assuming independence. But that is highly implausible. Languages have a history; they are not historically independent and thus not statistically independent. In fact, lack of statistical independence is a major tool in linguistics. If three of your languages are Castilian, Romansch, and Occitan, or Frisian, Schweizerdeutsch, and Yiddish, you are going to find huge overlaps.
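Taking the independence assumption at face value, even though it is disputed above, a minimal Python sketch of the arithmetic might look like this. It assumes every part is an independent draw and that a fraction p = 0.01 of the source vocabulary means "military building"; the helper name `all_in_field` is purely illustrative. Under those assumptions 0.000001 is p³ for one three-part compound, 0.0001 is p², which corresponds to two parts rather than two compounds, and two independent compounds would come to p⁶ = 10⁻¹².

```python
# Sketch of the thread's arithmetic, ASSUMING each part is an independent draw
# and that a fraction p of the source vocabulary means "military building".
p = 0.01


def all_in_field(fraction: float, n_parts: int) -> float:
    """Probability that n_parts independent draws all fall in the semantic field."""
    return fraction ** n_parts


one_compound = all_in_field(p, 3)  # 0.01 ** 3 = 1e-6: one three-part compound
two_parts = all_in_field(p, 2)     # 0.01 ** 2 = 1e-4: two parts, not two compounds
two_compounds = one_compound ** 2  # 1e-12: both compounds, if also independent

print(f"one compound:  {one_compound:.0e}")
print(f"two parts:     {two_parts:.0e}")
print(f"two compounds: {two_compounds:.0e}")
```

This only reproduces the multiplication; it cannot tell you whether the independence assumption behind it holds.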
 
We definitely need more information in order to be able to say anything meaningful. People often seem to think that probabilities can be worked out without knowledge of a situation, but everything depends on the details. Here, there are a lot of linguistic issues to consider, among other things.

As I read this, you appear to be asking for the probability that two different words would exist in a given language, both of which happen to be composed of three roots, all six of which are somewhat synonymous. Or something like that. It is not at all clear what probability you mean, because you have not asked the question very well.

But words don't assemble themselves randomly! I can't imagine a compound word whose three parts all mean the same thing, or even are in the same general category. So I would expect the probability of this happening at all to be zero.

The idea of two languages being involved (apparently the compound words are in, say, English, while all six parts are in, say, Chinese), and that you say the parts only "seem to be" words in another language, suggests that you may not even really be talking about genuine compound words, which are usually compounded of words in their own language, like "haircut" or "snowball", but perhaps about accidental alignments, like if it happened that "hair" and "cut" both sounded like different words for small mammals in Chinese. (Or that "ha", "irc", and "ut" were three kinds of tree.) That might indeed be random; but you'd have to know how many syllables in English even sound like anything in Chinese.
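The “ha”, “irc”, “ut” example points to a hidden source of extra chances: if the parts only "seem to be" words, nothing fixes where a word is cut. A minimal Python sketch, using a hypothetical helper (not from any library), that lists every way to cut a word into three non-empty consecutive pieces. A word of L letters has C(L − 1, 2) such cuts, so "haircut" alone offers 15 ways to produce a coincidental three-part match. Whether any piece actually "sounds like" a word in the other language is not modeled here.

```python
from itertools import combinations
from math import comb


def three_part_splits(word: str) -> list[tuple[str, str, str]]:
    """Every way to cut `word` into three non-empty, consecutive pieces.

    Hypothetical helper for illustration; it only enumerates cut points and
    says nothing about whether a piece resembles a word in another language.
    """
    # Choose two cut positions i < j from 1 .. len(word) - 1.
    return [(word[:i], word[i:j], word[j:])
            for i, j in combinations(range(1, len(word)), 2)]


splits = three_part_splits("haircut")
assert len(splits) == comb(len("haircut") - 1, 2)  # C(6, 2) ways for 7 letters
print(len(splits))                    # 15
print(("ha", "irc", "ut") in splits)  # True
```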

In any case, the ultimate question will be whether whatever probability you come up with actually means anything. If your goal is, say, to show that these languages are related, or that the words were intentionally made to encode some secret knowledge, or whatever, then you would need to state your hypothesis, and work out probabilities relative to some null hypothesis. The world is full of claims that some probability is so low that ... whatever. But with no context, such claims are meaningless. (And historians should know this, but probably don't.)

You may benefit from reading an article I've recommended several times here: Should Rare Events Surprise Us?
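One reason a tiny probability need not be surprising is that it is rarely the only chance taken. A minimal Python sketch, assuming n independent opportunities that each succeed with the same probability q: the chance of at least one hit is 1 − (1 − q)^n, which approaches 1 as n grows even when q = 10⁻⁶. The values of n below are hypothetical, chosen only to show the trend, and `prob_at_least_one` is an illustrative name.

```python
import math


def prob_at_least_one(q: float, n: int) -> float:
    """P(at least one success in n independent trials, each with probability q).

    Computes 1 - (1 - q) ** n via log1p/expm1 to stay accurate when q is tiny.
    """
    return -math.expm1(n * math.log1p(-q))


q = 1e-6  # the OP's figure for one three-part compound, taken at face value
# Hypothetical numbers of candidate words or names that could have been checked:
for n in (1_000, 100_000, 1_000_000, 10_000_000):
    print(f"n = {n:>10,}: P(at least one) = {prob_at_least_one(q, n):.3f}")
```

Using `log1p`/`expm1` rather than the literal formula avoids losing precision when q is very small.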
 
Chances of what?

As it stands, I cannot figure out what this query is asking us to find. Please state your question using numerical examples.
Hi khansaheb,
The chance that a term from 1% of a vocabulary appears in a foreign compound word alongside two other terms from that original language. All three have related meanings, which limits them to that 1%.
For example, words numbered 100, 200 and 300 in the vocabulary are joined to form a compound.
 
Hi NotJeffM,
'a, b, and c comprise x, and p, q, and r comprise y. Are a, b, c the same as p, q, r?'
No, and your question already implies that. The meanings are related, which is not the same as a match.
Yes, the six are from one language, and the compounds are in a different language.
My guess is that 1% multiplied three times (0.01 × 0.01 × 0.01) would be 0.000001. That's for three words being chosen. If the compound has a meaning that's 1% of the vocabulary, then two compounds would repeat that as 0.0001.
It's orthodox dogma that the languages are independent. That is the reason for my attempt at a calculation: to express the argument for contact and word-borrowing from the original language.
 
Hi Dr Peterson,
'I can't imagine a compound word whose three parts all mean the same thing, or even are in the same general category'.
Yes, that may be so. The expressions may consist of a single term plus a compound of two terms. However, the meanings remain related, within that 1% of the vocabulary. They form names of locations where a government ship-wharf for gold-mining and army barracks were later built.
 
@Dr.Peterson

I cannot speak for historians generally, but, while in college, I considered becoming one. In general, historians have little need to study probability because there are so many variables and so few verifiable data points that the assumptions required to apply probability are absent. J. H. Hexter talked about the historian’s modals: “may have been,” meaning “possibly was, subject to revision if additional, relevant data becomes available,” and “must have been,” meaning “very probably was, subject to revision if additional, relevant data becomes available.” It is an unreliable historian who gives an unqualified “was” except about the best-documented crudities of history. Carr said (approximately) that “history is a shell of facts surrounding a kernel of interpretations.” The interesting things in history are those that are plausibly inferred; deduction is impossible because we never know everything “wie es eigentlich gewesen.” Von Ranke was a trifle naive about the epistemology of history.

But the history of languages is an exception. There, probability comes into its own. The data are extensive and multidimensional. In the history of language, laws such as Grimm’s and Verner’s are validated by hundreds of data points. An historical linguist who does not know probability is a contradiction in terms. I strongly suspect that the OP is no more an historian than is my three-year-old grandson.
 
I wrote 'historian' to indicate I'm not discussing this as a statistician would. I didn't claim to be an historical linguist.
There are no apparent sound shifts (Grimm, Verner). Other evident loans are near-exact copies.
Can I reduce my query this way: if 1% of a collection consists of related items, and a group of three contains only items from that 1%, what is the probability / chance / odds? That is, 'chance' in its everyday, common-sense use.
If it happens twice, is that two examples of the same probability, half as likely, or the inverse squared?
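For the reduced question, a minimal Python sketch, assuming a finite vocabulary of hypothetical size N = 10,000 with K = 100 related words (1%), and three distinct parts drawn uniformly at random without replacement. The exact probability is C(K, 3)/C(N, 3), slightly below 0.01³. If two such compounds arose independently, the probability of both would be that figure squared, not halved; that answer holds only under the independence assumption questioned earlier in the thread. The function name `all_parts_in_field` is illustrative.

```python
from fractions import Fraction
from math import comb


def all_parts_in_field(n_vocab: int, n_field: int, n_parts: int = 3) -> Fraction:
    """Exact probability that n_parts distinct words, drawn uniformly at random
    without replacement from n_vocab words, all come from a field of n_field words."""
    return Fraction(comb(n_field, n_parts), comb(n_vocab, n_parts))


# Hypothetical sizes: a 10,000-word vocabulary with 1% (100 words) in the field.
single = all_parts_in_field(10_000, 100)
both = single ** 2  # two compounds, ASSUMING they arise independently

print(f"one compound:   {float(single):.3e}")  # a little under 0.01 ** 3
print(f"both compounds: {float(both):.3e}")
```

Exact fractions are used so the comparison with 0.01³ is not blurred by rounding; with replacement, the figure would be exactly (K/N)³.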
 

Thanks for partially answering the main question, which was, What hypothesis are you testing? Unfortunately, a lot still has not been stated clearly enough to generate a definite probability.

Probability arguments require precision; they are often misused. What exactly are you saying the "orthodox dogma" would imply about these names? The role of chance is not at all clear, without knowing details.

If you stated the entire story, someone might be willing to help; but as it stands, you are hiding too much from us.
 