Decoding without a crib

Translations utterly disconnected from the speakers of the translated words

Then they can't be translations - but they just might turn out to be right. Herein I expect to present a program that takes what is assumed to be a rhetorical manifestation - at least, we are assuming that it is communication and that we can say what separates words and sentences - and returns a story. It'll guess, not unintelligently, certain parts of speech, give these parts of speech arbitrary semantic values, and construct a (very much shorter) summary, one that is, if not "right," at least "not wrong": free of any obvious internal inconsistencies. Much of the incoming corpus will be ignored, but only because there is no evidence to include or exclude it. It will be frankly, if implicitly, admitted that we just don't know what to do with it.

Yet.

The approach will be as follows:

Zero in on two-word sentences, which will be imagined to comprise:

noun-noun (indicating equivalence..."simple copula," as the grammarians say)
noun-adjective, or the reverse
noun (subject)-verb, or the reverse
conjugated verb-adverb, or the reverse
preposition-noun, or noun-postposition
adverb-adjective, or (maybe this is an unfailingly triumphant deep-space rhetorical device) the reverse.

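For each way of assigning these patterns to the observed sentences, tote up the intolerable inconsistencies (the same word forced to be a noun in one sentence and a verb in another, say), and go with whichever assignment is, overall, the least offensive.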
Try to construct a dictionary, one whose entries are merely the word and its presumed part of speech.
Based on that, and on observed frequencies, write a little story with "nouns" as is, verbs rendered as "does a [noun]-oriented thing," and adverbs and adjectives like "in a [noun or verb]-agreeable manner."
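The three steps above can be roughed out in Python. This is a toy resting on assumptions that are not in the plan itself: sentences arrive already split into words, the six patterns are written down as ordered tag pairs, and "least offensive" means a brute-force search over every possible dictionary, so it only works for a handful of distinct words (6^V candidates for V words). One wrinkle the sketch has to paper over: because noun-noun is allowed, calling every word a noun is never inconsistent, so each copula reading is charged a small, arbitrary penalty. The names `ALLOWED`, `cost`, `best_dictionary`, `render_word` and `story`, the two cost weights, and the choice to leave prepositions and postpositions unrendered are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical tag names for the parts of speech in the pattern list.
N, ADJ, V, ADV, PRE, POST = "noun", "adjective", "verb", "adverb", "preposition", "postposition"
TAGS = (N, ADJ, V, ADV, PRE, POST)

# Ordered tag pairs a two-word sentence is allowed to have.
ALLOWED = {
    (N, N),                  # simple copula
    (N, ADJ), (ADJ, N),
    (N, V), (V, N),
    (V, ADV), (ADV, V),
    (PRE, N), (N, POST),
    (ADV, ADJ), (ADJ, ADV),
}

VIOLATION_COST = 10  # assumed weight: the pair matches no pattern at all
COPULA_COST = 1      # assumed small penalty so "everything is a noun" can't win by default


def cost(sentences, dictionary):
    """Total offensiveness of one candidate dictionary (word -> tag)."""
    total = 0
    for a, b in sentences:
        pair = (dictionary[a], dictionary[b])
        if pair not in ALLOWED:
            total += VIOLATION_COST
        elif pair == (N, N):
            total += COPULA_COST
    return total


def best_dictionary(sentences):
    """Try every tagging of the vocabulary; feasible only for tiny vocabularies."""
    vocab = sorted({w for s in sentences for w in s})
    best, best_cost = None, None
    for tags in product(TAGS, repeat=len(vocab)):
        candidate = dict(zip(vocab, tags))
        c = cost(sentences, candidate)
        if best_cost is None or c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost


def render_word(word, tag):
    """Render one word according to its presumed part of speech."""
    if tag == V:
        return f"does a {word}-oriented thing"
    if tag in (ADJ, ADV):
        return f"in a {word}-agreeable manner"
    return word  # nouns as is; pre/postpositions too, for lack of a rule


def story(sentences, dictionary, length=3):
    """Render the most frequent two-word sentences, most common first."""
    lines = []
    for (a, b), _count in Counter(sentences).most_common(length):
        lines.append(f"{render_word(a, dictionary[a])} {render_word(b, dictionary[b])}.")
    return lines
```

A real corpus would need something smarter than brute force (local search, say, or an off-the-shelf constraint solver), but the scoring function could stay the same.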

Sentences of other lengths will be significant, though I am not yet sure how. One-worders: what could they mean? Well, they might be answers to questions. Can we distinguish questions? Here is one way. If you see a sentence followed by another with nearly the same words in a slightly different order, my guess is that the former is a question and the latter its exasperated, pointedly labored answer.

(Or what you're looking at is an argument.)
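A minimal sketch of the question-spotting heuristic, assuming each sentence is a list of words in transmission order. Jaccard similarity of the two word sets stands in for "nearly the same words," and the 0.8 threshold is an arbitrary assumption; `looks_like_question_answer` and `question_answer_pairs` are hypothetical names. Exact repeats are excluded, since the same words in the same order are a repetition rather than an answer.

```python
def looks_like_question_answer(first, second, min_overlap=0.8):
    """True if two sentences share nearly the same words but are not identical."""
    a, b = set(first), set(second)
    if not (a | b):
        return False  # two empty sentences tell us nothing
    overlap = len(a & b) / len(a | b)  # Jaccard similarity of the word sets
    return overlap >= min_overlap and list(first) != list(second)


def question_answer_pairs(sentences, min_overlap=0.8):
    """Indices (i, i + 1) where sentence i looks like a question and i + 1 its answer."""
    return [
        (i, i + 1)
        for i in range(len(sentences) - 1)
        if looks_like_question_answer(sentences[i], sentences[i + 1], min_overlap)
    ]
```

As the parenthetical warns, nothing here can tell a question and its answer from two parties restating the same point at each other.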

Another thing that might be guessed: numbers, but only if periodic readings of something are emitted. If a sentence is repeated many times, differing only in one word or phrase, that one word or phrase could be a number. "Zero" may be the first number deciphered, if what is being read out is the exhaustion, and continued absence, of something.

And another thing that can ONLY be guessed: technical terms common to two otherwise non-overlapping fields. On Earth, the words "piston," "cylinder," and "sparkplug" appear thickly among themselves; "pentane," "hexane," and "heptane" likewise crowd together in their own separate world; but "octane" has a big foot planted in both. Word co-occurrence counts can show us this sort of thing, but never tell us what the thing means. The back end of this program will simply have to make something up.
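Both guesses can be roughed out from counts alone. In the Python sketch below (all names hypothetical), `number_slots` blanks out each position of each sentence in turn; a blanked-out template that turns up with at least `min_repeats` different fillers marks a slot whose words might be numbers. `bridge_terms` treats a word as shared between two fields if, once it is removed from the word co-occurrence graph, its neighbors fall into two or more separate groups of at least `min_field_size` words each. That graph test (a variant of finding articulation points) and both thresholds are assumptions, not part of the plan; the size threshold is there so that a word with one lonely hanger-on is not mistaken for a bridge.

```python
from collections import defaultdict
from itertools import combinations


def number_slots(sentences, min_repeats=3):
    """Map each one-slot template (None marks the slot) to its distinct fillers,
    keeping only templates seen with at least min_repeats different fillers."""
    templates = defaultdict(set)
    for s in sentences:
        s = tuple(s)
        for i in range(len(s)):
            templates[s[:i] + (None,) + s[i + 1:]].add(s[i])
    return {key: words for key, words in templates.items() if len(words) >= min_repeats}


def cooccurrence_graph(sentences):
    """Undirected graph linking every pair of words that share a sentence."""
    graph = defaultdict(set)
    for s in sentences:
        for a, b in combinations(sorted(set(s)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def bridge_terms(sentences, min_field_size=2):
    """Words whose removal splits their neighbors into two or more 'fields'."""
    graph = cooccurrence_graph(sentences)
    bridges = set()
    for w in graph:
        seen, fields = set(), 0
        for start in graph[w]:
            if start in seen:
                continue
            # Flood-fill the group containing `start`, never stepping on w.
            component, stack = {start}, [start]
            while stack:
                for nxt in graph[stack.pop()]:
                    if nxt != w and nxt not in component:
                        component.add(nxt)
                        stack.append(nxt)
            seen |= component
            if len(component) >= min_field_size:
                fields += 1
        if fields >= 2:
            bridges.add(w)
    return bridges
```

With the Earth example, sentences such as `["piston", "cylinder", "octane"]`, `["cylinder", "sparkplug"]`, `["pentane", "hexane", "octane"]` and `["heptane", "hexane"]` leave `bridge_terms` pointing at `"octane"` alone.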

But then all the output will be made up, and subject to complete reappraisal with every new incoming transmission. A big grinding job, but that's what computers are for.

To arms, then. In the meantime, here's an idea I will give to anyone who wants to write a short story. A planet sends out "language." We have no idea what it means, but we can measure the incidence of words. One appears progressively more frequently. As it does, transmissions themselves become progressively more frequent, and longer. There is, we imagine, some urgency about...something. But what? We never find out, because one day the planet explodes.

UNDER CONSTRUCTION!