Features and Limitations
- This translator is "rules-based": it depends on words being where and how Turkish grammar
expects them to be. This is in contrast, perhaps great contrast, to the preferred practice
in machine translation, which is "example-based." This latter requires the existence of
two very large corpora, as they are called, one in the source language and the other (ideally) a
sentence-for-sentence mirror in the destination language. The idea is that you find in the
source corpus the sentence you've been given, or something close to it, and then fish up
the parallel sentence in the destination corpus, and there's your translation. If such corpora
exist for Turkish and English, they are unavailable to me; and even if they were available, any Turkish-translating
"machine" would have to capitalize on the great straightforwardness of this language's
"rules." Even if you could rig a straight example-based system, you'd want the rules too,
because in Turkish - and this is what makes it so fascinating and fun - these supply so much
useful and often unambiguous information about what word connects with what. In an example-based
system, you'd need that grammatical facility anyway, because not every "example" is or can be
available in the Turkish corpus.
My sources for "the rules" are Hugo Turkish in Three Months
(Bengisu Rona, 1989) and Turkish Grammar (G. L. Lewis, 1988). But my understanding and implementation
of the rules are entirely my own, as are all misunderstandings and malfeasances. Don't blame these books: they
are great, and you should read them in their entirety. Turkish, whether or not you speak it, writes well,
just as baseball or football does whether or not you play it.
- The other chief shortcoming of this translator, the one which really keeps it from everyday
use as such, is the deficiency of its lexicons. The glossary shows all the words I have
troubled to make available, and that's not very many. This'll improve, but it'll take time:
I'll have to scan an entire dictionary and reformat every single entry to be readable by my
own program.
- Since well below 100% of all Turkish words are represented in these lexicons, I haven't
done the next obvious thing, which is to feed it stuff straight from the Internet and see
when it chokes. I can, however, guarantee that all the sample sentences provided will
return intelligible translations.
-
Speaking of lexicons, I have provided ones for irregular English preterits and participles,
that being pretty easy to arrange. You'll never see "goed" or "comed."
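
A lexicon of this kind can be pictured as a small lookup table consulted before the regular "-ed" rule is applied. The following Python sketch is an illustration only, not the translator's own code: the table `IRREGULAR`, its two entries, and the function `past_forms` are all hypothetical names, and the regular rule is deliberately crude (it ignores consonant doubling, as in "stopped").

```python
# Hypothetical mini-lexicon: verb -> (preterit, past participle).
IRREGULAR = {
    "go": ("went", "gone"),
    "come": ("came", "come"),
}

def past_forms(verb, irregular=IRREGULAR):
    """Return (preterit, participle), checking the irregular lexicon first."""
    if verb in irregular:
        return irregular[verb]
    # Crude regular rule: "live" -> "lived", "walk" -> "walked".
    regular = verb + ("d" if verb.endswith("e") else "ed")
    return regular, regular

print(past_forms("go"))    # ('went', 'gone')
print(past_forms("walk"))  # ('walked', 'walked')
```

Checking the lexicon first is what keeps "goed" and "comed" from ever being generated.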
-
Other irregularities have not been dealt with. You may see "most good." And subject-verb agreement
is not guaranteed. Expect to see a lot of "is/are" and "he/she/it/they" waffling.
-
You may type in your own Turkish sentences, or contribute, by means of typing, your own
supplements to the samples. If what you type is in the glossary and is grammatically correct,
I THINK it'll work. Who knows what you'll do, though? I have endeavored to, in
programmer parlance, handle all exceptions, by which I mean the program ought not to crash;
but crash or not, you may not get a translation.
-
Something you should be able to do is type using all-English characters. There is a subroutine
to try alternate spellings. Thus if you typed agac when you meant ağaç, the machine
should figure out you meant "tree," as there is no other Turkish word within the possible
permutations. But if you type sinir ("nerve") when you meant sınır ("border"),
you are out of luck: the machine will leap on the first translatable spelling it finds.
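
The alternate-spelling idea can be sketched as follows. This is a guess at the general shape, written in Python, and not the program's actual subroutine: the table `ALTERNATES`, the three-word toy `LEXICON`, and both function names are assumptions made for illustration. Input is assumed to be lowercase.

```python
from itertools import product

# Assumed mapping: each ASCII letter and the Turkish letter it may stand for.
ALTERNATES = {
    "c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü",
}

# Toy lexicon for illustration only; not the translator's real glossary.
LEXICON = {"ağaç": "tree", "sinir": "nerve", "sınır": "border"}

def candidate_spellings(word):
    """Yield every spelling reachable by swapping in Turkish letters.

    The spelling exactly as typed always comes first.
    """
    choices = [ALTERNATES.get(ch, ch) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)

def look_up(word, lexicon=LEXICON):
    """Return (spelling, gloss) for the first candidate in the lexicon."""
    for spelling in candidate_spellings(word):
        if spelling in lexicon:
            return spelling, lexicon[spelling]
    return None

print(look_up("agac"))   # ('ağaç', 'tree')
print(look_up("sinir"))  # ('sinir', 'nerve'), never reaching sınır
```

Because the search stops at the first hit, sinir wins over sınır whenever both are in the lexicon, which is exactly the limitation described above.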
-
Which brings me to another feature of this and probably most machine translators: no inter-sentence
assessment of context is attempted, and very little intra-sentence context assessment is done.
The latter is pretty much confined to conjunctions like hem...hem ("both...and") and particles like
kadar (as in xAnkara'ya vs. xAnkara'ya kadar vs. bu kadar, "to Ankara" vs.
"as far as Ankara" vs. "this much").
Further on the pointedly ignored subject of context, I might mention the verb oturmak,
which can mean both "to sit (on)" and "to live (in)." Well, which? If you say xSnoopy evinde oturuyor,
which do you mean? The machine can't know. And the only way it can even guess is if the programmer
has given it lists of All The Things You Can Sit On and All The Things You Can Live In, and the program
reviews both lists. These are oceanic checks for what I think philosophers call category errors:
even the most mentally underprovided biped does them all the time, while
few if any computers do them at all.
-
I realize just now that I've made no provision at all for interjections. The reason I realize
it just now is, Turkish has quite a few which are multiword - or at least there are quite a few Turkish
expressions, like afiyet olsun, "Bon appetit," which I consider interjections. And then
there are the single-word expressions, like buyurun, "What'll it be?", which strictly speaking
are not interjections but are best considered as such. Whatever
you consider 'em, they ain't here. My own horseback vision is that really effective translation
software would assume all sentences to be interjections, and only after certain disproof
of this undertake rules-based processing.
-
The program accepts just one sentence at a time, though sentences within sentences are dealt
with, by which I mean quotations. IF they are enclosed within double quotes (""), that is.
Observe that a single quote (apostrophe) has a specific grammatical
job in Turkish, setting off a proper noun from its suffixes: xAnkara'ya, "to Ankara" vs. odaya, "to the room."
-
The only other punctuation considered significant for translation purposes is the comma.
In written Turkish its main job is to indicate the subject of a sentence has at last been
got out, and/or that the word just before the comma doesn't modify one after the
comma. That is how it is taken here.
-
Something that doesn't have any meaning at all in Turkish is that "x" business. It is there
to assist the machine, to clue it that a potentially untranslatable proper noun is on the
way. The glossary has only a limited roster of the obvious ones.
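
Taken together, the "x" prefix and the apostrophe make proper nouns easy to pick apart mechanically. A minimal Python sketch, assuming the marker is a lowercase "x" immediately followed by a capital letter (that capital-letter check is an assumption of this sketch, as is the function name `split_proper_noun`):

```python
def split_proper_noun(token):
    """Split an x-marked token into (name, suffix), or return None.

    Assumes the marker is a lowercase "x" followed by a capital letter,
    and that any suffix is set off from the name by an apostrophe.
    """
    if len(token) < 2 or token[0] != "x" or not token[1].isupper():
        return None
    name, _, suffix = token[1:].partition("'")
    return name, suffix

print(split_proper_noun("xAnkara'ya"))  # ('Ankara', 'ya')
print(split_proper_noun("xSnoopy"))     # ('Snoopy', '')
print(split_proper_noun("odaya"))       # None
```

The name can then pass through untranslated while the suffix ("ya" here) is handled by the ordinary rules.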
-
Vowel harmony matters to most speakers of Turkic languages, but it mostly doesn't matter to this program.
If you say silahlersiz when you meant (and should have said) silahlarsız, the machine
should take it in stride. In point of fact, with this particular word, I DON'T know what
you should have said: silah being a loan word from a language with no care for
vowel harmony, it is possible that the first usage is acceptable and even correct.
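
One way a program can shrug off vowel harmony is to collapse each suffix vowel to its class before matching, so that -ler and -lar look identical. The Python sketch below illustrates that technique under stated assumptions; it is not the translator's own code. The names `HARMONY`, `SUFFIXES`, `STEMS`, and `parse` are hypothetical, and only two suffixes and one stem are listed.

```python
# Collapse vowels to harmony classes: a/e -> "A", ı/i/u/ü -> "I".
HARMONY = str.maketrans("aeıiuü", "AAIIII")

# Hypothetical suffix table, keyed by harmony-neutral form.
SUFFIXES = {"lAr": "plural", "sIz": "without"}

# Toy stem list for illustration only.
STEMS = {"silah"}

def parse(word, stems=STEMS, suffixes=SUFFIXES):
    """Return (stem, [suffix labels]) if the word parses, else None."""
    for stem in sorted(stems, key=len, reverse=True):
        if not word.startswith(stem):
            continue
        rest = word[len(stem):].translate(HARMONY)
        labels = []
        while rest:
            for key, label in suffixes.items():
                if rest.startswith(key):
                    labels.append(label)
                    rest = rest[len(key):]
                    break
            else:
                break  # unknown suffix: give up on this stem
        if not rest:
            return stem, labels
    return None

print(parse("silahlersiz"))  # ('silah', ['plural', 'without'])
print(parse("silahlarsız"))  # ('silah', ['plural', 'without'])
```

Both spellings reduce to the same neutral string, "lArsIz", so the parser never has to decide which one is correct.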
-
There is in the code, but not at this moment accessible, a limited ability to assess what part of
speech an untranslated word is. In Turkish, nouns and verbs almost always take very distinctive
suffixes, and if the machine spots those, it ought to be able to identify and handle them as nouns and
verbs, without ever determining what they really mean. The feature is being withheld here because
I haven't fully tested it, and maybe can't. The number of words that are Turkish is fairly
large, but the number of words that aren't is infinite.
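
To give the flavor of suffix-based guessing, here is a deliberately naive Python sketch. It is not the withheld feature itself: the two ending lists are a tiny hand-picked sample (infinitive and present-continuous endings for verbs, plural endings for nouns), the function name is hypothetical, and lowercase input is assumed.

```python
# Small, hand-picked samples of distinctive endings (assumed, not exhaustive).
VERB_ENDINGS = ("mek", "mak", "iyor", "ıyor", "uyor", "üyor")
NOUN_ENDINGS = ("ler", "lar")

def guess_part_of_speech(word):
    """Guess "verb" or "noun" from the ending alone; None if no ending fits."""
    if word.endswith(VERB_ENDINGS):
        return "verb"
    if word.endswith(NOUN_ENDINGS):
        return "noun"
    return None

print(guess_part_of_speech("oturuyor"))  # verb
print(guess_part_of_speech("evler"))     # noun
print(guess_part_of_speech("kadar"))     # None
print(guess_part_of_speech("ekmek"))     # verb, wrongly: ekmek is "bread"
```

The last line shows why such a feature is hard to test fully: an ending can look distinctive and still belong to the stem.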
-
I experimented briefly with an interactive feature, in which the program would pause when
it came to an ambiguity the human user could help with. You'd tell it what you thought
was the right idea or meaning or sense, and it would then carry on. As languages go,
Turkish seems to present relatively few logical forks. But I dropped the idea quickly.
Accommodating this stuff fattened the code fast, and I doubted any likely user would want it.
If you already know the right idea or meaning or sense, you probably already know Turkish.
-
Just guessing, but I think the most unreliable feature of the code is its treatment of
what Prof. Lewis calls izafet. A noun is said to be "in izafet with" another noun if it
can be construed as the latter's possessor. E.g., "John's room," or the two halves of "bedroom."
Those are simple. What's not simple in Turkish is that izafet pairs can be concatenated
(i.e., the latter half of one pair is the front half of another pair), nested (one pair
is inside another pair), or absent the front half (which could have been established in
some preceding sentence and therefore did not, in the speaker's opinion, need to be
repeated). There is also the matter of indefinite izafet, in which the front half does not
carry the readily recognizable genitive ending. (Like in "bedroom" - we don't say, nor
would a Turk say, "bed's room".) People can figure this out; machines, this machine at least,
can't, and with some reluctance I have eliminated code meant to infer indefinite izafet.
The only exceptions are çay bahçesi, "tea garden," and kuş gribi, "bird flu,"
which are not treated here in any izafet-related subroutine
but rather in a very specialized word-pair subroutine.
Izafet gets even more hairy when it comes to verbs, which can themselves be possessed
by nouns. In Turkish, "You Are Here" on a city map appears as Bulunduğunuz yer,
which literally means "the place where you find yourself." In Turkish, "you" OWNS "find yourself";
and as if that weren't strange enough, the result collapses to an adjective modifying the noun
"place." These are called (in English, at least by the abovementioned grammarians) object
participles, and I have labored considerably to get them translated because Turkish expression
depends so heavily on them. The example sentences given will, as I say, come out OK. But
beware otherwise.
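
The specialized word-pair treatment mentioned above for çay bahçesi and kuş gribi amounts, in spirit, to a lookup on adjacent words that runs before any izafet analysis. A minimal Python sketch of that idea follows; the table `WORD_PAIRS` holds only the two pairs named in the text, and the function name is hypothetical.

```python
# The two fixed pairs named in the text, treated as single units.
WORD_PAIRS = {
    ("çay", "bahçesi"): "tea garden",
    ("kuş", "gribi"): "bird flu",
}

def merge_word_pairs(words, pairs=WORD_PAIRS):
    """Replace each known adjacent pair with its gloss; keep other words."""
    out = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in pairs:
            out.append(pairs[pair])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

print(merge_word_pairs(["kuş", "gribi", "var"]))  # ['bird flu', 'var']
```

Handling such pairs by table sidesteps the question of inferring indefinite izafet, at the cost of listing every pair by hand.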
-
A comparably dicey subroutine is that which handles conjunctions. The approach is to see what's
immediately west of a conjunction, assume that that is what's being conjoined, then rove
east in search of a similarly situated word. For adjectives and adverbs, this is hardly fraught,
but verbs and nouns present hazards, as they themselves may be but minor elements in what might
be called (in view of the preceding discussion of izafet) supernouns or superadjectives.
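
The look-west-then-rove-east approach can be sketched in a few lines of Python. This assumes each word already carries a part-of-speech tag and that "similarly situated" means "same tag"; both are assumptions of the sketch, as are the tag names and the function name. It is not the program's actual subroutine.

```python
def conjoined_pair(tagged, conj_index):
    """Return indices (left, right) of the two words a conjunction joins.

    tagged is a list of (word, tag) pairs. The word just before the
    conjunction is taken as the left member; the first later word with
    the same tag is taken as the right member.
    """
    if conj_index == 0:
        return None
    left = conj_index - 1
    wanted = tagged[left][1]
    for right in range(conj_index + 1, len(tagged)):
        if tagged[right][1] == wanted:
            return left, right
    return None

# büyük ve güzel ev: "big and beautiful house"
sentence = [("büyük", "adj"), ("ve", "conj"), ("güzel", "adj"), ("ev", "noun")]
print(conjoined_pair(sentence, 1))  # (0, 2)
```

With adjectives this works as shown. With nouns and verbs, the word immediately west may be only one piece of a larger unit, which is where the hazards begin.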
-
Just a few of the things that I just haven't got around to programming AT ALL:
- Imperatives
- The subjunctive...as in afiyet olsun!
- The adjectival ending -mtrak
-
Other quirks, which are at least reliable, this being a computer that's doing all the work:
-
Adverbs are made to go in front of verbs. Why? Seemed like a good idea at the time.
At least I stopped the machine before it split infinitives.
-
The Turkish equivalents of "there is" and "there are" don't distinguish singular from plural, so you will
see disagreement in number in the English.
-
If you really want to learn Turkish, you won't be spending much time on this website: you'll
be in Turkey. Translation software knows where words go: it doesn't know what words do.