Thursday, December 08, 2011

Using Google Translate from VFP

Stonefield Query is a localizable application: all strings displayed to the user are stored in a resource file and we provide a resource editor that allows a developer to translate the strings into other languages. This means that someone has to do the translation into a particular language and keep it up to date when we release a new version, which, as you can guess, is a lot of work.

Inspired by Christof Wollenhaupt’s Googlefy Your Apps session at Southwest Fox 2011, I looked at using the Google Translate API to automate the translation process and allow us to translate into more languages than have currently been done. It actually turned out to be pretty easy.

First, you have to sign up for a Google account. The Translate API isn’t free but it isn’t very expensive: $20 for 1 million characters. After you’ve enabled the API, you’re assigned a key that has to be passed to the API on every call.

The API uses REST, which is a fancy way of saying that you call it over HTTP and pass the parameters as part of a URL. Here’s an example:

https://www.googleapis.com/language/translate/v2?key=INSERT-YOUR-KEY&q=hello%20world&source=en&target=de

This tells the API to translate “hello world” (the encoded text in the “q” parameter) from English (“en” in the source parameter) to German (“de” in the target parameter). It returns the result as JSON:

{ "data": { "translations": [ { "translatedText": "Hallo Welt" } ] } }
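
As a minimal sketch of how that response can be picked apart in VFP, the snippet below runs STREXTRACT() against the sample JSON shown above. The variable names are only illustrative. This is plain string matching rather than real JSON parsing: it assumes the key is followed by a colon, one space, and a double quote, exactly as in the sample, and it would stop early at an escaped quote (\") inside the translated text.

```foxpro
local lcJSON, lcText

* Sample response copied from the example above.
lcJSON = '{ "data": { "translations": [ { "translatedText": "Hallo Welt" } ] } }'

* Take everything between the key and the next double quote.
lcText = strextract(lcJSON, '"translatedText": "', '"')
? lcText    && displays Hallo Welt
```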

To access the API from VFP code, use Craig Boyd’s VFPConnection library. Below is a function that does all the work. Pass it the text to translate and the source and target languages (spelled out, such as “English” and “German”). It returns the translated text if the call succeeded, null if either language name isn’t recognized, or an empty string if the translation failed. This function supports all of the languages the Translate API supports. Note: replace the assignment to lcKey with your Google API key.

Automatic translation may not be quite as good as manual translation because it doesn’t necessarily use the same colloquialisms a native speaker would. However, it’s an excellent starting point; someone can use the Resource Editor to tweak any strings to the proper translation.

lparameters tcPhrase, ;
    tcFromLanguage, ;
    tcToLanguage
local lcKey, ;
    lcPhrase, ;
    lcFromLanguage, ;
    lcToLanguage, ;
    lcURL, ;
    lcResult, ;
    lcTranslate

* Specify the Google API key.

lcKey = 'PUT YOUR KEY HERE'

* Encode the phrase so it can be passed in the URL.

lcPhrase = Encode(tcPhrase)

* Get the language codes.

lcFromLanguage = GetLanguage(tcFromLanguage)
if empty(lcFromLanguage)
    return .NULL.
endif empty(lcFromLanguage)
lcToLanguage = GetLanguage(tcToLanguage)
if empty(lcToLanguage)
    return .NULL.
endif empty(lcToLanguage)

* Set up VFPConnection.

set library to VFPConnection.FLL additive

* Call Google Translate and return the result.

lcURL       = 'https://www.googleapis.com/language/translate/v2' + ;
    '?key=' + lcKey + ;
    '&q=' + lcPhrase + ;
    '&source=' + lcFromLanguage + ;
    '&target=' + lcToLanguage
lcResult    = HTTPSToStr(lcURL)
lcTranslate = ''
if not empty(lcResult)
    lcTranslate = strconv(strextract(lcResult, '"translatedText": "', '"'), 11)
endif not empty(lcResult)
return lcTranslate


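function Encode(tcString)
local lcString

* HTML-encode the text. '&' goes first so the entities added by the
* following lines aren't encoded a second time.

lcString = strtran(tcString, '&', '&amp;')
lcString = strtran(lcString, '<', '&lt;')
lcString = strtran(lcString, '>', '&gt;')
lcString = strtran(lcString, '"', '&quot;')

* URL-encode characters that would otherwise break the query string.
* '%' goes first so the escapes added below aren't encoded again.

lcString = strtran(lcString, '%', '%25')
lcString = strtran(lcString, '&', '%26')
lcString = strtran(lcString, ' ', '%20')
lcString = strtran(lcString, '?', '%3F')
lcString = strtran(lcString, '#', '%23')
lcString = strtran(lcString, '+', '%2B')
return lcString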
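

* A sketch of a more general URL encoder, offered as an illustration rather
* than as part of the original listing. UrlEncode is a hypothetical helper:
* it converts the string to UTF-8 and percent-encodes every byte that isn't
* an unreserved URL character, so accented letters and punctuation survive.
* It only covers the URL-encoding step; it does not HTML-encode the text.

function UrlEncode(tcString)
local lcUTF8, ;
    lcResult, ;
    lnI, ;
    lcChar
lcUTF8   = strconv(tcString, 9)
lcResult = ''
for lnI = 1 to len(lcUTF8)
    lcChar = substr(lcUTF8, lnI, 1)
    if lcChar $ 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~'
        lcResult = lcResult + lcChar
    else

* transform() with '@0' returns hex such as 0x00000020; keep the last two digits.

        lcResult = lcResult + '%' + right(transform(asc(lcChar), '@0'), 2)
    endif lcChar $ ...
endfor
return lcResult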


procedure GetLanguage(tcLanguage)
local laLanguages[52, 2], ;
    lnLanguage, ;
    lcLanguage
laLanguages[ 1, 1] = 'Afrikaans'
laLanguages[ 1, 2] = 'af'

laLanguages[ 2, 1] = 'Albanian'
laLanguages[ 2, 2] = 'sq'

laLanguages[ 3, 1] = 'Arabic'
laLanguages[ 3, 2] = 'ar'

laLanguages[ 4, 1] = 'Belarusian'
laLanguages[ 4, 2] = 'be'

laLanguages[ 5, 1] = 'Bulgarian'
laLanguages[ 5, 2] = 'bg'

laLanguages[ 6, 1] = 'Catalan'
laLanguages[ 6, 2] = 'ca'

laLanguages[ 7, 1] = 'Chinese Simplified'
laLanguages[ 7, 2] = 'zh-CN'

laLanguages[ 8, 1] = 'Chinese Traditional'
laLanguages[ 8, 2] = 'zh-TW'

laLanguages[ 9, 1] = 'Croatian'
laLanguages[ 9, 2] = 'hr'

laLanguages[10, 1] = 'Czech'
laLanguages[10, 2] = 'cs'

laLanguages[11, 1] = 'Danish'
laLanguages[11, 2] = 'da'

laLanguages[12, 1] = 'Dutch'
laLanguages[12, 2] = 'nl'

laLanguages[13, 1] = 'English'
laLanguages[13, 2] = 'en'

laLanguages[14, 1] = 'Estonian'
laLanguages[14, 2] = 'et'

laLanguages[15, 1] = 'Filipino'
laLanguages[15, 2] = 'tl'

laLanguages[16, 1] = 'Finnish'
laLanguages[16, 2] = 'fi'

laLanguages[17, 1] = 'French'
laLanguages[17, 2] = 'fr'

laLanguages[18, 1] = 'Galician'
laLanguages[18, 2] = 'gl'

laLanguages[19, 1] = 'German'
laLanguages[19, 2] = 'de'

laLanguages[20, 1] = 'Greek'
laLanguages[20, 2] = 'el'

laLanguages[21, 1] = 'Hebrew'
laLanguages[21, 2] = 'iw'

laLanguages[22, 1] = 'Hindi'
laLanguages[22, 2] = 'hi'

laLanguages[23, 1] = 'Hungarian'
laLanguages[23, 2] = 'hu'

laLanguages[24, 1] = 'Icelandic'
laLanguages[24, 2] = 'is'

laLanguages[25, 1] = 'Indonesian'
laLanguages[25, 2] = 'id'

laLanguages[26, 1] = 'Irish'
laLanguages[26, 2] = 'ga'

laLanguages[27, 1] = 'Italian'
laLanguages[27, 2] = 'it'

laLanguages[28, 1] = 'Japanese'
laLanguages[28, 2] = 'ja'

laLanguages[29, 1] = 'Korean'
laLanguages[29, 2] = 'ko'

laLanguages[30, 1] = 'Latvian'
laLanguages[30, 2] = 'lv'

laLanguages[31, 1] = 'Lithuanian'
laLanguages[31, 2] = 'lt'

laLanguages[32, 1] = 'Macedonian'
laLanguages[32, 2] = 'mk'

laLanguages[33, 1] = 'Malay'
laLanguages[33, 2] = 'ms'

laLanguages[34, 1] = 'Maltese'
laLanguages[34, 2] = 'mt'

laLanguages[35, 1] = 'Norwegian'
laLanguages[35, 2] = 'no'

laLanguages[36, 1] = 'Persian'
laLanguages[36, 2] = 'fa'

laLanguages[37, 1] = 'Polish'
laLanguages[37, 2] = 'pl'

laLanguages[38, 1] = 'Portuguese'
laLanguages[38, 2] = 'pt'

laLanguages[39, 1] = 'Romanian'
laLanguages[39, 2] = 'ro'

laLanguages[40, 1] = 'Russian'
laLanguages[40, 2] = 'ru'

laLanguages[41, 1] = 'Serbian'
laLanguages[41, 2] = 'sr'

laLanguages[42, 1] = 'Slovak'
laLanguages[42, 2] = 'sk'

laLanguages[43, 1] = 'Slovenian'
laLanguages[43, 2] = 'sl'

laLanguages[44, 1] = 'Spanish'
laLanguages[44, 2] = 'es'

laLanguages[45, 1] = 'Swahili'
laLanguages[45, 2] = 'sw'

laLanguages[46, 1] = 'Swedish'
laLanguages[46, 2] = 'sv'

laLanguages[47, 1] = 'Thai'
laLanguages[47, 2] = 'th'

laLanguages[48, 1] = 'Turkish'
laLanguages[48, 2] = 'tr'

laLanguages[49, 1] = 'Ukrainian'
laLanguages[49, 2] = 'uk'

laLanguages[50, 1] = 'Vietnamese'
laLanguages[50, 2] = 'vi'

laLanguages[51, 1] = 'Welsh'
laLanguages[51, 2] = 'cy'

laLanguages[52, 1] = 'Yiddish'
laLanguages[52, 2] = 'yi'

lnLanguage = ascan(laLanguages, tcLanguage, -1, -1, 1, 15)
if lnLanguage > 0
    lcLanguage = laLanguages[lnLanguage, 2]
else
    lcLanguage = ''
endif lnLanguage > 0
return lcLanguage
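
As a rough usage sketch, suppose the listing above is saved as GoogleTranslate.prg somewhere on the VFP path. That file name is an assumption; the post doesn't name the routine. A call then has three outcomes to handle: null, an empty string, or the translated text.

```foxpro
* GoogleTranslate is a hypothetical name for the routine listed above.
local lcGerman
lcGerman = GoogleTranslate('hello world', 'English', 'German')
do case

* Null means one of the language names wasn't found in GetLanguage.
    case isnull(lcGerman)
        ? 'Unknown language name'

* An empty string means the HTTP call returned nothing usable.
    case empty(lcGerman)
        ? 'Translation failed'

    otherwise
        ? lcGerman
endcase
```

ISNULL() is tested first to keep the three cases explicit, even though EMPTY() returns .F. for a null value in VFP.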

5 comments:

Rick Strahl said...

You can actually bypass the paid API and go directly to the URLs that Google is using:

http://translate.google.com/translate_a/t?client=j&text=Hello%20World&hl=en&sl=en&tl=de

Blogged about the why's here a while back:

http://www.west-wind.com/weblog/posts/2011/Aug/06/Translating-with-Google-Translate-without-API-and-C-Code

Rick Strahl said...

Doug,

You can bypass the paid API by using this URL:

http://translate.google.com/translate_a/t?client=j&text=hello%20world&hl=en&sl=en&tl=de

Blogged about the hows and whys of this a while back - around the time when Google was cancelling the Translate API due to abuse. I guess since then they brought it back as a paid API.

The above still works though - it's the same links that the Google translate API pages use internally.

Doug Hennig said...

Hi Rick.

I knew I'd read about that somewhere but had forgotten where. Thanks for the reminder.

Doug

Eric said...

So do you have your users select a language the first time they run your app, and then run all your strings through the API and save them in your resource file so the user can fix them up if needed?

Doug Hennig said...

Stonefield Query can be configured to be unilingual and what that language should be (for example, you might want a French-only version) or multilingual, in which case the user selects which language they want to use.

All translation is done before the application is deployed to the user, so the translation utility is a design-time rather than run-time tool. It's the developer (the user of our SDK rather than the end-user of the reporting tool generated by the SDK) that can use the Resource Editor.

Doug