[Good for enterprise] How to monitor GFE?
Q: What is Good for Enterprise (GFE)?
A: A mobile messaging product that synchronizes internal enterprise email to users' mobile devices and supports client policy management, e.g. forbidding "Save as" on emails, disabling accounts when a jailbreak or root is detected, remotely locking a client, and wiping a client. Android and iOS are supported; Windows Phone support may be in progress, but I'm not sure. The client can be downloaded from Google Play or the App Store, but your company must have purchased GFE, otherwise installing the client is meaningless. GFE is part of the Good Technology product family and consists of two parts: GMC (Good Mobile Control) and GMM (Good Mobile Messaging).
My company has run into issues before where email neither synced to users' clients nor could be sent from the Good client. GFE is classified as a "critical" service in my company, so whenever an issue occurs, our management team challenges us with questions like: why is there no monitoring for the service, and why did the monitoring stop working?
Good Technology provides its own monitoring, which you can request when you purchase GFE, but be aware that its functionality is quite limited: it can only do simple connectivity checks against the NOC and GMM, plus things like "how many messages are pending".
I asked a Good engineer, and Good does not provide end-to-end monitoring; the GMC and GMM logs are really meant for Good engineers to analyze, not for customers. What we can do on our end is check the Windows event logs, and the service startup state if you want. However, during most incidents the GMM service keeps running normally and some users may still be working, because GMM has not crashed. So service monitoring is fine, but it does not tell you much. In the end, to provide a more stable service, I wrote a PowerShell monitoring script; after several rounds of modification it now works and has successfully caught several incidents.
My company uses Exchange 2010, so naturally we use GFE for Exchange. Below is a brief summary of what I found:
(Please note: one EventID can map to many messages, some informational and some errors, so you cannot simply monitor by EventID; the message content must be checked as well. PS: a Good engineer told me they have a list of EventIDs that are valuable for service monitoring, so ask your vendor. :D)
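To illustrate why the message text matters, here is a minimal sketch of a filter that requires both a matching EventID and a matching message. It assumes event objects with EventID and Message properties, as Get-EventLog returns; the helper name Select-GfeEvent and the sample records are hypothetical and hand-built so the filter can be tried without a GFE server.

```powershell
# Minimal sketch: select events by EventID AND message content.
# Select-GfeEvent is a hypothetical helper, not part of GFE or PowerShell.
function Select-GfeEvent {
    param(
        [Parameter(Mandatory)] [object[]] $InputEvents,  # objects with EventID and Message properties
        [Parameter(Mandatory)] [int[]]    $Id,           # one or more EventIDs to keep
        [string] $Pattern                                # optional regex; empty means "any message"
    )
    $InputEvents | Where-Object {
        ($Id -contains $_.EventID) -and
        ((-not $Pattern) -or ($_.Message -imatch $Pattern))
    }
}

# Hypothetical sample records shaped like Get-EventLog output (EventID + Message only).
$sample = @(
    [pscustomobject]@{ EventID = 3563; Message = 'Pausing user alice. (MAPI error - Can not access the users mailbox due to network error.)' }
    [pscustomobject]@{ EventID = 3563; Message = 'Unpausing user alice.' }
    [pscustomobject]@{ EventID = 3386; Message = 'Good Messaging Server failed to open message store for user bob.' }
)

# Same EventID 3563, but only the "Pausing ... MAPI error" message is an error.
$pauses = @(Select-GfeEvent -InputEvents $sample -Id 3563 -Pattern '\bPausing .*MAPI error')
$all3563 = @(Select-GfeEvent -InputEvents $sample -Id 3563)
```

With the ID alone both 3563 records are selected; adding the pattern keeps only the pause caused by the MAPI error.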
1. MAPI errors between GFE and Exchange: EventIDs 3563 and 3386.
3563: Almost all errors and informational messages related to users, such as "Pausing user" and "Unpausing user", are logged under this event, so the number of events generated depends on the number of users managed by GFE. The event is also generated occasionally during normal operation and the condition recovers by itself, so we must exclude this kind of "self-recovery" from alerting.
-------Pausing user XXXXXX. (MAPI error - Can not access the users mailbox due to network error.)
3386: GMM failed to invoke the Exchange API.
-------Good Messaging Server failed to open message store for user XXXXXX. Verify that Exchange Server is operation and accessible. Verify that GoodAdmin account has correct permission on user‘s mailbox. (HRESULT:GDMAPI_OpenMsgStore failed (FormatMessage returned 0. The error was 0x80040115))
Again, both events above are generated occasionally, so we must exclude that situation; you can find the details in the script I pasted at the bottom.
2. Failed communication with the NOC. Normally an enterprise will never let an internal server face the internet directly; my company uses a proxy between GFE and the NOC. The events below are quite dangerous: when they are generated, something is really wrong, although sometimes it recovers automatically.
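Subtracting "Unpausing" counts from "Pausing" counts, as the script does, works on totals. A finer-grained alternative (not what the script does) is to track each user's latest state and report only users whose last event was a pause. This minimal sketch assumes messages begin with "Pausing user <name>." or "Unpausing user <name>." (the exact Unpausing wording is an assumption) and that events carry TimeGenerated; Get-StillPausedUser and the sample data are hypothetical.

```powershell
# Minimal sketch: find users who were paused and have NOT been unpaused since.
# ASSUMPTION: messages start with "Pausing user <name>." / "Unpausing user <name>."
# Get-StillPausedUser is a hypothetical helper name.
function Get-StillPausedUser {
    param([Parameter(Mandatory)] [object[]] $InputEvents)
    $state = @{}   # user name -> $true if the most recent event was a pause
    foreach ($ev in ($InputEvents | Sort-Object TimeGenerated)) {
        # Lazy match up to the first "." followed by whitespace or end, so dotted names survive.
        if ($ev.Message -match '^(?<verb>Pausing|Unpausing) user (?<user>.+?)\.(\s|$)') {
            $state[$Matches['user']] = ($Matches['verb'] -eq 'Pausing')
        }
    }
    # Users whose latest state is "paused" did not recover by themselves.
    @($state.Keys | Where-Object { $state[$_] } | Sort-Object)
}

# Hypothetical sample: alice recovers within the window, john.smith does not.
$t0 = [datetime]'2015-01-01T10:00:00'
$evts = @(
    [pscustomobject]@{ TimeGenerated = $t0;               Message = 'Pausing user alice. (MAPI error - network error.)' }
    [pscustomobject]@{ TimeGenerated = $t0.AddMinutes(3); Message = 'Unpausing user alice.' }
    [pscustomobject]@{ TimeGenerated = $t0.AddMinutes(5); Message = 'Pausing user john.smith. (MAPI error - network error.)' }
)
$paused = @(Get-StillPausedUser -InputEvents $evts)
```

The count of still-paused users could then be compared with a threshold instead of the raw difference.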
5662,
Good Messaging Server failed to login to Good Mobile Messaging Data Center as hostname XXXXXXX Error code 65547 (errNetTimeout)
5669,
Good Messaging Server failed to connect and authenticate with Good Mobile Messaging Data Center at proxy://:?????@proxyxxxx:proxyport/https://xml28.good.com:443/ with hostname XXXXXX. Reason code 65546 (errNetRecv)
5675,
Failed to add a message to connection xxxxxxx. Error code 65538 (errNetConnect)
5733,
Unable to batch messages towards XML gateway 65547:errNetTimeout
If these events show up in Event Viewer, GFE failed to communicate with the NOC, which really should not happen; as a result, in the script I set the threshold to one, meaning "alert right away".
3. Hung threads, causing GFE process failure.
1299/1300/1301: these all occur from time to time and GFE usually recovers automatically; what we want is to catch the cases where it doesn't.
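Because a threshold of one fires on any NOC event, it can help if the alert says which network error occurred. The minimal sketch below pulls the errNet... token out of the NOC-related messages and counts each kind; it assumes the error name always appears as an "errNet..." word, as in the samples above. Get-NocErrorSummary and the sample records are hypothetical.

```powershell
# Minimal sketch: summarize the network error names in NOC-related events.
# ASSUMPTION: the error appears as an "errNet..." token, e.g. "(errNetTimeout)" or ":errNetTimeout".
# Get-NocErrorSummary is a hypothetical helper name.
function Get-NocErrorSummary {
    param([Parameter(Mandatory)] [object[]] $InputEvents)
    $InputEvents |
        Where-Object { @(5662, 5669, 5675, 5733) -contains $_.EventID } |
        ForEach-Object {
            if ($_.Message -match '\b(errNet[A-Za-z]+)') { $Matches[1] } else { 'unknown' }
        } |
        Group-Object -NoElement |
        Sort-Object Count -Descending |
        ForEach-Object { '{0} x{1}' -f $_.Name, $_.Count }
}

# Hypothetical sample records; the 3563 entry is ignored because it is not a NOC event.
$noc = @(
    [pscustomobject]@{ EventID = 5662; Message = 'Good Messaging Server failed to login to Good Mobile Messaging Data Center as hostname HOST1 Error code 65547 (errNetTimeout)' }
    [pscustomobject]@{ EventID = 5733; Message = 'Unable to batch messages towards XML gateway 65547:errNetTimeout' }
    [pscustomobject]@{ EventID = 5669; Message = 'Good Messaging Server failed to connect and authenticate with Good Mobile Messaging Data Center. Reason code 65546 (errNetRecv)' }
    [pscustomobject]@{ EventID = 3563; Message = 'Pausing user alice. (MAPI error - network error.)' }
)
$summary = @(Get-NocErrorSummary -InputEvents $noc)
```

The resulting lines (here "errNetTimeout x2" and "errNetRecv x1") could be appended to $Mail_Body in the script at the bottom.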
1299,
Timed operation failed, name:dowork-5 time (ms):1200000, thread name:dowork-5, thread state:handleMessage:svcAdmin, thread id;505861040
1300,
Timed operation failed (list of threads), name:dowork-5 time (ms):1200000, thread name:InboundMgr, thread state:starting, thread id;688420216
1301 contains stack information; I'll just paste the first few lines here:
Stack for thread 1384 is
--# FV EIP----- RetAddr- FramePtr StackPtr Symbol
0 .V 77821f46 759e338a 52d3ff88 52d3fe28 ZwWaitForWorkViaWorkerFactory +00018 bytes
Sig: ZwWaitForWorkViaWorkerFactory
Decl: ZwWaitForWorkViaWorkerFactory
Hung threads caused GFE to stop working. This happened once in my environment, and it is rare; when it happens, hundreds of these events are generated within a few minutes.
Now that the events have been described, below is the script. I use Task Scheduler to run it every 15 minutes.
Some events depend on the number of users on GFE, so the IT team needs to keep an eye on them in order to set a proper threshold. Some events are also generated occasionally and normally recover within 5-10 minutes, so I used "Pattern" and "MinusPattern" to exclude the "auto-recover" situation.
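Since a real hang shows up as hundreds of events within a few minutes, one option (a swapped-in technique, not what the script below does) is to count 1299/1300/1301 per minute rather than over the whole 15-minute window, which separates a burst from occasional noise. This minimal sketch assumes events expose EventID and TimeGenerated, as Get-EventLog entries do; Find-EventBurst and its default of 20 events per minute are hypothetical and only illustrative.

```powershell
# Minimal sketch: report minutes in which hung-thread events arrive in a burst.
# ASSUMPTIONS: events expose EventID and TimeGenerated; Find-EventBurst and the
# default -PerMinute value are hypothetical, tune the threshold for your environment.
function Find-EventBurst {
    param(
        [Parameter(Mandatory)] [object[]] $InputEvents,
        [int[]] $Id = @(1299, 1300, 1301),
        [int]   $PerMinute = 20
    )
    $inv = [Globalization.CultureInfo]::InvariantCulture
    $InputEvents |
        Where-Object { $Id -contains $_.EventID } |
        Group-Object { $_.TimeGenerated.ToString('yyyy-MM-dd HH:mm', $inv) } |   # one bucket per minute
        Where-Object { $_.Count -ge $PerMinute } |
        ForEach-Object { [pscustomobject]@{ Minute = $_.Name; Events = $_.Count } }
}

# Hypothetical sample: 25 events inside 10:00 (a burst) and 3 at 10:05 (noise).
$t0 = [datetime]'2015-01-01T10:00:00'
$evts = @(
    1..25 | ForEach-Object { [pscustomobject]@{ EventID = 1299; TimeGenerated = $t0.AddSeconds($_) } }
    1..3  | ForEach-Object { [pscustomobject]@{ EventID = 1300; TimeGenerated = $t0.AddMinutes(5) } }
)
$bursts = @(Find-EventBurst -InputEvents $evts)
```

Only the 10:00 minute is reported, while the three events at 10:05 stay below the per-minute threshold.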
#change working directory to the folder containing this script
Set-Location (Get-Item ($MyInvocation.MyCommand.Definition)).DirectoryName

#define events to be monitored and their properties
#ID is the EventID; if you use an array like @(xx,yy), results are combined first, e.g. xx matched 10 entries and yy matched 10 entries, so 20 is compared with the threshold
#Pattern is a .NET (C#) regular expression used to filter specific events; $null means "every event with this ID"
#MinusPattern is also a regular expression used to filter specific events
#if both Pattern and MinusPattern are defined, e.g. Pattern matched 100 entries and MinusPattern matched 90 entries, the final number is 10, which is then compared with the threshold; this is how "auto-recover" is excluded
$Events = @(
    @{ID = 3563; Pattern = '\bPausing .*MAPI error'; MinusPattern = 'Unpausing'; Threshold = 100;},
    @{ID = @(1299, 1300, 1301); Pattern = $null; Threshold = 100;},
    @{ID = 3386; Pattern = 'GDMAPI_OpenMsgStore failed'; Threshold = 100;},
    @{ID = @(5662, 5669); Pattern = $null; Threshold = 1;},
    @{ID = 5675; Pattern = 'errNetConnect'; Threshold = 1;},
    @{ID = 5733; Pattern = 'errNetTimeout'; Threshold = 1;}
)

$Date = Get-Date
$strDate = $Date.ToString("yyyy-MM-dd")
$End_time = $Date
$Start_time = $Date.AddMinutes(-15)
$strLogFile = "${strDate}.log.txt"
$strLogFile_e = "${strDate}_Error.log.txt"

#define email properties
$Mail_From = "$($env:COMPUTERNAME)@fil.com"
$Mail_To = 'xxxxx@xxx.xxx'
$Mail_Subject = 'Good event IDs warning'
$Mail_SMTPServer = 'mailhost.hk.fid-intl.com'

#start a fresh log for this run
Set-Content -Path $strLogFile_e -Value $null

function Add-Log{
    PARAM(
        [String]$Path,
        [String]$Value,
        [String]$Type
    )
    $Type = $Type.ToUpper()
    Write-Host "$((Get-Date).ToString('[HH:mm:ss] '))[$Type] $Value"
    if($Path){
        Add-Content -Path $Path -Value "$((Get-Date).ToString('[HH:mm:ss] '))[$Type] $Value"
    }
}

Add-Log -Path $strLogFile_e -Value "Catch logs after : $($Start_time.ToString('HH:mm:ss'))" -Type Info
Add-Log -Path $strLogFile_e -Value "Catch logs before: $($End_time.ToString('HH:mm:ss'))" -Type Info
Add-Log -Path $strLogFile_e -Value "Working directory: $($PWD.Path)" -Type Info

#read the Application log once and reuse it for every rule
$EventsCache = @(Get-EventLog -LogName Application -After $Start_time -Before $End_time.AddMinutes(5))
Add-Log -Path $strLogFile_e -Value "Total logs count : $($EventsCache.Count)" -Type Info

$Mail_Body = $null
foreach($e in $Events){
    $Events_e_ALL = $null
    $Events_e_Matched = $null
    $Events_e_NMatched = $null
    $Events_e_FinalCount = 0
    #all events whose EventID is in this rule's ID list
    $Events_e_ALL = @($EventsCache | ?{$e.ID -contains $_.EventID})
    Add-Log -Path $strLogFile_e -Value "Captured [$($e.ID -join '], [')], count: $($Events_e_ALL.Count)" -Type Info
    #no Pattern means every captured event counts
    $Events_e_Matched = @($Events_e_ALL | ?{(-not $e.Pattern) -or ($_.Message -imatch $e.Pattern)})
    Add-Log -Path $strLogFile_e -Value "Pattern matched, count: $($Events_e_Matched.Count)" -Type Info
    if($e.MinusPattern) {
        $Events_e_NMatched = @($Events_e_ALL | ?{$_.Message -imatch $e.MinusPattern})
        Add-Log -Path $strLogFile_e -Value "Minus pattern matched, count: $($Events_e_NMatched.Count)" -Type Info
    }
    #[int] guarantees 0 when no MinusPattern is defined
    $Events_e_FinalCount = $Events_e_Matched.Count - [int]$Events_e_NMatched.Count
    Add-Log -Path $strLogFile_e -Value "Final matched, count: $Events_e_FinalCount" -Type Info
    if($Events_e_FinalCount -ge $e.Threshold) {
        Add-Log -Path $strLogFile_e -Value "Over threshold: $($e.Threshold)" -Type Warning
        $Mail_Body += "Event ID: [$($e.ID -join '], [')] warning!`n"
    }
}
Add-Log -Path $strLogFile_e -Value "===================split line====================" -Type Info

#append this run's log to the daily log
Get-Content -Path $strLogFile_e | Add-Content -Path $strLogFile

If($Mail_Body){
    try {
        #-ErrorAction Stop makes sure a failed send reaches the catch block
        Send-MailMessage -From $Mail_From -To $Mail_To -Subject $Mail_Subject -Body $Mail_Body -SmtpServer $Mail_SMTPServer -Attachments $strLogFile_e -ErrorAction Stop
    } catch {
        Add-Log -Path $strLogFile -Value "Failed to send mail, cause: $($_)" -Type Error
    }
}
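The Pattern / MinusPattern counting inside the foreach loop above can be pulled out into a function, which makes it possible to check thresholds offline with hand-made events before deploying. This is a minimal sketch, not the script's own code; Measure-EventRule is a hypothetical name, and the rule hashtables reuse the same keys (ID, Pattern, MinusPattern, Threshold) as the $Events array.

```powershell
# Minimal sketch of the Pattern / MinusPattern counting as a reusable function.
# Measure-EventRule is hypothetical; rule keys mirror the $Events array in the script.
function Measure-EventRule {
    param(
        [Parameter(Mandatory)] [hashtable] $Rule,
        [object[]] $InputEvents = @()
    )
    $all = @($InputEvents | Where-Object { $Rule.ID -contains $_.EventID })
    # An empty or missing Pattern counts every event with a matching ID.
    $plus = @($all | Where-Object { (-not $Rule.Pattern) -or ($_.Message -imatch $Rule.Pattern) }).Count
    $minus = 0
    if ($Rule.MinusPattern) {
        # Events that indicate self-recovery are subtracted from the total.
        $minus = @($all | Where-Object { $_.Message -imatch $Rule.MinusPattern }).Count
    }
    $net = $plus - $minus
    [pscustomobject]@{ Matched = $plus; Recovered = $minus; Net = $net; Alert = ($net -ge $Rule.Threshold) }
}

# Hypothetical test data: three pauses, one of which recovered.
$rule = @{ ID = 3563; Pattern = '\bPausing .*MAPI error'; MinusPattern = 'Unpausing'; Threshold = 2 }
$evts = @(
    [pscustomobject]@{ EventID = 3563; Message = 'Pausing user a. (MAPI error - network error.)' }
    [pscustomobject]@{ EventID = 3563; Message = 'Pausing user b. (MAPI error - network error.)' }
    [pscustomobject]@{ EventID = 3563; Message = 'Pausing user c. (MAPI error - network error.)' }
    [pscustomobject]@{ EventID = 3563; Message = 'Unpausing user a.' }
)
$r = Measure-EventRule -Rule $rule -InputEvents $evts
```

Here three pauses minus one unpause leaves a net count of 2, which reaches the (deliberately low) threshold of 2 and raises an alert.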