Mark Minasi's Tech Forum
zuss

Still Checking the Forum Out
Posts: 2
#1
Hello,

this is not about a specific problem, but more general.

I work in IT, and when a problem occurs we search the internet for solutions or fixes, e.g. "change this registry key" or some other setting. If the problem persists after we have spent some time looking for a solution, management decides to "rebuild" the server, meaning a complete new install of the OS and applications. If the problem still occurs, they either change the product or "accept" the problem. For example, we have an RDS server with 20+ users that needs to be rebooted about 3 times a week during the workday (!!).

I understand that, from a commercial standpoint, it would be too expensive to pay IT staff for deeper analysis (time = money, and a lot of time would be needed).

But for me personally this is frustrating. For me, the big advantage of tech is that it is deterministic: you can get to the root cause of a problem if you analyse the information rigorously. This gives me a feeling of control and therefore a feeling of security. In the example of the rebooting RDS server, I don't get any blame and the users accept it (C-level management orders them to accept it), but I still feel bad because I have no control over this unpleasant situation.

As a thought experiment: given unlimited time, you could decompile "Windows Server" and the running applications and then debug them to solve the problem. A lifetime is probably not enough time to accomplish anything that way.

But I think there must be something in between "change this setting and see if it works" and decompiling the executables.

I have heard that bigger companies employ experts who do deeper analysis and solve problems that have not been solved in public forums.

How do they work? What tools do they use? Do they exchange information in public blogs, forums, etc.? Where do I find deeper knowledge?

Where I have searched so far:
a) Developer forums: developers need deep knowledge of the OS, I think, but I mostly find solutions like "use this .NET function", with no explanation of the mechanisms.
b) Mark Russinovich has written books about "Windows Internals" and the "Sysinternals" tools. There seems to be a community of experts who use "procmon" and "Process Explorer" to find out which processes hang or lock each other. This is more on the low-level side, tending towards the decompilation extreme.

Would you recommend further pursuing path a) or b) or both or none of them?



donoli

Senior Member
Posts: 598
#2
Before I search Google, I would look at the Event Viewer.
wobble_wobble

Associate Troublemaker Apprentice
Posts: 883
#3

Hi and welcome to the forum.

We do ask that people fill in a bit more details on their account so we can address you by name.
I feel odd calling you a name we called two of our dogs!

Yes, there are tools out there that let you look into what is happening, and you will find that you need to look at a lot of them, especially until such time as you "get there"; getting there is what makes you the expert you speak of.

You're looking for what changed since it was stable, and how to get it back to that point in time.
Sometimes it's just easier to rebuild. I see that about 30% of the time.
Sometimes it's just too broken: it got ransomware, some massive amount of tweaking, several bad pieces of software installed, it has malware, end users are local admins...

So the first thing is to find out when it started happening.
Then see if you can find out why.
Was it an update, a new piece of software, a P2V, whatever?
Or at least try to find out when the rebooting started.
This brings you to the first thing mentioned: event logs.
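
A rough sketch of the "when did it start, and what changed just before" step, in Python. It assumes you have already exported the System log and kept some kind of change list; the sample data, the function names and the 14-day window are made up for illustration. Event IDs 41, 1074 and 6008 are the ones the System log usually records around restarts.

```python
from datetime import datetime, timedelta

# Event IDs commonly associated with restarts in the Windows System log:
# 41 (Kernel-Power), 1074 (restart/shutdown requested), 6008 (unexpected shutdown).
REBOOT_EVENT_IDS = {41, 1074, 6008}


def first_reboot(events):
    """Return the timestamp of the earliest reboot-related event, or None."""
    times = [ts for ts, event_id in events if event_id in REBOOT_EVENT_IDS]
    return min(times) if times else None


def changes_before(changes, moment, window_days=14):
    """Return changes made in the window_days before moment, newest first."""
    start = moment - timedelta(days=window_days)
    recent = [(ts, what) for ts, what in changes if start <= ts <= moment]
    return sorted(recent, reverse=True)


# Made-up sample data, for illustration only.
events = [
    (datetime(2019, 3, 4, 10, 15), 7036),
    (datetime(2019, 3, 12, 14, 2), 6008),
    (datetime(2019, 3, 14, 9, 47), 1074),
]
changes = [
    (datetime(2019, 1, 20, 22, 0), "P2V migration"),
    (datetime(2019, 3, 9, 23, 30), "monthly updates installed"),
    (datetime(2019, 3, 11, 18, 0), "new print driver"),
]

started = first_reboot(events)
for when, what in changes_before(changes, started):
    print(when.isoformat(sep=" "), what)
```

With this sample data the two changes from the days just before the first unexpected shutdown are listed and the older P2V entry is filtered out. The list only gives you suspects to check, not a cause.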

Does the issue happen at random times or at certain times?
If it's a physical server, check the hardware logs and the hardware manufacturer's event logs.
What happens that makes you need a reboot: can't ping, can't RDP, can't run an application...?
Can you get a copy of all the logs (server, hardware, programs) and look into them?
Can you P2V or V2V the server and reproduce the issue in your test environment?
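
For the "random times or certain times" question, one simple approach is to write down every incident and count them per weekday and per hour. A minimal sketch in Python; the timestamps are invented and the function name is only a placeholder:

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def incident_pattern(incidents):
    """Count incidents per weekday name and per hour of the day."""
    by_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in incidents)
    by_hour = Counter(ts.hour for ts in incidents)
    return by_weekday, by_hour


# Invented incident times, for illustration only.
incidents = [
    datetime(2019, 3, 4, 10, 5),
    datetime(2019, 3, 6, 10, 20),
    datetime(2019, 3, 8, 15, 40),
    datetime(2019, 3, 11, 10, 12),
    datetime(2019, 3, 13, 10, 48),
]

by_weekday, by_hour = incident_pattern(incidents)
print("per weekday:", dict(by_weekday))
print("per hour:", dict(by_hour))
```

If most incidents land in the same hour, as in this invented sample, that points towards something scheduled (a backup, a scan, a login peak) rather than something random.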

After you've spent enough time on that (and how much is enough is the hard part to define), you can go to the other tools.
Process Monitor, Process Explorer, Wireshark, Netmon, syslogging, memory dumps etc.
Then you get to the low level decompiling.

You're not going to learn the techniques in a week or a month, but you can improve your knowledge all the time by looking, asking questions and trying things out.

Good luck and keep asking questions.

Some Links

Process Explorer is easier to use at the start than Process Monitor.
ProcExp info 1 - Link
ProcExp info 2 - Link

Process Monitor can use massive amounts of resources if you let it run for too long
ProcMon info 1 - Link 





__________________
Have you tried turning it off and walking away? The next person can fix it!

New to the forum? Read this
wkasdo

Administrator
Posts: 232
#4
Agreed with Joe, the method is often more important than the tool. Sticking to the basics goes a long way:
- get a repro, the simpler the better.
- find a workaround.
- check the Event Viewer; research specific errors.
- performance stuff (like your RDS): taskmgr, resmon, Perfmon.
- enable tracing/debug logging, etc.
- the only "deep" tool that I use regularly is Wireshark. Dunno why most people seem to be afraid of it.
- Sysinternals: Process Explorer, sometimes procmon, the rest: rarely.
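
As an illustration of the performance item in the list above: once you have logged a counter such as Memory\Available MBytes with Perfmon, a quick trend check can show whether something is leaking towards the next reboot. This is only a sketch in Python. The samples are invented, the function names are placeholders, and a straight-line fit is an assumption that real counters will not always follow.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours, value) samples, in value units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def hours_until_exhausted(samples, floor=0.0):
    """Extrapolate from the last sample to floor; None if the counter is not falling."""
    rate = slope_per_hour(samples)
    if rate >= 0:
        return None
    _, last_value = samples[-1]
    return (floor - last_value) / rate


# Invented samples: (hours since boot, available MBytes).
available_mb = [(0, 8000), (1, 7500), (2, 7000), (3, 6500)]

print("trend (MB per hour):", slope_per_hour(available_mb))
print("hours until exhausted:", hours_until_exhausted(available_mb))
```

A steadily falling line that reaches zero about when the server usually needs its reboot would be a strong hint; a flat line tells you to look somewhere else.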


__________________
[MSFT]; Blog: https://blogs.technet.microsoft.com/389thoughts/
Phil-n-JaxFL

Grumpy Old Men
Posts: 87
#5
Another thing that might make your life easier is making a clone of it after you've rebuilt it (assuming it is a VM).
If it is a physical server, I would use Disk2VHD, which is free and you can get it here:
https://live.sysinternals.com/


__________________
Phil
zuss

Still Checking the Forum Out
Posts: 2
#6
Thank you for your help,

the Event Viewer is certainly very useful when analysing errors.

Joe, I updated my user profile, so you don't have to call me dog names [smile]

I will further research the use of procmon and Process Explorer.

To recreate the server in a testing environment, what's the best way to simulate user behavior? Like running 20+ RDP sessions that perform standard actions? Can I automate such a thing with PowerShell (not just the connection, but also mouse and keyboard input)?
wobble_wobble

Associate Troublemaker Apprentice
Posts: 883
#7
Michael

There is no easy way to simulate end-user interaction.
You could log in as multiple users and run a short script that lists directory contents into Notepad files or something, but not a lot else.
I'm not saying it's the load that stops the machine, but it's not something I'd normally see. You would see latency, or RAM and CPU maxing out, if it was a load issue.
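
To give an idea of what such a short script could look like, here is a minimal sketch in Python. It is a stand-in only: it runs 20 threads inside one process instead of 20 real RDP sessions, it does no mouse or keyboard input, and the function names and the choice of the temp folder as target are assumptions for the example.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_user(user_id, target_dir, out_dir, rounds=5):
    """List target_dir repeatedly, write the listing to a text file, time each round."""
    timings = []
    out_path = os.path.join(out_dir, f"user{user_id:02d}.txt")
    for _ in range(rounds):
        start = time.perf_counter()
        names = sorted(os.listdir(target_dir))
        with open(out_path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(names))
        timings.append(time.perf_counter() - start)
    return timings


def run_load(user_count, target_dir, rounds=5):
    """Run user_count simulated users in parallel; return the slowest round per user."""
    with tempfile.TemporaryDirectory() as out_dir:
        with ThreadPoolExecutor(max_workers=user_count) as pool:
            futures = [
                pool.submit(simulated_user, uid, target_dir, out_dir, rounds)
                for uid in range(user_count)
            ]
            return {uid: max(f.result()) for uid, f in enumerate(futures)}


worst = run_load(user_count=20, target_dir=tempfile.gettempdir())
print(f"slowest round across 20 simulated users: {max(worst.values()):.4f}s")
```

Watching taskmgr or resmon on the test machine while something like this runs shows the latency and the RAM/CPU picture mentioned above. Real sessions running the real applications will behave differently, so treat the numbers as a rough indication only.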

But write down your notes and like I said keep asking questions. 
We're a friendly bunch that covers a very wide range of technologies.

dennis-360ict

New Friend (or an Old Friend who Built a New Account)
Posts: 67
#8
great write-up wobble, nice of you to take the time!
__________________
-----
Home is where I sleep