Question about the System Engineer job

lqs4188980
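An Amazon recruiter recently emailed me to ask whether I would be interested in an AWS System Engineer position. I have no relevant experience or background at all, and I don't know how the work compares with a Software Developer's. I'm hoping the experts on this forum can help clear this up. Thanks!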

Below are the job descriptions for S3 and CloudWatch:

S3:
AWS Systems Engineer:
· Amazon is building some of the largest distributed systems in the world, and we need smart people to help plan and put together the pieces. Amazon Simple Storage Service (S3) provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. We have high standards for our computer systems as well as our employees: our systems are highly reliable, highly available, and turn scale into an advantage for our business and an asset to our customers; our employees are super smart, driven to serve customers, and fun to work with. The successful Systems Engineer / Administrator joining the S3 team will do much more than plug computers together or track changes. The candidate will be instrumental in deploying, operating, and scaling a massive always-on distributed system that is part of an entirely new software development approach. We are looking for a seasoned Systems Engineer / Administrator to join our energetic, hard-driving, and passionate team.

You should have or be most of the following:
Experience running and maintaining a 24x7 Internet-oriented production environment, preferably across multiple data centers, involving (preferably) at least hundreds of machines
Demonstrable expertise around specifying, designing, and/or implementing system health, performance monitoring tools, and software management tools for 24x7 environments
A solid grasp of networking fundamentals, preferably including hands-on experience with load balancers, switches, routers, etc.
Familiar with the challenges surrounding efficient operations and failure mode analysis in large complex distributed systems
You will be expected to deliver on these kinds of things in the first six to twelve months on the job:
Through participation in all phases of the development of a large distributed system, provide hardware, manageability, operability and performance perspectives on all aspects of S3 and potentially its dependencies
Define and/or refine hardware requirements and selected designs, balancing raw up-front dollar cost with operability and TCO, from the data center infrastructure up
Specify and participate in the development and delivery of operability-related features such as system health monitoring, diagnostics, repair, and other self-healing automation
Develop or further existing application and system management tools and processes that reduce manual efforts and increase overall efficiency
Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic
Participate in the design and execution of production acceptance tests and new hardware evaluations
Maintain fleet inventory management, including producing, maintaining, and evolving capacity plans for various components
Monitor the health of the fleet, automating system health, maintenance tasks, and reporting systems as needed
Perform various system maintenance tasks (your hands get dirty here), including configuration of new machines
Manage directly assigned tasks and on-call duties gracefully
· Basic Qualifications
BS in Computer Science or other technical degree and/or related experience
1-5+ years of solid *NIX system administration experience
Development of systems management and administration automation in a high level scripting language (Perl, Python, Ruby) or Java
· Preferred Qualifications
Experience with systems management or monitoring software (home-grown or commercially available)
Automation or monitoring framework experience, deployment or development
Experience with very large distributed systems such as multi-terabyte storage farms, and/or horizontally scaled request processing fleets
Experience with SATA, ISCSI, NAS, SAN and other modern storage technologies
Experience with hardware load balancer administration, network optimization, or other related and demonstrable TCP-level experience

CloudWatch:
CloudWatch is the monitoring service that AWS customers use to gain system-wide visibility into resource utilization, application performance, and operational health. A picture is worth a thousand words, so CloudWatch gives customers easy-to-use graphs to see how their systems are working. With its automatic alarms, CloudWatch helps customers keep their systems running 24x7.

The CloudWatch team is currently looking for a systems engineer with SYSTEM AVAILABILITY OBSESSION DISORDER. Is that you? How can you tell? You detest outages and high latency like the plague, so it’s your life’s mission to find and eliminate root causes of problems. You follow evolution of the industry to find useful tools that can enhance efficiency. You find data center build-outs, performance engineering, and other scaling activities to be almost as fun as Mardi Gras. Finally, you insist upon giving customers what they want: high quality, highly usable, always-on services.

In this position you’ll get to:
· Work with developers to design, build, and manage massively scaled monitoring services
· Build monitoring systems in new data centers and regions, and add/manage capacity in existing regions as our usage grows
· Optimize the performance of our systems by analyzing and deploying new hardware configurations
· Track the health of our services, identify problems, drive to root cause, and fix
· Collaborate with some of the leading minds in distributed systems
· Tell your grandkids that you “helped create the Cloud all those years ago”

Basic Qualifications:
· Scripting experience in a Linux environment (in Perl, Ruby, bash, etc.)
· Experience with building and running systems for Internet-facing services
· BS in computer science or other technical degree and/or related experience
· This position requires the applicant selected to obtain and maintain a Top Secret security clearance with Sensitive Compartmented Information (TS/SCI) eligibility and access. A US Government administered polygraph examination will be required. TS/SCI eligibility is not required to start; however, the applicant selected will be subject to a Single-Scope Background Investigation (SSBI) and must meet eligibility requirements for access to classified national security information. Applicants with a current SSBI, SBPR, or PPR, may be eligible for crossover in accordance with ICPG 704.4

Preferred Qualifications:
· Experience with TCP/IP network troubleshooting and administration
· Experience in a 24x7 production environment, esp. one based on Linux
· Excellent troubleshooting skills at all levels, from application to network to host
· Experience with systems management and monitoring software (home-grown or commercially available)
· Experience with performance testing and tuning
· Automation or monitoring framework experience, deployment or development a plus
· Experience with SQL scripts and database administration preferred
· Degree in computer science, mathematics, or a related field