LCA2016 Sysadmin Miniconf Presentations
Presentation Titles
- A Gentle Introduction to Ceph - Tim Serong
- 'Can you hear me now?' Networking for containers - Jay Coles
- Creating bespoke logging systems and dashboards with Grafana, in fifteen minutes - Andrew McDonnell
- Ergonomics of Automation - Jamie Wilkinson
- From Commit to Cloud - Daniel Hall
- Is that a data-center in your pocket? - Steven Ellis
- Keeping Pinterest Running - Joe Gordon
- Network Performance Tuning - Jamie Bainbridge
- Order in the chaos: or lessons learnt on planning in operations - Peter Hall
- Pingbeat: y'know, for pings! - Joshua Rich
- Revisiting Unix principles for modern system automation - Martin Krafft
- Site Reliability Engineering at Dropbox - Tammy Butow
- Sysadmins: present, past and future - Javier Turegano
- The life of a sysadmin in a research environment - Eric Burgueno
Full Abstracts
Tammy is a Site Reliability Engineering Manager at Dropbox.
Dropbox is the home for your most important stuff -- now we're bringing it to life with a growing family of products.
Our engineering team is architecting a family of products that handle over a billion files a day. We take on the complexities of technology that affect everyday life, so that people can get back to living and doing their best work.
Dropbox's Site Reliability Engineering (SRE) team is a hybrid software/systems group which works with traditional software engineering, capacity engineering, and infrastructure teams to ensure that Dropbox runs smoothly.
About Tammy Butow:
Tammy is a Site Reliability Engineering Manager at Dropbox.
She ran a workshop at O'Reilly OSCON in 2015, "Linux bootcamp: From casual Linux user to kernel hacker".
Networking: The final frontier
These are the voyages of the Sysadmin, JayC
His 5 year mission
To explore strange new worlds
To seek out new life and new containerizations
To boldly go where no man has gone before
Macvlans? VXLANs? VETH pipes? Just how many types of virtual networking does Linux have, and why would I want to use them?
This talk is intended to give you a quick run-through of all the technologies available, why you would use them, and why they are designed the way they are, in an attempt to dissuade you from reimplementing the networking wheel in userspace over TCP (the talk will explain why this is bad as well).
You should come away from this talk knowing which technology to pick for each situation and understanding the trade-offs of each choice as you attempt to choose the best networking model for your container (or VM) fleets.
This is part 3 of 3 in the doger.io series of talks on containerization
About Jay Coles:
2015: Security options for container implementations (http://youtu.be/P_brthaqyTQ)
2014: Linception: Playing with containers under linux
2014: Streaming templates with asyncio
Maintainer of http://doger.io. Material on which the talk is based is available at http://www.pocketnix.org
A SysAdmin job is pretty standard, and more or less the same across the board, right? (right?). It seems this is not always true.
When working in a science research organisation, you not only get to do everything you knew and loved doing in your previous jobs, you also learn a lot.
Scientists have varied demands. So varied that they never finish using one thing before asking for another one. And they move at a much faster pace than your average IT organisation, where weekly Change Control meetings and ITIL-based operations are king. But at the same time, Science is playing catch-up with DevOps methodologies. Things do not move at the same rapid pace today, because of the complexities inherent to a shared computational environment.
Can long-running jobs that take literally months to complete have a peaceful and stable existence in an environment that is striving to become more flexible and volatile?
About Eric Burgueno:
Eric Burgueno is a geek who went to law school but works as a Linux SysAdmin instead. He's worked for big companies like EDS, HP, and IBM, but is now part of a team of two.
He likes doing presentations at work to explain what lies behind the curtain to non-IT people, and he presented at the 2015 LCA SysAdmin Miniconf for the first time. He also likes Oxford commas.
Grafana is an open source, highly configurable, web charts dashboard.
Grafana can be used to monitor not just the usual suspects such as collectd but also Internet of Things data sources using MQTT or similar protocols, and is easily extended with minimal effort.
Grafana can be configured to use a variety of backend data stores, including Carbon, an alternative to RRD.
In this talk I'll complete a live demonstration, starting from a fresh Ubuntu 14 VM with Docker installed, where I will install and set up Graphite using Carbon to log both host CPU resources and MQTT feeds, and create a custom dashboard to suit.
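As a rough illustration of how a data point reaches Carbon, here is a minimal Python sketch of Carbon's plaintext protocol, in which each metric is a single line of the form `<path> <value> <timestamp>`. It is not taken from the talk. The metric name `demo.host.cpu` and the helper names are hypothetical, and the host and port are assumptions (2003 is Carbon's default plaintext port, but a given install may differ).

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Carbon's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one pre-formatted line to a Carbon plaintext listener over TCP.

    host and port are assumptions; adjust them to match the actual install.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name and a fixed timestamp, for illustration only.
example_line = format_metric("demo.host.cpu", 12.5, timestamp=1454284800)
```

Calling `send_metric(example_line)` would only succeed with a Carbon listener running; once data points arrive, Grafana can chart the `demo.host.cpu` series through its Graphite data source.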
About Andrew McDonnell:
Andrew McDonnell is an experienced software engineer & systems architect, having previously spent his teenage years hacking code on the Commodore 64 he received for Christmas one year. He loves the challenge of integrating disparate components into a useful system, from PCB design and embedded microcontrollers all the way through to GPU programming and cloud integration.
Beyond family and work he sometimes has time to play with his collection of 8-bit and 8086 era computers; computing and electronics have always been his passion.
This talk is a crash-course in performance tuning for high-speed low-latency LANs for the most common sysadmin usage.
It differs from Glen Turner's similar talk at LCA2008 as he mostly discussed WAN tuning over slower and more latent links.
Topics covered:
- Performance Tuning Do's and Don'ts
- How Packet Receive Works
- Understanding NUMA
- Bottlenecks - locating, identifying, fixing
- NIC Receive
- Protocol Layers
- Application
About Jamie Bainbridge:
I'm a Senior Software Maintenance Engineer at Red Hat. My focus areas are networking, NFS, and performance.
I'm co-author of the Red Hat Enterprise Linux Network Performance Tuning Guide (https://access.redhat.com/articles/1391433, subscription required) and author of several other pieces of performance-related Red Hat knowledgebase content.
I've previously been a network engineer for commercial and residential ISPs, and a sysadmin before that.
This will be my first LCA.
How hard is it to get something you have done into production, and how long does it take? At LIFX we believe that deployments should be as fast, small and easy as possible. In this talk we go over why we think this is the case, and how we built our systems in order to meet these goals.
About Daniel Hall:
Daniel Hall is currently the sole Infrastructure Engineer at LIFX.
Continually obsessed with making things more efficient, he hopes to one day replace himself with a very small Docker machine. He has written two books on Ansible and an open source password management tool named RatticDB.
Fed up with fighting the public cloud, or with running out of space on your laptop the next time you want to try out the latest and greatest technology?
Thanks to technologies like nested virtualisation and thin-lvm you can now build, run and redeploy on your personal laptop a small data-centre's worth of technology. As great as the public cloud and shared lab environments are(n't), sometimes you just want to thrash out a problem quickly. What do you do when you're missing that extra storage or physical compute resource to make it happen?
Come learn some tips and tricks.
About Steven Ellis:
Steve's day job with Red Hat is to persuade organisations to spend their IT budgets on Open Source rather than traditional proprietary technologies. With over 20 years' experience of Open Source, from development to enterprise architecture, his passion for Open Source helped bring linux.conf.au to Auckland NZ back in 2015.
In his spare time he still hacks on MythTV and debugs random new bits of hardware that really should know better.
Steven gives regular talks on FOSS as part of his day job at Red Hat, including technology briefings and community meetups.
In addition he's presented at linux.conf.au 2008, OSDC 2008 / 2009 / 2013, Linux World, OSCON, and is a regular presenter at the linux.conf.au SysAdmin miniconf.
Modern system automation has turned into a "Not Invented Here" maze of competing standards and implementations. But there is no need for this madness. The Unix toolbox has everything in store that you need to manage clusters and the Cloud. Let's reuse and benefit from proven tools and standardised interfaces!
This presentation is a plea for simplicity, and for the Unix principles. The possibility of a live demo shall not be entirely ruled out.
About Martin Krafft:
Martin is a Debian developer, with a background in security and QA. System automation has been a part of his work since before DevOps was conceived. He likes a nice Cloud on an otherwise sunny day playing outside with his kids, and wonders a lot about alternative uses of the blockchain.
With over 100 million users and under 500 engineers, Pinterest has become one of the most popular websites. But what does it take to run a large service like Pinterest? What tools, processes and best practices do we use? This talk will cover topics such as our software stack, monitoring and alerting, public clouds, continuous deployment, the tools we have open sourced and more.
About Joe Gordon:
Joe Gordon is a Site Reliability Engineer at Pinterest. Before that he spent 4 years working on OpenStack, where he was a top 1% contributor. He has spoken at numerous conferences around the world, such as LinuxCon North America, All Things Open, OpenStack Israel, PyCon AU and LCA 2015.
Ceph is a massively scalable open source distributed storage solution, which runs on commodity hardware. This short talk explains the components of Ceph and how Ceph works, and provides an overview of the things you need to be thinking about when deploying real world Ceph clusters.
About Tim Serong:
Tim currently works for SUSE developing the SUSE Enterprise Storage product, which (surprise!) is powered by Ceph. He has spoken about high availability and distributed storage at several previous LCAs. In his spare time he wrangles pigs, chickens and sheep, which may or may not actually constitute relaxation.
The sysadmin role has evolved quite considerably from the early days. From BOFHs to DevOps, the journey has been quite interesting and the craftsmanship has evolved a lot. But is there a future for sysadmins? Is there any light between those clouds?
In this talk I would like to explore the past, present and future of the sysadmin role from both a generic and a personal perspective, including the evolution of its cultural (mindset) and technical (skillset) components, what the job involves, how demand and salaries have evolved, and how to get hired as a sysadmin.
About Javier Turegano:
Javier Turegano is an IT engineer whose passions are open source, web operations and IT leadership.
Currently based in Melbourne, he leads the Global Infrastructure and Architecture team at REA Group, which hosts some No. 1 real estate advertising sites internationally (realestate.com.au, casa.it, athome.de, etc.). Previously he's played different roles, including systems architecture lead, project manager and sysadmin, in one of the first Open Source consultancy firms in Spain, which worked with the objective of empowering public administration organisations by using Open Source technology.
His love for knowledge sharing and his sense of community commenced in his early years at University where with other enthusiasts he founded a local Linux user group (Linux Albacete) and continues today with different levels of collaboration within the community.
In other fields, safety science has had a remarkable impact on systems reliability -- aviation, medicine, engineering; all of these fields understand the importance of designing human-machine interfaces to reduce error.
Many of the lessons from several decades of safety science apply well to computer operations too. As systems grow we find the overheads of maintenance grow as well, straining our automation to the point of failure or at least requiring humans to supervise that automation, which increases to the point of being a greater cost than the original work!
In this talk I'll present some findings from the 1990s and discuss their applicability to systems administration and site reliability.
About Jamie Wilkinson:
Jamie is a Site Reliability Engineer working on one of Google's storage infrastructure services, a globally replicated low latency eventually consistent buzzword compliant key value store. He used to be passionate about monitoring and automation, but now really cares about keeping the cost of maintenance constant as the service scales.
Planning can be challenging in any part of IT, but what if you don't know what *might* happen tomorrow: service reliability problems, load issues, changing user requirements, or that special project that you only just heard about and that needs to go live on Monday? Of course, to any systems administrator I'm not explaining anything new.
So how do we plan in operations? Or are we destined to continue like this forever?
This talk will look at some of the strategies my team has used to increase happiness, productivity and visibility of the work we are doing.
While a lot of what I will talk about would certainly fit in the "good devops practices" bucket, this isn't a devops talk. I will attempt to keep it to actionable items you can start to implement on Monday.
About Peter Hall:
Peter Hall is an Operations Delivery Lead at REA Group (realestate.com.au). Peter and his team work with three agile development teams and look after all the related infrastructure.
Over the last 10 years Peter has gained experience working in a number of different operations teams, from a big four bank and Australia Post to the "more agile than agile" environment of catchoftheday.com.au. This has given him a keen interest in how different teams work and how we measure value.
Peter has spoken at the Amazon Web Services Sydney Summit in 2015 and at Puppet Camp 2015 Sydney, is a regular attendee and speaker at the Melbourne Infracoders meet-up, and was on the organising committee of Devops Days Melbourne 2015.
Ping: it is your go-to tool for diagnosing networking issues. But what is a ping actually doing and what is it telling you? What if you could keep a record of ping responses across your network to look at historical issues and potentially predict upcoming problems?
In this talk I'll give a quick overview of the venerable and beloved ICMP ping. I'll then introduce Pingbeat, a small open-source program written in Go that can be used to record pings to hundreds or thousands of hosts on a network. There are many existing tools out there similar to Pingbeat, but its power lies in its ability to write the ping response to Elasticsearch, an open-source NoSQL-like data-store with powerful, built-in search and analytics. Combined with Kibana, a web-based front-end to Elasticsearch, you get an interactive interface to track, search and visualise your network health in near real-time.
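To make "what a ping is actually doing" concrete, here is a minimal Python sketch that builds an ICMP echo request: an 8-byte header (type 8, code 0, checksum, identifier, sequence number) followed by an optional payload, with the standard Internet checksum from RFC 1071. It is an illustration only and is not taken from Pingbeat, which is written in Go. The function names are hypothetical, and actually sending the packet would additionally need a raw socket and suitable privileges.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request; the code field is 0


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP echo request with a valid checksum."""
    # The checksum is computed over the whole message with the checksum field zeroed.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
```

A receiver can validate the message by running the same checksum over the received bytes: for an intact packet the result is zero. The round-trip time between sending the request and receiving the matching echo reply is the value a ping tool records.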
About Joshua Rich:
Joshua came to Elastic from a background in scientific research and high-performance computing. Nowadays he helps people get the most out of their Elasticsearch, Logstash and Kibana deployments, which basically means he gets to meet a lot of awesome people doing awesome things with awesome products. He interacts with customers every day and has given several talks around the world, including plenty back when he was a lowly PhD student trying to promote his research!