Summary

As a manager, I have spent 5 years of my career guiding teams through challenging conditions, from technical hurdles to the COVID-19 pandemic. I have kept my team together, and we have thrived on the responsibility of taking care of the data for a $200 million company. Through it all, including the pandemic, I did not lose a single team member to another company.

I am a Google Cloud Certified Professional Data Engineer. I have spent 9 years working on streaming, ETL, and data latency problems, keeping our overall pipeline latency to 5 hours or less. We accomplished this using Hadoop, Python, Kafka, and Alluxio, working with different data storage formats, different compute engines for accessing the data, and different ways to coordinate our ETL pipeline so that jobs do not collide with each other.

Furthermore, as a DevOps Engineer, I have 23 years of experience managing Linux, UNIX, Windows, and OSX/macOS systems. This means that I look at the whole picture, not just System Administration or Software Development. Shepherding a system through the creation and deployment process, and seeing the customer's happiness at having things work the way they need, is a particular joy of mine. Making people's lives better is the point of technology, after all.

Finally, as a Software Engineer, I have spent 6 years of my career focused on delivering high-quality software to my company's customers, whose central need was sorting through large numbers of documents in a timely fashion. This has meant understanding the ingestion, storage, and display of arbitrary data, and has included building custom data visualizations. The work was done primarily with Python on Ubuntu Linux, but has also included Perl and PHP.

I am comfortable in a wide range of working conditions. My work environments have been heterogeneous (several flavors of Linux, Windows, and OSX/macOS), small to medium sized (10 to 1200 servers, 20 to 300 workstations), and variously located (from all-local to all-remote teams). Programming languages have included Python, PHP, Perl, and Java.

Relevant Technical Skills

Languages: Python, Perl, PHP, Java
Data: Hadoop, Kafka, Alluxio, Trino, Cassandra, Vertica, PostgreSQL, Microsoft SQL Server
Platforms: Linux (CentOS, Debian, Ubuntu, SUSE), FreeBSD, Windows, macOS, Kubernetes, VMware
Tools: Git, Luigi, Apache, NGINX, Nagios, OpenStack

Job History

Pulsepoint - Data Engineer and Director of Infrastructure for Data

New York City, NY & Newark, NJ (Telecommute) - Mar 2015 - Nov 2023

Pulsepoint is an internet health care marketing company focused on activating health care providers. Pulsepoint was acquired by WebMD in June 2021.

My role evolved over time from dealing with individual data jobs to overseeing the entire ETL pipeline to leading the entire department.

Director of Infrastructure for Data, May 2018 - Nov 2023

Data Engineer, Mar 2015 - May 2018

Weight Watchers - Systems Engineering Lead

New York City, NY - Nov 2014 - Feb 2015

Weight Watchers is a Fortune 500 company focused on helping customers manage their weight and reduce health problems caused by it.

My role was focused on providing internal support within the company to enable other groups to support the customer base.

OrcaTec, LLC - Developer

Atlanta, GA (Telecommute) - Jun 2012 - Oct 2014

OrcaTec is in the litigation support industry (they help their clients reduce the costs of being sued). OrcaTec is primarily a software-as-a-service company, allowing OrcaTec to host customer data. While working there, my focus was on improving the GUI. This involved heavy refactoring, adding new features, and adding new tests to cover existing and new code.

The team at OrcaTec was geographically very diverse. In addition to my own telecommuting, I had teammates in many states. We all worked remotely, and we all worked together to make the product the best that it could be.

Choopa.com - Developer

Sayreville, NJ - Jan 2012 - May 2012

As a developer at Constant.com (renamed from Choopa.com in January 2012), I worked with a variety of technologies, the heaviest focus being on OpenStack and Nagios. I helped bring two products to production-level availability for customers (specifically, the Dedicated Cloud Server and Backup systems).

6th Avenue Electronics - Systems Administrator, DevOps Engineer

Springfield, NJ - Aug 2005 - Apr 2008, Feb 2011 - Dec 2011

In 2007, 6th Avenue began switching from its then-current POS system (named Tyler) to SAP. At the end of 2010, SAP was declared unworkable, and an effort began to switch back to Tyler.

The environment at 6th Avenue covered a wide range of platforms spread across 120 servers (both physical and virtual): VMware ESX, Windows Server 2003, Windows Server 2008, CentOS Linux, SUSE Linux, and Debian GNU/Linux. In 2011, I was brought back to transition the point of sale system and become the IT Manager. By the time the point of sale transition was completed, we had a team of 6 people managing the servers and about 300 desktops.

Datapipe, Inc. - UNIX Developer

Jersey City, NJ - May 2008 - Jan 2011

Datapipe manages thousands of customers' servers. Many of these servers are connected to various shared storage systems, including 3Par, Isilon, and backup servers. Datapipe required the ability to report on what data was being stored on these systems for each client, and then feed that data back to billing. In addition, Datapipe required monitoring of the backup systems to ensure timely and complete backups of client data. My duties focused primarily on making these systems work well.

My team structure is worth describing briefly as well: my immediate manager worked out of Austin, TX. One coworker worked in the same building as I did, and I had two "extended" teammates who worked in Jersey City, NJ (I worked in Somerset, NJ). The extended team included the Windows developers, while I was on the UNIX development team.

Diversified Systems - Systems Administrator / Developer

Hackettstown, NJ - Sep 2002 - Jul 2005

Diversified Systems is a small company that focuses on low voltage wiring and subcontracting. While there, I wore many hats and worked on every system. The company had fewer than 10 servers in total, and the entire IT department consisted of just me.

Ciber, Inc. / Decision Consultants - Member of Technical Staff

Greenwood Village, CO - Mar 1999 - Sep 2002

Decision Consultants (DCI) was acquired by Ciber, Inc. in 2002. While working for DCI, I was contracted out to Coors, IBM, and a dot-com named "X-Care" (no longer in business). My work spanned all of those placements.

Robert Half International - Technical Support

Boulder, CO - Jan 1999 - Feb 1999

Robert Half International’s client, StorageTek, provided large enterprises with long term backup solutions (typically involving dozens of tape drives, thousands of tape cartridges, and robotic tape libraries to manage all of it).

Sykes Enterprises - Systems Technologist

Denver, CO - Aug 1998 - Dec 1998

Working for Sykes Enterprises, I was contracted out to Sun’s internal Resolution Center. I worked with Sun employees around the world to resolve their issues with the workstations and servers they relied on daily.

Fabian Corporation - System Administrator

Stroudsburg, PA - Feb 1998 - May 1998

Fabian Corporation was a small virtual hosting provider for web sites in the fledgling days of the web, before the dot-com era. A typical customer built a static web site and uploaded it via FTP for visitors to see.

MaxTech Corporation - Developer / System Administrator

Rockaway, NJ - Mar 1995 - Dec 1997

I was hired at MaxTech as a customer service representative. Over the time I worked there, I earned the opportunity to participate in system administration and the development of a new call tracking system to be used by the customer service team.

Personal and Side Projects - Developer, Systems Administrator

1995-Current

When I’m not working on projects for my employer, I’m working on projects for myself, or side projects for people who get in touch with me to make something for them.

Education

Bachelor of Science in Computer Science, 2000
East Stroudsburg University, East Stroudsburg, Pennsylvania

Professional Certificates

Google Cloud Certified Professional Data Engineer (Jan 2024)
Issued by Google

Project History

Migrate To New Data Center

Period 2022-2023
Company Pulsepoint
Tools Alluxio, Hadoop, Kafka, Python
Platform CentOS, Kubernetes

Pulsepoint is in the process of migrating between data centers. A significant portion of the existing hardware has gone past its end of life, so we chose to build a new data center with new hardware. At the same time, we adopted the latest versions of all relevant software that we could (Hadoop, Kubernetes, etc.).

This provided us with an opportunity to fix some design flaws in the original big data clusters, and we used this chance to make things better for us overall.

The work remaining at this point comes down to verifying that the new versions of the ETL jobs function as expected, producing valid output. The process is expected to complete in 2025.

Migrate From Python 2 to Python 3

Period 2022-2023
Company Pulsepoint
Tools Python
Platform CentOS, Kubernetes

Pulsepoint built the entire ETL pipeline using Python 2. On January 1, 2020, Python 2 reached its end of life. In order for the ETL pipeline to continue to grow, we needed to migrate to Python 3.

The path we chose was to extract the code common to the pipeline and turn it into a library. We then began the normal process of making the backwards-incompatible changes. Because of the scope of this work (nearly 200K lines in Python files), and because it was done during a data center migration, the project is still ongoing. However, over 50K lines have been successfully migrated so far.
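
The str/bytes split is typically the largest source of those backwards-incompatible changes, and the shared library is the natural place to centralize it. A minimal sketch of the kind of compatibility shim such a library can provide (module and helper names are hypothetical, not the actual Pulsepoint code):

```python
# compat.py - a hypothetical shim module usable from both interpreters.
from __future__ import absolute_import, unicode_literals

import sys

PY2 = sys.version_info[0] == 2


def ensure_text(value, encoding="utf-8"):
    """Return a text (unicode) string on both interpreter versions."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def ensure_bytes(value, encoding="utf-8"):
    """Return a byte string, e.g. for writing to sockets or files."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def iteritems(mapping):
    """Iterate over dict items without building a list under Python 2."""
    return mapping.iteritems() if PY2 else iter(mapping.items())
```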

Dataflow Explorer

Period 2015
Company Pulsepoint
Tools Python, Graphviz Dot, Luigi
Platform Mesos, CentOS, NGINX

At Pulsepoint, we have a large number of data aggregation jobs coordinated via Spotify's Luigi tool. Luigi has the user describe jobs in a Python codebase, and it resolves the order in which to run them, much as GNU Make does. A negative side effect is that once the number of jobs grows to any significant size, it becomes difficult for humans to understand the order in which they will run.

The Dataflow Explorer would walk the Python code representing all of the jobs and extract the attributes needed to construct a dependency tree. It would then feed that tree to Graphviz's dot tool to produce an SVG file showing the graph of all the jobs. Finally, it would publish that output through NGINX running on Mesos, allowing people to browse, zoom, and search the resulting graph.
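
A minimal sketch of the core idea, as an illustrative reconstruction rather than the original code: walk each Luigi task's requires() graph, collect the edges, and emit Graphviz DOT.

```python
# dataflow_sketch.py - illustrative reconstruction, not the original code.
import luigi


def build_edges(root_tasks):
    """Walk each task's requires() graph and collect (upstream, downstream) edges."""
    edges, seen, queue = set(), set(), list(root_tasks)
    while queue:
        task = queue.pop()
        if task.task_id in seen:
            continue
        seen.add(task.task_id)
        for dep in luigi.task.flatten(task.requires()):
            edges.add((dep.task_family, task.task_family))
            queue.append(dep)
    return edges


def to_dot(edges):
    """Render the edges as a DOT digraph; pipe the result through `dot -Tsvg`."""
    lines = ["digraph dataflow {"]
    for upstream, downstream in sorted(edges):
        lines.append('  "%s" -> "%s";' % (upstream, downstream))
    lines.append("}")
    return "\n".join(lines)
```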

Cassandra for User Reporting

Period 2015
Company Pulsepoint
Tools Cassandra
Platform CentOS Linux

Pulsepoint has a fairly significant Microsoft SQL Server installation, and we were asked if we could use Cassandra as a replacement for it. We set up a small cluster, and began trying to run various reports against it.

The actual performance was impressive, but we ran into a significant roadblock: Cassandra is, in significant ways, a disk-based key/value store. To use it as a reporting database while avoiding table scans during user reporting, we would have had to load many copies of the same data into different tables with different primary keys.

In the end, this was deemed infeasible given the number of combinations we would have had to provide and the maintenance burden each new report would have added.
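
For illustration, the duplicate-table pattern that made this infeasible looks roughly like the following (keyspace, table, and column names are hypothetical, via the DataStax cassandra-driver):

```python
# Same event data stored twice, because each query pattern needs its own
# table with a matching primary key. Hypothetical schema for illustration.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("reporting")

# Keyed for per-advertiser reports...
session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_advertiser (
        advertiser_id int, day timestamp, event_id uuid, impressions bigint,
        PRIMARY KEY ((advertiser_id), day, event_id))
""")

# ...and again keyed for per-publisher reports. Every new report shape
# would have meant another full copy of the data.
session.execute("""
    CREATE TABLE IF NOT EXISTS events_by_publisher (
        publisher_id int, day timestamp, event_id uuid, impressions bigint,
        PRIMARY KEY ((publisher_id), day, event_id))
""")
```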

California Hadoop Cluster

Period 2015
Company Pulsepoint
Tools Hadoop
Platform CentOS Linux

Pulsepoint needed to establish a disaster recovery site, and had chosen an existing data center for it. As part of this, a Hadoop cluster was required for business continuity. My task was to configure everything so that the same data jobs running in the primary cluster ran in the backup cluster and produced equivalent data, even though the two clusters ran independently.

Sqoop to FreeBCP(FreeTDS) Conversion

Period 2016
Company Pulsepoint
Tools Sqoop, FreeTDS
Platform Hadoop, Microsoft SQL Server

Apache Sqoop had long been deprecated before being fully retired in June 2021. As part of Pulsepoint's platform, we needed a replacement for Sqoop before that retirement. We settled on FreeBCP, part of the FreeTDS project, and used it to migrate our processes for transferring data from Hadoop to MS SQL Server.
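
The replacement path boils down to exporting a flat file from Hadoop and bulk-loading it with freebcp. A hypothetical sketch of that kind of wrapper (server, table, and file names are invented):

```python
# Bulk-load a character-format flat file into MS SQL Server via freebcp.
import subprocess


def bulk_load(table, datafile, server, user, password):
    """Shell out to freebcp (FreeTDS) to copy the exported file in."""
    subprocess.check_call([
        "freebcp", table, "in", datafile,
        "-S", server, "-U", user, "-P", password,
        "-c",           # character (text) mode
        "-t", "\t",     # field terminator matching the Hadoop export
    ])


bulk_load("reporting.dbo.daily_stats", "/tmp/daily_stats.tsv",
          "mssql01", "etl_user", "secret")
```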

Vertica Decommissioning

Period 2018
Company Pulsepoint
Tools Vertica, Trino
Platform CentOS Linux

Pulsepoint had used Vertica, but we were outgrowing it in 2017. In 2018, when the next support renewal came due, we had fully outgrown it and needed a replacement. After trying several options (including ClickHouse, Trino, MariaDB, and others), we settled on Trino as the option that gave us the best capabilities while coming closest to the performance Vertica had provided.

Data Management Team Split

Period 2021
Company Pulsepoint
Tools Git
Platform Jira, GitHub

As Pulsepoint grew, the Data Management team reached a point where it could no longer do everything required of it: new data products were needed, and the data platform itself needed both maintenance and new features. I made the decision to split the team in two, creating a Data Platform team and a Data Product Development team, each focused on exactly one role instead of splitting its focus between two distinct functions.

Data Management Code Split

Period 2021
Company Pulsepoint
Tools Git
Platform GitHub

Pulsepoint needed to split the Data Management team into a Data Platform team and a Data Product Development team. This also meant splitting the code, since the entire ETL pipeline lived in one monolithic repository. The team had to develop a way to express pipeline dependencies across repository boundaries (e.g., Job A in repository 1 depends on Job B in repository 2). We also had to agree on how to decide which team owned which pieces of code.
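
One way to express such a cross-repository dependency in Luigi is an ExternalTask that knows the upstream job only by its published output. A minimal sketch under that assumption (task names and paths are hypothetical):

```python
import luigi


class JobBOutput(luigi.ExternalTask):
    """Stand-in for Job B, which lives in (and is scheduled from) repository 2.
    The downstream repository knows Job B only by its published output path."""
    day = luigi.DateParameter()

    def output(self):
        # LocalTarget for illustration; a real pipeline would point at HDFS.
        return luigi.LocalTarget("/data/repo2/job_b/%s/_SUCCESS" % self.day)


class JobA(luigi.Task):
    """Job A in repository 1, depending on Job B across the repo boundary."""
    day = luigi.DateParameter()

    def requires(self):
        return JobBOutput(day=self.day)

    def output(self):
        return luigi.LocalTarget("/data/repo1/job_a/%s.out" % self.day)

    def run(self):
        with self.input().open("r") as src, self.output().open("w") as dst:
            dst.write(src.read())  # placeholder transform
```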

Advanced Search Tool

Period 2014
Company OrcaTec, LLC
Tools Python, jQuery, jQueryUI
Platform Server: TurboGears, Browser (Cross Browser)

At OrcaTec, the primary tool we provided to our customers was the ability to search collections of documents quickly. In addition to having simple search tools, we also had a helper tool in the “Advanced Search”.

This tool allowed the user to search based on a dozen different fields, but it was still limited and fragile. It could not help the user build queries that combined different fields in a single clause. In addition, it had issues with encoding <> in email addresses, and it did not support drag and drop on all of our supported browsers.
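
The angle-bracket problem, for illustration: a sender like "Jane Doe <jdoe@example.com>" must be HTML-escaped before display, or the browser treats the address as a markup tag. A minimal sketch (not the original OrcaTec code):

```python
# Escape &, <, and > so an email address survives insertion into HTML.
from html import escape  # the Python 2-era code would have used cgi.escape


def render_sender(sender):
    """Return the sender string safe for embedding in the results page."""
    return escape(sender, quote=False)


assert render_sender("Jane Doe <jdoe@example.com>") == \
    "Jane Doe &lt;jdoe@example.com&gt;"
```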

By the time this project was completed, the tool had been transformed noticeably. It became a miniature investigative tool in its own right, allowing customers to easily search through collections of documents. One customer reported narrowing a search from 80,000 possible documents down to under 2,000 within an hour using this tool. Thanks to the extensive test coverage added when the code shipped, even the problems that were found were quick to fix. All of this was accomplished while reducing the total code for the tool by 50%.

Paster to Apache/mod_wsgi Conversion

Period 2013
Company OrcaTec, LLC
Tools Python, Apache, mod_wsgi, Paster
Platform Ubuntu Linux

Paster is meant to be used in a development environment, giving the developer a lightweight, easily managed (single-threaded) web server while writing code, before it goes to production. At OrcaTec, we were using Paster both in development and in production. Due to the demands placed on Paster (in many instances, loading documents over 100 MB), the entire system could appear frozen to one user while it was busy responding to a request from another user.

After analysis, we determined that Paster was no longer suitable for our needs. Since Apache with mod_wsgi provides at least adequate web server performance (in comparison to others like NGINX), and the Apache configuration was already known to the team, we chose to switch from Paster to Apache. This allowed Apache itself to serve static files (images, CSS, and JavaScript), leaving the dynamic pages to the Python code.
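
On the Python side, the switch amounts to exposing a WSGI `application` object for mod_wsgi to import. A minimal sketch, assuming the same PasteDeploy INI file the Paster server used (the paths are hypothetical):

```python
# wsgi.py - module that mod_wsgi imports; Apache handles the processes,
# threads, and static files that single-threaded Paster could not.
from paste.deploy import loadapp

# Load the same PasteDeploy INI the Paster server used, so the application
# configuration does not change, only the server in front of it.
application = loadapp("config:/srv/orcatec/production.ini")

# Matching Apache directives (sketch):
#   WSGIDaemonProcess orcatec processes=4 threads=8
#   WSGIScriptAlias / /srv/orcatec/wsgi.py
#   Alias /static/ /srv/orcatec/static/
```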

StorageWeb

Period 2010
Company Datapipe
Tools FreeBSD, Python, Apache, PostgreSQL, TurboGears
Platform FreeBSD, Web Browser

Datapipe manages thousands of servers, many of them connected to shared storage systems such as 3Par, Isilon, and backup servers. Datapipe needed to report on what data each client was storing on these systems and feed that information back to billing. StorageWeb was written to fill that need.

UNIXOps

Period 2010
Company Datapipe
Tools FreeBSD, Python, Apache, PHP
Platform FreeBSD, Web Browser

Datapipe provides managed hosting for its clients: customers contact Datapipe to report issues on servers, and Datapipe administrators log in to customer machines as root to fix the problems. UNIXOps provides a secure method for granting an administrator a one-time SSH key to log in to the customer equipment, along with detailed logging of everything the administrator does for later review.
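
A minimal sketch of the one-time-key idea (paths and function names are hypothetical illustrations, not the actual UNIXOps code):

```python
# Mint a fresh keypair per support ticket; the public key is installed on
# the customer host and revoked after use.
import os
import subprocess
import tempfile


def issue_one_time_key(ticket_id):
    """Create a single-use SSH keypair tagged with the ticket number."""
    keydir = tempfile.mkdtemp(prefix="unixops-")
    keyfile = os.path.join(keydir, "id_ticket_%s" % ticket_id)
    subprocess.check_call([
        "ssh-keygen", "-q", "-t", "rsa", "-b", "2048", "-N", "",
        "-C", "unixops-ticket-%s" % ticket_id, "-f", keyfile,
    ])
    with open(keyfile + ".pub") as pub:
        return keyfile, pub.read()  # public key goes to authorized_keys
```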

SAP to Tyler Conversion

Period 2011
Company 6th Avenue Electronics
Tools AutoIt3, CentOS Linux, Python
Platform Server: CentOS Linux, Client: Windows

6th Avenue Electronics found that SAP was not a workable solution for them. The decision was made to switch back to the Tyler POS system, clearing out old mistakes and improving maintainability. I managed the technical aspects of the migration, while my immediate managers handled the business aspects.

Due to the costs associated with SAP, we had just over three months, in total, to complete the transition. We were successful.

PyTyler - Tyler POS to PostgreSQL Migration Tool

Period 2007, 2011
Company 6th Avenue Electronics
Tools Python, PostgreSQL, Tyler POS System
Platform HP-UX, Debian GNU/Linux

Tyler is a point of sale system used by many smaller retail establishments. It stores data in a set of proprietary ISAM files, which have no modern access tools (such as Crystal Reports) available for reporting.

The users needed an easy way to report on the data, and this meant a tool was needed to copy the data from the on-disk files into a formal SQL server of some variety. In less than a month, I wrote a tool in Python to read the Tyler data files and load the information into a PostgreSQL database on a nightly basis.

This tool copied the entire database: approximately 36,000,000 records across 140 tables and 22 gigabytes of disk space. The program worked by reading the structure definition from the configuration files and recreating that structure in PostgreSQL. PyTyler would then read each table, row by row, parse the data in each row, and load it into the PostgreSQL server.

This allowed users to access and report on the data with standard ODBC drivers.
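
A minimal sketch of the load loop described above, with the proprietary ISAM parsing replaced by stand-ins (psycopg2 is my assumption here, not confirmed as the original driver):

```python
# Recreate one table from the Tyler structure definition, then stream
# parsed rows into PostgreSQL. Illustrative reconstruction only.
import psycopg2


def load_table(conn, table_name, columns, rows):
    """Create the table and insert parsed Tyler rows, one row at a time."""
    # Table and column names come from the Tyler structure definition files,
    # so plain string formatting is used for the DDL here.
    col_defs = ", ".join("%s text" % name for name in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    with conn.cursor() as cur:
        cur.execute("DROP TABLE IF EXISTS %s" % table_name)
        cur.execute("CREATE TABLE %s (%s)" % (table_name, col_defs))
        for row in rows:  # row by row, as the original nightly load did
            cur.execute(
                "INSERT INTO %s VALUES (%s)" % (table_name, placeholders),
                row,
            )
    conn.commit()
```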

VMware Implementation

Period 2005-2007
Company 6th Avenue Electronics
Tools VMware Virtual Infrastructure 3, VMware Virtual Center
Platform Linux (Various distributions), Windows Server 2003

6th Avenue Electronics, like many companies, had a growing need for individual servers for various internal services. They chose to implement VMware to reduce hardware costs, downtime, and environmental costs.

SBN Implementation

Period 2004-2005
Company Diversified Systems
Tools SBN, Sybase 11.0, PHP
Platform Microsoft Windows 2000, Debian GNU/Linux

SBN, published by IBSoft, is an ERP system for the alarm industry. Diversified Systems is a subcontractor working in the low voltage electrical industry, including alarm systems, stereo systems, central intercom systems, structured wiring, and central vacuum systems. I implemented all aspects of SBN at Diversified Systems.

The provided client interface was unsuited for the intended use. This resulted in much in-house development to augment the SBN client with a web-based interface.

SQL-Ledger Implementation

Period 2005
Company Diversified Systems
Tools Perl, Apache
Platform Apache, Debian GNU/Linux

The SBN accounting system was inadequate for the needs of Diversified Systems. This led to the selection and installation of an external accounting package.

KP-CIS

Period 2001-2002
Company Ciber, Inc., contracted to IBM
Tools Perl, Cygwin, GNU Make
Platform Server: AIX, Client: Windows NT

IBM was under contract to develop a complete clinical information system for Kaiser Permanente clinics. I participated as a member of the environment team, focusing on improving the build processes.
