Using network science to diversify your career

Posted: 06/25/13 at 10:19 PM by Sears Merritt under complex networks

Conceiving and executing a career path has never been more difficult, especially if you are unsure of exactly where you want to end up. So I asked myself: how can millions of resumes and network science be used to make this task easier and more efficient? In this post, I'll briefly present the results of a preliminary analysis that begins to answer this question. First, I'll introduce an occupation network, which I constructed from nearly 1,000,000 resumes. Next, I'll present the network's degree distribution and explain how to compute a simple yet powerful network property called centrality. I'll then discuss how to interpret this measure when you are uncertain of where to start, or where to go next, in your career. Finally, I'll present the top 10 occupation titles that offer the most diversity in future opportunities, ranked according to their centrality.

Readers interested only in the results can skip to the top 10 career diversifying occupation titles.

Adjacency matrix notation

Before we can discuss network measures, we need a standard way of representing networks. Mathematically, a network can be represented as a matrix, denoted \(\boldsymbol{A}\) and referred to as an adjacency matrix. Each entry in the matrix, \(A_{ij}\), indicates whether an edge connecting vertex \(i\) to vertex \(j\) exists. (The \(i\) and \(j\) in \(A_{ij}\) are the respective row and column of the adjacency matrix.)

If all edges in a network are considered equal, then each entry \(A_{ij}\) is typically 1 if an edge exists between the two vertices and 0 otherwise.

As an example, consider the undirected, unweighted network shown above. An adjacency matrix representation of this network would have the form: \[\begin{aligned} \boldsymbol{A} = \begin{bmatrix} 0 & 1 & 0 & 0 & 1\\ 1 & 0 & 1 & 1 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0\\ \end{bmatrix} \textrm{ .} \end{aligned} \] Because the network is undirected, the matrix is symmetric: \(A_{ij} = A_{ji}\).
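
As a minimal sketch, the matrix above can be built from an edge list in plain Python. The helper name `adjacency_matrix` and the 0-based vertex labels are my own illustrative choices, not part of the original analysis.

```python
def adjacency_matrix(n, edges, directed=False):
    """Return an n x n adjacency matrix (list of lists) for the given edges."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1          # edge from vertex i to vertex j
        if not directed:
            A[j][i] = 1      # undirected edges are stored in both directions
    return A

# Edges of the five-vertex example network, using 0-based vertex labels.
edges = [(0, 1), (0, 4), (1, 2), (1, 3)]
A = adjacency_matrix(5, edges)

for row in A:
    print(row)
```

Passing `directed=True` stores each edge in one direction only, which is the form a directed network such as the occupation network would need.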

Occupation network

Over the course of a few months, I crawled the web and acquired nearly 1,000,000 publicly available resumes. From these resumes, I extracted the chronologically ordered sequences of occupation titles, along with other career-defining features such as education level and skills. I then normalized the job titles and constructed an occupation network in which a directed edge is placed from occupation title \(i\) to occupation title \(j\) if at least one resume contains a direct transition from occupation \(i\) to occupation \(j\). This procedure produced a network of roughly 300,000 occupation titles and just over 1,000,000 edges.
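
The edge rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three resumes below are made up, the titles are assumed to be normalized already, and `build_occupation_network` is a hypothetical helper rather than the code used for the real data set.

```python
from collections import defaultdict

def build_occupation_network(resumes):
    """Map each title to the set of titles it transitions to directly."""
    successors = defaultdict(set)
    for titles in resumes:
        # Pair each title with the one that follows it chronologically.
        for current_title, next_title in zip(titles, titles[1:]):
            # A set keeps one edge per transition, however many resumes contain it.
            successors[current_title].add(next_title)
    return successors

# Hypothetical, already-normalized title sequences (oldest job first).
resumes = [
    ["sales associate", "assistant manager", "manager"],
    ["sales associate", "customer service representative"],
    ["assistant manager", "manager", "owner"],
]
network = build_occupation_network(resumes)
edge_count = sum(len(targets) for targets in network.values())
```

Storing successors in a set mirrors the "at least one resume" rule: the transition from assistant manager to manager appears in two resumes but contributes a single edge.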

Degree distributions

The first property I study when I analyze a network is the degree distribution. I do this for two reasons: it's easy to compute, and the result sheds light on the network's structure and on what types of dynamical processes may have created it. These results often motivate which analyses I conduct next.

Computing the degree distribution is straightforward. A vertex's degree, denoted \(k\), is its number of incoming or outgoing edges. For each value of \(k\), we compute the fraction of vertices that have that degree: \[\begin{aligned} \Pr(K = k) = \frac{n_k}{n} \end{aligned} \] where \(n_k\) represents the number of vertices with degree \(k\) and \(n\) represents the total number of vertices in the network. When a network is directed, the degree distribution is computed separately for incoming edges, referred to as in-degree, and for outgoing edges, referred to as out-degree.
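
As a small worked example of the formula, the sketch below computes \(n_k / n\) for the five-vertex example network from the adjacency matrix section, whose vertices have degrees 2, 3, 1, 1, and 1. The function name `degree_distribution` is my own.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {k: n_k / n} for a list holding one degree per vertex."""
    n = len(degrees)
    counts = Counter(degrees)            # n_k for every observed degree k
    return {k: n_k / n for k, n_k in sorted(counts.items())}

# Degrees of the five-vertex example network (row sums of its adjacency matrix).
degrees = [2, 3, 1, 1, 1]
distribution = degree_distribution(degrees)
print(distribution)
```

For a directed network, the same function would be called twice, once with the list of in-degrees and once with the list of out-degrees.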

The figure above plots the occupation network's in-degree and out-degree distributions and indicates that the majority of nodes in the network have a small number of incoming and outgoing edges, while a select few possess the opposite trait - thousands of incoming and outgoing edges. This heavy-tailed, or scale-free, pattern is found in many other real-world social and technological networks and is often associated with hierarchical structure and complex behavioral or dynamical processes.

How might this type of complex network structure be exploited by career-minded individuals? Those who are unsure of precisely what type of career they want to pursue, or where to go next, might be inclined to choose a position that does not constrain the diversity of their future opportunities. One way to satisfy this objective is to target occupation titles with high out-degrees. By doing so, an uncertain job seeker gains the freedom and additional time to develop a more concrete plan for the future, without limiting their options and without stopping work. So which occupation titles have high degree? Ranking nodes by their degree is one way of measuring network centrality, which we'll discuss next.

Centrality

Centrality measures seek to quantify the importance of a vertex using a variety of metrics. In social contexts, a vertex with a high centrality score is interpreted as being influential in the network, while a low centrality indicates the opposite. As I mentioned earlier, with respect to the occupation network, a vertex's centrality can be interpreted as how well it diversifies future career options.

Degree centrality

We have already indirectly discussed the simplest of all centrality measures - degree centrality. To compute a vertex's degree centrality, one simply counts the edges leading to or from the vertex, depending on the desired direction. For an unweighted network, this is expressed mathematically, using the adjacency matrix, as \[\begin{aligned} \textrm{Node } i \textrm{'s out-degree centrality} = d_{i_{out}} = \sum_{j} \boldsymbol{A}_{ij} \end{aligned} \] \[\begin{aligned} \textrm{Node } i \textrm{'s in-degree centrality} = d_{i_{in}} = \sum_{j} \boldsymbol{A}_{ji} . \end{aligned} \]
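
In code, the two sums are the row sums and column sums of the adjacency matrix. The sketch below uses a made-up three-vertex directed network and my own function names, with \(A_{ij} = 1\) meaning an edge from vertex \(i\) to vertex \(j\).

```python
def out_degree_centrality(A):
    """Row sums: the number of edges leaving each vertex."""
    return [sum(row) for row in A]

def in_degree_centrality(A):
    """Column sums: the number of edges arriving at each vertex."""
    return [sum(column) for column in zip(*A)]

# Hypothetical directed network with edges 0->1, 0->2, 1->2, 2->0.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
out_deg = out_degree_centrality(A)   # [2, 1, 1]
in_deg = in_degree_centrality(A)     # [1, 1, 2]
```

Both lists sum to 4, the number of edges, since every directed edge leaves exactly one vertex and arrives at exactly one vertex.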

One limitation of degree centrality is that it places no weight on the centrality of neighboring vertices. For example, if a vertex has many neighbors that all have low degrees, the vertex will still obtain a high degree centrality. On the other hand, if a node has only a few neighbors that point to it, but those neighbors have high centralities, the vertex is still given a low centrality. This effect is often undesirable, especially with respect to occupation networks and maximizing the diversity of future career opportunities. Next, I'll present a simple metric designed to account for these properties.

Eigenvector centrality

Eigenvector centrality attempts to compensate for the shortcomings of degree centrality by assigning centrality values to vertices based on the centrality of their neighbors. Thus, if a vertex has only a few neighbors, but those neighbors are of high centrality, the vertex will have a higher centrality score than it would if degree centrality were used. To compute the eigenvector centrality \(x_i\) of vertex \(i\), the centralities of its neighbors are summed and rescaled by a constant: \[\begin{aligned} x_i = \frac{1}{\lambda}\sum_j A_{ij}x_j, \end{aligned} \] where \(\lambda\) is the largest eigenvalue of \(\boldsymbol{A}\). For readers wanting a more detailed explanation of how eigenvector centrality is computed, see this Wikipedia page.
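
Requiring each \(x_i\) to be proportional to \(\sum_j A_{ij}x_j\) means that the vector of centralities is an eigenvector of \(\boldsymbol{A}\). One common way to approximate it is power iteration, sketched below in plain Python on the five-vertex example network from the adjacency matrix section. This is a sketch under my own assumptions, not necessarily how the scores in this post were computed: the function name, the tolerance, and the choice to iterate with \(\boldsymbol{A} + \boldsymbol{I}\) are mine. The shift leaves the eigenvectors unchanged but keeps the iteration from oscillating on networks without odd cycles, such as this example.

```python
import math

def eigenvector_centrality(A, max_iter=1000, tol=1e-12):
    """Approximate the leading eigenvector of A by shifted power iteration."""
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n                 # uniform starting guess, unit length
    for _ in range(max_iter):
        # One step of (A + I) x: each vertex keeps its own score and adds
        # the scores of the vertices it is connected to.
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]                # rescale to unit length
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Adjacency matrix of the five-vertex example network (0-based labels).
A = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
scores = eigenvector_centrality(A)
ranking = sorted(range(len(A)), key=lambda i: scores[i], reverse=True)
```

In this example the vertex with three neighbors ranks first, and the vertex attached to it with degree 2 ranks second, ahead of the degree-1 vertices.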

Top 10 most career diversifying occupations

Which occupation titles in the network are the most diversifying? In the table below, I've listed the top 10 occupations, ranked according to their eigenvector centrality scores. For comparison, I've also included their degree centrality scores.

rank  occupation                       eigenvector centrality  in-degree centrality  out-degree centrality
1     owner                            0.131                   5519                  5063
2     manager                          0.127                   5098                  5279
3     customer service representative  0.126                   5097                  6081
4     administrative assistant         0.123                   5367                  6153
5     sales associate                  0.115                   4043                  4996
6     office manager                   0.111                   3742                  4056
7     supervisor                       0.109                   3605                  3804
8     assistant manager                0.108                   3137                  4037
9     customer service                 0.105                   3149                  3582
10    sales                            0.101                   2902                  3126

Overall, this list of occupation titles is centered on customer service, management, and sales rather than on science or engineering, which isn't surprising. Science and engineering oriented occupations often require highly specialized skill sets and higher levels of education than the occupations listed above, and as a result, they lead to fewer opportunities than occupations that require generic skill sets.

Conclusion

At this point you might be asking yourself: where do these occupations lead? What skills, education, and experience do they require? How much do they pay? And perhaps more importantly, can they get me closer to achieving my long-term career goals?

Hopefully this discussion of how network science can be successfully applied to career path dynamics has piqued your interest enough to try it for yourself. I'm in the early development stages of deploying much of what has been presented here to the optimization and prediction of individual career paths in the form of a web application, www.pathop.com. Consider creating an account and see what lies ahead in your career!
