Multiple Linear Regression Intuition

Multiple linear regression works the same way as simple linear regression, except that it introduces more independent variables, each with its own coefficient.

[Image: the multiple linear regression equation]

When there are more independent variables, the assumption is that multiple factors affect the outcome of the dependent variable. A predictive regression model can be more accurate if multiple independent variables are known.

Following are some examples:

If the dependent variable is profit, then independent variables could be R&D spending and marketing expenditures.

If the dependent variable is grade, then independent variables could be hours of study and hours of sleep.
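
To make these examples concrete, here is a minimal sketch of fitting a multiple linear regression with scikit-learn. The numbers, and the choice of R&D and marketing spend as features, are made up for illustration:

```python
# Minimal sketch of multiple linear regression with scikit-learn.
# The numbers below are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Independent variables: R&D spend and marketing spend (in thousands).
X = np.array([
    [165, 470],
    [162, 444],
    [153, 408],
    [144, 383],
    [142, 366],
])
# Dependent variable: profit (in thousands).
y = np.array([192, 191, 191, 183, 166])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # b0, then b1 and b2

# Predict profit for a new company.
print(model.predict([[150, 400]]))
```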

When to Make Dummy Variables for Multiple Linear Regression

[Image: categorical variables]

Consider the example where profit is the dependent variable. The challenge is to find how the independent variables affect profit. In the image above, the independent variables have a blue background in the header. The first three independent variables are expenditures, so it’s easy to associate each expenditure with a variable. But for the state, it’s not that simple. State is a categorical variable. The approach to use in this instance is to create dummy variables for the categorical variable.

[Image: dummy variables]

As shown in the above image, you create a separate column for each category. You populate each row of a dummy variable column with a 1 if that row in the state column matches the heading of the dummy variable column. Otherwise, you populate that row with a 0. You then include only one of the dummy variables in your equation. In the image above, we know that if D1 is 1, then the company is in New York. If it’s 0, then the company is in California. We do not lose any information by including only one dummy variable in the equation. This approach may seem biased, because you lose the b4 coefficient when the state is California. But that is not the case. The regression model accounts for California by adjusting the b0 coefficient.

Avoid The Dummy Variable Trap

[Image: dummy variable trap]

The problem with including both dummy variables is that it amounts to a duplication of variables. This phenomenon, where one (or more) variables can be predicted from another, is called multicollinearity. The result is that the regression model does not work as it should, because it cannot distinguish the effect of one dummy variable from the other. This problem is referred to as the dummy variable trap. The key takeaway: when building a multiple linear regression model with dummy variables, always omit one dummy variable from the equation. This rule applies irrespective of the number of dummy variables.
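
Here is a minimal sketch, using pandas, of how dummy variables can be created while avoiding the trap. The column names and values are hypothetical; the drop_first option keeps only one dummy column, matching the rule above:

```python
# Minimal sketch of encoding a categorical variable with dummy
# variables in pandas. Column names and values are made up.
import pandas as pd

df = pd.DataFrame({
    "rd_spend": [165, 162, 153, 144],
    "state":    ["New York", "California", "California", "New York"],
    "profit":   [192, 191, 191, 183],
})

# drop_first=True keeps only one dummy column (here "New York" = D1),
# avoiding the dummy variable trap: a row with D1 = 0 is California.
dummies = pd.get_dummies(df["state"], drop_first=True)
X = pd.concat([df[["rd_spend"]], dummies], axis=1)
print(X)
```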

Finally, there are different ways to build a multiple linear regression model. The common methods are backward elimination, forward selection, and stepwise regression.
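
As a rough sketch of one of these methods, backward elimination can be implemented with statsmodels by repeatedly dropping the least significant predictor. The 0.05 threshold is a common choice, and X and y are assumed to be a pandas DataFrame and Series:

```python
# Sketch of backward elimination: repeatedly drop the predictor with
# the highest p-value until all remaining predictors are significant.
import statsmodels.api as sm

def backward_elimination(X, y, threshold=0.05):
    X = sm.add_constant(X)          # add the b0 intercept column
    while True:
        results = sm.OLS(y, X).fit()
        pvalues = results.pvalues.drop("const")
        if pvalues.empty or pvalues.max() <= threshold:
            return results          # every remaining predictor is significant
        X = X.drop(columns=[pvalues.idxmax()])  # remove the weakest one
```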

Simple Linear Regression Intuition

[Image: the simple linear regression equation]

You may recognize the equation for simple linear regression as the equation for a sloped line on an x and y axis.

Simple linear regression involves a dependent variable. This is an outcome you want to explain. For example, the dependent variable could represent salary. You could assume that level of experience will impact salary. So, you would label the independent variable as experience.

The coefficient can be thought of as a multiplier that connects the independent and dependent variables. It translates how much y will be affected by a unit change in x. In other words, a change in x does not usually mean an equal change in y.

In this example, the constant represents the lowest possible salary. This would be the starting salary for someone with a zero level of experience.
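
For instance, with hypothetical numbers, the equation might be salary = 30,000 + 5,000 × experience. The constant b0 = $30,000 is the starting salary, the coefficient b1 = $5,000 is the raise per year of experience, and four years of experience would predict a salary of $50,000.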

[Image: regression line of best fit]

Suppose you have data from a company’s employees. You could plot the data with the red marks as shown above. Then you would draw a line that “best fits” the data. It’s impossible to connect all the marks with a straight line, so you use a best-fitting line. The equation for this line is the result of your simple linear regression. The regression finds the best-fitting line, and this line is your regression model.

How does Simple Linear Regression find the best fitting line?

The regression model is found by using the ordinary least squares method. Please refer to the following illustration.

[Image: ordinary least squares method]

First, look at the notation. Notice that the red mark is actual data, and the green mark is the model’s prediction. For the red and green marks in the boxed frame, the actual salary is higher than what the model predicted. So, this employee makes a higher salary than the model predicts. Of course, there are other variables besides experience that can affect salary. But in this case, we keep it simple. Hence the term, simple linear regression.

What the regression analysis does is take the sum of all the squared differences between the actual values and the predicted values. This is done for the many different lines that can “fit” through the data. The line with the minimum sum of squared differences is the best-fitting line. The equation for this line represents your simple linear regression model.
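
A minimal sketch with made-up salary data shows the idea; np.polyfit performs this least-squares minimization for a straight line:

```python
# Minimal sketch of ordinary least squares on made-up salary data:
# np.polyfit finds the slope and intercept that minimize the sum of
# squared differences between actual and predicted values.
import numpy as np

experience = np.array([1, 2, 3, 4, 5, 6])        # years
salary     = np.array([35, 42, 44, 52, 57, 61])  # thousands

b1, b0 = np.polyfit(experience, salary, 1)       # slope, intercept
predicted = b0 + b1 * experience
sse = np.sum((salary - predicted) ** 2)          # what OLS minimizes
print(f"salary = {b0:.1f} + {b1:.1f} * experience, SSE = {sse:.1f}")
```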

A detailed example of a regression model in the R programming language may help to understand this concept.

Machine Learning MOOC Instructor Shares Insights

Hadelin de Ponteves is the primary instructor of Machine Learning A-Z on Udemy. This blog post is a quick summary of a podcast featuring Hadelin and his perspective on data science.

Before building a Machine Learning MOOC, Hadelin worked at Google and Canal+, which is the French competitor to Netflix. He stated that his biggest challenge in data science was to build a recommender system for Canal+. Recommender systems are based on algorithms that suggest movies for the user to watch.

He states he was able to quickly land a job after graduating from college. Many corporations are starting data science teams, so demand is high.

What is Machine Learning?

Machine Learning is a broad field. It can be used to predict the future. It can be used to find an unknown. It covers many sub-fields, and can also be referred to as Artificial Intelligence. Essentially, it involves machines that learn how to do things.

Data science and machine learning go hand in hand. Linear regression is an example of data science that uses machine learning. Two other important areas are classification and clustering. Other sub-fields are association rule learning, reinforcement learning, deep learning, and natural language processing.

R and Python are the most commonly used tools. They have great libraries for machine learning. Hadelin’s Machine Learning MOOC covers all sub-fields, and it gives examples in both R and Python.

R vs. Python, Which Is Better?

The debate over R vs. Python is fruitless. The fact is they are both widely used, and each one has its strengths. If you are new to data science, then you should get familiar with both languages. This is the best way to learn which tool you prefer. For example, Hadelin prefers R for visualization. This is the tool he used at Canal+. For deep learning, he prefers Python.

Finally, Hadelin states the best way to learn is to solve challenges based on real-world problems. His favorite book on the topic is Data Science for Business. Once you have a grasp of data science concepts, this book will add value to your understanding.

PHP Developer Job Description from a PHP Web Developer

A PHP developer job description will vary depending on who you ask. This is what I believe after working for one year as a PHP developer. Part of being any kind of developer is fitting nicely into the job you are hired for. In my case, I was hired as a developer to work on web applications for a factoring company. I knew from the start that the programming team was small. I would be given projects by the IT system engineer, and would be required to participate in conference calls with company executives on a weekly basis. There was another programmer available to help me with some of my projects as needed. This was my first job contract as a web developer. This article describes the things I do almost daily as part of my PHP developer job description.

The first thing I probably do for every session is update my repository. My company uses Tortoise SVN for its version control. I can have multiple sessions in one day, thus repository updates can happen multiple times as well. I quickly learned this would be an important habit to start.

You Should Also Know SQL

I’ve only been on this job a year. My first projects were limited to web applications that provide financial reports for management. The system engineer felt this was the best way to get me familiar with the database. It made sense, because I would be using the database, but only reading from it. In other words, I couldn’t do much wrong, since I wasn’t doing any inserts, updates, or deletes on the database. And this brings me to a very important point. Much of a PHP developer job description includes other programming skills besides PHP. I would also be using SQL, on a Postgres database. SQL was familiar to me, but not to the extent that I needed it to be. I started using Stack Overflow early in my job as a valuable resource for getting answers to programming questions.
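
To illustrate the kind of read-only reporting query involved, here is a minimal sketch in Python with psycopg2 (my actual work was in PHP). The connection settings and the invoices table are hypothetical:

```python
# Minimal sketch of a read-only financial report query on Postgres.
# The connection string and table/column names are made up.
import psycopg2

conn = psycopg2.connect("dbname=reports user=readonly")
with conn.cursor() as cur:
    # SELECT only: no inserts, updates, or deletes.
    cur.execute(
        "SELECT client_name, SUM(amount) AS total "
        "FROM invoices GROUP BY client_name ORDER BY total DESC"
    )
    for client, total in cur.fetchall():
        print(client, total)
conn.close()
```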

Be Ready to Learn New Stuff

PHP developers should know other programming skills like HTML, CSS, and JavaScript. I also use the Bootstrap framework because it allows me to develop faster, and it makes the web pages look modern. I had not used much JavaScript before this job, and did not think I would need it much. This was a big mistake. And this brings me to my next important point. If you don’t know something that you need, then start learning it immediately.

Companies do not hire programmers only for what they know, but also for their ability to solve problems and learn quickly. In my case, I signed up for some JavaScript courses on Udemy. I also took a course on how to use AJAX calls in web applications, and this really paid off. The users loved how the pages would load because of the AJAX. And I learned Google Charts to represent data visually.

As for coding, I work on PHP scripts virtually every day. Another skill I’ve picked up is writing code that can be reused in scripts. I break down every project into small parts, and this allows me to write code in small parts. Good naming conventions for variables, files, and folders help produce code that is easy to copy and paste into new scripts. It becomes a huge time saver. Last but not least, commenting is huge! If nothing else, write good comments for your own benefit. Assume that the next time you look at your code, you will have forgotten everything. You will appreciate being able to read what your code does in layman’s terms.

PHP Developer Job Description Summary

  • Use Great Communication Skills: Understand the difference between talking to people inside and outside of IT at your company. Learn to take feedback and input from people outside of IT. These are the users of the web applications you build.
  • Keep Your Repository Updated: Version control can intimidate people new to programming. But understanding how it works, and using it are vital to practically all developer jobs.
  • Know SQL: PHP is a server-side scripting language. It goes together with SQL like peas and carrots.
  • HTML, CSS, and JavaScript are Part of the Deal: If you think the company will appoint someone to write your JavaScript, then you may be dreaming – or unemployed.
  • Stay on Your Toes: More than being just a web developer, you are a problem solver! Web technology moves fast, so keep your thirst for knowledge. Learn new things that you can apply and incorporate into your work.
  • Reuse Code: Once you begin to reuse code, then you will write code with consideration for it being easy to reuse. Comment well. This will help you become a fast and efficient developer for your company.


ITIL V3 Foundation Study Material – Know 5 Phases

[Image: ITIL V3 Foundation study material]

A good resource for ITIL V3 Foundation study material is crucial for a successful result on the exam. ITIL was acquired by AXELOS in 2013. It is a widely adopted framework for any business that needs to align its IT services with its business services.

Individuals who wish to take the ITIL Foundation exam should know that AXELOS has delegated the study courses and exam vouchers to numerous accredited organizations. This is why a search for ITIL V3 Foundation study material will yield results from multiple organizations. They are all vying for your business, whether to take their course or to sell you the exam voucher.

It is also worth knowing that some organizations provide package deals where the course and exam voucher can be purchased together. In some instances, these packages may be about the same price as purchasing a single exam voucher. Shop wisely, and look for reviews on courses that interest you.

Once you find a resource for ITIL V3 Foundation study material, you should internalize the ITIL paradigm for the IT service life-cycle. Doing so quickly will best prepare you for the exam.

The ITIL Service Life-Cycle

You should know that ITIL breaks down the service life-cycle into five phases.

These phases are:

  1. Service Strategy
  2. Service Design
  3. Service Transition
  4. Service Operation
  5. Continual Service Improvement

Within each phase, there is a set of processes. It is a bit of memorization work, but it is worthwhile to know the phase under which any given process falls. For example, demand management and financial management are two processes that fall within the service strategy phase. Incident management, problem management, and event management are processes that fall within the service operation phase.
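
One way to drill this mapping is to write it down as a simple data structure. The sketch below lists only the processes mentioned above; each phase contains more, which your study material will cover:

```python
# A simple memorization aid: map each ITIL phase to example processes.
# Only the processes mentioned in this article are listed.
itil_phases = {
    "Service Strategy": ["Demand Management", "Financial Management"],
    "Service Design": [],
    "Service Transition": [],
    "Service Operation": [
        "Incident Management", "Problem Management", "Event Management",
    ],
    "Continual Service Improvement": [],
}

for phase, processes in itil_phases.items():
    print(phase, "->", ", ".join(processes) or "(see study material)")
```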

Intimately knowing the five phases, and the sets of processes within each phase, is probably the best tip for anyone who has to learn the ITIL V3 Foundation study material. Another tip is to understand that ITIL has its roots in a 1980s project with the UK Government’s Central Computer and Telecommunications Agency. From there, it organically grew into a globally recognized, vendor-neutral framework. Having this frame of reference should validate the time and effort it takes to learn the study material.

Internet Protocol Layer – Beauty in Simplicity

[Image: internet protocol layer]

The Internet Protocol layer is one part of the four-layer architecture of the TCP/IP model. This layer is responsible for transmitting packets of information across the network. It has no concern with the responsibilities of the other layers in the model. This narrow focus of the Internet Protocol layer allows network engineers to deal with a small piece of a very large and complex challenge. It is sometimes referred to as the Internetwork Protocol, because it deals with getting messages from network to network.

A nice feature of IP is that it does not have to be perfect. It’s designed in a way that data can sometimes get dropped, or sent different ways, but in the end the system corrects itself and ultimately works. This layer had to introduce, and relies heavily on, the address of the destination host. This is what we call the IP address.

The IP address format is four numbers separated by dots. Each number is between 0 and 255. The address is broken into two parts. The prefix is the network number. The second part is the computer number within the network. For example, a college campus could have one network number, so the prefix in the IP address will be the same for every computer on that network. When a packet of information comes zooming across the internet for that campus, the routers only look at the prefix, i.e., the network number. This greatly simplifies the job of the router, and allows routers to work very fast. Once a message reaches the destination network, it is up to that network to forward the message on to the correct computer.
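
A minimal sketch using Python’s standard ipaddress module shows the prefix idea; the campus network below is an example value:

```python
# Minimal sketch of the network-prefix idea using Python's standard
# ipaddress module. The campus network number is a made-up example.
import ipaddress

campus = ipaddress.ip_network("141.211.0.0/16")  # hypothetical prefix
host = ipaddress.ip_address("141.211.144.188")

# A router only asks: does this address fall inside my network prefix?
print(host in campus)              # True
print(campus.network_address)      # 141.211.0.0
```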

DHCP for Computers that Move Around

[Image: network address translation]

Dynamic Host Configuration Protocol (DHCP) is the technology that allows someone to take their laptop to a school, then a coffee shop, and then home, and everything still works. The user can still send messages back and forth regardless of location. This is because whenever someone opens their computer at a coffee shop, or wherever, the computer sends out a message saying, “Hey, I’m here, please give me a number to use on your network.” However, you may have noticed that wherever you are, your IP address often starts with 192.168. This is actually a non-routable address that you get through a technology called Network Address Translation (NAT). You only see this non-routable address; you do not see the real unique address assigned to you by the network.
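
Python’s standard ipaddress module can confirm that 192.168 addresses are private (non-routable); a quick sketch:

```python
# A quick check that 192.168.x.x addresses are non-routable (private).
import ipaddress

print(ipaddress.ip_address("192.168.1.5").is_private)  # True
print(ipaddress.ip_address("8.8.8.8").is_private)      # False
```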

Time to Live Saves Internet Protocol Layer From Infinite Loops

Because routers work imperfectly with imperfect information, they can occasionally send packets of information round and round through the same subset of routers. If this process were never to stop, an infinite loop would form. The router mistakenly thinks it is routing the packet correctly; it does not know it is looping the packet. This problem gets corrected with a Time to Live (TTL) field inside the packet. TTL starts at a number, say 30, and each time the packet passes through a router, the router subtracts one from the TTL field. If TTL goes down to zero, meaning the packet has made 30 hops, then the packet gets thrown out. When a packet gets thrown out, a notification is sent back to the sending computer to inform it that there was a problem. The computer can then send the packet again until it successfully hops its way across the internet. If the sending computer wants to find out exactly when and where the packet got thrown out, it can run a program called Traceroute to diagnose the problem.
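
A toy simulation makes the mechanism clear; the route and TTL value below are made up:

```python
# A toy simulation of Time to Live: each hop decrements the packet's
# TTL, and the packet is thrown out when TTL reaches zero.
def forward(packet, route):
    for router in route:
        packet["ttl"] -= 1
        if packet["ttl"] == 0:
            print(f"{router}: TTL expired, packet thrown out")
            return False             # sender would be notified
    return True                      # packet delivered

packet = {"ttl": 3, "data": "hello"}
looping_route = ["A", "B", "C", "A", "B"]   # a routing loop
forward(packet, looping_route)              # expires at router C
```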

The simplicity of how routers work is one reason why the TCP/IP model succeeded. Routers don’t have to worry about the order of packets, and they don’t have to store information; they just forward packets according to their best guess. They don’t have to be perfect. This allowed the internet to be scalable, and to grow quickly.

Network Infrastructure Evolution

[Image: store and forward networking]

Network infrastructure evolution begins with the store and forward networking model. This model was how early network adopters (1960s to 1980s) would send messages back and forth between host computers. While being able to send a message across a network infrastructure was a revolutionary computing breakthrough, big deficiencies did not go unnoticed. With this model, messages got sent one at a time. They would travel through a series of hops from one computer to the next. When a message was received by an intermediary computer, it would be stored there, and then forwarded to the next computer once the line was open. A big problem was that a long message would clog the system, and drastically slow down the delivery of other messages waiting in the queue. Another problem was that there was no built-in method for dynamically addressing outages in the network.

[Image: packet switching]
The idea of packet switching led to a shared network infrastructure.

After more than 20 years of research into the problems of store and forward networking, the idea of packets emerged. With packet switching, a message is broken into small packets. The packets get sent out onto the network to find their way. These packets also have to traverse a series of hops. However, because messages are broken into smaller packets, resources for transmitting data are shared more effectively. Further, packets of the same message are not required to take the same series of hops to reach their final destination. The packets have no regard for how they find their way, but the receiving end knows when all the packets of a message have arrived, and how to reassemble them into the complete message.
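
A toy sketch of this idea: split a message into numbered packets, let them arrive out of order, and reassemble by sequence number:

```python
# A toy illustration of packet switching: a message is split into
# numbered packets, which may arrive out of order, and the receiver
# reassembles them by sequence number.
import random

message = "Packets can take different routes across the network."
packets = [(seq, message[i:i + 10])
           for seq, i in enumerate(range(0, len(message), 10))]

random.shuffle(packets)                 # packets arrive out of order
reassembled = "".join(chunk for _, chunk in sorted(packets))
assert reassembled == message
```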

This notion of packet switching led to the shared network infrastructure that we use in our TCP/IP networks today. With this notion, the network of big computers evolved into a shared network of small routers. The main purpose of the routers is to forward packets. Moreover, a single router became less critical than a single computer was in store and forward networking. In that model, one computer played a critical role in the reliability of the whole network. With many more routers set up everywhere with the sole purpose of forwarding packets, it was no longer so critical if one router went offline. There would be other paths available for the packet to be routed through.

[Image: network infrastructure]
The TCP/IP layered network model.

Still, reliability remained a big problem. The way you solve a big problem is to break it down into a set of smaller problems. Then you can focus on solving each smaller problem. Breaking down this problem led to the layered network model. There were several variations on how many layers the problem got broken into, but the model that became most popular is the TCP/IP (Internet Protocol Suite) model.

The TCP/IP model consists of four layers. They are Application, Transport, Internet, and Link. So to solve the whole problem of internet reliability, you can focus on one layer at a time. Each layer presents a difficult problem in itself, but it is manageable.

When discussing the evolution of our shared network infrastructure, it must be noted that there is also a model called the seven-layer OSI model. The Open Systems Interconnection model competed with the TCP/IP model as the preferred model for building out the internet. TCP/IP has won the mindshare, but the OSI model remains valid.

Definition for Open Source Software is Linux

If you searched the definition for open source software, it would make sense to find a description of Linux.

[Image: definition for open source software creator]
Linus Torvalds is the creator of Linux – the definition for open source software.

Linus Torvalds is the software engineer who wrote Linux. It started as a personal project and grew to become the largest community-driven computing effort ever recorded. Linux is considered an open source version of Unix. The file system is hierarchical, with the top node referred to as the root. Additionally, processes, devices, and network sockets are all represented by file-like objects. The benefit is that these representations can be worked with as if they were regular files.
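
For example, on a Linux system you can read kernel information from /proc as if it were a regular file; a small sketch:

```python
# On a Linux system, kernel and process information is exposed as
# file-like objects that can be read like regular files.
with open("/proc/uptime") as f:
    uptime_seconds = float(f.read().split()[0])
print(f"System uptime: {uptime_seconds:.0f} seconds")
```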

Linux is a multitasking, multiuser operating system. Its built-in networking and service processes are known as daemons in the UNIX world. To understand the power and popularity of Linux, just consider that it powers roughly 80% of financial transactions, and 90% of supercomputers.

What probably earns it the definition for open source software is that it is a collaborative effort. Technical skills and willingness are all you need to contribute to the effort. The Linux kernel is 15 million lines of code. A major new kernel release comes out every two to three months. This rate of development is unmatched in the industry. Thousands of developers contribute to its evolution, but Linus Torvalds has ultimate authority over new releases.

Arguably, the most important decision Torvalds ever made was to license Linux under the GPL. This gave people the freedom to use, change, and share Linux.

The Linux Community

If you work in Linux, then at some point you will want to engage with the Linux community: you can post queries on relevant discussion forums, subscribe to discussion threads, and even join local Linux groups.

The popularity of Linux at the enterprise level helped create an ecosystem of enterprise support, with contributions coming from major tech companies. IBM is recognized as one notable contributor.

Linux users connect with each other the following ways:

  • Linux User Groups (both local and online)
  • Internet Relay Chat (IRC) software (such as Pidgin and XChat)
  • Online communities and discussion boards
  • Newsgroups and mailing lists
  • Community events (such as LinuxCon and ApacheCon)

The most powerful resource for the Linux community is linux.org. This site is hosted by the Linux Foundation. It has many discussion threads, tutorials, and tips.

MVC PHP Framework Code Igniter

[Image: MVC PHP Code Igniter]

Code Igniter is an MVC PHP framework. MVC stands for model-view-controller. This design pattern is widely used in the software industry. It is a proven architecture for developing good software.

MVC architecture has two main advantages. First, it allows for easy reuse of code. Second, it separates the components of development.

Three MVC Components

The first component of development concerns where to store the information. This component is handled by the Model. The Model interacts with the database.

The second is the UI or UX (user interface or user experience). This is what the user sees, and it is handled by the View.

Finally, there is the processing of information. This prepares the information to either be displayed to the user, or stored in the database. Processing of information is handled by the Controller.

In other words, the Model is the memory of MVC, the View is the face of MVC, and the Controller is the heart of MVC.
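
Code Igniter itself is PHP, but the division of roles can be sketched in any language. Here is a hypothetical minimal illustration of the three components in Python:

```python
# A language-agnostic sketch of the MVC split. The class contents are
# made up for illustration; a real Model would talk to a database.

class Model:
    """The memory: stores and retrieves data."""
    def __init__(self):
        self._users = {1: "Ada"}
    def get_user(self, user_id):
        return self._users.get(user_id)

class View:
    """The face: formats data for display to the user."""
    def render(self, name):
        return f"<h1>Hello, {name}!</h1>"

class Controller:
    """The heart: processes requests, connecting Model and View."""
    def __init__(self, model, view):
        self.model, self.view = model, view
    def show_user(self, user_id):
        name = self.model.get_user(user_id) or "guest"
        return self.view.render(name)

print(Controller(Model(), View()).show_user(1))  # <h1>Hello, Ada!</h1>
```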

Code Igniter for MVC PHP

Code Igniter implements this design pattern in an easy-to-understand manner. Code Igniter is lightweight and agile. It does not take up many resources on your computer. It has many built-in features that allow you to write PHP quickly and easily. Some background in object-oriented programming is helpful to fully understand Code Igniter.

Search Engine Optimization Tools to Start a Site Audit

It is a good idea to start a website audit with some general search engine optimization (SEO) tools. Browseo is a good SEO tool. It allows you to see your website the same way a search engine does.

To use Browseo, simply type in the URL you want to analyze, and wait for the results to come up. On the right hand side of the results, there is information regarding how many words and links are on the page. You can also see the given title of the page, keywords, and meta description.

Another useful result is the SERP (“Search Engine Results Page”) preview. Analyzing the SERP may uncover some ways to modify the meta description to encourage more click-throughs.

Being able to see all this information, along with the character count for each section of the page, makes this a useful SEO tool. Having this information grouped together allows you to easily check if it aligns with best SEO practices.

Check Domain Before you Buy

It is suggested that you check a domain name before you buy it. There is a chance the domain was used by someone before you. In most instances, the domain will be okay. However, if it was ever used for malicious or spam activity, then it may be very difficult to rank with that domain.

There is a free SEO tool called the Wayback Machine. This site archives web pages, and allows you to see what they looked like in past years. Type in the domain and browse the history.

[Image: search engine optimization tools]

Look closely at the different archives of the domain in question. Check for spam content, such as link farming, or anything that looks malicious. If you see this, then that domain could work against any SEO efforts.

If your domain has been marked negatively by Google, it is possible to submit the domain to Google and ask for reconsideration. However, this can be a time-consuming task. Depending on where you are in the process, it may be better to register a different domain. Your best bet is to check a domain before you buy it.