Puppeteer.JS – Using Headless Chrome for Site Crawling

PuppeteerJS essentially allows you to automate Chrome.
Headless Chrome lets you run Chrome without opening a visible browser window. That may sound silly, but it has a lot of useful applications: for example, you could write a simple test script that checks that your website is still working correctly.

Installation

npm i puppeteer
# or
yarn add puppeteer

Usage

We are going to look at a quick example of how to log in to a site and then perform an operation.

Initialize Puppeteer

Puppeteer's API is asynchronous: every call returns a promise, simply because you do not know how long it will take until Chrome has started or a page has loaded. The easiest way to work with it is inside an async function.

With puppeteer.launch() we start our browser. The headless flag is set to ‘true’ by default; for debugging purposes, however, you should set it to ‘false’.
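A minimal sketch of the start-up step, assuming the puppeteer package from the Installation section is installed. The helper name openBrowser is made up for this example and is not part of Puppeteer.

```javascript
// Sketch only: `openBrowser` is a hypothetical helper, not a Puppeteer API.
async function openBrowser({ headless = true } = {}) {
  // Loaded lazily, so this file can be parsed even before puppeteer is installed.
  const puppeteer = require('puppeteer');
  // launch() resolves once Chrome has started, which is why it is awaited.
  const browser = await puppeteer.launch({ headless });
  // A page corresponds to one browser tab.
  const page = await browser.newPage();
  return { browser, page };
}
```

Inside another async function you would call `const { browser, page } = await openBrowser({ headless: false });` while debugging, and `await browser.close();` when you are done.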

Login

To log in to the site we need three things:
* The URL for the Login Page
* CSS Selector for the Username Field
* CSS Selector for the Password Field

To obtain the selectors you can use the Chrome DevTools (F12). Simply select the HTML field, right-click it and choose Copy > Copy selector.
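The login step could look roughly like the sketch below. The function name login and its option names are hypothetical, and it assumes that pressing Enter in the password field submits the form; if your site needs a click on a submit button instead, use page.click with that button's selector.

```javascript
// Sketch only: `login` and its option names are made up for illustration.
// `page` is a Puppeteer Page object.
async function login(page, { loginUrl, usernameSelector, passwordSelector, username, password }) {
  await page.goto(loginUrl);
  // type() focuses the element matching the selector and sends the key strokes.
  await page.type(usernameSelector, username);
  await page.type(passwordSelector, password);
  // Assumption: Enter submits the form and triggers a navigation.
  // Start waiting before pressing the key so the navigation is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.keyboard.press('Enter'),
  ]);
}
```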

Fetch all Links

Now that you are logged in, you can navigate to any page of the site and fetch all of its links.
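One possible way to collect the links is sketched below. The helper name collectLinks is hypothetical; page.$$eval is Puppeteer's method for running a function in the page against all elements that match a selector.

```javascript
// Sketch only: `collectLinks` is a made-up helper name.
// `page` is a Puppeteer Page object that is already logged in.
async function collectLinks(page, url) {
  await page.goto(url);
  // The callback runs inside the browser and receives every matching <a> element.
  const hrefs = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
  // Remove duplicates while keeping the original order.
  return [...new Set(hrefs)];
}
```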

Final Code

So good they can’t ignore you

Notes on the Book So good they can’t ignore you by Cal Newport

 

Rule 1

Don’t follow your passion

Rule 2

Be so good they can’t ignore you (Importance of Skill)

Rule 3

Turn down the Promotion (Importance of Control)

Rule 4

Think Small, Act Big (Importance of Mission)

Don’t follow your passion

Steve Jobs's story did not start with computers; he only became passionate about them over time.

  • Passion is Rare
  • Passion Takes Time
  • Passion is a Side Effect of Mastery

Be so good they can’t ignore you

(Importance of Skill)

Adopting the Craftsman Mindset

Tice is willing to grind out long hours with little recognition, but that’s because it’s in service to something he’s obviously passionate about and has been for a long time. He’s found that one job that’s right for him.

The Power of Career Capital

If you want a great job, you have to build up rare and valuable skills – career capital

Passion mindset – what can the world offer me

Craftsman mindset – what can I offer the world

Traits that define good Work

  • Creativity
  • Impact
  • Control

Three Disqualifiers for applying the craftsman mindset

  • The job presents few opportunities to distinguish yourself by developing relevant skills
  • The job focuses on something you think is useless or actively bad for the world
  • The job forces you to work with people you really dislike

Becoming a Craftsman

Deliberate Practice

If you just show up and work hard, you’ll soon hit a performance plateau beyond which you fail to get any better.

To successfully adopt the craftsman mindset, we have to approach our jobs the same way Garry Kasparov approached his chess training: with a dedication to deliberate practice.

Five Habits of the Craftsman

  1. Decide what kind of Capital Market you are in
  2. Identify your Capital Type
  3. Define “Good”
  4. Stretch and Destroy
  5. Be Patient

Turn down the Promotion

(Importance of Control)

Control without career capital is not sustainable.

The point at which you have acquired enough career capital to get meaningful control over your working life is exactly the point at which you’ve become valuable enough that your current employer will try to prevent you from making the change.

The Law of Financial Viability

When deciding whether to follow an appealing pursuit that will introduce more control into your work life, seek evidence of whether people are willing to pay for it. If you find this evidence, continue. If not, move on.

Think Small, Act Big

(Importance of Mission)

The Law of Remarkability

For a mission-driven project to succeed, it should be remarkable in two different ways. First, it must compel people who encounter it to remark about it to others. Second, it must be launched in a venue that supports such remarking.

WebDev – Articles – 01

A short summary of some of the interesting articles I came across in the last couple of weeks.

UI

Most Important Color in UI Design
A very interesting article on the Color Blue and why it is used so much.

Tools

Canary 62 Dev Tools

One of the highlights of the new Dev tools is that you can capture screenshots of a specific node.

  1. Select the Node
  2. Open the Command Menu (CTRL-SHIFT-P)
  3. Type ‘capture node’

https://developers.google.com/web/updates/2017/08/devtools-release-notes?utm_source=feed&utm_medium=feed&utm_campaign=updates_feed

VSCode 1.16

  • HTML finally gets auto-closing tags. However, this is not enabled for JSX, so there you still need to install the Auto Close Tag plugin
  • Support for TypeScript 2.5
  • Faster refactorings for TypeScript: select a code segment, right-click and choose “Extract Function”

https://code.visualstudio.com/updates/v1_16

WhatRuns

A useful tool to detect which technologies are used by a specific website.
https://www.whatruns.com/

Security

Edge Insecure by Design: Bypass Content Security Policy (CSP)
A reported security issue that affects Safari, Chrome and Edge was fixed in Safari and Chrome. Microsoft responds: Works as Designed.

Random Articles

Trending Developer Skills based on Job Descriptions
An interesting approach: look at the vacant job positions and conclude from them which technologies are currently being used in production stacks.

(tl;dr: Rapid Rise of React, NodeJS and Postgres)

Why Coding Style Matters
A short article about the benefits of a clear and consistent coding style.
“Programs are meant to be read by humans and only incidentally for computers to execute.”
— H. Abelson and G. Sussman (in “Structure and Interpretation of Computer Programs”)

VSCode: Launch create-react-app and Chrome with launch.json

When developing React (with create-react-app) in Visual Studio Code, you usually press Ctrl-` and then run npm start.
The script from create-react-app then automatically starts a browser, which you then close.
Then you reopen it by pressing F5 to start Chrome in debug mode.

Let’s make these steps a little quicker.

Create a Task for npm start

Press Ctrl-Shift-P and select “Tasks: Configure Default Test Task”
This will create a tasks.json file.

In the tasks.json file you need to add a couple of values:
* isBackground: true – launch.json will not wait until the task completes
* problemMatcher – needs to be defined so that VS Code can tell when the task has completed its initialisation phase and it is safe to continue with the next task
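As a rough sketch, the task could look like the object below, written as a small Node script that prints the JSON to paste into .vscode/tasks.json. The label "npm start" and the two background patterns are assumptions: they guess at what create-react-app prints when it starts and when it has finished compiling, so adjust them to the output you actually see.

```javascript
// Sketch only: prints a candidate tasks.json. Label and patterns are assumptions.
const tasksJson = {
  version: '2.0.0',
  tasks: [
    {
      label: 'npm start',          // assumed name, referenced later by preLaunchTask
      type: 'npm',
      script: 'start',
      isBackground: true,          // do not wait for the task to exit
      problemMatcher: {
        owner: 'custom',
        pattern: { regexp: '^$' }, // no real problems are parsed, only start/end markers
        background: {
          activeOnStart: true,
          // Assumed create-react-app console output:
          beginsPattern: 'Starting the development server',
          endsPattern: 'Compiled successfully|Failed to compile',
        },
      },
    },
  ],
};

console.log(JSON.stringify(tasksJson, null, 2));
```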

Configure create-react-app

To prevent the browser from launching, add the following line to your .env file:

BROWSER=none

More Info:
* .env-file

Configure the launch.json file

Press F5 and select Chrome and a launch.json file will be created.
* Change the port to 3000 (create-react-app default)
* Add a preLaunchTask to start the task we defined earlier
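A sketch of the resulting configuration, again written as a Node script that prints the JSON for .vscode/launch.json. The configuration name, the webRoot value and the task label "npm start" are assumptions; preLaunchTask has to match whatever label your task in tasks.json actually uses.

```javascript
// Sketch only: prints a candidate launch.json. Name, webRoot and task label are assumptions.
const launchJson = {
  version: '0.2.0',
  configurations: [
    {
      type: 'chrome',
      request: 'launch',
      name: 'Chrome (create-react-app)',
      url: 'http://localhost:3000',        // create-react-app default port
      webRoot: '${workspaceFolder}/src',   // VS Code variable, kept as a literal string
      preLaunchTask: 'npm start',          // must equal the task label in tasks.json
    },
  ],
};

console.log(JSON.stringify(launchJson, null, 2));
```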

Start Working

Tadaa, now you press F5 and can start debugging directly in vscode. The background task will continue running even when you stop debugging.

Stop Working

You need to terminate the task via Ctrl-Shift-P > Tasks: Terminate Task. (Or you just close VS Code.)

VSCode Extensions (August 2017)

Here are a couple of VSCode Extensions I am currently using.

tl;dr

Script to install all suggested Plugins
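A minimal sketch of such a script in Node, assuming the code command is available on your PATH. The function name installAll is made up; the extension ids are the ones listed in this post.

```javascript
// Sketch only: installs the extensions from this post via the `code` CLI.
const { execFileSync } = require('child_process');

const extensions = [
  'yzhang.markdown-all-in-one',
  'formulahendry.auto-close-tag',
  'formulahendry.auto-rename-tag',
  'eamodio.gitlens',
  'wayou.vscode-todo-highlight',
  'christian-kohler.path-intellisense',
];

// `run` can be swapped out, e.g. for a dry run that only logs the commands.
function installAll(run = execFileSync) {
  for (const id of extensions) {
    run('code', ['--install-extension', id], { stdio: 'inherit' });
  }
}
```

Calling `installAll()` runs the installs one after another; `installAll((cmd, args) => console.log(cmd, args.join(' ')))` only prints the commands.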

Installing Extensions

There are two ways of installing extensions. Either in the editor or via command line.

Editor

  1. Press Ctrl-P
  2. Paste Code like ext install eamodio.gitlens
  3. Click on install

Command Line

  1. Open command line
  2. Enter command code --install-extension eamodio.gitlens

Markdown All in One

A little tool that makes it easier to format Markdown documents.
Adds useful shortcuts like ctrl-b to make something bold.

code --install-extension yzhang.markdown-all-in-one

Auto Close Tag

If you do not use Emmet shorthand, this is very useful for closing your HTML tags.

Sadly, the plugin cannot tell when you are writing an angle-bracket type cast in TypeScript and tries to close the cast as if it were a tag.

code --install-extension formulahendry.auto-close-tag

Auto Rename Tag

When you rename an HTML tag, this plugin renames the matching closing tag as well. It also works for JSX and TSX files.

code --install-extension formulahendry.auto-rename-tag

GitLens

A great extension to vastly improve your git experience in vscode

code --install-extension eamodio.gitlens

TODO Highlight

Well, it highlights TODOs… so, well, that’s good.
You can extend it to display any type of label in your comments.

Plus it can generate a handy list of all todos in the project.
(Sadly jumping to the TODO in the List does not work in Ubuntu)

code --install-extension wayou.vscode-todo-highlight

Path Intellisense

Well, it detects when you are trying to define a path to a file and autocompletes it for you.

In TypeScript it also correctly removes the file extension for imports.

code --install-extension christian-kohler.path-intellisense

edX – Microsoft: DEV275x Writing Professional Code

The course DEV275x Writing Professional Code is a very short introduction to best practices when it comes to writing code.
As usual, these are only the notes I took during the course; you definitely should check out the course for yourself at
https://courses.edx.org/courses/course-v1:Microsoft+DEV275x+2T2017/course/

Module 1: Elements of Professional Code

Source Control with Git

Source Control is one of the most important aspects of programming.

  • Backup of your Source Code
  • Ability to compare with changes done in the past
  • Restore previous versions if something goes wrong with the new version
  • Easy collaboration with other people

There are many different Software packages that enable Source Control.
Currently the two most popular systems are Git (70% of Programmers) followed by SVN (10% of Programmers) (Survey of 30k Developers)

The core difference between the two is that SVN requires a dedicated source control server, where all changes are tracked.
Git is distributed, so you can use it purely locally and, if you choose, in combination with a server.

Code editors like Visual Studio Code have Git integrated directly, which makes it really easy to set up and use Git.

Programs:
* Git

Cloud Providers
* Gitlab For Private Repos
* Github For Public Repos

Workflows
* Comparing Workflows
* GitLab Flow

Markdown

Markdown is really great because you can learn it really fast, and even if you do not convert Markdown into an HTML site or PDF, the text is still neatly formatted and readable.

As with Git, you find support for Markdown files in common editors like Visual Studio Code and Atom.
And of course blogging software like WordPress has plugins that enable Markdown for posts.

Module 2: Communicate with Code

This chapter was rather interesting: it focused on how smaller things like code conventions actually help to improve the codebase.
While the presenter did not use automated tools to improve code readability, it was nice to see how important it is to get the really basic elements right.

Consistency and Naming

Code should always be formatted in the same manner. This improves readability and removes personal style from the code, enabling all developers to immediately take ownership of it instead of saying, well, that is the style of developer A, he should fix it.

Naming is important and greatly improves the readability of the code. It does not help to say var c = 0; it is much better to say var beanCounter = 0.
You do not write code for the computer but for other human beings. The compiler will convert it into machine code, but you will probably not have to debug that.

Refactor Duplicate Code

Having a lot of duplicate code in the code base is a big problem. As soon as some minor change alters how things are done, you have to go back and change every place where that piece of code is used.

Refactoring early reduces the risk that the next developer says, well, I will just do that with copy and paste.
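A small illustration of the idea, not taken from the course: the price formatting that would otherwise be copy-pasted into both functions lives in one helper, so a change to the format happens in exactly one place. All names here are invented for the example.

```javascript
// The shared rule lives in one place instead of being copied around.
function formatPrice(amount) {
  return `${amount.toFixed(2)} EUR`;
}

// Both callers reuse the helper rather than repeating toFixed(2) + ' EUR'.
function cartLine(item) {
  return `${item.name}: ${formatPrice(item.price)}`;
}

function invoiceTotal(items) {
  const sum = items.reduce((total, item) => total + item.price, 0);
  return `Total: ${formatPrice(sum)}`;
}
```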

Simplify

This one is rather difficult, but keeping the code and its structures simple and readable benefits maintainability far more than some complex structure that executes a microsecond faster. Of course, that depends on the program you are writing.

As a rule of thumb, functions should be rather short, not hundreds of lines long. (Too short is also bad.)
If you need to add complexity, you should also document why you are adding it and how best to understand that complex structure.

Module 3: Code Confidently With Unit Tests

Writing unit tests, and having tests for your code in general, a) documents the use cases of your code and b) shows you what else may have broken while you were developing a new feature.
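As a tiny illustration (my own example, not from the course), here is a function with a few checks written with Node's built-in assert module. The checks double as documentation of the intended use cases, and they fail loudly if a later change breaks one of them. The function slugify is invented for this sketch.

```javascript
const { strictEqual } = require('assert');

// Turns a post title into a URL-friendly slug.
function slugify(title) {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // every run of other characters becomes one dash
    .replace(/^-+|-+$/g, '');    // no leading or trailing dashes
}

// Each check describes one use case.
strictEqual(slugify('Hello World'), 'hello-world');
strictEqual(slugify('  VSCode Extensions (August 2017) '), 'vscode-extensions-august-2017');
strictEqual(slugify(''), '');
```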

package.json: Updating Fixed Versions with npm-check

One of the common problems when running a larger project is that you need to use fixed versions in your package.json file. But at the same time you need to regularly update your packages.
The most elegant way is using npm-check. The small tool allows you to select which packages should get an update and update accordingly.

Installation

npm i -g npm-check

Usage

To update the packages in your project, simply run npm-check -u. If you want exact versions to be saved to package.json, add the optional -E flag.

npm-check -u -E

Press Space to select packages and Enter to install the selected ones.
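To check afterwards that package.json really contains only fixed versions, a small helper along these lines could be used. It is a sketch of my own and not part of npm-check; the function name findRangedVersions is made up, and it treats anything that is not a plain x.y.z version (optionally with a pre-release suffix) as not fixed.

```javascript
// Sketch only: lists dependencies whose version is a range such as ^1.2.3 or ~1.2.3.
function findRangedVersions(dependencies = {}) {
  // Matches exact versions like 1.2.3 or 1.2.3-beta.1
  const exact = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;
  return Object.entries(dependencies)
    .filter(([, version]) => !exact.test(version))
    .map(([name, version]) => `${name}@${version}`);
}
```

For example, you could pass it `require('./package.json').dependencies` and fail your build if the returned list is not empty.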