Potential for Hack-Back Legislation

Government officials and experts are weighing in on the concept of ‘hacking back’, the practice of potentially allowing U.S. companies to track down cyber attackers and retaliate.

General Michael Hayden, former head of the CIA and NSA, outlined his thoughts to Fifth Domain on the hack-back issue currently being debated by Congress. He is cautious but has expressed an openness to allowing some level of retaliation by private organizations.

General Hayden is very sharp and brings unprecedented national intelligence experience to the table, but I must disagree with his position on the risks of enabling businesses to ‘hack back’.

I have had the pleasure of an in-depth 1:1 discussion with him regarding the long-term nation-state threats to the digital domain and have always been impressed with his insights. However, this is a different beast altogether.

Allowing U.S. companies latitude to hack back against cyber attackers is very dangerous. I believe he is underestimating the unpredictable nature of business management when they find themselves under attack. Unlike U.S. government agencies, which firmly align themselves to explicit guidance from the Executive branch, the guard-rails for businesses are highly variable and can be erratic. Decisions can be made quickly, driven by heated emotion.

The average American business does not understand the principles of active defense or proportional damage, and lacks the insight to establish and operate within specific rules of engagement. They certainly don’t have the capacity to determine proper attribution, gather necessary adversarial intelligence, or even understand the potential collateral damage of weapons they may use.

Instead, we can expect rash and likely volatile responses that lash out at perceived attackers. Unfortunately, cyber adversaries will quickly seize on this behavior and make their attacks appear as if they are coming from someone else. It will become a new sport for miscreants, anarchists, social radicals, and nation states to manipulate their targets into hacking-back innocent parties. As the meme goes, “On the Internet, nobody knows you’re a dog”.

Hack Back Consequences

What happens when threats impersonate hospitals, critical infrastructure, or other sensitive organizations when they attack? The hack-back response may cause unthinkable and unnecessary damage.

Congress is also considering allowing companies to ‘hack back’. Senator Sheldon Whitehouse recently indicated he is considering a proposal to allow companies to “hack back” at digital attackers.

Weaponizing Businesses

I think the whole “hack back” movement is entirely misguided.

Many compare it to ‘stand your ground’ situations, as they try to rally public support. But such imagery is just not applicable. A better analogy is saying that if someone breaks into your house, you should have the right to break into their home, or the home of whomever you think did it (because you really won’t know). Most would agree it is not a good idea when framed that way.

Now consider whom you will be empowering to make such decisions. Businesses that were not able or responsible enough to manage the defense of their environment in the first place will be given authority to attack back. Yet it is unlikely they will truly understand where the actual attack is originating. They will be acting out of rage and fear, with weapons whose potential collateral and cascading damage they do not understand.

Every time I have heard an executive wanting to be able to ‘hack back’, it was someone who was not savvy in the nuances of cybersecurity and lacked the understanding of how incredibly easy it is to make an innocent third party look like the one conducting an attack. When I brought up the fact that it is easy to make it appear like someone else was behind the strike, such as a competitor, government agency, or hospital, the tone radically changed. Attribution for cyberattacks can take experts months or even years. Businesses have neither the expertise nor the patience to wait when they want revenge.

Simple Misdirection

If allowed, hacking back will become a new sport for miscreants, anarchists, social radicals, and nation states: they will manipulate their adversaries into retaliating against the wrong party, or arrange for those adversaries to be hacked back by others who were fooled into thinking they were the source.

Allowing companies to hack back will not deter cyberattacks; rather, it will become the new weapon for threats to wield against their victims.

Interested in more insights, rants, industry news and experiences? Follow me on Medium, Steemit and LinkedIn for insights and what is going on in cybersecurity.

Originally posted:  https://medium.com/datadriveninvestor/should-companies-be-allowed-to-hack-back-after-a-cyberattack-bd6fd709d0d5

The post Should Companies be Allowed to ‘Hack Back’ after a Cyberattack by Matthew Rosenquist appeared first on Hakin9 - IT Security Magazine.


My co-founder Matt and I are working on Squally, a game in C++ to teach hacking. However, this creates a small problem: If we want to publish the game on Steam, then we can’t force people to download hacking tools first. Our solution? Put the tools inside the game, and let the game hack itself. 

This had another unintended consequence. If the game hacks itself, then we need some low-level control over the game — and for that, we need a low-level language. C# and Unity were no longer an option, so we had to start looking at some C++ options. We eventually settled on Cocos2d-x for our engine due to its popularity for 2D games.

To emulate real hacking tools, we needed only two things. The first is a disassembler for converting the raw machine code into human-readable (or at least nerd-readable) x86 assembly. The second is an assembler to do the opposite.

With these two tools, we effectively have a game that can read and write its own code. We just have to tell the game exactly where to look, which we will explore later.

Editing the code in the game in x86 assembly

Of course this means you can totally crash this game, but as any hacker knows, crashing things is just part of the job.

However, there are some cool ways to mitigate this. One option is to save the state of the program (registers/stack) before the hackable region of code, and restore it after it executes to prevent silly mistakes.

For those who like to live a bit more dangerously, there are libraries for ignoring segfaults and other errors, which is pretty frightening. This is no match for a simple jmp 0x00000000 , but it’s a good start for easing some of the inevitable frustration of an aspiring assembly programmer.

For those interested in the nitty gritty details, the rest of this article will be about setting up some self-modifying code. It’s actually pretty easy.

Just a quick note, all code samples have been uploaded to: https://github.com/Squalr/SelfHackingApp

We’re going to leverage two libraries:
FASM, an assembler: https://github.com/ZenLulz/Fasm.NET
UDIS86, a disassembler: https://github.com/vmt/udis86

There are a few catches as far as the implementation goes. We’ve got an assembler and disassembler. We find an instruction, and disassemble it. We change it to something else, and assemble it back into bytes. Now all we have to do is write those bytes to memory… and the program crashes.

Upon digging deeper, it turns out that the page of memory that contains the code is marked Execute only. It also needs to be readable and writable. On Windows, this can be done using the VirtualProtect function in windows.h. On Unix, this is done via mprotect.

Below, we create a function called hackableRoutine(). Notice the line i += 60. This is the instruction we are going to self-modify. There are a few macros surrounding it, which simply get the start and end addresses of the code we want to hack. These are used to set up our HackableCode object, which is just a thin wrapper over FASM and Udis86.

At our program entry, we call the hackableRoutine() function once to initialize the hackableCode object. It returns 100, as you might expect from the code above.

Next, we update the hackableCode object and change its code to anything we want! In this example, we settle for a simple nop. As you might be able to guess, the second time we call hackableRoutine(), a value of 40 is returned! The i += 60 line has been effectively removed by self-modifying code!

Here is my output, although this can change depending on the compiler:

mov eax, [ebp-0x2c]
add eax, 0x3c
mov [ebp-0x2c], eax

Those with a background in assembly may have noticed that a single nop is far fewer bytes than the instructions generated for i += 60. The remaining bytes are filled with nop instructions automatically behind the scenes.
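The padding step can be illustrated in a few lines. This is only a sketch in Python rather than the game's C++, and the helper name pad_with_nops is hypothetical; it is not taken from the HackableCode class. It assumes the hackable region has a fixed size in bytes and that 0x90 is the single-byte x86 nop opcode.

```python
NOP = 0x90  # single-byte x86 no-op opcode


def pad_with_nops(new_code: bytes, region_size: int) -> bytes:
    """Return new_code followed by enough nop bytes to fill the region.

    Hypothetical helper: the region keeps its original size so that the
    instructions after it stay at the same addresses.
    """
    if len(new_code) > region_size:
        raise ValueError("assembled code does not fit in the hackable region")
    return new_code + bytes([NOP]) * (region_size - len(new_code))


# The three instructions listed above take 9 bytes with the usual 32-bit
# encodings, so replacing them with one nop leaves 8 bytes to fill.
patched = pad_with_nops(b"\x90", 9)
print(patched.hex())
```

Rejecting code that is too long, rather than truncating it, avoids writing half an instruction into the region.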

I’m not going to explain the implementation details around the HackableCode class. As stated before, it’s really just a thin wrapper over Fasm and Udis86. For reference, I’ll put the code for these at the end of this article.

One thing to note is that there is a limitation to this approach. Setting up the hackableCode object requires that the code be run at least once! This is not ideal for all situations. There should be a solution to this. If you find one, drop a comment below! If no C++ wizards beat me to it, I’ll post a follow-up when I figure it out.

Check out this project on Steam!



The post How We Wrote a Self-Hacking Game in C++ by Zachary Canann appeared first on Hakin9 - IT Security Magazine.


Monorail is an open-source issue tracker used by many “Chromium-orbiting” projects, including Monorail itself. Other projects include Angle, PDFium, Gerrit, V8, and the Alliance for Open Media. It is also used by Project Zero, Google’s 0-day bug-finding team.

This article is a detailed explanation of how I could have exploited Google’s Monorail issue tracker to leak sensitive information (vulnerable source code files and line numbers) from private bug reports through an XS-Search attack.

Where to start?

One of the first functionalities I looked into when analyzing Monorail was the ability to download the result of a certain search query as a CSV.

It didn’t take me long to notice that it was vulnerable to a CSRF attack. In other words, it was possible to force a user to download a CSV containing the results of a search query if a malicious link was accessed.


As seen in the image, there were no protections against CSRF attacks. So, for example, a request made with the “Restrict-View-SecurityTeam” tag would end up filtering the results to undisclosed security-related issues only. If a member of the Google security team or a high profile bug reporter were to access this link, they would download a CSV containing all undisclosed issues they have access to.

Duplicate and conquer

Another important discovery was that columns displayed in a search result could be duplicated, allowing us to arbitrarily increase the length of the generated CSV.

To illustrate, if we were to access the URL below:


The downloaded CSV would contain 3 repeated Summary columns, instead of only one.

CSV generated from a query containing the “Summary” column 3 times.

Come again? An XS-Search attack?

Combining these two vulnerabilities we have all that is needed to perform a Cross-Site Search (XS-Search) attack:

  1. Capacity to perform complex search queries.
  2. Capacity to inflate the response of a search query.

The second point is particularly important. If the response of a search query matches a bug, we can make the CSV significantly bigger than a query that doesn’t.

Because of this big difference in response length, it’s possible to calculate the time each request takes to complete and then infer whether the query returned results or not. This way, we achieve the ability to ask cross-origin boolean questions.

The phrase “cross-origin boolean questions” sounds weird, but it essentially means we’re able to ask questions like “is there any private bug that matches the folder `src/third_party/pdfium/`?” and obtain the answer cross-origin. This involves several steps that will be described in the following section.

For now, the examples below demonstrate the core of the issue:

1st case — CSV generated from query “Summary: This bug exists”.

2nd case — CSV generated from query “Summary: This bug doesn’t exist”.

3rd case — CSV generated from query ”Summary: This bug exists OR Summary: This bug doesn’t exist“.

As we can see, in the first and third cases we would have an arbitrarily big CSV, because both queries match a bug with the summary “This bug exists”. In the second case, the CSV would be empty (containing only the header), because the query didn’t match any bug with the summary “This bug doesn’t exist”. Note that in the third case we are using the logical operator OR to query the first and second cases together.

To ask or not to ask?

One of the problems I had when trying to create a PoC was deciding what to search. Monorail’s search doesn’t allow us to query for specific letters in a report, only words. This meant that we couldn’t bruteforce the report char by char.

After realizing this, I had to take a step back and search older bug reports looking for information that was relevant and could realistically be exfiltrated by the attack.

That’s when I learned that many Chromium bug reports indicate the file path and line number where the vulnerability can be found.

Example from https://bugs.chromium.org/p/chromium/issues/detail?id=770148

That’s perfect for an XS-Search attack: since the folder structure of Chromium is public and Monorail treats slashes as word delimiters (a query for “path/to/dir” also includes results for bugs containing the string “path/to/dir/sub/dir”), we can easily generate the appropriate search queries.

So our attack would look something like this:

  1. We find out if there’s any private bug report that mentions a file in Chromium’s source tree. We do this using https://cs.chromium.org/chromium/src/ as the base query.
  2. We search for the first half of all the directories under src/ using the OR operator (e.g. src/blink OR src/build…).
  3. We keep repeating step 2 using the binary search algorithm. If anything was found (i.e. a big CSV was generated), we restrict the search space to the first half. Otherwise (i.e., an empty CSV was generated), we restrict the search space to the second half.
  4. After eliminating all directories but one, we restart step 2, but now adding the newly found directory to the end of the base query.

At the end of this process, the full URL will have been leaked and we can now (as an attacker) look into the corresponding file and try to find the vulnerability that was reported.
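The four steps above can be sketched as follows. This is not the exploit's code (which runs in the browser); it is a minimal Python illustration of the search strategy only. The oracle callable is a hypothetical stand-in for "did this OR query produce a big CSV?", and tree is a hypothetical dictionary mapping each known directory to its children, playing the role of the public Chromium source listing.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for the timing measurement: returns True when a
# query OR-ing the given paths would produce a large (non-empty) CSV.
Oracle = Callable[[List[str]], bool]


def find_matching_entry(candidates: List[str], oracle: Oracle) -> Optional[str]:
    """Binary-search the candidates for one path mentioned in a private bug."""
    if not candidates or not oracle(candidates):
        return None  # nothing at this level is mentioned at all
    while len(candidates) > 1:
        first_half = candidates[: len(candidates) // 2]
        # Steps 2 and 3: OR together the first half and ask the oracle.
        if oracle(first_half):
            candidates = first_half
        else:
            candidates = candidates[len(candidates) // 2:]
    return candidates[0]


def leak_path(tree: Dict[str, List[str]], root: str, oracle: Oracle) -> str:
    """Step 4: descend one level at a time until no child matches."""
    path = root
    while True:
        children = [f"{path}/{name}" for name in tree.get(path, [])]
        found = find_matching_entry(children, oracle)
        if found is None:
            return path
        path = found
```

Each level costs roughly log2(n) oracle calls for n sibling entries, plus one call to check that the level matches at all, which is what makes the attack practical on a large source tree.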

One request to rule them all

You might be wondering how we obtained the size of the CSV in step 3. Since the Same-Origin policy forbids us from accessing information across different origins, a naive response.length won’t work.

While we can’t know for sure the exact size of a response, we can measure the time each request takes to complete. Using the response-length inflation technique covered in previous sections, searches returning a bug would be a lot slower to finish than ones that do not.

However, to achieve a high degree of certainty, simply doing one request isn’t enough. We would need to request the same page many times and measure the average response time to obtain a reliable exploit.

That’s where the Cache API comes in handy: by making only one request and repeatedly measuring how long the response takes to be cached, it’s possible to infer with certainty whether the search query returned bugs or not.

In other words, a small response takes less time to be cached than a bigger response. Given there are almost no limitations to the Cache API (and it is extremely fast), we can cache and measure the same response several times and then compare the result with the measurements of a known empty search query result. This lets us easily differentiate a large response from a small or empty one, filtering out hardware and network variances and increasing the exploit’s speed and reliability.
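The comparison against a known-empty baseline might look like the sketch below. It is written in Python purely to show the decision rule; the real measurements are taken in the browser with the Cache API. The function name, the use of the median, and the ratio threshold of 2.0 are all assumptions made for illustration, not values from the exploit.

```python
from statistics import median
from typing import Sequence


def looks_inflated(samples_ms: Sequence[float],
                   baseline_ms: Sequence[float],
                   ratio: float = 2.0) -> bool:
    """Guess whether a response is large by comparing caching times.

    samples_ms:  repeated caching durations for the response under test
    baseline_ms: repeated caching durations for a known-empty result
    ratio:       assumed threshold; would need tuning on real measurements
    """
    # The median is less sensitive than the mean to an occasional slow run.
    return median(samples_ms) > ratio * median(baseline_ms)


baseline = [1.0, 1.2, 0.9]
print(looks_inflated([9.0, 10.0, 30.0], baseline))
print(looks_inflated([1.1, 0.9, 1.0], baseline))
```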

For more information on how this can be implemented you can check the exploit’s code.


In total, I found three different places where this attack could be carried out, which resulted in CVE-2018-10099, CVE-2018-19334 and CVE-2018-19335.

I was also rewarded $3,133.7 for each vulnerability, totaling over $9,400.


If you have any questions, or caught a typo or something that I missed, feel free to contact me at @lbherrera_


[1] Commit fixing the CSRF in Monorail’s CSV file download (https://chromium.googlesource.com/infra/infra/+/bdb78934b151ac75bf41711797bbf81130c5a502).

[2] Commit fixing the duplicated columns bug (https://chromium.googlesource.com/infra/infra/+/0ff6b6453b6192987bd9240c1e872a7de5fb1313).

[3] Commit disallowing double grid axes and Cc axis (https://chromium.googlesource.com/infra/infra/+/77ef00cb53d90c9d1f984eca434d828de5c167a5).

[4] Commit preventing request inflation through the groupby parameter (https://chromium.googlesource.com/infra/infra/+/e27936ef82d33a5f286e1f2f22817aa682f79e90).

The post XS-Searching Google’s bug tracker to find out vulnerable source code by Luan Herrera appeared first on Hakin9 - IT Security Magazine.


Note to the reader: Python code is shared at the end

This week I had to scrape a website for a client. I realized I did it so naturally and quickly that it would be useful to share it so you can master this art too. [Disclaimer: this article shows my scraping practices; if you have more relevant practices, please share them in the comments]

The plan

  1. Pinpoint your target: a simple html website
  2. Design your scraping scheme in Python
  3. Run & let the magic operate

How much time do you need to scrape a website? A practitioner would take ~10 minutes to prepare the Python script for a simple html website.

Part I: Finding your target (a website)

In my case, I needed to gather the names of banks from their SWIFT codes (or French BIC codes). The website http://bank-code.net/country/FRANCE-%28FR%29.html has a list of 4000+ SWIFT codes with the associated bank names. The problem is that it shows only 15 results per page. Going through all the pages and copying and pasting 15 results at a time was NOT an option. Scraping came in handy for this task.

First, use Chrome “inspect” option to identify the part of html you need to get. Move your mouse on the different items in the inspection window (on the right), and track the part of website which is highlighted by the code (on the left). Once you’ve selected the item, in the inspection window, use “Copy / Copy element” and paste the html code in your python coding tool.

On the right side is the Google Chrome’s “inspection window” you get when using right click / Inspect

In my case, the desired item with 15 SWIFT codes is a “table”

Part II: Design your scraping scheme in Python

a) Scrape a first page

And that’s it, 3 lines of code and Python has received the webpage. Now you need to parse the html properly and retrieve the desired item.
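For reference, fetching a page can be sketched with the standard library alone. This is not necessarily how the author's script does it; the function name fetch_html, the User-Agent value, and the timeout are assumptions chosen for the example.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("http://bank-code.net/country/FRANCE-%28FR%29.html") would return the first page of results as a string, ready to be parsed.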

Remember the desired html:

It is a “table” element, with id “tableID”. The fact that it has an id attribute is great, because an id is supposed to be unique within a page: no other html element on this webpage should have it. This means that if I look for this id in the html, I will find nothing other than the desired element. It saves time.

Let’s do that properly in Python

So now we have got the desired html element. But we still need to get the SWIFT codes inside the html, and then store them in Python. I chose to store them in a pandas.DataFrame object, but a plain list of lists can work as well.

To do that, go back to the Chrome inspection window, analyse the structure of the html tree, and note how deep you have to go to reach the data. In my case, the required data was in the “tbody” element. Each bank and its SWIFT code were contained in a “tr” element and each “tr” element had multiple “td” elements. The “td” elements contained the data I was looking for.

The html tree can be described as follows: table, tbody, tr, td

I do it in one line with the following:
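As an illustration of walking the table, tbody, tr, td structure, here is a minimal sketch that uses only the standard library's html.parser. It is a stand-in, not the author's one-liner, which may rely on a different parsing library. The class name, the function name, and the bank names in the sample are hypothetical; only the id “tableID” comes from the page described above.

```python
from html.parser import HTMLParser
from typing import List


class TableRows(HTMLParser):
    """Collect the text of every td cell, row by row, inside one table id."""

    def __init__(self, table_id: str) -> None:
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows: List[List[str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])           # start a new row
        elif self.in_table and tag == "td" and self.rows:
            self.in_cell = True
            self.rows[-1].append("")       # start a new cell in the row

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


def extract_rows(html: str, table_id: str = "tableID") -> List[List[str]]:
    parser = TableRows(table_id)
    parser.feed(html)
    # Header rows hold th cells only, so they end up empty; drop them.
    return [row for row in parser.rows if row]


# Made-up sample markup with the same shape as the table described above.
sample = (
    '<table id="tableID"><thead><tr><th>Bank</th><th>SWIFT</th></tr></thead>'
    "<tbody><tr><td>Example Bank</td><td>EXAMFRPP</td></tr></tbody></table>"
)
print(extract_rows(sample))
```

The resulting list of lists can be passed straight to pandas.DataFrame if a table-like object is preferred.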

b) Prepare automation

Now that we have scraped the first webpage, we need to think of how to scrape new webpages we haven’t seen yet. My way of doing that is replicating human behavior: storing results from one page, then going to the next. Let’s focus now on going to the next webpage.

At the bottom of the page, there is a menu that allows you to go on a specific page of the swift code table. Let’s inspect the “next page” button in the inspector window.

The “>” symbol will lead us to the next page

This gives the following html element:

Now to get the url in Python is simple:
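One way to sketch this with the standard library is to look for the link whose visible text is “>” and resolve its href against the current page's address. The class and function names are hypothetical, the example paths are made up, and the real page's markup may differ from what this assumes.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class NextLink(HTMLParser):
    """Remember the href of the first link whose visible text is '>'."""

    def __init__(self) -> None:
        super().__init__()
        self.current_href: Optional[str] = None
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None

    def handle_data(self, data):
        if self.next_href is None and self.current_href and data.strip() == ">":
            self.next_href = self.current_href


def next_page_url(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLink()
    parser.feed(html)
    if parser.next_href is None:
        return None
    # urljoin turns a relative href into an absolute URL.
    return urljoin(page_url, parser.next_href)
```

Returning None when no “>” link exists gives the scraping loop a natural stopping condition.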

And we’re almost there.
So far we have:
– developed the scraping of the table of one page
– identified the url link of the next page

We only need to do a loop, and run the code. Two best practices I recommend following:

1. printing out when you land on a new webpage: to know at which stage of the process your code is (scraping codes can run for hours)

2. saving results regularly: to avoid losing all you scraped if there is an error

Since I don’t know in advance when to stop scraping, I loop with the idiomatic “while True:” syntax. I print out the counter value at each step, and I save the results in a csv file at each step as well. This actually wastes time; a better way would be to store the data every 10 or 20 steps, for instance. But I went for a quick implementation.

The code goes like this:
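The loop just described can be sketched as below. This is not the author's script (linked further down); it only shows the shape of the loop with the two recommended practices. The scrape_page callable is a hypothetical parameter that returns the rows of one page together with the URL of the next page, or None when there is no next page.

```python
import csv
from typing import Callable, List, Optional, Tuple

# Hypothetical callable: given a URL, return the rows found on that page
# and the URL of the next page (None on the last page).
PageScraper = Callable[[str], Tuple[List[List[str]], Optional[str]]]


def scrape_all(start_url: str, scrape_page: PageScraper, out_path: str) -> int:
    """Follow 'next page' links from start_url and return the row count."""
    rows: List[List[str]] = []
    url: Optional[str] = start_url
    counter = 0
    while True:
        counter += 1
        # Practice 1: print progress so a long run can be monitored.
        print(f"page {counter}: {url}")
        page_rows, url = scrape_page(url)
        rows.extend(page_rows)
        # Practice 2: rewrite the CSV after every page so a crash loses little.
        with open(out_path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        if url is None:
            break
    return len(rows)
```

Passing the page scraper in as a parameter keeps the loop independent of how a single page is fetched and parsed.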

Full code (only 26 lines) can be found here: https://github.com/FelixChop/MediumArticles/blob/master/Scraping_SWIFT_codes_Bank_names.py

Originally posted: https://towardsdatascience.com/a-short-practical-how-to-guide-to-scrape-data-from-a-website-using-python-888373227d4f 

The post A short & practical HOW-TO guide to scrape data from a website using Python by Félix Revert appeared first on Hakin9 - IT Security Magazine.


An API (Application Programming Interface) is software that allows two applications to talk to each other. In this blog, we will be creating an API that allows clients to create and read articles, just like a Medium blog post. We will explore different ways to create a Django Rest Framework (DRF) API in a 3-part series, starting with a plain APIView (PART 1), then using GenericAPIView (PART 2), and finally using ViewSets (PART 3).

The final source code of what we will be creating can be found on GitHub.
I will also be using pipenv for my development environment management, i.e., things like creating a virtual environment and installing packages.

So in the terminal, create a directory and give it any descriptive name; I will name mine MediumClone, so the command is mkdir MediumClone. Next, cd into the project folder that you have created and install Django by typing

pipenv install django


Reinforcement learning is a class of machine learning where an agent learns how to behave in an environment by performing actions, seeing the results, and drawing intuitions from them. In this article, you’ll learn to understand and design a reinforcement learning problem and solve it in Python.

Recently we’ve been seeing computers playing games against humans, either as bots in multiplayer games or as opponents in one-on-one games like Dota 2, PUBG, and Mario. DeepMind (a research company) made history in 2016 when its AlphaGo program defeated the South Korean Go world champion. If you’re an avid gamer, you have probably heard about the Dota 2 OpenAI Five match, where machines played against humans and defeated top Dota 2 players in a few matches (if you are interested in this, here is the complete analysis of the algorithm and the game played by the machine).

The latest version of OpenAI Five taking Roshan.(src)

So here’s the central question: why do we need reinforcement learning? Is it only used for games? Or can it be applied to real-world scenarios and problems? If you are learning about reinforcement learning for the first time, the answer may surprise you. It’s one of the most widely used and fastest growing technologies in the field of Artificial Intelligence.

Here are a few applications that may motivate you to build reinforcement learning systems:

  1. Self Driving Cars
  2. Gaming
  3. Robotics
  4. Recommendation Systems
  5. Advertising and Marketing

A Brief Review and Origins of Reinforcement Learning

So, where did Reinforcement Learning come from when we already have a good number of Machine Learning and Deep Learning techniques at hand? “It’s invented by Rich Sutton and Andrew Barto, Rich’s Ph.D. thesis advisor.” It took form in the 1980s but was archaic then. Rich nevertheless believed in its promise and that it would eventually be recognized.

Reinforcement Learning supports automation by learning from the environment it is placed in. Machine Learning and Deep Learning support automation too, though with a different strategy. So, why Reinforcement Learning?

It’s very much like the natural learning process, in which the model receives feedback on whether it has performed well or not. Deep Learning and Machine Learning are learning processes as well, but they are mostly focused on finding patterns in existing data. Reinforcement Learning, on the other hand, learns by trial and error and eventually arrives at the right actions or the global optimum. The significant additional advantage of Reinforcement Learning is that we need not provide the whole training data as in Supervised Learning. Instead, a few chunks would suffice.

Understanding Reinforcement Learning

Imagine you are teaching your cat new tricks. Unfortunately, cats don’t understand our language, so we can’t tell them what we want them to do. Instead, we set up a situation, and the cat tries to respond in many different ways. If the cat’s response is the desired one, we reward it with milk. The next time the cat is exposed to the same situation, it executes a similar action with even more enthusiasm, in expectation of more food. So this is learning from positive responses; if cats are instead treated with negative responses such as angry faces, they don’t tend to learn from them.

Reinforcement Learning works similarly: we give the machine a few inputs and actions, and then reward it based on the output. Reward maximisation is our end goal. Now let’s see how we interpret the same problem above as a Reinforcement Learning problem.

  • The cat will be the “agent” that is exposed to the “environment”.
  • The environment is a house or play area, depending on what you are teaching the cat.
  • Each situation the cat encounters is called a “state”: for example, your cat crawling under the bed or running can be interpreted as states.
  • The agents react by performing actions to change from one “state” to another.
  • After the change in states, we give the agent either a “reward” or a “penalty” depending on the action that is performed.
  • The “policy” is the strategy of choosing an action for finding better outcomes.

Now that we have understood what Reinforcement Learning is, let’s dive deeper into it in the section below and see how it can solve problems that Supervised or Unsupervised Learning can’t. And here’s a fun fact: the Google search engine is optimised using Reinforcement Algorithms.

Getting familiar with Reinforcement Learning Terminology

The agent and the environment play the essential roles in a reinforcement learning algorithm. The environment is the world that the agent lives in. The agent also perceives a reward signal from the environment, a number that tells it how good or bad the current world state is. The goal of the agent is to maximize its cumulative reward, called the return. Before we write our first reinforcement learning algorithms, we need to understand the following terminology.

  1. States: A state is a complete description of the world; it doesn’t hide any piece of information that is present in the world. It can be a position, a constant or a dynamic. We mostly record states in arrays, matrices or higher order tensors.
  2. Action: The available actions usually depend on the environment; different environments lead to different actions for the agent. The set of valid actions for an agent is recorded in a space called an action space. These are usually finite in number.
  3. Environment: This is the place where the agent lives and with which it interacts. For different types of environments, we use different rewards, policies, etc.
  4. Reward and Return: The reward function R must be tracked at all times in reinforcement learning. It plays a vital role in tuning and optimizing the algorithm and in deciding when to stop training it. It depends on the current state of the world, the action just taken, and the next state of the world.
  5. Policies: A policy is a rule used by an agent for choosing the next action; policies are also called the agent’s brain.

Now that we have seen all the reinforcement learning terminology, let’s solve a problem using reinforcement algorithms. Before that, we need to understand how to design a problem and map this terminology onto it.

Solving the Taxi Problem

Let’s say we have a training area for our taxi, where we are teaching it to transport people in a parking lot to four different locations (R, G, Y, B). Before that, we need to understand and set up the environment, which is where Python comes in. If you are learning Python from scratch, I would recommend this article.

You can set up the Taxi-Problem environment using OpenAI’s Gym, one of the most used libraries for solving reinforcement problems. Before using it, we need to install Gym on your machine; to do that, you can use the Python package installer, pip. Below is the command to install it.

pip install gym

Now let’s see how our environment renders. All the models and the interface for this problem are already configured in Gym under the name Taxi-v2. The following snippet renders the environment.

“There are 4 locations (labelled by different letters), and our job is to pick up the passenger at one location and drop him off at another. We receive +20 points for a successful drop-off and lose 1 point for every time-step it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions.” (Source: https://gym.openai.com/envs/Taxi-v2/ )

This will be the rendered output on your console:

Taxi V2 ENV

Perfect, env is the core of OpenAI Gym; it is the unified environment interface. The following env methods will be quite helpful to us:

env.reset: Resets the environment and returns a random initial state.
env.step(action): Steps the environment by one timestep.
env.render: Renders one frame of the environment (helpful in visualizing the environment).

env.step(action) returns the following variables:

  • observation: Observations of the environment.
  • reward: Whether your action was beneficial or not.
  • done: Indicates whether we have successfully picked up and dropped off a passenger, which completes one episode.
  • info: Additional info such as performance and latency for debugging purposes.

Now that we have seen the environment, let’s understand the problem more deeply. The taxi is the only car in this parking lot. We can break up the parking lot into a 5x5 grid, which gives us 25 possible taxi locations. These 25 locations are one part of our state space. Notice that the current location of our taxi is coordinate (3, 1).

In the environment, there are four possible locations where the taxi can drop off passengers: R, G, Y, B, or [(0,0), (0,4), (4,0), (4,3)] in (row, col) coordinates if you interpret the rendered environment above as a coordinate grid.

When we also account for one (1) additional passenger state of being inside the taxi, we can take all combinations of passenger locations and destination locations to arrive at the total number of states for our taxi environment: there are four (4) destinations and five (4 + 1) passenger locations. So, our taxi environment has 5×5×5×4=500 total possible states. The agent encounters one of the 500 states and takes an action. The action in our case can be to move in a direction or to pick up or drop off a passenger.

In other words, we have six possible actions: pickup, dropoff, north, east, south, west (the four directions are the moves by which the taxi travels).
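
The count of 500 states can be checked by packing the four components into a single integer. The sketch below assumes the ordering (taxi row, taxi column, passenger location, destination) and the index order R, G, Y, B; the helper names encode and decode are chosen for this illustration, and the ordering is assumed to mirror how Gym numbers the Taxi states rather than taken from its source.

```python
GRID = 5            # the parking lot is a 5x5 grid
PASSENGER_LOCS = 5  # R, G, Y, B, or inside the taxi
DESTINATIONS = 4    # R, G, Y, B

def encode(taxi_row, taxi_col, passenger, destination):
    """Pack the four state components into one integer in [0, 500)."""
    state = taxi_row
    state = state * GRID + taxi_col
    state = state * PASSENGER_LOCS + passenger
    state = state * DESTINATIONS + destination
    return state

def decode(state):
    """Invert encode(): recover (row, col, passenger, destination)."""
    state, destination = divmod(state, DESTINATIONS)
    state, passenger = divmod(state, PASSENGER_LOCS)
    taxi_row, taxi_col = divmod(state, GRID)
    return taxi_row, taxi_col, passenger, destination

# Taxi at (3, 1), passenger waiting at Y (index 2), destination R (index 0).
print(encode(3, 1, 2, 0))   # → 328
# The largest index belongs to the last combination of all four components.
print(encode(4, 4, 4, 3))   # → 499
print(decode(328))          # → (3, 1, 2, 0)
```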

This is the action space: the set of all the actions that our agent can take in a given state.

You’ll notice in the illustration above, that the taxi cannot perform certain actions in certain states due to walls. In the environment’s code, we will simply provide a -1 penalty for every wall hit and the taxi won’t move anywhere. This will just rack up penalties causing the taxi to consider going around the wall.

Reward Table: When the Taxi environment is created, there is an initial Reward table that’s also created, called P. We can think of it like a matrix that has the number of states as rows and number of actions as columns, i.e. states × actions matrix.

Since every state is in this matrix, we can see the default reward values assigned to our illustration’s state:

>>> import gym
>>> env = gym.make("Taxi-v2").env
>>> env.P[328]
{0: [(1.0, 433, -1, False)], 
 1: [(1.0, 233, -1, False)],
 2: [(1.0, 353, -1, False)],
 3: [(1.0, 333, -1, False)],
 4: [(1.0, 333, -10, False)],
 5: [(1.0, 333, -10, False)]}

This dictionary has a structure {action: [(probability, nextstate, reward, done)]}.

  • The keys 0–5 correspond to the actions (south, north, east, west, pickup, dropoff) the taxi can perform at our current state in the illustration.
  • done is used to tell us when we have successfully dropped off a passenger in the right location.
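
As a small illustration of how this structure can be read, the sketch below copies the dictionary printed above into a plain Python literal and computes the expected immediate reward of each action. The names transitions, ACTION_NAMES and expected_reward are introduced only for this example.

```python
# Copied from the reward-table output above:
# {action: [(probability, next_state, reward, done)]}
transitions = {
    0: [(1.0, 433, -1, False)],
    1: [(1.0, 233, -1, False)],
    2: [(1.0, 353, -1, False)],
    3: [(1.0, 333, -1, False)],
    4: [(1.0, 333, -10, False)],
    5: [(1.0, 333, -10, False)],
}
ACTION_NAMES = ["south", "north", "east", "west", "pickup", "dropoff"]

def expected_reward(outcomes):
    """Probability-weighted reward over the possible outcomes of one action."""
    return sum(prob * reward for prob, _next_state, reward, _done in outcomes)

for action, outcomes in transitions.items():
    print(f"{ACTION_NAMES[action]:8s} expected reward {expected_reward(outcomes):6.1f}")

# Actions that carry the -10 penalty are the illegal ones in this state.
illegal = [ACTION_NAMES[a] for a, outs in transitions.items()
           if expected_reward(outs) == -10]
print(illegal)  # → ['pickup', 'dropoff']
```

Every action here has a single outcome with probability 1.0, so the expected reward is just the listed reward.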

To solve the problem without any reinforcement learning, we can fix a goal state and sample actions at random. If the taxi reaches the goal state after some number of iterations, we treat that as the maximum reward; otherwise the reward improves as the taxi gets nearer to the goal state, and we count a penalty whenever the reward for a step is -10, the minimum.

Now let’s code this problem without reinforcement learning.

Since we have our P table for default rewards in each state, we can try to have our taxi navigate just using that.

We’ll create a loop that runs until one passenger reaches one destination (one episode), or in other words, until the received reward is 20. The env.action_space.sample() method automatically selects one random action from the set of all possible actions.
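
The shape of such a loop can be sketched without installing Gym. The Corridor class below is a hypothetical stand-in, not the Taxi environment: it has five cells, the same style of rewards (-1 per step, -10 for an illegal drop-off, +20 for a correct one), and a step method that returns the same four values as env.step. With Gym itself, the random choice would come from env.action_space.sample() instead of the seeded random generator assumed here.

```python
import random

class Corridor:
    """Hypothetical stand-in for Taxi: 5 cells in a row, goal is the last cell."""
    n_states = 5
    n_actions = 3  # 0 = left, 1 = right, 2 = dropoff

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 2:  # dropoff is only legal at the goal cell
            if self.state == self.n_states - 1:
                return self.state, 20, True, {}
            return self.state, -10, False, {}
        move = -1 if action == 0 else 1
        # Bumping into either end of the corridor leaves the agent in place.
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        return self.state, -1, False, {}

def run_random_episode(env, rng):
    """Take random actions until the episode ends; count steps and penalties."""
    env.reset()
    steps, penalties, done = 0, 0, False
    while not done:
        action = rng.randrange(env.n_actions)  # random action, no learning
        _state, reward, done, _info = env.step(action)
        steps += 1
        if reward == -10:
            penalties += 1
    return steps, penalties

steps, penalties = run_random_episode(Corridor(), random.Random(0))
print(f"Timesteps taken: {steps}, penalties incurred: {penalties}")
```

Even in this tiny world the random agent usually wastes many steps and collects several penalties before it stumbles onto the goal.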

Let’s see what happens:


credits: OpenAI

Our problem is solved, but the solution isn’t optimized, and this approach doesn’t work every time. We need a properly interacting agent so that the machine/algorithm takes far fewer iterations. This is where the Q-learning algorithm comes in; let’s see how it is implemented in the next section.

Introduction to Q-Learning

Q-learning is one of the most widely used and most basic reinforcement learning algorithms. It uses the environment’s rewards to learn, over time, the best action to take in a given state. In the above implementation, we have our reward table “P”, from which the agent will learn. Using the reward table, the agent chooses the next action, judges whether it was beneficial, and then updates a value called the Q-value. The table of these values is called the Q-table, and each entry maps to a (state, action) combination. The better the Q-values, the better the rewards the agent can collect.

For example, if the taxi is faced with a state that includes a passenger at its current location, it is highly likely that the Q-value for pickup is higher when compared to other actions, like dropoff or north.

Here comes a question: how do we initialize the Q-values, and how do we calculate them? The Q-values are initialized to arbitrary constants, and as the agent exposes itself to the environment and receives different rewards by executing different actions, the Q-values are updated with the Q-learning update equation.
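
For reference, the standard form of the Q-learning update can be written as below. The notation is an assumption made here, not necessarily the authors' own: s and a are the current state and action, r is the reward received, s' is the next state, a' ranges over the actions available in s', and α and γ are the two parameters of the algorithm.

```latex
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)
```

The first term keeps part of the old estimate, and the second term mixes in the reward just received plus the discounted value of the best action in the next state.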

Here alpha and gamma are the parameters of the Q-learning algorithm. Alpha is the learning rate and gamma is the discount factor; both take values between 0 and 1, and can sometimes be equal to 1. Gamma can be zero while alpha cannot, since with a learning rate of zero the Q-values would never be updated. Alpha plays the same role as the learning rate in supervised learning. Gamma determines how much importance we want to give to future rewards.
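
The effect of the two parameters can be seen on a single update. This is a minimal sketch of the standard Q-learning update; the function name q_update and all the numbers are illustrative assumptions.

```python
def q_update(old_value, reward, next_max, alpha, gamma):
    """One Q-learning update: blend the old estimate with the new target."""
    target = reward + gamma * next_max
    return (1 - alpha) * old_value + alpha * target

# A fresh Q-value of 0, a step reward of -1, nothing learned about the next state.
print(q_update(old_value=0.0, reward=-1, next_max=0.0, alpha=0.1, gamma=0.6))

# gamma = 0: future rewards are ignored, only the immediate reward counts.
myopic = q_update(old_value=0.0, reward=-1, next_max=20.0, alpha=1.0, gamma=0.0)

# alpha = 0: the old value is returned unchanged, so nothing is ever learned.
frozen = q_update(old_value=5.0, reward=-1, next_max=20.0, alpha=0.0, gamma=0.6)
print(myopic, frozen)
```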

Below is the algorithm in brief:

  • Step 1: Initialize the Q-table, setting all Q-values to zeros or to arbitrary constants.
  • Step 2: Let the agent react to the environment and explore the actions. For each change in state, select any one among all possible actions for the current state (S).
  • Step 3: Travel to the next state (S’) as a result of that action (a).
  • Step 4: For all possible actions from the state (S’), select the one with the highest Q-value.
  • Step 5: Update the Q-table values using the equation.
  • Step 6: Set the next state as the current state.
  • Step 7: If the goal state is reached, end the episode and repeat the process.

Q-Learning in Python
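
The following is a minimal, self-contained sketch of a Q-learning training loop, not the authors' original code. To stay runnable without Gym it trains on Corridor, a hypothetical five-cell stand-in with Taxi-style rewards; the epsilon-greedy exploration and the values of alpha, gamma, epsilon and the episode count are assumptions chosen for illustration.

```python
import random

class Corridor:
    """Hypothetical stand-in for Taxi: 5 cells in a row, goal is the last cell."""
    n_states = 5
    n_actions = 3  # 0 = left, 1 = right, 2 = dropoff

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 2:  # dropoff is only legal at the goal cell
            if self.state == self.n_states - 1:
                return self.state, 20, True, {}
            return self.state, -10, False, {}
        move = -1 if action == 0 else 1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        return self.state, -1, False, {}

def train(env, episodes=500, alpha=0.1, gamma=0.6, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: one row per state, one column per action, all zeros.
    q_table = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Step 2: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=q_table[state].__getitem__)
            # Step 3: move to the next state.
            next_state, reward, done, _info = env.step(action)
            # Step 4: best Q-value reachable from the next state (0 once finished).
            next_max = 0.0 if done else max(q_table[next_state])
            # Step 5: Q-learning update.
            old_value = q_table[state][action]
            q_table[state][action] = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)
            # Step 6: the next state becomes the current state.
            state = next_state
    return q_table

env = Corridor()
q_table = train(env)
# Greedy policy read off the learned table: the best action in each state.
policy = [max(range(env.n_actions), key=row.__getitem__) for row in q_table]
print(policy)
```

After training, the greedy policy should move right along the corridor and drop off at the last cell, instead of wandering at random.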

Perfect, now all your values will be stored in the variable q_table.

That’s all: your model is trained, and the taxi can now drop off passengers more accurately. With this, you should be able to understand reinforcement learning and code a new problem yourself.

More Reinforcement Techniques:
  • MDPs and Bellman Equations
  • Dynamic Programming: Model-Based RL, Policy Iteration and Value Iteration
  • Deep Q Learning
  • Policy Gradient Methods

Code for this article can be found at

Thanks for reading. This article is authored by Vihar Kurama and Samhita Alla.

Stay tuned for more articles, also check cool articles written by Samhita Alla.

References: OpenAI, Playing Atari with Deep Reinforcement Learning, SkyMind, LearnDataSci.


The post Reinforcement Learning with Python by Vihar Kurama appeared first on Hakin9 - IT Security Magazine.


Press Release

Proxy service review site Proxyway.com assessed nine leading proxy providers in a first global proxy provider research report. The report finds that two providers clearly lead the market in terms of performance and customer support, while the largest provider currently lags behind in every metric apart from marketing efforts.

This month, Proxyway concluded dozens of tests on the leading proxy providers globally, testing their performance, speed and connection quality, as well as marketing efforts throughout 2018. The report marks the first comprehensive evaluation of the proxy industry to date. The review site’s report reveals the strengths and limitations of the test subjects’ networks and, most importantly, provides valuable insights about the key proxy service providers out there.

Surprisingly, the report found that market share leader Luminati underperformed when it came to the quality of products and services on offer: its network’s success rate fell over 9–11% below that of its top competitors, and the network was relatively slow compared to other providers.

Key findings show that Oxylabs and Geosurf were ahead of the pack and are the best performing proxy providers when it comes to enterprise-level network usage. Both networks sustained a stable 85 percent success rate regardless of the number of concurrent connections.

However, the marketing research section of the report revealed that the compelling work of Luminati’s marketing team gives it a competitive advantage over others and, consequently, a higher market share: it gains over two times more visitors than its closest competitor and more visitors than the rest of the industry combined.

What’s more, the customer service review essentially split the proxy providers in half when it came to the number of support channels available, accepted payment methods, and average response time. Notably, a rather new proxy provider, Smartproxy, provided the best customer service in 2018, followed closely by The Proxy Store and Oxylabs.

Proxyway’s proxy market research report for 2018 turns a new page in the proxy industry by letting customers base their choices on facts and reliable data, as opposed to marketing and advertising material.

About Proxyway

Proxyway is a community blog started in 2018 by two technology geeks: Adam Dubois and Chris Becker. Every month, Proxyway tests and evaluates leading proxy providers in order to deliver unbiased and data-based service reviews. So far, 21 proxy services have been tested, including top players such as Luminati, Oxylabs, and Geosurf.

Proxyway aims to be the most reliable and unbiased source of information for market intelligence companies and average internet users alike.

If you wish to contact us about this market research or for any other business-related reasons, email us at info@proxyway.com.

Contact information



The post Proxy providers under scrutiny in the first-of-its-kind market research report by Adam Dubois appeared first on Hakin9 - IT Security Magazine.
