
Reinventing Authentication for Dummies

HackTheBox Mumbai - May Meetup

In the latest HTB Mumbai Meetup, we reinvented authentication from the ground up.

The session was conducted by Adhokshaj Mishra, who guided us through the evolution of authentication by tackling the same real-world engineering problems that early system designers faced when authentication first became a necessity. By solving these problems step by step, we gained a much clearer understanding of how modern authentication mechanisms came into existence. This blog is the first in a series where we will explore authentication, RADIUS, Kerberos, authorization, SAML, JWT, OAuth, and OIDC.

Today's topic is Authentication.

Many of us have heard statements such as:

"If you see Active Directory, run BloodHound."

But have we ever stopped to ask:

  • Why do we need BloodHound?

  • Why do we need Active Directory?

  • Why was this entire ecosystem created in the first place?

Most of the time, we use these technologies without questioning the problems they were designed to solve.

Now, it's time to reinvent them.

"Time to reinvent authentication. Again. And Again. And Again."
— Adhokshaj Mishra

Special thanks to Ayush Shukla for helping with the notes for this blog and Adhokshaj Mishra for delivering the session and inspiring this journey through the history and evolution of authentication.

Centralized Identity

Username-Password Authentication

  • Let's go back to the 1980s, when an institution typically had only one computer. Back then, computers were very expensive, and not everyone was allowed to use them.

  • Problem: How do we control which users can access that one shared computer?

  • Solution: Authenticate them using a username and a password.

  • Back then, authentication was simple. Each user was granted two things:

    • Username (public) → "I am so-and-so"

    • Password (private) → "I really am so-and-so"

  • The state flow was:

Locked
↓
Operator identity verified - Username and Password
↓
Unlocked
  • Authentication was simple. Life was good.
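The Locked → verify → Unlocked flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how 1980s systems actually worked: the in-memory `USERS` table and the `add_user`/`login` helpers are hypothetical, and unlike many early systems it stores a salted PBKDF2 hash rather than the password itself.

```python
import hashlib
import hmac
import os

# Hypothetical local user table: username (public) -> (salt, password hash)
USERS: dict = {}

def hash_password(password: str, salt: bytes) -> bytes:
    # Slow, salted hash so the stored value is not the password itself
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def add_user(username: str, password: str) -> None:
    salt = os.urandom(16)
    USERS[username] = (salt, hash_password(password, salt))

def login(username: str, password: str) -> str:
    """Return the machine state after one authentication attempt."""
    record = USERS.get(username)
    if record is None:
        return "Locked"                      # unknown username
    salt, stored = record
    # Constant-time comparison of the recomputed hash with the stored one
    if hmac.compare_digest(hash_password(password, salt), stored):
        return "Unlocked"                    # operator identity verified
    return "Locked"

add_user("logan", "password123")
print(login("logan", "password123"))  # Unlocked
print(login("logan", "wrong"))        # Locked
```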

Manual Provisioning

  • As the office expanded, it started buying more computers for its employees. Now we had to create every user manually on every machine, which led to scenarios like this:

  • User: "Why can't I log in?"

  • Admin: "The machine probably hasn't been updated."

machine1 → user exists
machine2 → user exists
machine3 → user exists
machine4 → forgot
  • Now somehow we have users created in every machine manually.

  • The office has 40 machines.

  • Problem: A user changed his/her password. How do we manually update that user's password on each and every machine?

update machine1
update machine2
...
update machine40
  • De-provisioning was even more painful than updating the password on every machine.

  • Suppose an employee has been fired. But we forgot to remove the account!

  • Congratulations! You now have an ex-employee with valid access.

  • And the problems start becoming apparent. Manually updating every machine became harder in these cases:

    • Provisioning (creating a new user, e.g. for an employee who joined the company)

    • Deprovisioning (deleting a user, e.g. for an employee who left the company)

  • So what should we do now?

Centralized Provisioning

  • Solution: Instead of manually updating each and every machine, why don't we set up a provisioning server whose job is to update the user on every machine connected to the internal network simultaneously?

  • Our task is to push the configuration job to the provisioning server, which then updates the user's details on every computer connected to the internal network.

  • So the flow will be like this:

Machines
|
|
Provisioning Server
  • Example:

    • User created? Push everywhere.

    • User deleted? Delete everywhere.

  • Modern examples of provisioning servers would be:

    • Ansible

    • Puppet

    • Chef

    • Salt

  • But this too has a problem!

Continuous Polling onto the Provisioning Server

  • Problem: The network is unreliable.

  • Example: Suppose we have 40 machines. We performed the de-provisioning via the Provisioning Server. But, out of 40 machines:

    • 36 machines were online, as they were connected to the internal network

    • 4 machines were offline, because the switch connecting them to the internal network burnt out

  • Now we have 36 machines on which the user account was deleted and 4 machines on which the user still has access, because the account was not deleted there.

36 ✓
4 ✗
  • Employee has been fired. But his credentials are still valid on 4 machines in the network.

  • Now the pain continues. But we have a makeshift solution.

  • Solution: Every machine on the internal network should periodically poll the provisioning server (i.e. send it requests asking for updates).

  • As soon as those 4 machines come back online after the switch is fixed, they will ask the provisioning server for updates and de-provision the user.

  • Problem: After how much time should the machine poll the provisioning server for updates?

  • Solution: Once a day

  • Example:

Machine: "Boss, any updates?"
Provisioning Server: "Nope"
---- After 24 hours ----
Machine: "Boss, any updates?"
Provisioning Server: "Nope"
---- After another 24 hours ----
Machine: "Boss, any updates?"
Provisioning Server: "Yes"
--- Send updates to the Machine ---
Machine gets updated!
  • But, there is a catch!

  • Problem: Bandwidth is expensive! WAN links are slow. Continuous polling is wasteful because it burns through bandwidth.

  • Networking in the 1980s and 1990s was nothing like today's networking.
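Here is a small simulation of the push-then-poll story above: a push reaches only the 36 reachable machines, and polling lets the other 4 catch up once the switch is replaced. The `ProvisioningServer` and `Machine` classes, the version counter used to detect "any updates?", and the single loop standing in for the daily poll are all hypothetical simplifications.

```python
from dataclasses import dataclass, field

@dataclass
class ProvisioningServer:
    users: set = field(default_factory=set)
    version: int = 0                      # bumped on every change

    def delete_user(self, name: str) -> None:
        self.users.discard(name)
        self.version += 1

@dataclass
class Machine:
    users: set
    online: bool = True
    version: int = 0

    def sync(self, server: ProvisioningServer) -> None:
        self.users = set(server.users)
        self.version = server.version

    def poll(self, server: ProvisioningServer) -> None:
        # "Boss, any updates?" Only sync if reachable and behind.
        if self.online and self.version != server.version:
            self.sync(server)

def push(server: ProvisioningServer, machines: list) -> None:
    # Push model: only machines reachable right now receive the change
    for m in machines:
        if m.online:
            m.sync(server)

server = ProvisioningServer(users={"alice", "wolfe"})
machines = [Machine(users={"alice", "wolfe"}) for _ in range(40)]
for m in machines[36:]:
    m.online = False                      # the burnt switch

server.delete_user("wolfe")               # wolfe has been fired
push(server, machines)
print(sum("wolfe" not in m.users for m in machines))  # 36

for m in machines[36:]:
    m.online = True                       # switch replaced
for m in machines:                        # stands in for the daily poll
    m.poll(server)
print(sum("wolfe" not in m.users for m in machines))  # 40
```

Every machine polls even when nothing changed, which is exactly the bandwidth cost the next section tries to avoid.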

Reverse Authentication Trick

  • Solution: Only poll the provisioning server when the user authenticates.

  • Instead of Server → Machine (push), do Machine → Server (pull).

  • The flow would be like this:

--- User authenticates ---
Machine: Do I know this user?
- If yes, authenticate
- If no, fetch the latest record from the provisioning server
  • This created the following policy:

    • Only fetch a record when a user authenticates and the record does not exist locally.
  • We have saved the bandwidth!
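The "fetch only on a cache miss" policy can be sketched like this. The `Machine` class, the `directory` dictionary standing in for the provisioning server, and the stored hash strings are hypothetical; the point is simply that the network is touched once per unknown user, not once per day.

```python
import hmac

class Machine:
    def __init__(self, directory: dict):
        self.directory = directory   # stand-in for the provisioning server
        self.local = {}              # records this machine already knows
        self.fetches = 0             # round trips to the server

    def authenticate(self, username: str, password_hash: str) -> bool:
        if username not in self.local:          # "Do I know this user?"
            self.fetches += 1                   # only now touch the network
            record = self.directory.get(username)
            if record is None:
                return False
            self.local[username] = record
        return hmac.compare_digest(self.local[username], password_hash)

directory = {"logan": "a1b2c3"}               # hypothetical username -> hash
m = Machine(directory)
print(m.authenticate("logan", "a1b2c3"))      # True (fetched from the server)
print(m.authenticate("logan", "a1b2c3"))      # True (served locally)
print(m.fetches)                              # 1
```

With this exact policy, a record that is already cached is never refreshed, so a password change or deletion on the server does not reach a machine that already knows the user. That is the price this simple version pays for saving bandwidth.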

Centralized Identity System

  • As infrastructure evolved, the office started setting up routers, switches, firewalls, etc. Now we had to manage authentication for all of these devices too.
Users
|
Switches
|
Routers
|
Services
  • Managing authentication was no longer limited to applications. It had become an infrastructure problem.

  • Let's take an example: suppose we have purchased a router.

  • Problem: Routers are closed appliances. How do we bring the router under our authentication scheme?

  • The router stores its username and password in a local cache. But once the credentials are set, they are pretty difficult to reset.

  • Problem: We effectively have to factory reset the whole router every time the password is updated. This isn't a problem when we have 1 router. But what should we do when the count goes up to 20 routers?

  • Solution: Add an Authentication Server

  • Every time a user authenticates to the router, the router sends the credentials to the authentication server. The authentication server verifies those credentials by looking them up in its internal database, then tells the router to accept or deny the authentication request.

  • So the flow is:

User
|
Router
|
Authentication Server
|
Router (Accepts/Denies User Authentication)
  • We have created a centralized identity system, where identities (users, devices, etc.) authenticate themselves by sending their credentials to a centralized authentication server.
  • But why does this matter?

  • During the late 1980s and early 1990s, Kevin Mitnick was on his hacking spree, and his attack techniques became widely known.

  • Now, trusting endpoints blindly is a terrible idea. Trusting a local cache is a terrible idea too, because the router or machine can be compromised.

  • Therefore, identity has to be centralized!
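The router-plus-authentication-server design can be sketched as follows: the routers hold no credentials, so changing a password is one update on the central server instead of 20 factory resets. The `AuthenticationServer` and `Router` classes are hypothetical, and unsalted SHA-256 is used only for brevity (a real server would use a slow, salted hash).

```python
import hashlib
import hmac

class AuthenticationServer:
    """Hypothetical central server holding the only copy of the credentials."""
    def __init__(self):
        self._db = {}

    def set_password(self, user: str, password: str) -> None:
        # Unsalted SHA-256 for brevity only
        self._db[user] = hashlib.sha256(password.encode()).digest()

    def verify(self, user: str, password: str) -> bool:
        stored = self._db.get(user)
        if stored is None:
            return False
        candidate = hashlib.sha256(password.encode()).digest()
        return hmac.compare_digest(stored, candidate)

class Router:
    """Stores no credentials; every login is delegated to the central server."""
    def __init__(self, name: str, auth: AuthenticationServer):
        self.name = name
        self.auth = auth

    def login(self, user: str, password: str) -> str:
        return "Accept" if self.auth.verify(user, password) else "Deny"

auth = AuthenticationServer()
routers = [Router(f"router{i}", auth) for i in range(1, 21)]
auth.set_password("admin", "old-secret")
auth.set_password("admin", "new-secret")      # one change, no factory resets
print({r.login("admin", "new-secret") for r in routers})  # {'Accept'}
print(routers[0].login("admin", "old-secret"))            # Deny
```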

Reinventing RADIUS

Designing Remote Authentication Dial-In User Service (RADIUS) Protocol

  • Now ISPs have entered the scene. We do not trust the public, so we do not trust the public network either.

  • ISPs will sell us:

    • Connectivity

    • Access

    • Bandwidth

  • The ISP will also provide us with an internal network of our own. That's why we have routed the firewall's authentication through the ISP.

  • In this case, let's become the ISP.

  • Problem: We want the user to authenticate before getting into our network. But the user cannot authenticate without getting into our network.

  • Solution: We add a Remote Access Server (RAS) to the ISP network.

  • On one end of it we have the ISP network, and on the other end we have the public network.

  • Features:

    • It is connected to both networks: private (the ISP network) and public.

    • The RAS acts as a gatekeeper, keeping the public out of the ISP network.

  • So the flow effectively becomes:

Public Network
|
Remote Access Server (RAS)
|
Private Network - maintained by the ISP
  • Problem: How do we apply some sort of authentication on this RAS? We know the average guy has become Kevin Mitnick, so we have to stop them from entering our private network.

  • Solution: Cache the credentials on the Remote Access Server. When a user authenticates, verify the credentials against the cache. If they are valid, let the user into the internal network. If not, block the user.

  • Problem: If we cache credentials on the RAS and it gets compromised, we are essentially cooked! So how do we deal with this?

  • Solution: We add an Authentication Server (AS) that the RAS consults for every authentication.

  • If a user logs in to the RAS, the RAS asks the AS to validate the credentials. If they are valid, the user gets access to the internal network. If not, the user is denied.

  • Why did we set up the Authentication Server (AS) inside the private network? Because:

    • We trust the internal side of the network, which is under the ISP's control.

    • We do not trust the public, because anyone can become Kevin Mitnick.

  • Absolutely no caching on the RAS: it is a public-facing asset, so we cannot trust it.

  • So the design becomes:

Public Network
|
Remote Access Server (RAS)
|
Authentication Server (AS)
|
Private Network - maintained by the ISP
  • Problem: We do not want to consume too much bandwidth!

  • Remember, we are in the business of bandwidth and the bandwidth is premium!

  • Question: In networking, where does most of the overhead come from, especially during a TLS handshake?

  • Answer: The key exchange!

  • Solution: The network behind the RAS is trusted, and key exchange is the overhead. So we use a symmetric key and skip key exchange entirely, reducing bandwidth.

  • The RAS and the AS share the same symmetric key (RADIUS calls it the shared secret).

  • So the flow will be like this:

RAS: Here is the username. Here is the password. Do we let them into the internal network?
AS: Answers Yes/No
  • This protocol is called RADIUS.

  • Congratulations! We just invented RADIUS.

  • In real world

    • RAS (Remote Access Server) = the device receiving the user's login request. This could be a VPN server, Wi-Fi controller, NAS, BRAS/BNG, switch, etc.

    • AS (Authentication Server) = the server that verifies the credentials and responds with "Yes" or "No". The Authentication Server (AS) is what we call the RADIUS Server.
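The shared-secret idea shows up concretely in how RADIUS protects the `User-Password` attribute between the RAS and the RADIUS Server. RFC 2865 (section 5.2) pads the password with NUL bytes to a multiple of 16 and XORs it with an MD5 keystream derived from the shared secret and the random Request Authenticator, so no key exchange is needed. The sketch below follows that construction for illustration only: the secret is hypothetical, and MD5 here is obfuscation rather than strong encryption.

```python
import hashlib
import os

def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Obfuscate User-Password with the shared secret (RFC 2865 section 5.2)."""
    # Pad with NUL bytes to a multiple of 16 (at least one block)
    padded = password.ljust(max(16, (len(password) + 15) // 16 * 16), b"\x00")
    result, prev = b"", authenticator      # block 1 keys off the Request Authenticator
    for i in range(0, len(padded), 16):
        keystream = hashlib.md5(secret + prev).digest()
        chunk = bytes(p ^ k for p, k in zip(padded[i:i + 16], keystream))
        result += chunk
        prev = chunk                       # later blocks chain on the previous output
    return result

def reveal_password(hidden: bytes, secret: bytes, authenticator: bytes) -> bytes:
    result, prev = b"", authenticator
    for i in range(0, len(hidden), 16):
        chunk = hidden[i:i + 16]
        keystream = hashlib.md5(secret + prev).digest()
        result += bytes(c ^ k for c, k in zip(chunk, keystream))
        prev = chunk
    return result.rstrip(b"\x00")          # drop the NUL padding

secret = b"shared-secret"                  # configured on both RAS and RADIUS Server
request_authenticator = os.urandom(16)     # random per Access-Request
hidden = hide_password(b"password123", secret, request_authenticator)
print(len(hidden))                                             # 16
print(reveal_password(hidden, secret, request_authenticator))  # b'password123'
```

Both sides derive the same keystream from what they already share, which is exactly the "symmetric key, no key exchange" trade described above.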

Authentication Protocols

  • We have solved one problem.
User
  |
  | ?
  v
RAS --------> RADIUS Server
  • The RAS knows how to ask the RADIUS Server whether a user is allowed.

  • Problem: But how does the user prove their identity to the RAS in the first place?

  • Solution: We need a protocol between the User and the RAS. This is where PPP enters the scene.

Point-to-Point Protocol (PPP)

  • PPP was designed to establish communication between two devices connected over a point-to-point link.

  • PPP RFC Reference

  • Examples:

    • Dial-up Internet

    • DSL Broadband (PPPoE)

    • VPN Tunnels

    • Serial Links

  • PPP provides:

    • Link establishment

    • Authentication

    • IP address assignment

    • Link termination

  • The flow becomes:

User
  |
  | PPP
  |
RAS
  |
  | RADIUS
  |
Authentication Server
  • Notice that:

    • PPP is User ↔ RAS

    • RADIUS is RAS ↔ Authentication Server

  • They solve different problems.

  • So, PPP needs authentication.

  • Now PPP asks: "How do I verify that the user is who they claim to be?"

  • Historically, PPP supported multiple authentication methods depending on the environment.

  • The common ones are:

    • PAP - Password Authentication Protocol

    • CHAP - Challenge Handshake Authentication Protocol

    • MS-CHAP - Microsoft Challenge Handshake Authentication Protocol

    • MS-CHAPv2 - Improved Microsoft variant

  • The simplest one was PAP.

Password Authentication Protocol (PAP)

  • PAP is extremely simple.
User ---> Username
User ---> Password
  • The RAS receives:
Username: logan
Password: password123
  • The RAS then forwards the request to the RADIUS Server.
User
  |
 PAP
  |
RAS
  |
RADIUS
  |
Authentication Server
  • In PAP, the password is sent in plain text. The RADIUS Server hashes it and verifies the result against the password hash it has stored.

  • If the password hash matches, the user gets access to the internal network. If it does not, access is denied.

  • Home routers often use PAP when communicating with the ISP. That's why the credentials pass in plain text: because it's PAP!

  • PAP RFC Reference:

  • So what's the problem here?

  • Problem: The password is effectively sent in clear text. Anyone intercepting the connection can obtain the credentials.

  • Kevin Mitnick says thank you!
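To see what "plain text" means on the wire, here is a sketch that builds a PAP Authenticate-Request using the field layout from RFC 1334 (Code, Identifier, Length, Peer-ID Length, Peer-ID, Passwd-Length, Password) and then reads it back the way an eavesdropper would. The helper names are hypothetical, and the surrounding PPP framing is omitted.

```python
import struct

def pap_authenticate_request(identifier: int, peer_id: bytes, password: bytes) -> bytes:
    """Build a PAP Authenticate-Request body as it would cross the link."""
    data = bytes([len(peer_id)]) + peer_id + bytes([len(password)]) + password
    code = 1                               # 1 = Authenticate-Request
    length = 4 + len(data)                 # Code + Identifier + Length, plus data
    return struct.pack("!BBH", code, identifier, length) + data

def sniff(packet: bytes) -> tuple:
    # What an eavesdropper does: just read the fields back out
    peer_len = packet[4]
    peer_id = packet[5:5 + peer_len]
    pw_len = packet[5 + peer_len]
    password = packet[6 + peer_len:6 + peer_len + pw_len]
    return peer_id, password

packet = pap_authenticate_request(7, b"logan", b"password123")
print(sniff(packet))  # (b'logan', b'password123')
```

No key, no hash, no cracking: the password is sitting in the packet as ordinary bytes.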

Challenge Handshake Authentication Protocol (CHAP)

  • Solution: To solve the PAP problem, CHAP was introduced.

  • CHAP stands for: Challenge Handshake Authentication Protocol.

  • Instead of sending the password directly:

RAS ---> Sends Random Challenge
User ---> Sends Hash(Challenge + Password)
  • The password never crosses the network.

  • Example:

    RAS ---> 123456
    User ---> MD5(123456 + password)
    
  • The Authentication Server performs the same calculation.

  • In CHAP, the RAS forwards the user's response (the hash) to the RADIUS Server. The RADIUS Server stores the password in clear text, hashes it together with the same challenge, and compares the result against the response forwarded by the RAS.

  • If both hashes match: Access Granted

  • Otherwise: Access Denied

  • CHAP RFC Reference:

  • Now an attacker cannot simply sniff the password from the wire.

  • Yet much of the industry still uses PAP, where the password is transmitted in plain text.

  • Question: If we have the CHAP option, then why the hell do we use PAP?!

  • Answer: It is a trade-off. In PAP, the password crosses the wire in plain text, but the RADIUS server only needs to store its hash. In CHAP, only a hash crosses the wire, but the RADIUS server must store the user's password in plain text so it can compute its own hash for comparison. If the RADIUS server gets compromised, we're cooked: all user passwords are exposed. So that's why we use PAP!
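Here is a sketch of the CHAP exchange following RFC 1994, which computes MD5 over the Identifier, the secret, and the challenge, in that order (the `Hash(Challenge + Password)` diagram above is a simplification). `SERVER_SECRETS` is a hypothetical store; notice that it has to hold the plaintext password, which is exactly the trade-off described above.

```python
import hashlib
import hmac
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    # RFC 1994: MD5 over Identifier || secret || Challenge
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# The server must hold the *plaintext* secret to recompute the hash
SERVER_SECRETS = {"logan": b"password123"}   # hypothetical store

def server_verify(user: str, identifier: int, challenge: bytes, response: bytes) -> bool:
    secret = SERVER_SECRETS.get(user)
    if secret is None:
        return False
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)

challenge = os.urandom(16)      # fresh random challenge from the RAS
ident = 1
resp = chap_response(ident, b"password123", challenge)   # computed by the user
print(server_verify("logan", ident, challenge, resp))    # True
# A captured response is useless against the next, different challenge
print(server_verify("logan", ident, os.urandom(16), resp))  # False
```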

MS-CHAP / MS-CHAPv2

  • Microsoft introduced its own CHAP variants for Windows environments.

  • These were widely used in older VPN and dial-up systems.

Extensible Authentication Protocol (EAP)

  • Every few years we invent a new authentication mechanism.

    • PAP

    • CHAP

    • MS-CHAP

    • MS-CHAPv2

  • Tomorrow someone invents Ultra-CHAP-Pro-Max.

  • Problem: Do we keep modifying PPP every single time?

  • Solution: We create a framework instead of creating new protocols over and over again.

  • This framework is called: Extensible Authentication Protocol (EAP)

  • The keyword here is: Extensible

  • Meaning: "We can add new authentication methods without redesigning PPP."

  • Instead of PPP understanding hundreds of authentication methods directly, PPP only needs to understand EAP.

  • EAP then carries the actual authentication method.

  • Examples:

    • EAP-MD5

    • EAP-TLS

    • EAP-TTLS

    • PEAP

    • EAP-SIM

    • EAP-AKA

  • The flow becomes:

User
|
EAP
|
RAS
|
RADIUS
|
Authentication Server
  • Now the Remote Access Server (RAS) does not necessarily need to understand the internals of every authentication mechanism.

  • It simply transports EAP messages between the user and the Authentication Server.

  • This is why EAP became the foundation for:

    • Enterprise Wi-Fi

    • 802.1X

    • Network Access Control (NAC)

    • Modern VPN authentication

    • Certificate-based authentication

    • Multi-factor authentication
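The "RAS only transports EAP" idea can be sketched as a relay that never inspects the method, plus a method registry on the authentication server. The EAP header layout (Code, Identifier, Length, Type) and the code and type numbers follow RFC 3748; the handlers are toy stand-ins for real method logic, and `ULTRA_CHAP_PRO_MAX` is a hypothetical type number used only for illustration.

```python
import struct
from typing import Callable, Dict

EAP_REQUEST, EAP_RESPONSE, EAP_SUCCESS, EAP_FAILURE = 1, 2, 3, 4   # EAP codes
TYPE_MD5_CHALLENGE, TYPE_TLS = 4, 13                                # EAP method types
ULTRA_CHAP_PRO_MAX = 250   # hypothetical method type, for illustration only

def eap_packet(code: int, ident: int, eap_type: int, data: bytes) -> bytes:
    # Header: Code, Identifier, Length (whole packet), then Type and Type-Data
    return struct.pack("!BBHB", code, ident, 5 + len(data), eap_type) + data

# Toy method handlers on the authentication server; real ones run a full
# exchange (challenges, TLS records, ...). New methods are registered here.
METHODS: Dict[int, Callable[[bytes], bool]] = {
    TYPE_MD5_CHALLENGE: lambda data: data == b"expected-md5-response",
}

def authentication_server(packet: bytes) -> int:
    code, _ident, length, eap_type = struct.unpack("!BBHB", packet[:5])
    handler = METHODS.get(eap_type)
    if code != EAP_RESPONSE or handler is None:
        return EAP_FAILURE
    return EAP_SUCCESS if handler(packet[5:length]) else EAP_FAILURE

def ras_relay(packet: bytes) -> int:
    # The RAS never looks inside: it forwards opaque EAP bytes
    # (in practice wrapped in RADIUS) and relays the verdict.
    return authentication_server(packet)

print(ras_relay(eap_packet(EAP_RESPONSE, 1, TYPE_MD5_CHALLENGE, b"expected-md5-response")))  # 3

# Ultra-CHAP-Pro-Max arrives: register a handler; ras_relay is untouched
METHODS[ULTRA_CHAP_PRO_MAX] = lambda data: data == b"pro-max-proof"
print(ras_relay(eap_packet(EAP_RESPONSE, 2, ULTRA_CHAP_PRO_MAX, b"pro-max-proof")))  # 3
```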

Authentication Messages

  • Problem: How does the Remote Access Server (RAS) communicate with the RADIUS Server? How does it know whether authentication succeeded?

  • Solution: When the Remote Access Server (RAS), also called a Network Access Server (NAS), wants to verify a user, it communicates with the RADIUS Server using authentication packets.

Access-Request

  • The RAS sends:
Username: logan
Password: ********
Source IP: x.x.x.x
  • to the RADIUS Server.

  • This packet is called: Access-Request

  • Think of it as: "Hey RADIUS Server, this user wants access."

Access-Accept

  • The RADIUS Server validates the credentials.

  • If valid: Access-Accept is returned.

  • Think of it as: "Yes, let him in."

  • The packet may also contain authorization information:

    • VLAN assignment

    • Bandwidth profile

    • Session timeout

    • IP address

    • ACLs

Access-Reject

  • If credentials are invalid: Access-Reject is sent by the RADIUS Server.

  • The RAS denies access.

  • Think of it as: "Nope. Kick him out."

Access-Challenge

  • Sometimes the RADIUS Server needs more information.

  • Example:

    • OTP

    • MFA

    • Smart card challenge

    • Token code

  • Instead of immediately accepting or rejecting, the RADIUS Server sends: Access-Challenge

  • The RAS then asks the user for additional information.

  • Think of it as: "I need more proof."

  • So the flow will go like this

User
 |
RAS ---- Access-Request ---->
 |
<--- Access-Challenge -------
 |
Enter OTP
 |
RAS ---- Access-Request ---->
 |
<--- Access-Accept ----------
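The decision logic behind those four packets, including the OTP round trip, can be sketched like this. The packet codes are the RFC 2865 values; following RFC 2865, the second Access-Request echoes the `State` attribute from the Access-Challenge and carries the user's answer in `User-Password`. The `USERS` store, the OTP value, and the reply attributes are hypothetical, and real packets are binary rather than dictionaries.

```python
import hmac
import os

ACCESS_REQUEST, ACCESS_ACCEPT, ACCESS_REJECT, ACCESS_CHALLENGE = 1, 2, 3, 11

# Hypothetical user store: password plus the OTP currently expected
USERS = {"logan": {"password": "password123", "otp": "492817"}}
PENDING = {}   # State value -> username waiting to answer a challenge

def radius_server(req: dict) -> tuple:
    """Answer one Access-Request with (code, reply attributes)."""
    name = req.get("User-Name")
    user = USERS.get(name)
    if user is None:
        return ACCESS_REJECT, {}
    state = req.get("State")
    if state is not None:
        # Second round: User-Password now carries the OTP
        ok = PENDING.pop(state, None) == name and hmac.compare_digest(
            req.get("User-Password", ""), user["otp"])
        return (ACCESS_ACCEPT, {"Session-Timeout": 3600}) if ok else (ACCESS_REJECT, {})
    if not hmac.compare_digest(req.get("User-Password", ""), user["password"]):
        return ACCESS_REJECT, {}             # "Nope. Kick him out."
    state = os.urandom(8)
    PENDING[state] = name
    return ACCESS_CHALLENGE, {"State": state, "Reply-Message": "Enter OTP"}

code, attrs = radius_server({"User-Name": "logan", "User-Password": "password123"})
print(code)  # 11 (Access-Challenge)
code, attrs = radius_server({"User-Name": "logan", "User-Password": "492817",
                             "State": attrs["State"]})
print(code)  # 2 (Access-Accept)
```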

Accounting Messages

  • Authentication answers:

"Can the user enter?"

  • Accounting answers:

"What happened after they entered?"

  • ISPs particularly love accounting because bandwidth equals money.

Accounting-Start

  • Sent when the session begins.

  • Example:

User: wolfe
Time: 09:00
Session-ID: 12345
  • Think of it as: "User has logged in. Start Calculating!"

Accounting-Stop

  • Sent when the session ends.

  • Example:

User: wolfe
Time: 10:00
Bytes Sent: 500 MB
Bytes Received: 2 GB
  • Think of it as: "User disconnected. Stop Calculating!"

Interim-Update

  • Some sessions last for hours or days. Waiting until the end of the session is not ideal.

  • So periodically: Interim-Update is sent.

  • Example, sent every 5 minutes:

Session-ID: 12345
Current Usage: 1.2 GB
Session Time: 35 minutes
  • Think of it as: "The user is still connected and here is the current usage."

  • The Remote Access Server (RAS) sends Interim-Update messages to the RADIUS Server throughout the session.

  • For ISPs, Interim-Update is commonly used to keep track of bandwidth consumption without waiting for the user to disconnect.

  • Example:

09:00 - Accounting-Start
09:05 - Interim-Update
09:10 - Interim-Update
09:15 - Interim-Update
...
17:00 - Accounting-Stop
  • Each Interim-Update may contain information such as:

    • Session Duration

    • Bytes Sent

    • Bytes Received

    • Current IP Address

    • Session Identifier

  • The RADIUS Server can use this information for:

    • Usage Tracking

    • Billing

    • Quota Enforcement

    • Auditing

    • Reporting

  • Interim-Update can also be used for time-based access control.

  • For example, suppose we operate a hacker lab and a student has purchased:

3 Hours of Access
  • Every Interim-Update tells the RADIUS Server how long the user has been connected.
Session Time: 1 hour
Session Time: 2 hours
Session Time: 3 hours
  • Once the allowed time has been consumed, the RADIUS Server can take action. It can either terminate access or throttle the bandwidth.
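The 3-hour hacker-lab pass can be sketched as an accounting handler. The Start/Stop/Interim-Update numbers are the `Acct-Status-Type` values from RFC 2866, and session time and byte counts are treated as running totals, as in that RFC. The `AccountingServer` class and its `"disconnect"` verdict are hypothetical; a real server would follow up with a Disconnect-Request or CoA.

```python
START, STOP, INTERIM_UPDATE = 1, 2, 3   # Acct-Status-Type values (RFC 2866)

class AccountingServer:
    def __init__(self, quota_seconds: int):
        self.quota = quota_seconds
        self.sessions = {}

    def handle(self, status: int, session_id: str, session_time: int = 0,
               octets_in: int = 0, octets_out: int = 0) -> str:
        if status == START:
            self.sessions[session_id] = {"time": 0, "in": 0, "out": 0, "active": True}
            return "ok"
        s = self.sessions[session_id]
        # Interim-Update and Stop report running totals, so overwrite, don't add
        s["time"], s["in"], s["out"] = session_time, octets_in, octets_out
        if status == STOP:
            s["active"] = False
            return "ok"
        if session_time >= self.quota:
            return "disconnect"   # e.g. follow up with a Disconnect-Request
        return "ok"

acct = AccountingServer(quota_seconds=3 * 3600)   # the 3-hour lab pass
acct.handle(START, "12345")
results = [acct.handle(INTERIM_UPDATE, "12345", session_time=h * 3600) for h in (1, 2, 3)]
print(results)  # ['ok', 'ok', 'disconnect']
```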

Change of Authorization (CoA)

  • Problem: What if we want to change the user's permissions after they have already connected?

  • Examples:

    • Upgrade the user's bandwidth from 100 Mbps to 1 Gbps

    • Move the user into a different VLAN

    • Apply a quarantine policy

    • Block Internet access

    • Grant additional privileges after MFA succeeds

  • Do we disconnect the user and force them to authenticate again?

  • That would be annoying.

  • Solution: RADIUS introduced: CoA - Change of Authorization

  • CoA allows the RADIUS Server to modify an active session without forcing the user to reconnect.

  • Think of it as: "The user is already connected. Let's change the rules."

  • The flow becomes

User
 |
RAS
 |
 |<---- CoA-Request ----
 |
RADIUS Server
  • Instead of waiting for the RAS to ask a question, the RADIUS Server initiates the change.

  • Example: Bandwidth Upgrade

  • User purchases: 100 Mbps Plan

  • The user authenticates.

Access-Request
Access-Accept
  • The RADIUS Server returns:
Bandwidth = 100 Mbps
  • Later the customer upgrades. Instead of disconnecting the session:
RADIUS Server
      |
      | CoA-Request
      v
RAS
  • The RAS immediately updates the session.
Bandwidth = 1 Gbps
  • Advantage: No reconnect required.

Disconnect Message (DM)

  • Sometimes changing permissions is not enough. We want the user gone immediately.

  • Problem: How do we immediately terminate the session?

  • Solution: RADIUS can send: Disconnect-Request

  • Think of it as: "Kick this user off right now."

  • Examples:

    • Suspicious activity detected

    • Account disabled

    • Subscription expired

    • Security incident

  • The RAS terminates the session immediately.
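On the RAS side, CoA and Disconnect boil down to editing or deleting an entry in the live session table. The codes below are the RFC 5176 values (CoA-Request/ACK/NAK = 43/44/45, Disconnect-Request/ACK/NAK = 40/41/42); the `RAS` class and the `Bandwidth` attribute name are hypothetical, since real bandwidth attributes are usually vendor-specific.

```python
from typing import Optional

DISCONNECT_REQUEST, DISCONNECT_ACK, DISCONNECT_NAK = 40, 41, 42   # RFC 5176
COA_REQUEST, COA_ACK, COA_NAK = 43, 44, 45

class RAS:
    def __init__(self):
        self.sessions = {}   # Session-ID -> currently enforced attributes

    def on_access_accept(self, session_id: str, attrs: dict) -> None:
        self.sessions[session_id] = dict(attrs)

    def on_server_message(self, code: int, session_id: str,
                          attrs: Optional[dict] = None) -> int:
        """Handle a message the RADIUS Server initiated (CoA or Disconnect)."""
        session = self.sessions.get(session_id)
        if code == COA_REQUEST:
            if session is None:
                return COA_NAK
            session.update(attrs or {})        # change the rules in place
            return COA_ACK
        if code == DISCONNECT_REQUEST:
            if session is None:
                return DISCONNECT_NAK
            del self.sessions[session_id]      # kick the user off right now
            return DISCONNECT_ACK
        raise ValueError(f"unexpected code {code}")

ras = RAS()
ras.on_access_accept("12345", {"Bandwidth": "100 Mbps"})   # hypothetical attribute
print(ras.on_server_message(COA_REQUEST, "12345", {"Bandwidth": "1 Gbps"}))  # 44
print(ras.sessions["12345"])                               # {'Bandwidth': '1 Gbps'}
print(ras.on_server_message(DISCONNECT_REQUEST, "12345"))  # 41
```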

AAA

  • The beauty of RADIUS is that the RAS no longer needs to store user credentials.

  • Without RADIUS:

VPN Server #1
VPN Server #2
VPN Server #3

All store credentials
  • With RADIUS:
VPN Server #1
VPN Server #2
VPN Server #3
      |
      |
      v
 RADIUS Server
  • One central place for:

    • Authentication

    • Authorization

    • Accounting

  • Which is why RADIUS is often called an AAA protocol:

    • Authentication → Who are you?

    • Authorization → What can you access?

    • Accounting → What did you do?

RADIUS - From ISP's Perspective

  • From an ISP's point of view:
Access-Request
  • "Who is this customer?"
Access-Accept
  • "Allow 500 Mbps plan."
Accounting-Start
  • "Customer connected."
Interim-Update
  • "Customer has used 12 GB so far."
CoA-Request
  • "Upgrade customer to 1 Gbps immediately."
Disconnect-Request
  • "Terminate the customer's session."
Accounting-Stop
  • "Customer disconnected."

Reinventing Kerberos

Reinventing LAN

1980s - The Trusted Network Era

  • Initially, life was simple.

  • All the examples we discussed earlier assume that:

    • The network is trusted.

    • Everything is managed by us.

  • Think of a small office. You own:

    • The computers

    • The servers

    • The switches

    • The users

  • Everything belongs to you.

  • If a user wants to access a service:

User
  |
  v
Service
  • The service authenticates the user.

  • Problem: What if the network is not trusted? Suppose I buy office space in a building. The building already has an internal network managed by someone else. My systems are connected to that network because replacing the entire infrastructure is not practical.

  • Now I have a problem. It is true that:

    • My servers belong to me.

    • My applications belong to me.

    • My users belong to me.

  • But the network over which they communicate does not. An attacker could:

    • Observe traffic

    • Capture packets

    • Replay requests

    • Pretend to be a user

    • Pretend to be a service

  • The network is internal. But internal does not mean trusted.

The Password Problem

  • Solution: Let's authenticate users. The flow will be like this:
User
   |
Password
   |
   v
Service
  • Whenever the user wants to access a service, the user will authenticate using their password.

  • When the service verifies the password, the user gets access.

  • Problem: What if someone is sniffing the network? For context, Kevin Mitnick has started his activities, and he could be inside our internal network. If he is sniffing the network, the password is exposed.

User ---> Password ---> Network
  • An attacker captures the password. Game over!

Late 1980s / Early 1990s - LAN Manager (LM)

  • Solution: Instead of sending the password across the network:

    • The user enters a password.

    • The password is transformed into an LM Hash and stored by the system.

    • When authentication is required, the server sends a challenge.

    • The client uses the LM Hash to compute a response to that challenge.

    • The server performs the same calculation and verifies the result.

    • If the results match, access is granted.

LM Authentication Flow

  • Step 1: Client says: I want to authenticate.

  • Step 2: Server generates a random challenge.

Server
   |
Challenge
   |
   v
Client
  • Step 3: Client uses:
LM Hash
     +
Challenge
  • to generate a response. The response is sent back to the server.

  • Step 4: Server performs the same calculation.

  • If both values match: Access Granted

  • The password never crosses the network, but the challenge-response value does.

  • Congratulations, we have invented LM!

  • Microsoft introduced: LAN Manager (LM)

  • Goal: Never send the password directly.

  • This was a major improvement over plaintext authentication.

Reinventing NTLM

Weakness of LM

  • LM authentication suffered from several weaknesses:

    • Passwords were converted to uppercase.

    • Passwords were split into two 7-character chunks.

    • Weak DES-based cryptography was used.

  • As a result, attackers could often crack LM hashes with relative ease using offline password-cracking methods.

Capture LM Response
          ↓
Obtain LM Hash
          ↓
Offline Cracking
          ↓
Recover Password
  • In practice, an LM hash was so weak that obtaining the hash was often almost as valuable as obtaining the user's actual password.

  • We solved one problem and created another.
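The first two LM weaknesses are easy to see in code. The real LM hash then uses each 7-character half as a DES key, which Python's standard library does not provide, so this sketch reproduces only the pre-processing step. `lm_halves` is a hypothetical helper, and the search-space figures assume 95 printable ASCII characters (69 once lowercase letters are folded into uppercase).

```python
def lm_halves(password: str) -> tuple:
    """LM pre-processing: uppercase, pad/truncate to 14 chars, split into 7 + 7."""
    p = password.upper()[:14].ljust(14, "\0")
    return p[:7], p[7:]

print(lm_halves("Password1") == lm_halves("PASSWORD1"))  # True: case is lost
print(lm_halves("Password1")[0])                         # PASSWOR

# Rough search space, assuming 95 printable ASCII characters and
# 95 - 26 = 69 once lowercase letters are folded to uppercase
print(f"{95**14:.2e}")      # one 14-char case-sensitive secret: 4.88e+27
print(f"{2 * 69**7:.2e}")   # two independent 7-char halves:     1.49e+13
```

Because each half is cracked independently, an attacker never faces the full 14-character space, which is why a stolen LM hash was nearly as good as the password.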

1993 - New Technology LAN Manager (NTLM)

  • Solution: Microsoft improved LM. This became:

    • New Technology LAN Manager (NTLM)
  • The idea remained simple:

Prove you know the password without sending the password.

NTLM Flow

  • Step 1: Client says: I want to authenticate.

  • Step 2: Server says: Prove it. Server generates a random challenge.

  • Step 3: Client takes:

    • NT Hash

    • Challenge

  • and generates a response. The response is sent back to the server.

  • Step 4: Server performs the same calculation.

  • If both results match: Access Granted.

  • The password never crosses the network.

Advantages of NTLM

  • NTLM improved the authentication process by:

    • Preserving case sensitivity.

    • Eliminating the 7-character chunk limitation.

    • Replacing the weak LM hash with the stronger NT hash.

    • Continuing to use challenge-response authentication so that passwords were not sent across the network.

  • The core idea remained: Prove you know the password without sending the password.

Note: NTLM Attacks and their remediations

The concept of challenge-response authentication extends far beyond NTLM and has influenced numerous authentication and cryptographic protocols. The fundamental idea is simple:

Prove knowledge of a secret without transmitting the secret itself.

This principle appears throughout modern security standards and cryptographic designs, including technologies based on algorithms such as MD5 (RFC 1321) and HMAC (RFC 2104).

Microsoft's LM and NTLM (NT LAN Manager) implemented this concept using proprietary challenge-response mechanisms. Unlike Kerberos and many other authentication protocols, LM and NTLM are Microsoft protocols rather than IETF-standardized RFC protocols.

Although NTLM significantly improved upon LM, attackers eventually developed techniques such as replay attacks and Pass-the-Hash (PtH), where a stolen NT hash could be used for authentication without knowing the actual password. To address several weaknesses in the original NTLM protocol, Microsoft introduced NTLMv2, which strengthened the challenge-response process and provided better protection against replay attacks.

However, NTLM still relied on password hashes as the underlying credential. As Windows environments grew larger and organizations demanded stronger security, better scalability, and mutual authentication, a more robust solution was needed.
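This minimal sketch shows the challenge-response idea shared by LM and NTLM, and why Pass-the-Hash works: the response is computed from the stored hash, so whoever holds the hash can answer any challenge. It is not the real NTLM algorithm. SHA-256 stands in for the MD4-based NT hash (MD4 is often unavailable in `hashlib`), and HMAC-MD5 stands in for the response function (NTLMv2 does use HMAC-MD5, but over different inputs). All names are hypothetical.

```python
import hashlib
import hmac
import os

def password_hash(password: str) -> bytes:
    # Stand-in for the NT hash (real NTLM: MD4 over the UTF-16LE password)
    return hashlib.sha256(password.encode("utf-16-le")).digest()

def respond(stored_hash: bytes, challenge: bytes) -> bytes:
    # The client proves knowledge of the hash by keying an HMAC with it
    return hmac.new(stored_hash, challenge, hashlib.md5).digest()

class Server:
    def __init__(self):
        self.db = {"logan": password_hash("password123")}
        self.current = b""

    def challenge(self) -> bytes:
        self.current = os.urandom(8)   # fresh 8-byte server challenge
        return self.current

    def verify(self, user: str, response: bytes) -> bool:
        expected = respond(self.db[user], self.current)
        return hmac.compare_digest(expected, response)

server = Server()
# Legitimate client: knows the password and derives the hash locally
print(server.verify("logan", respond(password_hash("password123"), server.challenge())))  # True

# Attacker with a stolen hash (e.g. dumped from memory) never needs the password
stolen = server.db["logan"]
print(server.verify("logan", respond(stolen, server.challenge())))  # True: pass-the-hash
```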

Reinventing Kerberos

Mid 1990s - Organizations Grow

  • Everything looks good. Then the company grows. Now we have:

    • Hundreds of users

    • Thousands of computers

    • Hundreds of services

  • Think:

User -> File Server

User -> Database

User -> Email Server

User -> Web Server
  • Every service performs authentication.

  • Every service manages authentication.

  • Every service manages trust.

  • Problem: Authentication logic is now everywhere. Every service is solving the same problem. Again and Again and Again. This creates:

    • Complexity

    • Duplication

    • Administrative overhead

  • Authentication starts becoming messy.

  • Question: Can we centralize authentication? Instead of every service authenticating users independently?

Centralized Authentication

  • Solution: Create a dedicated Authentication Server. Whenever someone wants to prove their identity they will connect to this server:
User
   |
   v
Authentication Server
  • Now authentication exists in one place. Services no longer need to maintain separate authentication logic.
  • Problem: Now every service request requires authentication. Example:
User -> Authentication Server -> File Server

User -> Authentication Server -> Database

User -> Authentication Server -> Email Server
  • The Authentication Server becomes a bottleneck. As a result, network overhead increases.

  • Question: Can we authenticate once and reuse that proof later?

Tickets

  • Solution: Authenticate once. Generate a ticket. Use the ticket later.

  • The flow will be like this:

Authentication
      |
      v
    Ticket
      |
      v
Access Services
  • The ticket becomes proof that authentication already happened.

  • Problem: What stops me from creating my own ticket? Suppose I generate a ticket:

Rehan authenticated successfully.
  • and send it to the File Server. Why should the File Server trust me? What guarantee does the ticket carry that lets the File Server trust it?

Case 1 - Ticket Forgery

  • Anyone can create a ticket. Anyone can claim: I am authenticated.

  • How does the service know the ticket came from a trusted source?

  • Solution: The Authentication Server generates tickets using secret keys.

  • The ticket contains information such as:

    • User Identity

    • Timestamp

    • Expiry

    • Session Information

  • The ticket is protected using secret keys known only to trusted components.

  • Because attackers do not possess these keys:

    • They cannot generate valid tickets.

    • They cannot modify tickets.

    • They cannot forge authentication.

How Ticket Validation Works

  • Suppose a ticket is created for the File Server by the Authentication Server (AS).

  • The Authentication Server encrypts the ticket using a secret key associated with the File Server.

  • Later, the user presents the ticket to the File Server.

  • The File Server decrypts and validates the ticket using its own secret key.

  • Because the Authentication Server and the File Server are the only trusted components that possess the required cryptographic material:

    • Attackers cannot create valid tickets.

    • Attackers cannot modify ticket contents.

    • Attackers cannot impersonate the Authentication Server.

  • If validation succeeds:

    • The ticket is genuine.

    • The ticket was not modified.

    • The ticket originated from a trusted component.

  • If validation fails: the ticket is rejected.
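The validation flow can be sketched with a key shared only by the Authentication Server and the File Server. Because Python's standard library has no AES, this toy uses a SHA-256 counter-mode keystream plus an HMAC tag (encrypt-then-MAC) purely for illustration; it is not how Kerberos encrypts tickets and should not be used for anything real. `seal`, `open_ticket`, and the key names are hypothetical.

```python
import hashlib
import hmac
import json
import secrets
from typing import Optional

# Toy seal/open built only from the standard library, for illustration:
# a SHA-256 counter-mode keystream hides the contents and an HMAC tag
# detects tampering. Real Kerberos uses vetted ciphers such as AES.


def _subkeys(service_key: bytes) -> tuple:
    # Derive separate encryption and MAC keys from the one service key.
    enc = hashlib.sha256(b"enc" + service_key).digest()
    mac = hashlib.sha256(b"mac" + service_key).digest()
    return enc, mac


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(ticket: dict, service_key: bytes) -> bytes:
    """Run by the AS: protect a ticket so only the target service can open it."""
    enc, mac = _subkeys(service_key)
    plaintext = json.dumps(ticket, sort_keys=True).encode()
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_ticket(blob: bytes, service_key: bytes) -> Optional[dict]:
    """Run by the service: return the ticket, or None if validation fails."""
    enc, mac = _subkeys(service_key)
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # wrong key, forged, or modified: rejected
    stream = _keystream(enc, nonce, len(ciphertext))
    return json.loads(bytes(c ^ s for c, s in zip(ciphertext, stream)))


FILE_SERVER_KEY = secrets.token_bytes(32)  # shared only by the AS and the File Server
MAIL_SERVER_KEY = secrets.token_bytes(32)

blob = seal({"user": "rehan", "service": "file"}, FILE_SERVER_KEY)
print(open_ticket(blob, FILE_SERVER_KEY))  # {'service': 'file', 'user': 'rehan'}
print(open_ticket(blob, MAIL_SERVER_KEY))  # None: sealed for a different service
```

The tag is checked before anything is decrypted, so a ticket built with the wrong key, or modified in transit, is rejected without its contents ever being trusted.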

  • But there is a problem.

Case 2 - Replay Attack

  • Problem: Now the ticket is trusted. What if somebody steals the ticket?

  • Attacker captures a valid ticket.

  • The attacker simply reuses it.

  • Boom! Impersonation!

  • Solution: Make tickets time-bound. Tickets contain:

    • Timestamp

    • Expiry

  • Example: 10:00 AM → 10:10 AM

  • After expiration: The ticket is rejected.

  • Even if the ticket is stolen, it eventually becomes useless.
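The time check itself is simple. A minimal sketch, assuming a hypothetical `Ticket` type with the timestamp and expiry fields described above, and the 10:00 to 10:10 window from the example:

```python
from dataclasses import dataclass

# Minimal sketch with hypothetical field names: a ticket records when it was
# issued and when it expires, and a service rejects it outside that window.
# Times are seconds since midnight to keep the example readable.


@dataclass(frozen=True)
class Ticket:
    user: str
    timestamp: int  # issue time
    expiry: int     # end of validity


def clock(hours: int, minutes: int) -> int:
    return hours * 3600 + minutes * 60


def is_time_valid(ticket: Ticket, now: int) -> bool:
    return ticket.timestamp <= now < ticket.expiry


ticket = Ticket("rehan", timestamp=clock(10, 0), expiry=clock(10, 10))  # 10:00 to 10:10
print(is_time_valid(ticket, now=clock(10, 5)))   # True: inside the window
print(is_time_valid(ticket, now=clock(10, 30)))  # False: a replayed copy is rejected
```

Inside the window a stolen copy still works, so expiry limits how long a stolen ticket is useful rather than making theft harmless. Real deployments also tolerate a small clock skew between machines (five minutes by default in both MIT Kerberos and Active Directory).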

Splitting Responsibilities - Birth of Ticket Granting Server (TGS)

  • Problem: The Authentication Server is handling two responsibilities: verifying who users are and deciding which services they can access. We don't want any chaos on the Authentication Server.

  • The Authentication Server should only answer: Has this user authenticated?

  • That's it. It should not decide: Which services can this user access?

  • Otherwise it becomes overloaded.

  • Solution: Split responsibilities. We create another server, the Ticket Granting Server (TGS), which issues tickets for individual services, called service tickets, to users who are allowed to access those services.

  • Now we have two components:

    • Authentication Server (AS) - Responsible for: Who are you?

    • Ticket Granting Server (TGS) - Responsible for: What are you allowed to access?

Key Distribution Center (KDC)

  • To organize everything, we introduce the Key Distribution Center (KDC).

  • KDC contains:

Authentication Server (AS)

+

Ticket Granting Server (TGS)
  • So the flow goes like this:
Client
   |
   v
  KDC
   |
   v
Services

Step 1 - Authentication

  • The client proves its identity to the Authentication Server (AS).

  • The AS verifies the credentials. If they are valid, the AS issues a Ticket Granting Ticket (TGT).

  • Think of the TGT as a temporary passport.

  • The TGT does not grant access to any service by itself.

  • It only proves: This user has successfully authenticated.

  • Question: Why doesn't the TGT directly grant access?

  • Answer: Authentication and Authorization are different things.

    • Authentication answers: Who are you?

    • Authorization answers: What are you allowed to access?

  • The TGT only proves authentication.

Step 2 - Authorization

  • The user now wants access to a service. Examples:

    • File Server

    • Email Server

    • Database

  • The user presents the TGT to the Ticket Granting Server (TGS).

  • The TGS verifies:

    • Validity

    • Expiry

    • Authorization Rules

  • If everything checks out, the TGS issues a Service Ticket.

  • The Service Ticket is specific to the requested service.

Step 3 - Service Access

  • The user presents the Service Ticket to the target service.

  • The service validates the ticket.

  • If valid: Access Granted.

  • No password required.
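The three steps can be sketched end to end. This is a toy under stated assumptions: tickets are JSON with an HMAC-SHA256 tag (integrity only, whereas real Kerberos encrypts them), and `PASSWORDS`, `ACL`, the key names, and the three functions are hypothetical stand-ins for the AS, the TGS, and a service.

```python
import hashlib
import hmac
import json
import secrets
import time

# Toy version of the three steps. Tickets are JSON plus an HMAC-SHA256 tag
# (integrity only; real Kerberos encrypts them). Every name, key, and rule
# below is illustrative.
PASSWORDS = {"rehan": "s3cret"}
KRBTGT_KEY = secrets.token_bytes(32)  # known only to the KDC; protects TGTs
SERVICE_KEYS = {"file": secrets.token_bytes(32), "email": secrets.token_bytes(32)}
ACL = {"rehan": {"file"}}  # hypothetical authorization rules used by the TGS
LIFETIME = 600  # seconds


def seal(body: dict, key: bytes) -> dict:
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "tag": hmac.new(key, payload, hashlib.sha256).hexdigest()}


def verify(ticket: dict, key: bytes) -> bool:
    payload = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    genuine = hmac.compare_digest(expected, ticket["tag"])
    return genuine and time.time() < ticket["body"]["expiry"]


def as_request(user: str, password: str) -> dict:
    """Step 1: check the password once, then issue a TGT."""
    if PASSWORDS.get(user) != password:
        raise PermissionError("bad credentials")
    return seal({"user": user, "expiry": time.time() + LIFETIME}, KRBTGT_KEY)


def tgs_request(tgt: dict, service: str) -> dict:
    """Step 2: validate the TGT, apply authorization rules, issue a Service Ticket."""
    if not verify(tgt, KRBTGT_KEY):
        raise PermissionError("invalid or expired TGT")
    user = tgt["body"]["user"]
    if service not in ACL.get(user, set()):
        raise PermissionError(f"{user} is not allowed to access {service}")
    body = {"user": user, "service": service, "expiry": time.time() + LIFETIME}
    return seal(body, SERVICE_KEYS[service])


def service_access(service: str, ticket: dict) -> bool:
    """Step 3: the service checks the ticket with its own key; no password involved."""
    return verify(ticket, SERVICE_KEYS[service]) and ticket["body"]["service"] == service


tgt = as_request("rehan", "s3cret")
file_ticket = tgs_request(tgt, "file")
print(service_access("file", file_ticket))   # True
print(service_access("email", file_ticket))  # False: sealed with the File Server's key
```

Note that the password appears only in `as_request`. The TGS trusts the TGT because only the KDC holds `KRBTGT_KEY`, and each service trusts its ticket because only the KDC and that service hold its key.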

  • We now have:

    • An Authentication Server

    • Tickets

    • Ticket Validation

    • Expiration

    • Authorization Separation

  • Congratulations! We have essentially invented: Kerberos!

  • MIT's Project Athena faced exactly this problem. Their question was:

"Can a user authenticate once and then reuse that trust to access multiple services?"

  • Instead of requiring the user to repeatedly prove their identity to every service, a different idea emerged:

Authenticate once, obtain a trusted ticket, and use that ticket to access other services.

  • This design became Kerberos.

  • Kerberos Version 5 was originally standardized in:

    • RFC 1510 - Kerberos Network Authentication Service (V5) (historic, now obsolete)
  • Later, the specification was revised and updated by:

    • RFC 4120 - The Kerberos Network Authentication Service (V5)

    • RFC 4120 remains the primary Kerberos specification used today.

NTLM Fallback

  • Although Kerberos is the preferred authentication protocol in Active Directory environments, Windows can fall back to NTLM when Kerberos cannot be used.

  • Common situations include:

    • The target service is not Kerberos-enabled.

    • A Service Principal Name (SPN) is missing or incorrect.

    • The client cannot contact a Domain Controller/KDC.

    • Authentication occurs across unsupported trust boundaries.

  • In these cases, Windows automatically attempts NTLM authentication to maintain compatibility.

Kerberos first, NTLM as a fallback.
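In Windows this choice is made by the Negotiate security package. The decision can be caricatured as a single check, sketched below with a hypothetical `Target` type whose fields mirror the situations listed above. This is a deliberately simplified model, not Windows' actual logic, which considers more conditions.

```python
from dataclasses import dataclass

# Deliberately simplified model of "Kerberos first, NTLM as a fallback".
# The field names are hypothetical and map to the situations listed above.


@dataclass(frozen=True)
class Target:
    kerberos_enabled: bool         # does the service support Kerberos at all?
    spn_correct: bool              # is a correct SPN registered for it?
    kdc_reachable: bool            # can the client reach a Domain Controller/KDC?
    trust_supports_kerberos: bool  # does the trust path allow Kerberos?


def choose_protocol(target: Target) -> str:
    if (target.kerberos_enabled and target.spn_correct
            and target.kdc_reachable and target.trust_supports_kerberos):
        return "Kerberos"
    return "NTLM"  # fallback so authentication still works


print(choose_protocol(Target(True, True, True, True)))   # Kerberos
print(choose_protocol(Target(True, False, True, True)))  # NTLM: SPN missing or wrong
```

Any single failed condition is enough to drop to NTLM, which is why a missing SPN or an unreachable Domain Controller quietly changes which protocol is in use.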

Note: Origin of Golden and Silver Ticket Attacks

  • The entire ticket system relies on one assumption:

Attackers do not possess the secret keys used to generate and validate tickets.

  • Everything works because trusted Kerberos components possess those keys.

  • But what if that's not the case?

Golden Ticket

  • If an attacker compromises the KRBTGT key (the secret key of the krbtgt account, which the KDC uses to protect every TGT), they can create their own TGTs.

  • Effectively:

The attacker can pretend that authentication already happened.

Silver Ticket

  • If an attacker compromises a service account key (the key used to protect Service Tickets for that service), they can create their own Service Tickets.

  • Effectively:

The attacker can pretend that authorization already happened for that specific service.

Why These Attacks Exist

  • Recall Case 1, Ticket Forgery.

  • We trusted tickets because attackers were assumed not to possess the secret keys.

  • Golden and Silver Ticket attacks become possible when that assumption breaks.
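A toy illustration of why a leaked key breaks everything: validation only proves that whoever built the ticket held the right key. Tickets here are HMAC-tagged JSON, not real Kerberos tickets, and `KRBTGT_KEY` and `FILE_SVC_KEY` are illustrative stand-ins for the KRBTGT key and a service account key.

```python
import hashlib
import hmac
import json
import secrets

# Toy illustration (HMAC-tagged JSON, not real Kerberos tickets). Key names
# are illustrative stand-ins for the KRBTGT key and a service account key.
KRBTGT_KEY = secrets.token_bytes(32)    # protects every TGT
FILE_SVC_KEY = secrets.token_bytes(32)  # protects File Server tickets


def seal(body: dict, key: bytes) -> dict:
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "tag": hmac.new(key, payload, hashlib.sha256).hexdigest()}


def is_valid(ticket: dict, key: bytes) -> bool:
    payload = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, ticket["tag"])


# An attacker who has stolen the keys mints tickets offline,
# without ever authenticating to the AS or asking the TGS.
golden = seal({"user": "Administrator", "groups": ["Domain Admins"]}, KRBTGT_KEY)
silver = seal({"user": "Administrator", "service": "file"}, FILE_SVC_KEY)

print(is_valid(golden, KRBTGT_KEY))    # True: the TGS cannot tell it is forged
print(is_valid(silver, FILE_SVC_KEY))  # True: the File Server cannot tell either
print(is_valid(silver, KRBTGT_KEY))    # False: a Silver Ticket is scoped to one service
```

Nothing in the validation step distinguishes these forged tickets from genuine ones; the only defence the ticket system itself offers is keeping the keys secret.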

LDAP

Where Are All These Identities Stored?

  • Problem: Now another question appears.

  • Where do:

    • Users

    • Groups

    • Computers

    • Services

  • actually live?

  • We need a central repository.

Birth of Directory

  • Solution: A Directory.

  • For example: a company phonebook.

  • The directory stores:

    • Users

    • Groups

    • Computers

    • Services

    • Policies

Lightweight Directory Access Protocol (LDAP)

  • Problem: How do we query the directory? How do we search for users? How do we find services? How do we modify entries?

  • Solution: Lightweight Directory Access Protocol (LDAP).

  • LDAP is the protocol used to interact with the directory.

  • For example:

    • Directory = Library

    • LDAP = Librarian

  • LDAP allows systems to:

    • Search

    • Read

    • Add

    • Modify

    • Delete

  • directory entries.
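To see what those five operations act on, here is a toy in-memory directory. It models only the operations, not the LDAP wire protocol; the `Directory` class, the DNs, and the attribute values are illustrative assumptions, although the DN layout (CN/OU/DC components) follows the style Active Directory uses.

```python
from typing import Dict, List

# Toy in-memory directory modelling search, read, add, modify, and delete.
# Entries are keyed by a distinguished name (DN) and hold attribute -> values
# maps. Not the LDAP wire protocol; DNs and attributes are illustrative.


class Directory:
    def __init__(self) -> None:
        self.entries: Dict[str, Dict[str, List[str]]] = {}

    def add(self, dn: str, attrs: Dict[str, List[str]]) -> None:
        if dn in self.entries:
            raise ValueError(f"entry already exists: {dn}")
        self.entries[dn] = {k: list(v) for k, v in attrs.items()}

    def read(self, dn: str) -> Dict[str, List[str]]:
        return self.entries[dn]

    def search(self, base: str, attr: str, value: str) -> List[str]:
        # Subtree search: entries at or below `base` whose `attr` holds `value`.
        base = base.lower()
        return [
            dn for dn, attrs in self.entries.items()
            if (dn.lower() == base or dn.lower().endswith("," + base))
            and value in attrs.get(attr, [])
        ]

    def modify(self, dn: str, attr: str, values: List[str]) -> None:
        self.entries[dn][attr] = list(values)

    def delete(self, dn: str) -> None:
        del self.entries[dn]


d = Directory()
d.add("CN=Rehan,OU=Users,DC=example,DC=com",
      {"objectClass": ["user"], "mail": ["rehan@example.com"]})
d.add("CN=FS01,OU=Servers,DC=example,DC=com", {"objectClass": ["computer"]})

print(d.search("DC=example,DC=com", "objectClass", "user"))
# ['CN=Rehan,OU=Users,DC=example,DC=com']
d.modify("CN=Rehan,OU=Users,DC=example,DC=com", "mail", ["rehan@corp.example.com"])
d.delete("CN=FS01,OU=Servers,DC=example,DC=com")
```

The directory is just the data; in the library analogy, these methods play the librarian's role that LDAP plays over the network.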

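  • LDAP is not an authentication system in the way Kerberos is. LDAP is simply how we interact with the directory. (It does have a bind operation that can check credentials, but its core job is directory access.)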

  • Hence the name: Lightweight Directory Access Protocol, with Directory Access as the key term.

  • LDAP is defined through a family of RFCs:

    • RFC 1777 – LDAP v2 (historical, obsolete)

    • RFC 4510–4519 – LDAP v3 specifications and related standards

    • RFC 4511 – Defines the core LDAP protocol and is the primary LDAP specification used today.

Active Directory

  • Microsoft eventually combined:

    • Kerberos

    • LDAP

    • DNS

    • Group Policies

  • into a single ecosystem.

  • That ecosystem became: Active Directory

Domain Controller (DC)

  • To deliver these services, Microsoft packaged the core Active Directory components into a server role called a Domain Controller (DC).

  • A Domain Controller typically hosts:

    • Kerberos (KDC)

    • LDAP directory services

    • Active Directory database

    • DNS services

  • In practice:

A Domain Controller is Microsoft's implementation of the identity and authentication infrastructure required by Active Directory.

  • At a high level:

Domain Joined = Kerberos Authentication Available

  • When a machine joins the domain, it establishes trust with Active Directory and can obtain Kerberos tickets from the KDC hosted on a Domain Controller.

Conclusion

So far, we have invented RADIUS and Kerberos, while discovering LM, NTLM, LDAP, and AD along the way. In the next lecture, we will reinvent the Security Assertion Markup Language (SAML) and discover its problems and caveats in the process.

Notes

Part 1 of 5

A collection of community-contributed notes from local cybersecurity meetups. Anyone can share their notes, helping us all learn and grow together.
