The document is correct as authored.
The --request is the minimum scheduling guarantee for the compute resource. The --limit is the maximum amount of compute resource on the local node before being throttled (it is not a guarantee).
If a limit is specified, but the request is not, the request will default to the limit for that compute resource - I suspect this is what you are encountering.
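To make the defaulting rule concrete, here is a minimal sketch of it. The function name `effective_request` and the use of integer millicores are assumptions for illustration, not the actual Kubernetes code.

```python
from typing import Optional


def effective_request(request_m: Optional[int], limit_m: Optional[int]) -> Optional[int]:
    """Return the request (in millicores) that scheduling would use.

    Hypothetical helper: models only the rule described above.
    """
    if request_m is not None:
        # An explicit request is always used as-is.
        return request_m
    # No explicit request: fall back to the limit (which may also be unset).
    return limit_m


if __name__ == "__main__":
    print(effective_request(None, 500))  # 500: request defaulted to the limit
    print(effective_request(100, 500))   # 100: explicit request wins
```

So a container that sets only a 500m limit is scheduled as if it had requested 500m, which can look like the request being ignored.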
The ratio of limit/request defines how far any single container may burst for the specified compute resource. If the limit is unspecified, the container may burst up to the available node capacity.
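As a rough illustration of the burst ratio, the sketch below computes it for a single container. The helper names and the choice to cap the ceiling at node capacity are assumptions made for this example.

```python
from typing import Optional


def burst_ceiling_m(limit_m: Optional[int], node_capacity_m: int) -> int:
    """Highest usage (millicores) a container could reach on this node."""
    if limit_m is None:
        # No limit: the only ceiling is what the node has.
        return node_capacity_m
    # A limit above node capacity cannot be reached in practice.
    return min(limit_m, node_capacity_m)


def burst_ratio(request_m: int, limit_m: Optional[int], node_capacity_m: int) -> float:
    """Ratio of the burst ceiling to the guaranteed request."""
    return burst_ceiling_m(limit_m, node_capacity_m) / request_m


if __name__ == "__main__":
    print(burst_ratio(100, 500, 8000))   # 5.0
    print(burst_ratio(100, None, 8000))  # 80.0
```

With a 100m request and a 500m limit the container may use up to 5x its guarantee. With no limit on an 8-CPU node, the ceiling is the whole node.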
If you do this with memory, you risk inducing an OOM condition, and the OOMKiller will target offenders based on their QoS guarantee. For CPU, time is allotted based on the request and is throttled at the limit. In the future, we will look to evict containers more proactively, before inducing OOM and having to depend on the OS OOMKiller.
In your example 8-CPU node scenario, if each container specifies --requests=cpu=100m and --limits=cpu=500m, the scheduler will place up to 80 containers (8000m / 100m) before CPU is exhausted.
Each container is guaranteed a minimum of 100m CPU, but absent contention, containers may burst up to 500m.
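The arithmetic behind the 8-CPU example can be checked with a short sketch. It assumes the node's full capacity is schedulable (nothing reserved for the system) and that every container is identical. The function names are hypothetical.

```python
MILLICORES_PER_CPU = 1000


def max_schedulable(node_cpus: int, request_m: int) -> int:
    """Number of identical containers that fit when packing by CPU request."""
    return (node_cpus * MILLICORES_PER_CPU) // request_m


def limit_overcommit(node_cpus: int, request_m: int, limit_m: int) -> float:
    """Sum of all limits divided by node capacity once the node is full."""
    count = max_schedulable(node_cpus, request_m)
    return (count * limit_m) / (node_cpus * MILLICORES_PER_CPU)


if __name__ == "__main__":
    print(max_schedulable(8, 100))        # 80
    print(limit_overcommit(8, 100, 500))  # 5.0
```

The node fills up on requests (80 x 100m = 8000m), while the limits add up to 5x its capacity. That is why the 500m burst is only available when other containers are not using their share.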
Hope this helps,