Frequently asked questions
on Kaizen Secure Voiz

We receive many common queries from prospective users and customers, such as: what happens if a customer catches a cold, or someone records the voice for a playback attack, or attempts mimicry? We have documented these in an FAQ document, which can be downloaded and viewed at KSV-FAQ.pdf

Kaizen Secure Voiz is a new organization with unique IPRs. Kaizen Secure Voiz has references at SRI, USA and Lebara, UK. We are striving to achieve top references in government, banking, defense, NGOs/trusts and other sectors. We will share names as and when pilot implementations start

The Equal Error Rate (EER) is the percentage of validation attempts in a biometric system at which the probability of false acceptance equals the probability of false rejection; it is the trade-off point between FAR and FRR. The False Acceptance Rate (FAR) measures how often unrecognized/unwanted users are allowed to log in, and the False Rejection Rate (FRR) measures how often previously validated users are rejected or prevented from entering the system. Customers can choose to tune the ratio in favor of either one. The Kaizen team will configure the system to suit customer needs and also monitor this
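
The FAR/FRR trade-off can be sketched numerically: given match scores for genuine speakers and for impostors, sweep a threshold and find the point where FAR and FRR are closest. The score distributions below are synthetic illustrations for the sketch, not product data.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted (>= threshold).
       FRR: fraction of genuine scores rejected (< threshold)."""
    far = np.mean(impostor >= threshold)
    frr = np.mean(genuine < threshold)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Sweep thresholds; the EER sits where FAR and FRR are closest."""
    thresholds = np.linspace(0.0, 1.0, 1001)
    best = min(thresholds,
               key=lambda t: abs(np.subtract(*far_frr(genuine, impostor, t))))
    far, frr = far_frr(genuine, impostor, best)
    return (far + frr) / 2, best

# Toy score distributions: genuine speakers score high, impostors low.
rng = np.random.default_rng(0)
genuine = np.clip(rng.normal(0.8, 0.1, 5000), 0, 1)
impostor = np.clip(rng.normal(0.3, 0.1, 5000), 0, 1)
eer, thr = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.3%} at threshold {thr:.2f}")
```

Raising the threshold lowers FAR at the cost of a higher FRR, and vice versa, which is exactly the ratio a customer chooses to fix.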

Each customer will have to determine and choose the ratio that suits their risk profile. FAR cannot be zero; we are testing a high-vector algorithm to bring the failure rate down to 0.01%. Every customer needs to understand how Voice Authentication applies to their industry needs

A cold or nasal congestion may affect 2 parameters out of the 12 unique distinctions of the human voice. The scoring system will still allow the user to log in for that specific day, even if those 2 parameters are not acceptable. This setting is done by the implementation team and can be set high or low, based on customer preferences. There is no initial investment in scanners/readers, as the system is voice/telephone based.
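
As a toy illustration (not the production scoring engine), the tolerance described above amounts to a configurable pass threshold over the 12 parameters: a user with a cold, whose nasal parameters deviate that day, can still authenticate.

```python
# Toy sketch: 12 voice parameters are compared; the minimum number that
# must match is the setting the implementation team tunes high or low.
TOTAL_PARAMS = 12

def authenticate(matched_params, min_required=10):
    """Accept the login if at least min_required of the 12 parameters match."""
    return matched_params >= min_required

print(authenticate(10))  # True: a cold affects 2 params, 10 still match
print(authenticate(8))   # False: too many deviations
```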

Yes, it is scientifically possible. Our product roadmap has a high-vector algorithm in the beta test stage now. Its false acceptance rate is close to zero.

The system can be configured for the desired number of attempts. For example, a validated user who gets rejected can be allowed 2 more attempts, or be redirected to OTP PIN generation or to a customer service desk. It can be configured to give 3-5 attempts for re-authentication.

Yes, it can be used as a privilege or differentiator: banks and other organizations can allow high-net-worth account holders or high-value customers to use this tool.

Banks: savings on call center costs, bandwidth, manpower costs and the cost of a data breach. GDPR guidelines or banking regulatory authority guidelines can be better met through multi-factor authentication.

Telecom: retain high-ARPU customers, save on contact center costs, ensure low-ARPU clients use only the IVRS option and do not cut into call center time/AHT, GDPR compliance

Government treasury: avoid invalid payouts to pensioners, facilitate better compliance with the proof-of-life process, save time and cost in enrolling each new batch of retiring personnel, seek monthly proof of life at no extra cost instead of annual proof


Not-for-profit/NGO: control fund flow from trust/donor to field implementation agencies, track beneficiaries every month, track them as alumni and map them against expected impact assessments, ensure money spent on social development projects is validated in real time

Manufacturing/logistics/others: real-time attendance mapping of field staff with geo-tagging, ensure scheduled shifts don’t slip, keep the customer’s customer happy so they retain the contracted business, save manpower costs on manual monitoring of staff, cut costs by aligning time-stamped field attendance with actual salary calculations

First-time enrollment: this is a one-time exercise for any user/customer and takes no more than 30 seconds of speaking into the phone. They can speak in any language and any text. This enrollment creates the customer’s voiceprint.

Subsequent authentication: every time an enrolled customer calls the specified IVRS/contact center number, the biometrics engine needs only 7 seconds to authenticate the voice against a database of voiceprints.

No, they can’t. The human voice has 12 unique parameters like pitch, tone, breathiness, concatenation, etc. Mimicry may match 2-3 parameters of another person’s voice and is hence rejected during authentication. A prerecorded voice will have different waveforms and is easily identified as a false attempt, leading to rejection

No. The database captures and retains only the parameters of a voice in the voiceprint. There is no recorded voice or audio file in the biometrics engine that can be hacked. Hence no one can copy or re-create any audio file.

No, it won’t. Each biometric voiceprint averages only about 30 KB in file size and does not need extensive storage space, as no audio files are stored. We will do a sizing based on number of users, concurrency, growth etc. and recommend storage capacity on server or SAN/NAS disks
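
The sizing arithmetic is straightforward given the ~30 KB average voiceprint stated above; the user count and growth figures below are illustrative assumptions, not a customer quote.

```python
# Back-of-envelope storage sizing for voiceprints (~30 KB each, per the FAQ).
VOICEPRINT_KB = 30

def storage_gb(users, annual_growth, years):
    """Projected voiceprint storage in GB after compounding user growth."""
    projected_users = users * (1 + annual_growth) ** years
    return projected_users * VOICEPRINT_KB / (1024 * 1024)

# e.g. 1 million users growing 20% a year, sized for 3 years out
print(f"{storage_gb(1_000_000, 0.20, 3):.1f} GB")
```

Even a million-user deployment stays in the tens of gigabytes, which is why no dedicated audio storage tier is needed.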

A noisy work environment will lead to a high rate of false rejection. Customers must be in a less noisy environment, use a mobile phone in normal mode, and not use loudspeaker mode. Authentication depends on clean reception of the voice by the biometric engine

Yes, you can do that. This is independent of language and text.

A plain-vanilla installation and configuration of the voice biometrics application takes less than 3 weeks. Integration with other applications/CTI/IVRS will be based on our effort estimation and will vary. Web-services-based APIs can be used to shorten this integration.

Yes, this will be made available through Azure, AWS and other service providers. Hosted/dedicated instances on cloud or SaaS options are available. Customers can pay in USD/INR per user per month for the complete product suite.

This is being made available. Test reports can be shared with registered partners and customers under NDA.

Yes, we are seeking to use the Microsoft Technology Center, Bangalore as one such POC test bed. Customers can walk in for an immersion experience of the product and surrounding technologies like database, datacenter automation, IVRS, networking etc.

  1. Product license fee, based on blocks of customers/users
  2. One-time setup and implementation costs
  3. AMC on the product license at 15%

Geo-tagging is an app that can be installed on the user’s mobile phone. It picks up the latitude-longitude with a timestamp while the voice authentication is done. Instead of a fixed line, this app can serve as proof of fixed lat-long coordinates via mobile-phone-based voice authentication.

Yes, it is platform agnostic and will work on all server operating systems.

You mean the customer wants other biometrics? No worries, we can do that. Fingerprint and facial recognition are two more options that can be provided directly by KSV. We can also configure third-party solutions, if need be, to complement Voice Authentication and provide a comprehensive solution to the end customer.

Done in a jiffy. Kaizen has published APIs for common web services and data call-outs from other apps. Partner/customer IT teams can do it themselves, or our delivery team can do it for you.
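
As a sketch of what a web-services integration might look like: the base URL, endpoint path, field names and response shape below are illustrative assumptions, not the published contract — consult the Kaizen API documentation for the real one.

```python
import json
import urllib.request

def build_verify_request(base_url, user_id, audio_bytes):
    """Build a hypothetical verification request; send it with
    urllib.request.urlopen(req) against a live endpoint."""
    payload = json.dumps({"userId": user_id,
                          "audio": audio_bytes.hex()})  # hex-encoded sample
    return urllib.request.Request(
        f"{base_url}/api/v1/verify",   # illustrative path, not the real API
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_verify_request("https://ksv.example.com", "user-42", b"\x00\x01")
print(req.full_url)  # https://ksv.example.com/api/v1/verify
```

A CRM or IVRS application would post the captured audio this way and branch on the engine’s match/no-match response.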

  • Relatively stable characteristics
    • Vocal tract length
    • Vocal tract shape
    • Vocal cord length (pitch)
    • Gender (Breathiness)
    • Nasal cavity size and shape
    • Speaking rate and prosody
    • Language, dialect and accent
  • Transient characteristics
    • Health
    • Emotional state
    • Environment

Gaussian mixture models are a probabilistic model for representing normally distributed subpopulations within an overall population. Mixture models in general don’t require knowing which subpopulation a data point belongs to, allowing the model to learn the subpopulations automatically. Since subpopulation assignment is not known, this constitutes a form of unsupervised learning.
For example, in modeling human height data, height is typically modeled as a normal distribution for each gender with a mean of approximately 5’10” for males and 5’5” for females. Given only the height data and not the gender assignments for each data point, the distribution of all heights would follow the sum of two scaled (different variance) and shifted (different mean) normal distributions. A model making this assumption is an example of a Gaussian mixture model (GMM), though in general a GMM may have more than two components. Estimating the parameters of the individual normal distribution components is a canonical problem in modeling data with GMMs.
GMMs have been used for feature extraction from speech data, and have also been used extensively in object tracking of multiple objects, where the number of mixture components and their means predict object locations at each frame in a video sequence.
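
The height example above can be reproduced in a few lines. This is a minimal sketch using scikit-learn’s `GaussianMixture` (an assumption of tooling, not part of the product): fit a two-component GMM to unlabeled heights and let EM recover the per-gender means.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two subpopulations of heights (inches), mirroring the example above:
# males ~ N(70, 3), females ~ N(65, 2.5). Labels are NOT given to the model.
rng = np.random.default_rng(42)
heights = np.concatenate([rng.normal(70, 3.0, 1000),
                          rng.normal(65, 2.5, 1000)]).reshape(-1, 1)

# Fit a 2-component GMM; EM estimates each component's mean and variance
# without ever seeing the gender assignments (unsupervised learning).
gmm = GaussianMixture(n_components=2, random_state=0).fit(heights)
means = sorted(gmm.means_.ravel())
print(f"Recovered means: {means[0]:.1f}, {means[1]:.1f}")
```

The recovered means land near 65 and 70 even though the model only saw the mixed, unlabeled distribution — the same principle lets a voiceprint engine model speaker feature distributions.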

You want to deliver the project fully? Yes, that can be organized. Partners can initially focus on selling the product in a chosen territory. If they can invest in resources, the Kaizen team will train them on best practices and involve them in one project implementation, working alongside the KSV implementation team. Learn, get certified, and become independent in implementation

Kaizen Secure Voiz has published web services for integration touch points with other third-party apps like CRM, IVRS, ERP etc. During deployment, the user’s voice stream [min 9 sec] is routed from the IVRS port to the CTI cache and then to the biometric server. This is an established best practice. The media gateway manages all other streams [VoIP] through the SIP protocol and sends them to the voice biometric server

National Institute of Standards and Technology (NIST) Special Publication 800-63-2 discusses various forms of two-factor authentication and provides guidance on using them in business processes requiring different levels of assurance. We adhere to all the norms and comply with NIST standards. Customers can seek third-party validation of the same

  1. Active authentication is where the customer is aware of the process, voluntarily participates in the authentication, and provides time for it. This happens before he is connected to a live agent/contact center.
  2. Passive authentication is where the customer has contacted the service help desk and is describing his/her problem while the biometric server runs an authentication check in the background as he speaks. The help desk may encourage the customer to speak for more than 10 seconds to allow for correct authentication/recognition against voiceprints. A passive system needs more ports and E1/PRI lines on the CTI to allow for real-time processing, and is resource hungry

Seriously? We feel, what’s the hurry? Humor apart, the alternative — customer authentication by a live help desk agent — needs nearly 90 seconds: the first 40 seconds are spent choosing IVRS options or describing the problem, and a further 40-50 seconds on quoting the relevant PIN/date of birth/answers to secret questions etc. This is the scenario where a customer calls a contact center and seeks help through a contact center exec. If we deploy Voice Authentication instead, authentication is completed in 10 seconds and allows for faster resolution. This increases customer satisfaction rates and reduces AHT/contact center billing costs

These are very well-established biometric options. They offer good results, but they need specific readers/scanners at all access points. Most of the access equipment wears out in a few months, reducing the effectiveness score. Dust, moisture, wear and tear, and poor maintenance affect the results of a scan. Voice biometrics works through any mobile/landline and does not need any specific equipment. It is highly aware of “liveness” [an actual human participating in the authentication], less intrusive, inexpensive, and scores very high on EER [equal error rate]. Go for it.

Of course, the system results vary due to noise/network disturbances. Too much noise is the only enemy. Users must be in a less noisy environment to allow successful enrollment and authentication of the human voice. We recommend using other factors in addition to voiceprints if it is a noisy work environment

Kaizen Voice Authentication is built on open standards, using Java and Python, and works with most popular databases like DB2, MS SQL, Oracle etc.