object LinearDataGenerator
Generate sample data used for Linear Data. This class generates uniformly random values for every feature and adds Gaussian noise scaled by eps to the response variable Y.
- Annotations
- @Since("0.8.0")
- Source
- LinearDataGenerator.scala
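The description above can be illustrated with a minimal, Spark-free sketch. It assumes the response is a linear combination of the features plus the intercept, with standard Gaussian noise multiplied by eps, and that features are drawn uniformly from [-1, 1]. The names `Sample` and `LinearModelSketch` are hypothetical stand-ins and are not part of the library.

```scala
import scala.util.Random

// Hypothetical stand-in for LabeledPoint, so the sketch runs without Spark.
final case class Sample(label: Double, features: Array[Double])

object LinearModelSketch {
  // Draws nPoints samples with features uniform on [-1, 1) and
  // label = intercept + (weights dot features) + eps * N(0, 1).
  // This is an assumed model, not the library's actual implementation.
  def generate(
      intercept: Double,
      weights: Array[Double],
      nPoints: Int,
      seed: Int,
      eps: Double): Seq[Sample] = {
    val rnd = new Random(seed)
    Seq.fill(nPoints) {
      val features = Array.fill(weights.length)(2.0 * rnd.nextDouble() - 1.0)
      val signal = weights.zip(features).map { case (w, x) => w * x }.sum
      Sample(intercept + signal + eps * rnd.nextGaussian(), features)
    }
  }
}
```

Setting eps to 0.0 in this sketch removes the noise term, so every label lies exactly on the assumed linear model.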
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def generateLinearInput(intercept: Double, weights: Array[Double], xMean: Array[Double], xVariance: Array[Double], nPoints: Int, seed: Int, eps: Double, sparsity: Double): Seq[LabeledPoint]
- intercept
Data intercept
- weights
Weights to be applied.
- xMean
The mean of the generated features. If the features are not properly standardized, a poorly implemented algorithm will often have difficulty converging.
- xVariance
the variance of the generated features.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- sparsity
The ratio of zero elements. If it is 0.0, LabeledPoints with DenseVector features are returned.
- returns
Seq of input.
- Annotations
- @Since("1.6.0")
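A call to this overload might look like the sketch below. It assumes spark-mllib is on the classpath; the object name `GenerateLinearInputExample` and every argument value are illustrative only.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.LinearDataGenerator

object GenerateLinearInputExample {
  // Generates 100 points with three features; all argument values are illustrative.
  def sample(sparsity: Double): Seq[LabeledPoint] =
    LinearDataGenerator.generateLinearInput(
      intercept = 1.5,
      weights = Array(2.0, -1.0, 0.5),
      xMean = Array(0.0, 5.0, -2.0),
      xVariance = Array(1.0, 4.0, 0.25),
      nPoints = 100,
      seed = 42,
      eps = 0.1,
      sparsity = sparsity)

  def main(args: Array[String]): Unit = {
    val dense = sample(sparsity = 0.0) // documented to return DenseVector features
    val sparse = sample(sparsity = 0.7)
    dense.take(3).foreach(p => println(s"label=${p.label} features=${p.features}"))
    println(s"dense points: ${dense.length}, sparse points: ${sparse.length}")
  }
}
```

The weights, xMean, and xVariance arrays are given the same length here, one entry per feature.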
- def generateLinearInput(intercept: Double, weights: Array[Double], xMean: Array[Double], xVariance: Array[Double], nPoints: Int, seed: Int, eps: Double): Seq[LabeledPoint]
- intercept
Data intercept
- weights
Weights to be applied.
- xMean
The mean of the generated features. If the features are not properly standardized, a poorly implemented algorithm will often have difficulty converging.
- xVariance
the variance of the generated features.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- returns
Seq of input.
- Annotations
- @Since("0.8.0")
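To show what xMean and xVariance mean for uniformly distributed features, the sketch below shifts and scales a Uniform[0, 1) draw (mean 0.5, variance 1/12) to a requested mean and variance. This is one plausible way to do it, offered as an assumption rather than a statement of the library's implementation; `FeatureScalingSketch`, `rescale`, and `draw` are hypothetical names.

```scala
import scala.util.Random

object FeatureScalingSketch {
  // Maps u ~ Uniform[0, 1) to a uniform variable with the requested mean and variance.
  // Subtracting 0.5 centers the draw; multiplying by sqrt(12 * variance) scales
  // the variance from 1/12 to the requested value.
  def rescale(u: Double, mean: Double, variance: Double): Double =
    (u - 0.5) * math.sqrt(12.0 * variance) + mean

  // Draws one feature vector, one entry per (mean, variance) pair.
  def draw(xMean: Array[Double], xVariance: Array[Double], rnd: Random): Array[Double] = {
    require(xMean.length == xVariance.length, "xMean and xVariance must have the same length")
    xMean.zip(xVariance).map { case (m, v) => rescale(rnd.nextDouble(), m, v) }
  }
}
```

With mean 0.0 and variance 1.0/3.0, this mapping sends the endpoints 0 and 1 to -1 and 1, which matches the [-1, 1] range mentioned for the compatibility overload.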
- def generateLinearInput(intercept: Double, weights: Array[Double], nPoints: Int, seed: Int, eps: Double = 0.1): Seq[LabeledPoint]
For compatibility, data generated without specifying the mean and variance has zero mean and a variance of 1.0/3.0: the original output range is [-1, 1] with a uniform distribution, and the variance of a uniform distribution on [a, b] is (b - a)^2 / 12, which is 1.0/3.0 here.
- intercept
Data intercept
- weights
Weights to be applied.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- returns
Seq of input.
- Annotations
- @Since("0.8.0")
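The zero mean and the variance of 1.0/3.0 quoted for this overload follow from the standard moments of a uniform distribution on [a, b], here with a = -1 and b = 1:

```latex
\mathbb{E}[X] = \frac{a + b}{2} = \frac{-1 + 1}{2} = 0,
\qquad
\operatorname{Var}(X) = \frac{(b - a)^2}{12} = \frac{\bigl(1 - (-1)\bigr)^2}{12} = \frac{4}{12} = \frac{1}{3}.
```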
- def generateLinearInputAsList(intercept: Double, weights: Array[Double], nPoints: Int, seed: Int, eps: Double): List[LabeledPoint]
Return a Java List of synthetic data randomly generated according to a multi collinear model.
- intercept
Data intercept
- weights
Weights to be applied.
- nPoints
Number of points in sample.
- seed
Random seed
- eps
Epsilon scaling factor.
- returns
Java List of input.
- Annotations
- @Since("0.8.0")
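Since this method returns a Java List, a caller works with it through the `java.util.List` interface. The sketch below assumes spark-mllib is on the classpath; the object name and argument values are illustrative.

```scala
import java.util.{List => JList}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.LinearDataGenerator

object GenerateLinearInputAsListExample {
  // Arguments in order: intercept, weights, nPoints, seed, eps (values are illustrative).
  def sample(): JList[LabeledPoint] =
    LinearDataGenerator.generateLinearInputAsList(0.0, Array(1.0, 2.0), 20, 7, 0.1)

  def main(args: Array[String]): Unit = {
    val points = sample()
    println(s"generated ${points.size()} points; first label = ${points.get(0).label}")
  }
}
```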
- def generateLinearRDD(sc: SparkContext, nexamples: Int, nfeatures: Int, eps: Double, nparts: Int = 2, intercept: Double = 0.0): RDD[LabeledPoint]
Generate an RDD containing sample data for Linear Regression models - including Ridge, Lasso, and unregularized variants.
- sc
SparkContext to be used for generating the RDD.
- nexamples
Number of examples that will be contained in the RDD.
- nfeatures
Number of features to generate for each example.
- eps
Epsilon factor by which examples are scaled.
- nparts
Number of partitions in the RDD. Default value is 2.
- intercept
Data intercept. Default value is 0.0.
- returns
RDD of LabeledPoint containing sample data.
- Annotations
- @Since("0.8.0")
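A possible way to call this method from a small local application is sketched below. It assumes Spark core and spark-mllib are on the classpath; the application name, the `local[2]` master, and all argument values are illustrative choices, not requirements of the API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.LinearDataGenerator
import org.apache.spark.rdd.RDD

object GenerateLinearRDDExample {
  // Builds an RDD of synthetic points; nparts and intercept override their defaults.
  def generate(sc: SparkContext): RDD[LabeledPoint] =
    LinearDataGenerator.generateLinearRDD(
      sc, nexamples = 1000, nfeatures = 5, eps = 0.1, nparts = 4, intercept = 2.0)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GenerateLinearRDDExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = generate(sc)
      println(s"partitions = ${data.getNumPartitions}")
      data.take(3).foreach(println)
    } finally {
      sc.stop() // always release the local Spark context
    }
  }
}
```

Omitting nparts and intercept falls back to the defaults shown in the signature (2 and 0.0).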
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def main(args: Array[String]): Unit
- Annotations
- @Since("0.8.0")
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)