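Procedure
1. Share a local directory with the virtual machine; see the earlier post: 《》
2. For installing Python and related issues, see 《》
3. Spark is of course required; see 《》 (it uses Hadoop; see 《》)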
4. Install and configure the remote side
vi /etc/profile
Add one line:
export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip
source /etc/profile
# Install pip and py4j
Download pip-7.1.2.tar
tar -xvf pip-7.1.2.tar
cd pip-7.1.2
python setup.py install
pip install py4j
# Avoid the tty check when connecting over ssh
cd /etc
chmod 640 sudoers
vi /etc/sudoers
Comment out the requiretty line so that it reads:
#Defaults requiretty
5. Local PyCharm settings
Settings > Project Interpreter:
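Project Interpreter > Add remote (prerequisite: Python is installed successfully on the remote side):
Note: if Python is installed at a different path, change the interpreter path accordingly, e.g. /root/programs/python3/bin/python (the interpreter used in the second test run below).
Run > Edit Configuration (prerequisite: the local directory is shared with the virtual machine successfully):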
6. Test
import os
import sys

os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Successfully imported Spark Modules")
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:
ssh://root@192.168.22.250:22/usr/bin/python -u /mnt/shared/test01/test01a.py
Successfully imported Spark Modules
Process finished with exit code 0
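The test script above hard-codes the Spark directory. As a minimal sketch, assuming the usual Spark layout in which py4j ships as a zip under python/lib (the same layout the PYTHONPATH line in step 4 relies on), the path entries could be derived from the Spark home directory instead. The helper names spark_python_paths and add_spark_to_sys_path are hypothetical and not part of PySpark.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the entries PySpark needs on sys.path.

    Hypothetical helper (not part of PySpark). It assumes the usual
    layout: <spark_home>/python plus a py4j-*-src.zip in python/lib.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match by pattern.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    return [python_dir] + sorted(glob.glob(pattern))


def add_spark_to_sys_path(spark_home):
    """Set SPARK_HOME and put any missing PySpark entries on sys.path.

    Returns the list of entries that were actually added.
    """
    os.environ["SPARK_HOME"] = spark_home
    added = []
    for path in spark_python_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

Calling add_spark_to_sys_path('/root/spark-1.4.0-bin-hadoop2.6') before the pyspark imports would stand in for the two hard-coded lines of the test. If py4j was installed with pip, the zip entry is harmless but not required.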
A slightly more complex example:
import sys

sys.path.append("/root/programs/spark-1.4.0-bin-hadoop2.6/python")

try:
    import numpy as np
    import scipy.sparse as sps
    from pyspark.mllib.linalg import Vectors

    dv1 = np.array([1.0, 0.0, 3.0])
    dv2 = [1.0, 0.0, 3.0]
    sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])
    sv2 = sps.csc_matrix((np.array([1.0, 3.0]), np.array([0, 2]), np.array([0, 2])), shape=(3, 1))
    print(sv2)
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:
ssh://root@192.168.22.250:22/root/programs/python3/bin/python -u /mnt/shared/test01/test01a.py
  (0, 0)    1.0
  (2, 0)    3.0
Process finished with exit code 0
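The two printed entries follow from how the (data, indices, indptr) triple passed to csc_matrix is read: column j holds data[indptr[j]:indptr[j+1]], and the matching slice of indices gives the row numbers. The pure-Python sketch below illustrates this without SciPy or Spark. The helper names are hypothetical, and the code only mimics how the arguments are interpreted, not how either library is implemented.

```python
def csc_entries(data, indices, indptr):
    """Expand a CSC triple into (row, col, value) tuples, column by column."""
    entries = []
    for col in range(len(indptr) - 1):
        # indptr[col]:indptr[col + 1] is the slice belonging to this column.
        for k in range(indptr[col], indptr[col + 1]):
            entries.append((indices[k], col, data[k]))
    return entries


def sparse_to_dense(size, indices, values):
    """Expand (size, indices, values), the argument form of Vectors.sparse."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


# Same arguments as sv2 and sv1 in the test script above.
print(csc_entries([1.0, 3.0], [0, 2], [0, 2]))  # [(0, 0, 1.0), (2, 0, 3.0)]
print(sparse_to_dense(3, [0, 2], [1.0, 3.0]))   # [1.0, 0.0, 3.0]
```

Read this way, dv1, dv2, sv1 and sv2 all describe the same vector (1.0, 0.0, 3.0); only the storage format differs.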
Q&A
Q: sudo: sorry, you must have a tty to run sudo
A: Comment out the "Defaults requiretty" line in /etc/sudoers. By default sudo requires a tty; with that line commented out, sudo can run in the background.
cd /etc
chmod 640 sudoers
vi /etc/sudoers
#Defaults requiretty
Q: VirtualBox's Shared Folder feature reports a "broken shared folder" error
A: See the post on sharing a local directory with the virtual machine mentioned above.
Q: Sometimes "cannot import name accumulators", sometimes "cannot import name py4j"
A: Install pip and py4j:
Download pip-7.1.2.tar
tar -xvf pip-7.1.2.tar
cd pip-7.1.2
python setup.py install
pip install py4j
Done!
References
https://edumine.wordpress.com/2015/08/14/pyspark-in-pycharm/
http://renien.github.io/blog/accessing-pyspark-pycharm/
http://www.tuicool.com/articles/MJnYJb
and others.